In the era of big data, deep neural networks have been widely adopted in speech recognition, image recognition, natural language processing, and other fields because of their ability to extract valuable information from massive amounts of data. To improve the performance of deep neural networks in practice, researchers typically need huge amounts of data to train the model parameters. However, when models are trained on the traditional architecture that separates computation cores from storage, the power consumption and data-transfer latency caused by moving such large volumes of data severely limit the energy efficiency of the system. Fortunately, advances in NAND flash technology and hardware accelerators have made computational storage devices one of the promising solutions to these problems, and both industry and academia have studied deep neural network training based on computational storage devices.

However, we find that all previous work uses homogeneous storage and does not fully consider the requirements that deep neural network training places on the flash array. The impact of the poor performance and endurance of mainstream multi-bit flash memory on computational storage devices is also ignored. To address these problems, this thesis explores the use of hybrid flash memory to optimize the performance and lifetime of computational storage devices used for deep neural network training.

In this thesis, we propose Cop-Flash (Co-Partitioning Flash), a novel SLC-TLC hybrid flash memory that uses two different hybrid-storage partitioning methods to divide the flash array into flash cells with three different properties, matching the access characteristics of deep neural network training workloads. Meanwhile, three key space-management strategies are proposed in Cop-Flash:
1) A lifetime-and-space-aware data allocation addresses the difficulty of identifying the hotness of accelerator I/O requests, so that the heterogeneity of the hybrid flash is fully exploited while data migration during garbage collection is minimized. In addition, the strategy exploits the parallelism within the flash array to perform read/program and erase operations in parallel, making it possible to reclaim invalidated space in a timely manner.
2) A lifetime-aware garbage collection addresses the I/O performance degradation that occurs once the SLC capacity is exhausted, ensuring that free SLC blocks are always available for writing during training.
3) An erase-aware dual-zone management resolves the loss of channel utilization caused by hard partitioning while keeping the wear of the three kinds of flash cells balanced.

Finally, we implement Cop-Flash-based computational storage on two widely used emulators. The experimental results show that the proposed Cop-Flash improves the performance of the flash array by 29.1%, 38.8%, and 56.6% under multiple DNN model training workloads, and extends its lifetime by 2.3 times, 1.29 times, and 8.3 times, compared with typical hybrid flash as well as TLC flash.
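To make the first strategy more concrete, the sketch below illustrates the general idea of lifetime-and-space-aware allocation in a hybrid SLC-TLC device: writes that are estimated to be short-lived (hot) are steered to the SLC region while free SLC blocks remain above a watermark, and everything else goes to TLC so the SLC pool is preserved for bursty training writes. All names, thresholds, and data structures here are illustrative assumptions for exposition, not the actual policy implemented in Cop-Flash.

```c
/*
 * Minimal sketch of a hotness- and space-aware write allocator for a
 * hybrid SLC-TLC flash array. Hypothetical: structures, thresholds,
 * and the hotness heuristic are assumptions, not the thesis's design.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { REGION_SLC, REGION_TLC } region_t;

typedef struct {
    uint32_t free_slc_blocks;   /* free blocks left in the SLC partition */
    uint32_t free_tlc_blocks;   /* free blocks left in the TLC partition */
    uint32_t slc_low_watermark; /* below this, new data is sent to TLC   */
} flash_state_t;

/* Crude hotness estimate: a per-LBA-bucket write counter. The actual
 * policy must work without such hints, since accelerator I/O hotness
 * is hard to identify. */
static bool is_hot(uint64_t lba, const uint32_t *write_count)
{
    return write_count[lba % 1024] > 4; /* illustrative threshold */
}

/* Choose a target region for an incoming write request. */
static region_t allocate_region(const flash_state_t *fs,
                                uint64_t lba,
                                const uint32_t *write_count)
{
    /* Hot, short-lived data goes to fast, high-endurance SLC: it is
     * likely invalidated before garbage collection must migrate it. */
    if (is_hot(lba, write_count) &&
        fs->free_slc_blocks > fs->slc_low_watermark)
        return REGION_SLC;

    /* Cold data, or any write issued under SLC space pressure, goes to
     * dense TLC so free SLC blocks stay available for training bursts. */
    return REGION_TLC;
}

int main(void)
{
    uint32_t write_count[1024] = {0};
    flash_state_t fs = { .free_slc_blocks = 64,
                         .free_tlc_blocks = 512,
                         .slc_low_watermark = 8 };

    /* Repeatedly write one LBA so it eventually qualifies as "hot". */
    for (int i = 0; i < 6; i++) {
        uint64_t lba = 42;
        write_count[lba % 1024]++;
        region_t r = allocate_region(&fs, lba, write_count);
        printf("write %d -> %s\n", i, r == REGION_SLC ? "SLC" : "TLC");
    }
    return 0;
}
```

In a real device this decision would be made per page inside the flash translation layer and coordinated with garbage collection, which in Cop-Flash is additionally responsible for replenishing free SLC blocks during training.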