| The human genome contains about 3 billion base pairs, so sequence alignment during gene sequencing requires an enormous amount of computation. This leads to long computation times and high energy consumption, which lengthens the research cycle in biomedical fields and raises sequencing costs. Although a growing body of work addresses gene sequencing, most of it relies on distributed parallel CPUs. Because the CPU is a general-purpose processor whose hardware architecture is not adapted to gene sequencing algorithms, running sequence matching algorithms on it is inefficient: computation time is long and energy cost is high. To address these problems, this paper adopts a heterogeneous computing architecture that pairs a general-purpose CPU with a dedicated GPU or FPGA accelerator. The general-purpose part of gene sequence matching runs on the CPU, while the compute-intensive part is offloaded to the GPU or FPGA for acceleration. This heterogeneous approach reduces both the time and the energy consumption of sequence alignment in gene sequencing. The specific research work is as follows. Firstly, because the CPU offers low computational parallelism and cannot execute the sequence matching algorithm efficiently, this paper designs an acceleration scheme based on a highly parallel GPU that matches the computational characteristics of the algorithm. The GPU acceleration scheme first analyzes the computational characteristics of the algorithm and proposes the idea of parallel computation along the diagonal. Then, three levels of parallel computing strategies are designed by combining the Grid-Block-Thread three-level thread management method of the
GPU. Finally, the LD score calculation module and the path backtracking module of sequence matching are implemented so that the parallelism of the GPU is exploited as fully as possible to speed up sequence matching. Secondly, although GPUs offer high parallelism and can accelerate sequence matching, their high power consumption makes them unsuitable for large-scale deployment. In recent years, FPGAs have been widely used to accelerate various algorithms because of their high flexibility and low power consumption; this paper therefore investigates a dedicated FPGA-based accelerator for sequence matching in gene sequencing. Before designing the FPGA accelerator, however, this paper finds that the computational characteristics of the sequence matching algorithm do not fully match the hardware architecture of the FPGA, so deploying the algorithm directly on an FPGA would make good acceleration difficult to achieve. To address this problem, this paper adjusts and optimizes the sequence matching algorithm according to the hardware architecture of FPGAs. First, the computation method is adjusted: a dataflow scheme with parallel computation along the diagonal is proposed so that the computation matches the FPGA systolic array architecture designed later in this paper. Then, the data storage method is optimized: data chunking and redundant data removal strategies are implemented so that the algorithm can be deployed on FPGAs with fewer storage resources. With these software-level optimizations, the algorithm is better suited to efficient deployment on FPGAs. Thirdly, after adapting the algorithm to the FPGA hardware architecture, this
paper uses the Verilog hardware description language to design a hardware accelerator dedicated to gene sequence alignment. To ensure that the accelerator achieves high computational speed even on resource-constrained FPGAs, the accelerator architecture is built on dataflow systolic arrays and dynamic reuse of resources. First, this paper proposes a configurable number of dataflow processing elements (PEs) that form a data-sharing PE systolic array according to the data dependencies. The array improves the speed of sequence alignment by implementing the algorithm in hardware, reducing the number of data accesses, and increasing computational parallelism. Then, the PEs are dynamically reused by chunking the gene sequences, which reduces the resource and power consumption of the alignment. With these methods, the sequence matching accelerator can be deployed on resource-constrained FPGAs and still achieve high computational speed. In the experiments, this paper implements CPU/GPU and CPU/FPGA heterogeneous sequence alignment acceleration systems on servers equipped with NVIDIA Tesla PH402 SKU 200 GPUs and Intel PAC D5005 FPGAs, respectively, and evaluates the performance of the accelerators. First, the GPU and FPGA accelerators are each compared with a conventional CPU in terms of computational speed. The results show that at a test data length of 32768 bp, the GPU is 90 times as fast as the CPU, while the FPGA is 856 times as fast as the CPU. Then, the GPU accelerator and the FPGA accelerator are compared in terms of performance-to-power ratio. The results show that at a test data length of 32768 bp, the performance-to-power ratio of the FPGA accelerator is 19 times that of the GPU accelerator. Finally, the performance of the FPGA accelerator
in this paper is compared with that of other sequence alignment accelerators. The results show that the peak performance of the FPGA accelerator in this paper is 40 GCUPS, which exceeds that of many other FPGA accelerators. Although this peak is lower than that of a few other FPGA accelerators, the accelerator designed in this paper uses only 9% of the logic resources of the D5005 FPGA, far less than other FPGA accelerators, so it is better suited to deployment on resource-limited FPGAs. |
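
The idea of parallel computation along the diagonal can be illustrated with a minimal Python sketch. It assumes that "LD score" refers to the Levenshtein (edit) distance, which the abstract does not state explicitly, and the function names `levenshtein_wavefront` and `gcups` are hypothetical. It is a sequential illustration of the data dependency only, not the GPU or FPGA design described in the paper: every cell on one anti-diagonal depends only on the two previous anti-diagonals, so the inner loop is the one that a GPU thread group or a systolic array of PEs could evaluate simultaneously.

```python
def levenshtein_wavefront(a: str, b: str) -> int:
    """Edit distance filled in one anti-diagonal (i + j == d) at a time."""
    m, n = len(a), len(b)
    # score[i][j] holds the edit distance between a[:i] and b[:j].
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        score[i][0] = i  # delete all i characters of a
    for j in range(n + 1):
        score[0][j] = j  # insert all j characters of b

    # Walk the anti-diagonals of the interior of the matrix.
    for d in range(2, m + n + 1):
        i_lo = max(1, d - n)
        i_hi = min(m, d - 1)
        # Cells in this loop are mutually independent: each one reads only
        # from diagonals d - 1 and d - 2, so they could run in parallel.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            score[i][j] = min(
                score[i - 1][j] + 1,         # deletion
                score[i][j - 1] + 1,         # insertion
                score[i - 1][j - 1] + cost,  # match or substitution
            )
    return score[m][n]


def gcups(len_a: int, len_b: int, seconds: float) -> float:
    """Giga cell updates per second for one len_a x len_b score matrix."""
    return len_a * len_b / seconds / 1e9


if __name__ == "__main__":
    print(levenshtein_wavefront("kitten", "sitting"))
    print(gcups(32768, 32768, 1.0))
```

The `gcups` helper only shows how the GCUPS metric is defined (matrix cells updated per second, in billions); the timing passed to it here is a placeholder, not a measurement from the paper.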