| HMMer is a widely used open-source bioinformatics toolkit comprising a series of tools for classifying and analyzing gene and protein sequences. P7Viterbi, the kernel function of HMMer, consists of two nested loops that implement Plan7 HMM, a typical compute-intensive algorithm. Although Plan7 HMM has high potential for operation-level parallelism, serial instruction execution makes HMMer very time-consuming on traditional CPU-based platforms. FPGA technology has developed rapidly in recent years, and its processing speed and chip area now meet the requirements of application acceleration. FPGA-CPU heterogeneous cooperative systems have attracted widespread attention, and their computational ability, scope of application, development process, coupling methods, and application prospects have been widely studied. Two kinds of FPGA-CPU heterogeneous systems can be used to exploit the parallelism of Plan7 HMM: (1) executing multiple kernels in parallel, and (2) computing a simplified kernel in parallel. The latter has finer granularity and better performance, but the simplification causes a loss of accuracy. This paper proposes a systolic-array-based architecture that satisfies the data dependencies of Plan7 HMM, together with an externally coupled system design that uses the PCI Express bus. Based on an analysis of the structure and data dependencies of Plan7 HMM, the architecture is organized as follows: the data stream of the systolic array handles the data dependencies between nodes; a parallel data-providing unit supplies coherent matching scores; a calculation division mechanism divides a large computation into small slices; and an automatic recalculation mechanism handles the feedback edge that hinders parallelism. 
Operation-level parallelism, pipelining, and parameterization are applied as optimization techniques to achieve high performance and good portability in the FPGA implementation. Experimental results show that, on a Virtex 5 110T chip, the speedup per processing unit is 4.4 and 3.7 times over the Pentium 4 and Core 2 Duo platforms, respectively, and the total system speedup is 109 and 92 times, respectively. |
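
To illustrate why a systolic array fits the two nested loops of a Viterbi-style kernel, the following is a minimal sketch and not the authors' implementation. It assumes a toy profile with only match, insert, and delete states, random integer scores, and no special states, so the feedback edge mentioned above is deliberately left out. All names (`make_toy_model`, `update_cell`, `viterbi_row_order`, `viterbi_wavefront`) are hypothetical. Under these assumptions, cell (i, k) depends only on cells (i-1, k-1), (i-1, k), and (i, k-1), so every cell on one anti-diagonal can be computed independently, which is the kind of parallelism a systolic array can exploit.

```python
import random

NEG_INF = float("-inf")
TRANSITIONS = ("MM", "IM", "DM", "MI", "II", "MD", "DD")


def make_toy_model(num_nodes, alphabet_size, seed=0):
    """Build random integer scores for a toy profile (not a real Plan7 model)."""
    rng = random.Random(seed)

    def scores(n):
        return [rng.randint(-5, 5) for _ in range(n)]

    return {
        "match_emit": [scores(alphabet_size) for _ in range(num_nodes + 1)],
        "insert_emit": [scores(alphabet_size) for _ in range(num_nodes + 1)],
        "t": {name: scores(num_nodes + 1) for name in TRANSITIONS},
    }


def new_tables(seq_len, num_nodes):
    """Allocate the M, I, D score tables; only M[0][0] starts reachable."""
    def table():
        return [[NEG_INF] * (num_nodes + 1) for _ in range(seq_len + 1)]

    M, I, D = table(), table(), table()
    M[0][0] = 0
    return M, I, D


def update_cell(model, seq, M, I, D, i, k):
    """Fill cell (i, k) from its three neighbours (simplified recurrence)."""
    t = model["t"]
    x = seq[i - 1]
    M[i][k] = model["match_emit"][k][x] + max(
        M[i - 1][k - 1] + t["MM"][k - 1],
        I[i - 1][k - 1] + t["IM"][k - 1],
        D[i - 1][k - 1] + t["DM"][k - 1],
    )
    I[i][k] = model["insert_emit"][k][x] + max(
        M[i - 1][k] + t["MI"][k],
        I[i - 1][k] + t["II"][k],
    )
    D[i][k] = max(
        M[i][k - 1] + t["MD"][k - 1],
        D[i][k - 1] + t["DD"][k - 1],
    )


def viterbi_row_order(model, seq):
    """Serial order: outer loop over residues, inner loop over model nodes."""
    num_nodes = len(model["match_emit"]) - 1
    M, I, D = new_tables(len(seq), num_nodes)
    for i in range(1, len(seq) + 1):
        for k in range(1, num_nodes + 1):
            update_cell(model, seq, M, I, D, i, k)
    return M, I, D


def viterbi_wavefront(model, seq):
    """Anti-diagonal order: cells with the same i + k have no mutual dependency."""
    num_nodes = len(model["match_emit"]) - 1
    seq_len = len(seq)
    M, I, D = new_tables(seq_len, num_nodes)
    for d in range(2, seq_len + num_nodes + 1):
        # Each iteration of this inner loop could run on its own processing unit.
        for i in range(max(1, d - num_nodes), min(seq_len, d - 1) + 1):
            update_cell(model, seq, M, I, D, i, d - i)
    return M, I, D
```

Both traversal orders should fill identical tables for the same model and sequence, for example `viterbi_row_order(model, seq) == viterbi_wavefront(model, seq)` with `model = make_toy_model(6, 4)` and `seq` a list of residue indices in `range(4)`. A full Plan7 kernel adds special states whose feedback breaks this simple ordering, which is the problem the recalculation mechanism in the abstract addresses.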