| In recent years, cloud computing has come to play an important role in the third wave of information technology development, and the amount of data that data centers must process has grown exponentially, leading to serious shortages of storage space and network bandwidth. To relieve the growing storage demand and reduce the cost of data transmission, industry and academia commonly turn to data compression. However, the serial data dependencies in the encoding stage of compression algorithms seriously hinder the construction of multi-stage pipelines in hardware circuits. Targeting lossless data compression in cloud computing, this paper therefore analyzes the principles of the LZ77 and Huffman algorithms in depth, optimizes the lossless compression algorithm for the hardware platform, and designs a high-bandwidth, fully pipelined, low-latency, scalable hardware accelerator for lossless compression and decompression. First, the accelerator achieves high throughput in three ways: (1) hash chain construction and string matching are performed in parallel at multiple consecutive positions within a parallel processing window; (2) the hash dictionary structure is redesigned to resolve conflicts that arise when multiple strings access the hash dictionary at the same time; (3) a pipelined bitstream packing module aligns the variable-length code stream to byte boundaries. Second, the compression ratio is improved in two ways: (1) the hash chain depth of the hash dictionary is increased so that more strings with the same hash value can be stored; (2) a lazy matching algorithm is proposed to find the locally optimal match within the parallel processing window. Finally, the accelerator mitigates the impact of hardware latency through multi-threaded interaction at the CPU-FPGA cooperation level, making full use of the PCI-Express link bandwidth and memory system resources. The accelerator is implemented in Verilog HDL, and its code coverage and functional coverage are analyzed on the VCS & Verdi simulation platform. Prototype verification is carried out on an Intel Stratix 10 FPGA; according to the Quartus report, the design consumes 88.6k ALMs and 749 M20K blocks. Tested on the Calgary corpus at a 250 MHz clock, the single-core compressor achieves an input throughput of 4.0 GB/s with a comparable compression ratio of 2.13, and the single-core decompressor achieves an output throughput of 164.5 MB/s. Compared with hardware accelerators reported in the literature, the design performs well in both resource efficiency and compression performance, and it shows promise for real-time compression and decompression services in cloud computing. |
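
To make the roles of hash chain depth and lazy matching concrete, the following is a minimal sequential software sketch of LZ77 matching. It is not the hardware design described above: it processes one position at a time rather than a parallel window, and its lazy step is the common one-position deferral, which may differ from the window-based variant proposed in the paper. The constants `MIN_MATCH`, `MAX_MATCH`, `WINDOW`, and `CHAIN_DEPTH`, and all function names, are assumptions chosen for illustration.

```python
MIN_MATCH = 3        # shortest match worth encoding (assumed, DEFLATE-like)
MAX_MATCH = 258      # longest match length (assumed, DEFLATE-like)
WINDOW = 32 * 1024   # history window in bytes (assumed)
CHAIN_DEPTH = 4      # candidates kept per hash bucket (hypothetical value)


def insert(data, pos, chains):
    """Record position `pos` in the chain for its 3-byte prefix, newest first."""
    if pos + MIN_MATCH <= len(data):
        chain = chains.setdefault(data[pos:pos + MIN_MATCH], [])
        chain.insert(0, pos)
        del chain[CHAIN_DEPTH:]  # a deeper chain keeps more candidates


def longest_match(data, pos, chains):
    """Return (length, distance) of the best match at `pos`, or (0, 0)."""
    if pos + MIN_MATCH > len(data):
        return 0, 0
    best_len, best_dist = 0, 0
    limit = min(MAX_MATCH, len(data) - pos)
    for cand in chains.get(data[pos:pos + MIN_MATCH], ()):
        if pos - cand > WINDOW:
            break  # remaining candidates are older and even further away
        length = 0
        while length < limit and data[cand + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_len, best_dist = length, pos - cand
    return best_len, best_dist


def compress(data):
    """Encode bytes as a list of literals (int) and matches (length, distance)."""
    chains, tokens, pos = {}, [], 0
    while pos < len(data):
        length, dist = longest_match(data, pos, chains)
        insert(data, pos, chains)
        if length >= MIN_MATCH:
            # Lazy matching: if the next position offers a longer match,
            # emit a literal now and take the longer match next iteration.
            next_len, _ = longest_match(data, pos + 1, chains)
            if next_len > length:
                tokens.append(data[pos])
                pos += 1
                continue
            tokens.append((length, dist))
            for p in range(pos + 1, pos + length):
                insert(data, p, chains)
            pos += length
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def decompress(tokens):
    """Rebuild the original bytes; byte-wise copy handles overlapping matches."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            length, dist = tok
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(tok)
    return bytes(out)


sample = b"abcde_bcdefgh_abcdefgh"
assert decompress(compress(sample)) == sample
```

In this sample, greedy matching would take the 5-byte match for `abcde` at the final `abcdefgh`, whereas deferring by one byte lets the encoder use the 7-byte match for `bcdefgh`. Raising `CHAIN_DEPTH` trades more comparisons per position for a better chance of finding such longer matches.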