With today's booming digital communication technology, it is increasingly important to transmit information efficiently and reliably over noisy channels. In 1962, Gallager introduced Low-Density Parity-Check (LDPC) codes and showed that their performance can approach the Shannon limit, but because of the limited computing power available at the time, LDPC codes remained largely unknown. Since the 1990s, LDPC codes have been restudied and have gradually become a research hotspot in channel coding. LDPC convolutional codes are a subclass of LDPC codes, and the concept of Spatial Coupling (SC) was proposed with their appearance. They are widely used for their pipelined decoders and low decoding delay. Because the SC-LDPC decoding algorithm is inherently parallel, it is well suited to hardware implementation, so designing an efficient SC-LDPC decoder is important for engineering practice.

As processor performance keeps improving, researchers have begun to accelerate the decoding algorithm with hardware. Decoders built on Field Programmable Gate Arrays (FPGAs) are the most common; they have achieved good results, but their flexibility and extensibility during debugging are poor and their cost is relatively high. The Graphics Processing Unit (GPU) meets these requirements, and CUDA, a powerful unified computing platform, makes GPU programming simpler and easier. In this paper, a high-efficiency GPU-based decoder for SC-LDPC codes is proposed. Through the CUDA platform and kernel functions, the GPU accelerates massive parallel computation, shortens data-access time, and improves the decoding speed of SC-LDPC codes.

The main contents of this paper include the following four optimization schemes for the coarse-grained decoder (see the sketches after this abstract):
a) Compressing the check matrix. By compressing the index information of the different processors into lookup tables in global memory, this scheme saves GPU storage space and improves decoding speed.
b) Mapping the check matrix to threads and to memory inside the GPU.
c) A multi-stream parallel scheme based on CUDA.
d) Parallel decoding of multiple codewords, using coalesced access to the extrinsic information to improve throughput.

Building on the coarse-grained schemes, several optimization methods for the fine-grained decoder are proposed: algorithm optimization inside the kernels, thread synchronization, simplification of redundant copies, page-locked memory, and optimized allocation of thread-level parallelism (TLP). These methods reduce decoding complexity, ensure contiguous memory access, and accordingly shorten data-access time.

Simulation experiments are carried out for all of the optimization schemes. Comparing the decoding speed of a single-threaded CPU decoder with that of the GPU-based decoder shows that the decoder designed in this paper brings an obvious speedup.
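The following is a minimal sketch of how schemes a) and b) could look on the device side. It assumes a CSR-style lookup table (a hypothetical `row_ptr` array of edge offsets) that stores only the positions of the nonzero entries of the sparse check matrix in global memory, one thread per check node, and a standard min-sum check-node update; the paper's exact message-passing rule and data layout are not specified here, so all names and the update rule are illustrative assumptions.

```cuda
// Sketch: one thread per check node, min-sum check-node update.
// The sparse check matrix is compressed into a lookup table (row_ptr)
// in global memory; row_ptr[m]..row_ptr[m+1] gives the edge range of
// check node m, and messages are stored per edge.
__global__ void check_node_update(const int *row_ptr,   // size: num_checks + 1
                                  const float *v2c,     // variable-to-check messages
                                  float *c2v,           // check-to-variable messages
                                  int num_checks)
{
    int m = blockIdx.x * blockDim.x + threadIdx.x;      // check-node index
    if (m >= num_checks) return;

    int begin = row_ptr[m];
    int end   = row_ptr[m + 1];

    // First pass: overall sign and the two smallest magnitudes.
    float min1 = 1e30f, min2 = 1e30f;
    int   sign = 1;
    int   min_pos = begin;
    for (int e = begin; e < end; ++e) {
        float v = v2c[e];
        sign *= (v < 0.0f) ? -1 : 1;
        float mag = fabsf(v);
        if (mag < min1) { min2 = min1; min1 = mag; min_pos = e; }
        else if (mag < min2) { min2 = mag; }
    }

    // Second pass: exclude each edge's own contribution (min-sum rule).
    for (int e = begin; e < end; ++e) {
        float v   = v2c[e];
        int   s   = sign * ((v < 0.0f) ? -1 : 1);
        float mag = (e == min_pos) ? min2 : min1;
        c2v[e] = (float)s * mag;
    }
}
```

Because each thread reads a contiguous edge range described by the lookup table, the mapping from check nodes to threads and from edges to memory locations is fixed at matrix-compression time, which is the intent of schemes a) and b).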
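For schemes c) and d) together with the page-locked-memory optimization, the host-side structure could resemble the sketch below: each codeword (or batch of codewords) is assigned to its own CUDA stream, host buffers are allocated as page-locked (pinned) memory so that asynchronous copies can overlap with kernel execution, and the decoder kernel is launched per stream. The kernel body, buffer sizes, and stream count are placeholders, not the paper's actual configuration.

```cuda
#include <cuda_runtime.h>

#define NUM_STREAMS 4                 // assumed number of concurrent streams

// Placeholder for the real SC-LDPC iteration kernels described in the paper.
__global__ void decode_iteration(float *llr, int codeword_len) { /* ... */ }

int main()
{
    const int    codeword_len = 8192;                // assumed codeword length
    const size_t bytes = codeword_len * sizeof(float);

    cudaStream_t streams[NUM_STREAMS];
    float *h_llr[NUM_STREAMS];                       // pinned host buffers
    float *d_llr[NUM_STREAMS];                       // device buffers

    for (int s = 0; s < NUM_STREAMS; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMallocHost(&h_llr[s], bytes);            // page-locked allocation
        cudaMalloc(&d_llr[s], bytes);
    }

    // Each stream copies its codeword in, runs the decoder, and copies the
    // result back; copies and kernels of different streams overlap.
    for (int s = 0; s < NUM_STREAMS; ++s) {
        cudaMemcpyAsync(d_llr[s], h_llr[s], bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        decode_iteration<<<(codeword_len + 255) / 256, 256, 0, streams[s]>>>(
            d_llr[s], codeword_len);
        cudaMemcpyAsync(h_llr[s], d_llr[s], bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < NUM_STREAMS; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFreeHost(h_llr[s]);
        cudaFree(d_llr[s]);
    }
    return 0;
}
```

Laying out the per-codeword message buffers contiguously, as above, is also what allows the coalesced (combined) access to the extrinsic information mentioned in scheme d).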