| Since the 1960s, astronomy has produced many important scientific discoveries and achievements. Major astronomical breakthroughs increasingly depend on the operation of large telescopes and on the processing and analysis of astronomical big data. The SKA (Square Kilometre Array) radio telescope, which is driven by grand scientific goals, is one of the largest international cooperative large-scale scientific engineering projects in astronomy in which we participate. Together with observation facilities ranging from ground-based equipment to space satellites (including space stations), which provide full-waveband observation capability covering visible light, X-rays, γ-rays, radio, infrared and ultraviolet, it is pushing astronomical research into a big data era of exponential growth. At present, astronomical observation data is on the petabyte (PB) scale, and as telescopes and other equipment advance, it will soon reach the exabyte (EB) scale. Such a large amount of data causes serious storage problems and requires near-real-time processing, which poses a huge challenge to the data processing capabilities of current computing devices. In addition, to reduce the running cost of the SKA project, the power budget is strictly limited, so power consumption is also a very important factor in selecting a data processing platform. Besides processing performance and power, the accuracy of the results is an important index that has to be considered in the analysis of astronomical scientific data, because some scientific applications require higher numerical precision. Numerical precision can be increased simply by using data types with larger bit widths. However, if existing floating-point units can be optimized to improve precision at the same bit width, the overhead of data storage and communication can be
effectively reduced. At present, floating-point units are designed according to the IEEE floating-point standard, IEEE 754. The bit widths of the fields that make up an IEEE 754 floating-point number are fixed, so some bits are underutilized when representing numbers with high precision and a small order of magnitude, or with low precision and a large order of magnitude. To address this underutilization caused by the fixed-precision nature of IEEE 754, this thesis proposes variable-precision floating-point operation logic based on the variable-precision computing theory proposed by John L. Gustafson. We select appropriate parameter configurations according to application requirements, and design a variable-precision floating-point operation unit on FPGA, which extends the existing floating-point IP cores. The precision, dynamic range and performance of the IEEE 754 and variable-precision floating-point units are compared and analyzed. The results show that, compared with the IEEE 754 IP core, the variable-precision floating-point unit provides higher precision and a larger dynamic range, and can effectively improve numerical results for precision-sensitive applications such as scientific data analysis. With further optimization of its implementation, the variable-precision floating-point unit can serve as a better choice of floating-point IP core. In terms of data processing performance, existing high-performance solutions based on CPUs and GPUs cannot simultaneously meet the performance requirements and power budget of SKA scientific data processing. Considering the high energy efficiency of hardware accelerators and the flexibility and cost of prototype design, an FPGA (Field Programmable Gate Array) is selected as the hardware acceleration platform. Two key algorithms of astronomical data processing, W-projection degridding and K-means
clustering, are taken as examples for the study and design of FPGA prototypes. By analyzing the behavior and bottlenecks of the degridding algorithm, a memory structure and memory access strategy for degridding are proposed. In addition, parallel computing logic for degridding is designed according to the bandwidth of the off-chip memory, achieving an optimal balance between processing performance and resource consumption. Finally, by analyzing the correlation of the data required when processing multiple spectral channels in the degridding algorithm, a data reuse strategy for computing samples of adjacent frequency channels is proposed, which further improves overall performance. The functionality and performance of the design are verified on the target FPGA board, and the FPGA-based prototype is compared with the standard test programs on CPU and GPU platforms. The results show that the FPGA-based prototype is 13.75 times better than a single CPU core, and achieves speedups of 2.74 times and 2.03 times, and energy efficiency improvements of 7.64 times and 7.42 times, over the MPI-based CPU benchmark and the CUDA-based GPU benchmark running at full performance, respectively. In the hardware and software co-design optimization of the K-means algorithm, the share of computation taken by each of the three steps of K-means clustering is analyzed for different data sets and numbers of clusters. Based on this analysis, only the first two steps (the distance calculation block and the minimum distance search block) are implemented on the FPGA, while the cluster center update is implemented on the CPU. For the distance calculation block, two parallel strategies are compared. Reading by column reduces redundancy in the data read: the number of sample elements read by column is 1/3 lower than when all dimensional elements of the samples are read at once. We instantiate 4 K-means kernels according to the resources of the target FPGA and test their performance. After optimization of the
K-means kernel on the FPGA, computation accounts for only a very small proportion of the algorithm's total execution time, and most of the time is spent on device initialization and data transmission. The FPGA execution time changes only slightly and does not increase linearly with the scale of the data sets. The throughput of a single computing node is restricted by its limited computing resources, whereas a distributed computing cluster can improve performance by several times or even tens of times compared with a single node. Apache Spark is an efficient distributed computing framework for big data processing. It supports in-memory computing based on RDDs (Resilient Distributed Datasets) and provides reusability, fault tolerance and real-time stream processing. However, Spark tasks can only be executed on CPUs by default, and the low parallelism and low energy efficiency of CPUs limit the cluster’s performance and scalability. To address this deficiency and to exploit heterogeneous hardware such as FPGAs in distributed computing, this thesis proposes a heterogeneous distributed computing framework that integrates FPGA accelerators into the original Spark framework, effectively improving the performance and energy efficiency of distributed applications. We take the K-means algorithm studied in the previous part as an example of heterogeneous distributed acceleration, and also propose performance optimization strategies for the distributed computing environment, which can guide the design optimization of other applications. Finally, we verify the performance of the proposed heterogeneous distributed computing framework; the experimental results show that the K-means algorithm on the FPGA-based Spark cluster performs 3.5 times better than on the CPU-based Spark cluster. In this thesis, we study the floating-point
unit, hardware and software co-design of algorithms, and a heterogeneous distributed computing framework, covering the design and optimization of an energy-efficient computing system for astronomical big data processing from the bottom up. Based on the research process and the experimental results, we use a performance analysis model to model and analyze the performance of the system, and further summarize optimization strategies for the design of energy-efficient computing systems for astronomical big data processing. The prototype system and optimization strategies presented in this thesis can provide a useful reference and optimization direction for the design of the SKA scientific data processing system. |
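
The three-step partition of K-means described in the abstract (distance calculation and minimum distance search offloaded to the accelerator, cluster center update kept on the CPU) can be illustrated with a minimal sketch. This is not the thesis implementation: plain Python stands in for the FPGA kernel, and all function names, the convergence test and the handling of empty clusters are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distances(sample: Point, centers: List[Point]) -> List[float]:
    """Step 1 (distance calculation block): squared distance to every center."""
    return [sum((x - c) ** 2 for x, c in zip(sample, center)) for center in centers]


def nearest_center(distances: List[float]) -> int:
    """Step 2 (minimum distance search block): index of the smallest distance."""
    return min(range(len(distances)), key=distances.__getitem__)


def assign_labels(samples: List[Point], centers: List[Point]) -> List[int]:
    """Steps 1 and 2 together: the part the abstract maps onto the FPGA.

    Here ordinary Python stands in for the hardware kernel (assumption).
    """
    return [nearest_center(squared_distances(s, centers)) for s in samples]


def update_centers(samples: List[Point], labels: List[int],
                   centers: List[Point]) -> List[List[float]]:
    """Step 3 (cluster center update): the part kept on the CPU."""
    dims = len(centers[0])
    sums = [[0.0] * dims for _ in centers]
    counts = [0] * len(centers)
    for sample, k in zip(samples, labels):
        counts[k] += 1
        for d in range(dims):
            sums[k][d] += sample[d]
    # An empty cluster keeps its previous center (assumption of this sketch).
    return [
        [v / counts[k] for v in sums[k]] if counts[k] else list(centers[k])
        for k in range(len(centers))
    ]


def kmeans(samples: List[Point], centers: List[Point],
           max_iters: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Alternate the offloaded steps and the CPU step until labels stop changing."""
    current = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iters):
        new_labels = assign_labels(samples, current)   # "FPGA" side
        if new_labels == labels:
            break
        labels = new_labels
        current = update_centers(samples, labels, current)  # CPU side
    return current, labels
```

In this sketch the per-iteration work in `assign_labels` grows with the number of samples times the number of clusters, while `update_centers` makes a single pass over the samples, which is consistent with offloading only the first two steps.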