
FAST Extragalactic Neutral Hydrogen Data Preprocessing And I/O Optimization

Posted on: 2020-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Ji
Full Text: PDF
GTID: 2480306518962869
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
The Five-hundred-meter Aperture Spherical radio Telescope (FAST) is the largest single-dish radio telescope in the world. Since its completion in 2016, FAST has produced a large volume of observation data. Because of external radio signals and instrumental effects, the raw data recorded by FAST contain "impurities" and must be "cleaned" before they can be used for scientific research. This process is collectively referred to as data preprocessing. To keep pace with the high-speed data output during FAST observations, data preprocessing needs to be integrated into a high-performance, continuously running back-end program, called a pipeline. However, the existing pipelines widely used in radio data processing are constrained by instrument-specific features or by efficiency, and cannot be adapted directly to FAST data preprocessing. In this context, this dissertation proposes a pipeline scheme driven by an HDF5 data stream, together with the corresponding I/O optimization strategies. First, a workflow engine that drives the multi-task pipeline is designed. It provides an invocation interface for the data preprocessing algorithms, including flux calibration, bandpass and baseline correction, radio frequency interference (RFI) masking, and gridding. The pipeline supports automatic execution and user interaction, and has a degree of fault tolerance. Second, a layout specification for radio data in HDF5 format is designed, and a fast method for converting FITS data to HDF5 is proposed. Finally, based on this HDF5 data layout specification, the corresponding I/O optimization strategies are proposed: chunk storage, a persistent data cache, and MPI-IO parallel operation. Chunk storage groups frequently accessed data into chunks within the storage structure, which reduces the read and write cost incurred by the default on-disk layout. The persistent data cache achieves high-speed I/O across the whole pipeline at the cost of additional memory. MPI-IO builds on the MPI parallel model to make full use of the available processors and to execute multiple tasks simultaneously. To verify the validity and feasibility of the FAST data preprocessing pipeline, this dissertation reports experiments that test the I/O performance of HDF5 against FITS, the I/O improvement brought by chunk storage and the data cache, and the impact of MPI-IO on the overall pipeline. Combining the above optimization strategies, this dissertation implements a parallel data preprocessing pipeline driven by the FAST data stream. The optimized scheme is 6 to 7 times faster than a serial pipeline based on FITS data.
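To make the chunk storage idea more concrete, the following minimal sketch counts how many chunks a read has to touch under two different chunk shapes. It is an illustration only: the dataset dimensions, the chunk shapes, and the helper `chunks_touched` are hypothetical and are not taken from the dissertation. It relies on the fact that a chunked HDF5 dataset is read and written in whole chunks, so the number of chunks a selection overlaps is a rough proxy for I/O cost.

```python
def chunks_touched(selection, chunk_shape):
    """Count the chunks that a rectangular selection overlaps.

    selection:   one (start, stop) pair per axis, with stop exclusive.
    chunk_shape: chunk length along each axis.
    """
    total = 1
    for (start, stop), chunk_len in zip(selection, chunk_shape):
        first = start // chunk_len          # index of first chunk hit on this axis
        last = (stop - 1) // chunk_len      # index of last chunk hit on this axis
        total *= last - first + 1
    return total


# Hypothetical spectral dataset: 4096 time samples x 65536 frequency channels.
N_TIME, N_CHAN = 4096, 65536

# Assumed access pattern: one frequency channel across every time sample.
one_channel = ((0, N_TIME), (1000, 1001))

# Layout A: each chunk holds one full spectrum (1 time sample, all channels).
row_chunks = chunks_touched(one_channel, (1, N_CHAN))

# Layout B: each chunk holds a 256 x 256 tile of time samples and channels.
tile_chunks = chunks_touched(one_channel, (256, 256))

print(row_chunks, tile_chunks)
```

Under these assumed shapes the tiled layout suits the per-channel read far better, but the trade-off reverses for a read of one full spectrum, which overlaps a single chunk in layout A and 256 chunks in layout B. The chunk shape therefore has to be matched to the access pattern that dominates the pipeline.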
Keywords/Search Tags:FAST, Radio data, Data preprocessing, FITS, HDF5, I/O optimization