| With the completion of FAST and its 19-beam receiver survey project, high sensitivity and wide sky coverage have greatly increased the number of observable objects and astronomical phenomena, providing rich research samples for astronomy. However, as the performance of radio telescopes continues to improve, the sensitivity to faint signals rises and the number of interfering signals also grows dramatically. The petabytes of pulsar survey data produced each year therefore broaden the search scope but also bring high computational cost and difficult screening, so efficiently screening pulsar candidates has become a hot topic in this field. Artificial intelligence methods are now widely used for pulsar candidate screening. Most machine learning-based methods adopt supervised learning, which depends heavily on labeled data, and the binary classification models they use tend to miss unknown or new pulsars and generalize poorly. In this thesis, we propose a clustering-based pulsar candidate screening scheme for unsupervised or semi-supervised learning scenarios, which addresses the problem of mining large-scale, rapidly sampled scalar data and facilitates the discovery of new pulsars. The scheme integrates a sliding window-based dataset partitioning strategy with the MapReduce parallelization framework to mitigate dataset imbalance and improve computational efficiency. In addition, it combines the strengths of density-based hierarchical clustering and partition-based clustering to delineate clusters, and introduces a kernel function similarity measure to improve the density measure. Comparative experiments on the pulsar dataset HTRU2 and the real FAST observation dataset AOD-FAST show that the algorithm outperforms the methods compared, with a precision of 0.946 and a recall of 0.905 on HTRU2, and an F1-score of 0.846 and a recall of 0.994 on AOD-FAST. When enough parallel nodes are available, the running time of the algorithm decreases significantly compared with serial execution. The method therefore offers a feasible approach to analyzing and mining big data from pulsar observations. This thesis also proposes a scheme to integrate the clustering algorithm into the PRESTO-based FAST distributed pulsar search pipeline, which combines signal processing of raw observation data, candidate generation, data conversion, candidate signal screening with the hybrid clustering algorithm, and identification of candidate diagnostic plots, thereby completing the workflow from signal processing to candidate identification. The scheme is expected to complement and assist the pulsar candidate screening process and improve the efficiency of pulsar searches. |
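
The abstract does not specify the kernel, the bandwidth, or exactly how the density-based and partition-based stages are combined. As a minimal sketch under those assumptions, the snippet below uses a Gaussian kernel with a hypothetical bandwidth `sigma` to compute each candidate's local density as a sum of kernel similarities, picks `k` seeds with the density-peak rule (high density and far from any denser point), and then refines them with standard k-means. The function names (`kernel_density`, `density_peak_seeds`, `hybrid_cluster`) are illustrative and are not the thesis's implementation; sliding-window partitioning and MapReduce parallelization are omitted.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X, shape (n, n)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def kernel_density(X, sigma=1.0):
    """Local density: sum of Gaussian-kernel similarities to all other points.

    The Gaussian kernel and the bandwidth `sigma` are assumptions for illustration.
    """
    K = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    return K.sum(axis=1) - 1.0  # subtract self-similarity K[i, i] = 1


def density_peak_seeds(X, k, sigma=1.0):
    """Return k seed indices with the largest rho * delta (density-peak rule)."""
    rho = kernel_density(X, sigma)
    dist = np.sqrt(pairwise_sq_dists(X))
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = rho > rho[i]
        # Distance to the nearest denser point; the densest point gets its maximum distance
        delta[i] = dist[i, denser].min() if denser.any() else dist[i].max()
    return np.argsort(-(rho * delta))[:k]


def assign(X, centers):
    """Index of the nearest center for each row of X."""
    d = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return d.argmin(axis=1)


def hybrid_cluster(X, k, sigma=1.0, n_iter=100):
    """Density-peak seeding followed by k-means (partition-based) refinement."""
    centers = X[density_peak_seeds(X, k, sigma)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Move each center to the mean of its members; keep it if the cluster is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated synthetic groups standing in for candidate feature vectors
    X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
    labels, centers = hybrid_cluster(X, k=2)
    print(labels)
```

In a screening setting, `X` could hold HTRU2-style scalar features for each candidate. The O(n²) pairwise matrix in this sketch is one illustration of why partitioning the dataset and spreading the work across parallel nodes matters at survey scale.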