| The coverage of China’s environmental monitoring network has gradually expanded, and a vast amount of pollution monitoring data has been collected from environmental monitoring sites. However, environmental departments use these data only for queries and for generating daily, monthly, and annual monitoring reports, so utilization is low and the potential value of the data remains largely untapped. There is therefore an urgent need to make full use of pollution data, explore trends in atmospheric pollutant concentrations, and develop targeted pollution prevention and control measures. The emergence of data mining methods such as machine learning provides new perspectives for predicting the transmission paths of atmospheric pollutants in the context of network science. This thesis uses machine learning methods to address the challenging problem of predicting the transmission paths of atmospheric pollutants and proposes a pollutant transmission path prediction model based on a fast attention structure and node category partitioning. The main research work of this thesis comprises the following three aspects. First, SCLP, a link prediction algorithm based on a fast attention mechanism, is proposed. This part combines graph random walk strategies with graph convolutional neural network techniques. The Struc2vec algorithm is first used to obtain initial feature vectors for the network nodes. Then, drawing on the memory addressing method of the neural Turing machine and on related work such as important-node discovery based on betweenness centrality, a fast attention mechanism is designed to adjust the initial node vectors, and the adjusted vectors are used to perform link prediction in complex networks. Experimental comparisons with the ProNE, Node2vec, HOPE, and DeepWalk algorithms show that SCLP achieves average improvements of 12.6% in AUC and 6.9% in accuracy. Second, an APC algorithm for node 
category partitioning is proposed. This algorithm addresses the poor clustering performance of the affinity propagation (AP) algorithm on non-circular data by introducing a distance-based merging process. Using the single-linkage and complete-linkage methods from hierarchical clustering, this process calculates the average distance between nodes of different categories, compares it with the global average distance over all nodes, and merges the clustering results accordingly. With the merging process, the APC algorithm handles non-circular data well while retaining good performance on circular data. Third, pollutant transmission paths are predicted based on SCLP-APC. The process consists of two steps: (1) Construction of the pollutant network. Copula entropy is introduced to quantify the amount of PM2.5, a major atmospheric pollutant, transmitted between monitoring stations. Based on the calculated PM2.5 transmission amount and transmission stability between stations, appropriate thresholds are set to establish relationships between stations, and the stations are treated as nodes to construct the atmospheric pollutant transmission network. (2) The prediction of atmospheric pollutant transmission paths is transformed into link prediction in a complex network. The constructed network is input into the pollutant transmission prediction model to obtain the node category partition and the probability of a link between each pair of nodes. A decay factor is set to account for the influence of node category on the probability of link formation between nodes of different categories. Experiments on pollutant data collected in Lanzhou show that the proposed model has superior prediction performance and provides a solution for urban air pollution control. |
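
The final step, in which a decay factor adjusts link probabilities according to node category, might be sketched as follows. The abstract does not give the exact form of the adjustment, so this is only an illustrative assumption: link probabilities between stations in different categories are multiplied by a factor in [0, 1], while same-category pairs are left unchanged. The function name `apply_category_decay`, its parameters, and the toy values are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def apply_category_decay(
    link_prob: Sequence[Sequence[float]],
    categories: Sequence[int],
    decay: float = 0.8,
) -> List[List[float]]:
    """Damp cross-category link probabilities (assumed form, not the thesis's)."""
    n = len(categories)
    if len(link_prob) != n or any(len(row) != n for row in link_prob):
        raise ValueError("link_prob must be an n x n matrix matching categories")
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")

    adjusted: List[List[float]] = []
    for i in range(n):
        row: List[float] = []
        for j in range(n):
            p = link_prob[i][j]
            # Same-category pairs keep their score; cross-category pairs are scaled.
            row.append(p if categories[i] == categories[j] else decay * p)
        adjusted.append(row)
    return adjusted


# Toy example (made-up numbers): stations 0 and 1 share a category, station 2 does not.
probs = [
    [0.0, 0.9, 0.6],
    [0.9, 0.0, 0.4],
    [0.6, 0.4, 0.0],
]
adjusted = apply_category_decay(probs, categories=[0, 0, 1], decay=0.5)
```

Here `probs` stands in for the link probabilities produced by a link prediction model such as SCLP, and `categories` for the partition produced by a clustering step such as APC. Other forms of the adjustment, for example a decay that depends on the distance between categories, would fit the same interface.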