Research on the Association of Air Quality Data Based on an Improved Association Rules Algorithm

Posted on: 2021-09-07  Degree: Master  Type: Thesis
Country: China  Candidate: M Su  Full Text: PDF
GTID: 2491306560453084  Subject: Master of Engineering
Abstract/Summary:
In recent years, with the rapid development of China’s economic construction and social productivity, air pollution has become a topic of particular concern to the public and the government. Increasingly mature data mining and association analysis methods can extract valuable hidden information from air quality data, and the association rules implicit in massive data sets can support decision-making in air environment governance. Existing association rule algorithms based on the frequent pattern growth (FP-growth) idea suffer from a complex tree-building process and cumbersome support calculation, which leads to low mining efficiency. To address this, this thesis proposes a structure based on a Bitmap-code List (BC-List) and, on top of it, an improved association rule algorithm, BCLARM. To improve the algorithm's ability to process massive amounts of data, BCLARM is implemented on the Spark platform. Finally, the algorithm is used to explore the correlations among air quality indicators, analyze the causes of air pollution, and provide a theoretical basis for air environment management strategies. The main contents of this thesis are as follows:
(1) A frequent itemset mining algorithm based on the Bitmap-code List is proposed to address the complex construction rules and cumbersome support calculation of existing algorithms. First, a node coding model based on bitmap representation is used to generate the BC-Tree, and the node information of the BC-Tree serves as the data structure from which the node set of a BC-List is obtained quickly by bitwise operations, which avoids complicated intersection operations and improves connection efficiency. Second, the search space for mining frequent patterns is reduced by superset equivalence and support count pruning strategies. Experiments show that the algorithm mines faster than the FIN and DFIN algorithms.
(2) To handle very large data volumes, the F-BCLFARM algorithm based on the Spark platform is proposed. It uses a load-balancing-based grouping strategy to optimize the BCLFARM algorithm on Spark, with the aim of improving mining efficiency. Experiments show that F-BCLFARM performs well in parallel speedup, scalability, and running time, and that it mines efficiently on both sparse and dense data sets.
(3) The improved F-BCLFARM algorithm on the Spark distributed platform is used to mine the correlations among air quality indicators, including the indicators that affect air pollution. Hourly air quality and meteorological data from monitoring and acquisition equipment are processed, discretized, and stored in a database. The F-BCLFARM algorithm is then applied to the data to generate association rules. Analyzing these rules helps infer the causes of air pollution and provides decision support for air environment governance.
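The idea of replacing set intersections with bitwise operations when counting support can be illustrated with a small sketch. This is not the BC-Tree, BC-List, or BCLARM structure from the thesis, whose details the abstract does not give. It is a generic vertical-bitmap approach with brute-force candidate enumeration, and the item labels such as `PM2.5=high` are hypothetical examples of discretized records.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps, n):
    """Support count: AND the item bitmaps together, then count the set bits."""
    mask = (1 << n) - 1  # start with every transaction selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_count):
    """Enumerate itemsets level by level (brute force, for illustration only)."""
    n = len(transactions)
    bitmaps = build_bitmaps(transactions)
    items = sorted(i for i in bitmaps if support([i], bitmaps, n) >= min_count)
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            count = support(combo, bitmaps, n)
            if count >= min_count:
                result[frozenset(combo)] = count
                found = True
        if not found:  # no frequent k-itemset means no larger one exists
            break
    return result


def association_rules(freq, min_conf):
    """Derive rules A -> B with confidence = support(A and B) / support(A)."""
    rules = []
    for itemset, count in freq.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = count / freq[a]  # every subset of a frequent set is frequent
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules


# Hypothetical discretized hourly records (labels are illustrative assumptions).
sample = [
    {"PM2.5=high", "wind=low", "humidity=high"},
    {"PM2.5=high", "wind=low"},
    {"PM2.5=low", "wind=high"},
    {"PM2.5=high", "wind=low", "humidity=high"},
    {"PM2.5=low", "wind=high", "humidity=low"},
]
freq = frequent_itemsets(sample, min_count=2)
rules = association_rules(freq, min_conf=0.9)
```

Each item's bitmap is a single integer, so the support of any candidate itemset costs one AND per item plus a bit count, which is the general reason bitmap encodings tend to speed up support calculation. A production miner would replace the brute-force enumeration with a pruned search, as the thesis describes.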
Keywords/Search Tags: Frequent Itemsets Mining, Bitmap Encoding, Spark Distributed Platform, Air Quality Indicator Analysis