
Research On Decision Tree Algorithm And Application In Air Quality Assessment

Posted on: 2018-07-20 | Degree: Master | Type: Thesis
Country: China | Candidate: N Hu
GTID: 2371330596453011 | Subject: Electronic Science and Technology
Abstract/Summary:
Air pollution disrupts people's daily life and work. In more serious cases it threatens life and property and causes public panic. Having suffered its harmful effects, people have begun to pay attention to improving air quality. To improve air quality rationally and effectively, it is essential to study large amounts of air quality data: useful information can be mined from the collected data to build classification and prediction models that forecast air quality. The decision tree is one of the classic and most commonly used algorithms for building such models. Because the rules it generates are relatively easy to understand and its classification results are comparatively accurate, decision tree algorithms are widely used. However, they still have shortcomings in practical applications, and improving the efficiency of the classical algorithms is therefore an important research direction.

In this thesis, the common decision tree algorithms ID3, C4.5, CART, NBTree and REPTree were studied and compared experimentally on multiple data sets. C4.5, which gave the best overall results, was selected for further study and improvement. To this end, we learned to use the various functions of the open-source Weka platform and focused on the structure of Weka's C4.5 source code. To address the shortcomings of C4.5, two improved algorithms were studied: C4.5_BF, which improves accuracy, and C4.5_FS, which reduces modeling time.

C4.5_BF introduces a balance factor that adjusts the information entropy of attributes. This addresses the loss of accuracy that occurs when C4.5 falls into a local optimum. Experiments on 12 data sets downloaded from the UCI repository showed that C4.5_BF improves accuracy when the attributes of the data set are consistent, but at the cost of longer modeling time.

C4.5_FS simplifies the formulas so that a large number of logarithmic operations are removed. This lowers the time complexity of the algorithm and thereby reduces modeling time. To limit the loss of accuracy caused by removing the logarithmic operations, the attribute selection criterion is changed to the information gain ratio multiplied by the attribute characteristic number. The experimental results showed that this method reduces modeling time on data sets with discrete attributes, and the larger the data set, the more pronounced the effect. On data sets with continuous attributes, however, no such reduction was achieved. In addition, overall classification accuracy decreased.

Verification of the improved algorithms shows that they are less general than the original, and that their performance depends on the characteristics of the data set. In practice, the appropriate algorithm should be chosen according to the characteristics of the data set and the desired effect. Finally, the original and improved decision tree algorithms were applied to air quality assessment. This confirmed the conclusions above and yielded the best-performing air quality classification and prediction model.
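For reference, both variants modify the standard C4.5 attribute-selection criterion. That criterion is Quinlan's gain ratio: information gain divided by split information. The Python code below is a minimal sketch of that baseline only. It is not the thesis's code and not Weka's J48 implementation. The abstract does not give the formulas for C4.5_BF's balance factor or C4.5_FS's simplified criterion, so neither is reproduced here. The `log2` calls in `entropy` and `split_info` show where logarithmic operations, the kind C4.5_FS sets out to remove, appear in the baseline. All function names and the toy air-quality records are hypothetical. The sketch assumes discrete attributes and omits several parts of full C4.5: handling of continuous attributes and missing values, pruning, and Quinlan's rule of considering only attributes with at least average information gain.

```python
# Hypothetical sketch of the standard C4.5 gain-ratio criterion for discrete
# attributes; not the thesis's C4.5_BF / C4.5_FS variants.
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group the class labels by the value each row takes on attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return list(groups.values())


def info_gain(rows, labels, attr):
    """ID3 criterion: entropy before the split minus weighted entropy after it."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(rows, labels, attr))
    return entropy(labels) - remainder


def split_info(rows, labels, attr):
    """Entropy of the branch sizes; grows with the number of attribute values."""
    n = len(labels)
    return -sum(len(g) / n * log2(len(g) / n) for g in partition(rows, labels, attr))


def gain_ratio(rows, labels, attr):
    """C4.5 criterion: information gain normalised by split information."""
    si = split_info(rows, labels, attr)
    return info_gain(rows, labels, attr) / si if si > 0 else 0.0


def best_attribute(rows, labels):
    """Index of the attribute with the highest gain ratio."""
    return max(range(len(rows[0])), key=lambda a: gain_ratio(rows, labels, a))


if __name__ == "__main__":
    # Hypothetical toy records: (pollutant level, wind, station id) -> label.
    rows = [("high", "calm", "s1"), ("high", "windy", "s2"),
            ("low", "calm", "s3"), ("low", "windy", "s4")]
    labels = ["poor", "poor", "good", "good"]
    for a in range(3):
        print(a, info_gain(rows, labels, a), gain_ratio(rows, labels, a))
    print("best:", best_attribute(rows, labels))
```

On these hypothetical records, pollutant level and station id each have an information gain of 1.0 bit, because each one separates the two classes perfectly. Gain ratio, however, scores them 1.0 and 0.5, because the four-valued station id has a split information of 2 bits. This penalty on many-valued attributes is why C4.5 uses gain ratio instead of ID3's plain information gain.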
Keywords/Search Tags: Decision Tree, C4.5 Algorithm, Weka, Air Quality