Research On Air Quality Classification Of Non-random Sample Augmented XGBoost Model

Posted on: 2023-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: X Liu
GTID: 2531306620953479
Subject: Applied statistics
Abstract/Summary:
The effective application of machine learning models depends on three components: data, algorithms, and computing power. At present, research on such models focuses mainly on improving the algorithms. In practical applications, however, the main factor limiting predictive performance is the quality of the data. Especially when the amount of data is small and the number of categories is large, the scale and quality of the data cannot meet the requirements of the algorithm. For the poor training and prediction performance of the XGBoost model caused by a small amount of data and a large number of categories, the commonly used remedies are over-sampling or under-sampling for samples with unbalanced categories, and data augmentation. When the data are balanced and are not image data, how to use limited data to improve the prediction accuracy of the XGBoost model on the test set is the problem this paper sets out to solve. To address it, an XGBoost model based on non-random sample augmentation is proposed. Samples are drawn non-randomly from the originally separated test set and returned to the original training set, which increases the amount of training data, strengthens the training of the XGBoost model, and improves the model's performance on the test set. The main research results fall into two parts. First, the XGBoost model is combined with non-random sample augmentation and applied to generated simulated categorical data. After non-random sample augmentation, the performance of the XGBoost model on the test set improves to varying degrees, with accuracy gains ranging from 0.21% to 3.33%. Under repeated non-random sample augmentation, the maximum accuracy is reached before the boundary of the normalized expectation and variance. Second, the non-random sample augmented XGBoost model is used to predict and analyze air quality in Chengdu, Harbin, and Kunming. The results show that: (1) compared with the prediction results of the plain XGBoost model, the air quality classification accuracy for Chengdu, Harbin, and Kunming improves by 3.48%, 0.3%, and 0.6%, respectively; (2) after the model's hyperparameters are tuned with the Optuna framework, the prediction accuracy for the three cities improves by 8.36%, 5.47%, and 2.13% compared with before tuning. The simulation and empirical studies in this paper show that, for balanced small samples, the non-random sample augmentation method can effectively improve the performance of the XGBoost model on the test set, so the method can be extended to similar data.
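The augmentation step described in the abstract (drawing samples non-randomly from the held-out set and returning them to the training set) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not state the selection rule, so the function name `augment_train_from_test`, the `per_class` parameter, and the "first k samples of each class, in order" rule are all hypothetical. The abstract also does not say whether the moved samples stay in the evaluation set; this sketch removes them.

```python
from collections import defaultdict


def augment_train_from_test(train, test, per_class):
    """Move a deterministic, per-class selection of held-out samples into train.

    train, test: lists of (features, label) pairs.
    per_class:   number of samples to move for each label (hypothetical knob).
    Returns (augmented_train, remaining_test).
    """
    taken = defaultdict(int)  # samples moved so far, per label
    moved, remaining = [], []
    for sample in test:
        label = sample[1]
        if taken[label] < per_class:
            # Assumed rule: selection is by position, not by chance.
            taken[label] += 1
            moved.append(sample)
        else:
            remaining.append(sample)
    return train + moved, remaining


# Toy, made-up data: two features and a two-level air quality label.
train = [([0.1, 0.2], "good"), ([0.9, 0.8], "poor")]
test = [([0.2, 0.1], "good"), ([0.8, 0.9], "poor"),
        ([0.3, 0.3], "good"), ([0.7, 0.6], "poor")]

new_train, new_test = augment_train_from_test(train, test, per_class=1)
print(len(new_train), len(new_test))  # prints: 4 2
```

In a full pipeline, the augmented training set would then be used to fit the XGBoost classifier, and accuracy would be compared against a model fitted on the original training set alone.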
Keywords/Search Tags: AQI, Non-random Sample Augmentation, CiteSpace, Random Forest, XGBoost