Research On Flight Price Prediction Technology Based On Imbalanced Data Classification Method

Posted on: 2021-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: D Qiu
Full Text: PDF
GTID: 2512306029481404
Subject: Applied Statistics
Abstract/Summary:
In today's big data era, classification problems continue to attract the attention of researchers. Most traditional classification algorithms are developed on balanced data sets and assume that the cost of misclassification is the same for every category. Most data sets encountered in practice, however, are imbalanced: one category contains markedly fewer samples than the others, and traditional classification algorithms cannot classify such samples well. The classification of imbalanced data sets has therefore attracted growing attention.

This thesis studies the classification of imbalanced data sets. It improves existing traditional classification algorithms at both the data level and the algorithm level, and applies them to the classification of air ticket prices. The research covers the following aspects.

First, to address the between-class imbalance in the distribution of imbalanced data sets, this thesis proposes a selective hybrid sampling algorithm. The algorithm applies undersampling to the majority-class samples in the original data set and oversampling to the minority-class samples. This improves the balance of the original data set from both local and global perspectives while keeping the new data set close to the original distribution. The new data set is then fed into a neural network model for predictive analysis.

Second, at the algorithm level, this thesis proposes an improved XGBoost algorithm that combines resampling with XGBoost. EasyEnsemble processing is first applied to the data set to obtain a more balanced data set, which is then fed into the XGBoost model for predictive analysis.

Finally, the F-measure and the recall rate are selected as indicators to evaluate the two algorithms. According to the indicator results, the neural network model based on selective hybrid sampling is relatively more effective at predicting the minority-class samples in the imbalanced data set.
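The abstract does not give the details of the selective hybrid sampling algorithm, so the following is only a minimal sketch of the general idea it builds on: undersample the majority class, oversample the minority class, and score predictions with recall and the F-measure. It assumes binary labels and uses plain random selection rather than the thesis's selective strategy; the names `hybrid_resample`, `recall_and_f1`, and `target_per_class` are hypothetical and chosen for illustration.

```python
import random


def hybrid_resample(samples, labels, minority_label, target_per_class, seed=0):
    """Random hybrid resampling for a binary problem (illustrative only).

    NOTE: this is plain random under/oversampling, not the selective
    hybrid sampling algorithm proposed in the thesis.
    """
    rng = random.Random(seed)
    minority = [(x, y) for x, y in zip(samples, labels) if y == minority_label]
    majority = [(x, y) for x, y in zip(samples, labels) if y != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")

    # Undersample: keep a random subset of the majority class (no replacement).
    kept_majority = rng.sample(majority, min(target_per_class, len(majority)))

    # Oversample: duplicate random minority samples until the target is reached.
    extra = max(0, target_per_class - len(minority))
    grown_minority = minority + [rng.choice(minority) for _ in range(extra)]

    combined = kept_majority + grown_minority
    rng.shuffle(combined)
    return [x for x, _ in combined], [y for _, y in combined]


def recall_and_f1(y_true, y_pred, positive):
    """Recall and F1 (the F-measure with beta = 1) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1
```

Evaluating with the minority class as `positive` reflects why recall and the F-measure are preferred over plain accuracy here: a classifier that always predicts the majority class can reach high accuracy on an imbalanced data set while its minority-class recall is zero.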
Keywords/Search Tags:Imbalanced data set, Re-sampling, Classification algorithm, Composite XGBoost algorithm