Prediction Analysis Of Used Car Price In Sichuan And Chongqing Based On Lasso And Integrated Learning

Posted on: 2024-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Song
Full Text: PDF
GTID: 2542307106986169
Subject: Applied statistics
Abstract/Summary:
With the number of new vehicles in China growing and transaction channels such as the Internet multiplying, the used car market has entered a new stage of rapid development. A healthy used car market matters not only to the automotive industry but also to the development of the wider economy and society. Orderly circulation of used cars is the key to that healthy development, and orderly circulation in turn depends on the price of a used car matching its value. However, China has not yet formed a standard system for used car price evaluation, and trading platforms each go their own way. Pricing standards are neither uniform nor standardized, which leaves market prices chaotic and makes it difficult to match price to value.

Building on a study of the relevant algorithms for regression problems, this thesis uses Lasso-based regularization methods and ensemble learning methods to build several used car price prediction models, and analyzes the factors that influence used car prices. The thesis collected used car data from Sichuan and Chongqing in December 2022. Before modeling, missing values and outliers were handled and some covariates were converted to numerical form. After preprocessing, data visualization and correlation analysis were carried out to gain a preliminary understanding of how each covariate affects the used car price. Data processing yielded 15,677 valid samples, which were divided into a training set and a test set at a ratio of 7:3, giving 10,974 training samples and 4,703 test samples.

The Lasso-based regularized regression models and the ensemble learning models were fitted on the training set. The regularized regression models, both of which are linear mean regressions, are Lasso regression and elastic net regression. The ensemble learning models are the random forest and XGBoost. The mean squared error and mean absolute error of each model on the test set were calculated to compare prediction performance. The results show that the ensemble learning algorithms predict used car prices better than the Lasso-based regularization methods. The XGBoost model performs best, with a mean absolute error of 0.57 and a mean squared error of 0.64. Elastic net regression performs worst, with a mean absolute error of 3.28. Although the Lasso-based regularization methods are weaker than the ensemble learning algorithms, they still have some predictive ability, so both families of methods can predict used car prices in Sichuan and Chongqing to a useful degree.

Comparing the models, the best ensemble learning model is XGBoost and the best regularization model is Lasso regression. When predictive ability is the priority, XGBoost is recommended first, followed by the random forest. Lasso regression should be considered when the focus is on the specific influence of each explanatory variable on the explained variable, such as whether the influence is positive or negative and how much the explained variable changes when an explanatory variable changes by one unit.

The results of the Lasso-based regularized regression show that mileage and official fuel consumption have negative effects on the used car price. Higher-priced vehicles tend to have a late registration date, automatic transmission, a medium or large vehicle class, a high emission standard, all-wheel drive, a two-box or smaller body structure, and gasoline as the fuel type. The XGBoost results rank the factors affecting used car prices, from most to least important, as follows: tax-inclusive new car price, torque, engine power, registration time, mileage, wheelbase, space size, emission standard, displacement, official fuel consumption, nature of use, driving mode, vehicle class, transmission type, vehicle location, body structure, fuel type.

The price prediction models established in this thesis can give trading platforms an objective and reasonable pricing method and support data-based judgments about vehicle value. They also allow the price of a vehicle to be roughly estimated from its relevant influencing factors, and more attention should be paid to the high-importance indicators when pricing or purchasing a used car.
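The 7:3 split and the two error metrics named in the abstract can be illustrated with a minimal sketch. It is not the thesis's code: the function names, the shuffling seed, and the rounding rule for the training-set size are assumptions made for illustration, and the three prices used for the metrics are made-up toy values, not data from the study.

```python
import random


def split_sizes(n_samples, train_ratio=0.7):
    """Return (n_train, n_test); rounding to the nearest integer is an assumption."""
    n_train = round(n_samples * train_ratio)
    return n_train, n_samples - n_train


def train_test_split_indices(n_samples, train_ratio=0.7, seed=0):
    """Shuffle sample indices and cut them into training and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_train, _ = split_sizes(n_samples, train_ratio)
    return indices[:n_train], indices[n_train:]


def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2 over all samples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# 15677 valid samples split 7:3, as reported in the abstract.
train_idx, test_idx = train_test_split_indices(15677)
print(len(train_idx), len(test_idx))  # 10974 4703

# Toy prices (hypothetical values) to show how the two metrics are computed.
actual = [10.0, 5.0, 8.0]
predicted = [9.0, 5.5, 8.5]
print(mean_absolute_error(actual, predicted))
print(mean_squared_error(actual, predicted))
```

Under these assumptions the split reproduces the reported 10,974 training and 4,703 test samples. Because squaring weights large misses more heavily, the mean squared error separates models with occasional large errors from models with uniformly small ones, which is one reason to report it alongside the mean absolute error.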
Keywords/Search Tags: Price forecast, Regular method, Integrated learning, Mean absolute error