
Comparative Analysis Of Unbalanced Data Classification Methods In The Field Of Financial Prediction

Posted on: 2020-01-13 | Degree: Master | Type: Thesis
Country: China | Candidate: Y W Lu | Full Text: PDF
GTID: 2439330602466891 | Subject: Applied Statistics
Abstract/Summary:
An efficient and orderly securities market is an important prerequisite for efficient financing and the orderly operation of the economy. The delisting mechanism acts as a filter for that market: while ensuring that the market provides funds to strong enterprises, it removes inferior companies that should have been eliminated but still occupy limited resources. This is of great significance for consolidating the foundation for the healthy and stable development of the securities market. However, non-quantitative indicators are difficult to apply in practice, and quantitative indicators are not comprehensive enough. The imperfection of China's current risk-warning standards gives many speculators an opportunity to exploit them. Some companies even "package" their financial indicators to avoid being labeled ST. This not only reduces the utilization of resources and disrupts the good order of the capital market, but also weakens listed companies' awareness of risk identification and misleads stakeholders in their decisions. A warning method with comprehensive indicators, accurate prediction and strong generalization is therefore particularly necessary.

Many scholars in China and abroad have studied the application of machine learning methods to financial prediction. In practice, however, most financial prediction data are unbalanced, and few studies start from this fact. Research that combines and compares unbalanced data processing methods with machine learning methods is rarer still. Building on previous research and taking the financial indicators of A-share listed manufacturing companies as an example, this paper applies four unbalanced data processing methods (over-sampling, under-sampling, combined over- and under-sampling, and artificial data synthesis) together with five machine learning methods (logistic regression, decision tree, support vector machine, random forest and neural network). The results of the different unbalanced data processing methods and the different machine learning classification methods are compared, and the prediction performance of the different models is analyzed.

The empirical analysis shows that the F-measures of all five models are not high; the decision tree model has the smallest F-measure and is the least satisfactory. The AUC values of logistic regression, support vector machine and neural network are relatively large and their predictions are better, with support vector machine giving the best result. At the model level, therefore, logistic regression, support vector machine and neural network perform best, and the decision tree performs worst. For this nonlinear classification problem, decision tree and random forest methods should be avoided as far as possible. At the data level, models fit best on data produced by artificial data synthesis, followed by data produced by combined over- and under-sampling. Models fitted on data processed by over-sampling alone or under-sampling alone vary in performance: in logistic regression the under-sampling model is better than the over-sampling model, in the neural network the over-sampling model is better than the under-sampling model, and in the other three models the two perform equivalently.

This research fills a gap in the comparison of unbalanced data classification methods in the field of financial prediction, and it also has practical significance. In applications, if unbalanced data are handled at the data level, the artificial data synthesis method is preferred: it balances the data by constructing new samples in the neighborhood of minority-class samples, after which the model is fitted. Since most financial prediction problems are nonlinear classification problems, logistic regression, support vector machine and neural network are the preferred models. In sum, when dealing with financial prediction on unbalanced data, one can first use artificial data synthesis to balance the data and then choose logistic regression, support vector machine or neural network to fit the model.
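
The abstract describes artificial data synthesis as constructing new samples in the neighborhood of minority-class samples, and it compares models by F-measure and AUC. The sketch below illustrates both ideas using only the Python standard library. It is not the thesis's code: the SMOTE-style interpolation between nearest minority neighbours is an assumption about what "artificial data synthesis" means here, and the function names, the parameter `k`, and the toy data are all hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours.
    Assumption: this SMOTE-style rule stands in for the thesis's method."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs in which the positive sample
    receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 20 majority points and 4 minority points.
majority = [(0.1 * i, 0.05 * i) for i in range(20)]
minority = [(3.0, 3.0), (3.2, 2.9), (2.8, 3.3), (3.1, 3.4)]

# Synthesize enough minority points to match the majority class size.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

In a full pipeline, the balanced data would be passed to a classifier such as logistic regression, a support vector machine or a neural network, and `f_measure` and `auc` would be computed on a held-out test set that has not been resampled.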
Keywords/Search Tags: bankruptcy prediction, unbalanced data, machine learning