Support Vector Machine (SVM) Based On Feature Engineering Application For Non-life Insurance Bankruptcy Prediction

Posted on: 2020-02-29
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Zhu
GTID: 2439330590993110
Subject: Insurance
Abstract/Summary:
The risk protection provided by insurance companies plays an important role in the development of the national economy. As the economy and society develop, economic agents' demand for risk management is rising sharply. Insurance acts as a social "stabilizer" and an economic "booster," so insurers themselves need stronger risk management, and an insurer's bankruptcy can seriously affect society as a whole. Such a failure challenges the industry's capacity for risk management and risk control, and it also affects investors' willingness to invest and their related decisions, which in turn affects the healthy operation of the entire insurance market. At the same time, the loss of insurance's safeguard function may cause social panic and economic instability. It is therefore necessary to predict the financial risks of insurance companies effectively, so that insurers can detect risks early and take preventive measures, avoiding the adverse impact of insurance market risk on the economy, society, and politics.

Since the first single-variable financial early-warning model, scholars have continued to propose new bankruptcy prediction models, from the ZETA model to multiple discriminant analysis, the Logistic model, and today's most popular machine learning algorithms. Among machine learning algorithms, the support vector machine (SVM) has become a research hotspot because of its generalization ability, and it has been applied in many fields. Compared with traditional machine learning methods, SVM is mainly used for small-sample, nonlinear, high-dimensional problems and is often used for data classification. Its advantages include simple structure, good adaptability, global optimization, fast training, and strong generalization ability. With mathematical breakthroughs in other fields, traditional SVM theory and techniques have developed rapidly, and many improved variants have appeared, such as least squares SVM (LSSVM), fuzzy SVM (FSVM), NN-SVM, quadric surface SVM, and BS-SVM. These methods improve on the original SVM to some degree. Previous improvements mostly address noise-point handling, changes to the constraint function, distance-based sample classification, and algorithmic refinements. No scholar has systematically studied the effect of feature engineering on SVM.

This paper takes non-life insurance companies in the United States as the sample and applies an SVM combined with feature engineering to insurer bankruptcy prediction. In addition, it obtains different classification results by focusing on different research objectives and setting corresponding thresholds. The empirical results show that the SVM model with feature engineering predicts very well: classification accuracy improves greatly and the error rate falls sharply. The paper also compares how much each of the three parts of feature engineering improves the classification results and finds that the effects differ. Feature extraction gives the largest improvement, but using all three parts together is more conducive to model performance. Finally, model evaluation further confirms that feature engineering greatly improves SVM performance. Classification accuracy also varies with the threshold, so researchers can set the threshold according to their goal.

The paper is organized as follows.

The first part summarizes the background and significance of the topic, explains the research ideas, content, methods, and expected results, and introduces the machine learning model used in the paper. It then reviews existing research at home and abroad on the development of prediction models, the improvement of SVM, and the application of feature engineering.

The second part defines bankruptcy and introduces the SVM principle in detail, deriving its mathematical model. It also introduces the models that earlier scholars built to optimize SVM, mainly FSVM, LSSVM, NN-SVM, and BS-SVM, and briefly analyzes and summarizes them. These earlier optimizations focus mainly on the treatment of sample distance and on algorithmic optimization.

The third part introduces feature engineering. It first describes some of its applications in other fields and then divides feature engineering into three parts. In feature construction, the author transforms the original data year by year to increase the information it carries. In feature extraction, the author first normalizes the data to remove differences in scale and then applies a t-test to eliminate variables that are not significant for the output variable. In feature selection, the author uses the information value to score and rank the variables, keeping those with strong predictive ability.

The fourth part is the empirical study. The sample consists of U.S. non-life insurance companies in BvD's ISIS database, covering all such companies from 1995 to 2015. The feature engineering applied to the sample is then explained further. Finally, an SVM is fitted to the training data, and the test set is used for evaluation and analysis. Comparison with a comparison group shows how much feature engineering improves the SVM classification results, and separate experiments on the three parts show how much each improves prediction accuracy. The SVM model with feature engineering reaches an accuracy of 89.2%, and its two error rates are 0.096 and 0.233. Of the three parts of feature engineering, feature extraction performs best.

The fifth part introduces three common indicators for evaluating a model's predictive ability: the confusion matrix, the ROC curve, and the AUC. The evaluation provides further empirical evidence that the SVM with feature engineering predicts very well and that each of the three parts improves the classification results to a different degree. This part also covers threshold selection: building on the confusion matrix and ROC curve, it briefly introduces the principle of threshold selection and analyzes what different threshold choices mean.

The sixth part summarizes the research results. It suggests that future bankruptcy prediction models should pay attention to the effect of feature engineering on the data sample and that different thresholds should be chosen for different objectives. It then discusses the implications for Chinese insurance companies and outlines directions for future research.

The innovations of this paper are as follows:
(1) Sample processing starts from feature engineering, comprising feature construction, feature extraction, and feature selection, and the three steps yield better data features. The results show that feature engineering can greatly improve classification accuracy. All three parts improve the model's classification performance, and feature extraction improves accuracy the most.
(2) Threshold analysis is carried out on the classification predictions for the test set, showing that different thresholds can be set for different targets. The threshold affects the classification results and accuracy of the whole model, so researchers can set thresholds according to the key targets of their study.
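
The threshold analysis summarized above can be illustrated with a minimal sketch. It assumes the classifier outputs a score per company, where a higher score means "more likely bankrupt", and that a company is flagged as bankrupt when its score reaches the threshold. The function names, the toy labels and scores, and the rule "maximize accuracy subject to a cap on missed bankruptcies" are all hypothetical illustrations, not taken from the thesis, which does not state its selection rule here.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, tn, fn), predicting 'bankrupt' (1) when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def error_rates(labels, scores, threshold):
    """Accuracy plus the two class-wise error rates at a given threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return {
        "accuracy": (tp + tn) / len(labels),
        # Share of truly bankrupt companies that were predicted solvent.
        "missed_bankrupt_rate": fn / (tp + fn) if (tp + fn) else 0.0,
        # Share of truly solvent companies that were flagged as bankrupt.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


def pick_threshold(labels, scores, candidates, max_missed_rate):
    """Hypothetical rule: among candidate thresholds whose missed-bankruptcy
    rate stays within the cap, return the one with the highest accuracy.
    Returns None when no candidate satisfies the cap."""
    eligible = [
        t for t in candidates
        if error_rates(labels, scores, t)["missed_bankrupt_rate"] <= max_missed_rate
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda t: error_rates(labels, scores, t)["accuracy"])


# Toy data (invented for illustration): 1 = bankrupt, 0 = solvent.
LABELS = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
SCORES = [0.9, 0.6, 0.35, 0.7, 0.4, 0.3, 0.2, 0.2, 0.1, 0.05]

if __name__ == "__main__":
    for t in (0.3, 0.35, 0.5, 0.65):
        print(t, error_rates(LABELS, SCORES, t))
```

Lowering the threshold catches more bankrupt companies at the cost of more false alarms, so a regulator who cares most about missed bankruptcies and an investor who cares most about overall accuracy may reasonably choose different thresholds on the same model.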
Keywords/Search Tags: support vector machine, bankruptcy prediction, feature engineering, threshold value