
Modeling And Application Of Support Vector Machine Based On Grey Incidence Analysis And Improved SMOTE

Posted on: 2017-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: B H Yi
GTID: 2349330503495672
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of the credit industry and the continuous improvement of data mining technology, traditional manual credit risk assessment is gradually being replaced by machine learning on large data sets. As an effective classification tool, the support vector machine can build a classification model quickly by learning from historical data, so that the category of a new sample can be easily determined. Because of its rigorous mathematical reasoning and solid statistical basis, it is accepted by more and more experts and scholars, and is widely used in industrial production, text recognition, image analysis, intrusion detection, advertising recommendation, management and assessment, finance and insurance, medical diagnosis, life science and many other fields.
In real life, however, the data in classification problems are increasingly complex: noise can distort the classification results, and imbalanced data can bias the separating hyperplane. Accordingly, the classic support vector machine classifies such data poorly. To apply the support vector machine to practical problems more effectively, the impact of noise samples and imbalanced data must be fully considered in light of the properties of the support vector machine. Meanwhile, the reasons for the decline in classification accuracy are worth analyzing, so that the support vector machine model can be improved from both theoretical and practical perspectives.
In this paper, the relevant theory and properties of the classic support vector machine are studied, and the problems of noise and imbalanced data are discussed in turn. An improved support vector machine is presented to solve these problems and is applied to a customer credit risk assessment case in a micro-loan company. The classification accuracy for default customers is significantly increased.
The main research content of this paper is as follows:
(1) The grey incidence degree is introduced and the mean absolute grey incidence degree is defined; to overcome the disadvantages of traditional methods and ensure that support vectors contribute more to the classification result, a new approach to distinguishing noise based on the two class centers is proposed; detailed steps for setting the fuzzy membership are given.
(2) An improved SMOTE algorithm that generates new samples only from misclassified samples is given to solve the data imbalance problem; Random-SMOTE is introduced to ensure that the generated samples are well distributed; a detailed flowchart of the improved SMOTE algorithm is described.
(3) The impact of noise when applying the SMOTE algorithm is analyzed; a new support vector machine algorithm is given that combines the sample selection method based on the grey incidence degree with the sample generation method based on the improved SMOTE; a detailed flowchart of the combined algorithm is presented.
(4) Customer credit risk assessment in micro-credit companies is studied; a credit risk assessment index system is established and the detailed default distributions across the indexes are demonstrated; the proposed method is applied in a micro-credit company and tested on real data, where its classification accuracy for default customers is higher than that of the compared algorithms.
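The two building blocks named above can be illustrated with a minimal sketch. The abstract does not specify the mean absolute grey incidence degree, the noise rule, or the improved SMOTE, so the code below is an assumption-laden stand-in: it uses Deng's classic grey relational (incidence) degree and plain SMOTE interpolation between nearest minority neighbours. The function names, the toy data, and the parameters `rho`, `k`, and `seed` are hypothetical and are not taken from the thesis.

```python
import random


def grey_relational_degrees(reference, candidates, rho=0.5):
    """Deng's classic grey relational degree of each candidate to the reference.

    Assumption: this is the textbook formula, not the thesis's
    "mean absolute grey incidence degree". rho is the distinguishing
    coefficient; min/max differences are taken over all candidates.
    """
    diffs = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in diffs for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        return [1.0] * len(candidates)  # every candidate equals the reference
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in diffs
    ]


def class_center(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(col) for col in zip(*samples)]


def smote(minority, n_new, k=3, seed=0):
    """Plain SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbours (not the improved variant of the thesis)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): compare one sample against both class centers,
# then oversample the minority class.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
majority = [[5.0, 6.0], [5.5, 6.2], [4.8, 5.9], [5.1, 6.4], [5.3, 5.7]]
sample = [1.1, 2.2]
deg_min, deg_maj = grey_relational_degrees(
    sample, [class_center(minority), class_center(majority)]
)
new_points = smote(minority, n_new=2)
```

In this sketch a sample whose degree to its own class center is lower than its degree to the other class center would be a natural noise candidate, and the two degrees could feed a fuzzy membership value; how the thesis actually combines them is not stated in the abstract.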
Keywords/Search Tags: grey incidence degree, SMOTE, support vector machine, imbalanced data, credit risk