Research Of Characteristic Analysis And Application For Multivariate Credit Data Classification

Posted on: 2019-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Zhang
Full Text: PDF
GTID: 2429330551461205
Subject: Management Science and Engineering
Abstract/Summary:
Credit risk classification has long been an active research topic. With the development of Internet technology, multivariate credit classification faces new opportunities and challenges, especially in the context of globalization. On the one hand, as the amount of data increases, researchers can more easily obtain reference information for assessing credit risk. On the other hand, credit datasets have become larger and more complex, which renders some of the original classification procedures ineffective. The diversity of credit datasets places growing demands on the flexibility of evaluation methods, so how to choose suitable solutions based on the characteristics of a dataset is worth discussing.

Following the data-characteristic-driven modeling methodology and drawing on the existing literature at home and abroad, we build an integrated scheme of data characteristic identification and solutions for multivariate data classification. With this framework, we can find the data characteristics present in a dataset and then apply the corresponding solutions to enhance classifier performance. In this paper, the characteristics of multivariate classification datasets are divided into three categories: quantity characteristics, distribution characteristics, and quality characteristics. The quantity characteristics include large and small sample sizes. The distribution characteristics comprise high dimensionality, sparsity, and class imbalance. The quality characteristics cover missing data and data noise. According to how each characteristic manifests externally, we summarize the detection methods and form a complete data characteristic identification scheme. The corresponding solutions are also reviewed in detail.

Moreover, to handle the high dimensionality present in classification datasets, this paper applies an association rule mining algorithm to improve the original attribute bagging method, which selects attributes at random. This model can deal with high-dimensional datasets and achieves better experimental results.

For further illustration, several credit datasets are selected as sample data to test the applicability of the data characteristic identification scheme and the new bagging model proposed in this paper. Empirical results show that the data characteristics present in all datasets can be identified clearly and that suitable methods can then be selected to handle them within the framework of the proposed solutions, indicating that the proposed framework can serve as an effective data characteristic identification tool for multivariate data classification. In addition, compared with traditional methods, AR-WSAB identifies default samples efficiently and is more effective in handling high-dimensional datasets.
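The identification step described in the abstract can be illustrated with a minimal sketch. It computes one simple indicator for most of the characteristics named there (sample size, dimensionality, sparsity, class imbalance, missing data). This is not the thesis's detection method, which the abstract does not specify: the function names `profile_dataset` and `flag_characteristics`, the choice of indicators, and every threshold are assumptions made for illustration only. Data noise is left out because it has no equally simple indicator.

```python
from collections import Counter


def profile_dataset(rows, labels, missing=None):
    """Compute simple indicators for quantity, distribution, and quality characteristics.

    `rows` is a list of equal-length feature lists and `labels` the class labels.
    Missing cells are assumed to be marked with the `missing` sentinel (None here).
    """
    n_samples = len(rows)
    n_features = len(rows[0]) if rows else 0
    total_cells = n_samples * n_features

    cells = [value for row in rows for value in row]
    n_missing = sum(1 for value in cells if value is missing)
    n_zero = sum(1 for value in cells if value is not missing and value == 0)

    class_counts = Counter(labels)
    imbalance_ratio = (
        max(class_counts.values()) / min(class_counts.values())
        if class_counts
        else 1.0
    )

    return {
        "n_samples": n_samples,                      # quantity
        "n_features": n_features,                    # distribution: dimensionality
        "features_per_sample": n_features / n_samples if n_samples else 0.0,
        "zero_rate": n_zero / total_cells if total_cells else 0.0,  # sparsity
        "imbalance_ratio": imbalance_ratio,          # majority / minority count
        "missing_rate": n_missing / total_cells if total_cells else 0.0,  # quality
    }


def flag_characteristics(profile, small_n=100, high_dim_ratio=0.5,
                         sparse_rate=0.5, imbalance=3.0, missing_rate=0.05):
    """Turn indicators into flags. All thresholds are hypothetical defaults."""
    flags = []
    if profile["n_samples"] < small_n:
        flags.append("small sample size")
    if profile["features_per_sample"] >= high_dim_ratio:
        flags.append("high dimensionality")
    if profile["zero_rate"] >= sparse_rate:
        flags.append("sparsity")
    if profile["imbalance_ratio"] >= imbalance:
        flags.append("class imbalance")
    if profile["missing_rate"] >= missing_rate:
        flags.append("missing data")
    return flags


# Toy data, invented for this example only.
example_rows = [[1, 0, None], [0, 0, 2], [3, 0, 0], [0, 5, 0]]
example_labels = [0, 0, 0, 1]
example_profile = profile_dataset(example_rows, example_labels)
example_flags = flag_characteristics(example_profile)
```

Each flag would then point to a family of remedies of the kind the abstract mentions, for instance attribute bagging when high dimensionality is flagged. Sensible thresholds depend on the dataset and the classifier, so the defaults above should be treated as placeholders.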
Keywords/Search Tags: multivariate data classification, data characteristics, characteristics identification, solutions, AR-WSAB