Research Of Characteristic Analysis And Application For Multivariate Credit Data Classification

Posted on:2019-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y D Zhang

Full Text:PDF

GTID:2429330551461205

Subject:Management Science and Engineering

Abstract/Summary:

Credit risk classification has long been an active research topic. With the development of Internet technology, multivariate credit classification faces new opportunities and challenges, especially in the context of globalization. On the one hand, as the amount of data increases, researchers can more easily obtain reference information for assessing credit risk. On the other hand, credit datasets have become larger and more complex, which renders some of the original classification procedures ineffective. The diversity of credit datasets places growing demands on the flexibility of evaluation methods, so how to choose suitable solutions based on the characteristics of a dataset is worth discussing.

Following the data-characteristic-driven modeling methodology and drawing on the existing domestic and international literature, we build an integrated scheme of data characteristic identification and solutions for multivariate data classification. With this framework, we can find the data characteristics present in a dataset and then apply the corresponding solutions to enhance classifier performance. In this paper, the characteristics of multivariate classification datasets are divided into three categories: quantity characteristics, distribution characteristics and quality characteristics. The quantity characteristics include large and small sample size. The distribution characteristics comprise high dimensionality, sparsity and class imbalance. The quality characteristics cover missing data and data noise. According to how each characteristic manifests externally, we summarize the detection methods and form a complete data characteristic identification scheme. The corresponding solutions are also reviewed in detail.

Moreover, to handle the high dimensionality present in classification datasets, this paper applies an association rule mining algorithm to improve the original attribute bagging method, which selects attributes at random. The resulting model can deal with high-dimensional datasets and achieves better experimental results.

For further illustration, several credit datasets are selected as sample data to test the applicability of the data characteristic identification scheme and the new bagging model proposed in this paper. Empirical results reveal that the data characteristics present in all datasets can be identified clearly and that suitable methods can accordingly be selected to handle them within the framework of the proposed solutions, indicating that the proposed framework can serve as an effective data characteristic identification tool for multivariate data classification. In addition, compared with traditional methods, the AR_WSAB identifies default samples efficiently and is more effective in handling high-dimensional datasets.
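
As a rough illustration of what a data characteristic identification step could look like, the following Python sketch flags the quantity, distribution and quality characteristics named in the abstract. It is not the detection scheme of the thesis: the function name `profile_dataset` and every threshold are hypothetical choices made for this example, and data noise is left out because it has no simple one-line test.

```python
import numpy as np

def profile_dataset(X, y, *, small_n=100, large_n=100_000,
                    high_dim_ratio=0.1, sparse_frac=0.5,
                    imbalance_ratio=3.0, missing_frac=0.05):
    """Flag data characteristics of a feature matrix X and label vector y.

    Every threshold is an illustrative assumption, not a value from the thesis.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    missing = np.isnan(X)          # missing entries are encoded as NaN
    observed = ~missing
    # Share of observed entries that are exactly zero (sparsity measure).
    zero_frac = float((X[observed] == 0).mean()) if observed.any() else 0.0

    # Ratio of the largest class size to the smallest class size.
    _, counts = np.unique(y, return_counts=True)
    ratio = counts.max() / counts.min()

    return {
        # Quantity characteristics
        "small_sample": bool(n < small_n),
        "large_sample": bool(n > large_n),
        # Distribution characteristics
        "high_dimension": bool(d / n > high_dim_ratio),
        "sparsity": bool(zero_frac > sparse_frac),
        "class_imbalance": bool(ratio > imbalance_ratio),
        # Quality characteristics (noise detection is omitted here)
        "missing_data": bool(missing.mean() > missing_frac),
    }

if __name__ == "__main__":
    # Synthetic data only: 50 samples, 20 features, 45 good and 5 default labels.
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(50, 20))
    y_demo = np.array([0] * 45 + [1] * 5)
    for name, flag in profile_dataset(X_demo, y_demo).items():
        print(f"{name}: {flag}")
```

Each flag raised by such a profile would then point to a family of remedies, for example resampling for class imbalance or attribute selection for high dimensionality.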

Keywords/Search Tags:

multivariate data classification, data characteristics, characteristics identification, solutions, AR-WSAB

Related items

1	Characteristic Identification And Dimensionality Reduction Based Complex Data Forecasting And Classification Research
2	Statistical Identification And Measurement Of Psychological Factors In The Background Of Big Data
3	The Study Of Power Customer Classification Based On Data Mining
4	Research Of Spatial Identification Of Poverty In China Based On Nighttime Light Data
5	Study On Product CTQ Identification Based On Feature Selection
6	A Comparative Study On Credit Default Identification Of Four Kinds Of Data Mining Algorithms
7	The Application Of Data Mining In Stock Analyzing And Predicting
8	Research Of Listing Corporation Financial Fraud Identification Model Based On Data Mining
9	Research On Classification Of Banking Financial Customers Based On Data Mining
10	The Design And Realization Of Classification System Of One-to-One Marketing Based On Data Mining