
Research On Classification Method And Its Application In Risk Decision-Making With Feature Space Heterogeneity

Posted on: 2010-07-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wang
GTID: 1119360275455440
Subject: Management Science and Engineering
Abstract/Summary:
In risk decision-making, there is a class of problems in which a decision maker must establish a relationship between historical data samples and the states of nature and then, for a new data sample, estimate the probability of each state of nature. Based on this information, the decision maker uses a risk decision-making model to choose the decision that maximizes the expected revenue function (or minimizes the risk loss function). Viewed as the task of relating historical data samples to the states of nature, these problems reduce to classification problems in data mining, so various classification techniques can be applied to them. Because the accuracy and efficiency of the classification technique are critically important, research on classification methods and their applications in risk decision-making problems plays an important role in both theory and practice. Most related studies have focused on classification techniques and their applications to different kinds of risk decision-making problems. However, exploring and understanding the characteristics of the data before applying any data mining technique is also important for the results. In classification, feature space heterogeneity is an important data characteristic that significantly affects classification performance. This thesis focuses on classification methods and their applications in risk decision-making in the presence of feature space heterogeneity.
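The decision step described above (estimate the probability of each state of nature, then minimize the risk loss function) can be illustrated with a minimal sketch. The state names, action names, probabilities, and loss values below are hypothetical and chosen only for illustration; the thesis does not specify a particular loss table, and in practice the probabilities would come from a trained classifier.

```python
from typing import Dict


def expected_losses(posteriors: Dict[str, float],
                    loss: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return the expected loss of each action.

    posteriors maps each state of nature to its estimated probability.
    loss maps each action to a {state: loss} table.
    """
    return {
        action: sum(posteriors[state] * cost for state, cost in per_state.items())
        for action, per_state in loss.items()
    }


def best_action(posteriors: Dict[str, float],
                loss: Dict[str, Dict[str, float]]) -> str:
    """Pick the action with the smallest expected loss."""
    risks = expected_losses(posteriors, loss)
    return min(risks, key=risks.get)


# Hypothetical example: probabilities a classifier might output for one
# new sample, and an assumed loss table for two possible actions.
posteriors = {"default": 0.2, "repay": 0.8}
loss = {
    "approve": {"default": 100.0, "repay": 0.0},
    "reject": {"default": 0.0, "repay": 10.0},
}

risks = expected_losses(posteriors, loss)
choice = best_action(posteriors, loss)
print(risks, choice)
```

With these assumed numbers, approving carries the larger expected loss even though repayment is the more probable state. This is why the quality of the estimated probabilities, and not only the predicted class label, matters for the decision.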
The main aim of this research is to explore the existence of feature space heterogeneity in classification problems and to develop novel classification approaches that deal with it and improve classification accuracy, which in turn supports risk decision-making. The thesis is organized as follows. Chapter 1 first explains the background of the thesis and then reviews the literature on classification approaches and their applications in risk decision-making, as well as research on feature space heterogeneity in classification problems. The content and significance of the thesis are addressed at the end of Chapter 1. Chapter 2 first introduces the basic idea of classification in data mining, followed by a brief description of feature relevance and feature selection in classification problems, and then introduces the concept of feature space heterogeneity addressed in this thesis. Since feature space heterogeneity is not directly observable from the data set, we propose a measurement, based on the main idea of meta-analysis, for detecting and evaluating it in a classification problem. The main steps of the proposed measurement include global feature selection and random sample partitioning. Experimental results on a series of benchmark data sets and artificially mixed data sets verify the effectiveness of the proposed measurement. Chapter 3 investigates the impact of feature space heterogeneity on classification performance. We first briefly analyze the characteristics of feature space heterogeneity in classification and then demonstrate that it degrades classification performance if it is not taken into account. In this chapter, we also propose a novel classification approach based on the integration of logistic regression and support vector machines (SVMs). The main idea of this approach is to use the posterior probabilities obtained by logistic regression to modify the outputs of the SVMs. In the experimental study, we demonstrate that, for a classification problem with feature space heterogeneity, it is advantageous to partition the sample data set into homogeneous subsets and construct a specific classifier in each subset.

Chapters 4 and 5 present two different classification approaches for dealing with feature space heterogeneity. Chapter 4 proposes a Classification Algorithm based on Factor Analysis and Clustering (CAFAC) to eliminate feature space heterogeneity and improve classification performance. In the proposed CAFAC, an orthogonal factor analysis model is first applied to transform the original features into new features free of irrelevance and redundancy. Heterogeneity in the original feature space is reflected by differences in the new features and is captured by the clustering method adopted in our approach. We thereby obtain a number of subsets, in each of which the feature space is homogeneous. A component classifier is then constructed in each subset for classification. Experimental results on a series of benchmark data sets and artificially mixed data sets verify the effectiveness of the proposed CAFAC. In Chapter 5, we develop a novel classification algorithm, Supervised Clustering for Classification with Feature Space Heterogeneity (SCCFSH), which can be applied to online risk decision-making problems with hard time and resource constraints. Our approach consists of four main steps: grid-based supervised clustering, supervised hierarchical grouping of clusters, feature relevance evaluation in each cluster, and weighted distance calculation for classification. The main advantage of the proposed SCCFSH is that it can deal with feature space heterogeneity in classification problems in a scalable and incremental way. Computational results in the experiments verify the efficiency and effectiveness of the proposed approach. Chapter 6 concludes the thesis and gives some directions for further research.

The innovations and contributions of this thesis are briefly summarized as follows:
(1) An effective measurement for identifying and evaluating feature space heterogeneity in a classification problem is proposed. The measurement can be used to explore the data characteristics and provide information for improving classification performance.
(2) A novel classification approach based on the integration of logistic regression and support vector machines is proposed. The new approach uses the posterior probabilities obtained by logistic regression to modify the output of the SVMs and improves classification accuracy in comparison with conventional SVMs.
(3) For classification problems with significant feature space heterogeneity, a new classification algorithm based on factor analysis and clustering is proposed. The proposed algorithm eliminates feature space heterogeneity by partitioning the sample data set into homogeneous subsets and thus improves classification performance.
(4) A new classification approach capable of solving a classification problem with feature space heterogeneity in an incremental way is developed. This new method is well suited to online classification tasks with continuously changing data and hard constraints on time and resources.
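The general idea of splitting the data into homogeneous subsets and classifying within each subset using a feature-weighted distance can be illustrated with a toy sketch. This is not the CAFAC or SCCFSH algorithm: the subsets and the per-subset feature weights are specified by hand here (the thesis derives them by clustering and feature relevance evaluation), and the class names, data, and the `Subset` helper are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


class Subset:
    """A hand-specified homogeneous subset with its own feature weights."""

    def __init__(self, samples: List[Sample], weights: Sequence[float]):
        self.samples = samples
        self.weights = weights
        n = len(samples)
        # Centroid of the subset, used to route new samples to a subset.
        self.centroid = [sum(x[j] for x, _ in samples) / n
                         for j in range(len(weights))]


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      w: Sequence[float]) -> float:
    """Euclidean distance with a per-feature weight."""
    return math.sqrt(sum(wj * (aj - bj) ** 2 for aj, bj, wj in zip(a, b, w)))


def classify(x: Sequence[float], subsets: List[Subset]) -> str:
    # Step 1: route the sample to the subset with the nearest centroid
    # (plain, unweighted distance).
    ones = [1.0] * len(x)
    home = min(subsets, key=lambda s: weighted_distance(x, s.centroid, ones))
    # Step 2: nearest neighbour inside that subset, using the subset's
    # own feature weights.
    _, label = min(home.samples,
                   key=lambda s: weighted_distance(x, s[0], home.weights))
    return label


# Hypothetical data: in subset A only feature 0 is relevant to the class;
# in subset B only feature 1 is relevant.
subset_a = Subset(
    [([0.0, 5.0], "low"), ([1.0, 0.0], "high"),
     ([0.1, 1.0], "low"), ([0.9, 4.0], "high")],
    weights=[1.0, 0.0],
)
subset_b = Subset(
    [([10.0, 0.0], "low"), ([11.0, 1.0], "high"),
     ([10.5, 0.1], "low"), ([10.2, 0.9], "high")],
    weights=[0.0, 1.0],
)
subsets = [subset_a, subset_b]

print(classify([0.8, 4.9], subsets))
print(classify([10.9, 0.02], subsets))
```

In this toy data, the nearest neighbour of `[0.8, 4.9]` under a plain unweighted distance would be the "low" sample `[0.0, 5.0]`, because the irrelevant second feature dominates the distance. With subset-specific weights the irrelevant feature is ignored, which is the kind of effect a single global feature weighting cannot capture when relevance differs between regions of the feature space.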
Keywords/Search Tags: risk decision-making, classification, feature space heterogeneity, factor analysis, clustering, incremental learning