Font Size: a A A

Ensemble Learning Models And Algorithms For Risk Decision-Making Problems

Posted on:2018-10-31Degree:DoctorType:Dissertation
Country:ChinaCandidate:H S XiaoFull Text:PDF
GTID:1319330536968966Subject:Management Science and Engineering
Abstract/Summary:PDF Full Text Request
Risk is mainly caused by the uncertainty of different states of nature in risk-decision problems.Therefore,a key to successful risk decision-making is the accurate estimation or prediction of the occurrence probabilities of different states of nature.From this aspect,this paper studies how to use ensemble learning theories and methods to accurately estimate and predict the occurrence probabilities of different states of nature.The main aim is to increase the profit and decrease the loss,as well as provide some theoretical fundamentals and reference points for risk decision-makers in practice.First,the literature related to risk decision-making and ensemble learning is reviewed and the following conclusions are made.The first one is that as the huge volume of data are now stored and can be easily retrieved by the managers and decision-makers,how to utilize machine learning and data mining,especially supervised learning techniques,to analyze the data and predict the occurrence probabilities of different states of nature,has become an important research area in risk decision-making.The second one is that it is beneficial to introduce ensemble learning to risk decision-making problems due to its strong generalization ability,which is quite favorable for increasing the accuracy of probability estimation and consequently,improve the result of risk decision-making.The last one is that different patterns and characteristics of data exist in different risk decision-making problems.There exists no learning model that could beat all others in any problem,due to the “No Free Lunch” theorem that the performance of learning models in a specific problem are determined by the data characteristics and patterns to a large extent.Therefore,Therefore,the characteristics and patterns of data in different risk decision-making problems are analyzed.And then,some ensemble learning models that are able to improve the prediction accuracy are proposed.Second,for the customs targeting problem and other similar risk decision-making problems where there are huge number of data samples,the values of some attributes vary in a large interval,and most attributes values are relatively concentrated,we analyze the impact of these characteristic on the performance of learning models and describe the limitations of apply the existing methods to this problem.To address these important issues,we propose a dynamic K-means clustering algorithm,in which the number of clusters is adjusted according the clustering validity function in a dynamic pattern.Our aim is to generate a cluster result with high similarity within the same cluster,and high dissimilarity between different clusters.Based on the proposed dynamic K-means clustering algorithm,we present a risk decision-making model and apply it to the real-world data set in customs targeting.The results of empirical study indicates that the risk decision-making model could effectively improve the hit rate of customs targeting,which is quite favorable for custom administration in both theoretical means and practical guidance.This study also provides an effective approach to risk decision-making problems with large volume of data and various values of attributes.Third,for the consumers' credit risk evaluation problem,we develop an ensemble learning model based on supervised clustering.Consumers may differ from each other quite a lot in their behaviors or purchasing patterns,which would lead to a performance degradation of single learning models.In addition,the existing researches using ensemble learning for credit evaluation are mainly based on random sampling,which may result in a poor performance due to the variety of consumers.To handle these issues,supervised clustering is employed in the proposed ensemble learning model to partition the data set into a number of subsets,in each of which the samples are from the same class.Subsets from different classes are then paired to form a number of training subsets.After the above steps,different base learners are constructed in different subsets.For a sample whose class label is unknown and needs to be predicted,we combine the outputs of different base learners by weighted voting.The weight associated with a base learner is determined by its performance in the neighborhood of the sample.In the empirically study,two benchmark data sets and one real-world data set are adopted to evaluate the performance of the proposed credit evaluation model based on ensemble learning.The results indicate that the proposed approach is able to obtain a number of diverse base learners and achieve stable results,which is helpful to improve the accuracy of credit evaluation and decrease the risk of credit granting,providing effective decision support for banks and credit agencies.Last,we study the customers responding problem,which is a key to successful database marketing,in the marketing research area.In this problem,there exists class imbalance among the customers,which means the number of responding customers(promising customers)is outnumbered by that of non-responding customers to a great extent.In addition,for better understanding of the patterns and characteristics of the customers,the customers responding model needs to be entailed high interpretability.To address these issues,we propose a database marketing model based on ensemble of associative classification rules.In the proposed model,samples belonging to the major class(non-target customers)are divided into a number of clusters through clustering analysis for better understanding of the various behavioral patterns and characteristics hidden in different customers groups.And then,the obtained clusters(customers groups of non-target customers)are combined with samples belonging to the minor class(target customers)to generate a number of training subset.In each training subset,a set of associative classification rules that satisfy the minimal support threshold are extracted.For a new customer whose class label(whether he/she is a target customer)needs to be predicted,the rules are ensembled based on their local confidences.To validate the efficacy and effectiveness of the proposed model for databse marketing,we compare the model with some other database marketing models on a real-world dataset.The empirical results show that not only could the proposed model achieve high hit rate,but also entail high interpretability,which is favorable for marketing managers and decision-makers in both theory and practice.
Keywords/Search Tags:Risk Decision-Making, Supervised Learning, Ensemble Learning, Clustering Analysis, Associative Classification Rule
PDF Full Text Request
Related items