Research On Key Technologies Of Personal Behavior Prediction Based On Random Forest

Posted on: 2021-08-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Gao
GTID: 1488306464959139
Subject: Computer Science and Technology
Abstract/Summary:
Personal behavior prediction (hereinafter "behavior prediction") predicts a person's future behavior and performance from that person's past behavior and performance. It helps us better understand target individuals, so that desired outcomes can be encouraged and bad outcomes avoided in advance. Behavior prediction research is therefore of great significance in areas such as risk prevention, precision marketing, and employee retention. Existing prediction methods have made some achievements, but their overall performance in practical applications is relatively low: they fail to address a series of problems with behavioral data, such as high feature dimensionality, insufficient effective data, imbalanced classes, and dynamically growing data, and so fall short in accuracy, computational efficiency, and generality. To address these deficiencies, this thesis constructs a behavior prediction framework based on random forest, analyzes the key technologies within the framework, proposes three behavior prediction algorithms based on random forest, and applies them to predicting voluntary employee resignation, purchases of new financial products, and credit card customer default to verify their effectiveness in practice. The main contents of this thesis are as follows:

(1) A behavior prediction framework based on random forest is constructed around the characteristics of the behavior prediction problem: imbalanced data, high feature dimensionality, insufficient data, and dynamically growing data. The framework consists of eight modules: data collection, data preprocessing, feature engineering, data splitting, model building, performance evaluation, improvement and optimization, and classification prediction. Three different behavior prediction methods based on random forest are proposed on top of this framework.

(2) A weighted quadratic random forest algorithm is proposed for behavior prediction problems with imbalanced classes and high-dimensional features. The algorithm first ranks features by importance to reduce dimensionality, then builds a random forest model, computes the weight of each tree from its F-measure value, and obtains the classification result by weighted voting. In experiments on a real employee data set, the proposed method improves significantly on multiple evaluation indicators, especially recall and F-measure, compared with traditional random forest, decision tree, logistic regression, BP neural network, and other algorithms. The results can help human resource departments predict employee turnover more accurately, identify the key factors, and provide a reference for reducing employees' turnover intention.

(3) For behavior prediction problems with insufficient effective data and cold start, a transfer random forest algorithm is proposed. The algorithm takes a large amount of historical data similar to the research object as the training sample in the source domain, and a small amount of data obtained from the research object as the training sample in the target domain. The source-domain data are randomly sampled and combined with all of the target-domain data to form the training set. Each sample is then assigned a weight, with target-domain samples weighted more heavily. While the random forest is generated, the sample weights enter the Gini coefficient calculation that determines the splitting feature at each node; they also enter the calculation of each tree's weight. Finally, the classification result is determined by weighted voting. The algorithm is applied to predicting customer purchases of new financial products on the direct marketing data set of Portuguese banks, where it targets customers accurately. Experiments show that the proposed algorithm performs better on multiple evaluation indicators than random forest, decision tree, logistic regression, and adaptive boosting algorithms.

(4) For behavior prediction problems with dynamically increasing data, an incremental random forest algorithm is proposed. The algorithm first establishes a base random forest model; it then lets all samples that arrive over time participate directly in incremental modeling, building classification decision trees that support incremental learning; finally, it discards the samples with the smallest contribution to free up storage space. The algorithm is used to predict credit card customer default, and the model can be adjusted in real time as data are added to achieve better prediction results. Experiments show that this algorithm performs better on multiple evaluation indicators than traditional random forest, decision tree, logistic regression, naive Bayes, BP neural network, and support vector machine.
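
The abstract does not give the exact formulas, so the following is only a minimal Python sketch of two ideas it names: weighting each tree's vote by its F-measure (item 2) and letting sample weights enter the Gini calculation (item 3). The function names, the choice of F1 on held-out data as the tree weight, and the use of label 1 as the positive class are all assumptions made for illustration, not the thesis's actual implementation.

```python
from collections import defaultdict


def f_measure(y_true, y_pred, positive=1):
    """F1 score of one tree's predictions on held-out data (assumed weight)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_vote(tree_predictions, tree_weights):
    """Return the label whose supporting trees carry the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def weighted_gini(labels, sample_weights):
    """Gini impurity in which each sample counts by its weight."""
    total = sum(sample_weights)
    if total == 0:
        return 0.0
    mass = defaultdict(float)
    for label, weight in zip(labels, sample_weights):
        mass[label] += weight
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


if __name__ == "__main__":
    # Hypothetical held-out labels and three trees' predictions on them.
    y_val = [1, 0, 1, 0]
    val_preds = [[1, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0]]
    weights = [f_measure(y_val, p) for p in val_preds]

    # Votes of the same three trees for one new sample.
    print(weighted_vote([1, 0, 0], weights))

    # Up-weighting the first two samples (e.g. target-domain ones)
    # changes the impurity of the node.
    print(weighted_gini([1, 1, 0, 0], [3, 3, 1, 1]))
```

In this toy example a plain majority vote would pick label 0, while the F-measure weights let the single accurate tree outvote the two weaker ones. This is the kind of effect that could help recall on a minority class.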
Keywords/Search Tags:Behavior Prediction, Random Forest, Weighted Voting, Transfer Learning, Incremental Learning