| With the development of the Internet and the prevalence of third-party payment methods, some financial enterprises, such as banks, have come under pressure. To capture more market share and achieve stable growth, banks have begun to adapt: they apply advanced computer technology to their marketing activities and predict customer demand so as to market to customers in a targeted way, strengthen the relationship between customers and the bank, and raise the probability that product marketing succeeds. The transaction data of bank customers is imbalanced. The cost-sensitive random forest algorithm is often used for binary classification on imbalanced data sets; it offers high accuracy and good classification performance on the minority class, but it also has shortcomings. This thesis therefore studies and optimizes the cost-sensitive random forest algorithm, and designs and implements a bank marketing system based on customer demand prediction. The main work of the thesis is as follows: (1) The traditional cost-sensitive random forest algorithm has two gaps. When constructing the cost function, it uses Euclidean distance to calculate the distance between samples and does not consider the actual distribution of the sample data; when combining the base decision tree classifiers, it does not consider differences in the classification performance of individual decision trees. We address these gaps in two steps. First, in constructing the cost function, the actual distribution of the samples and the feature weights are taken into account, so that the cost-sensitive function treats important features fairly and cost-sensitive learning improves. Second, when the base decision tree classifiers are combined, the prediction error rate of each decision tree is calculated on its OOB (out-of-bag) data set, and each tree is given a voting weight based on that error rate, which improves the overall prediction performance of the classifier. To validate the improved cost-sensitive random forest algorithm, several UCI data sets, including the bank marketing data set, are selected, and its performance and accuracy are evaluated by comparison with prediction models based on random forest, traditional cost-sensitive random forest, and support vector machine. (2) A bank's customer data has missing values in important fields, and most fields are stored as categorical or character types. To address this, data preprocessing operations such as deleting redundant features and data normalization are applied to generate training feature vectors suitable for the prediction model built on the improved cost-sensitive random forest algorithm. (3) On the basis of the SSM integrated development framework and a three-tier design model, a bank marketing system based on customer demand forecasting was designed and implemented, with the improved cost-sensitive random forest algorithm used to construct the customer demand forecasting model. The system comprises five functional modules: customer management, marketing management, event management, special topic management, and system management. The system has been successfully applied in a bank's customer relationship management platform. In summary, the thesis studies the shortcomings of the cost-sensitive random forest algorithm, optimizes the algorithm, and successfully applies it to bank customer demand forecasting. It provides a reference for algorithm improvement in projects that require binary classification of imbalanced data, and a feasible solution for bank digital marketing. |
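
The OOB-weighted voting step in item (1) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual weighting formula, so the rule used here (each tree's weight is its OOB accuracy, `1 - error`, normalized to sum to 1) is an assumption, as are the function names `oob_error_rate`, `vote_weights`, and `weighted_vote` and the toy "trees", which are plain Python functions standing in for trained decision trees.

```python
from collections import defaultdict


def oob_error_rate(predict, X, y, in_bag_indices):
    """Error rate of one tree on the samples it did not see in training."""
    in_bag = set(in_bag_indices)
    oob = [i for i in range(len(X)) if i not in in_bag]
    if not oob:
        return 0.5  # assumption: neutral error rate when a tree has no OOB samples
    wrong = sum(1 for i in oob if predict(X[i]) != y[i])
    return wrong / len(oob)


def vote_weights(error_rates):
    """Assumed rule: weight = OOB accuracy (1 - error), normalized to sum to 1."""
    accuracies = [1.0 - e for e in error_rates]
    total = sum(accuracies)
    if total == 0:
        return [1.0 / len(accuracies)] * len(accuracies)  # fall back to equal votes
    return [a / total for a in accuracies]


def weighted_vote(tree_predictions, weights):
    """Return the label with the largest summed voting weight."""
    scores = defaultdict(float)
    for label, weight in zip(tree_predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Toy data: one feature, two classes.
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

# Hypothetical stand-ins for trained trees, with the bootstrap indices each one saw.
trees = [
    (lambda x: int(x[0] >= 2), [0, 1]),  # separates the classes correctly
    (lambda x: 1, [2, 3]),               # always predicts class 1
    (lambda x: 1, [0, 2]),               # always predicts class 1
]

errors = [oob_error_rate(tree, X, y, bag) for tree, bag in trees]
weights = vote_weights(errors)

sample = [0]
tree_predictions = [tree(sample) for tree, _ in trees]
prediction = weighted_vote(tree_predictions, weights)
```

In this toy setup an unweighted majority vote would label `sample` as class 1, because two of the three trees predict 1. With OOB-based weights, the one tree that performs well on its out-of-bag samples outweighs the other two, and the ensemble predicts class 0.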