| This paper studies the purchase behavior of bank customers, using machine learning algorithms to predict customers' intention to purchase bank products and to classify customer groups. Banks need to be customer-centered as they develop. Predicting whether customers will buy bank products and studying customer groups by class can reduce the cost of marketing campaigns, enable precision marketing, improve service quality, raise banks' economic returns, and support their growth. Using data from a bank's marketing campaign for fixed deposit products, this paper first builds single classification models and then proposes a Stacking prediction model that uses decision tree, support vector machine, and GBDT (Gradient Boosting Decision Tree) algorithms as primary classifiers to predict whether customers will purchase the bank's fixed deposit products. The Stacking model predicts better than the single models. The main work includes: (1) First, the data structure is examined, the data are checked for missing values and duplicate samples, and the data are visualized to analyze how customers who purchased fixed deposit products are distributed across the variables. The data are encoded with the Label Encoding algorithm, and feature selection based on the RFECV (RFE: Recursive Feature Elimination; CV: Cross Validation) algorithm and the GBDT algorithm retains ten variables for modeling. The data are severely imbalanced: 11.7% of the samples are purchasers and 88.3% are non-purchasers, so the SMOTE algorithm is used to rebalance them. (2) The data are split into training and test sets to construct decision tree, support vector machine, GBDT, and logistic
regression models. Comparison shows that the AUC values of the decision tree, logistic regression, and support vector machine models improve significantly after imbalance processing. Judged by accuracy, recall, and F1 value, the decision tree is the best of the single models. A Stacking model is then constructed with the decision tree, support vector machine, and GBDT algorithms as primary classifiers and the logistic regression algorithm as the secondary classifier. It outperforms the single algorithms: accuracy reaches 87.69%, 2.13% higher than the single model; recall reaches 88.01%, 1.9% higher; and the F1 value reaches 88.32%, 1.17% higher. (3) Based on the confusion matrix output by the Stacking model, customers are classified into potential customers, lost customers, cost customers, and non-potential customers. Box plots and bar charts are used to visualize how the groups are distributed over continuous and categorical features. Comparative analysis shows that the age distribution of bank customers is wide. In marketing campaigns, customers over the age of 80 should not be ignored, since they may also purchase fixed deposit products. The interval between contacts with a customer should not be too long, nor should phone conversations with customers. More attention should be paid to the bank's regular customers and to customers without housing loans. |
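
The Stacking architecture described in the abstract (decision tree, support vector machine, and GBDT as primary classifiers, logistic regression as the secondary classifier) can be sketched roughly as follows with scikit-learn. This is a minimal illustration under stated assumptions, not the paper's implementation: the data are synthetic (generated to mimic ten retained features and an approximately 11.7% purchase rate), all hyperparameters are illustrative, and the Label Encoding, RFECV, and SMOTE steps are omitted. The variable names `base_learners` and `stack` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for the bank data, with 10 features and
# roughly 11.7% positive (purchase) samples. Not the paper's dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=6,
    weights=[0.883, 0.117], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Primary (base) classifiers; hyperparameters are illustrative only.
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    # SVMs are scale-sensitive, so standardize inside a pipeline.
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("gbdt", GradientBoostingClassifier(random_state=0)),
]

# Secondary classifier: logistic regression trained on the base learners'
# out-of-fold predictions (5-fold cross-validation inside the stacker).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# Metrics on the synthetic data; these will not match the paper's figures.
acc = accuracy_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"accuracy={acc:.4f} recall={rec:.4f} F1={f1:.4f}")
```

In a fuller pipeline, oversampling such as SMOTE would be applied to the training split only, so that the test set keeps the original class distribution.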