| Talent is a key driver of an enterprise's growth, and many enterprises now face a brain drain caused by voluntary employee turnover. Brain drain reduces an enterprise's competitiveness and, in turn, hampers its development, and how to address it has become a difficult problem for human resource managers. Statistical analysis and machine learning methods can be used to identify the key factors affecting employee turnover and to build a turnover prediction model, helping human resource managers analyze the reasons for turnover, identify employees with turnover intention in advance, and take corresponding measures to reduce turnover risk. The employee turnover data provided by the IBM Watson Analytics platform was taken as the research object. After the data was described and cleaned, a logistic regression model was used to analyze the importance of the features affecting employee turnover in order to find the key factors. First, descriptive statistics were used to analyze the distribution of employees across the features, giving a preliminary overall picture. Then the correlation between each feature and turnover status was analyzed, and features with no linear correlation with turnover status were eliminated. Next, the data was preprocessed, features were selected, and unordered categorical features were one-hot encoded. After the relevant parameters were set, the model was trained with 10-fold cross-validation, yielding a logistic regression model with an accuracy of 74%, a precision of 36%, a recall of 73%, and an AUC of 0.80, together with a ranking of the features positively and negatively related to employee turnover. The three factors positively related to employee turnover (frequent business travel, overtime, and being single) were
selected and combined with employees' age and monthly income to analyze the reasons for turnover among sales representatives, laboratory technicians, human resources staff, sales executives, and research scientists, the roles with the higher turnover ratios, and corresponding suggestions were given. To avoid the model overfitting that can result from handling imbalanced data, the Bagging algorithm was combined with undersampling to process the imbalanced turnover data set, and an enterprise employee turnover prediction model was established. Factor analysis and WOE (weight of evidence) encoding were used to process the features. Bagging and undersampling were used to construct balanced data subsets. Logistic regression was selected as the base classifier, the corresponding parameters were set, and the base classifiers were trained with 5-fold cross-validation. Finally, a prediction model with an accuracy of 76%, a precision of 41%, a recall of 81%, and an AUC of 0.85 was obtained; its recall and AUC were higher than those of the logistic regression, random forest, and gradient boosting decision tree models, which shows that the model has good predictive performance. The results show that a logistic regression model can be used to rank the importance of the features positively and negatively correlated with employee turnover. Combining the Bagging algorithm with undersampling yields a prediction model that avoids overfitting and predicts well, and that can help enterprise human resource managers analyze the reasons for employee turnover and identify employees with turnover intention in advance. The approach has reference value for addressing employee turnover in enterprises. |
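
The WOE encoding step mentioned above can be illustrated with a minimal sketch. It assumes one common convention, WOE = ln(share of leavers in a category / share of stayers in that category); some references use the inverse ratio. The function name `woe_encode`, the additive `smoothing` term, and the toy overtime data are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def woe_encode(categories, labels, smoothing=0.5):
    """Map each category to its weight of evidence (WOE).

    Convention assumed here: ln(P(category | left) / P(category | stayed)),
    where label 1 marks an employee who left. The smoothing term is an
    illustrative assumption that avoids division by zero for rare categories.
    """
    events = Counter()      # leavers per category
    non_events = Counter()  # stayers per category
    for category, label in zip(categories, labels):
        if label == 1:
            events[category] += 1
        else:
            non_events[category] += 1

    levels = sorted(set(categories))
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())

    woe = {}
    for level in levels:
        p_event = (events[level] + smoothing) / (total_events + smoothing * len(levels))
        p_non_event = (non_events[level] + smoothing) / (total_non_events + smoothing * len(levels))
        woe[level] = math.log(p_event / p_non_event)
    return woe


# Hypothetical toy data: overtime flag and whether the employee left (1) or stayed (0).
overtime = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No"]
left = [1, 1, 0, 1, 0, 0, 0, 0]
woe = woe_encode(overtime, left)
print(woe)
```

Under this convention a positive WOE marks a category over-represented among leavers, so the encoded feature is already on a log-odds scale, which is one reason WOE is often paired with logistic regression.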
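
The combination of Bagging and undersampling described above can be sketched as follows: each base classifier is trained on all minority-class samples plus an equally sized random draw from the majority class, and the ensemble averages the predicted probabilities. This is a sketch under stated assumptions, not the study's implementation: the base learner is a plain gradient-descent logistic regression written in standard-library Python rather than a library model, averaging is an assumed aggregation rule, and every name, hyperparameter, and data point below is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression with batch gradient descent; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def balanced_bagging(X, y, n_estimators=10, seed=0):
    """Train one base classifier per balanced subset.

    Assumes label 1 is the minority class and is no larger than class 0.
    """
    rng = random.Random(seed)
    minority = [i for i, yi in enumerate(y) if yi == 1]
    majority = [i for i, yi in enumerate(y) if yi == 0]
    models = []
    for _ in range(n_estimators):
        # Undersample the majority class down to the minority-class size.
        idx = minority + rng.sample(majority, len(minority))
        models.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return models


def ensemble_proba(models, x):
    # Average the base classifiers' predicted probabilities.
    return sum(predict_proba(m, x) for m in models) / len(models)


# Hypothetical imbalanced toy data: one binary feature (overtime), 40 stayers, 8 leavers.
X = [[0.0]] * 36 + [[1.0]] * 4 + [[1.0]] * 7 + [[0.0]] * 1
y = [0] * 40 + [1] * 8
models = balanced_bagging(X, y)
p_overtime = ensemble_proba(models, [1.0])
p_no_overtime = ensemble_proba(models, [0.0])
print(p_overtime, p_no_overtime)
```

Because every subset keeps all leavers and a different slice of the stayers, no single base classifier sees the full majority class, yet the ensemble as a whole does, which is the usual motivation for pairing undersampling with Bagging.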
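
Accuracy, precision, and recall, the standard confusion-matrix metrics for a classifier of this kind, can be computed as in the minimal sketch below, which assumes binary labels with 1 marking an employee who left. The function name and the toy labels are hypothetical and are not the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall for binary labels (1 = left)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # leavers correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # stayers correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # stayers wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # leavers missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy labels: 2 leavers among 10 employees.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

When leavers are a small minority, even a modest number of wrongly flagged stayers pulls precision down while recall stays comparatively high, so the metrics are best read together rather than individually.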