Breast cancer is the leading cause of cancer death among women worldwide and poses a serious threat to human health. Five-year survival prediction for breast cancer has traditionally relied on clinicians' judgment, and its accuracy is limited. Survival prediction based on machine learning not only achieves higher accuracy but also helps doctors devise more precise, personalized treatment plans for patients with different prognoses. The main research work of this paper is as follows.

To address feature redundancy and the presence of features irrelevant to the class label in the original breast cancer dataset, this paper proposes PSOFS-CC, an improved particle swarm optimization feature selection algorithm. First, an initialization strategy based on normalized mutual information and the Pearson correlation coefficient is proposed to generate high-quality particles. Second, a new position update strategy is proposed to keep particles from drifting away from the global best position. Finally, a crisscross strategy is introduced to help particles escape local optima and to improve the search ability. Experimental results show that PSOFS-CC outperforms the comparison algorithms in both average classification accuracy and average number of selected features, selecting fewer features on average.

The raw breast cancer data are imbalanced, which degrades the performance of classification algorithms. Because the AdaBoost ensemble learning algorithm performs poorly on imbalanced data, this paper improves it in three ways. First, sample weights are initialized according to the imbalance ratio of the samples. Second, the error-rate update is modified using AUC and F1. Finally, the vulture search algorithm is introduced to optimize the weight coefficients of the weak classifiers. Experimental results show that the improved algorithm not only raises classification accuracy on imbalanced data but also strengthens the recognition of minority-class samples.

To address the low classification accuracy of five-year breast cancer survival prediction, this paper builds a survival prediction model based on the AdaBoost ensemble learning algorithm. First, breast cancer records from the SEER database are filtered according to the selection criteria and preprocessed in two respects: data transformation and data normalization. Next, the PSOFS-CC algorithm is applied for feature selection, and its output is combined with expert knowledge to obtain the optimal feature subset. Finally, the five-year survival prediction model is built on the improved AdaBoost algorithm. Experiments show that the proposed model outperforms the comparison algorithms in Accuracy, F1, G-mean, and AUC, improving the results of breast cancer survival prediction.
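The PSOFS-CC initialization described above combines normalized mutual information (relevance to the class) with the Pearson coefficient (redundancy between features), but the exact rule is not given here. The sketch below is therefore an assumption, not the author's method: it scores each feature mRMR-style as NMI with the label minus its mean absolute Pearson correlation with the other features, then samples binary particles so that higher-scoring features are switched on more often. The helper names (`nmi`, `pearson`, `feature_scores`, `init_particles`), the normalization 2·I(X;Y)/(H(X)+H(Y)), and the [0.1, 0.9] probability range are all hypothetical choices. NMI as written expects discrete (or already discretized) feature values.

```python
import math
import random
from collections import Counter

def entropy(values):
    """Shannon entropy (natural log) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def nmi(x, y):
    """Normalized mutual information 2*I(X;Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mi = hx + hy - entropy(list(zip(x, y)))  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return 2 * mi / (hx + hy)

def pearson(a, b):
    """Pearson correlation coefficient; returns 0 when either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def feature_scores(columns, labels):
    """Hypothetical score: relevance (NMI with label) minus mean |Pearson| redundancy."""
    k = len(columns)
    scores = []
    for i in range(k):
        relevance = nmi(columns[i], labels)
        others = [abs(pearson(columns[i], columns[j])) for j in range(k) if j != i]
        redundancy = sum(others) / len(others) if others else 0.0
        scores.append(relevance - redundancy)
    return scores

def init_particles(scores, n_particles, rng):
    """Sample binary particles; higher-scoring features are selected more often."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    # Map scores to selection probabilities in [0.1, 0.9] so every feature stays reachable.
    probs = [0.1 + 0.8 * (s - lo) / span for s in scores]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(n_particles)]
```

Keeping every probability strictly between 0 and 1 is a deliberate choice in this sketch: it biases the swarm toward promising features without removing any feature from the search space entirely.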
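The crisscross strategy mentioned above presumably follows crisscross optimization, which alternates a horizontal crossover (two paired particles exchange information dimension by dimension) and a vertical crossover (two dimensions of the same particle are blended), keeping an offspring only if it improves on its parent. The sketch below assumes continuous positions in [0, 1] that are decoded to a feature subset by thresholding at 0.5; `toy_fitness` is a hypothetical stand-in for the real objective (in PSOFS-CC it would be some combination of classification error and subset size, which is not specified here), and `p_vc` is an assumed vertical-crossover probability.

```python
import random

def clip01(v):
    """Keep a position component inside [0, 1]."""
    return min(1.0, max(0.0, v))

def decode(x, threshold=0.5):
    """Map a continuous position to the indices of selected features."""
    return [i for i, v in enumerate(x) if v > threshold]

def toy_fitness(x):
    """Hypothetical fitness to minimize: squared distance to a fixed target position."""
    target = [1.0, 0.0, 1.0, 0.0]
    return sum((v - t) ** 2 for v, t in zip(x, target))

def horizontal_crossover(pop, fitness, rng):
    """Pair particles at random and cross each dimension; greedy (keep-if-better) acceptance."""
    order = list(range(len(pop)))
    rng.shuffle(order)
    for a, b in zip(order[0::2], order[1::2]):
        xi, xj = pop[a], pop[b]
        child_i, child_j = [], []
        for d in range(len(xi)):
            r1, r2 = rng.random(), rng.random()
            c1, c2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
            child_i.append(clip01(r1 * xi[d] + (1 - r1) * xj[d] + c1 * (xi[d] - xj[d])))
            child_j.append(clip01(r2 * xj[d] + (1 - r2) * xi[d] + c2 * (xj[d] - xi[d])))
        if fitness(child_i) < fitness(xi):
            pop[a] = child_i
        if fitness(child_j) < fitness(xj):
            pop[b] = child_j

def vertical_crossover(pop, fitness, rng, p_vc=0.8):
    """Blend two random dimensions inside one particle; greedy acceptance."""
    for idx, x in enumerate(pop):
        if len(x) < 2 or rng.random() >= p_vc:
            continue
        d1, d2 = rng.sample(range(len(x)), 2)
        r = rng.random()
        child = list(x)
        # Convex combination of two values in [0, 1] stays in [0, 1].
        child[d1] = r * x[d1] + (1 - r) * x[d2]
        if fitness(child) < fitness(x):
            pop[idx] = child
```

Because both operators accept an offspring only when it is strictly better, no particle's fitness can get worse, while the random coefficient `c` in the horizontal step lets offspring land outside the segment between parents, which is what helps the swarm leave a local optimum.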
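The improved AdaBoost initializes sample weights from the imbalance ratio, but the exact formula is not stated here. One common assumption, used below purely as an illustration, gives each class the same total weight, so a minority sample starts with IR times the weight of a majority sample (IR = n_majority / n_minority). The `adaboost_round` helper is the standard discrete AdaBoost update for labels in {-1, +1}; it does not implement the AUC/F1-based error or the vulture-search weighting, whose details are not given. Both function names are hypothetical.

```python
import math
from collections import Counter

def imbalance_aware_weights(labels):
    """Hypothetical initialization: every class receives the same total weight.

    A minority sample therefore starts with IR times the weight of a majority
    sample, where IR = n_majority / n_minority. Weights sum to 1.
    """
    counts = Counter(labels)
    k = len(counts)
    return [1.0 / (k * counts[y]) for y in labels]

def adaboost_round(weights, labels, preds):
    """One standard discrete AdaBoost update for labels/preds in {-1, +1}.

    Assumes weights sum to 1. Returns the weak learner's coefficient alpha
    and the renormalized sample weights.
    """
    err = sum(w for w, y, p in zip(weights, labels, preds) if y != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by 0
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (y * p = -1) are up-weighted, correct ones down-weighted.
    new = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, labels, preds)]
    total = sum(new)
    return alpha, [w / total for w in new]
```

For example, with one minority sample and four majority samples, a weak learner that always predicts the majority class has weighted error 0.2 under uniform weights and receives a positive vote (alpha = 0.5 ln 4), whereas under the class-balanced initialization its error is 0.5 and its vote is zero. That is the kind of behaviour an imbalance-aware initialization is meant to produce.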
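The model is evaluated above with Accuracy, F1, G-mean, and AUC. These metrics have standard definitions, sketched below in plain Python for a binary task: G-mean is the square root of sensitivity times specificity, and AUC is computed as the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one (ties count one half). The function names and the `positive` parameter are illustrative; on imbalanced survival data the minority class would normally be passed as `positive`.

```python
import math

def confusion(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) for a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = len(y_true) - tp - fn - fp
    return tp, fn, fp, tn

def metrics(y_true, y_pred, scores, positive=1):
    """Accuracy, F1, G-mean and AUC; `scores` are predicted scores for the positive class."""
    tp, fn, fp, tn = confusion(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = math.sqrt(recall * specificity)
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    # AUC via the Mann-Whitney statistic: fraction of (positive, negative) pairs ranked correctly.
    pairs = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = pairs / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean, "auc": auc}
```

Unlike accuracy, F1 and G-mean both collapse toward zero when the minority class is never recognized, which is why they are reported alongside accuracy for imbalanced data.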