| In recent years, with the rapid development of society, living standards have improved markedly, but health problems have grown alongside them. Cancer in particular has become a major threat to life and health, and breast cancer seriously threatens the lives and health of women worldwide. Its incidence continues to rise, and it is influenced by multiple factors, including the internal environment of the body, the living environment, and lifestyle habits. Patients of different races and regions may present different characteristics, as may patients with different types of breast cancer, and treatment plans, prognosis, and survival time differ accordingly. Breast cancer therefore remains an active research topic. With the development of artificial intelligence, machine learning has shown strong data mining capabilities and has been applied across many fields. Combining machine learning with medical problems can yield more intelligent auxiliary diagnosis methods that give doctors a diagnostic reference and gain patients more treatment time and a better chance of cure, and it can uncover hidden disease-related information that suggests new directions for disease research. In this paper, clinical feature data from breast cancer patients admitted to Yunnan Cancer Hospital were preliminarily cleaned and processed, and missing values were imputed with MissForest. Depending on whether some patient features were divided according to their reference intervals, two different forms of the dataset were constructed. Each dataset was then fed into Random Forest and XGBoost models. After training and parameter tuning, the classification and prediction performance of the models on breast cancer molecular typing was 
improved. Finally, the overall classification and prediction performance of each model and its performance on each individual category were evaluated, the two forms of data were compared to see whether they had a large impact on model results, and feature importance was used to identify clinical features associated with breast cancer. The results showed that whether some clinical features were divided according to the reference interval before being fed into the model had little effect on the results of the same model. Of the two machine learning algorithms, XGBoost classified and predicted the molecular typing of breast cancer better than Random Forest. The XGBoost model trained on the dataset in which some clinical features were divided according to the reference interval achieved the better result, with an accuracy of 73.79%. At the level of individual categories, the two dataset forms affected per-category performance differently within the same model. Regardless of which model was trained on which dataset, luminal A breast cancer was classified and predicted best, followed by luminal B breast cancer, whereas HER-2 and TNBC breast cancer were difficult to classify well, which suggests that the characteristics of HER-2 and TNBC breast cancer are more complex. Comparing the top 20 features in each of the four feature importance rankings obtained from the different data forms and models, and counting how often each variable appeared, showed that three variables, the proliferation index (Ki-67), estrogen receptor (ER), and progesterone receptor (PR), always ranked in the top four and were highly important. In addition, 10 variables, namely human epidermal growth factor receptor-2 (HER-2), low-density lipoprotein, leukocytes, estrogen, triglyceride, high-density lipoprotein, progesterone, neutrophils, tumor size, and total cholesterol, also showed relatively great 
importance, indicating that these variables strongly influence the classification and prediction of breast cancer molecular typing and that these clinical features are related to breast cancer to some degree. |
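
Two data-handling steps described in the abstract, dividing a feature by its reference interval and counting how often each variable appears among the top-ranked features, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the three-level coding ("low", "normal", "high"), the inclusive boundaries, and all numeric values and rankings are assumptions made for the example.

```python
from collections import Counter


def discretize_by_reference_interval(value, low, high):
    """Map a continuous measurement to 'low', 'normal', or 'high'.

    Assumption: values on the interval boundaries count as 'normal'.
    The abstract does not state how boundaries were handled.
    """
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def top_k_frequency(rankings, k=20):
    """Count in how many rankings each feature appears among the top k.

    `rankings` is a list of feature-name lists, each sorted from most
    to least important (for example, one list per model/dataset pair).
    """
    counts = Counter()
    for ranking in rankings:
        # set() guards against a feature being counted twice per ranking
        counts.update(set(ranking[:k]))
    return counts


# Placeholder interval and measurements, NOT clinical reference values.
toy_low, toy_high = 1.0, 2.0
toy_values = [0.5, 1.5, 2.5]
toy_levels = [
    discretize_by_reference_interval(v, toy_low, toy_high) for v in toy_values
]

# Invented rankings for illustration only, NOT the paper's results.
toy_rankings = [
    ["Ki-67", "ER", "PR", "HER-2"],
    ["ER", "Ki-67", "HER-2", "PR"],
]
toy_counts = top_k_frequency(toy_rankings, k=3)
```

With four real rankings (two dataset forms by two models), features whose count equals four would be the ones that appear in every top-k list.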