| Background: Acute myeloid leukemia (AML) is a clinically highly heterogeneous hematologic malignancy characterized by clonal proliferation and impaired differentiation of blast cells in the bone marrow and peripheral blood. AML progresses rapidly, and most patients become critically ill. If treatment is delayed, death usually occurs within a few months. According to statistics, the incidence of leukemia in our country is 3 to 4 per 100,000. Among deaths from malignant neoplasms, leukemia ranks 6th in males and 8th in females, and it ranks first among children and adults under the age of 35. Studies have shown that the expected 5-year survival rate of adult AML patients is only 20% to 25%, indicating a poor prognosis. Prognostic factors used for risk stratification in AML mainly include cytogenetics (karyotype, chromosomal translocations and inversions, etc.), molecular genetic abnormalities (including fusion genes and gene mutations), and the FAB classification. With the rapid development of high-throughput sequencing technologies, newly discovered molecular genetic abnormalities (such as gene overexpression or underexpression) and epigenetic and proteomic abnormalities have further enriched the prognostic evaluation system. Traditional risk stratification has limitations in evaluating treatment efficacy and stratifying prognosis in AML patients, and it can no longer meet the needs of clinical prognostic assessment. Therefore, how to evaluate the prognosis of AML patients systematically and accurately, and thereby guide individualized precision treatment decisions and improve overall survival, remains an open problem for many researchers. Predicting disease prognosis with machine learning (ML) models has great potential for clinical application. This project was designed on that basis: by analyzing gene expression profile data from AML patients, a model of 1-year AML 
prognostic survival was constructed using machine learning algorithms. Objective: Using data from the public Firehose database, genomic methods were applied to analyze the relationship between gene expression levels and the 1-year prognosis of AML and to identify molecular markers related to the occurrence, development, and prognosis of AML. A 1-year prognostic model of AML based on machine learning algorithms was constructed and its predictive performance was evaluated, so as to provide new evidence for prognostic risk stratification of AML and to guide individualized precision treatment decisions. Methods: Clinical and transcriptome data of AML patients were downloaded from the Broad Firehose database of the GDC (Genomic Data Commons). A total of 163 patients with survival and mRNA sequencing data that met the requirements were selected. Using 1-year survival as the cutoff, the 163 patients were divided into two groups: those who survived more than 1 year (group 1) and those who survived less than 1 year (group 2). Differentially expressed genes (DEGs) were screened with the DESeq package in R, yielding a total of 20 typical DEGs that met the criteria |log2 fold change| ≥ 1.4 and adjusted P value < 0.05. The Rattle package in R was used to construct six machine learning models based on these 20 genes to predict the 1-year prognosis of AML. Receiver operating characteristic (ROC) curves were used to evaluate and compare the predictive performance of the six machine learning algorithms, with internal data used for validation, and the model with the highest area under the curve (AUC) was selected as the optimal prognostic model. Results: 1. Using |log2 fold change| ≥ 1.4 and adjusted P value < 0.05 as the criteria, a total of 20 typical DEGs were screened out with the DESeq package in R. Among them, 5 
genes (EBF4, MTUS2, NT5E, AFF2, IGDCC4) were up-regulated, and their high expression was associated with a good prognosis. The other 15 genes (ADAMTS2, TRPM4, PACSIN1, CACNG4, SPON1, CCDC3, VSTM4, MAOA, ESPN, C1QA, LILRA4, UBXN10, LIF, WDR86, and PEG10) were down-regulated, and their high expression was associated with a poor prognosis. All 20 genes could serve as molecular markers for the occurrence, development, and prognosis of AML. 2. The area under the curve (AUC) was 0.63 for the decision tree model, 0.72 for the random forest (RF) model, 0.75 for the Boost model, 0.72 for the support vector machine (SVM) model, 0.71 for the linear regression model, and 0.66 for the artificial neural network (ANN) model. The Boost model was therefore the most effective of the six at predicting the 1-year prognosis of patients. Conclusion: 1. Gene expression level can be used as a factor in the 1-year prognosis of AML patients. 2. Differential gene expression levels are closely related to the occurrence, development, and prognosis of AML. As an emerging prognostic marker, gene expression level provides a new basis for prognostic risk stratification of AML and contributes to individualized precision treatment of AML. 3. A model based on ML algorithms can accurately predict the prognosis of AML. Compared with the decision tree, RF, SVM, linear regression, and ANN models, the Boost model performed better in predicting 1-year survival in AML. |
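
The screening and model-comparison steps described in the Methods can be illustrated with a minimal sketch. The study itself used the DESeq and Rattle packages in R; the Python below is only a stand-in under stated assumptions. The names `GeneResult`, `select_degs`, and `auc` are hypothetical helpers, the toy gene records are invented for illustration, and only the thresholds and the six AUC values are taken from the abstract.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    """One row of a differential-expression result table (hypothetical layout)."""
    gene: str
    log2_fold_change: float
    adjusted_p: float


def select_degs(results, lfc_cutoff=1.4, padj_cutoff=0.05):
    """Keep genes with |log2 fold change| >= lfc_cutoff and adjusted P < padj_cutoff."""
    return [
        r for r in results
        if abs(r.log2_fold_change) >= lfc_cutoff and r.adjusted_p < padj_cutoff
    ]


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of (positive, negative)
    pairs in which the positive case gets the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy records, not data from the study.
toy_results = [
    GeneResult("GENE_A", 2.0, 0.01),    # passes both thresholds
    GeneResult("GENE_B", -1.5, 0.04),   # passes (down-regulated)
    GeneResult("GENE_C", 1.2, 0.001),   # fold change too small
    GeneResult("GENE_D", -3.0, 0.20),   # not significant
]
selected = [r.gene for r in select_degs(toy_results)]

# AUC values as reported in the abstract; choose the model with the highest AUC.
reported_auc = {
    "Decision Tree": 0.63,
    "Random Forest": 0.72,
    "Boost": 0.75,
    "SVM": 0.72,
    "Linear Regression": 0.71,
    "ANN": 0.66,
}
best_model = max(reported_auc, key=reported_auc.get)
```

The pairwise formulation of AUC is used here because it needs no external library; on real data, a dedicated ROC routine would normally be preferred.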