| Lung cancer has the highest incidence and mortality of all malignant tumors. Among its many subtypes, adenocarcinoma is the most common, accounting for about 40% of lung cancers worldwide. Although our understanding of lung adenocarcinoma has grown in recent years, its prognosis remains poor because it is difficult to diagnose at an early stage and metastasizes readily; the five-year survival rate is about 21%. Against this background, the whole-genome data of lung adenocarcinoma patients can be used to identify gene markers associated with prognosis and survival and to build a prognostic model. Such a model helps medical staff treat these patients more precisely and improve their prognosis. In this paper, we propose a bioinformatics method for predicting the survival time of cancer patients. The method applies machine learning to the whole-genome data of lung adenocarcinoma patients in the TCGA database, combining a feature selection algorithm with classification algorithms to build a joint prediction model that predicts whether a lung adenocarcinoma patient will survive more than three years. The model can help medical staff stratify lung adenocarcinoma patients into different risk groups and then deliver more personalized treatment to improve patient prognosis. The main work of this paper is as follows: (1) The lung adenocarcinoma patient data used in this study come from the TCGA database. We downloaded and collated the whole-genome data and clinical data of lung adenocarcinoma patients from the official website, then preprocessed the collated data (missing-value imputation, standardization, and related steps) to put it into a form more convenient for machine learning models. (2) Based on the whole-genome data of lung adenocarcinoma patients, we combined differentially expressed gene screening with the SVM-RFE feature selection algorithm to identify a set of feature genes closely related to patient prognosis and survival. Using these genes as the feature set, we built prognostic models with four machine learning algorithms (support vector machine (SVM), logistic regression, k-nearest neighbors, and random forest) to predict whether a patient's survival time exceeds three years. All prognostic models achieved a classification accuracy above 80% and an AUC of about 0.9; the model built with SVM performed best, with a classification accuracy of 88%. (3) As a comparative experiment, we used the same four machine learning algorithms to build prognostic models from the clinical data of lung adenocarcinoma patients, taking the clinical characteristics as the feature set; these models reached a classification accuracy of about 72%. The comparison shows that, relative to clinical information, our joint prediction model classifies the prognosis of lung adenocarcinoma patients more effectively and can better help medical staff stratify patients into risk groups and treat each patient more precisely. |
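The preprocessing in work item (1), together with the "more than three years" target used throughout, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes follow-up time is recorded in days with a death indicator, approximates three years as 3 × 365 days, and drops patients censored before three years because their class is unknown (the abstract does not say how such patients were handled). Missing values are filled with the column mean before z-scoring. The helper names `make_labels` and `impute_and_standardize` are hypothetical.

```python
import numpy as np

# Assumption: follow-up is in days; "three years" approximated as 3 * 365 days.
THREE_YEARS_DAYS = 3 * 365


def make_labels(days, event):
    """Label 1 if survival exceeds three years, 0 if death occurred within three years.

    Patients censored (event == 0) at or before three years have unknown status;
    they get keep == False (an assumption, not stated in the source).
    """
    days = np.asarray(days, dtype=float)
    event = np.asarray(event, dtype=bool)
    long_survivor = days > THREE_YEARS_DAYS
    died_early = event & (days <= THREE_YEARS_DAYS)
    keep = long_survivor | died_early
    return long_survivor.astype(int), keep


def impute_and_standardize(X):
    """Fill NaNs with the column mean, then z-score each column (population std)."""
    X = np.array(X, dtype=float)  # work on a float copy
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return (X - X.mean(axis=0)) / std


# Usage: four hypothetical patients (days of follow-up, 1 = died).
y, keep = make_labels([400, 1500, 800, 2000], [1, 0, 0, 1])
# y.tolist()    -> [0, 1, 0, 1]
# keep.tolist() -> [True, True, False, True]
```

In a full pipeline, the imputation means and scaling statistics would normally be estimated on the training split only, so that test patients do not leak into preprocessing.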
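The two-stage gene selection in work item (2), a differential-expression screen followed by SVM-RFE, might look roughly like the sketch below. The abstract does not specify which groups were compared, which test was used, or what thresholds were applied. This sketch therefore assumes a Welch t-test between the two survival groups on log2-scale expression, plus a hypothetical mean-difference cutoff. SVM-RFE is expressed with scikit-learn's `RFE` wrapped around a linear-kernel `SVC`: the SVM is refit repeatedly, and the gene with the smallest absolute weight is dropped each round. All function names and thresholds are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.feature_selection import RFE
from sklearn.svm import SVC


def deg_prefilter(X, y, p_cut=0.05, min_abs_diff=1.0):
    """Indices of genes whose mean expression differs between survival groups.

    Welch's t-test plus a mean-difference cutoff on (assumed) log2 expression;
    both thresholds are hypothetical placeholders.
    """
    g1, g0 = X[y == 1], X[y == 0]
    _, p = ttest_ind(g1, g0, axis=0, equal_var=False)
    diff = g1.mean(axis=0) - g0.mean(axis=0)
    return np.where((p < p_cut) & (np.abs(diff) >= min_abs_diff))[0]


def svm_rfe(X, y, n_keep):
    """SVM-RFE: refit a linear SVM, drop the gene with the smallest |weight|, repeat."""
    rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=n_keep, step=1)
    rfe.fit(X, y)
    return np.where(rfe.support_)[0]


def select_genes(X, y, n_keep):
    """Screen candidate genes, then rank them with SVM-RFE; returns original column indices."""
    cand = deg_prefilter(X, y)
    if cand.size == 0:
        return cand  # nothing passed the screen
    chosen = svm_rfe(X[:, cand], y, min(n_keep, cand.size))
    return cand[chosen]
```

For an unbiased accuracy estimate, both selection stages would be rerun inside each cross-validation training fold rather than once on the full cohort.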
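The model comparison in work items (2) and (3) trains the same four classifiers and reports accuracy and AUC. A minimal scikit-learn sketch of that comparison is shown below, using stratified k-fold cross-validation. The abstract does not give hyperparameters or the evaluation protocol, so the kernel, neighbor count, tree count, and 5-fold split here are assumptions. Standardization sits inside each pipeline so that it is fit on training folds only.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, n_splits=5, seed=0):
    """Mean cross-validated (accuracy, ROC AUC) for the four classifiers named in the text.

    Hyperparameters are illustrative placeholders, not values from the source.
    """
    models = {
        "SVM": SVC(kernel="rbf"),  # roc_auc is scored via decision_function
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        pipe = make_pipeline(StandardScaler(), model)  # scaler fit per training fold
        scores = cross_validate(pipe, X, y, cv=cv, scoring=("accuracy", "roc_auc"))
        results[name] = (scores["test_accuracy"].mean(), scores["test_roc_auc"].mean())
    return results
```

The same function covers the comparative experiment in (3). Passing a numerically encoded clinical feature matrix (for example, one-hot encoded stage and sex plus age) instead of the selected gene matrix yields the clinical-only baseline, so both feature sets are scored under identical folds and metrics.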