As a major branch of artificial intelligence, machine learning has developed rapidly in recent years and has increasingly been applied across disciplines. In the medical field, medical image diagnosis, treatment query and advice, medical data collection, drug discovery, robotic surgery, and other new applications have all benefited from it. Among all medical conditions, cancer, as a malignant disease, has always been a focus of medical practitioners. Lung cancer, an aggressive disease, has consistently ranked first among cancers, and 83 percent of cases are non-small cell lung cancer (NSCLC). Given the high incidence of this cancer, a good prognosis for patients is of primary importance. In the published literature, limited by the lack of follow-up data and by small, single-source data sets, there are few clinical prediction models for NSCLC. Some hospitals use their own programs to make predictions, but the samples all come from their own patients. Even at a tertiary hospital, most patients come from the surrounding cities. From a statistical point of view, such data are biased and generalize poorly. The most accurate model reported to date achieves an accuracy of 72.87%, with a total sample size of 683. That study used Asian tumor patients as the main subjects to construct a survival prediction model and to predict patient prognosis 5 years after surgery. However, for such a highly malignant disease, a 5-year prognostic index alone is not enough to meet clinical needs. Therefore, clinical practice needs a prognosis prediction model with a large sample size, complete information, accurate predictions, and wide applicability. To address these problems, this paper introduces several machine learning models and obtains a large sample through a medical data platform for analysis. The platform covers the whole country, collecting diagnosis and treatment data from nearly 100 hospitals or
institutions in the national medical system, which are mainly distributed in Shandong, Tianjin, Beijing, Henan, Guangzhou, and Shanghai, and covering about 30 million patients. This compensates, to some extent, for the bias of single-center samples. Based on a regional health care big data platform, this paper screened relevant clinical data that met the inclusion criteria, integrated them into a target data set after cleaning and standardization, and randomly divided it into a training set and a test set. Machine learning models were constructed and optimized on the training set to predict lung cancer recurrence 2 years and 5 years after surgery, and the predictions of each model were verified and compared on the test set. To keep the models as close as possible to practical application, five clinical experts were invited to advise on the selection of independent variables. After the predictions were obtained, the confusion matrix, accuracy, area under the ROC curve (AUROC), precision, and recall were compared. By comparing several models, the optimal model for predicting the prognosis of NSCLC was identified. Among the models for predicting lung cancer recurrence, logistic regression performed well overall, whereas the non-parametric models did not perform significantly better and in some cases performed worse (especially the KNN classifier). The neural network model based on deep learning, however, greatly improved prediction accuracy and classification performance, outperforming the logistic regression model. It showed excellent accuracy in predicting 2-year and 5-year recurrence, reaching 86.2% and 83.0%, respectively. This study aims to help clinicians work more efficiently and accurately from an interdisciplinary perspective. Predicting postoperative recurrence in early NSCLC patients with a large
sample of real-world clinical diagnosis and treatment data can reflect and represent a wide range of real-world situations, which has important scientific value and clinical significance for studying and understanding the pathogenesis characteristics and postoperative prognosis of early NSCLC patients in China. In addition, machine learning methods and model algorithms can predict the risk of postoperative recurrence of NSCLC more comprehensively, providing more valuable information for front-line clinicians making decisions about the course of a patient’s disease. Early detection, early diagnosis, and early treatment are the "three earlies" long advocated in medicine, and our research follows this principle. If a patient can be warned of the risk of recurrence in advance, the doctor can intervene before the disease advances to a late stage, rather than starting treatment only after metastasis has occurred, when the optimal treatment window is easily missed. We hope that this study will open up a new possibility for the medical field and provide a valuable reference for doctors and patients.
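
The evaluation measures used to compare the models (confusion matrix, accuracy, AUROC, and recall) can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and none of the numbers come from the study's data. Precision is computed as well, as a common companion to recall. Labels are assumed to be binary, with 1 meaning recurrence.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count the four confusion-matrix cells for binary labels (1 = recurrence)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:  # truth == 1 and pred == 0
            counts["fn"] += 1
    return counts


def summary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall derived from the confusion-matrix cells."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in sample size,
    which is fine for a sketch but not for large data sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not study data).
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold of 0.5

print(confusion_counts(toy_labels, toy_preds))
print(summary_metrics(toy_labels, toy_preds))
print(auroc(toy_labels, toy_scores))
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUROC is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason to report it alongside the others when comparing models.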