| The rapid development of high-throughput bioinformatics technology provides new ideas and tools for cancer diagnosis. Analyzing high-throughput omics data can reveal the molecular changes of cancer and identify biomarkers with diagnostic value. Lung adenocarcinoma is a common subtype of lung cancer that has no obvious early symptoms; clinical diagnosis relies mainly on X-ray examination and pathological biopsy, resulting in a low detection rate and poor prognosis. Early diagnosis and precise treatment of lung adenocarcinoma are therefore of great significance for improving patients' survival rate and quality of life. This study aims to use multi-omics data and a deep neural network model to predict lung adenocarcinoma and identify its biomarkers. The main work is as follows. First, gene expression data (GSE19188) and DNA methylation data (GSE139032, GSE49996) for lung adenocarcinoma were downloaded from the GEO database, and data for 12 lung adenocarcinoma patients were downloaded from the TCGA database. The gene expression data were background-corrected and normalized using the R package “affy”, and noise and batch effects were removed. The DNA methylation data were preprocessed in the same way, and the Beta values were then converted into M values. A feature selection method based on biological analysis was then used to reduce the dimensionality of the multi-omics data. In addition, the Cartesian product method was used to expand the sample size and integrate the gene expression and DNA methylation data. Second, a deep neural network model was built on the integrated multi-omics data to predict the differentially expressed genes between lung adenocarcinoma samples and normal samples. Compared with traditional machine learning models, this model performed better and was more accurate. The integrated analysis of gene expression and DNA methylation data also improved prediction accuracy. The model achieved an accuracy of 0.9435 and an AUROC of 0.9916. Finally, candidate biomarkers were screened by gene function and pathway enrichment analysis, protein-protein interaction (PPI) network construction, weighted gene co-expression network analysis (WGCNA), and other methods. The intersection of the lung adenocarcinoma-related genes screened by the PPI network and by WGCNA was taken, and these candidate genes were verified by Kaplan-Meier survival analysis using the TCGA database to explore the detailed molecular mechanism of lung adenocarcinoma. A literature search was used to evaluate the biological relevance of the selected genes. The results showed that COL5A2 and SERPINB5 were of great significance for identifying lung adenocarcinoma and were considered biomarkers of lung adenocarcinoma. This study used a deep neural network model to predict lung adenocarcinoma based on integrated multi-omics data and identified biomarkers using bioinformatics techniques, achieving satisfactory results. |
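
The Beta-to-M conversion and the Cartesian product integration described in the abstract can be illustrated with a minimal Python sketch. It uses the standard logit transform M = log2(Beta / (1 - Beta)). The pairing rule (each expression sample is combined with each methylation sample of the same class) is an assumption about how the Cartesian product was applied, and the function names `beta_to_m` and `cartesian_integrate`, the clipping constant `eps`, and the toy values are hypothetical and not taken from the study.

```python
import math
from itertools import product


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation Beta value in [0, 1] to an M value.

    Uses M = log2(beta / (1 - beta)); beta is clipped to [eps, 1 - eps]
    (an assumed safeguard) so that 0 and 1 do not cause math errors.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def cartesian_integrate(expr_samples, meth_samples):
    """Pair every expression sample with every methylation sample of the same class.

    Each sample is a (feature_list, label) tuple. Pairing only within a
    class is an assumption made for this sketch.
    """
    combined = []
    for (e_feat, e_label), (m_feat, m_label) in product(expr_samples, meth_samples):
        if e_label == m_label:
            # Concatenate the two omics feature vectors into one sample.
            combined.append((e_feat + m_feat, e_label))
    return combined


# Toy data (made up): label 1 = tumor, label 0 = normal.
expr = [([2.1, 0.5], 1), ([1.9, 0.7], 1), ([0.3, 1.2], 0)]
meth_beta = [([0.8], 1), ([0.2], 0), ([0.5], 0)]

# Convert Beta values to M values before integration.
meth = [([beta_to_m(b) for b in feats], label) for feats, label in meth_beta]

integrated = cartesian_integrate(expr, meth)
for features, label in integrated:
    print(label, features)
```

With n expression samples and m methylation samples in a class, this pairing yields n × m integrated samples for that class, which is how the sample size grows.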