Data mining technology is a cornerstone of research in the era of big data. As an interdisciplinary field, it draws on computer science, statistics, and related disciplines. Using data mining to process and analyze gastric cancer expression data, which suffer from the curse of dimensionality, has become a shared research focus of cancer genomics and computer science. Studies have shown that recurrence and metastasis are among the main reasons for the high mortality rate of gastric cancer. Using data mining to explore the molecular mechanisms behind the occurrence and development of gastric cancer recurrence and metastasis, and to provide clinical decision support, is therefore of great significance for diagnosis and treatment.

Based on a survey of current research at home and abroad, three scientific questions are proposed:
(1) How can a screening model of hub genes for gastric cancer recurrence and metastasis be established with data mining technology so that it accounts for both data integrity and gene correlation?
(2) How can the causal relationships between the hub genes of gastric cancer recurrence and metastasis be uncovered, and the regulatory network of these key genes be reconstructed?
(3) How can a classification model for gastric cancer recurrence and metastasis with excellent performance be built?

Corresponding models are established to address these questions. The main work is summarized as follows:
(1) A screening model of key genes for gastric cancer recurrence and metastasis is established based on a weighted gene co-expression network. With this model, 216 key genes for gastric cancer recurrence and metastasis were obtained while preserving data integrity and correlation. Enrichment analysis showed that the gene set is closely related to tumor recurrence and metastasis, which supports the feasibility and accuracy of the model.
(2) The regulatory network of the key gene set for gastric cancer recurrence and metastasis is reconstructed based on Bayesian principles. A hill-climbing algorithm combined with a BIC scoring strategy was used to search the network structure, the bootstrap method was used to calculate the confidence of the network edges, and the robustness of the network was improved by deleting low-confidence edges. This yielded a directed, weighted gene regulatory network with 209 nodes and 456 edges. In addition, parallel computing was used to accelerate the confidence calculation and reduce its time cost.
(3) A classification and prediction model of gastric cancer recurrence and metastasis is established based on a hybrid ensemble algorithm. A feature selection method based on differential expression analysis, the chi-square test, and LightGBM scores is presented, and a hybrid ensemble algorithm based on AdaBoost and Stacking is proposed to construct the classification model. The resulting model achieves an accuracy of about 85%, a sensitivity of about 81%, and a precision of about 87%. Compared with the base classifier, accuracy increases by about 25%, sensitivity by about 20%, and precision by about 16%, which demonstrates the excellent performance of the model.
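
The bootstrap confidence step in item (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`bootstrap_edge_confidence`, `prune_edges`), the 0.5 threshold, and the toy data are assumptions made for illustration. In particular, `toy_learner` is a hypothetical correlation-based stand-in for the structure learner, not hill climbing with BIC scoring. The sketch only shows how edge confidence can be estimated as the fraction of bootstrap resamples in which an edge is recovered, and how low-confidence edges are then removed.

```python
import random
from collections import Counter
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def toy_learner(rows, cutoff=0.8):
    """Hypothetical stand-in for structure learning (NOT hill climbing + BIC).

    Proposes an edge (a, b), with a before b in name order, whenever the
    absolute correlation between the two genes exceeds `cutoff`.
    """
    genes = sorted(rows[0])
    edges = []
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            r = pearson([row[a] for row in rows], [row[b] for row in rows])
            if abs(r) > cutoff:
                edges.append((a, b))
    return edges


def bootstrap_edge_confidence(samples, learn_structure, n_boot=100, seed=0):
    """Fraction of bootstrap replicates in which each edge is recovered."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Resample the rows with replacement, keeping the sample size fixed.
        resample = [rng.choice(samples) for _ in samples]
        counts.update(set(learn_structure(resample)))
    return {edge: c / n_boot for edge, c in counts.items()}


def prune_edges(confidence, threshold=0.5):
    """Keep only edges whose confidence reaches the (assumed) threshold."""
    return {edge: w for edge, w in confidence.items() if w >= threshold}


# Toy expression data (assumed): g2 depends linearly on g1, g3 is unrelated.
rows = [{"g1": float(i), "g2": 2.0 * i + 1.0, "g3": (-1.0) ** i}
        for i in range(20)]

confidence = bootstrap_edge_confidence(rows, toy_learner, n_boot=100, seed=0)
network = prune_edges(confidence, threshold=0.5)
```

Because each bootstrap replicate is independent of the others, the loop in `bootstrap_edge_confidence` is a natural candidate for the kind of parallel execution the abstract mentions, for example by distributing replicates across worker processes.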