Research On Cancer Driver Pattern Mining Algorithm Based On Multi-omics Feature

Posted on: 2024-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Chu
Full Text: PDF
GTID: 2554306923488944
Subject: Electronic information
Abstract/Summary:
With the completion of the genome project and the rapid development of high-throughput sequencing technology, a large amount of genomic data has been generated. Mining genes associated with cancer development from the large volume of cancer multi-omics data has become a hot research topic. Most existing methods identify cancer driver genes from single-omics data, while methods that use the information in multi-omics data to identify cancer driver genes or gene modules still need improvement. This thesis integrates cancer multi-omics data and makes full use of the omics feature information, structural information, and functional information among genes to improve the identification of driver genes and driver modules, with the aim of better performance in cancer prediction and feature extraction. The study consists of three main parts:

(1) To address the problem that multi-omics features are underutilized in cancer driver gene identification, this thesis proposes a machine learning model for analyzing the impact of multi-omics features on the identification of driver genes. The method uses the Kullback-Leibler measure to calculate the feature importance of CGC genes and non-CGC genes in four types of omics data, and then uses a machine learning algorithm to detect cancer driver genes in pan-cancer data. The prediction results on the pan-cancer dataset validate the effectiveness of the method. In addition, the method can find some causative genes associated with cancer.

(2) To address the issue that functional and structural information among genes may affect the identification of driver genes, this thesis proposes a network embedding framework that identifies driver genes based on functional and structural information. The method uses a network propagation algorithm to obtain gene function information and constructs a mutation integration network that associates genes with weak node information. The structural features of genes are extracted from the constructed mutation integration network using the struc2vec model, and genes that are structurally similar but far apart in the network are found through structural similarity. The biological network constructed by this method contains functional correlations between genes and also reflects structural correlations between them, so more comprehensive information can be obtained. Experiments on a variety of cancer datasets show that the method can effectively identify genes closely related to cancer.

(3) To address the problem that gene mutations may affect neighboring genes, a method is proposed to identify cancer driver modules based on network function and topological information. The method uses a mutation impact function to calculate the impact of the interaction between two mutated genes, which gives the similarity between genes. The adaptive diffusion strength index is used to quantify the degree of influence of a mutated gene on its neighbor genes and to obtain the optimal characteristics between genes. The method groups driver genes with the same or similar biological functions into the same module, which more accurately reflects the functional correlation between genes. The experimental results show that the method has an advantage over similar methods in robustness analysis and enrichment analysis, and can more effectively predict cancer-related driver modules.
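The abstract does not say which network propagation variant part (2) uses. As a rough illustration only, the sketch below implements one common form, random walk with restart, on a small undirected gene network. The function name `propagate`, the dictionary input format, the toy genes `A`, `B`, `C`, and the restart probability of 0.5 are all assumptions made for this example and are not taken from the thesis.

```python
def propagate(adjacency, initial_scores, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected gene network.

    adjacency: dict mapping gene -> set of neighbour genes (hypothetical format).
    initial_scores: dict mapping gene -> starting score, e.g. a mutation score.
    Returns a dict mapping gene -> propagated score.
    """
    genes = sorted(adjacency)
    degree = {g: len(adjacency[g]) for g in genes}
    p0 = {g: float(initial_scores.get(g, 0.0)) for g in genes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for g in genes:
            # Each neighbour n passes an equal share of its score to its neighbours.
            inflow = sum(p[n] / degree[n] for n in adjacency[g] if degree[n] > 0)
            # Mix the diffused score with the starting score (the "restart" term).
            nxt[g] = (1.0 - restart) * inflow + restart * p0[g]
        delta = sum(abs(nxt[g] - p[g]) for g in genes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path network A - B - C, with only gene A carrying an initial score.
network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
}
scores = propagate(network, {"A": 1.0})
```

In this toy case the score assigned to `A` spreads to `B` and, more weakly, to `C`, which is the sense in which propagation lets genes with weak node information inherit signal from their neighbours. A larger restart value keeps scores closer to the starting values, while a smaller one lets them diffuse further.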
Keywords/Search Tags: Network propagation algorithm, Network embedding, Adaptive diffusion intensity, Protein interaction network, Multi-omics data