
Research On Screening Method Of Cancer Driver Gene Sets Based On High-throughput Sequencing Data

Posted on: 2019-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: L Pan
GTID: 2404330566984716
Subject: Control theory and control engineering
Abstract/Summary:
Numerous studies have shown that gene mutations are the underlying cause of cancer. Screening carcinogenic driver genes out of massive mutation data is therefore an important research topic. The popular approach is to look for genes that are mutated at a significantly high frequency in sequencing data. However, cancer-associated mutations are highly heterogeneous, so the driver genes differ among patients with the same type of cancer, and the popular approach has difficulty finding driver genes that are mutated at low frequency. This study finds that although individual driver genes are heterogeneous, the set of driver genes in the same pathway covers a large share of the samples. The genes in a driver gene set are also distinctly exclusive: for an individual patient, at most one gene in the set is mutated. Based on coverage and exclusivity, we propose a screening method for driver gene sets based on high-throughput sequencing data. The work has three main parts.

(1) Processing the sequencing data to obtain a mutation matrix that the proposed algorithm model can use. We integrate the detection process for candidate cancer mutations, including sequencing data quality control, read alignment, and mutation detection. To overcome the problems of the different tools involved, such as differing programming languages, incompatibility, and complex operation, we build an integrated, efficient, and convenient mutation detection system. To facilitate the construction of the integer programming model and to provide data for subsequent analysis, we convert the output of the mutation detection system into a binary mutation matrix.

(2) Based on the two characteristics of driver gene sets in cancer, namely high coverage and exclusivity, we introduce the maximum weight submatrix model. On this basis, we consider the impact of gene mutation heterogeneity on mutation frequency and propose an improved adaptive model by adding a gene covariate to the original model. To optimize the objective function of the model, we use an ant colony algorithm, which effectively avoids getting trapped in local optima.

(3) The proposed method is applied to data from two types of cancer, lung adenocarcinoma and glioblastoma multiforme. We compare our method with three existing algorithms from two aspects. On the one hand, we evaluate the gene sets in terms of statistical significance: accuracy, coverage, and exclusivity. On the other hand, we interpret the interactions among the driver genes in a driver gene set in terms of biological significance. The experimental results show that our method finds more driver genes with high coverage and high exclusivity. Furthermore, the relevant medical literature shows that these driver genes play important roles in the development of cancer. These results support the effectiveness of the proposed method.

The proposed method adaptively adjusts the weights of coverage and exclusivity for each driver gene set. In addition, it improves the accuracy of the algorithm and can assist both further understanding of the mechanisms of cancer and clinical targeted treatment.
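The coverage and exclusivity idea behind the maximum weight submatrix model can be illustrated with a small sketch. It assumes the commonly used weight W(M) = |covered patients| - (coverage overlap), where the overlap counts how many mutations in the gene set fall on already-covered patients. The gene covariate of the adaptive model and the ant colony search are not reproduced here; exhaustive search over gene sets of size k stands in for the optimizer, and the patient IDs, gene names, and function names are all hypothetical.

```python
from itertools import combinations


def build_mutation_matrix(calls):
    """Turn (patient, gene) mutation calls into a binary patient-by-gene matrix."""
    calls = list(calls)
    patients = sorted({p for p, _ in calls})
    genes = sorted({g for _, g in calls})
    p_idx = {p: i for i, p in enumerate(patients)}
    g_idx = {g: j for j, g in enumerate(genes)}
    matrix = [[0] * len(genes) for _ in patients]
    for p, g in calls:
        matrix[p_idx[p]][g_idx[g]] = 1  # 1 = gene g is mutated in patient p
    return patients, genes, matrix


def weight(matrix, cols):
    """W(M) = coverage - overlap for the gene columns in `cols`."""
    # Coverage: patients with at least one mutated gene in the set.
    covered = sum(1 for row in matrix if any(row[j] for j in cols))
    # Total mutations in the set; anything beyond one per covered patient
    # violates exclusivity and is counted as overlap.
    total = sum(row[j] for row in matrix for j in cols)
    overlap = total - covered
    return covered - overlap


def best_gene_set(matrix, genes, k):
    """Exhaustive stand-in for the optimizer (NOT the ant colony algorithm)."""
    best = max(combinations(range(len(genes)), k),
               key=lambda cols: weight(matrix, cols))
    return [genes[j] for j in best], weight(matrix, best)


# Toy data (hypothetical): five patients, three genes.
calls = [("P1", "G1"), ("P2", "G1"), ("P3", "G2"), ("P4", "G2"),
         ("P5", "G3"), ("P1", "G3"), ("P2", "G3")]
patients, genes, matrix = build_mutation_matrix(calls)
best_set, best_weight = best_gene_set(matrix, genes, k=2)
print(best_set, best_weight)
```

In this toy data, G1 and G3 are mutated in the same patients, so pairing them is penalized for overlap, while G2 and G3 together cover every patient with no overlap. Exhaustive search is only feasible for very small gene counts, which is why a heuristic such as an ant colony algorithm is used on real data.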
Keywords/Search Tags: Driver Gene Set, Adaptive, Gene Mutation Detection, Coverage, Exclusivity