
Design And Implementation Of Cancer Classification Algorithm Based On Pathogenicity Of Somatic Mutations

Posted on: 2020-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: P Du
GTID: 2404330599459592
Subject: Information and Communication Engineering
Abstract/Summary:
Advances in gene sequencing technology have provided a large number of samples for bioinformatics. These data offer important guidance for research on species evolution and genetics, as well as for the detection and diagnosis of clinical diseases. To reduce the impact of cancer on human health, a considerable amount of research has been devoted to its diagnosis and treatment, including personalized tumor medicine and targeted tumor therapy. Among these efforts, somatic point mutation based cancer classification (SMCC) is an important research direction. The rapid growth of DNA sequencing data has greatly promoted the development of SMCC, but problems such as high data sparsity, small sample size, and poor classification performance remain.

This paper proposes a deep neural network-based cancer classification model that combines gene mutation annotation information to explore the complex relationship between gene mutations and cancer at the level of mutation pathogenicity. It also explores the risk and mutation patterns within different cancer types. We collected somatic point mutation data for 3180 patients across 12 cancer types from the TCGA database, used annotation tools to score the pathogenicity of the mutations, and counted the genes related to these cancers and their deleteriousness at the genetic level. Because the original gene-based data are high-dimensional and sparse, classification accuracy on them is low; this paper therefore analyzes and summarizes the TCGA data at the level of somatic point mutations. By combining gene grouping and filtering based on gene mutation frequency and sample similarity with dimensionality reduction based on pathogenicity prediction of somatic mutations, we obtain a classification model with high accuracy.

The results show that the accuracy of the cancer classification model based on a deep neural network and pathogenicity prediction of somatic mutations improves by 19% and 30.8% over the SVM and KNN models based on somatic mutations, respectively, and by 9% over the state-of-the-art DNN-based DeepGene method. The model also reveals the relationship between somatic mutation information and cancer type, and explores the pathogenic genes related to cancer types. The proposed method offers a research approach for studying the pathogenicity of cancer.
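The feature-construction idea described above (scoring mutations for pathogenicity, then filtering genes by mutation frequency) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `build_gene_matrix`, the use of the maximum score per gene as the aggregation rule, the `min_fraction` threshold, and the toy records are all assumptions made for illustration, and the sample-similarity grouping and the neural network itself are omitted.

```python
from collections import defaultdict


def build_gene_matrix(mutations, min_fraction=0.5):
    """Build a sample-by-gene matrix of pathogenicity scores.

    mutations: iterable of (sample_id, gene, score) records, where score is
    a hypothetical per-mutation pathogenicity score from an annotation tool.
    Genes mutated in fewer than min_fraction of samples are dropped.
    """
    # Assumption: a gene's score in a sample is the maximum score among
    # that sample's mutations in the gene.
    per_sample = defaultdict(dict)
    for sample, gene, score in mutations:
        previous = per_sample[sample].get(gene, 0.0)
        per_sample[sample][gene] = max(previous, score)

    samples = sorted(per_sample)

    # Count how many samples carry at least one mutation in each gene.
    counts = defaultdict(int)
    for genes in per_sample.values():
        for gene in genes:
            counts[gene] += 1

    # Frequency filter: keep genes mutated in enough samples.
    kept = sorted(g for g, c in counts.items()
                  if c / len(samples) >= min_fraction)

    # Unmutated genes get a score of 0.0.
    matrix = [[per_sample[s].get(g, 0.0) for g in kept] for s in samples]
    return samples, kept, matrix


# Toy records, invented purely for illustration.
toy = [
    ("P1", "TP53", 0.9), ("P1", "TP53", 0.4), ("P1", "KRAS", 0.7),
    ("P2", "TP53", 0.8), ("P2", "BRCA1", 0.2),
    ("P3", "KRAS", 0.6),
]
samples, genes, matrix = build_gene_matrix(toy, min_fraction=0.5)
print(genes)
print(matrix)
```

In this toy run, BRCA1 is mutated in only one of three samples and is filtered out, which shrinks the feature space while retaining graded pathogenicity values rather than binary mutation flags. The resulting rows could then serve as input to a classifier.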
Keywords/Search Tags:Cancer classification, Pathogenicity prediction, Deep neural network