
Signature Genes Identification Of Cancer Occurrence And Pattern Recognition

Posted on: 2019-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: J X Wen
GTID: 2370330593950351
Subject: Biomedical engineering
Abstract/Summary:
Cancer is a major disease that seriously endangers human health, and its pathogenesis has become one of the hotspots of current research. With the development of high-throughput sequencing technology, researchers can use gene chip technology to obtain gene expression data for a variety of cancers and analyze, at the level of the human genome, how gene expression affects the occurrence and development of cancer. However, expression data obtained by gene chip technology have a small sample size and a large feature dimension, which makes subsequent research difficult. It is therefore of great theoretical and clinical value to use a well-performing feature selection algorithm to identify the key genes that lead to cancer occurrence, which benefits early intervention, diagnosis and treatment. In this paper, we establish a signature gene recognition method for early cancer based on the transcript data set of The Cancer Genome Atlas (TCGA) database. The method screens a small number of signature genes at the stage of cancer occurrence while maintaining a high recognition accuracy. The main work consists of three parts.

In the first part, we select the breast cancer samples from the TCGA data set as the research set and propose a method for selecting the signature genes of breast cancer. Using support vector machines, random forests and other machine learning methods, the prediction accuracy exceeds 98%, which is higher than in previous studies. KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis shows that eight pathways are significantly associated with the signature genes (P < 0.05). A functional analysis of some of these eight pathways shows that they are closely related at the level of gene regulation, which indicates that the identified signature genes play an important role in the pathogenesis of breast cancer and are important for understanding that pathogenesis and for the early diagnosis of breast cancer.

In the second part, we apply the breast cancer method to a variety of other cancers in the TCGA database to identify signature genes for the pathogenesis of each cancer, which provides theoretical support for research on and diagnosis of early-stage cancer. The pattern recognition method was used to analyze genome-wide gene expression data collected from the TCGA database. For the transcript data of seven cancers (breast invasive carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, colon adenocarcinoma, kidney renal clear cell carcinoma, thyroid carcinoma and liver hepatocellular carcinoma), the accuracy is as high as 98% on the TCGA data and as high as 92% on independent GEO (Gene Expression Omnibus) data, and the recognition accuracy for stage I samples is more than 95%, which is higher than in previous studies. Two genes, PID1 and SPTBN2, are shared by the signature gene sets of five of the seven cancers. In addition, the KEGG pathway analysis yields three common pathways, indicating a close relationship among these cancers in their occurrence and development. The small, highly reliable set of screened signature genes is of great value for the early diagnosis of cancer.

In the third part, we develop cancer signature gene screening and pattern recognition software based on this screening method. In addition to the method of this paper, a variety of machine learning modeling and prediction functions have been added to form an integrated bioinformatics mining tool. It can screen and analyze all cancer data in the TCGA database, which makes it convenient to analyze the pathogenesis of other cancers and the interconnections among multiple cancers in future work.

In summary, based on the samples of seven cancers in the TCGA database, we establish a method of signature gene recognition for single and multiple cancers. The results show that the signature gene screening method is effective. The small, highly reliable set of screened signature genes can effectively distinguish normal samples from early cancer samples, which is of great value for understanding the mechanism of cancer and for its early diagnosis.
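
The abstract does not spell out the screening algorithm itself, so the following is only a minimal sketch of the general idea: screen a few genes from high-dimensional, small-sample expression data, then classify tumor versus normal samples using those genes alone. It assumes a simple Welch t-statistic filter and a nearest-centroid classifier as stand-ins for the thesis's own selection method and its support vector machine and random forest models. All function names, the gene indices and the synthetic data are hypothetical and are not taken from TCGA.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for one gene's expression in two sample groups."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_genes(X, y, k):
    """Return the indices of the k genes with the largest |t| between classes."""
    tumor = [row for row, label in zip(X, y) if label == 1]
    normal = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for g in range(len(X[0])):
        t = welch_t([r[g] for r in tumor], [r[g] for r in normal])
        scores.append((abs(t), g))
    scores.sort(reverse=True)
    return sorted(g for _, g in scores[:k])


def fit_centroids(X, y, genes):
    """Mean expression profile of each class over the selected genes."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(row, genes, centroids):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    point = [row[g] for g in genes]
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))


def make_toy_data(n_samples, n_genes, informative, rng):
    """Synthetic data: only the `informative` genes are shifted in tumor samples."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate normal (0) and tumor (1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if label == 1:
            for g in informative:
                row[g] += 3.0
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(0)
informative = [3, 57, 120]  # hypothetical "signature" genes in the toy data
X_train, y_train = make_toy_data(40, 200, informative, rng)
X_test, y_test = make_toy_data(20, 200, informative, rng)

# Genes are selected on the training split only, to avoid selection bias.
genes = select_genes(X_train, y_train, k=3)
centroids = fit_centroids(X_train, y_train, genes)
predictions = [predict(row, genes, centroids) for row in X_test]
accuracy = mean(int(p == t) for p, t in zip(predictions, y_test))
print("selected genes:", genes)
print("held-out accuracy:", accuracy)
```

Selecting genes on the training split and scoring on held-out samples mirrors, in miniature, the abstract's use of independent GEO data to check models built on TCGA data. With many more genes than samples, selecting on the full data set would tend to overstate accuracy.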
Keywords/Search Tags: Signature genes selection, Gene expression, Pattern recognition, TCGA data, Early cancer