
Research On Cancer Classification Prediction Model Based On Gene Expression Profile And Protein Interaction Network

Posted on: 2019-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: X F Zhang
GTID: 2404330545973989
Subject: Computer Science and Technology
Abstract/Summary:
Cancer arises when carcinogenic factors cause cells in a local tissue to lose normal growth regulation at the gene level and proliferate clonally and abnormally. There are many types of cancer, their pathogenic mechanisms are complex, and they are difficult to cure; in recent years cancer has seriously endangered human life and health. Early diagnosis and treatment can help save patients' lives. Traditional cancer diagnosis is usually based on pathology, histology, and immunology, so it depends largely on experience and is prone to misdiagnosis. In recent years, the rapid development of biological microarray technology has generated a large amount of gene expression data. Molecular biology studies have shown that many factors, including gene mutations, loss of function of tumor suppressor genes, and activation of proto-oncogenes, are closely related to the development and progression of cancer. With the help of these massive gene expression profiles, many cancer-related gene sequences have been identified, which provides very useful information for the diagnosis and treatment of cancer.

Based on gene expression profile data and the protein interaction network, this thesis proposes a new ensemble model for cancer classification prediction. First, mutual information is used for correlation analysis of the gene expression profile data, and Jaccard's similarity coefficient is used for similarity analysis of the protein interaction network data; feature selection is then performed by simultaneously maximizing the correlation and similarity of the selected feature gene set. Next, within a cross-validation framework, the Bootstrap method is used for diversity sampling, and the SVM, KNN, and RF algorithms are trained on the selected feature gene set to obtain multiple, more diverse classification models. These models then vote to produce the final classification result.

The method is applied to gene expression profile datasets for four types of cancer from the GEO database: acute myeloid leukemia, breast cancer, colon cancer, and lung cancer in non-smokers in Taiwan. Experiments and comparative studies explore the selection of datasets, the construction of the feature extraction method, diversity sampling, and the construction of the ensemble classification model. The comparison covers two aspects: feature selection and the classification model. For feature selection, the results show that the feature gene extraction method that integrates gene expression profile data and protein interaction network data is about 5% more accurate than the method that uses only gene expression profile data. For the classification model, the ensemble model presented here achieves higher classification accuracy than the other classification models on every dataset.

The experimental results further show the following. In the selection of datasets, the biological significance of genes is better revealed by using the differential expression of genes in different cancers together with the complex associations among genes. In the construction of the feature extraction method, combining the protein-protein interaction network exploits the property that genes associated with the same cancer share common functional characteristics, which overcomes the limitations of differential expression analysis based on statistical methods alone. In the sampling process, repeatedly sampling the training set overcomes the over-fitting of a single learning algorithm and improves generalization ability. In the construction of the ensemble classification model, the strengths of multiple algorithms complement one another, which addresses the limited applicability of any single classification algorithm.
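
The feature selection and voting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not say how correlation and similarity are combined, so the greedy rule and the weight `alpha` below are assumptions, as are the function names, the toy gene and protein identifiers, and the premise that expression values have already been discretized (for example into low/high). The base learners (SVM, KNN, RF) are not implemented here; `majority_vote` only shows how their predictions could be merged.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def jaccard(a, b):
    """Jaccard similarity coefficient |A & B| / |A | B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_genes(expr, labels, neighbors, k, alpha=1.0):
    """Greedy selection (assumed scheme, not taken from the thesis).

    Each candidate is scored by its mutual information with the class
    label plus alpha times its mean Jaccard similarity, over PPI
    neighbour sets, to the genes selected so far.
    """
    relevance = {g: mutual_information(v, labels) for g, v in expr.items()}
    selected = []
    candidates = sorted(expr)  # sorted for deterministic tie-breaking

    def score(g):
        sims = [
            jaccard(neighbors.get(g, set()), neighbors.get(s, set()))
            for s in selected
        ]
        return relevance[g] + (alpha * sum(sims) / len(sims) if sims else 0.0)

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def bootstrap_indices(n, rng):
    """Draw n sample indices with replacement (one bootstrap resample)."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(predictions):
    """Return the label predicted by the most models for one sample."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: four samples, three hypothetical genes, discretized expression.
labels = [0, 0, 1, 1]
expr = {
    "G1": [0, 0, 1, 1],  # tracks the label exactly
    "G2": [0, 1, 0, 1],  # statistically independent of the label
    "G3": [0, 0, 1, 0],  # partially informative
}
# Hypothetical PPI neighbour sets for each gene's protein.
neighbors = {"G1": {"P1", "P2"}, "G2": {"P1", "P2"}, "G3": {"P9"}}

print(select_genes(expr, labels, neighbors, k=2))             # ['G1', 'G2']
print(select_genes(expr, labels, neighbors, k=2, alpha=0.0))  # ['G1', 'G3']
print(majority_vote(["tumor", "normal", "tumor"]))            # tumor
```

In this toy example the network term changes the second pick. With `alpha=0.0` the ranking relies on expression data alone and prefers G3, while with the default weight G2 is chosen because its interaction neighbourhood matches that of G1. In a full pipeline, each base learner would be trained on a resample drawn with `bootstrap_indices`, and the per-sample predictions would be combined with `majority_vote`.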
Keywords/Search Tags: cancer classification, feature selection, mutual information, machine learning, multi-algorithm & multi-model