An Integrated Prediction Method For Cancer Classification Based On Gene Expression Data And MiRNA Expression Profiles

Posted on: 2019-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z Shi
Full Text: PDF
GTID: 2404330545473862
Subject: Software engineering
Abstract/Summary:
With incidence and mortality rising in recent years, cancer has drawn wide attention as a leading cause of human death. Early and accurate detection allows better treatment, which can reduce mortality and improve the cure rate. However, traditional morphology-based diagnosis is highly subjective and often fails to detect carcinogenesis at an early stage, which leads to a large number of missed diagnoses and misdiagnoses. With the rapid development of microarray technology, expression data can now be mined for genes and miRNAs that differentiate cancer samples from normal ones.

This work proposes, for the first time, an integrated prediction method for cancer classification based on gene expression data and miRNA expression profiles. First, at the data level, the gene and miRNA expression profiles are merged to obtain more classification information. Second, at the feature selection level, a hybrid feature selection algorithm extracts features at different levels and scales, which removes redundant features. Third, at the prediction model level, a multi-sampling and multi-algorithm prediction model finds the best training set and the best model through competition among the algorithms. Finally, an independent test set is used to evaluate the selected prediction model, and the performance of the integrated prediction method is evaluated by 10-fold cross-validation. The ensemble of multi-category, multi-state information across the three levels (datasets, feature selection, and prediction) constitutes the framework of the proposed integrated prediction method.

We study the Breast, LUAD, and LUSC datasets from the TCGA database. When only 10 co-expression features are selected, 10-fold cross-validation gives classification accuracies of 99.23%, 99.43%, and 99.61%, respectively. The experimental results show that using co-expression data improves classification accuracy compared with using only one kind of data. We also find that, although the selected feature subset differs from fold to fold, some features appear several times; these recurring features are highly likely to be associated with carcinogenesis. In addition, the final co-expression data contain roughly the same number of genes and miRNAs, so it is reasonable to believe that both miRNAs and genes play important roles in biological development.
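The three levels described above can be illustrated with a minimal, standard-library Python sketch. It is not the thesis's implementation: the abstract does not specify the hybrid feature selection algorithm or the competing models, so a simple mean-difference filter and a nearest-centroid classifier stand in for them here. All function names, the feature names (`gene_0`, `mir_0`, and so on), and the synthetic data are assumptions made purely for illustration.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def make_toy_data(n=60, n_gene=20, n_mirna=20, seed=0):
    """Synthetic stand-in for expression data (assumption, not TCGA).
    Only gene_0 and mir_0 carry class signal."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    gene = [{f"gene_{j}": rng.gauss(4.0 * y if j == 0 else 0.0, 1.0)
             for j in range(n_gene)} for y in labels]
    mirna = [{f"mir_{j}": rng.gauss(4.0 * y if j == 0 else 0.0, 1.0)
              for j in range(n_mirna)} for y in labels]
    return gene, mirna, labels


def merge_profiles(gene, mirna):
    """Data level: join each sample's gene and miRNA features."""
    return [{**g, **m} for g, m in zip(gene, mirna)]


def score_feature(rows, labels, name):
    """Filter score: class-mean difference over overall spread."""
    pos = [r[name] for r, y in zip(rows, labels) if y == 1]
    neg = [r[name] for r, y in zip(rows, labels) if y == 0]
    spread = pstdev(pos + neg) or 1e-12
    return abs(mean(pos) - mean(neg)) / spread


def select_top_k(rows, labels, k):
    """Feature selection level (stand-in): keep the k best-scoring features."""
    names = sorted(rows[0])
    ranked = sorted(names, key=lambda f: score_feature(rows, labels, f),
                    reverse=True)
    return ranked[:k]


def predict(train_rows, train_labels, row, feats):
    """Prediction level (stand-in): nearest class centroid."""
    best = None
    for cls in (0, 1):
        members = [r for r, y in zip(train_rows, train_labels) if y == cls]
        dist = sum((row[f] - mean(m[f] for m in members)) ** 2 for f in feats)
        if best is None or dist < best[0]:
            best = (dist, cls)
    return best[1]


def cross_validate(rows, labels, k_features=10, n_folds=10, seed=0):
    """k-fold CV; features are re-selected inside every training fold,
    and a counter records how often each feature is chosen."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    counts, correct = Counter(), 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tr_rows = [rows[i] for i in train]
        tr_y = [labels[i] for i in train]
        feats = select_top_k(tr_rows, tr_y, k_features)
        counts.update(feats)
        correct += sum(predict(tr_rows, tr_y, rows[i], feats) == labels[i]
                       for i in fold)
    return correct / len(rows), counts


if __name__ == "__main__":
    gene, mirna, labels = make_toy_data()
    accuracy, counts = cross_validate(merge_profiles(gene, mirna), labels)
    print("accuracy:", accuracy)
    print("most frequently selected:", counts.most_common(5))
```

Selecting features inside each training fold keeps the held-out samples from influencing the selection, and the per-fold counter mirrors the observation that some features recur across folds even though the selected subsets differ.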
Keywords/Search Tags: Cancer classification, TCGA database, Integrated prediction method, Gene expression data, MiRNA expression data, Hybrid feature selection algorithm, Multi-sampling and multi-algorithm prediction model