An Improved Filter Feature Selection Method And Its Application On The Identification Of Tumor Markers

Posted on: 2016-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: L H He
GTID: 2284330467997102
Subject: Computer application technology
Abstract/Summary:
In recent years, tumor morbidity and mortality have risen rapidly, and scientists have devoted considerable effort to the study of tumor markers. By detecting tumor markers and monitoring the occurrence and development of tumors, a diagnosis can be made as early as possible and treatment can begin promptly. Numerous studies indicate that miRNAs and genes play an important role in regulating a wide range of biological processes and have potential as tumor markers for diagnostic, therapeutic, and prognostic exploration. With the development of high-throughput microarray chip technology, a large amount of microarray expression data has become available, typically with few samples relative to the very high number of genes. Selecting an informative gene subset from such high-dimensional, small-sample data has therefore become a particularly challenging and important problem in microarray data analysis.
In the last decade, a variety of filter feature selection methods have been proposed. Feature selection techniques are usually divided into three categories: filter methods, wrapper methods, and embedded methods. The categories differ in how the feature selection search is combined with the construction of the classification model. Wrapper and embedded methods have a higher computational cost and a higher risk of overfitting than filter methods, and simple filter methods generally outperform the other two types. In this article, we therefore focus on filter methods.
Most filter methods simply score the features according to some rule, rank them by score, and choose the top-ranked features, for example the top 100. The serious weakness of these simple ranking approaches is that the selected features may be correlated among themselves. Redundancy in the combined feature set reduces its efficiency and breadth for classification.
To address this, a filter feature selection framework that includes a redundancy-reduction step was proposed, based on minimum redundancy-maximum relevance (mRMR). However, most microarray datasets have few samples relative to the high number of genes, and in this setting the mRMR method is sensitive to small perturbations of the training data and produces unstable signatures. In addition, in recent years more and more expression datasets contain cancer tissue samples paired with their corresponding control tissues.
In this article, we propose an ensemble feature selection method based on mRMR for paired microarray data. To increase stability, the improved method uses an ensemble strategy to generate diverse subsets from the original dataset. The mRMR method is then applied to each subset to obtain multiple feature lists. Finally, a rank aggregation strategy is used to decide the final list of selected features. We apply the method to six paired microarray datasets covering different cancer types. Compared with other widely used filter methods, the proposed method performs very well, which indicates that it is effective and well suited to feature selection for paired microarray expression data.
Numerous recent studies also indicate that miRNAs are involved in the tumorigenesis and development of many cancers, act as either oncogenes or tumor suppressors, and have potential as tumor markers for diagnostic, therapeutic, and prognostic exploration. To explore carcinogenic mechanisms, it is critical to discover aberrantly expressed miRNAs and their target genes. In this article, the improved feature selection method is used to improve a multi-step approach for identifying miRNA-gene target pairs.
In addition, a web tool is presented for analyzing aberrantly expressed features and predicting targets from miRNA and gene expression data.
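The ensemble procedure described in the abstract (diverse subsets, mRMR on each subset, rank aggregation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: absolute Pearson correlation stands in for mutual information, subsets are drawn by sampling patient pairs without replacement, and the per-subset lists are combined by summed rank. The function names and the parameters k, n_subsets, and frac are hypothetical, and k is assumed to be no larger than the number of features.

```python
import math
import random

def abs_corr(a, b):
    """Absolute Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def mrmr(columns, labels, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy with already picked features.
    Assumption: |Pearson correlation| replaces mutual information for both terms."""
    relevance = [abs_corr(col, labels) for col in columns]
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    redundancy = [0.0] * len(columns)  # running sum of |corr| with picked features
    while len(selected) < k:
        last = columns[selected[-1]]
        best, best_score = None, -math.inf
        for j, col in enumerate(columns):
            if j in selected:
                continue
            redundancy[j] += abs_corr(col, last)
            score = relevance[j] - redundancy[j] / len(selected)
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

def ensemble_mrmr(tumor, control, k=5, n_subsets=10, frac=0.8, seed=0):
    """tumor[i] and control[i] are expression rows from the same patient (one pair)."""
    rng = random.Random(seed)
    n_pairs, n_features = len(tumor), len(tumor[0])
    m = max(2, round(frac * n_pairs))
    rank_sum = [0] * n_features
    for _ in range(n_subsets):
        # Assumption: sample whole pairs so tumor and control rows stay together.
        pairs = rng.sample(range(n_pairs), m)
        rows = [tumor[i] for i in pairs] + [control[i] for i in pairs]
        labels = [1.0] * m + [0.0] * m
        columns = [[row[j] for row in rows] for j in range(n_features)]
        ranks = [k] * n_features  # features not picked in this run share the worst rank
        for position, j in enumerate(mrmr(columns, labels, k)):
            ranks[j] = position
        for j in range(n_features):
            rank_sum[j] += ranks[j]
    # Summed-rank (Borda-style) aggregation: lowest total rank first.
    return sorted(range(n_features), key=lambda j: rank_sum[j])[:k]
```

Calling ensemble_mrmr(tumor, control, k=100) on two row-aligned expression matrices would return the indices of 100 features ordered by aggregated rank. Other subset-generation or aggregation schemes could be swapped in without changing the overall structure.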
Keywords/Search Tags: Feature Selection, Paired Microarray Expression Data, MiRNA, Target Gene