
Researches On Optimized Characteristic Gene Selection Based On Neighborhood Mutual Information

Posted on: 2019-12-26    Degree: Master    Type: Thesis
Country: China    Candidate: D X Xie
GTID: 2370330623451016    Subject: Computer science and technology
Abstract/Summary:
DNA microarray (gene chip) technology was a major technological breakthrough in molecular biology in the last century. Because a single experiment can measure the expression of thousands of genes in a cell simultaneously, it moved research from the study of individual genes into the genomics era. Analyzing and mining the gene expression profiles produced by microarray experiments helps researchers understand cell growth and gene expression at different stages, measure differences between normal and tumor tissue, measure changes before and after treatment, discover drugs, diagnose and predict genetic diseases, and unravel the mysteries of human biology. This analysis is therefore important for biological and biomedical research and development, and it is currently one of the key hotspots of bioinformatics research. However, gene expression data are high-dimensional, continuous-valued, noisy and highly redundant, with few samples, which makes them challenging for traditional data mining methods.

Building on a review, analysis and summary of existing data mining methods, this thesis studies feature gene selection and the classification of gene expression data. The main contents and results are as follows. A hybrid feature gene selection method based on optimized neighborhood mutual information is proposed. First, all genes are ranked with the ReliefF algorithm and the top k genes are kept as the initial gene subset, which removes noisy and uninformative genes, reduces dimensionality and improves data quality. Second, because the neighborhood radius strongly affects the performance of the neighborhood mutual information model, the differential evolution algorithm is used to optimize the radius. Finally, an improved neighborhood mutual information model, combining the optimized radius with a forward greedy search strategy, performs the final gene selection and eliminates noisy and redundant genes. Simulation results show that the proposed method outperforms ReliefF, Kruskal-Wallis, Gini index, MI (mutual information) and NMI (neighborhood mutual information) in both recognition accuracy and the number of selected genes.
Keywords/Search Tags: Gene expression profiling, Data mining, Feature gene, Ensemble classification
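The first stage ranks every gene with ReliefF and keeps the top k. The abstract does not give the thesis's ReliefF settings, so the following is only a minimal sketch of the textbook ReliefF weight update under stated assumptions: every sample is used once as the reference instance, k = 5 nearest hits and misses, Manhattan distance on min-max normalized expression values, and miss contributions weighted by class priors. The function name `relieff_scores` is hypothetical.

```python
import numpy as np


def relieff_scores(X, y, k=5):
    """Simplified ReliefF (assumption): every sample is used once as the reference (m = n)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    # Min-max normalise so diff() values lie in [0, 1]; constant genes keep range 1.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes.tolist(), (counts / n).tolist()))
    w = np.zeros(p)
    for i in range(n):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)  # Manhattan distance to sample i
        yi = y[i].item()
        for c in classes.tolist():
            idx = np.flatnonzero(y == c)
            idx = idx[idx != i]  # never use the reference sample itself
            if idx.size == 0:
                continue
            nearest = idx[np.argsort(dist[idx])[:k]]
            diff = np.abs(Xn[nearest] - Xn[i]).mean(axis=0)
            if c == yi:
                w -= diff / n  # near hits: differences lower a gene's weight
            else:
                # near misses: differences raise it, weighted by the class prior
                w += prior[c] / (1.0 - prior[yi]) * diff / n
    return w
```

The primary subset would then be something like `np.argsort(relieff_scores(X, y))[::-1][:k]`, where `k` is left open because the abstract does not state its value.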
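The final stage selects genes by neighborhood mutual information (NMI) with a forward greedy search. The abstract does not describe how the thesis "improves" the NMI model, so this sketch uses only the standard definition NMI(S; D) = NH(S) + NH(D) - NH(S, D), which simplifies to -(1/n) sum_i log(|N_S(x_i)| |[x_i]_D| / (n |N_S(x_i) intersect [x_i]_D|)), where N_S(x_i) is the set of samples within the radius of x_i on gene subset S and [x_i]_D is the set of samples sharing x_i's class. Several details are assumptions rather than the author's method: Euclidean distance on min-max normalized data, a fixed radius, and stopping when the best NMI gain drops below `tol`. Both function names are hypothetical.

```python
import numpy as np


def neighborhood_mi(Xn, y, subset, radius):
    """NMI(S; D) = -(1/n) sum_i log(|N_S(x_i)| |[x_i]_D| / (n |N_S(x_i) & [x_i]_D|))."""
    n = len(y)
    sub = Xn[:, subset]
    # Pairwise Euclidean distances using only the genes in `subset`.
    dist = np.sqrt(((sub[:, None, :] - sub[None, :, :]) ** 2).sum(axis=2))
    nbr = dist <= radius                  # neighbourhood relation (x_i is its own neighbour)
    same = y[:, None] == y[None, :]       # equivalence classes induced by the label D
    nbr_size = nbr.sum(axis=1)
    cls_size = same.sum(axis=1)
    joint = (nbr & same).sum(axis=1)      # always >= 1 because x_i is in both sets
    return float(-np.mean(np.log(nbr_size * cls_size / (n * joint))))


def forward_greedy_nmi(X, y, radius=0.15, max_genes=10, tol=1e-6):
    """Repeatedly add the gene maximising NMI(S + {g}; D) until the gain is <= tol."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span
    selected, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining and len(selected) < max_genes:
        scores = [neighborhood_mi(Xn, y, selected + [g], radius) for g in remaining]
        j = int(np.argmax(scores))
        if scores[j] - best <= tol:       # no remaining gene adds information
            break
        best = scores[j]
        selected.append(remaining.pop(j))
    return selected, best
```

Because NMI(S; D) can never exceed the class entropy NH(D), a subset whose neighborhoods are all class-pure already reaches the maximum. Adding further genes then yields no gain, which is what lets the greedy search discard redundant genes. The pairwise distance matrix costs O(n^2 |S|) memory, which is acceptable for the small sample sizes typical of microarray data.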
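The second stage tunes the neighborhood radius with differential evolution. The abstract names the algorithm but not its variant, control parameters, or fitness function, so the following is a generic minimal sketch of the classic DE/rand/1/bin scheme, assuming a scale factor of 0.5, crossover rate of 0.9, and a population of 20. It minimizes an arbitrary objective, so a radius search would pass in minus some classification score. The usage below optimizes only a toy quadratic as a stand-in.

```python
import numpy as np


def differential_evolution(objective, bounds, pop_size=20, scale=0.5,
                           crossover=0.9, generations=100, seed=0):
    """Minimise `objective` with DE/rand/1/bin; `bounds` is a list of (low, high) pairs."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    dim = low.size
    pop = low + rng.random((pop_size, dim)) * (high - low)
    fitness = np.array([objective(ind) for ind in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals, none of them the target i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + scale * (b - c), low, high)
            # Binomial crossover; one forced position guarantees a mutant gene.
            mask = rng.random(dim) < crossover
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            trial_fit = objective(trial)
            # Greedy one-to-one selection keeps the better of target and trial.
            if trial_fit <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])


# Toy stand-in objective: pretend the best radius is 0.3.
best_radius, best_value = differential_evolution(lambda d: (d[0] - 0.3) ** 2,
                                                 [(0.01, 1.0)])
```

For the gene selection task, the objective would be a hypothetical `evaluate_radius(d)` that returns minus the cross-validated accuracy of the genes selected at radius `d[0]`. Each fitness evaluation then reruns the selection, so the population size and generation count dominate the cost.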