
Research on Feature Selection for Tumor Gene Expression Data

Posted on: 2019-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Li
Full Text: PDF
GTID: 2404330548467874
Subject: Software engineering
Abstract/Summary:
With the rapid development of DNA sequencing technology, researchers can measure massive amounts of gene expression data in various tissue samples, which provides technical support for studying tumor pathogenesis at the molecular level. Medical data mining is one of the main branches of data mining and a research hotspot in bioinformatics. Mining gene expression data is of great significance for identifying pathogenic genes, predicting protein function, and diagnosing and predicting disease. Due to the inherent characteristics of genes and the limitations of DNA sequencing technology, the data are characterized by high dimensionality, small sample size, and high noise. Traditional statistical and pattern recognition methods are therefore difficult to apply directly to gene expression data mining tasks. This dissertation focuses on the characteristics of gene expression data and takes the selection of characteristic genes as its main research direction. Its main contributions are as follows:

(1) To address the slow convergence of the ant colony optimization algorithm and its tendency to fall into local optima during the search, an improved pheromone update strategy and a state transition rule are proposed. A positive feedback coefficient and an evaporation factor are added to the pheromone update strategy. If the quality of the feature subset obtained by the ants does not improve within several generations, the pheromone evaporation factor is adaptively adjusted to accelerate pheromone evaporation. At the same time, the pheromone feedback coefficient is adaptively adjusted to weaken the positive feedback effect and improve the global search ability of the ant colony algorithm. Combining a random strategy and a greedy strategy as the state transition rule improves the search performance of the ant colony and helps it avoid local optima.

(2) A feature selection method based on random forest and the ant colony algorithm is proposed to improve the accuracy of the classification algorithm. By combining different data mining algorithms, this method selects highly discriminative feature subsets from high-dimensional data sets. The algorithm computes heuristic information with a low-cost feature evaluation method, accelerates the search for candidate feature subsets with an adaptive pheromone update strategy, and uses a sequential forward selection strategy to construct a global optimum from the candidate subsets. The experimental results show that the proposed method effectively eliminates redundant and irrelevant features and improves the efficiency of the classifier.

(3) To address the large number of irrelevant, redundant, and noisy genes in gene expression data, a feature selection method combining a filter method and the ant colony algorithm is proposed. The method first removes the genes that carry little classification information using the ReliefF algorithm, then feeds the remaining candidate genes into the ant colony algorithm, which selects the optimal gene subset through iterative improvement. Classification experiments on tumor gene expression data show that the proposed method achieves better classification results while selecting fewer genes.
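The adaptive pheromone update and the mixed state transition rule described in contribution (1) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the function names, the parameter names (`rho` for the evaporation factor, `gamma` for the feedback coefficient, `q0` for the probability of a greedy move), the step sizes and bounds, and the additive form of the update. The greedy-or-roulette choice follows the pseudo-random proportional rule known from Ant Colony System, which may differ from the rule used in the thesis.

```python
import random


def adapt_parameters(rho, gamma, stagnant_generations, patience=5,
                     rho_step=0.05, gamma_step=0.1,
                     rho_max=0.9, gamma_min=0.1):
    """Adjust evaporation (rho) and feedback (gamma) when the search stagnates.

    Hypothetical rule: once the best subset quality has not improved for
    `patience` generations, evaporate faster and reinforce less.
    """
    if stagnant_generations >= patience:
        rho = min(rho_max, rho + rho_step)
        gamma = max(gamma_min, gamma - gamma_step)
    return rho, gamma


def update_pheromone(tau, best_subset, best_quality, rho, gamma):
    """Evaporate every trail, then reinforce the features in the best subset."""
    return [
        (1.0 - rho) * t + (gamma * best_quality if i in best_subset else 0.0)
        for i, t in enumerate(tau)
    ]


def choose_feature(tau, eta, candidates, q0=0.7, alpha=1.0, beta=1.0,
                   rng=random):
    """Pick the next feature from a non-empty list of candidate indices.

    With probability q0 take the greedy best candidate; otherwise sample
    one in proportion to tau^alpha * eta^beta (roulette wheel).
    """
    scores = {i: (tau[i] ** alpha) * (eta[i] ** beta) for i in candidates}
    if rng.random() < q0:
        return max(scores, key=scores.get)
    threshold = rng.random() * sum(scores.values())
    cumulative = 0.0
    for i, score in scores.items():
        cumulative += score
        if cumulative >= threshold:
            return i
    return i  # guard against floating-point round-off
```

In this sketch, `eta` stands for the heuristic information per feature, which could for example come from a cheap feature score such as a random forest importance or a ReliefF weight. Raising `rho` and lowering `gamma` during stagnation flattens the pheromone distribution, so the roulette branch explores more widely.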
Keywords/Search Tags: Gene Expression Data, Feature Selection, Ant Colony Optimization, Random Forest, ReliefF Algorithm