Research Of Feature Selection For Tumor Gene Expression Data

Posted on:2019-03-04

Degree:Master

Type:Thesis

Country:China

Candidate:C Y Li

Full Text:PDF

GTID:2404330548467874

Subject:Software engineering

Abstract/Summary:


With the rapid development of DNA sequencing technology, researchers can measure massive amounts of gene expression data in various tissue samples, which provides technical support for studying tumor pathogenesis at the molecular level. Medical data mining, one of the main research areas of data mining, is also a research hotspot in bioinformatics. Mining gene expression data is of great significance for uncovering pathogenic mechanisms, predicting protein function, and diagnosing and predicting disease. Owing to the inherent characteristics of genes and the limitations of DNA sequencing technology, these data are high-dimensional, have small sample sizes, and are highly noisy. Traditional statistical and pattern recognition methods are therefore difficult to apply directly to gene expression data mining tasks. Focusing on these characteristics, this dissertation takes characteristic (feature) gene selection as its main research direction. Its main contributions are as follows:

(1) To address the slow convergence of the ant colony optimization (ACO) algorithm and its tendency to fall into local optima during the search, an improved pheromone update strategy and an improved state transition rule are proposed. A positive feedback coefficient and an evaporation factor are added to the pheromone update strategy. If the quality of the feature subsets found by the ants does not improve for several generations, the evaporation factor is adaptively increased to accelerate pheromone evaporation, and the positive feedback coefficient is adaptively reduced to weaken the positive feedback effect, which improves the global search ability of the algorithm. Combining a random strategy with a greedy strategy in the state transition rule further improves the search performance of the colony and helps it avoid local optima.

(2) A feature selection method based on random forest and the ant colony algorithm is proposed to improve classification accuracy. By combining different data mining algorithms, the method selects highly discriminative feature subsets from high-dimensional data sets. It computes heuristic information with a low-cost feature evaluation method, accelerates the search for candidate feature subsets with an adaptive pheromone update strategy, and applies sequential forward selection to construct a globally optimal subset from the candidate subsets. Experimental results show that the method effectively eliminates redundant and irrelevant features and improves the efficiency of the classifier.

(3) To address the large number of irrelevant, redundant, and noisy genes in gene expression data, a feature selection method combining a filter method with the ant colony algorithm is proposed. The ReliefF algorithm first removes genes that carry little classification information; the remaining candidate genes are then passed to the ant colony algorithm, which iteratively refines the selection to find the optimal gene subset. Classification experiments on tumor gene expression data show that the method achieves better classification results while selecting fewer genes.
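
The adaptive pheromone update and the hybrid greedy/random state transition rule of contribution (1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract gives no formulas, so the ACS-style pseudo-random proportional rule, the parameter names (`rho`, `q`, `q0`, `patience`, `step`, `rho_max`, `q_min`), the multiplicative adjustment schedule and the helper names `choose_next_feature` and `AdaptivePheromone` are all hypothetical choices.

```python
import random


def choose_next_feature(candidates, tau, eta, alpha=1.0, beta=1.0, q0=0.7, rng=random):
    """Pick the next feature for an ant's subset (hypothetical helper).

    Hybrid greedy/random rule (ACS-style pseudo-random proportional rule):
    with probability q0 take the best-scoring feature (greedy exploitation),
    otherwise sample one by roulette wheel (random exploration).
    tau: pheromone per feature; eta: heuristic information per feature.
    """
    scores = {j: (tau[j] ** alpha) * (eta[j] ** beta) for j in candidates}
    keys = list(scores)
    if rng.random() < q0:
        return max(keys, key=scores.get)
    r = rng.random() * sum(scores.values())
    acc = 0.0
    for j in keys:
        acc += scores[j]
        if r <= acc:
            return j
    return keys[-1]  # guard against floating-point round-off


class AdaptivePheromone:
    """Pheromone trails with an adaptive evaporation factor (rho) and
    positive feedback coefficient (q). Assumed parameterization."""

    def __init__(self, n_features, rho=0.1, q=1.0, patience=5,
                 step=1.2, rho_max=0.9, q_min=0.1, tau0=1.0):
        self.tau = [tau0] * n_features
        self.rho, self.q = rho, q
        self.patience, self.step = patience, step
        self.rho_max, self.q_min = rho_max, q_min
        self.best = float("-inf")
        self.stall = 0  # generations without improvement

    def update(self, subset, quality):
        """Evaporate all trails, then deposit q * quality on the chosen features."""
        if quality > self.best:
            self.best, self.stall = quality, 0
        else:
            self.stall += 1
        if self.stall >= self.patience:
            # Stagnation: evaporate faster and weaken positive feedback.
            self.rho = min(self.rho_max, self.rho * self.step)
            self.q = max(self.q_min, self.q / self.step)
            self.stall = 0
        self.tau = [(1.0 - self.rho) * t for t in self.tau]
        for j in subset:
            self.tau[j] += self.q * quality
```

In each generation, every ant would build a subset by repeatedly calling `choose_next_feature`, the subsets would be scored (for example by classification accuracy), and the best subset of the generation would be passed to `update`. Raising `q0` favors exploitation; lowering it favors exploration.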
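
For contribution (3), the ReliefF filtering stage that precedes the ant colony search might look like the sketch below. It follows the standard ReliefF weighting: nearest hits (same class) lower a gene's weight, and nearest misses from each other class raise it, weighted by class priors. It assumes numeric expression values and a range-normalized Manhattan distance; the function names `relieff` and `top_genes` and the cutoff `n_keep` are hypothetical.

```python
import random


def relieff(X, y, k=3, m=None, seed=0):
    """Return one ReliefF weight per feature (gene); higher = more relevant.

    X: list of samples, each a list of numeric expression values.
    y: list of class labels. k: nearest neighbours per class.
    m: number of sampled instances (None = use every sample).
    """
    n, d = len(X), len(X[0])
    # Range-normalise each feature so diff() lies in [0, 1].
    span = []
    for a in range(d):
        col = [row[a] for row in X]
        span.append((max(col) - min(col)) or 1.0)  # constant gene: avoid /0
    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / span[a]

    def dist(i, j):
        return sum(diff(a, i, j) for a in range(d))

    rng = random.Random(seed)
    samples = list(range(n)) if m is None else rng.sample(range(n), m)
    w = [0.0] * d
    for i in samples:
        for c in classes:
            # Up to k nearest neighbours of sample i within class c.
            pool = [j for j in range(n) if j != i and y[j] == c]
            near = sorted(pool, key=lambda j: dist(i, j))[:k]
            if not near:
                continue
            # Hits lower the weight; misses raise it, weighted by class prior.
            coef = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for a in range(d):
                w[a] += coef * sum(diff(a, i, j) for j in near) / (len(samples) * len(near))
    return w


def top_genes(weights, n_keep):
    """Indices of the n_keep highest-weighted genes (the filter's output)."""
    return sorted(range(len(weights)), key=lambda a: weights[a], reverse=True)[:n_keep]
```

The indices returned by `top_genes` would form the reduced candidate gene set handed to the ant colony search. Dividing by the number of neighbours actually found, rather than by `k`, is a small variant that keeps very small classes from being under-weighted.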
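
The last step of contribution (2), constructing the final subset from the candidate subsets found by the ant colony, can be sketched as a sequential forward selection (SFS). The abstract does not say how the candidates are pooled or scored, so pooling the union of candidate subsets and passing in a generic `evaluate` callback (in practice, for example, the cross-validated accuracy of a random forest classifier) are assumptions; `pool_candidates` and `sequential_forward_selection` are hypothetical names.

```python
def pool_candidates(subsets):
    """Union of features from the ACO candidate subsets, in first-seen order."""
    pooled = []
    for subset in subsets:
        for f in subset:
            if f not in pooled:
                pooled.append(f)
    return pooled


def sequential_forward_selection(candidates, evaluate, max_size=None):
    """Greedy SFS: repeatedly add the feature that most improves evaluate().

    evaluate(subset) -> float, higher is better (hypothetical callback).
    Stops when no remaining feature gives a strict improvement.
    """
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining and (max_size is None or len(selected) < max_size):
        # Score every one-feature extension of the current subset.
        score, feat = max(((evaluate(selected + [f]), f) for f in remaining),
                          key=lambda t: t[0])
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best
```

For example, with a toy evaluator that rewards genes 2, 5 and 7 and charges a small cost per gene, pooling `[[2, 3], [5, 2], [7, 9]]` gives `[2, 3, 5, 7, 9]`, and the search then selects `[2, 5, 7]`. Because SFS only adds features greedily, it is cheap compared with exhaustive search, but it can miss combinations of genes that are informative only together.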

Keywords/Search Tags:

Gene Expression Data, Feature Selection, Ant Colony Optimization, Random Forest, ReliefF Algorithm


Related items

1	Research On Feature Selection Algorithm Based On Tumor Gene Expression Data
2	Study On Feature Selection Of Tumor Gene Expression Profiles
3	Selection Of Tb Susceptible Genes Based On Improved Random Forest Algorithm
4	Gastric Cancer Characteristic Gene Selection And Survival Analysis Based On Gene Expression Data
5	The Analysis Of Tumor Gene Expression Profile Data Based On Hybrid Feature Selection Algorithm
6	An Research On Feature Selection Of Tumor Markers Based On Microarray Data
7	Analysis Of Cancer Gene Data Base On Random Forest And Support Vector Machine
8	A Study On Feature Selection For Cancer Detection Based On Biological Expression Data
9	Research On Cancer Feature Gene Selection Based On Microarray Data
10	Research On Feature Selection Algorithm Based On Breast Cancer Gene Expression Data