Research on Feature Selection of Tumor Markers Based on Microarray Data

Posted on: 2021-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: P Yu
Full Text: PDF
GTID: 2404330605954308
Subject: Engineering
Abstract/Summary:
With the development of sequencing technology, DNA sequencing is increasingly used in disease prediction and gene diagnosis. Given the current high incidence of cancer, using DNA sequencing technology to help researchers identify cancer markers can help improve the cure rate of cancer patients. However, the gene expression profile data (microarray data) obtained by this technology is high-dimensional, has a small sample size, is unevenly distributed, and is noisy. Noise genes reduce the accuracy of cancer classification and make it difficult for researchers to analyze the data in a short time, so an effective method for preprocessing gene expression profile data is needed. Feature selection, an efficient data preprocessing method, has become a hotspot in bioinformatics and is widely used in biological data processing. Some feature selection methods have already been applied successfully to cancer data analysis. Among them, wrapper-based feature selection has achieved high classification accuracy when searching for cancer markers and has attracted the attention of many researchers. The wrapper method depends mainly on the choice of search strategy: different search strategies yield different results on the same data. Based on the characteristics of microarray data, this thesis proposes two feature selection algorithms for identifying cancer markers. The main research results are as follows:

(1) To address the curse of dimensionality in cancer microarray data, this dissertation combines the advantages of the filter and wrapper approaches and proposes a hybrid algorithm (IGICRO), built on the chemical reaction optimization algorithm (CRO), which is applied to high-dimensional cancer microarray data sets. The aim is to improve on the classification accuracy and convergence speed of CRO used alone and to find genes that are more relevant to cancer. IGICRO first uses the information gain (IG) method to reduce the dimensionality of the data set, then adds a neighborhood search mechanism to the solution update step to improve the local search performance of CRO, and improves the collision process of the four CRO operators. Experimental results show that IGICRO selects small feature subsets and achieves higher classification accuracy than the comparison algorithms.

(2) Lung cancer data obtained by DNA sequencing technology is high-dimensional with few samples. To quickly remove irrelevant features and identify genes related to lung cancer, an improved hybrid harmony search algorithm (MHS) is proposed, based on the standard harmony search algorithm. MHS first uses multiple filter methods to screen the data and remove noise genes. It then adds two local operators (single-molecule collision and multi-molecule collision) to improve the local search performance of harmony search. Experimental results show that, on the lung cancer microarray data set, MHS combined with a KNN classifier not only removes irrelevant genes effectively but also achieves better classification accuracy than the comparison algorithms, which verifies that MHS performs better in lung cancer data processing.
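
The filter-then-wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the mean-threshold binarization of expression values, the leave-one-out evaluation, and the toy data are all assumptions made for illustration, and the CRO and harmony search steps are omitted. The sketch ranks genes by information gain (the filter stage) and scores a candidate gene subset with a KNN classifier (the kind of fitness a wrapper search would optimize).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG of one gene, binarized at its mean (an assumed discretization)."""
    mean = sum(values) / len(values)
    high = [lab for v, lab in zip(values, labels) if v > mean]
    low = [lab for v, lab in zip(values, labels) if v <= mean]
    conditional = sum(
        len(group) / len(labels) * entropy(group)
        for group in (high, low)
        if group  # skip an empty side of the split
    )
    return entropy(labels) - conditional


def top_k_genes(X, y, k):
    """Filter stage: indices of the k genes with the highest IG."""
    n_genes = len(X[0])
    scores = [information_gain([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -scores[g])[:k]


def loo_knn_accuracy(X, y, genes, k=1):
    """Wrapper fitness: leave-one-out KNN accuracy on a gene subset."""
    correct = 0
    for i in range(len(X)):
        query = [X[i][g] for g in genes]
        neighbours = sorted(
            (math.dist(query, [X[j][g] for g in genes]), y[j])
            for j in range(len(X))
            if j != i
        )
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


# Toy data (hypothetical): 6 samples x 4 genes; gene 0 separates the classes.
X = [
    [5.1, 0.2, 3.0, 1.0],
    [4.9, 0.9, 3.4, 1.2],
    [5.3, 0.4, 2.6, 0.9],
    [1.0, 0.3, 3.1, 1.1],
    [1.2, 0.8, 3.3, 1.0],
    [0.9, 0.5, 2.7, 1.3],
]
y = [1, 1, 1, 0, 0, 0]

selected = top_k_genes(X, y, k=1)
fitness = loo_knn_accuracy(X, y, selected)
```

In a full hybrid method, a metaheuristic such as CRO or harmony search would explore subsets of the filtered genes and use a score like `loo_knn_accuracy` to compare candidate solutions.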
Keywords/Search Tags: Feature Selection, Chemical Reaction Optimization Algorithm, Harmony Search Algorithm, Information Gain, High Dimensional Cancer Data