
Research On The Algorithm Of Gene Feature Selection Based On Classification Technology

Posted on: 2017-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2334330518995352
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, bioinformatics research has been widely conducted, and using data analysis methods to discover underlying patterns has become an important research topic in the field. Based on genetic testing and data analysis techniques, researchers can accurately identify different types of cancer and provide a basis for targeted clinical treatment. With the development of the Human Genome Project, the volume of biological sequence data has grown exponentially, and traditional data analysis methods fall far short of the requirements of bioinformatics. Gene expression profile data are characterized by a small number of samples, a large number of genes, and high redundancy, which poses great challenges to existing data analysis methods. Gene selection is therefore an important part of gene expression data analysis: removing a large number of irrelevant and redundant genes is an effective way to address the high-dimension, small-sample problem.

Based on this analysis, this thesis studies gene expression profile data, proposes a gene feature selection method based on classification technology, and reports detailed experimental results. To improve the stability of the algorithm, a margin space is first constructed to describe the distances between samples in the original feature space, and a weight is then calculated for each sample. On the weighted sample data, an improved information measure serves as the evaluation criterion for the amount of information each gene carries, and the genes are ranked by it. Information duplicated across genes is treated as noise. Next, starting from this preliminary selection model, feature-set distances are calculated, a floating sequential search algorithm generates candidate feature subsets of different sizes, and an SVM classifier evaluates these candidate subsets to obtain the final selected gene set.

To further improve the classification performance and stability of the algorithm, an improved feature selection algorithm is proposed. First, different ranking criteria are combined so that each compensates for the weaknesses of the others, which effectively improves classification accuracy. Second, a number of deterministic prior genes are added, and an artificial neural network is used to optimize fuzzy weights that determine how the prior genes are combined with the gene information, giving the gene selection model an adaptive capability. Using four classifiers (support vector machine, logistic regression, neural network, and decision tree), the proposed model is compared with classical feature selection models. The experimental analysis shows that the proposed model achieves better stability while maintaining classification performance.
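The abstract does not define the margin space or how sample weights are derived from it. The following is a minimal sketch under assumed definitions: each sample's margin is taken as a Relief-style hypothesis margin (Euclidean distance to its nearest neighbour of another class minus distance to its nearest neighbour of the same class), and a logistic function maps margins to weights, so samples lying close to or inside another class (possible outliers) count less. The function name `margin_sample_weights` and the `scale` parameter are hypothetical, not taken from the thesis.

```python
import numpy as np

def margin_sample_weights(X, y, scale=1.0):
    """Hypothetical sketch: weight each sample by its hypothesis margin.

    margin_i = d(x_i, nearest other-class sample) - d(x_i, nearest same-class sample).
    Large positive margins give weights near 1, negative margins give weights
    near 0. Weights are rescaled to sum to the number of samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    # Pairwise Euclidean distances via |a|^2 + |b|^2 - 2ab (avoids an n*n*p array)
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    dist = np.sqrt(d2)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    same = y[:, None] == y[None, :]
    nearest_hit = np.where(same, dist, np.inf).min(axis=1)    # same class
    nearest_miss = np.where(~same, dist, np.inf).min(axis=1)  # other classes
    margin = nearest_miss - nearest_hit
    # Logistic weighting (assumed form), written with tanh for numerical stability:
    # 1 / (1 + exp(-m / scale)) == 0.5 * (1 + tanh(m / (2 * scale)))
    w = 0.5 * (1.0 + np.tanh(margin / (2.0 * scale)))
    return w * n / w.sum()

# Usage with made-up 2-D data: the last sample is labelled 0 but sits inside class 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [5.5, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1, 0])
w = margin_sample_weights(X, y)
print(w.round(3))  # the last sample receives the smallest weight
```

Each class needs at least two samples for the same-class distance to be finite; with real expression data the genes would usually be standardized first so that no single gene dominates the distances.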
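The "improved information measure" used to rank genes is not specified in the abstract. As an assumed stand-in, the sketch below scores each gene by sample-weighted mutual information with the class label: each gene is discretized into equal-frequency bins, joint and marginal probabilities are estimated from normalized sample weights instead of plain counts, and genes are sorted by decreasing score. With uniform weights this reduces to ordinary plug-in mutual information. The names `weighted_mutual_info` and `weighted_mi_ranking` and the `n_bins` parameter are hypothetical.

```python
import numpy as np

def weighted_mutual_info(x_binned, y, w):
    """I(X; Y) in nats, with probabilities estimated from sample weights w."""
    y = np.asarray(y)
    w = np.asarray(w, dtype=float) / np.sum(w)
    mi = 0.0
    for xv in np.unique(x_binned):
        in_x = x_binned == xv
        px = w[in_x].sum()
        for yv in np.unique(y):
            pxy = w[in_x & (y == yv)].sum()
            if pxy > 0:
                py = w[y == yv].sum()
                mi += pxy * np.log(pxy / (px * py))
    return mi

def weighted_mi_ranking(X, y, w, n_bins=3):
    """Return (gene indices sorted by decreasing weighted MI, per-gene scores)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        # Equal-frequency discretization (assumed): interior quantiles as bin edges
        edges = np.quantile(X[:, j], np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
        xb = np.searchsorted(edges, X[:, j], side="right")
        scores[j] = weighted_mutual_info(xb, y, w)
    order = np.argsort(-scores, kind="stable")
    return order, scores

# Usage with made-up data: gene 0 tracks the class, gene 1 is constant.
X = np.array([[0.1, 5.0, 1.0], [0.2, 5.0, 2.0], [0.3, 5.0, 1.0],
              [0.9, 5.0, 2.0], [1.0, 5.0, 1.0], [1.1, 5.0, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
order, scores = weighted_mi_ranking(X, y, w=np.ones(len(y)))
print(order, scores.round(3))
```

Any per-sample weights can be passed as `w`, for example weights that down-weight samples near the class boundary.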
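The floating search step can be illustrated with sequential floating forward selection (SFFS): a forward step adds the single best gene, then conditional backward steps drop a gene whenever the smaller subset beats the best subset previously recorded at that size. Keeping the best subset found for each size yields candidate subsets of different sizes, as the abstract describes. The sketch takes the scoring function as a parameter. The thesis pairs the search with an SVM, but to stay dependency-free the usage below plugs in a hypothetical leave-one-out nearest-centroid accuracy on made-up data. The names `sffs` and `loo_nearest_centroid_accuracy` are illustrative, not from the thesis.

```python
import numpy as np

def sffs(n_features, score, max_size):
    """Sequential floating forward selection (SFFS).

    score(subset) -> float, higher is better (e.g. cross-validated accuracy).
    Returns {size: (best_score, sorted tuple of feature indices)} for each
    subset size from 1 to max_size; these are the candidate subsets.
    """
    best = {}
    current = []
    while len(current) < max_size:
        # Forward step: add the feature whose inclusion gives the highest score
        # (ties go to the larger feature index).
        remaining = [f for f in range(n_features) if f not in current]
        if not remaining:
            break
        s_add, f_add = max((score(current + [f]), f) for f in remaining)
        current = current + [f_add]
        k = len(current)
        if k not in best or s_add > best[k][0]:
            best[k] = (s_add, tuple(sorted(current)))
        # Conditional backward steps: drop a feature only if the reduced subset
        # strictly beats the best subset recorded so far at that smaller size.
        while len(current) > 2:
            s_rm, f_rm = max(
                (score([g for g in current if g != f]), f) for f in current
            )
            k = len(current) - 1
            if s_rm > best[k][0]:
                current = [g for g in current if g != f_rm]
                best[k] = (s_rm, tuple(sorted(current)))
            else:
                break
    return best


def loo_nearest_centroid_accuracy(X, y, subset):
    """Hypothetical stand-in scorer: leave-one-out accuracy of a
    nearest-centroid classifier using only the genes in `subset`."""
    Xs = X[:, subset]
    idx = np.arange(len(y))
    correct = 0
    for i in idx:
        train = idx != i
        classes = np.unique(y[train])
        centroids = np.array([Xs[train & (y == c)].mean(axis=0) for c in classes])
        pred = classes[np.argmin(((centroids - Xs[i]) ** 2).sum(axis=1))]
        correct += int(pred == y[i])
    return correct / len(y)


# Toy data (made up): gene 0 separates the classes, genes 1-3 are noise.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.array([
    [0.0, 0, 1, 0],
    [0.1, 1, 0, 0],
    [0.2, 0, 0, 1],
    [0.3, 1, 1, 1],
    [1.0, 0, 1, 0],
    [1.1, 1, 0, 0],
    [1.2, 0, 0, 1],
    [1.3, 1, 1, 1],
])
best = sffs(X.shape[1], lambda s: loo_nearest_centroid_accuracy(X, y, s), max_size=3)
for k, (s, subset) in sorted(best.items()):
    print(k, round(s, 3), subset)
```

To score subsets with an SVM instead, one option is scikit-learn: pass `lambda s: cross_val_score(SVC(kernel="linear"), X[:, s], y, cv=3).mean()`, with `SVC` imported from `sklearn.svm` and `cross_val_score` from `sklearn.model_selection`. Because backward steps are accepted only on strict improvement, the search always terminates.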
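The abstract states that different ranking criteria are combined so that each compensates for the others' weaknesses, but it does not say how. One simple option, assumed here, is Borda-style mean-rank aggregation: rank the genes under each criterion separately, then order them by their average rank. `aggregate_rankings` is a hypothetical name, and the scores in the usage lines are made up.

```python
import numpy as np

def aggregate_rankings(score_lists):
    """Combine several gene-scoring criteria by average rank (Borda-style).

    score_lists: sequence of equal-length score vectors, one per criterion,
    where a higher score means a more relevant gene.
    Returns (gene indices from best to worst mean rank, mean rank per gene).
    """
    scores = np.asarray(score_lists, dtype=float)  # shape (n_criteria, n_genes)
    # Double argsort turns scores into ranks; rank 0 = best gene for that criterion
    order_per_criterion = np.argsort(-scores, axis=1, kind="stable")
    ranks = np.argsort(order_per_criterion, axis=1, kind="stable")
    mean_rank = ranks.mean(axis=0)
    return np.argsort(mean_rank, kind="stable"), mean_rank

# Usage with made-up scores for 4 genes under 2 criteria
criterion_a = [0.9, 0.8, 0.1, 0.2]
criterion_b = [0.1, 0.7, 0.6, 0.3]
order, mean_rank = aggregate_rankings([criterion_a, criterion_b])
print(order, mean_rank)
```

Averaging ranks rather than raw scores sidesteps the fact that different criteria (for example mutual information and a t-statistic) are on different scales; gene 1 wins here because it ranks well under both criteria, even though it is top under neither alone... under criterion A it is second and under criterion B it is first.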
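The abstract reports improved stability but does not name the measure used. One common choice, assumed here, is to rerun gene selection on perturbed versions of the data (for example bootstrap resamples) and average the pairwise Jaccard similarity of the selected gene subsets, where 1.0 means the same genes are chosen every time. `jaccard_stability` is a hypothetical helper.

```python
from itertools import combinations

def jaccard_stability(subsets):
    """Mean pairwise Jaccard similarity |A & B| / |A | B| over all pairs of subsets."""
    sets = [set(s) for s in subsets]
    if len(sets) < 2:
        raise ValueError("need at least two subsets to measure stability")
    total, n_pairs = 0.0, 0
    for a, b in combinations(sets, 2):
        union = a | b
        # Two empty selections are treated as identical
        total += len(a & b) / len(union) if union else 1.0
        n_pairs += 1
    return total / n_pairs

# Usage with made-up gene selections from three hypothetical resamples
print(jaccard_stability([{1, 2, 3}, {2, 3, 4}, {1, 2, 3}]))
```

Jaccard similarity ignores subset size; if selected subsets differ greatly in size, a size-corrected index may be preferable.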
Keywords/Search Tags: gene expression profile data, feature selection, sample weight, gene classification