Research On Tumor Classification Algorithm Based On Gene Expression Data

Posted on: 2017-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: W Wu
GTID: 2404330488479894
Subject: Software engineering
Abstract/Summary:
In recent years, tumors have become a serious threat to human life and health. At the same time, bioinformatics has developed rapidly, and exploring the mechanisms of tumors at the gene level from gene expression data benefits their diagnosis and treatment. DNA microarrays can measure the dynamic expression levels of tens of thousands of genes simultaneously, and these expression values constitute gene expression profile data. An important characteristic of tumor microarray data is that the number of genes is large relative to the number of samples, yet only a small number of genes are actually associated with tumor classification. Because redundant genes both increase computation time and reduce classification accuracy, informative genes must be selected before classification. To address this problem, this thesis studies new methods suited to the analysis of tumor gene expression data. The main work is summarized as follows.

First, a feature selection method combining an improved ReliefF with a genetic algorithm is proposed. The sample selection step of ReliefF is improved, the improved ReliefF is used to weight the genes, and the genes with the highest weights are chosen. These selected genes then guide the population initialization of the genetic algorithm, with the aim of speeding up its search so that an optimal solution is found in a relatively short time. Results on six tumor datasets show that the algorithm performs well overall in terms of classification accuracy, sensitivity, specificity, and feature subset size.

Second, a multi-class Support Vector Machine (SVM) algorithm, HDM-SVM, is proposed that combines Hadamard error-correcting codes with SVMs. An error-correcting code is first generated from a Hadamard matrix; the multi-class tumor classification problem is then transformed into a set of binary classification problems through this code; finally, SVMs are trained as the binary classifiers. The algorithm weights and filters the binary classifiers by their accuracy, retaining those that classify well, in order to preserve overall classification accuracy. Comparison with classical SVM-based methods on six multi-class tumor datasets demonstrates the effectiveness of the proposed method.
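
The idea of using gene weights to seed a genetic algorithm's initial population can be illustrated with a small sketch. The abstract does not say how ReliefF was improved or exactly how the selected genes guide initialization, so everything below is an assumption made for illustration: `relief_weights` is plain two-class Relief (not the thesis's improved ReliefF), and `seeded_population`, `top_k`, `p_top`, and `p_rest` are hypothetical names for one plausible seeding scheme in which top-weighted genes are switched on with higher probability.

```python
import random


def relief_weights(X, y, n_iter=50, seed=0):
    """Plain two-class Relief (an assumption, not the thesis's improved ReliefF).

    A feature gains weight when it differs on the nearest sample of the other
    class (miss) and loses weight when it differs on the nearest sample of the
    same class (hit). Each class needs at least two samples.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences; constant features get 1.0.
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def dist(a, b):
        return sum(abs(a[j] - b[j]) / span[j] for j in range(d))

    w = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            diff_miss = abs(X[i][j] - X[miss][j]) / span[j]
            diff_hit = abs(X[i][j] - X[hit][j]) / span[j]
            w[j] += (diff_miss - diff_hit) / n_iter
    return w


def seeded_population(weights, top_k, pop_size, p_top=0.8, p_rest=0.05, seed=0):
    """Hypothetical seeding: each individual is a 0/1 gene mask in which the
    top_k highest-weighted genes are included with probability p_top and all
    other genes with probability p_rest."""
    rng = random.Random(seed)
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    top = set(ranked[:top_k])
    return [
        [1 if rng.random() < (p_top if j in top else p_rest) else 0
         for j in range(len(weights))]
        for _ in range(pop_size)
    ]
```

A genetic algorithm started from such a population begins its search near gene subsets that the filter step already rates highly, which is the speed-up the abstract aims for. The fitness function, crossover, and mutation operators are not described in the abstract and are left out here.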
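
The coding and decoding side of a Hadamard-based error-correcting scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's HDM-SVM: the function names (`hadamard`, `ecoc_codebook`, `decode`), the `order` parameter, and the optional per-classifier `weights` are hypothetical, the Sylvester construction is one common way to build a Hadamard matrix, and the trained SVMs are stood in for by their +1/-1 outputs. Each column of the codebook defines one binary problem (classes marked +1 against classes marked -1), and a sample is assigned to the class whose codeword is closest to the vector of binary predictions.

```python
def hadamard(n):
    """Sylvester construction of a +1/-1 Hadamard matrix.
    The order is rounded up to the next power of two if needed."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def ecoc_codebook(n_classes, order=None):
    """One codeword per class, taken from the first rows of a Hadamard matrix.
    Columns that are constant over the chosen rows are dropped, because a
    binary classifier trained on such a column would see only one label."""
    if order is None:
        order = 1
        while order < n_classes:
            order *= 2
    H = hadamard(order)
    rows = H[:n_classes]
    keep = [j for j in range(len(H)) if len({r[j] for r in rows}) > 1]
    return [[r[j] for j in keep] for r in rows]


def decode(bits, codebook, weights=None):
    """Return the index of the class whose codeword has the smallest
    (optionally weighted) Hamming distance to the binary predictions.
    The weights are an assumed stand-in for per-classifier accuracy."""
    if weights is None:
        weights = [1.0] * len(bits)

    def distance(code):
        return sum(w for b, c, w in zip(bits, code, weights) if b != c)

    return min(range(len(codebook)), key=lambda k: distance(codebook[k]))
```

For example, `ecoc_codebook(4, order=8)` yields four codewords of length six whose pairwise Hamming distance is four, so a single wrong binary classifier still decodes to the correct class. Choosing a larger `order` than strictly needed buys this error tolerance at the cost of training more binary classifiers.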
Keywords/Search Tags:Gene expression profile, Tumor classification, Genetic algorithm, Hadamard matrix, Error correction coding, Support Vector Machines