
Tumor Gene Expression Profile Data Mining Based On Machine Learning And Intelligent Optimization

Posted on: 2019-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Gao
GTID: 2404330542493826
Subject: Biophysics
Abstract/Summary:
Objective: DNA microarray technology can track the expression of many genes simultaneously, and the resulting gene expression data are widely used across biomedicine. An important research direction for such data is the pathological diagnosis and classification of major genetic diseases such as tumors. Because tumors are caused by variation in specific gene sequences or in gene expression, tumor expression profiles offer oncology a new approach to clinical research. This thesis therefore studies methods for mining tumor gene expression data, with the aim of providing a basis for the early diagnosis and clinical treatment of tumors.

Methods: Tumor gene expression profiles are characterized by small sample size, high dimensionality, high noise, redundancy and nonlinearity, all of which challenge existing analysis methods. Among machine learning methods, the support vector machine (SVM) has particular advantages in handling high-dimensional nonlinear data. Intelligent optimization algorithms generally do not require the objective function or constraints to be continuous or convex, and they adapt well to uncertainty in the data. Accordingly, this thesis uses machine learning and intelligent optimization algorithms to analyze tumor gene expression data. The main contents are:
(1) Because gene expression data contain a large number of irrelevant and redundant genes, a two-stage hybrid method combining information gain (IG) with SVM is proposed to select informative genes. Information gain first removes the many genes irrelevant to the tumor samples, and SVM then further reduces the redundant genes.
(2) Intelligent optimization methods such as particle swarm optimization (PSO) and artificial bee colony (ABC) are studied, and their advantages in handling high-dimensional nonlinear problems are analyzed.
(3) A combination of PSO and ABC is proposed to optimize SVM. Using the optimal results of PSO as the initial values of the ABC algorithm allows the optimal SVM parameter values to be searched more effectively.

Results: The proposed methods were applied to multiple tumor gene expression datasets. Experimental results showed that IG combined with SVM obtained informative gene subsets that were smaller and of higher quality. Analysis of the selected genes found them to be of great significance for tumor research; they include genes already confirmed as tumor-related as well as genes not previously reported. For the SVM classification model optimized by combining PSO and ABC, results on multiple groups of tumor data showed that the hybrid method is more robust and achieves higher classification accuracy than other optimization methods.

Conclusion: The methods proposed in this thesis obtain high-quality subsets of informative genes, and the resulting classification model achieves better classification of tumor samples. The results validate the effectiveness of machine learning and intelligent optimization algorithms for analyzing tumor samples, which could be of value for the early diagnosis and clinical treatment of tumors.
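The first stage of the gene selection described in the Methods, filtering genes by information gain, can be illustrated with a minimal sketch. The abstract does not say how expression values are discretized, so splitting each gene at its mean expression is an assumption made here for illustration, and the function names (`entropy`, `information_gain`, `rank_genes`) and the toy data are hypothetical. The second stage (SVM-based reduction of redundant genes) is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Information gain from splitting samples at `threshold`.

    Assumption: each gene is binarized into "high" (> threshold) and
    "low" (<= threshold) expression; the thesis may discretize differently.
    """
    high = [y for x, y in zip(values, labels) if x > threshold]
    low = [y for x, y in zip(values, labels) if x <= threshold]
    total = len(labels)
    # Weighted entropy of the class labels after the split.
    conditional = sum(
        len(part) / total * entropy(part) for part in (high, low) if part
    )
    return entropy(labels) - conditional


def rank_genes(expression, labels):
    """Return gene names sorted by information gain, highest first.

    `expression` maps a gene name to its expression values, one per sample.
    The split threshold is the gene's mean expression (an assumption).
    """
    scores = {}
    for gene, values in expression.items():
        threshold = sum(values) / len(values)
        scores[gene] = information_gain(values, labels, threshold)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: four samples, two hypothetical genes.
labels = ["tumor", "tumor", "normal", "normal"]
expression = {
    "gene_a": [5.0, 6.0, 1.0, 2.0],  # separates the two classes
    "gene_b": [1.0, 5.0, 1.0, 5.0],  # unrelated to the class label
}
ranking = rank_genes(expression, labels)
```

In a pipeline of this kind, only the top-ranked genes would be passed on to the SVM stage; how many genes to keep is a tuning choice that the abstract does not specify.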
Keywords/Search Tags: tumor gene expression data, support vector machine, intelligent optimization, feature selection, classification