Processing And Analysis Of Gene Expression Data Based On Machine Learning Algorithm

Posted on: 2019-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Han
GTID: 2404330623962413
Subject: Control Science and Engineering
Abstract/Summary:
With the development of gene microarray technology, research on gene expression profile data has become a hotspot in bioinformatics. These studies provide new ideas and approaches for understanding life phenomena. In recent years, machine learning has attracted wide attention from bioinformatics researchers for its strong performance in pattern recognition and data mining. Gene expression profile analysis is now widely used in disease prediction, diagnosis, and targeted therapy. This thesis applies machine learning algorithms to three common problems in gene expression profile analysis: missing value imputation, gene activity state clustering, and tumor cell classification. It demonstrates the feasibility of the proposed approach through experiments on multiple data sets. The main research content is as follows:

(1) In real gene microarray experiments, various subjective and objective factors usually leave some values missing from the resulting gene expression profile data. We propose a method based on ensemble learning that integrates several traditional missing value imputation algorithms (such as K-nearest neighbor imputation and least squares imputation) to impute the missing values in a data set, and it achieves accurate results.

(2) To analyze gene function and the differential expression of genes under different conditions, researchers usually cluster gene expression profile data. We propose a probabilistic model-based method for clustering gene activity states that combines K-means clustering with a Gaussian Mixture Model to describe the distribution of the data, and it achieves a better clustering effect.

(3) Disease classification and prediction based on gene expression data has long been a research hotspot in bioinformatics, and the problem becomes much harder when the data sets have missing values or the data are not standardized. Combining the proposed missing value imputation algorithm with the gene activity state clustering model, we classify tumor samples using several classical classifiers (such as K-nearest neighbors and support vector machines) and obtain high classification accuracy.

The experimental results show that the proposed missing value imputation algorithm and gene activity state clustering model perform well and are feasible, and the tumor classification experiments show that the method is useful in practical applications.
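
The abstract does not say how K-means and the Gaussian Mixture Model are combined. One common arrangement, assumed here purely for illustration, is to use K-means to initialize the component means and then refine the mixture with expectation-maximization (EM). The sketch below does this for the expression values of a single gene, using only the Python standard library. The function names (kmeans_1d, fit_gmm_1d, assign_states) and the choice of two activity states are hypothetical and are not taken from the thesis.

```python
import math


def kmeans_1d(values, k, iters=50):
    """Plain 1-D K-means with evenly spaced starting centers (an assumption)."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the previous center if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, k=2, iters=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM, with means initialized by K-means."""
    n = len(values)
    means = kmeans_1d(values, k)
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [weights[j] * gaussian_pdf(v, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an effectively empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances


def assign_states(values, weights, means, variances):
    """Label each value with the index of its most probable component."""
    states = []
    for v in values:
        dens = [w * gaussian_pdf(v, m, s)
                for w, m, s in zip(weights, means, variances)]
        states.append(max(range(len(dens)), key=dens.__getitem__))
    return states


if __name__ == "__main__":
    # Made-up expression values for one gene: a low group and a high group.
    expression = [1.8, 2.1, 2.0, 2.3, 1.9, 7.9, 8.2, 8.0, 7.7, 8.1]
    w, m, s = fit_gmm_1d(expression, k=2)
    print(assign_states(expression, w, m, s))
```

Unlike K-means alone, the fitted mixture yields a probability for each state rather than only a hard label, which is one plausible reason to pair the two methods. The component indices are arbitrary, so compare the fitted means to decide which state is "high" and which is "low".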
Keywords/Search Tags:Gene Expression Profile Data, Machine Learning, Missing Value Imputation, Activity State Clustering, Tumor Classification