Research On Analysis Method Of Tumor Gene Expression Data Based On Machine Learning

Posted on:2019-02-25

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J Liu

Full Text:PDF

GTID:1364330596956048

Subject:Control theory and control engineering

Abstract/Summary:

Advances in sequencing technologies have generated large amounts of gene expression data, and traditional biological methods can no longer meet the demands of analyzing it. In recent years, researchers have introduced machine learning theories and methods into bioinformatics to uncover important biological information through comprehensive analysis of gene expression data. Focusing on the characteristics of tumor gene expression data and taking machine learning as the starting point, this dissertation proposes a series of analysis algorithms for tumor gene expression data, addressing characteristic gene selection, tumor sample classification, and tumor clustering. The main research contents are as follows:

1. Tumor characteristic gene selection based on deep learning and matrix decomposition. First, because deep learning models cannot by themselves select tumor characteristic genes, a sample learning-based deep sparse filtering method is proposed for tumor characteristic gene selection. Second, building on the optimal mean algorithm and block optimization theory, an optimal mean-based block robust characteristic gene selection method is proposed to analyze the integrated data in TCGA. Finally, class label information is incorporated into an unsupervised algorithm through the scatter matrix, yielding a supervised penalty matrix decomposition algorithm for characteristic gene selection.

2. Tumor sample classification based on sample expansion and deep learning. To address the severe shortage of training samples when deep learning models are used for tumor sample classification, a sample expansion method based on the denoising autoencoder is proposed to generate a large number of auxiliary samples. Combining this sample expansion method with two deep learning models, a sample expansion-based stacked autoencoder model and a sample expansion-based 1D convolutional neural network model are designed for tumor sample classification.

3. Tumor sample clustering based on low-rank subspace segmentation. To cluster tumor gene expression data, traditional subspace segmentation methods must rely on spectral clustering. To remove this dependence, two low-rank subspace clustering methods for tumor samples are proposed that learn the sample labels of the subspaces directly under discrete constraints. First, taking into account the manifold structure within tumor gene expression data, a low-rank subspace clustering algorithm based on discrete constraints and hypergraph regularization is proposed. Second, to reduce the influence of outliers in tumor data, a robust low-rank subspace clustering algorithm based on discrete constraints and the capped norm is proposed.

4. Biclustering of tumor data based on dual hypergraph regularized principal component analysis. To account for the sample manifold structure and the gene manifold structure in tumor data simultaneously, a sample hypergraph and a gene hypergraph are constructed to capture the local geometric information of the data, and the dual hypergraph is used as the regularizer of principal component analysis for sample clustering and gene clustering. The resulting dual hypergraph regularized principal component analysis algorithm is used to bicluster tumor gene expression data.

Experimental results on multiple tumor gene expression datasets verify the effectiveness and superiority of the proposed algorithms.
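
The idea behind the fourth research item, using a graph over the samples as a regularizer for principal component analysis, can be illustrated with a much simpler stand-in. The sketch below is not the dissertation's algorithm: it assumes an ordinary k-nearest-neighbour graph Laplacian in place of a hypergraph, regularizes only the sample side rather than both samples and genes, and uses the common relaxed objective min ||X - U Qᵀ||² + α·tr(Qᵀ L Q) subject to Qᵀ Q = I, whose solution is given by the eigenvectors of α·L - XᵀX with the smallest eigenvalues. The function names and the parameters `alpha` and `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_graph_laplacian(points, k=5):
    """Unnormalized Laplacian of a symmetrized k-nearest-neighbour graph.

    `points` has one row per sample. This plain graph is an assumed
    stand-in for the hypergraph described in the abstract.
    """
    n = points.shape[0]
    sq = np.sum(points ** 2, axis=1)
    # Pairwise squared Euclidean distances between samples.
    dist = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    W = np.maximum(W, W.T)  # make the adjacency symmetric
    return np.diag(W.sum(axis=1)) - W


def graph_regularized_pca(X, n_components=2, alpha=1.0, k=5):
    """Graph-regularized PCA on a genes x samples matrix X.

    Returns (U, Q): U is the gene-side basis (genes x r) and Q is the
    orthonormal low-dimensional sample embedding (samples x r).
    """
    Xc = X - X.mean(axis=1, keepdims=True)  # center each gene across samples
    L = knn_graph_laplacian(Xc.T, k=k)
    # Minimizing tr(Q^T (alpha*L - Xc^T Xc) Q) with Q^T Q = I is solved by
    # the eigenvectors belonging to the smallest eigenvalues.
    G = alpha * L - Xc.T @ Xc
    _, eigvecs = np.linalg.eigh(G)  # eigenvalues are returned in ascending order
    Q = eigvecs[:, :n_components]
    U = Xc @ Q
    return U, Q
```

The rows of `Q` could then be passed to any standard clustering routine to group samples. Extending the sketch to biclustering would require a second regularizer built over the genes, which is omitted here.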

Keywords/Search Tags:

Gene expression data, machine learning, characteristic gene selection, tumor sample classification, tumor sample clustering, biclustering

Related items

1	Processing And Analysis Of Gene Expression Data Based On Machine Learning Algorithm
2	Cancer Classification Methods Based On Gene Expression Data
3	Tumor Gene Expression Profile Data Mining Based On Machine Learning And Intelligent Optimization
4	Research On Machine Learning Method Of High Dimensional Small Sample (Medical) Data
5	Research On The Algorithm Of Gene Feature Selection Based On Classification Technology
6	Research On Characteristic Gene Selection And Cancer Classification Clustering Algorithm For Gene Expression Dat
7	Study On Informative Gene Selection And Classification Of Tumor
8	Research On Classification Algorithms For Tumor Gene Expression Data Based On ELM
9	Classification Of Cancer Subtypes Based On Gene Expression Data
10	A Study Of Tumor Classification Method Using Bayes Naïve Classifier Based On The Maximum Relevance Minimum Redundancy Feature Selection Method