
Research Of Feature Selection Of Gene Expression Data Based On Low-rank Representation With Graph Regularization

Posted on: 2019-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: L L Kang
Full Text: PDF
GTID: 2310330548961468
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid progress of the Human Genome Project, DNA chip technology has come into wide use across the life sciences. The resulting gene expression profiles describe the expression status of each gene at the microscopic level, and they have been applied successfully to cancer diagnosis and treatment, which has made them a popular research direction in academia. However, the "high dimension, small sample" character of gene expression data reduces the accuracy with which cancer genes can be identified. Appropriate data processing is therefore needed for effective dimensionality reduction, and selecting the subset of disease-associated genes accurately and efficiently has become an important task. This thesis combines biological knowledge with machine learning theory to perform feature selection on cancer gene expression data and analyzes the effectiveness of the methods experimentally. The main work is as follows:

1. A feature selection algorithm for gene expression data based on low-rank scoring with graph regularization is proposed. The low-rank representation (LRR) model is able to reveal the global information in the data, but local structure information is equally important for revealing the data's intrinsic properties. A manifold regularization constraint is therefore introduced into LRR, giving a low-rank representation model with graph regularization. Solving this model yields a coefficient matrix, from which a graph weight matrix is constructed. This weight matrix then replaces the similarity matrix in the Laplacian score, forming a new scoring method for feature selection on gene expression data, called the low-rank scoring algorithm with graph regularization. Finally, the selected features are clustered on several gene expression datasets and the results are compared with those of the traditional scoring algorithm; the experiments indicate that the proposed algorithm is effective.

2. A feature selection algorithm for gene expression data based on smoothed low-rank representation with graph regularization is proposed. In the LRR model, each data point is expressed as a joint linear combination of the data, with the data matrix itself serving as the dictionary, and the nuclear norm is minimized as the convex envelope of the rank function to obtain the desired low-rank representation. In practical applications, however, the solution may deviate from the optimum of the original problem, because the nuclear norm is not the best choice as a relaxation of the rank function. To address the problem that traditional LRR cannot describe the data structure accurately, a logarithmic determinant function is used in place of the nuclear norm as a smooth estimate of the rank function, and a manifold regularization term capturing local geometric structure is added to the objective function. The resulting coefficient matrix is then processed to construct the graph structure of the data. Finally, the selected features are clustered on gene expression datasets; the algorithm achieves higher clustering accuracy than the other feature selection algorithms it is compared with.
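The scoring step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how the weight matrix is built from the coefficient matrix, so the symmetrization `(|Z| + |Z^T|) / 2` is an assumption, and the matrix `Z` below is a hand-made block-diagonal stand-in rather than the output of an LRR solver. The function `logdet_surrogate` likewise assumes one common form of log-determinant rank surrogate, `log det(I + Z^T Z)`, which may differ from the one used in the thesis. All function names are hypothetical.

```python
import numpy as np

def weights_from_coefficients(Z):
    """Assumed construction: symmetric, non-negative weights from coefficients Z (n x n)."""
    A = np.abs(Z)
    W = (A + A.T) / 2.0
    np.fill_diagonal(W, 0.0)  # drop self-loops
    return W

def laplacian_score(X, W):
    """Laplacian score of each column of X (n_samples x n_features); lower is better."""
    d = W.sum(axis=1)                              # node degrees
    L = np.diag(d) - W                             # graph Laplacian
    Xc = X - (d @ X) / d.sum()                     # remove degree-weighted feature means
    num = np.sum(Xc * (L @ Xc), axis=0)            # f^T L f for every feature
    den = np.sum(Xc * (d[:, None] * Xc), axis=0)   # f^T D f for every feature
    return num / np.maximum(den, 1e-12)

def nuclear_norm(Z):
    """Sum of singular values: the convex surrogate of rank used by standard LRR."""
    return float(np.linalg.svd(Z, compute_uv=False).sum())

def logdet_surrogate(Z):
    """Assumed smooth surrogate: log det(I + Z^T Z) = sum_i log(1 + sigma_i^2)."""
    s = np.linalg.svd(Z, compute_uv=False)
    return float(np.sum(np.log1p(s ** 2)))

# Toy data: 20 samples in two groups, 4 features; only feature 0 separates the groups.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 4))
X[:, 0] = 2.0 * labels + 0.1 * rng.normal(size=20)

# Stand-in for a learned coefficient matrix: samples are "represented" by their own group.
Z = (labels[:, None] == labels[None, :]).astype(float)

W = weights_from_coefficients(Z)
scores = laplacian_score(X, W)
ranking = np.argsort(scores)   # features ordered from most to least relevant
print("feature ranking:", ranking)
print("nuclear norm:", nuclear_norm(Z), "log-det surrogate:", logdet_surrogate(Z))
```

In this toy setting the group-separating feature receives the lowest score, because it varies little along the edges of the graph while still having large overall variance. The last line shows why a logarithmic surrogate is attractive: large singular values contribute only logarithmically, whereas the nuclear norm penalizes them linearly.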
Keywords/Search Tags: Gene expression databases, Feature selection, Graph regularization, Low-rank representation