| With the launch and implementation of the Human Genome Project and genomics programs for other species, a large amount of life-science data has accumulated, including the sequences, structures, and functions of biological macromolecules. Studying these data will not only improve existing genetic theory but also inform the prevention and treatment of cancer. Against this background, this paper studies sparse modeling methods for biological sequencing data, focusing on regression-based matrix decomposition methods and their application to sequencing data. The work comprises three parts. (1) Because gene regulation is sparse, the L2,1-norm sparse regression method is applied to biological sequencing data to obtain better sparse identification of differentially expressed genes. The sparse principal component analysis algorithm is used to address the problem that the components produced by principal component analysis are hard to interpret, so that the results are easier to explain and key genes can be extracted conveniently. By recasting principal component analysis as a regression-type problem and adding a norm constraint on the regression coefficients, we obtain sparse loadings and improve execution efficiency, so that feature genes can be identified accurately and efficiently. (2) A method based on multi-attribute sparse low-rank regression is presented, in which sparse and low-rank constraints are imposed on linear regression. The L2,1-norm constraint makes the selected feature genes sparse; the low-rank constraint exploits the correlation among the rows or columns of the data, so that the matrix can be projected onto a lower-dimensional linear subspace. In addition, label information is added to the sparse regression algorithm, and multi-attribute data are used for the integrated analysis of cancer. The proposed low-rank regression algorithm is equivalent to performing classification and feature selection with linear discriminant analysis based on sparse regression, which makes the results more biologically meaningful and improves the accuracy of classification and of feature-gene identification. (3) Incorporating prior information on gene regulation, the L2,1-norm and the trace norm are added simultaneously to the matrix transformation to design a tractable sparse low-rank regression algorithm, which is applied to biological sequencing data to improve the accuracy of feature selection and data classification. To make the sparse regression problem tractable, a trace-norm penalty is added to the sparse algorithm; because the trace norm is convex, the resulting problem has good convergence behavior and the unique optimal solution can be obtained. The algorithm can handle data not seen in the training set and directly yields a low-dimensional representation of the matrix. It has clear advantages for the analysis of high-dimensional data and effectively improves the accuracy of computation and classification. |
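
The two penalties named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a plain least-squares loss, uses proximal gradient descent as a stand-in solver, and runs on synthetic data. The function names (`prox_l21`, `prox_trace`, `l21_regression`), the regularization weight `lam`, and the "samples by genes" layout of `X` are all assumptions made for illustration. The L2,1-norm proximal step shrinks whole rows of the coefficient matrix to zero (one row per gene), while the trace-norm proximal step shrinks singular values and so lowers the rank.

```python
import numpy as np

def l21_norm(W):
    """L2,1-norm: sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def prox_l21(W, tau):
    """Proximal operator of tau * ||W||_{2,1}: row-wise soft-thresholding."""
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(row_norms, 1e-12))
    return W * scale

def prox_trace(W, tau):
    """Proximal operator of tau * ||W||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def l21_regression(X, Y, lam=1.0, n_iter=500):
    """Proximal gradient for min_W 0.5 * ||X W - Y||_F^2 + lam * ||W||_{2,1}."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    # Step size 1/L, where L = sigma_max(X)^2 is the Lipschitz constant
    # of the gradient of the least-squares term.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = prox_l21(W - step * grad, step * lam)
    return W

# Synthetic example (hypothetical data): 100 samples, 30 "genes", 3 responses.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))
W_true = np.zeros((30, 3))
W_true[:4] = np.array([[2.0, -1.0, 1.5]])  # only the first 4 genes carry signal
Y = X @ W_true + 0.01 * rng.standard_normal((100, 3))

W_hat = l21_regression(X, Y, lam=5.0)
# Genes whose coefficient row is non-zero are the selected features.
selected = np.flatnonzero(np.linalg.norm(W_hat, axis=1) > 1e-8)
print("selected gene indices:", selected)
```

A joint sparse and low-rank model would need both penalties in one objective, for example through a splitting scheme that alternates `prox_l21` and `prox_trace`; the sketch keeps them separate so that each effect is visible on its own.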