The Research Of The Theory Of Matrix Decomposition In Gene Expression Profiling

Posted on: 2013-10-29 | Degree: Master | Type: Thesis
Country: China | Candidate: L L Su | Full Text: PDF
GTID: 2234330371497838 | Subject: Signal and Information Processing
Abstract/Summary:
Identifying tumor types remains a difficult and active problem in biology and medicine. Traditionally, clinicians with extensive clinical experience diagnose the tumor type by observing features of the patient's pathological tissue, and a corresponding treatment plan is then carried out. These methods have two shortcomings: many decisions are highly subjective, and treatment tends to come late, i.e. tumors are often found only at a middle or advanced stage. How to overcome these shortcomings has therefore become a research hotspot in biology and medicine. With the development of DNA microarray technology in recent years, gene expression profiles, which record gene expression levels under different conditions, are used to diagnose whether a tumor has occurred and to predict its type. This makes it possible to observe the mechanisms of tumor occurrence and development at the molecular level and possibly to discover unknown tumor types. Tumor types can also be predicted and therapeutic targets identified by analyzing the corresponding pathogenic genes.

In this thesis, tumor gene expression profiles are studied on the basis of matrix decomposition theory. Both classical and recent matrix decomposition methods are applied to these data, transforming numerical sequences that carry no structural information into graphs that do. The main research questions are how to extract features from tumor gene expression profiles and how to identify tumor types; the experimental results are explained and the performance of the proposed algorithms is analyzed. The main contents are as follows:

1. Classical matrix decomposition methods are used to analyze tumor gene expression profiles. First, tumor samples are treated as points in a high-dimensional space, and a graph is constructed over them using various weighting schemes (the graph can be written in matrix form).
Next, the feature information of each sample is obtained with these matrix decomposition methods. Finally, several published tumor gene expression datasets are classified with a Support Vector Machine (SVM) classifier and a K-Nearest Neighbors (KNN) classifier, and the experimental results show that this method performs well.

2. Non-negative Matrix Factorization (NMF), based on the relatively new theory of non-negative matrices, is used to extract feature information from tumor gene expression profiles. First, noise genes in the profiles are roughly eliminated to obtain a gene subset, which is then processed with NMF. In this way, all high-dimensional tumor samples are mapped into a low-dimensional space, further eliminating noise and redundancy in the profiles. Finally, clustering experiments are performed with the Fuzzy C-Means (FCM) algorithm, and the results show the validity of this method.

3. Many traditional scoring criteria, which use the mean and variance of gene expression levels, are sensitive to outliers that may be introduced by the environment, the equipment, manual operation and similar factors. Such criteria may therefore score genes incorrectly by their importance for classification and select an unreasonable feature gene subset that does not objectively reflect the characteristics of the tumor samples. This thesis therefore presents a novel method that integrates the Algebraic Connectivity Strength of Point (ACSP) with a scoring criterion to identify genes associated with tumor type. First, for each gene, ACSP is used to identify the reliable expression levels of that gene across all samples. Informative genes are then selected with a scoring criterion based on these reliable expression levels.
Finally, an SVM classifier is used to classify two gene expression datasets. The results show that the informative genes selected by the proposed method are more credible than those selected by the scoring criterion alone...
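Item 1 treats samples as points, builds a weighted graph over them, decomposes the graph matrix to obtain per-sample features, and then classifies those features. The abstract does not say which weights or which decomposition were used (the keywords mention LU decomposition, which this sketch does not implement). What follows is a minimal sketch under stated assumptions: NumPy, a heat-kernel weight, and Laplacian eigenmaps (an eigendecomposition of the normalized graph Laplacian) as one illustrative "classical" decomposition, with a leave-one-out KNN check standing in for the SVM/KNN experiments. All function names, parameters and the synthetic data are hypothetical.

```python
import numpy as np

def heat_kernel_graph(X, sigma):
    # Assumed weighting: W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)), no self-loops.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def laplacian_eigenmap(W, k):
    # Eigendecomposition of the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    # Drop the trivial eigenvector and map back to solutions of (D - W) f = lambda D f.
    return vecs[:, 1:k + 1] * d_inv_sqrt[:, None]

def knn_loo_accuracy(F, y, k=3):
    # Leave-one-out majority-vote KNN accuracy on the embedded features F.
    d2 = np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample may not vote for itself
    correct = 0
    for i in range(len(y)):
        nn = np.argsort(d2[i])[:k]
        labels, counts = np.unique(y[nn], return_counts=True)
        correct += int(labels[np.argmax(counts)] == y[i])
    return correct / len(y)

rng = np.random.default_rng(0)
# Synthetic "samples x genes" matrix: two tumor types with shifted mean expression.
X = np.vstack([rng.normal(0.0, 1.0, (20, 50)), rng.normal(3.0, 1.0, (20, 50))])
y = np.array([0] * 20 + [1] * 20)

F = laplacian_eigenmap(heat_kernel_graph(X, sigma=10.0), k=2)
acc = knn_loo_accuracy(F, y, k=3)
print(F.shape, acc)
```

The embedding here is transductive (all samples are embedded at once), which keeps the sketch short; a real train/test protocol would need an out-of-sample extension or a fit on training samples only.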
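Item 2 chains three steps: a rough gene filter, NMF to map samples into a low-dimensional space, and Fuzzy C-Means clustering of the resulting sample features. The abstract does not describe the gene filter or the NMF variant, so this minimal sketch assumes NumPy, a variance-based filter as a hypothetical stand-in for "roughly eliminating noise genes", Lee-Seung multiplicative updates for Frobenius-norm NMF, and standard FCM with fuzzifier m = 2. Function names, parameters and the synthetic data are illustrative only.

```python
import numpy as np

def filter_genes_by_variance(V, n_keep):
    # Hypothetical stand-in for noise-gene removal: keep the most variable genes (rows).
    order = np.argsort(V.var(axis=1))[::-1]
    return V[order[:n_keep]]

def nmf(V, r, n_iter=500, eps=1e-9, seed=0):
    # Lee-Seung multiplicative updates for min ||V - W H||_F^2 subject to W, H >= 0.
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H  # column j of H is the r-dimensional feature vector of sample j

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    # Standard FCM on the rows of X; returns memberships U (c x n) and centers (c x d).
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        U = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0)), axis=1)
    return U, centers

rng = np.random.default_rng(1)
# Synthetic non-negative "genes x samples" matrix: 200 genes, two groups of 15 samples.
V = rng.random((200, 30)) * 0.5
V[:50, :15] += 3.0    # genes up-regulated in group 0
V[50:100, 15:] += 3.0  # genes up-regulated in group 1
truth = np.array([0] * 15 + [1] * 15)

V_sub = filter_genes_by_variance(V, n_keep=100)
_, H = nmf(V_sub, r=2)
U, _ = fuzzy_c_means(H.T, c=2)
labels = U.argmax(axis=0)
# Cluster ids are arbitrary, so measure agreement up to a label swap.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(agreement)
```

FCM returns soft memberships; taking `argmax` over clusters gives hard labels for comparison, while the membership values themselves can indicate samples whose tumor type is ambiguous.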
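Item 3 argues that mean/variance-based scoring criteria are distorted by outliers, so unreliable expression values should be removed before scoring. The abstract does not define ACSP, so this sketch does not implement it. Instead it uses a Fisher-style score, (mu0 - mu1)^2 / (var0 + var1), as an assumed example of such a criterion, and a per-class median/MAD robust z-score filter as a plainly substituted reliability step, to show how a single corrupted value changes a gene's score. NumPy, the function names, the threshold and the data are all hypothetical.

```python
import numpy as np

def fisher_score(x, y):
    # Assumed mean/variance criterion for one gene: (mu0 - mu1)^2 / (var0 + var1).
    a, b = x[y == 0], x[y == 1]
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())

def reliable_mask(x, y, thresh=3.5):
    # Stand-in for ACSP (NOT the thesis's method): within each class, mark values whose
    # robust z-score |x - median| / (1.4826 * MAD) exceeds thresh as unreliable.
    mask = np.ones(x.shape, dtype=bool)
    for c in np.unique(y):
        idx = y == c
        med = np.median(x[idx])
        mad = np.median(np.abs(x[idx] - med)) + 1e-12
        mask[idx] = np.abs(x[idx] - med) / (1.4826 * mad) <= thresh
    return mask

# One gene measured in 6 + 6 samples; 9.0 plays the role of a corrupted measurement.
x = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 9.0, 3.0, 3.1, 2.9, 3.2, 2.8, 3.0])
y = np.array([0] * 6 + [1] * 6)

naive = fisher_score(x, y)              # outlier inflates var0 and shifts mu0
keep = reliable_mask(x, y)
robust = fisher_score(x[keep], y[keep])  # score computed on reliable values only
print(naive, robust, keep)
```

In this toy case the single outlier drives the naive score below 1 even though the two classes are clearly separated, while the filtered score is above 100; ranking genes by the naive score would therefore wrongly push this gene down the list.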
Keywords/Search Tags: Tumor, Gene expression profiling, LU decomposition, Non-negative Matrix Factorization, The Algebraic Connectivity Strength of Point