The Research On The Clustering Functional Evaluation Algorithm And The Discriminant Analysis Algorithm For Gene Chip Dataset

Posted on: 2010-12-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Z Wu
GTID: 1100360278476329
Subject: Electronic biotechnology equipment
Abstract/Summary:
With the completion of the Human Genome Project (HGP), humanity entered the post-genome era, in which research focuses on gene function. The gene chip, also called the DNA microarray, is fast, high-throughput, and highly accurate, and it has become an indispensable tool for studying gene function. Data analysis is an important part of gene chip technology and belongs to the field of bioinformatics. This dissertation focuses on two problems: the functional evaluation of clustering results and the discriminant analysis of gene chip datasets.

Cluster analysis is an important approach in gene chip data analysis. Its purpose is to divide genes into groups based on their expression patterns and then to predict gene function from these groups. However, clustering results are usually influenced by the clustering algorithm and its parameters, and different algorithms or parameters often produce very different results. How to evaluate these results, especially from the perspective of biological function similarity, is a challenge in the cluster analysis of gene chip datasets. Chapters IV and V address this question by studying a clustering functional evaluation algorithm. A new approach is developed to measure the semantic similarity of gene annotations: it measures gene function similarity based on the locations of Gene Ontology (GO) terms. The yeast metabolic pathways for isoleucine and glutamic acid biosynthesis are used as examples to demonstrate the accuracy of this algorithm. Based on this similarity measure, a novel clustering functional evaluation algorithm is proposed to measure the quality of clustering results. It assesses clustering quality using both the degree of difference between gene functions in separate clusters and the degree of similarity between gene functions in the same cluster.
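
The idea of scoring a clustering by within-cluster similarity and between-cluster difference can be illustrated with a minimal Python sketch. The abstract does not give the dissertation's actual formula, so the score below (mean within-cluster similarity minus mean between-cluster similarity) is an assumption made for illustration only. The names `functional_score`, `toy_similarity`, and `TOY_SIM` are hypothetical, and the pairwise similarity values are made-up toy numbers standing in for a GO-based gene similarity measure.

```python
from itertools import combinations

def functional_score(clusters, similarity):
    """Score a clustering from pairwise functional similarities.

    clusters:   dict mapping gene name -> cluster label
    similarity: function (gene_a, gene_b) -> float, higher = more similar
    Returns (mean within-cluster similarity,
             mean between-cluster similarity,
             their difference).
    The difference is an assumed score, not the dissertation's formula.
    """
    within, between = [], []
    for a, b in combinations(sorted(clusters), 2):
        s = similarity(a, b)
        if clusters[a] == clusters[b]:
            within.append(s)
        else:
            between.append(s)
    mean_within = sum(within) / len(within) if within else 0.0
    mean_between = sum(between) / len(between) if between else 0.0
    return mean_within, mean_between, mean_within - mean_between

# Hypothetical pairwise similarities (toy values, not real GO-based scores).
TOY_SIM = {
    frozenset(("g1", "g2")): 0.9,
    frozenset(("g3", "g4")): 0.8,
    frozenset(("g1", "g3")): 0.2,
    frozenset(("g1", "g4")): 0.1,
    frozenset(("g2", "g3")): 0.3,
    frozenset(("g2", "g4")): 0.2,
}

def toy_similarity(a, b):
    return TOY_SIM[frozenset((a, b))]

# Two candidate partitions of the same four genes.
good = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
poor = {"g1": "A", "g3": "A", "g2": "B", "g4": "B"}

good_score = functional_score(good, toy_similarity)
poor_score = functional_score(poor, toy_similarity)
print("good partition:", good_score)
print("poor partition:", poor_score)
```

With these toy values, the partition that groups functionally similar genes together receives the higher score, which is how such a measure could guide the choice among clustering algorithms or parameters.
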
Taking yeast expression data as an example, the results show that the method can accurately evaluate the quality of clustering results. Guided by this evaluation approach, higher-quality clustering results can be obtained.

Discriminant analysis of DNA microarray data is also important, and it is required before gene chips can be used in clinical diagnosis. China has a high incidence of liver cancer. Both microRNA chip datasets and gene chip datasets can be used to predict the metastasis of hepatocellular carcinoma (HCC). MicroRNAs can regulate the expression of their target genes. Is there a regulatory relationship between metastasis-related microRNAs (feature microRNAs) and metastasis-related genes (feature genes) in HCC? Taking this question as its starting point, Chapter VI studies the identification of metastasis-related microRNAs and genes and analyzes their relationship. A novel approach, called t-cross-weight, was developed. It calculates a weight for each gene through t-tests on repeated random samples. The advantage of t-cross-weight is that the set of feature microRNAs or feature genes can be broadened gradually according to the rank of the weights, and support vector machines (SVMs) with different kernel functions, guided by the trend of k-fold cross-validation results, can then be used to identify an appropriate set of feature microRNAs and of feature genes. In total, 100 feature microRNAs and 710 feature genes were identified. Using the expression of the 100 feature microRNAs and an SVM with a polynomial kernel, the accuracy of predicting HCC metastasis is greater than 83.99%; using the expression of the 710 feature genes and a linear-kernel SVM, the accuracy is over 96.76%, which indicates high prediction accuracy.
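
The weighting step described above, t-tests on repeated random samples, can be sketched in Python as follows. This is not the dissertation's t-cross-weight implementation, whose details the abstract does not give. The sketch assumes that a feature's weight is the fraction of resampling rounds in which its absolute Welch t statistic ranks among the top `top_n` features. The function names, the parameters `rounds`, `fraction`, and `top_n`, and the toy expression values are all hypothetical. The SVM and cross-validation step is omitted.

```python
import math
import random
import statistics

def welch_t(x, y):
    """Welch's t statistic for two samples (statistic only, no p-value)."""
    se = math.sqrt(statistics.variance(x) / len(x) +
                   statistics.variance(y) / len(y))
    if se == 0.0:
        return 0.0
    return (statistics.mean(x) - statistics.mean(y)) / se

def resampled_t_weights(expr, labels, rounds=200, fraction=0.8,
                        top_n=2, seed=0):
    """Assumed weighting scheme, for illustration only.

    expr:   dict mapping feature name -> list of values, one per sample
    labels: list of 0/1 class labels, one per sample
    Returns dict feature -> fraction of rounds in which the feature was
    among the top_n features ranked by |t| on a random subsample.
    """
    rng = random.Random(seed)
    idx0 = [i for i, lab in enumerate(labels) if lab == 0]
    idx1 = [i for i, lab in enumerate(labels) if lab == 1]
    k0 = max(2, int(fraction * len(idx0)))
    k1 = max(2, int(fraction * len(idx1)))
    counts = dict.fromkeys(expr, 0)
    for _ in range(rounds):
        s0 = rng.sample(idx0, k0)   # random subsample of class 0
        s1 = rng.sample(idx1, k1)   # random subsample of class 1
        scores = {
            name: abs(welch_t([vals[i] for i in s0],
                              [vals[i] for i in s1]))
            for name, vals in expr.items()
        }
        for name in sorted(scores, key=scores.get, reverse=True)[:top_n]:
            counts[name] += 1
    return {name: c / rounds for name, c in counts.items()}

# Toy data: 10 samples per class; "up" and "down" differ between classes,
# "flat_1" and "flat_2" do not. All values are made up.
base = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.1, 0.9]
spread = [1.0, 2.0, 1.5, 2.5, 1.2, 2.2, 1.7, 2.7, 1.1, 2.1]
labels = [0] * 10 + [1] * 10
expr = {
    "up": base + [v + 2.0 for v in base],
    "down": [v + 2.0 for v in base] + base,
    "flat_1": spread + spread,
    "flat_2": spread + spread[::-1],
}

weights = resampled_t_weights(expr, labels)
ranked = sorted(weights, key=weights.get, reverse=True)
print(weights)
print("features by descending weight:", ranked)
```

A ranked list like `ranked` is what would then be widened step by step, with a classifier evaluated by cross-validation at each size to choose the feature set.
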
Further analysis of these feature microRNAs and genes revealed a regulatory relationship between them, which suggests that HCC metastasis may be associated with some feature microRNAs regulating some feature genes. Enrichment analysis of the feature genes with DAVID, an online tool, shows that they are enriched in the cell cycle pathway (p=0.0006), indicating that the cell cycle pathway may be closely related to HCC metastasis.

The main innovations of this dissertation are as follows:
1. A new algorithm was developed to measure the semantic similarity of gene annotations. It expresses gene function similarity numerically, which overcomes the vagueness of previous gene function comparisons. Similarities among large numbers of genes can be obtained easily with the algorithm, making it more efficient and accurate than manual comparison.
2. A novel clustering functional evaluation algorithm was developed. It assesses clustering results from the perspective of gene function similarity, overcoming the drawback that clustering quality was previously evaluated only from the mathematical characteristics of the data. Higher-quality results can therefore be obtained.
3. A new method was proposed to identify feature genes. It converts t-test results into weight values and identifies feature genes using these weights together with SVMs with different kernel functions, which overcomes the shortcoming that feature genes and kernel functions were previously selected by random trial.
4. A regulatory relationship between feature microRNAs and feature genes in HCC metastasis was found.

The research on the clustering functional evaluation algorithm and on the identification of metastasis-related genes and microRNAs in HCC has important academic and practical value.
First, the clustering functional evaluation algorithm yields higher-quality clustering results, which divide genes into groups with more accurate functional classification. Second, the feature microRNAs and genes chosen by t-cross-weight can improve the prediction accuracy of HCC metastasis. Finally, the microRNA-gene association network offers a new idea for research on the mechanism of HCC metastasis. In addition, the gene annotation semantic similarity algorithm and t-cross-weight can also be applied to other similar gene function comparison and discriminant analysis problems, respectively.
Keywords/Search Tags: Gene Chip, Cluster Analysis, Discriminant Analysis, Hepatocellular Carcinoma, Metastasis, t-Cross-Weight