
The Genome Evolution Analysis Based On PCA+LLE Combination Dimensionality Reduction Algorithm

Posted on: 2012-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Wei
Full Text: PDF
GTID: 2210330338469713
Subject: Computer application technology
Abstract/Summary:
Dimensionality reduction algorithms for high-dimensional data play an important role in pattern recognition research. Such algorithms map high-dimensional data to a low-dimensional space, where the characteristics of the original data are easier to identify. Genome feature data carry a large amount of information and are therefore necessarily high-dimensional, so dimensionality reduction is well suited to them. In this thesis, a combined dimensionality reduction algorithm, PCA + LLE, is proposed to analyze bacterial genome data, and a dendrogram is constructed based on the method. A method for selecting the LLE parameters K and d is also proposed. The main work is as follows:

(1) Two practical problems in LLE are discussed: the selection of the number of nearest neighbors K and the selection of the intrinsic dimension d. Existing approaches to K selection in LLE are reviewed and compared, and a new method for selecting K based on the cost function is proposed and tested experimentally; the experimental results demonstrate the effectiveness of the method. For the selection of d, a new concept, similarity, is introduced based on the minimization property of the cost function. The relationship between the choice of d and similarity is analyzed, and a method for selecting d is derived from this relationship.

(2) A combined dimensionality reduction algorithm, PCA + LLE, is proposed. Several problems of PCA and LLE are analyzed, and the combined algorithm is designed in light of the respective advantages and disadvantages of the two methods. A group of 23 bacterial genomes is analyzed with the new algorithm. According to the dimensionality reduction results, the 23 bacterial genomes fall into two major categories. This conclusion is similar to the results of other researchers, which supports the correctness of both the experiment and the PCA + LLE algorithm.

(3) A dendrogram of the 23 bacteria is built using the combined PCA + LLE algorithm. Methods for constructing dendrograms at the molecular level and from genome analysis are reviewed. A feature extraction method for DNA sequences based on GC content is proposed, and a dendrogram of the 23 bacteria is obtained by applying the PCA + LLE algorithm.

Finally, the thesis summarizes the work and points out future research directions.
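
The abstract does not give implementation details, so the following is only a minimal sketch of how a PCA + LLE pipeline over GC-content features might look, not the thesis's actual method. Several things here are assumptions: that PCA is applied first and LLE second, that features are GC fractions over equal-length windows (the helper `gc_window_features` and its `n_windows` parameter are hypothetical), and that the "cost function" is the standard LLE reconstruction error. The LLE step follows the textbook formulation (neighbor weights from a regularized local Gram matrix, embedding from the bottom eigenvectors of (I - W)^T (I - W)).

```python
import numpy as np

def gc_window_features(seq, n_windows=8):
    """GC fraction in each of n_windows equal-length windows.

    Hypothetical feature scheme; trailing bases that do not fill a
    window are ignored.
    """
    seq = seq.upper()
    size = len(seq) // n_windows
    feats = []
    for i in range(n_windows):
        w = seq[i * size:(i + 1) * size]
        feats.append((w.count("G") + w.count("C")) / max(len(w), 1))
    return np.array(feats)

def pca(X, n_components):
    """Project centered data onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def lle(X, k, d, reg=1e-3):
    """Textbook LLE: returns the d-dimensional embedding and the
    reconstruction cost sum_i ||x_i - sum_j W_ij x_j||^2."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq, np.inf)          # a point is never its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[:k]      # indices of the k nearest neighbors
        Z = X[nbrs] - X[i]                # neighbors relative to x_i
        C = Z @ Z.T                       # local Gram matrix (k x k)
        C += np.eye(k) * reg * max(np.trace(C), 1e-12)  # regularize
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()          # weights sum to one
    cost = float(((X - W @ X) ** 2).sum())
    I_W = np.eye(n) - W
    _, vecs = np.linalg.eigh(I_W.T @ I_W)  # eigenvalues in ascending order
    Y = vecs[:, 1:d + 1]                   # skip the constant eigenvector
    return Y, cost

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in sequences, NOT the 23 bacterial genomes of the thesis.
    seqs = []
    for gc in np.linspace(0.3, 0.7, 23):
        p = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
        seqs.append("".join(rng.choice(list("ACGT"), size=4000, p=p)))
    X = np.array([gc_window_features(s, n_windows=16) for s in seqs])
    Z = pca(X, n_components=5)            # assumed order: PCA first
    for k in (4, 6, 8):                   # inspect the cost for several K
        Y, cost = lle(Z, k=k, d=2)
        print(k, Y.shape, round(cost, 6))
```

The loop over `k` only prints the reconstruction cost for a few candidate values; the specific criterion the thesis uses to pick K and d from the cost function is not stated in the abstract and is not reproduced here. A dendrogram could then be built by running hierarchical clustering on the rows of `Y`.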
Keywords/Search Tags: bioinformatics, K-nearest-neighbor parameter, intrinsic dimension d, combined dimensionality reduction algorithm PCA+LLE, dendrogram