Study On Analysis Method For Human Genome Microarray Data And Its Application

Posted on: 2014-01-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Li
GTID: 1260330425476740
Subject: Microbiology
Abstract/Summary:
The Human Genome Project was declared complete on April 14, 2003, marking the full entry of life science research into the post-genome era. Functional genomics, which centers on identifying gene function, is one of its largest branches. Because gene function is closely tied to RNA expression, methods based on analyzing RNA expression, including microarray technology, have become important tools in modern bioscience. Owing to their high throughput, timeliness, and accuracy, microarrays have been widely used to study expression profiles and gene function since their invention. Scientists share their gene chip data worldwide through public biological databases. However, problems remain, such as how to analyze the huge volume of raw data efficiently and how to extract more valuable information or uncover underlying patterns.

In this dissertation, a large set of human genome microarray datasets downloaded from public databases serves as the object of study. By analyzing the downloaded samples, the expression stability of known human housekeeping genes was evaluated. To overcome the shortcomings of existing graph clustering methods for human genome microarray data, a global graph clustering method based on modularity and subgraph smoothness was proposed. Finally, data mining of human genome microarrays was applied to miRNA target prediction.

Firstly, 16,398 samples of human genome chip data downloaded from two of the most authoritative public databases were classified, sorted, preprocessed, and transformed, and then imported into a local human genome microarray database.
Using the geNorm algorithm, we studied the expression stability of known human housekeeping genes in 566 samples and chose the most stably expressed gene as the internal control for the analysis of three human genome chip datasets related to the carcinogenic effects of aflatoxin B1 (AFB1). Compared with the original experiments, more differentially expressed genes were found using the internal control EEF-2, which has higher expression stability across samples.

Secondly, to overcome the shortcomings of existing graph clustering methods for gene microarray data, a global graph clustering method, called module smoothness, was proposed based on modularity and subgraph smoothness. The concept of subgraph smoothness was introduced to avoid convergence to a local optimum. For each clustering result, subgraphs with lower smoothness values were split into singletons, and the newly generated singletons were used in the next round of clustering. The optimal global clustering result was reached after several iterations. Compared with four commonly used clustering methods (classic graph clustering, K-means, SOM, and Fuzzy) on a set of genome expression profile data, the module smoothness method showed higher classification accuracy. Its average non-overlap proportion and FOM’ value were better than those of the other methods overall. When the dataset was partitioned into 39 clusters, the FOM’ value of this method was 28.41%, 19.21%, 9.84%, and 24.67% lower than that of the four methods above, respectively. The execution efficiency of this method was 5.94% higher than that of the Fuzzy method and similar to that of the SOM algorithm.

Thirdly, by applying data mining of human genome microarrays, an optimized method for miRNA target prediction, named Dual sites SVM, was proposed. It combines the SVM machine learning algorithm with a double seed site search design.
In addition to feature vectors based on complementary base-pairing principles, we defined two extra feature vectors generated from experimental data in the local human genome microarray database. The execution efficiency of the model trained by the Dual sites SVM method was 16.76% and 19.09% higher than that of the PicTar method and the single site SVM method, respectively. Compared with six other commonly used methods, our method effectively improves the identification rate without reducing classification accuracy. The Dual sites SVM prediction method has been implemented and made available as an online tool for bioinformatics researchers.

Beginning with the construction of a local human genome microarray database, this study focuses on analysis methods for genome microarray datasets, such as the expression stability of internal control genes and data clustering approaches. It has achieved preliminary results in these areas and provides useful references for further research on and application of human genome microarray datasets.
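The expression-stability ranking mentioned in the abstract can be illustrated with a minimal Python sketch of the geNorm stability measure M. It assumes the commonly published definition: for each candidate gene, M is the average, over all other candidate genes, of the standard deviation across samples of the log2 expression ratio, and a lower M indicates a more stable gene. The function name, the toy matrix, and its column layout are hypothetical and are not taken from the dissertation's data or code.

```python
import numpy as np


def genorm_m_values(expr):
    """Return the geNorm stability measure M for each candidate gene.

    expr: array-like of shape (n_samples, n_genes) holding positive,
          non-log-transformed expression values (assumed input layout).
    """
    log_expr = np.log2(np.asarray(expr, dtype=float))
    n_genes = log_expr.shape[1]
    m = np.empty(n_genes)
    for j in range(n_genes):
        # Per-sample log2 ratios of gene j against every gene (including itself).
        log_ratios = log_expr[:, [j]] - log_expr
        # Pairwise variation: standard deviation of each ratio across samples.
        pairwise_sd = log_ratios.std(axis=0, ddof=1)
        # Average over all other genes, excluding the self-comparison.
        m[j] = np.delete(pairwise_sd, j).mean()
    return m


# Hypothetical toy data: 4 samples (rows) x 3 candidate genes (columns).
# Columns 0 and 1 are strictly proportional; column 2 fluctuates.
toy = [
    [100, 50, 10],
    [200, 100, 80],
    [400, 200, 20],
    [800, 400, 160],
]
m_values = genorm_m_values(toy)
most_stable = int(np.argmin(m_values))
print("M per gene:", m_values)
print("index of most stable candidate:", most_stable)
```

In this toy matrix the first two columns keep a constant ratio, so they share the lowest M, while the fluctuating third column gets the highest. The full geNorm procedure also removes the least stable gene step by step and recomputes M, which this sketch omits.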
Keywords/Search Tags: Microarrays, Human genome microarray, Database, Internal control gene, Graph-based clustering, miRNA targets