
A Research On Data Mining Of Gene Profiles

Posted on: 2014-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: X L Zhou
GTID: 2250330401984409
Subject: Computational Mathematics
Abstract/Summary:
The advent of gene chip technology has made it possible for researchers to analyze the expression levels of thousands of genes in various physiological states or at different stages of development. The technology is now widely used in medical diagnosis, drug screening, crop breeding, environmental monitoring, and other areas. With massive amounts of gene expression data available, using them to study the relationships among genes has become an important field in bioinformatics, and mining gene expression data with mathematical models is an active research topic. In this thesis, several methods for mining gene expression data are proposed, and an in-depth analysis of time-series gene expression data from corals is carried out.
In Chapter 1, we introduce the history and principles of gene chip technology and the current state of gene expression data mining, and summarize the main work of the following chapters.
Constructing a logical network of genes requires the expression states of the genes, but gene expression data obtained from databases represent only expression levels, not discrete expression states. To bridge this gap, Chapter 2 presents a threshold analysis method for gene expression data based on a genetic algorithm and a learning vector quantization (LVQ) network. For a small data set, the Otsu algorithm, optimized with a genetic algorithm, performs the threshold analysis and yields a binary (two-state) classification. For a large data set, a subset is first selected and thresholded with the Otsu algorithm; the resulting labeled subset then serves as the training set for the LVQ network; finally, the trained LVQ network gives a binary classification of the full data set.
Gene expression data sampled at discrete time points are non-stationary signals that carry rich information. To mine this information more effectively, Chapter 3 first identifies genes with large expression differences through data preprocessing and differential gene screening.
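The Chapter 2 threshold analysis for small data sets combines Otsu's criterion with a genetic algorithm. A minimal sketch, assuming real-valued encoding of the threshold, tournament selection, arithmetic crossover, Gaussian mutation and elitism (the abstract does not state which GA operators or parameters the thesis uses, so these and the function names are hypothetical): the GA searches for the threshold `t` that maximizes Otsu's between-class variance w0·w1·(mu0 − mu1)², and levels above `t` are mapped to state 1.

```python
import random
from typing import List, Sequence


def between_class_variance(values: Sequence[float], t: float) -> float:
    """Otsu criterion w0 * w1 * (mu0 - mu1)^2 for the split values <= t vs > t."""
    low = [v for v in values if v <= t]
    high = [v for v in values if v > t]
    if not low or not high:
        return 0.0  # a split that leaves one class empty separates nothing
    w0 = len(low) / len(values)
    w1 = 1.0 - w0
    mu0 = sum(low) / len(low)
    mu1 = sum(high) / len(high)
    return w0 * w1 * (mu0 - mu1) ** 2


def ga_otsu_threshold(values: Sequence[float], pop_size: int = 30,
                      generations: int = 50, mutation_rate: float = 0.2,
                      seed: int = 0) -> float:
    """Search for the Otsu-optimal threshold with a simple real-coded GA (hypothetical operators)."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)

    def fitness(t: float) -> float:
        return between_class_variance(values, t)

    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best thresholds forward
        while len(next_gen) < pop_size:
            # tournament selection: best of 3 random candidates, twice
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            alpha = rng.random()
            child = alpha * p1 + (1 - alpha) * p2  # arithmetic crossover
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1 * (hi - lo))  # Gaussian mutation
            next_gen.append(min(max(child, lo), hi))  # keep inside the data range
        population = next_gen
    return max(population, key=fitness)


def binarize(values: Sequence[float], t: float) -> List[int]:
    """Map expression levels to states: 1 (expressed) above t, 0 otherwise."""
    return [1 if v > t else 0 for v in values]
```

For example, `binarize(levels, ga_otsu_threshold(levels))` with `levels = [0.8, 1.1, 0.9, 1.3, 5.2, 4.8, 5.5, 6.0]` separates the four low values from the four high ones. Because Otsu's criterion is piecewise constant in `t`, any threshold between two adjacent observed values is equally good, which is why a coarse stochastic search suffices here.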
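For large data sets, Chapter 2 trains an LVQ network on an Otsu-labeled subset and then uses it to classify the full set. The abstract does not specify the LVQ variant, prototype count or learning rate, so the sketch below assumes plain LVQ1 (Kohonen's rule: move the nearest prototype toward a sample of its own class, away from a sample of another class) with a linearly decaying learning rate; `train_lvq1` and `classify_lvq` are hypothetical names, and each sample is treated as a feature vector.

```python
import random
from typing import List, Sequence, Tuple

Prototype = Tuple[List[float], int]  # (prototype vector, class label)


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_lvq1(samples: Sequence[Sequence[float]], labels: Sequence[int],
               prototypes_per_class: int = 2, lr: float = 0.1,
               epochs: int = 30, seed: int = 0) -> List[Prototype]:
    """Train an LVQ1 network on labeled samples (e.g. an Otsu-thresholded subset)."""
    rng = random.Random(seed)
    prototypes: List[Prototype] = []
    for c in sorted(set(labels)):
        members = [s for s, y in zip(samples, labels) if y == c]
        # initialize each prototype at a randomly chosen member of its class
        for _ in range(prototypes_per_class):
            prototypes.append((list(rng.choice(members)), c))
    order = list(range(len(samples)))
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            w, c = min(prototypes, key=lambda p: _sq_dist(p[0], x))
            sign = 1.0 if c == y else -1.0  # attract if the class matches, repel otherwise
            for k in range(len(w)):
                w[k] += sign * rate * (x[k] - w[k])  # updates the prototype in place
    return prototypes


def classify_lvq(prototypes: Sequence[Prototype], x: Sequence[float]) -> int:
    """Assign x the class label of its nearest prototype."""
    return min(prototypes, key=lambda p: _sq_dist(p[0], x))[1]
```

Training on a small labeled subset and then calling `classify_lvq` on every remaining sample mirrors the three-step procedure described for large data sets; the cost of the Otsu/GA search is paid only on the subset.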
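The abstract does not say which preprocessing or screening criterion Chapter 3 uses to find genes with large expression differences. As one hypothetical illustration only, the sketch below log2-transforms each gene's time series (with a pseudocount) and keeps genes whose log2 expression range across time points reaches a cutoff; a cutoff of 1.0 corresponds to at least a two-fold change. The function names, pseudocount and cutoff are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def log2_transform(series: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Preprocess raw intensities as log2(x + pseudocount) to reduce skew."""
    return [math.log2(x + pseudocount) for x in series]


def screen_differential_genes(profiles: Dict[str, Sequence[float]],
                              min_log2_range: float = 1.0) -> List[str]:
    """Keep genes whose log2 expression varies by at least min_log2_range over time."""
    selected = []
    for gene, series in profiles.items():
        logged = log2_transform(series)
        if max(logged) - min(logged) >= min_log2_range:
            selected.append(gene)
    return selected
```

With `profiles = {"geneA": [10, 12, 40, 80], "geneB": [20, 22, 21, 23], "geneC": [50, 15, 14, 13]}`, the nearly flat `geneB` is filtered out while `geneA` and `geneC` are kept.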
A wavelet transform of these genes' expression data then yields low-frequency coefficients representing the basic trend and high-frequency coefficients reflecting changes across samples. Correlation analysis is performed between every pair of these genes using both the low-frequency and the high-frequency coefficients, and key genes are identified from the results. Finally, the biological information in the DNA sequences is interpreted through functional annotations of the key genes or their products, which identifies key genes closely related to reef building in corals.
At the end of the thesis, we look ahead to the future of gene chip technology and expression data mining, and tentatively propose directions for future research based on the limitations and shortcomings of the present methods.
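The Chapter 3 wavelet step splits each gene's time series into low-frequency (trend) and high-frequency (change) coefficients. The abstract does not name the wavelet basis or decomposition depth; the sketch below assumes the orthonormal Haar wavelet, chosen only because it fits in a few lines of standard-library Python, and the function names are hypothetical. Each level replaces the signal by scaled pairwise sums (approximation) and differences (detail).

```python
import math
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal: Sequence[float],
                   levels: int) -> Tuple[List[float], List[List[float]]]:
    """Multi-level Haar DWT: final low-frequency coefficients plus per-level details."""
    approx = list(signal)
    details: List[List[float]] = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # high-frequency coefficients, finest level first
    return approx, details
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared samples, so no signal energy is lost when the trend and the fluctuations are analyzed separately. A time series whose length is not divisible by 2**levels would need padding or a different wavelet, which this sketch does not handle.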
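Chapter 3 then correlates every pair of screened genes on their low- and high-frequency coefficients and detects key genes from the results, but the abstract does not say how key genes are chosen. The sketch below assumes Pearson correlation and a hypothetical ranking rule: a gene's score is the number of partners it correlates with (|r| at or above a cutoff) in both frequency bands. `pearson`, `key_genes` and the 0.8 cutoff are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant series; treat as uncorrelated
    return cov / (sx * sy)


def key_genes(coeffs: Dict[str, Tuple[Sequence[float], Sequence[float]]],
              cutoff: float = 0.8) -> List[Tuple[str, int]]:
    """Rank genes by how many partners they correlate with in both bands (hypothetical rule).

    coeffs maps gene -> (low-frequency coefficients, high-frequency coefficients).
    """
    degree = {g: 0 for g in coeffs}
    for g1, g2 in combinations(coeffs, 2):
        low1, high1 = coeffs[g1]
        low2, high2 = coeffs[g2]
        if (abs(pearson(low1, low2)) >= cutoff
                and abs(pearson(high1, high2)) >= cutoff):
            degree[g1] += 1
            degree[g2] += 1
    # highest degree first; ties broken alphabetically for a stable order
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
```

Using the absolute value of r counts strongly anti-correlated genes as related too, which suits regulatory relationships where one gene represses another; requiring agreement in both bands favors genes that share both the overall trend and the short-term fluctuations.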
Keywords/Search Tags: gene chip, data mining of gene expression profiles, threshold analysis, wavelet transform, gene annotation