Construction And Application Of SNP Microarray Database And The Related Analysis Tools

Posted on: 2006-01-04  Degree: Master  Type: Thesis
Country: China  Candidate: X J Dong  Full Text: PDF
GTID: 2144360212482739  Subject: Biomedical engineering
Abstract/Summary:
The sequencing of the human genome, announced in July 2000 and published in early spring 2001, heralds major breakthroughs in our understanding and treatment of human diseases. We all carry the same 30,000-40,000 genes and share 99.9% of our DNA sequence with each other, yet we differ in many ways: not only in physical appearance and susceptibility to disease and infection, but also in how we respond to medicines. These differences arise from the combination of gene sequence variations we carry, together with environmental influences. The SNP Consortium and others are cataloguing these sequence differences, the majority of which are single nucleotide polymorphisms (SNPs). SNPs provide a powerful means of identifying genes, in as yet uncharacterized parts of the genome, that may be related to a specific disease. The DrSNP (disease-related SNP) project was carried out against this background.
Current research on the relationship between diseases and gene polymorphisms usually focuses on only one or a few SNPs. In addition, institutes seldom share their original data with one another, so each performs its analysis independently and inefficiently. In fact, the factors that influence a disease may be manifold, including multiple polymorphic sites and interactions among various environmental factors. Small samples and a single analysis strategy are no longer adequate. For complex genetic diseases in particular, dedicated databases and analysis tools are needed to handle large samples. Based on these considerations, we planned to build a database of SNP microarray data and sample data, and to develop and integrate the corresponding analysis tools on top of it. Our aim is to help users share information and analyze data online.
In this thesis, we describe the framework of this database system, investigate different methods for analyzing SNP data, and discuss a strategy for performing the analysis online. The thesis is divided into three main parts:
(1) Construction of the DrSNP analysis system. The DrSNP (Disease-related SNP) project aims to find relationships between diseases and gene polymorphisms through experiments supported by a bioinformatics solution. The solution integrates three databases (sample data, SNP array data, and SNP-related gene data) and provides tools for screening SNP candidates and analyzing the experimental data. The DrSNP project is a joint effort of the whole group, including experimentalists and bioinformatics researchers; our own work was to design the database framework and integrate the databases into a whole. The second chapter describes the framework of the DrSNP system, the function of each part, and the relationships among the parts. Finally, two analysis tools built on the three databases are briefly introduced: SNP screening and result analysis.
(2) Development of a web-based SNP data analysis environment using R. R is a language and environment for statistical computing and graphics. It is free and open source, and its features make it well suited to performing statistical analysis on the back end. Because our system follows a "DB + Web service" architecture, it must respond to user requests from the Internet. The family of "R on web" solutions in the R project addresses this problem while retaining R's strengths in statistical computing.
After comparing common frameworks such as Rweb, RCGI, CGIwithR, Rho, RZope, RPHP and Rserve, we built an online SNP array data analysis framework modeled on the Rserve solution, in which Java and R communicate through Rserve as a bridge, so that programmers can easily call the computing functions in R packages.
(3) Association study between coronary heart disease and the OLR1 gene. Coronary heart disease (CHD) is a complex multigenic disease. The oxidized LDL receptor 1 (OLR1) gene plays an important role in endothelial activation, dysfunction and injury, and endothelial activation is believed to be a very early step in the development of atherosclerosis. We therefore selected two SNPs in the OLR1 gene and studied 338 CHD patients and 280 healthy controls. Both SNPs were found to be in Hardy-Weinberg equilibrium in both the CHD and control groups. There were no significant differences in genotype or allele frequencies between cases and controls for either marker. Student's t-test showed significant differences in TC, HDL, LDL, LDL/HDL and TC/HDL levels between the case and control groups. Haplotype analysis was carried out on the total sample and on the case and control samples separately. The results indicated no significant association between the haplotypes and CHD.
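The two single-marker tests named in part (3), the Hardy-Weinberg equilibrium check and the comparison of allele frequencies between cases and controls, can be illustrated with a minimal Python sketch. This is not the thesis's implementation (which runs in R behind the web service); the function names `hwe_chi_square` and `allelic_chi_square` are hypothetical, and the genotype counts in the usage note are invented for illustration, since the abstract reports no genotype counts. Both tests are assumed here to be Pearson chi-square tests with one degree of freedom.

```python
import math


def chi2_sf_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit test for Hardy-Weinberg equilibrium at a biallelic SNP.

    Arguments are observed genotype counts (AA, AB, BB).
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expected count (only possible for a monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, chi2_sf_1df(chi2)


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson chi-square test on the 2x2 table of allele counts.

    Each argument is a tuple of genotype counts (AA, AB, BB).
    Returns (chi-square statistic, p-value).
    """
    a = 2 * case_genotypes[0] + case_genotypes[1]        # A alleles in cases
    b = case_genotypes[1] + 2 * case_genotypes[2]        # B alleles in cases
    c = 2 * control_genotypes[0] + control_genotypes[1]  # A alleles in controls
    d = control_genotypes[1] + 2 * control_genotypes[2]  # B alleles in controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the allele table is empty")
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    return chi2, chi2_sf_1df(chi2)
```

For example, `hwe_chi_square(25, 50, 25)` describes a sample whose genotype counts match Hardy-Weinberg proportions exactly, so the statistic is 0 and the p-value is 1. A p-value above the chosen significance level (commonly 0.05) gives no evidence of departure from equilibrium, which is the situation the abstract reports for both SNPs.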
Keywords/Search Tags: Bioinformatics, Single Nucleotide Polymorphism (SNP), Statistical analysis, CHD, Chi-square test, Hardy-Weinberg Equilibrium, Logistic regression model, OLR1