Prediction Of Protein-coding Genes And Genetic Disease Relevant Genes

Posted on:2007-04-17

Degree:Master

Type:Thesis

Country:China

Candidate:H Wang

GTID:2144360242461936

Subject:Computer application technology

Abstract/Summary:

Prediction of protein-coding genes is valuable for finding new genes, understanding the composition of genomes, and identifying disease-relevant genes, and it plays an important role in many genome projects. Accurate identification of the splice sites of eukaryotic genes is one of the challenging and essential problems in gene structure prediction. At present, widely used splice site identification methods, such as the weight array model (WAM), rely on the conserved signal sequences around splice sites. In addition to this information, this work exploits other features useful for identifying splice sites: the relationship between the conserved signals and the C+G content of the sequences around splice sites, the compositional features of the sequences upstream and downstream of splice sites, and the dependence of those compositional features on the C+G content of the surrounding sequences. Separate models are constructed to describe these features, and a logit-linear model is built to integrate them. The result is a new splice site prediction program, SpliceKey. Test results show that the prediction accuracy of SpliceKey is significantly higher than that of WAM and also better than that of DGSplice, a recently released splice site prediction program.

A novel approach, and the corresponding program DCGene, is presented for predicting causative genes by mining functional information from Gene Ontology (GO) annotation. When GO terms are used to evaluate how likely a candidate gene is to be a causative gene, the structure of the GO terms, that is, their directed acyclic graph (DAG) relationships, is taken into account. The algorithm computes the degree of relevance between genes and a disease, which underpins the accuracy of disease gene prediction.
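The abstract does not spell out how DCGene uses the DAG information, so the following is only a minimal sketch of one common way GO structure can enter a gene-disease relevance score, not the thesis's algorithm. It assumes annotations are propagated to all ancestor terms and that shared terms are weighted by information content. The toy DAG, the term identifiers, the gene names, and the functions `with_ancestors`, `information_content`, and `relevance` are all hypothetical.

```python
from math import log

# Hypothetical toy GO DAG: each term maps to its set of parent terms.
PARENTS = {
    "GO:root": set(),
    "GO:A": {"GO:root"},
    "GO:B": {"GO:root"},
    "GO:A1": {"GO:A"},
    "GO:A2": {"GO:A"},
    "GO:AB": {"GO:A", "GO:B"},  # two parents: a DAG, not a tree
}


def with_ancestors(terms, parents=PARENTS):
    """Return the given terms together with all of their ancestors."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents[term])
    return closed


def information_content(annotations, parents=PARENTS):
    """IC(t) = -log(fraction of genes annotated with t after propagation)."""
    counts = {}
    for terms in annotations.values():
        for term in with_ancestors(terms, parents):
            counts[term] = counts.get(term, 0) + 1
    total = len(annotations)
    return {term: -log(count / total) for term, count in counts.items()}


def relevance(candidate_terms, disease_terms, ic, parents=PARENTS):
    """Sum the information content of terms shared after propagation."""
    shared = with_ancestors(candidate_terms, parents) & with_ancestors(
        disease_terms, parents
    )
    return sum(ic.get(term, 0.0) for term in shared)


# Hypothetical annotations: gene -> directly annotated GO terms.
ANNOTATIONS = {
    "g1": {"GO:A1"},
    "g2": {"GO:A2"},
    "g3": {"GO:B"},
    "g4": {"GO:AB"},
}
IC = information_content(ANNOTATIONS)

# Rank candidates against a disease profile made of the term GO:A1.
DISEASE_TERMS = {"GO:A1"}
RANKED = sorted(
    ANNOTATIONS,
    key=lambda gene: relevance(ANNOTATIONS[gene], DISEASE_TERMS, IC),
    reverse=True,
)
```

In this sketch a candidate that shares only the root term with the disease profile scores zero, because a term annotated to every gene carries no information, while a candidate sharing a specific term scores highest.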
To assess the method, leave-one-out tests were performed on 1057 disorders from the OMIM database whose causative genes have been identified, using two candidate sets: the genes in the chromosomal region to which each disorder has been mapped (89 genes on average), and 12954 candidate genes from the human genome. The results show that the method can predict disease genes from candidate genes both within a mapped chromosomal region and at genome scale. Consequently, the predictions can be used either to identify causative genes within a chromosomal region or to provide candidate loci on a genome-wide scale for linkage analysis of simple diseases and association studies of complex diseases.
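The leave-one-out assessment described above can be sketched as follows, assuming a generic scoring function. This is an illustration of the evaluation protocol only: the function `leave_one_out`, the tie-handling rule, the `top_k` cutoff, and all scores and identifiers in the toy data are hypothetical and are not taken from the thesis. In a real test the scorer must be built without the held-out disorder-gene association.

```python
def leave_one_out(disorders, score, top_k=10):
    """Rank each disorder's known causative gene among its candidates.

    disorders: list of (disorder_id, true_gene, candidate_genes), where
               candidate_genes includes true_gene.
    score:     callable (disorder_id, gene) -> float, higher is more relevant.
    Returns (ranks, hit_rate): the 1-based rank of each true gene and the
    fraction of disorders whose true gene is ranked within top_k.
    """
    ranks = {}
    for disorder_id, true_gene, candidates in disorders:
        true_score = score(disorder_id, true_gene)
        # Pessimistic rank: ties with the true gene count against it.
        ranks[disorder_id] = sum(
            1 for gene in candidates if score(disorder_id, gene) >= true_score
        )
    hits = sum(1 for rank in ranks.values() if rank <= top_k)
    return ranks, hits / len(disorders)


# Hypothetical toy data: two disorders with three regional candidates each.
TOY_SCORES = {
    ("D1", "g1"): 0.9, ("D1", "g2"): 0.4, ("D1", "g3"): 0.1,
    ("D2", "g4"): 0.8, ("D2", "g5"): 0.5, ("D2", "g6"): 0.2,
}
TOY_DISORDERS = [
    ("D1", "g1", ["g1", "g2", "g3"]),
    ("D2", "g5", ["g4", "g5", "g6"]),
]

toy_ranks, toy_hit_rate = leave_one_out(
    TOY_DISORDERS, lambda disorder, gene: TOY_SCORES[(disorder, gene)], top_k=1
)
```

The same harness would cover both settings in the abstract by changing only the candidate list: the genes in the mapped chromosomal region for the regional test, or the full genome-wide candidate set.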

Keywords/Search Tags:

human genetic disorders, gene structure prediction, splice site prediction, disease gene prediction

Related items

1	Diverse mechanisms of human genetic disease: Splice order determination in the COL1A2 gene. Effects that influence splice site mutations in osteogenesis imperfecta and a translocation disrupting SNRPN gene causes Prader-Willi syndrome
2	Study On The Methods To Protein Structure Prediction
3	Algorithm Research Of Human Genetic Disorder Gene Prediction Based On Protein Network
4	Research On Disease Gene Prediction Algorithm Based On Gene Network
5	Protein–macromolecule Interaction And Bioinformatics Analysis Of The Molecular Disease Mechanism
6	An Algorithm For Disease Gene Prediction Based On Molecular Networks
7	Computational Approaches to Prediction and Analysis of Human Leukocyte Antigen Genes
8	Point Cloud-based Method For Protein Ligand Affinity Prediction And Binding Site Prediction
9	Genetic Algorithm Based Composite Kernel Partial Least Square In Disease Prediction And Classification With Genomic Data
10	Research On Disease Gene Prediction Method Based On Annotated Gene Set