Prediction Of Polyadenylation In Human Gene Sequences

Posted on: 2009-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: J B Duan
Full Text: PDF
GTID: 2120360278463915
Subject: Bio-IT
Abstract/Summary:
Polyadenylation (polyA) of the mRNA 3' end is one of the three main steps of eukaryotic pre-mRNA processing. Predicting polyadenylation sites in human DNA and mRNA sequences is important both for understanding pre-mRNA processing and for predicting gene structure. When a 3' UTR contains more than one potential polyA site, alternative polyadenylation can shape gene expression in a tissue- or disease-specific way. For gene structure prediction, accurately identifying polyA sites helps to confirm the 3' end of a gene.

This thesis reviews the mechanism of polyA site formation, the primary, secondary and higher-order structure of the sequences surrounding polyA sites, and alternative polyadenylation. It then describes and analyses in detail the existing research on polyA site prediction, along with some questions that remain unsolved. After extracting 1835 polyadenylation sequences from the NCBI RefSeq database, we calculate and analyse the frequencies of all hexamers around the polyA site.

This thesis presents a machine learning method to predict polyadenylation signals (PASes) in human DNA and mRNA sequences. The method consists of three feature-handling steps: generation, selection and integration. In the first step, new features are generated from k-gram nucleotide patterns. In the second step, the most important features are selected by an entropy-based algorithm. In the third step, support vector machines are used to distinguish true PASes from a large number of candidates. The result is a mathematical model. On test data, the specificity is 71.67% at the intron level and 80.77% at the exon level when the sensitivity is fixed at 60%.
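The first two steps (k-gram feature generation and entropy-based selection) can be illustrated with a minimal sketch. The abstract does not say which entropy-based algorithm the thesis uses, so information gain on hexamer presence is used here as one common choice, not as the author's method. The function names and the four toy sequences are hypothetical; the toy positives simply contain AATAAA, the well-known canonical polyadenylation signal hexamer. The support vector machine step is left out.

```python
from collections import Counter
from math import log2


def kgram_counts(seq, k=6):
    """Count overlapping k-grams (hexamers when k=6) in a nucleotide sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(present, labels):
    """Entropy reduction obtained by splitting the samples on a binary feature."""
    with_f = [y for p, y in zip(present, labels) if p]
    without_f = [y for p, y in zip(present, labels) if not p]
    n = len(labels)
    remainder = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - remainder


def rank_kgrams(sequences, labels, k=6):
    """Rank every observed k-gram by the information gain of its presence."""
    counts = [kgram_counts(s, k) for s in sequences]
    vocabulary = set().union(*counts)
    gains = {g: information_gain([g in c for c in counts], labels)
             for g in vocabulary}
    # Highest gain first; ties broken alphabetically for a stable order.
    return sorted(gains.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: label 1 = sequence around a true PAS, 0 = background.
sequences = ["GGCAATAAAGTCT", "TTAATAAACCGT", "GGCACGTACGTCT", "TTACCGGACCGT"]
labels = [1, 1, 0, 0]

ranked = rank_kgrams(sequences, labels)
print(ranked[:3])
```

In a fuller pipeline, the top-ranked k-grams would become the feature vector passed to a classifier such as a support vector machine; the number of features to keep is a tuning choice that the abstract does not specify.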
Keywords/Search Tags:polyadenylation signals, machine learning, entropy, support vector machines