
Statistical Inference For Codon Substitution Models Based On Codon Usage Bias And Branch Clustering

Posted on: 2013-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: S S Cheng
GTID: 2250330422953064
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Bioinformatics is a highly active interdisciplinary field that draws on biology, mathematics, computer science and related subjects. Phylogenetic analysis, one of its important research areas, infers and evaluates evolutionary relationships among organisms by applying probabilistic and statistical methods to biomolecular data. Computational molecular evolution based on phylogenetics has been widely applied in evolutionary genetics, ecology and genomics, as well as in experimental biological sciences such as virology and developmental biology. Building probabilistic models of substitution between biological data units such as nucleotides, amino acids or codons is one of the major research topics, so the study of such substitution models is of considerable significance.

In this thesis, statistical inference for the parameters of codon substitution models based on codon usage bias and branch clustering is presented. In Chapter 2, a new codon substitution model based on amino acid biochemical distances and codon usage bias is implemented and applied to the coding sequences of two data sets. The results suggest that the new model fits the data better than existing codon models and yields more reasonable parameter estimates. In Chapter 3, considering heterogeneity among sites in molecular evolution, we first estimate the degree of similarity between species using a clustering method. The species are then classified into several groups according to their similarity, and each group is allowed its own selection pressure. The parameters of the new codon model are inferred by maximum likelihood. The model is applied to real data sets and compared with the existing model to assess how well each detects the selection pressure acting on species. In Chapter 4, we introduce EM-algorithm methods for estimating the parameters of a phylogenetic tree built from observed sequences with missing and inserted sites. We mainly apply the EM algorithm to estimate the parameters of the gamma distances of coding sequences with missing data under the JC69 and K80 models. Methods for testing the reliability of a phylogenetic tree are also introduced.
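
As a rough illustration of the gamma distances mentioned for Chapter 4, the sketch below evaluates the standard closed-form JC69 and K80 distances under gamma-distributed rates across sites. It is not the thesis's EM procedure: sites with gaps or ambiguous characters are simply dropped (pairwise deletion), which is an assumption made here for brevity, and the function names and the fixed shape parameter `alpha` are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (n_sites, n_transitions, n_transversions) for two aligned sequences.

    Assumption: any site where either sequence has a character other than
    A, C, G, T (a gap or ambiguity code) is skipped (pairwise deletion).
    This stands in for, and is not, an EM treatment of missing data.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            ts += 1  # transition: A<->G or C<->T
        else:
            tv += 1  # transversion: purine <-> pyrimidine
    return n, ts, tv


def jc69_gamma_distance(p, alpha):
    """JC69 distance with gamma rates; p is the proportion of differing sites."""
    base = 1.0 - 4.0 * p / 3.0
    if base <= 0.0 or alpha <= 0.0:
        raise ValueError("distance undefined for these inputs")
    return 0.75 * alpha * (base ** (-1.0 / alpha) - 1.0)


def k80_gamma_distance(P, Q, alpha):
    """K80 distance with gamma rates; P, Q are transition/transversion proportions."""
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0 or alpha <= 0.0:
        raise ValueError("distance undefined for these inputs")
    return 0.5 * alpha * (a ** (-1.0 / alpha) + 0.5 * b ** (-1.0 / alpha) - 1.5)


if __name__ == "__main__":
    # Toy aligned sequences (made up for illustration only).
    n, ts, tv = count_differences("ACGTACGTAC-N", "ACGTACGTGAAA")
    P, Q = ts / n, tv / n
    print(jc69_gamma_distance(P + Q, alpha=1.0))
    print(k80_gamma_distance(P, Q, alpha=1.0))
```

As `alpha` grows large, both functions approach the usual JC69 and K80 distances without rate variation, which gives a quick sanity check on the formulas.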
Keywords/Search Tags: codon substitution model, codon usage bias, amino acid biochemistry distances, EM algorithm, phylogenetic tree