Detecting The Heterogeneity Of Substitution Patterns Between Molecular Sequences By Asymmetry In Paired Comparison

Posted on: 2004-12-01  Degree: Master  Type: Thesis
Country: China  Candidate: J L Huang
GTID: 2120360092485631  Subject: Biochemistry and Molecular Biology
Abstract/Summary:
As DNA sequence data become increasingly abundant, they are widely used to study the phylogeny of species and multigene families, as well as the mechanisms of evolution at the molecular level. Most of these studies rely on phylogenetic analysis of DNA sequences. One assumption commonly made in molecular phylogenetic analysis is that the substitution process is the same in all branches of a phylogenetic tree (i.e., in all sequences); such a process is called homogeneous (or uniform) (Tourasse and Li 1999). Another common assumption is stationarity: the nucleotide frequencies in the sequences do not change over time and are therefore equal to those in the ancestral sequence (e.g., see Gu and Li 1996). If homogeneity or stationarity holds, base compositions should remain similar among sequences. However, analyses of many data sets have revealed that these two assumptions are often unrealistic (Lockhart et al. 1994; Galtier and Gouy 1995). When a data set violates them, ignoring nonhomogeneity or nonstationarity can mislead phylogenetic reconstruction and inferences about the mechanisms of molecular evolution (Hasegawa et al. 1993; Steel et al. 1993; Funk et al. 1995; Galtier and Gouy 1998; Naylor and Brown 1998; Rodriguez-Trelles et al. 2000; Tarrio et al. 2000) and can distort tests of the molecular-clock hypothesis (Tourasse and Li 1999). It is therefore important to test homogeneity in a given set of DNA sequences before carrying out molecular phylogenetic analysis. Knowing that the homogeneity assumption is violated allows investigators to choose more sophisticated methods of phylogenetic reconstruction (Lockhart et al. 1994; Galtier and Gouy 1998; Kumar and Gadagkar 2001) or, where possible, to conduct the analysis with the offending sequences removed (Nei and Kumar 2000).
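The expectation that base compositions stay similar among sequences can be checked informally by tabulating nucleotide counts. The Python sketch below computes a Pearson chi-square statistic on an m × 4 table of A, C, G, T counts; the function names are hypothetical, and this is a generic contingency-table statistic rather than a method from this thesis. Because aligned sequences share ancestry, their sites are not independent samples, so any nominal chi-square p-value for this statistic should be treated with caution.

```python
from collections import Counter

BASES = "ACGT"

def base_counts(seq):
    """Count A, C, G, T in one aligned sequence, ignoring gaps and ambiguity codes."""
    c = Counter(seq.upper())
    return [c[b] for b in BASES]

def composition_chi2(seqs):
    """Pearson chi-square statistic for an (m x 4) table of base counts.

    Returns (statistic, degrees_of_freedom). Larger values mean the base
    composition differs more strongly among the sequences.
    """
    table = [base_counts(s) for s in seqs]
    row_tot = [sum(r) for r in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    if grand == 0:
        raise ValueError("no unambiguous bases to compare")
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / grand  # expected count if composition is shared
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    # Bases absent from every sequence contribute no degree of freedom.
    df = (len(table) - 1) * (sum(1 for t in col_tot if t > 0) - 1)
    return stat, df
```

For two sequences with identical composition the statistic is zero; for `["AAAA", "TTTT"]` it is 8 with one degree of freedom.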
Furthermore, analyzing DNA sequence datasets that violate the homogeneity assumption is useful not only for elucidating the evolutionary mechanisms that have shaped the observed differences in genes and species with atypical substitution processes, but also for providing important clues for the future development of phylogenetic methods. Although violation of the homogeneity assumption is known to adversely affect the accuracy of phylogenetic inference and tests of evolutionary hypotheses, methods for testing this assumption have received little attention to date. Recently, Kumar and Gadagkar (2001) proposed a simple measure, the disparity index (ID), to quantify the difference in observed substitution patterns between molecular sequences, and used it to develop a statistical test. Because the probability distribution of ID is unknown, a Monte Carlo approach must be used to test the null hypothesis of homogeneity. A stable empirical distribution can be obtained only with a large number of simulation replicates, so the accuracy of the Monte Carlo test depends on the number of replicates. The test is therefore computationally time-consuming, which limits its general application. In this thesis, we present an alternative, simple method for testing the homogeneity of the substitution process between DNA sequences; it measures the observed difference in evolutionary patterns for a pair of sequences. The method is simple enough to be applied easily to any DNA sequences that can be aligned.
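To make the disparity index concrete, the Python sketch below computes ID for two aligned sequences as ID = ½ Σ_i (x_i − y_i)² − d, where x_i and y_i are the counts of base i in sequences X and Y and d is the number of sites at which they differ. This form is our reading of Kumar and Gadagkar (2001) and should be checked against the original paper (for example, whether a per-site scaling is used). The Monte Carlo step is a stand-in of our own, not Kumar and Gadagkar's simulation procedure: assuming a symmetric (homogeneous) pattern, the two bases at each site are exchangeable, so each replicate swaps them with probability ½ and recomputes ID. Under this swap null each differing site adds on average one unit to each of the two bases it involves, so E[Σ_i (x_i − y_i)²] = 2d and ID has expectation zero. All function names are hypothetical.

```python
import random

BASES = "ACGT"

def aligned_pairs(x, y):
    """(base_in_x, base_in_y) for sites where both sequences carry A, C, G or T."""
    return [(a, b) for a, b in zip(x.upper(), y.upper())
            if a in BASES and b in BASES]

def disparity_from_pairs(pairs):
    """ID = 0.5 * sum_i (x_i - y_i)**2 - d, computed from site pairs."""
    diff = {b: 0 for b in BASES}  # running value of x_i - y_i for each base i
    d = 0                         # number of differing sites
    for a, b in pairs:
        if a != b:                # identical sites leave x_i - y_i unchanged
            diff[a] += 1
            diff[b] -= 1
            d += 1
    return 0.5 * sum(v * v for v in diff.values()) - d

def disparity_index(x, y):
    """Disparity index for two aligned sequences (gaps and ambiguity codes skipped)."""
    return disparity_from_pairs(aligned_pairs(x, y))

def monte_carlo_pvalue(x, y, replicates=999, seed=1):
    """One-sided randomization p-value for ID under the assumed site-swap null."""
    rng = random.Random(seed)
    pairs = aligned_pairs(x, y)
    observed = disparity_from_pairs(pairs)
    hits = 0
    for _ in range(replicates):
        # Swap the two bases at each site with probability 1/2.
        sim = [(b, a) if rng.random() < 0.5 else (a, b) for a, b in pairs]
        if disparity_from_pairs(sim) >= observed:
            hits += 1
    # Count the observed alignment as one of the replicates.
    return (hits + 1) / (replicates + 1)
```

Each replicate costs a full pass over the alignment, so the run time of this sketch grows linearly with `replicates`, which illustrates why Monte Carlo tests of this kind become slow when a stable null distribution is required.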
We also examine the performance of this simple test under a variety of biologically realistic conditions, develop a Monte Carlo procedure to test the homogeneity of the observed patterns, and compare the new test with Kumar and Gadagkar's test using both computer simulation and empirical data analysis. In fact, this chi-square test remains valid irrespective of whether there is (i) among-site heterogeneity in substitution rates, (ii) among-site correlation in evolutionary rates or patterns, or (iii) among-site variation in substitution patterns. Computer simulations also show...
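The abstract does not state the form of the chi-square statistic. One natural reading of "asymmetry in paired comparison" is Bowker's test of symmetry applied to the 4 × 4 divergence matrix N, where N[i][j] counts sites with base i in one sequence and base j in the other: under a homogeneous process N[i][j] and N[j][i] have the same expectation, and χ² = Σ_{i<j} (N_ij − N_ji)² / (N_ij + N_ji) is referred to a chi-square distribution whose degrees of freedom equal the number of base pairs with N_ij + N_ji > 0 (at most 6). The Python sketch below assumes this form; the thesis's own statistic may differ, and the function names are hypothetical. The p-value uses the closed-form chi-square tail for integer degrees of freedom, so the block needs only the standard library.

```python
import math

BASES = "ACGT"

def divergence_matrix(x, y):
    """4x4 counts: N[i][j] = number of sites with base i in x and base j in y."""
    idx = {b: k for k, b in enumerate(BASES)}
    n = [[0] * 4 for _ in range(4)]
    for a, b in zip(x.upper(), y.upper()):
        if a in idx and b in idx:  # skip gaps and ambiguity codes
            n[idx[a]][idx[b]] += 1
    return n

def chi2_sf(stat, df):
    """Upper-tail probability P(X > stat) for a chi-square variable with integer df >= 1."""
    if stat <= 0:
        return 1.0
    y = stat / 2.0
    # Start from Q(1, y) = exp(-y) for even df, or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5
    # Step up with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q

def symmetry_test(x, y):
    """Bowker-style test of symmetry of the divergence matrix; returns (stat, df, p)."""
    n = divergence_matrix(x, y)
    stat, df = 0.0, 0
    for i in range(4):
        for j in range(i + 1, 4):
            tot = n[i][j] + n[j][i]
            if tot > 0:  # base pairs never exchanged carry no information
                stat += (n[i][j] - n[j][i]) ** 2 / tot
                df += 1
    p = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p
```

For example, ten A→G differences with no G→A differences give χ² = 10 on one degree of freedom (p ≈ 0.0016), whereas a balanced A↔T exchange gives χ² = 0. The chi-square approximation relies on reasonably large off-diagonal counts; for sparse counts, a Monte Carlo reference distribution is the safer choice.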
Keywords: homogeneity, substitution process, chi-square test, power