
Maximum-likelihood models for combined analyses of multiple sequence data

Journal of Molecular Evolution

Abstract

Models of nucleotide substitution were constructed for combined analyses of heterogeneous sequence data (such as those of multiple genes) from the same set of species. The models account for different aspects of the heterogeneity in the evolutionary process of different genes, such as differences in nucleotide frequencies, in substitution rate bias (for example, the transition/transversion rate bias), and in the extent of rate variation across sites. Model parameters were estimated by maximum likelihood and the likelihood ratio test was used to test hypotheses concerning sequence evolution, such as rate constancy among lineages (the assumption of a molecular clock) and proportionality of branch lengths for different genes. The example data from a segment of the mitochondrial genome of six hominoid species (human, common and pygmy chimpanzees, gorilla, orangutan, and siamang) were analyzed. Nucleotides at the three codon positions in the protein-coding regions and from the tRNA-coding regions were considered heterogeneous data sets. Statistical tests showed that the amount of evolution in the sequence data reflected in the estimated branch lengths can be explained by the codon-position effect and lineage effect of substitution rates. The assumption of a molecular clock could not be rejected when the data were analyzed separately or when the rate variation among sites was ignored. However, significant differences in substitution rate among lineages were found when the data sets were combined and when the rate variation among sites was accounted for in the models. Under the assumption that the orangutan and African apes diverged 13 million years ago, the combined analysis of the sequence data estimated the times for the human-chimpanzee separation and for the separation of the gorilla as 4.3 and 6.8 million years ago, respectively.
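The likelihood ratio test of the molecular clock mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual setup in which the clock model is nested within the no-clock model, so that twice the log-likelihood difference is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of branch-length parameters (n - 2 for n species on a fixed bifurcating topology). The function names and the two log-likelihood values are hypothetical placeholders chosen for illustration; they are not taken from the article, and this is not the author's implementation.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc(chi/sqrt(2)) + 2*phi(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1)),
    # where chi = sqrt(x) and phi is the standard normal density.
    chi = math.sqrt(x)
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * phi * total


def clock_test_df(n_species: int) -> int:
    """Degrees of freedom for the clock test on a fixed bifurcating tree.

    Without a clock the unrooted tree has 2n - 3 branch lengths; with a clock
    the rooted tree has n - 1 node ages, so the difference is n - 2.
    """
    return (2 * n_species - 3) - (n_species - 1)


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return the statistic 2 * (lnL_alt - lnL_null) and its chi-square p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(max(stat, 0.0), df)


# Hypothetical log-likelihoods for six species (illustrative values only).
lnl_clock = -2500.0     # null model: rate constancy among lineages
lnl_no_clock = -2494.0  # alternative: one free length per branch
stat, p_value = likelihood_ratio_test(lnl_clock, lnl_no_clock, clock_test_df(6))
print(f"2*delta lnL = {stat:.2f}, df = {clock_test_df(6)}, p = {p_value:.4f}")
```

The same comparison applies to other nested hypotheses named in the abstract, such as proportional branch lengths across genes, provided the degrees of freedom are set to the difference in free parameters between the two models being compared.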




Cite this article

Yang, Z. Maximum-likelihood models for combined analyses of multiple sequence data. J Mol Evol 42, 587–596 (1996). https://doi.org/10.1007/BF02352289


  • DOI: https://doi.org/10.1007/BF02352289
