
Detecting Site-Specific Biochemical Constraints Through Substitution Mapping

Journal of Molecular Evolution

Abstract

The neutral theory of molecular evolution states that most mutations are deleterious or neutral. It follows that the evolutionary rate of a given position in an alignment is a function of the level of constraint acting on that position. Inferring evolutionary rates from a set of aligned sequences is hence a powerful way to detect functionally and/or structurally important positions in a protein. Some positions, however, may be constrained while having a high substitution rate, provided that these substitutions do not affect the biochemical property under constraint. Here, I introduce a new evolutionary rate measure that accounts for the evolution of specific biochemical properties (e.g., volume, polarity, and charge). I then present a new statistical method based on the comparison of two rate measures: a site is said to be constrained for property X if it shows an unexpectedly high conservation of X given its total evolutionary rate. Compared to single-rate methods, the two-rate method offers several advantages: it (i) allows assessment of the significance of the constraint, (ii) provides information on the type of constraint acting on each position, and (iii) detects positions that are not identified by previous methods. I apply this method to a 200-sequence data set of triosephosphate isomerase and report significant cases of positions constrained for polarity, volume, or charge. The three-dimensional localization of these positions shows that they are of potential interest to the molecular evolutionist and to the biochemist.



Acknowledgments

This work was supported by Centre National de la Recherche Scientifique and Action Concertée Incitative “Informatique, Mathématiques et Physique pour la Biologie.” The author would like to thank Nicolas Galtier, Tal Pupko, Itay Mayrose, Adi Stern, Adi Doron, Eyal Privman, Nimrod Rubinstein, Ofir Cohen, Osnat Penn, David Burnstein, and Guillaume Achaz for helpful suggestions on this work, Nicolas Galtier for help with the writing of the manuscript, and Karine Jacquet for help with the ade4 package. This publication is contribution 2008-051 of the Institut des Sciences de l’Evolution de Montpellier (UMR 5554—CNRS).

Author information


Correspondence to Julien Dutheil.


Cite this article

Dutheil, J. Detecting Site-Specific Biochemical Constraints Through Substitution Mapping. J Mol Evol 67, 257–265 (2008). https://doi.org/10.1007/s00239-008-9139-8


  • DOI: https://doi.org/10.1007/s00239-008-9139-8
