Abstract
The neutral theory of molecular evolution states that most mutations are deleterious or neutral. It follows that the evolutionary rate of a given position in an alignment is a function of the level of constraint acting on this position. Inferring evolutionary rates from a set of aligned sequences is hence a powerful method to detect functionally and/or structurally important positions in a protein. Some positions, however, may be constrained while having a high substitution rate, provided that these substitutions do not affect the biochemical property under constraint. Here, I introduce a new evolutionary rate measure accounting for the evolution of specific biochemical properties (e.g., volume, polarity, and charge). I then present a new statistical method based on the comparison of two rate measures: a site is said to be constrained for property X if it shows an unexpectedly high conservation of X given its total evolutionary rate. Compared to single-rate methods, the two-rate method offers several advantages: it (i) allows assessment of the significance of the constraint, (ii) provides information on the type of constraint acting on each position, and (iii) detects positions that are not identified by previous methods. I apply this method to a 200-sequence data set of triosephosphate isomerase and report significant cases of positions constrained for polarity, volume, or charge. The three-dimensional localization of these positions shows that they are of potential interest to the molecular evolutionist and to the biochemist.
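The two-rate comparison can be illustrated with a toy sketch. This is not the method of the paper, only a minimal illustration under stated assumptions: the total rate of a site is stood in for by its number of mapped substitutions, the property-specific rate by the summed absolute change in the property, and the null distribution by resampling the same number of substitutions from a pool observed across all sites. The names `CHARGE`, `property_change`, and `constraint_p_value` are hypothetical, and the charge values are a crude simplification.

```python
import random

# Assumption for this sketch: crude side-chain charges at neutral pH.
# D and E are negative, K and R positive, all other residues neutral.
CHARGE = {aa: 0 for aa in "ACDEFGHIKLMNPQRSTVWY"}
CHARGE.update({"D": -1, "E": -1, "K": 1, "R": 1})


def property_change(substitutions, prop=CHARGE):
    """Sum of absolute property differences over (from, to) substitutions."""
    return sum(abs(prop[a] - prop[b]) for a, b in substitutions)


def constraint_p_value(site_subs, pool, prop=CHARGE, n_draws=10000, seed=0):
    """One-sided Monte Carlo p-value for conservation of a property.

    Estimates the probability that a site with the same number of
    substitutions, drawn at random (with replacement) from `pool`,
    changes the property as little as the observed site does.
    """
    rng = random.Random(seed)
    observed = property_change(site_subs, prop)
    n = len(site_subs)
    hits = 0
    for _ in range(n_draws):
        draw = [rng.choice(pool) for _ in range(n)]
        if property_change(draw, prop) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_draws + 1)


# Hypothetical data: a fast-evolving site that only swaps like-charged residues,
# and a pool in which half of the substitutions change the charge.
fast_but_conserved = [("D", "E"), ("E", "D"), ("K", "R"),
                      ("R", "K"), ("D", "E"), ("E", "D")]
pool = [("D", "K"), ("A", "D"), ("K", "E"),
        ("A", "V"), ("L", "I"), ("D", "E")]

p = constraint_p_value(fast_but_conserved, pool)
print(f"p-value for charge conservation: {p:.4f}")
```

In this toy example the site has six substitutions, so a single-rate measure would rank it as fast evolving, yet none of the substitutions alters the charge. Conditioning on the number of substitutions is what lets the comparison flag such a site. When many sites are tested, the resulting p-values would need a multiple-testing correction.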
Acknowledgments
This work was supported by Centre National de la Recherche Scientifique and Action Concertée Incitative “Informatique, Mathématiques et Physique pour la Biologie.” The author would like to thank Nicolas Galtier, Tal Pupko, Itay Mayrose, Adi Stern, Adi Doron, Eyal Privman, Nimrod Rubinstein, Ofir Cohen, Osnat Penn, David Burnstein, and Guillaume Achaz for helpful suggestions on this work, Nicolas Galtier for help with the writing of the manuscript, and Karine Jacquet for help with the ade4 package. This publication is contribution 2008-051 of the Institut des Sciences de l’Evolution de Montpellier (UMR 5554—CNRS).
Cite this article
Dutheil, J. Detecting Site-Specific Biochemical Constraints Through Substitution Mapping. J Mol Evol 67, 257–265 (2008). https://doi.org/10.1007/s00239-008-9139-8
DOI: https://doi.org/10.1007/s00239-008-9139-8