Journal of Molecular Biology
Convergent Evolution of Enzyme Active Sites Is not a Rare Phenomenon
Introduction
How common is the convergent evolution of active sites in enzymes? In convergent evolution of enzymes, non-homologous enzymes evolve in separate biological contexts to catalyse the same or similar biochemical transformations. Often such enzymes have nothing in common beyond their function. However, there are several documented cases of convergently evolved enzymes that, though structurally non-homologous, have identical or closely related active site residues in a very similar geometry. Convergence was first observed in the 1970s, when the catalytic Ser-His-Asp triad of the trypsin family of serine proteases1 was found, with virtually the same geometry, in the structurally distinct (and hence non-homologous) enzyme subtilisin.2., 3. Since then, further cases of active site convergence have been reported (for instance in tyrosine phosphatases4 and aldo-keto reductases5), but to date there has been no comprehensive and systematic analysis of the relative frequencies of divergence and convergence. Here we use a recently developed database of enzyme active sites, together with a program that compares the positions of residues in 3D, to provide the first such systematic survey.
Central to the description of an enzyme's function are the transformation it performs and the mechanistic strategy it employs. The transformation is described by the Enzyme Commission (EC) classification,6 which assigns each enzyme a four-digit EC number. The first digit, the enzyme class, describes the overall chemical reaction that the enzyme catalyses; the meaning of the subsequent digits depends on the class. The fourth digit describes the specificity of the reaction by defining the particular substrate/product or the cofactors used. The mechanism by which the transformation is achieved depends on the functional residues of the protein, the key ones forming the active site. Active sites are not consistently defined in the literature, however; a recent analysis7 defined active site residues as those that (i) are directly involved in the catalytic mechanism, (ii) exert an effect on a residue or water molecule directly involved in the mechanism, (iii) stabilize a proposed transition-state intermediate, or (iv) affect the substrate or cofactor so as to aid catalysis. This definition formed the basis for the Catalytic Site Atlas (CSA),8 which uses literature descriptions to annotate the active site residues of many enzymes of known 3D structure.
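To make the EC hierarchy concrete, the short sketch below splits an EC number into its four fields and truncates it to a chosen level, which is all that is needed to decide whether two enzymes share a three-digit group or a full four-digit number. The function names `parse_ec` and `ec_prefix` are hypothetical and not part of any EC or CSA tool; trypsin (EC 3.4.21.4) and subtilisin (EC 3.4.21.62) serve as the example pair.

```python
def parse_ec(ec: str) -> tuple[str, ...]:
    """Split an EC number such as '3.4.21.62' into its four fields."""
    fields = tuple(ec.strip().split("."))
    if len(fields) != 4:
        raise ValueError(f"expected four EC fields, got {ec!r}")
    return fields


def ec_prefix(ec: str, level: int) -> str:
    """Return the first `level` fields (1 = class only, 4 = full number)."""
    if not 1 <= level <= 4:
        raise ValueError("level must be between 1 and 4")
    return ".".join(parse_ec(ec)[:level])


trypsin, subtilisin = "3.4.21.4", "3.4.21.62"
print(ec_prefix(trypsin, 3) == ec_prefix(subtilisin, 3))  # True: same three-digit group
print(trypsin == subtilisin)                              # False: different four-digit numbers
```

Fields are kept as strings so that placeholder entries such as "-" in incompletely classified numbers do not break the parsing.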
The complex inter-relationships between enzyme structure and function result from different evolutionary processes. Homologous proteins can be identified by their adoption of a common three-dimensional structure, even when their sequence similarity is below the level detectable by current methods. Expert assessment of structural and functional features has led to superfamily classifications in databases such as Structural Classification of Proteins (SCOP)9 and Class, Architecture, Topology and Homology (CATH).10 A superfamily groups homologous protein domains of known structure but excludes domains whose similar fold is considered to be the result of convergence.
Divergent evolution from a common ancestor often leads to enzymes that share the same active site residues and the same underlying mechanism but act on different, often chemically related, substrates. More extensive divergence can modify the active site residues, although the enzymes still employ similar mechanisms.7., 11., 12.
Convergent evolution of enzyme function can manifest itself in two distinct, though sometimes overlapping, ways. The first is when non-homologous enzymes carry out the same transformation, as expressed by the same four-digit EC number; this has recently been studied systematically.13., 14., 15. Such enzymes can be termed transformational analogues. The second is when the same (same four-digit EC) or a related (same three-digit EC) transformation is effected by a similar disposition of residues in the active site, as exemplified by the Ser-His-Asp catalytic triad shared by the trypsin family and subtilisin.16 Such enzymes are mechanistic analogues. This distinction between transformational and mechanistic analogues, which follows the one given by Doolittle,17 is not mutually exclusive: two enzymes belong to both classes if they perform exactly the same overall reaction with the same mechanism.
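A minimal sketch of how the two labels could be assigned to a pair of enzymes, assuming each enzyme is summarised by its four-digit EC number and a SCOP superfamily label, and that a separate structural comparison has already decided whether the catalytic residues are similarly arranged (the `similar_site` flag). The `Enzyme` class and `analogue_types` function are hypothetical names used only for illustration. Because the two categories are not mutually exclusive, the function returns a set that may contain both labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    ec: str           # four-field EC number, e.g. "3.4.21.4"
    superfamily: str  # SCOP superfamily label (any consistent identifier)


def analogue_types(a: Enzyme, b: Enzyme, similar_site: bool) -> set[str]:
    """Return the analogue labels that apply to a pair of enzymes.

    similar_site: whether the catalytic residues of the two enzymes adopt a
    similar 3D arrangement, as decided by a separate structural comparison.
    """
    if a.superfamily == b.superfamily:
        return set()  # homologous pair: divergence, not convergence
    labels = set()
    if a.ec == b.ec:
        labels.add("transformational")  # same four-digit EC number
    if similar_site and a.ec.split(".")[:3] == b.ec.split(".")[:3]:
        labels.add("mechanistic")  # same or related (three-digit) reaction
    return labels


trypsin = Enzyme("trypsin", "3.4.21.4", "trypsin-like serine proteases")
subtilisin = Enzyme("subtilisin", "3.4.21.62", "subtilisin-like")
print(analogue_types(trypsin, subtilisin, similar_site=True))  # {'mechanistic'}
```

Trypsin and subtilisin come out as mechanistic but not transformational analogues, because they share the three-digit group 3.4.21 but not the full EC number.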
The occurrence of transformational analogues has been studied by Galperin et al.13 and Hegyi and Gerstein.14 Galperin et al. identified 105 EC numbers that are associated with two or more apparently unrelated sequences. They showed that in 34 of these 105 cases the candidate analogous enzymes had distinct structural folds, while in ten cases the same fold was detected; no structural information was available at that time for the remaining 61 cases. The authors argue that analogous enzymes most likely evolved through the recruitment of existing enzymes that underwent a change in specificity or reaction mechanism. They also observed a correlation between the number of transformational analogues and the genome size of an organism, and argue that biochemical diversity is a luxury enjoyed mostly by organisms with large genomes.
Hegyi and Gerstein cross-referenced SCOP, Swiss-Prot18 and the ENZYME19 database to identify folds with many functions (evolutionary divergence) and functions mapping to many folds (evolutionary convergence). They found that more than half of the enzymatic functions are associated with at least two different folds, while fewer than half of the folds with enzymatic activity have at least two functions. They also found that enzyme-related folds have on average 1.8 functions per fold and 2.5 folds per function, and identified a list of 13 EC numbers mapping to more than one fold. Because this study operated at the fold level, it could not detect examples of convergent evolution between evolutionarily unrelated proteins that share the same fold.
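The per-fold and per-function averages quoted above can be computed from a list of fold-to-EC assignments, as in the hypothetical sketch below. It does not reproduce Hegyi and Gerstein's data or their exact counting rules; the toy input is invented purely to show the arithmetic.

```python
from collections import defaultdict


def fold_function_stats(pairs):
    """pairs: iterable of (fold, ec) assignments.

    Returns (mean functions per fold, mean folds per function,
    sorted EC numbers that map to more than one fold).
    """
    functions_of = defaultdict(set)  # fold -> set of EC numbers
    folds_of = defaultdict(set)      # EC number -> set of folds
    for fold, ec in pairs:
        functions_of[fold].add(ec)
        folds_of[ec].add(fold)
    if not folds_of:
        raise ValueError("no assignments given")
    per_fold = sum(len(s) for s in functions_of.values()) / len(functions_of)
    per_function = sum(len(s) for s in folds_of.values()) / len(folds_of)
    multi_fold = sorted(ec for ec, folds in folds_of.items() if len(folds) > 1)
    return per_fold, per_function, multi_fold


# Toy input (invented): fold A carries three functions, and EC 1.1.1.1
# is found on two different folds.
toy = [("A", "1.1.1.1"), ("A", "1.1.1.2"), ("A", "1.1.1.3"), ("B", "1.1.1.1")]
print(fold_function_stats(toy))  # (2.0, 1.3333333333333333, ['1.1.1.1'])
```

Both averages share the same numerator (the number of distinct fold-function links) and differ only in the denominator, so a dataset with more functions than folds necessarily shows more folds per function than functions per fold. Because only folds are counted, two unrelated superfamilies sharing a fold would not register as separate entries, which is the limitation noted above.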
Neither of these two works considered the structure of the active sites and the details of the catalytic mechanisms involved. The purpose of the present work is therefore to integrate, for the first time, detailed structural and mechanistic information about active sites to identify and characterise examples of convergent evolution and to investigate how widespread this phenomenon is across enzyme space.
Approach
Figure 1 shows the procedure followed (see Materials and Methods for details). Starting with the Catalytic Site Atlas (CSA), 169 different three-digit EC groups were identified that included descriptions of the catalytic residues. The program Query3d20 was used to identify structurally similar active sites within each three-digit EC group. The SCOP database was then used to filter this list and identify 67 groups of three-digit EC numbers that involved matches of active site residues between
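The grouping and filtering steps described above can be sketched as follows, assuming each catalytic site carries a PDB code, a four-digit EC number and a SCOP superfamily label. The structural comparison itself is passed in as a `sites_match` callback, a hypothetical stand-in for Query3d whose actual interface is not described here; `Site` and `convergent_pairs` are likewise illustrative names. The superfamily test is applied before the more expensive structural comparison; since a pair is kept only if it passes both tests, this order does not change the result.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, NamedTuple, Tuple


class Site(NamedTuple):
    pdb: str          # PDB code of the structure
    ec: str           # four-field EC number
    superfamily: str  # SCOP superfamily label


def convergent_pairs(
    sites: List[Site], sites_match: Callable[[Site, Site], bool]
) -> Dict[str, List[Tuple[Site, Site]]]:
    """Return, per three-digit EC group, the pairs of sites from different
    SCOP superfamilies whose catalytic residues match structurally."""
    groups = defaultdict(list)
    for site in sites:
        groups[".".join(site.ec.split(".")[:3])].append(site)
    hits = {}
    for group, members in groups.items():
        pairs = [
            (a, b)
            for a, b in combinations(members, 2)
            # cheap SCOP filter first; the structural test runs only if needed
            if a.superfamily != b.superfamily and sites_match(a, b)
        ]
        if pairs:
            hits[group] = pairs
    return hits


# Placeholder PDB codes and an always-true comparison, for illustration only.
sites = [
    Site("pdbA", "3.4.21.4", "trypsin-like"),
    Site("pdbB", "3.4.21.62", "subtilisin-like"),
    Site("pdbC", "3.4.21.5", "trypsin-like"),
]
hits = convergent_pairs(sites, sites_match=lambda a, b: True)
print({g: [(a.pdb, b.pdb) for a, b in p] for g, p in hits.items()})
# {'3.4.21': [('pdbA', 'pdbB'), ('pdbB', 'pdbC')]}
```

The pair (pdbA, pdbC) is dropped because both sites belong to the same superfamily, i.e. they are homologues rather than analogues.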
Mechanistic Analogues
The CSA contains 169 different three-digit EC groups which were considered here (catalytic sites spanning multiple chains and enzymes not classified in SCOP are not included in this work; see Materials and Methods). In 15% (26/169) of the three-digit EC groups there are two or more non-homologous proteins with similar catalytic sites (i.e. convergent evolution, mechanistic analogues) (see Figure 3(a)). This statistic represents the proportion of three-digit EC groups in which one or more
Transformational analogues in CSA
The analysis of transformational analogues involves identifying all those cases where unrelated enzymes perform the same chemical transformation (i.e. same four-digit EC number). Performing such an analysis on the same CSA dataset used for the identification of mechanistic analogues allows us to draw a parallel statistic and explore the relationship between these two modes of convergent evolution.
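A sketch of the transformational-analogue count, assuming one (EC number, SCOP superfamily) assignment per enzyme domain: an EC number qualifies when it maps to two or more superfamilies. The function name and the input labels are hypothetical; carbonic anhydrase (EC 4.2.1.1), whose alpha and beta classes are structurally unrelated, is used only as an illustrative input.

```python
from collections import defaultdict


def transformational_analogues(assignments):
    """assignments: iterable of (ec, superfamily) pairs, one per enzyme domain.

    Returns {ec: sorted superfamily labels} for every four-digit EC number
    that is carried out by two or more SCOP superfamilies.
    """
    superfamilies_of = defaultdict(set)
    for ec, superfamily in assignments:
        superfamilies_of[ec].add(superfamily)
    return {ec: sorted(sfs) for ec, sfs in superfamilies_of.items() if len(sfs) > 1}


data = [
    ("4.2.1.1", "alpha-class CA"),
    ("4.2.1.1", "beta-class CA"),
    ("3.4.21.4", "trypsin-like"),
    ("3.4.21.4", "trypsin-like"),  # homologous duplicate: not an analogue
]
print(transformational_analogues(data))  # {'4.2.1.1': ['alpha-class CA', 'beta-class CA']}
```

Dividing the number of returned EC numbers by the number of distinct EC numbers in the input would give the proportion that parallels the mechanistic-analogue statistic.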
The enzymes contained in CSA and classified in SCOP represent 951 different four-digit EC numbers;
Discussion
One question which directly relates to this analysis is what types of evolutionary constraints lead to multiple independent inventions of the same reaction mechanism, or alternatively, to the development of different strategies to perform the same chemical reaction. Convergent evolution to the same reaction mechanism is very likely guided by specific mechanistic constraints inherent to the reaction being performed. A striking example of this is shown by the multiple mechanistic similarities
Coverage and Quality of the Datasets
The analysis of mechanistic analogues is limited to the CSA dataset and it is therefore important to assess its coverage. The 2.2.2 version of CSA used here contains 12,987 PDB codes that are present in SCOP, i.e. 98% of the 13,230 that are classified in SCOP and have an EC number assigned in PDBSprotEC234 (a database linking PDB chains with the EC classification system). The coverage in terms of EC numbers is also extensive since 951 four-digit EC numbers (excluding those where one of the
Concluding Remarks
This work lists several examples of structural matches between active sites, each of which has been manually checked to verify that the structural similarity reflects a genuine functional relationship. These examples can therefore serve as a benchmark set for testing how well a local structural comparison algorithm identifies functionally meaningful structural similarities between proteins.
The key results of this systematic, general analysis of evolutionary convergence
Materials and Methods
The method is summarised in Figure 1. The aim is to compare structurally the active sites of enzymes sharing the first three digits of the EC number to identify instances of convergent evolution. The 2.2.2 version of the Catalytic Site Atlas (CSA)8 (downloaded on 1st March 2007) provided details of the catalytic residues. Each residue in the protein was represented as two points, the Cα atom and the geometric centroid of all the side-chain atoms. The active site was mapped by first calculating
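The two-point residue representation can be sketched as below, assuming atom coordinates are available per residue as a mapping from PDB atom names to (x, y, z). Two choices are assumptions rather than details given in the text: the backbone atom set (N, CA, C, O, OXT) excluded from the side-chain centroid, and the fallback to the Cα position for glycine, which has no side-chain atoms. For comparing two sites the sketch uses a superposition-free distance RMSD (the RMS difference between corresponding intra-site distances); this is a swapped-in illustrative measure, not the scoring used by Query3d.

```python
import math
from itertools import combinations

# Assumption: these atoms are treated as backbone; all remaining (heavy)
# atoms of the residue count as side chain.
BACKBONE = {"N", "CA", "C", "O", "OXT"}


def residue_points(atoms):
    """atoms: {atom name: (x, y, z)} for one residue.

    Returns (C-alpha position, side-chain centroid). Glycine has no
    side-chain atoms, so (assumption) its centroid falls back to C-alpha.
    """
    ca = tuple(atoms["CA"])
    side = [xyz for name, xyz in atoms.items() if name not in BACKBONE]
    if not side:
        return ca, ca
    centroid = tuple(sum(p[k] for p in side) / len(side) for k in range(3))
    return ca, centroid


def distance_rmsd(site_a, site_b):
    """RMS difference between corresponding intra-site distances.

    site_a, site_b: lists of residues (atom dicts) given in matching order.
    Superposition-free: 0.0 means identical internal geometry.
    """
    pa = [pt for res in site_a for pt in residue_points(res)]
    pb = [pt for res in site_b for pt in residue_points(res)]
    if len(pa) != len(pb) or not pa:
        raise ValueError("sites must be non-empty and of equal size")
    sq = [
        (math.dist(pa[i], pa[j]) - math.dist(pb[i], pb[j])) ** 2
        for i, j in combinations(range(len(pa)), 2)
    ]
    return math.sqrt(sum(sq) / len(sq))


ser = {"N": (0, 0, 0), "CA": (1, 0, 0), "C": (2, 0, 0), "O": (2, 1, 0),
       "CB": (1, 1, 0), "OG": (1, 2, 0)}
moved = {name: (x + 5, y, z) for name, (x, y, z) in ser.items()}
print(residue_points(ser))             # ((1, 0, 0), (1.0, 1.5, 0.0))
print(distance_rmsd([ser], [moved]))   # 0.0
```

Working with internal distances avoids computing an optimal superposition, at the cost of ignoring chirality; a superposition-based RMSD would distinguish mirror-image arrangements that this measure treats as identical.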
Acknowledgements
The authors thank Dr Gabriele Ausiello for kindly providing the source code of the Query3d program. P.F.G. was supported by Telethon grant GGP04273 and M.N.W. was supported by a BBSRC grant.
References
- Analysis of catalytic residues in enzyme active sites. J. Mol. Biol. (2002).
- Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol. (2001).
- The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol. (1999).
- Comparison of the active site stereochemistry and substrate conformation in alpha-chymotrypsin and subtilisin BPN'. J. Mol. Biol. (1972).
- Convergent evolution: the need to be explicit. Trends Biochem. Sci. (1994).
- Understanding nature's catalytic toolkit. Trends Biochem. Sci. (2005).
- Catalytic triads and their relatives. Trends Biochem. Sci. (1998).
- Protein folds and functions. Structure (1998).
- GDP-fucose synthetase from Escherichia coli: structure of a unique member of the short-chain dehydrogenase/reductase family that catalyzes two distinct reactions at the same active site. Structure (1998).
- Crystal structure of L-2-hydroxyisocaproate dehydrogenase from Lactobacillus confusus at 2.2 Å resolution. An example of strong asymmetry between subunits. J. Mol. Biol. (1995).
- A novel NAD-binding protein revealed by the crystal structure of 2,3-diketo-L-gulonate reductase (YiaK). J. Biol. Chem.
- Methanopyrus kandleri glutamyl-tRNA reductase. J. Biol. Chem.
- The X-ray structure of Escherichia coli enoyl reductase with bound NAD+ at 2.1 Å resolution. J. Mol. Biol.
- Crystallographic refinement of lignin peroxidase at 2 Å. J. Biol. Chem.
- Crystal structure of an aromatic ring opening dioxygenase LigAB, a protocatechuate 4,5-dioxygenase, under aerobic conditions. Structure.
- An archetypical extradiol-cleaving catecholic dioxygenase: the crystal structure of catechol 2,3-dioxygenase (metapyrocatechase) from Pseudomonas putida mt-2. Structure.
- The mechanism of DNA cytosine-5 methylation. Kinetic and mutational dissection of HhaI methyltransferase. J. Biol. Chem.
- The active site of Escherichia coli UDP-N-acetylglucosamine acyltransferase. Chemical modification and site-directed mutagenesis. J. Biol. Chem.
- Melatonin biosynthesis: the structure of serotonin N-acetyltransferase at 2.5 Å resolution suggests a catalytic mechanism. Mol. Cell.
- Analysis of the structure, substrate specificity, and mechanism of squash glycerol-3-phosphate (1)-acyltransferase. Structure.
- Crystallographic analysis of the reaction pathway of Zoogloea ramigera biosynthetic thiolase. J. Mol. Biol.
- The 1.8 Å crystal structure and active-site architecture of beta-ketoacyl-acyl carrier protein synthase III (FabH) from Escherichia coli. Structure.
- The structure of enzyme IIAlactose from Lactococcus lactis reveals a new fold and points to possible interactions of a multicomponent system. Structure.
- The structure of a binary complex between a mammalian mevalonate kinase and ATP: insights into the reaction mechanism and human inherited disease. J. Biol. Chem.
- Conformational changes in the reaction of pyridoxal kinase. J. Biol. Chem.
- Structure of type IIbeta phosphatidylinositol phosphate kinase: a protein kinase fold flattened for interfacial phosphorylation. Cell.
- Structure of an enzyme required for aminoglycoside antibiotic resistance reveals homology to eukaryotic protein kinases. Cell.
- Crystal structure of the complex of phosphofructokinase from Escherichia coli with its reaction products. J. Mol. Biol.
- Conformational changes during the catalytic cycle of gluconate kinase as revealed by X-ray crystallography. J. Mol. Biol.
- The structural mechanism of translocation and helicase activity in T7 RNA polymerase. Cell.
- The structure of truncated recombinant human bile salt-stimulated lipase reveals bile salt-independent conformational flexibility at the active-site loop and provides insights into heparin binding. J. Mol. Biol.
- Crystal structure of the catalytic domain of the chemotaxis receptor methylesterase, CheB. J. Mol. Biol.
- Structure and mechanism of human cytosolic phospholipase A(2). Biochim. Biophys. Acta.
- Structure and function of the protein tyrosine phosphatases. Trends Biochem. Sci.
- Crystal structure of Aspergillus niger pH 2.5 acid phosphatase at 2.4 Å resolution. J. Mol. Biol.
- Active site residues of human beta-glucuronidase. Evidence for Glu(540) as the nucleophile and Glu(451) as the acid-base residue. J. Biol. Chem.
- The three-dimensional structure of invertase (beta-fructosidase) from Thermotoga maritima reveals a bimodular arrangement and an evolutionary relationship between retaining and inverting glycosidases. J. Biol. Chem.
- 1.68-Å crystal structure of endopolygalacturonase II from Aspergillus niger and identification of active site residues by site-directed mutagenesis. J. Biol. Chem.
- Tetrameric dipeptidyl peptidase I directs substrate specificity by use of the residual pro-part domain. FEBS Letters.
- Identification of serine 624, aspartic acid 702, and histidine 734 as the catalytic triad residues of mouse dipeptidyl-peptidase IV (CD26). A member of a novel family of non-classical serine hydrolases. J. Biol. Chem.
- Three-dimensional structure of human gamma-glutamyl hydrolase. A class I glutamine amidotransferase adapted for a complex substrate. J. Biol. Chem.
- Crystal structure of an acylpeptide hydrolase/esterase from Aeropyrum pernix K1. Structure.
- The crystal structure of pyroglutamyl peptidase I from Bacillus amyloliquefaciens reveals a new structure for a cysteine protease. Structure.
- The structure of ClpP at 2.3 Å resolution suggests a model for ATP-dependent proteolysis. Cell.
- Three-dimensional structure of tosyl-alpha-chymotrypsin. Nature.
- Serine proteases: structure and mechanism of catalysis. Annu. Rev. Biochem.
- A comparison of the three-dimensional structures of subtilisin BPN' and subtilisin Novo. Cold Spring Harbor Symp. Quant. Biol.
- Crystal structure of bovine heart phosphotyrosyl phosphatase at 2.2-Å resolution. Biochemistry.
- Structure of 3 alpha-hydroxysteroid/dihydrodiol dehydrogenase complexed with NADP+. Biochemistry.
- Enzyme nomenclature. Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology.