Journal of Molecular Biology
Volume 372, Issue 3, 21 September 2007, Pages 817-845

Convergent Evolution of Enzyme Active Sites Is not a Rare Phenomenon

https://doi.org/10.1016/j.jmb.2007.06.017

Abstract

Since convergent evolution of enzyme active sites was first identified in serine proteases, other individual instances of this phenomenon have been documented. However, a systematic analysis assessing the frequency of this phenomenon across enzyme space is still lacking. This work uses the Query3d structural comparison algorithm to integrate for the first time detailed knowledge about catalytic residues, available through the Catalytic Site Atlas (CSA), with the evolutionary information provided by the Structural Classification of Proteins (SCOP) database.

This study considers two modes of convergent evolution: (i) mechanistic analogues, which are enzymes that use the same mechanism to perform related, but possibly different, reactions (considered here as sharing the first three digits of the EC number); and (ii) transformational analogues, which catalyse exactly the same reaction (identical EC numbers) but may use different mechanisms.
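The two definitions can be expressed as a simple decision rule. The following Python sketch is illustrative only and is not the authors' pipeline: the `Enzyme` record, the `classify_pair` function and the superfamily labels are hypothetical, and the `similar_active_site` flag stands in for the result of a structural comparison of catalytic residues, which is not computed here. Membership of different superfamilies is used as the assumed criterion for non-homology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    """Hypothetical record for one enzyme of known structure."""
    name: str
    ec: str            # four-digit EC number, e.g. "3.4.21.4"
    superfamily: str   # placeholder superfamily label (homology criterion)


def ec_prefix(ec: str, digits: int) -> tuple:
    """Return the first `digits` fields of a four-digit EC number."""
    fields = ec.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected a four-digit EC number, got {ec!r}")
    return tuple(fields[:digits])


def classify_pair(a: Enzyme, b: Enzyme, similar_active_site: bool) -> set:
    """Label a pair of enzymes with the modes of convergence it may show.

    `similar_active_site` is an input, not something this sketch computes.
    """
    labels = set()
    if a.superfamily == b.superfamily:
        # Same superfamily: treated as homologous, so no convergence is called.
        return labels
    if ec_prefix(a.ec, 4) == ec_prefix(b.ec, 4):
        labels.add("transformational")
    if similar_active_site and ec_prefix(a.ec, 3) == ec_prefix(b.ec, 3):
        labels.add("mechanistic")
    return labels


# Illustrative pair: the superfamily labels are placeholders, not SCOP ids.
trypsin = Enzyme("trypsin", "3.4.21.4", "superfamily-A")
subtilisin = Enzyme("subtilisin", "3.4.21.62", "superfamily-B")
print(classify_pair(trypsin, subtilisin, similar_active_site=True))
```

A pair can receive both labels, which reflects the fact that the two classes are not mutually exclusive.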

Mechanistic analogues were identified in 15% (26 out of 169) of the three-digit EC groups considered, showing that this phenomenon is not rare. Furthermore, 11 of these groups also contain transformational analogues. The catalytic triad is the most widespread active site; the results of the structural comparison show that this mechanism, or variations thereof, is present in 23 superfamilies.

Transformational analogues were identified for 45 of the 951 four-digit EC numbers present within the CSA, and about half of these were also mechanistic analogues exhibiting convergence of their active sites. This analysis has also been extended to the whole Protein Data Bank to provide a complete and manually curated list of all the transformational analogues whose structure is classified in SCOP.
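As a quick check on the proportions quoted in the abstract, the counts can be converted to percentages. The helper name `percentage` and the dictionary labels are illustrative; only the counts come from the text.

```python
def percentage(count: int, total: int) -> float:
    """Return count/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Counts as reported in the abstract.
reported = {
    "three-digit EC groups with mechanistic analogues": (26, 169),
    "of those, groups also containing transformational analogues": (11, 26),
    "four-digit EC numbers with transformational analogues": (45, 951),
}

for label, (count, total) in reported.items():
    print(f"{label}: {count}/{total} = {percentage(count, total):.1f}%")
```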

The results of this work show that the phenomenon of convergent evolution is not rare, especially when considering large enzymatic families.

Introduction

How common an occurrence is the convergent evolution of active sites in enzymes? In convergent evolution of enzymes, non-homologous enzymes evolve in separate biological contexts to catalyse the same or a similar biochemical transformation. Often such enzymes have nothing in common beyond their function. However, there are several documented occurrences of convergently evolved enzymes which, though structurally non-homologous, have identical or closely related active site residues with a very similar geometry. The first observation of convergence was in the 1970s when the catalytic Ser-His-Asp triad of the trypsin family of serine proteases1 was also observed with virtually the same geometry in the structurally distinct (and hence non-homologous) enzyme subtilisin.2,3 Since then there have been additional reports of convergence of active sites (for instance in tyrosine phosphatases4 and aldo-keto reductases5), but to date there has been no comprehensive and systematic analysis to assess the relative frequencies of divergence and convergence. Here we use a recently developed database of enzyme active sites together with a program for comparing the positions of residues in 3D to provide, for the first time, such a systematic survey.

Central to the description of the function of an enzyme are the concepts of the transformation it performs and the mechanistic strategy it employs. The transformation performed by an enzyme is described via the Enzyme Commission (EC) classification.6 Individual enzymes are assigned a four-digit EC number. The first of these numbers, the enzyme class, describes the overall chemical reaction that the enzyme catalyses, with the subsequent numbers having different meanings depending on the class. The fourth digit describes the specificity of the enzyme reaction by defining the specific reaction substrate/product or the cofactors used. The mechanism by which this transformation is performed is the consequence of the functional residues in the protein, with the key residues forming the active site. Active site residues are not consistently defined in the literature, but a recent analysis7 has defined them as residues that are (i) directly involved in the catalytic mechanism, (ii) exerting an effect on a residue or water directly involved in the mechanism, (iii) stabilizing a proposed transition-state intermediate, or (iv) affecting the substrate or cofactor so as to aid catalysis. This definition formed the basis for the development of the Catalytic Site Atlas (CSA),8 which uses literature descriptions to classify the active site residues in many of the enzymes of determined 3D structure.

The complex inter-relationships between enzyme structure and function are the results of different evolutionary processes. Homologous proteins can be identified by their adoption of a common three-dimensional structure even if the level of sequence similarity is below that detectable by current methods. Expert assessment of structural and functional features has led to the assignment of the superfamily classification in databases such as Structural Classification of Proteins (SCOP)9 and Class, Architecture, Topology and Homology (CATH).10 A superfamily groups homologous protein domains of determined structure but excludes those domains that adopt a similar fold considered to be the result of convergence.

Divergent evolution from a common ancestor often leads to enzymes with the same active site residues and the same underlying mechanism but acting on different, and often chemically related, substrates. More extensive divergence can lead to the modification of the active site residues but the enzymes are still involved in similar mechanisms.7,11,12

Convergent evolution of enzyme function can manifest in two distinct, but sometimes co-occurring, ways. The first is when non-homologous enzymes deliver the same transformation, as expressed by the same four-digit EC number; this has recently been studied systematically.13,14,15 In this case the enzymes involved can be termed transformational analogues. The other situation is when the same (same four-digit EC) or related (same three-digit EC) enzyme transformation is effected by a similar disposition of residues in the active site, as exemplified in the Ser-His-Asp catalytic triad shared by the trypsin family and subtilisin.16 Such enzymes are mechanistic analogues. This distinction between transformational and mechanistic analogues, which follows the one given by Doolittle,17 is not exclusive because two enzymes are assigned to both classes if they perform exactly the same overall reaction with the same mechanism.

The occurrence of transformational analogues has been studied by Galperin et al.13 and Hegyi and Gerstein.14 Galperin et al. identified 105 EC numbers that are represented by two or more apparently unrelated sequences. They were able to show that 34 of these 105 pairs of candidate analogous enzymes had distinct structural folds, while in ten cases the same fold was detected. No structural information was available for the remaining enzymes at that time. The authors argue that the most likely mechanism for the evolution of analogous enzymes appears to be the recruitment of existing enzymes that undergo a change in specificity or reaction mechanism. They also observe a correlation between the number of transformational analogues and the genome size of an organism and argue that biochemical diversity is a luxury enjoyed mostly by organisms with large genomes.

Hegyi and Gerstein cross-referenced SCOP, Swissprot18 and the ENZYME19 database to identify folds with many functions (evolutionary divergence) and functions mapping to many folds (evolutionary convergence). The authors found that more than half of the enzymatic functions are associated with at least two different folds, while fewer than half of the folds with enzymatic activity have at least two functions. They also found that for the enzyme-related folds there are on average 1.8 functions per fold and 2.5 folds per function, and identified a list of 13 EC numbers mapping to more than one fold. This last work, operating at the fold level, potentially missed all those examples of convergent evolution involving evolutionarily unrelated proteins possessing the same fold.
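The kind of fold-to-function cross-referencing described above can be sketched as a pair of set-valued mappings. This is a minimal illustration, not the procedure used by Hegyi and Gerstein: the function name `fold_function_stats` and the toy fold labels are hypothetical, and the toy assignments are invented solely to exercise the code.

```python
from collections import defaultdict


def fold_function_stats(assignments):
    """Summarise (fold, EC number) assignments.

    Returns the mean number of functions per fold, the mean number of
    folds per function, and the sorted EC numbers found in more than
    one fold.
    """
    functions_by_fold = defaultdict(set)
    folds_by_function = defaultdict(set)
    for fold, ec in assignments:
        functions_by_fold[fold].add(ec)
        folds_by_function[ec].add(fold)
    if not functions_by_fold:
        raise ValueError("no assignments given")

    mean_functions_per_fold = (
        sum(len(ecs) for ecs in functions_by_fold.values()) / len(functions_by_fold)
    )
    mean_folds_per_function = (
        sum(len(folds) for folds in folds_by_function.values()) / len(folds_by_function)
    )
    multi_fold_functions = sorted(
        ec for ec, folds in folds_by_function.items() if len(folds) > 1
    )
    return mean_functions_per_fold, mean_folds_per_function, multi_fold_functions


# Toy data with made-up fold labels, for illustration only.
toy = [
    ("fold_A", "1.1.1.1"),
    ("fold_A", "1.1.1.2"),
    ("fold_B", "1.1.1.1"),
    ("fold_C", "3.4.21.4"),
]
print(fold_function_stats(toy))
```

Because the grouping key here is the fold, two unrelated proteins that happen to share a fold are counted once, which is the limitation noted in the text.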

Neither of these two works considered the structure of the active sites and the details of the catalytic mechanisms involved. The purpose of the present work is therefore to integrate, for the first time, detailed structural and mechanistic information about active sites to identify and characterise examples of convergent evolution and to investigate how widespread this phenomenon is across enzyme space.

Approach

Figure 1 shows the procedure followed (see Materials and Methods for details). Starting with the Catalytic Site Atlas (CSA), 169 different three-digit EC groups were identified that included descriptions of the catalytic residues. The program Query3d20 was used to identify structurally similar active sites within each three-digit EC group. The SCOP database was then used to filter this list and identify 67 groups of three-digit EC numbers that involved matches of active site residues between

Mechanistic Analogues

The CSA contains 169 different three-digit EC groups which were considered here (catalytic sites spanning multiple chains and enzymes not classified in SCOP are not included in this work; see Materials and Methods). In 15% (26/169) of the three-digit EC groups there are two or more non-homologous proteins with similar catalytic sites (i.e. convergent evolution, mechanistic analogues) (see Figure 3(a)). This statistic represents the proportion of three-digit EC groups in which one or more

Transformational analogues in CSA

The analysis of transformational analogues involves identifying all those cases where unrelated enzymes perform the same chemical transformation (i.e. same four-digit EC number). Performing such an analysis on the same CSA dataset used for the identification of mechanistic analogues allows us to draw a parallel statistic and explore the relationship between these two modes of convergent evolution.

The enzymes contained in CSA and classified in SCOP represent 951 different four-digit EC numbers;

Discussion

One question which directly relates to this analysis is what types of evolutionary constraints lead to multiple independent inventions of the same reaction mechanism, or alternatively, to the development of different strategies to perform the same chemical reaction. Convergent evolution to the same reaction mechanism is very likely guided by specific mechanistic constraints inherent to the reaction being performed. A striking example of this is shown by the multiple mechanistic similarities

Coverage and Quality of the Datasets

The analysis of mechanistic analogues is limited to the CSA dataset and it is therefore important to assess its coverage. The 2.2.2 version of CSA used here contains 12,987 PDB codes that are present in SCOP, i.e. 98% of the 13,230 that are classified in SCOP and have an EC number assigned in PDBSprotEC234 (a database linking PDB chains with the EC classification system). The coverage in terms of EC numbers is also extensive since 951 four-digit EC numbers (excluding those where one of the

Concluding Remarks

This work lists several examples of structural matches between active sites, which have also been manually investigated to verify that the similarity of the structures actually reflects a functional relationship. These examples can therefore be used as a benchmark set for testing a local structural comparison algorithm and its ability to identify functionally meaningful structural similarities between proteins.

The key results of this systematic, general analysis of evolutionary convergence

Materials and Methods

The method is summarised in Figure 1. The aim is to compare structurally the active sites of enzymes sharing the first three digits of the EC number to identify instances of convergent evolution. The 2.2.2 version of the Catalytic Site Atlas (CSA)8 (downloaded on 1st March 2007) provided details of the catalytic residues. Each residue in the protein was represented as two points, the Cα atom and the geometric centroid of all the side-chain atoms. The active site was mapped by first calculating
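The two-point residue representation mentioned above (the Cα atom plus the geometric centroid of the side-chain atoms) can be sketched as follows. This is an assumption-laden illustration rather than the Query3d implementation: the function name, the input format (a mapping from PDB atom names to heavy-atom coordinates) and the handling of residues without side-chain atoms, such as glycine, are all choices made for this example.

```python
# Atom names treated as backbone; everything else counts as side chain.
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}


def two_point_representation(atoms):
    """Return (ca_xyz, side_chain_centroid_xyz) for one residue.

    `atoms` maps atom names to (x, y, z) tuples and is assumed to hold
    heavy atoms only.
    """
    if "CA" not in atoms:
        raise ValueError("residue has no CA atom")
    ca = tuple(float(v) for v in atoms["CA"])
    side_chain = [xyz for name, xyz in atoms.items() if name not in BACKBONE_ATOMS]
    if not side_chain:
        # Assumption for this sketch: with no side-chain atoms, reuse CA.
        return ca, ca
    n = len(side_chain)
    centroid = tuple(sum(xyz[i] for xyz in side_chain) / n for i in range(3))
    return ca, centroid


# Toy serine-like residue with made-up coordinates.
residue = {
    "N": (-1.0, 0.5, 0.0),
    "CA": (0.0, 0.0, 0.0),
    "C": (1.0, 0.5, 0.0),
    "O": (1.5, 1.5, 0.0),
    "CB": (1.0, 0.0, 0.0),
    "OG": (3.0, 0.0, 0.0),
}
print(two_point_representation(residue))
```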

Acknowledgements

The authors thank Dr Gabriele Ausiello for kindly providing the source code of the Query3d program. P.F.G. was supported by Telethon grant GGP04273 and M.N.W. was supported by a BBSRC grant.

References (235)

  • F. Forouhar et al.

    A novel NAD-binding protein revealed by the crystal structure of 2,3-diketo-L-gulonate reductase (YiaK)

    J. Biol. Chem.

    (2004)
  • J. Moser et al.

    Methanopyrus kandleri glutamyl-tRNA reductase

    J. Biol. Chem.

    (1999)
  • C. Baldock et al.

    The X-ray structure of Escherichia coli enoyl reductase with bound NAD+ at 2.1 Å resolution

    J. Mol. Biol.

    (1998)
  • T.L. Poulos et al.

    Crystallographic refinement of lignin peroxidase at 2 Å

    J. Biol. Chem.

    (1993)
  • K. Sugimoto et al.

    Crystal structure of an aromatic ring opening dioxygenase LigAB, a protocatechuate 4,5-dioxygenase, under aerobic conditions

    Structure

    (1999)
  • A. Kita et al.

    An archetypical extradiol-cleaving catecholic dioxygenase: the crystal structure of catechol 2,3-dioxygenase (metapyrocatechase) from Pseudomonas putida mt-2

    Structure

    (1999)
  • G. Vilkaitis et al.

    The mechanism of DNA cytosine-5 methylation. Kinetic and mutational dissection of HhaI methyltransferase

    J. Biol. Chem.

    (2001)
  • T.J. Wyckoff et al.

    The active site of Escherichia coli UDP-N-acetylglucosamine acyltransferase. Chemical modification and site-directed mutagenesis

    J. Biol. Chem.

    (1999)
  • A.B. Hickman et al.

    Melatonin biosynthesis: the structure of serotonin N-acetyltransferase at 2.5 Å resolution suggests a catalytic mechanism

    Mol. Cell

    (1999)
  • A.P. Turnbull et al.

    Analysis of the structure, substrate specificity, and mechanism of squash glycerol-3-phosphate (1)-acyltransferase

    Structure

    (2001)
  • Y. Modis et al.

    Crystallographic analysis of the reaction pathway of Zoogloea ramigera biosynthetic thiolase

    J. Mol. Biol.

    (2000)
  • C. Davies et al.

    The 1.8 Å crystal structure and active-site architecture of beta-ketoacyl-acyl carrier protein synthase III (FabH) from Escherichia coli

    Structure

    (2000)
  • P. Sliz et al.

    The structure of enzyme IIAlactose from Lactococcus lactis reveals a new fold and points to possible interactions of a multicomponent system

    Structure

    (1997)
  • Z. Fu et al.

    The structure of a binary complex between a mammalian mevalonate kinase and ATP: insights into the reaction mechanism and human inherited disease

    J. Biol. Chem.

    (2002)
  • M. Li et al.

    Conformational changes in the reaction of pyridoxal kinase

    J. Biol. Chem.

    (2004)
  • V.D. Rao et al.

    Structure of type IIbeta phosphatidylinositol phosphate kinase: a protein kinase fold flattened for interfacial phosphorylation

    Cell

    (1998)
  • W.C. Hon et al.

    Structure of an enzyme required for aminoglycoside antibiotic resistance reveals homology to eukaryotic protein kinases

    Cell

    (1997)
  • Y. Shirakihara et al.

    Crystal structure of the complex of phosphofructokinase from Escherichia coli with its reaction products

    J. Mol. Biol.

    (1988)
  • L. Kraft et al.

    Conformational changes during the catalytic cycle of gluconate kinase as revealed by X-ray crystallography

    J. Mol. Biol.

    (2002)
  • Y.W. Yin et al.

    The structural mechanism of translocation and helicase activity in T7 RNA polymerase

    Cell

    (2004)
  • S.A. Moore et al.

    The structure of truncated recombinant human bile salt-stimulated lipase reveals bile salt-independent conformational flexibility at the active-site loop and provides insights into heparin binding

    J. Mol. Biol.

    (2001)
  • A.H. West et al.

    Crystal structure of the catalytic domain of the chemotaxis receptor methylesterase, CheB

    J. Mol. Biol.

    (1995)
  • A. Dessen

    Structure and mechanism of human cytosolic phospholipase A(2)

    Biochim. Biophys. Acta

    (2000)
  • E.B. Fauman et al.

    Structure and function of the protein tyrosine phosphatases

    Trends Biochem. Sci.

    (1996)
  • D. Kostrewa et al.

    Crystal structure of Aspergillus niger pH 2.5 acid phosphatase at 2.4 Å resolution

    J. Mol. Biol.

    (1999)
  • M.R. Islam et al.

    Active site residues of human beta-glucuronidase. Evidence for Glu(540) as the nucleophile and Glu(451) as the acid-base residue

    J. Biol. Chem.

    (1999)
  • F. Alberto et al.

    The three-dimensional structure of invertase (beta-fructosidase) from Thermotoga maritima reveals a bimodular arrangement and an evolutionary relationship between retaining and inverting glycosidases

    J. Biol. Chem.

    (2004)
  • Y. van Santen et al.

    1.68-Å crystal structure of endopolygalacturonase II from Aspergillus niger and identification of active site residues by site-directed mutagenesis

    J. Biol. Chem.

    (1999)
  • J.G. Olsen et al.

    Tetrameric dipeptidyl peptidase I directs substrate specificity by use of the residual pro-part domain

    FEBS Letters

    (2001)
  • F. David et al.

    Identification of serine 624, aspartic acid 702, and histidine 734 as the catalytic triad residues of mouse dipeptidyl-peptidase IV (CD26). A member of a novel family of non-classical serine hydrolases

    J. Biol. Chem.

    (1993)
  • H. Li et al.

    Three-dimensional structure of human gamma-glutamyl hydrolase. a class I glatamine amidotransferase adapted for a complex substate

    J. Biol. Chem.

    (2002)
  • M. Bartlam et al.

    Crystal structure of an acylpeptide hydrolase/esterase from Aeropyrum pernix K1

    Structure

    (2004)
  • Y. Odagaki et al.

    The crystal structure of pyroglutamyl peptidase I from Bacillus amyloliquefaciens reveals a new structure for a cysteine protease

    Structure

    (1999)
  • J. Wang et al.

    The structure of ClpP at 2.3 Å resolution suggests a model for ATP-dependent proteolysis

    Cell

    (1997)
  • B.W. Matthews et al.

    Three-dimensional structure of tosyl-alpha-chymotrypsin

    Nature

    (1967)
  • J. Kraut

    Serine proteases: structure and mechanism of catalysis

    Annu. Rev. Biochem.

    (1977)
  • J. Drenth et al.

    A comparison of the three-dimensional structures of subtilisin BPN' and subtilisin Novo

    Cold Spring Harbor Symp. Quant. Biol.

    (1972)
  • M. Zhang et al.

    Crystal structure of bovine heart phosphotyrosyl phosphatase at 2.2-Å resolution

    Biochemistry

    (1994)
  • M.J. Bennett et al.

    Structure of 3 alpha-hydroxysteroid/dihydrodiol dehydrogenase complexed with NADP+

    Biochemistry

    (1996)
  • E.C. Webb

    Enzyme nomenclature. Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology

    (1992)