Emergence of diverse biochemical activities in evolutionarily conserved structural scaffolds of proteins

https://doi.org/10.1016/S1367-5931(02)00018-2

Abstract

Comparative analysis of numerous protein structures that have become available in the past few years, combined with genome comparison, has yielded new insights into the evolution of enzymes and their functions. In addition to the well-known diversification of substrate specificities, enzymes with several widespread catalytic folds, particularly the TIM barrel, the RRM-like domain and the double-stranded β-helix (cupin) domain, have been extensively explored in ‘reaction space’, resulting in the evolution of numerous, diverse catalytic activities supported by the same structural scaffold. Common protein folds differ widely in the diversity of catalyzed reactions. The biochemical plasticity of a fold seems to hinge on the presence of a generic, symmetrical substrate-binding pocket as opposed to highly specialized binding sites.

Introduction

On the basis of the sequences and structures of a limited set of proteins, Zuckerkandl and Pauling [1] proposed, almost 40 years ago, that proteins with disparate biochemical properties could evolve from a single ancestral protein. The accumulation of protein sequence and structure data since their pioneering work has only reinforced their contention, making it one of the central tenets of evolutionary biology. Subsequent analysis of sequences and structures has shown that the biochemical mechanism of catalysis typically relies on a constellation of a few amino acid residues that are embedded in a distinct globular domain. The rest of the globular domain functions as a structural scaffold supporting the catalytic centre and also contributes to interactions with the substrates, co-factors and other proteins. This principle has been extensively exploited by natural selection to generate the diversity seen in the universe of enzymes from a relatively small number of ancestral enzymes. Generally, the set of catalytic residues is highly conserved during evolution, whereas variations tend to occur in the substrate-binding and cofactor-binding sites. This results in families of enzymes exploring ‘substrate space’ by applying essentially the same biochemical activity to a range of different substrates. Another common variation on this theme is the fusion of various accessory domains to the catalytic domain, which results in diversity of allosteric regulation [2] or cellular localization of the enzyme without changing the catalytic activity.

Analysis of the numerous, recently amassed protein crystal structures revealed other, less-obvious principles that govern evolutionary diversification of enzymes. On several occasions, enzymes with similar catalytic residues have been shown to explore considerable diversity in ‘reaction space’. For example, sequence and structure comparisons indicated that bacterial DnaG-type primases and most topoisomerases (except the topoisomerase IB) share an evolutionarily related catalytic domain (the so-called TOPRIM domain) with an identical core set of catalytic residues [3–5]. Thus, two distinct types of reactions, namely nucleotidyltransferase (transferase; class 2 according to the EC classification) and topoisomerase (isomerase; class 5 according to the EC classification), have evolved within the same family of enzymes. However, because these reactions involve manipulations of the same phosphoester bonds in similar substrates (nucleotides or polynucleotides), it is easy to conceive of their common origin and dependence on the same set of catalytic residues. A more dramatic illustration of the exploration of the reaction space emerged from the crystal structures of several enzymes in the stem glycolytic (Embden–Meyerhof) pathway: three of the glycolytic enzymes, triose phosphate isomerase, enolase and pyruvate kinase, that catalyse three distinct reactions, have the same structural scaffold, the TIM barrel [6,7••]. The TIM barrel (named after triose phosphate isomerase) comprises eight β-strands and eight α-helices, which are arranged in a cyclically symmetric pattern, with the strands lining the inner core of the barrel. Despite the differences in the catalyzed reactions, the substrate-binding pockets of the three TIM-barrel glycolytic enzymes are located in a similar spatial configuration, in the interior of the barrel lined by the β-strands [6,7••,8]. However, the residues that directly participate in catalysis are different in each of these enzymes. These observations suggested that evolution of enzymes involved not merely changes in substrate specificity, but also considerable exploration of the reaction space. The evolutionary changes involved could range from the use of similar catalytic residues on similar bonds to dramatic variations in the catalytic residues, while retaining the same structural scaffold [9••,10••].
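
The notion of one scaffold hosting several reaction classes can be made concrete with a minimal sketch. It assumes the six top-level EC classes in use when this review was written, and it uses EC numbers for triose phosphate isomerase, enolase and pyruvate kinase that are supplied here for illustration and are not taken from the text. The names `EC_CLASSES`, `ec_class`, `TIM_BARREL_GLYCOLYTIC` and `classes_on_scaffold` are hypothetical.

```python
# Illustrative sketch only: table contents and helper names are assumptions
# added for this example, not part of the original analysis.

# Top-level EC classes (first digit of an EC number).
EC_CLASSES = {
    1: "oxidoreductase",
    2: "transferase",
    3: "hydrolase",
    4: "lyase",
    5: "isomerase",
    6: "ligase",
}


def ec_class(ec_number):
    """Return the top-level class name for an EC number such as '5.3.1.1'."""
    first_digit = int(ec_number.split(".")[0])
    return EC_CLASSES[first_digit]


# Hypothetical lookup: TIM-barrel glycolytic enzyme -> EC number.
TIM_BARREL_GLYCOLYTIC = {
    "triose phosphate isomerase": "5.3.1.1",
    "enolase": "4.2.1.11",
    "pyruvate kinase": "2.7.1.40",
}


def classes_on_scaffold(enzymes):
    """Collect the distinct top-level EC classes found on one scaffold."""
    return {ec_class(ec) for ec in enzymes.values()}


if __name__ == "__main__":
    for name, ec in TIM_BARREL_GLYCOLYTIC.items():
        print(f"{name}: EC {ec} ({ec_class(ec)})")
    print(sorted(classes_on_scaffold(TIM_BARREL_GLYCOLYTIC)))
```

Counting distinct first digits is only a coarse proxy for distance in 'reaction space'; it ignores the bond-level distinctions and the differences in catalytic residues discussed in the text.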

Traditionally, the presence of a characteristic set of conserved residues has been the cornerstone of the identification of enzymes through sequence analysis. However, in cases such as the TIM barrel, where the structure is preserved but catalytic residues vary, detection of evolutionary relationships has been far more difficult. The explosion in the number of structures and sequences in the past few years facilitates the detection of such subtle relationships [7••,8]. This allows one to address meaningful general questions regarding the origin and evolution of diverse biochemical activities in related structural contexts. Here, we attempt to use the information from recently reported structures and sequences to provide a perspective on the major principles that seem to shape the emergence of diverse enzymatic activities within structurally similar and evolutionarily related scaffolds.

Structural classes and elementary comparative genomics of enzymatic domains

Like all other globular domains, the catalytic domains of enzymes come in four major structural classes:

  1. α/β domains with regularly repeating α–β units.

  2. α+β domains with interspersed and segregated α and β elements.

  3. β domains predominantly composed of β-strands.

  4. α-domains composed primarily of α-helices, including integral membrane enzymes whose catalytic residues are embedded within the trans-membrane helices.

Details of these structural classes are available through the SCOP [11••] (//scop.mrclmb.cam.ac.uk/scop/

Of shapes and functions: general structural properties of folds that favour diverse catalytic activities

The activities of two enzymes can be considered distinct when they involve completely different types of reactions or different bonds in substrates. Thus, hydrolysis of ATP and GTP are counted as the same activity because they involve the same bond in related substrates. By contrast, hydrolysis of ATP and hydrolysis of a polysaccharide are considered distinct activities because different types of bonds are involved. Likewise, hydrolysis of polysaccharides and isomerization of a sugar are

A timeline for the evolutionary colonization of ‘reaction space’ by different protein scaffolds

Comparative genomics allows us to address an important question of the relative timing of the derivation of various enzymatic activities during the evolution of a fold. Different catalytic activities can now be mapped onto the phylogenetic tree of life and the point of emergence of a given activity can be deduced from the resulting distribution. A survey of the major folds with respect to the temporal points of ‘invention’ of a particular activity reveals striking differences. In the case of

Conclusions

Natural selection has a strong tendency to preserve the basic folds of protein domains, while tinkering with their sequences to generate diverse biochemical properties. However, different folds show varying propensities with respect to accommodating entirely new catalytic properties. Among the common enzymatic folds, some are highly conservative in terms of diversification of catalytic activities, whereas others are extremely versatile. These differences arise from a range of subtle structural

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:

  • of special interest

  •• of outstanding interest

References (51)

  • Z. Chen et al. Structure at 1.9 Å resolution of a quinohemoprotein alcohol dehydrogenase from Pseudomonas putida HK5. Structure (2002)

  • L. Aravind et al. The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases. Genome Biol. (2001)

  • R. Anand et al. Structure of oxalate decarboxylase from Bacillus subtilis at 1.75 Å resolution. Biochemistry (2002)

  • S.C. Trewick et al. Oxidative demethylation by Escherichia coli AlkB directly reverts DNA base damage. Nature (2002)

  • L. Aravind et al. Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J. Mol. Biol. (1999)

  • S.K. Katti et al. Crystal structure of muconolactone isomerase at 3.3 Å resolution. J. Mol. Biol. (1989)

  • K. Min et al. Crystal structure of human nucleoside diphosphate kinase A, a metastasis suppressor. Proteins (2002)

  • A.A. Schaffer et al. IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics (1999)

  • L. Aravind et al. Toprim–a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucleic Acids Res. (1998)

  • M. Podobnik et al. A TOPRIM domain in the crystal structure of the catalytic core of Escherichia coli primase confirms a structural link to DNA topoisomerases. J. Mol. Biol. (2000)

  • G.K. Farber et al. The evolution of alpha/beta barrel enzymes. Trends Biochem. Sci. (1990)

  • G.J. Bartlett et al. Analysis of catalytic residues in enzyme active sites. J. Mol. Biol. (2002)

  • D.W. Buchan et al. Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database. Genome Res. (2002)

  • V. Anantharaman et al. Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res. (2002)

  • S.A. Teichmann et al. Advances in structural genomics. Curr. Opin. Struct. Biol. (1999)