Journal of Molecular Biology
Volume 333, Issue 3, 24 October 2003, Pages 621-639
Functional Recycling of C2 Domains Throughout Evolution: A Comparative Study of Synaptotagmin, Protein Kinase C and Phospholipase C by Sequence, Structural and Modelling Approaches

https://doi.org/10.1016/j.jmb.2003.08.052

Abstract

The C2 domain is one of the most frequent and widely distributed calcium-binding motifs. Its structure comprises an eight-stranded β-sandwich that occurs in two structural types, related to each other as if by a circular permutation. Combining sequence, structural and modelling information, we have explored, at different levels of granularity, the functional characteristics of several families of C2 domains. At the coarsest level, the similarity correlates with key structural determinants of the C2 domain fold and, at the finest level, with the domain architecture of the proteins containing them, highlighting the functional diversity between the various sub-families. The functional diversity appears as different conserved surface patches throughout this common fold. In some cases, these patches are related to substrate-binding sites whereas in others they correspond to interfaces of presumably permanent interaction with other domains within the same polypeptide chain. For those related to substrate-binding sites, the predictions overlap with biochemical data in addition to providing some novel observations. For those acting as protein–protein interfaces, our modelling analysis suggests that slight variations between families result not only from complementary adaptations in the interfaces involved but also from differences in domain architecture. In the light of the sequence and structural genomic projects, the work presented here shows that modelling approaches along with careful sub-typing of protein families will be a powerful combination for a broader coverage in proteomics.

Introduction

In recent years various sequencing projects have uncovered the complete genomes for several organisms,1., 2. amplifying the existing gap between structural and sequence data. Recent initiatives aim to address this deficiency by providing a representative structure for each protein family from which all other genomic sequences may be modelled to at least a medium level of accuracy.3., 4. Current efforts combine a variety of bioinformatic and modelling techniques to mine evolutionary constraints from sequences and structures of related proteins, providing clues about their function and specificity to experimentalists.

Identification and combination of homologues into multiple sequence alignments (MSA) allow deduction of common features between sequences from the level of conservation at every position of the alignment.5., 6. Conserved positions are usually structurally or functionally essential. Generally, an MSA of a considerable number and diversity of homologous sequences will present a mixture of orthologues and paralogues, i.e. of some proteins functionally equivalent and others with specific variations of a similar mechanism or even a new function. Such an MSA would represent a complete protein family divisible into sub-groups, i.e. protein sub-families. In this case, invariant residues across all sequences are structurally important for the acquisition and maintenance of a fold and/or retained as part of the active site, whereas positions conserved only within sub-families, i.e. class-specific residues, are likely to describe specific functional characteristics.
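The distinction between invariant and class-specific positions can be illustrated with a minimal sketch. It is not one of the published methods cited here: the function names, the toy alignment and the strict "all residues identical" criterion are assumptions made for illustration, and real analyses would use graded conservation scores.

```python
def is_conserved(residues):
    """True when every sequence carries the same non-gap residue."""
    return len(set(residues)) == 1 and residues[0] != "-"


def classify_columns(subfamilies):
    """Label each alignment column as invariant, class-specific or variable.

    subfamilies maps a sub-family name to its aligned sequences
    (equal lengths, '-' marks a gap). Strict identity is an assumption.
    """
    all_seqs = [s for seqs in subfamilies.values() for s in seqs]
    labels = []
    for col in range(len(all_seqs[0])):
        if is_conserved([s[col] for s in all_seqs]):
            # Same residue in every sequence of the whole family.
            labels.append("invariant")
        elif all(is_conserved([s[col] for s in seqs])
                 for seqs in subfamilies.values()):
            # Conserved inside each sub-family, but not across them.
            labels.append("class-specific")
        else:
            labels.append("variable")
    return labels


# Hypothetical toy alignment with two sub-families.
toy = {
    "family_a": ["GKDLP", "GKDMP", "GKDLP"],
    "family_b": ["GRELP", "GREIP", "GRE-P"],
}
labels = classify_columns(toy)
```

In this toy case the first and last columns are invariant, the second and third are class-specific, and the fourth is variable.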

Several methods have been developed for the identification and exploitation of invariant and class-specific residues from MSA.7., 8., 9., 10. Irrespective of the method, the identified residues can be mapped, at a later stage, onto a representative protein structure. This enables the detection of areas with a relatively high density of conserved residues, which tend to correlate with regions of structural or functional significance. Recently, three-dimensional clustering analysis (3DCA) has incorporated structural information as an integral part of the calculation of conservation scores.11 This is achieved by considering the conservation of each residue as the average of the conservation of that amino acid and its spatial neighbours. All the above techniques have been successful in the identification and prediction of functional key residues in catalytic sites, ligand-binding pockets and protein–protein interfaces.7., 9., 10., 11., 12., 13., 14., 15., 16.
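The neighbour-averaging idea described for 3DCA can be sketched as follows. This is not the published 3DCA implementation: the function name, the use of a single coordinate per residue and the distance cutoff are all assumptions chosen to keep the example small.

```python
import math


def smoothed_conservation(scores, coords, radius=10.0):
    """Average each residue's score with those of its spatial neighbours.

    scores: per-residue conservation values (any consistent scale).
    coords: one (x, y, z) tuple per residue, e.g. a C-alpha position.
    radius: neighbour cutoff in angstroms (hypothetical default).
    """
    smoothed = []
    for centre in coords:
        # The residue itself is included because its distance is zero.
        neighbourhood = [
            scores[j]
            for j, other in enumerate(coords)
            if math.dist(centre, other) <= radius
        ]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return smoothed


# Hypothetical four-residue example laid out along the x axis.
toy_coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (30, 0, 0)]
toy_scores = [1.0, 0.0, 1.0, 0.2]
result = smoothed_conservation(toy_scores, toy_coords, radius=5.0)
```

A variable residue surrounded by conserved neighbours (the second residue here) receives a higher smoothed score than its raw value, which is how spatially clustered conservation becomes visible on a structure.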

These methods assume that different sub-families maintain the same functional sites and that evolved differences are only small variations around an ancestral common active site. Problems could arise if variations have generated a considerable shift of the functional site between sub-families up to the point that there is no spatial overlap between them. This also applies to novel patches that could arise, for example, from a fusion between the domain of interest and other domains with which it will eventually establish permanent interactions. In this scenario, the functional sites would be averaged out, precluding their identification. This can be prevented, or at least minimised, by a prior sequence sorting into functionally consistent groups. This separation could be achieved by sequence similarity thresholding,17 identification of well-defined clusters in phylogenetic trees,10., 18. or sub-type profiling.15., 19., 20.
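Of the sorting strategies mentioned, sequence similarity thresholding is the simplest to sketch. The example below groups aligned sequences by single linkage on pairwise identity. The function names, the sequence labels and the 50% cutoff are assumptions for illustration and are not taken from the cited method.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def group_by_identity(aligned, threshold=50.0):
    """Single-linkage grouping: link sequences whose identity >= threshold."""
    names = list(aligned)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(aligned[a], aligned[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical aligned fragments; names are illustrative only.
toy = {
    "syt_1": "GKDLPAAK",
    "syt_2": "GKDLPASK",
    "plc_1": "WRE-TQFN",
    "plc_2": "WREMTQFN",
}
groups = group_by_identity(toy)
```

Conservation would then be computed within each group separately, so that a patch present in only one group is not averaged out by the others.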

In this work, sequences and structures of the C2 domains of phospholipases C (PLC), classical isoforms of protein kinases C (PKC) and synaptotagmins (Syt) were studied in order to investigate the differential localisation of conserved and variable regions in their surface. The structural templates on which conservation was mapped included both atomic models determined experimentally and those computationally built based on the comparison to an existent related structure. Moreover, homology modelling was used to broaden not only the 3D repertoire of C2 domains but also of C2-containing protein complexes. The latter was achieved by combining 3D modelling and protein docking approaches to model the protein–protein interfaces of sequences related to known complexes. The various computational techniques employed can be connected into a logical flow as summarised in Figure 1.

The C2 domain is an ideal model because: (i) it is an abundant domain, which translates into a wealth of sequence information, (ii) it presents two main structural types with solved structures available, (iii) it is found as individual domains or in stable interactions as part of multidomain proteins, (iv) its calcium and lipid-binding properties have undergone specific variations between sub-types reflecting the function of the proteins containing them, and (v) a wealth of biochemical information is available for most of the proteins considered, i.e. Syt, PKC and PLC.

The C2 domain fold is an eight-stranded β-sandwich in which the calcium-binding regions (CBR) are located in the loops at one end of the structure (Figure 2(a)).21., 22., 23. There exist two structural types, synaptotagmin-like (type-I)24 and PLC-like (type-II),25 in which the first strand in type-I, as if the result of a circular permutation, becomes the last strand in type-II (Figure 2(a) and (b)). As a consequence, both the N and C termini are adjacent in space to the CBRs in type-I domains, while they are at the opposite end in type-II. Additionally, type-I can be divided into two minor sub-types, which differ by the absence or presence of a helical insertion between the last two strands (Figure 2(a)). The former corresponds to the structure of the first C2 domain (C2A) of Syts24 as well as the C2 domain of α, β and γ-isoforms of PKC.26 The helical insertion seems characteristic of the second C2 domain (C2B) of Syts.27., 28.
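The strand correspondence implied by this circular permutation can be written out explicitly. The sketch below only restates the relationship given in the text (the first type-I strand becomes the last type-II strand). The strand labels and the function name are hypothetical.

```python
def permute_type1_to_type2(type1_strands):
    """Move the first type-I strand to the end, giving the type-II order."""
    return type1_strands[1:] + type1_strands[:1]


# Hypothetical labels for the eight strands of a type-I domain.
type1 = [f"I-b{i}" for i in range(1, 9)]
type2 = permute_type1_to_type2(type1)

# Map each type-II strand position to its structurally equivalent type-I strand.
equivalence = {f"II-b{pos}": strand for pos, strand in enumerate(type2, start=1)}
```

Under this labelling, strand k of a type-II domain is equivalent to strand k+1 of a type-I domain, and the eighth type-II strand is equivalent to the first type-I strand.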

Syts are involved in membrane fusion during the late stage of exocytosis of secretory vesicles with the plasma membrane.29., 30., 31. They contain an N-terminal transmembrane segment tethered to a pair of C2 domains in tandem that are separated by a short linker (Figure 2(c)).30 These two homologous domains bind to, and possibly insert into, the membrane. However, they have developed subtle functional differences in terms of calcium and lipid binding.32

The three classical isoforms of PKC (namely α, β and γ) are multidomain proteins in which their single type-I C2 domain is accompanied by diacylglycerol (DAG)-binding and phosphotransferase domains (Figure 2(c)).33., 34. The PKC C2 domain binds specifically to phosphatidylserine-containing membranes in a calcium-regulated manner.

Eukaryotic phosphoinositide-specific PLC enzymes catalyse the hydrolysis of phosphatidylinositol-4,5-bisphosphate to the second messengers, inositol-1,4,5-trisphosphate and DAG.35., 36. The basic PLC core contains an EF-hand domain, a catalytic triosephosphate isomerase-like domain with two distinct sub-domains (PLCXC and PLCYC), followed by a C2 domain (Figure 2(c)). In mammals, additional regulatory domains are found at the N and C termini as well as in between the PLCXC and PLCYC domains. Some PLC classes present C2 domains involved in calcium-regulated membrane binding, whereas in others the C2 functionality has been lost or is unknown.

Here, we have compared the sequence and structure of several C2 domains at different levels of granularity, from general structural to specific functional features. Our computational analysis demonstrates the existence of common and type-specific structural residues and that the significantly conserved regions on C2 domains of Syt, PKC and PLC are located in different areas in their three-dimensional structures. To summarise our results, C2A and PKC-C2 have highly conserved calcium-binding sites at one end of the domains, which include loops containing aromatic residues known to insert into the membrane. C2B has a conserved patch at one side of the domain, just underneath its weaker calcium-binding site. This patch is rich in lysine residues and may mediate electrostatic interactions with phospholipids in the absence of calcium. The PLC-C2 domains have CBRs of highly variable length. Their common conserved areas, found in the β-sandwich body, correspond to interfaces of interaction with their neighbouring domains. Also, the differential conservation between PLC sub-families reveals some class-specific residues that may be responsible for their varied mechanisms of action. Finally, we draw some general conclusions on the value of combining these computational tools for the analysis of functional sub-typing.

Sub-typing of C2-containing proteins

Sequences were retrieved by PSI-BLAST searches on the non-redundant database at the NCBI, using as queries the sequences of representative C2 domains with solved structure (Materials and Methods). The PKC group was homogeneous in the sense that it included only annotated PKCs (α, β and γ) or sequences of similar length and with a domain composition identical to that of known PKCs. Of the 16 sequences in the final set, the least similar to the original query still shared 35% sequence identity.

The

Discussion

The sequence comparison between all sub-families highlights the conservation of the buried core of both type-I and type-II C2 domains. It also reveals two important structural positions involved in tight bends, a proline in CBR-2 and a glycine at the base of CBR-3. These two loops are usually very conserved in length and show a very good alignment in 3D, in contrast to the other loops that present higher flexibility. Interestingly, key structural differences between type-I and type-II highlight

Collecting C2 domain-containing proteins

A representative C2 domain sequence for each protein family (Syt, PKC and PLC) was chosen from solved structures and used as query in PSI-BLAST searches59 on a non-redundant database at the NCBI until convergence or a maximum of ten iterations with a threshold of 1E-10, 1E-15 and 1E-10, respectively. The C2 domain queries corresponded to: (i) β-PKC, 1a25 (159–277), (ii) δ-PLC, 1djx (628–756) and (iii) Syt III, 1dqv, including both C2A and C2B domains (298–566). For the significant matches of

Acknowledgements

We thank M. Katan, B. Davletov and N. McDonald for critical reading of the manuscript, and Michael P. Mitchell for technical assistance. Owing to space limitations, some of the articles from which data were extracted are not cited. References to these works can be found in the reviews included here.

References (74)

  • C.A. Wilson et al. Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J. Mol. Biol. (2000)
  • S.S. Hannenhalli et al. Analysis and prediction of functional sub-types from protein sequence alignments. J. Mol. Biol. (2000)
  • J. Rizo et al. C2-domains, structure and function of a universal Ca2+-binding domain. J. Biol. Chem. (1998)
  • R.B. Sutton et al. Structure of the first C2 domain of synaptotagmin I: a novel Ca2+/phospholipid-binding fold. Cell (1995)
  • I. Fernandez et al. Three-dimensional structure of the synaptotagmin 1 C2B-domain: synaptotagmin 1 as a phospholipid binding machine. Neuron (2001)
  • G. Schiavo et al. Synaptotagmins: more isoforms than functions? Biochem. Biophys. Res. Commun. (1998)
  • R. Jahn et al. Membrane fusion. Cell (2003)
  • B.A. Davletov et al. A single C2 domain from synaptotagmin I is sufficient for high affinity Ca2+/phospholipid binding. J. Biol. Chem. (1993)
  • W. Cho. Membrane targeting by C1 and C2 domains. J. Biol. Chem. (2001)
  • M.R. Wing et al. Activation of phospholipase C-epsilon by heterotrimeric G protein betagamma-subunits. J. Biol. Chem. (2001)
  • S.L. Osborne et al. Calcium-dependent oligomerization of synaptotagmins I and II. Synaptotagmins I and II are localized on the same synaptic vesicle and heterodimerize in the presence of calcium. J. Biol. Chem. (1999)
  • A.F. Davis et al. Kinetics of synaptotagmin responses to Ca2+ and assembly with the core SNARE complex onto membranes. Neuron (1999)
  • E.R. Chapman et al. Direct interaction of a Ca2+-binding loop of synaptotagmin with lipid bilayers. J. Biol. Chem. (1998)
  • D. Murray et al. Electrostatic control of the membrane targeting of C2 domains. Mol. Cell (2002)
  • J. Bai et al. Membrane-embedded synaptotagmin penetrates cis or trans target membranes and clusters via a novel mechanism. J. Biol. Chem. (2000)
  • M. Fukuda et al. Functional diversity of C2 domains of synaptotagmin family. Mutational analysis of inositol high polyphosphate binding domain. J. Biol. Chem. (1995)
  • M. Medkova et al. Mutagenesis of the C2 domain of protein kinase C-alpha. Differential roles of Ca2+ ligands and membrane binding residues. J. Biol. Chem. (1998)
  • W.F. Ochoa et al. Additional binding sites for anionic phospholipids and calcium ions in the crystal structures of complexes of the C2 domain of protein kinase Cα. J. Mol. Biol. (2002)
  • A.A. Bogan et al. Anatomy of hot spots in protein interfaces. J. Mol. Biol. (1998)
  • L. Otterhag et al. N-terminal EF-hand-like domain is required for phosphoinositide-specific phospholipase C activity in Arabidopsis thaliana. FEBS Letters (2001)
  • H. Pappa et al. Crystal structure of the C2 domain from protein kinase C-δ. Structure (1998)
  • A.E. Todd et al. Sequence and structural differences between enzyme and non-enzyme homologs. Structure (2002)
  • T.A. Tatusova et al. BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Letters (1999)
  • A. Sali et al. Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. J. Mol. Biol. (1990)
  • R.M. Jackson et al. Rapid refinement of protein interfaces incorporating solvation: application to the docking problem. J. Mol. Biol. (1998)
  • S.J. Hubbard et al. Molecular recognition. Conformational analysis of limited proteolytic sites and serine proteinase protein inhibitors. J. Mol. Biol. (1991)
  • E.S. Lander et al. Initial sequencing and analysis of the human genome. Nature (2001)

Supplementary data associated with this article can be found at doi: 10.1016/j.jmb.2003.08.052