Evolutionary history and higher order classification of AAA+ ATPases
Introduction
A large part of the proteome of any organism is devoted to proteins that bind nucleoside triphosphates and, typically, utilize them as substrates in various reactions (reviewed in Vetter and Wittinghofer, 1999). Several distinct NTP-binding protein folds have been structurally characterized to date, but amongst these the P-loop NTPases (Saraste et al., 1990) are by far the most abundant class, which accounts for 10–18% of the predicted gene products in the sequenced prokaryotic and eukaryotic genomes (Koonin et al., 2000a). Proteins with P-loop NTPase domains are also present in the majority of viruses studied to date (Gorbalenya and Koonin, 1989). The P-loop NTPases are thought to be a monophyletic assemblage of protein domains, and several distinct versions of this domain are traceable to the last universal common ancestor (LUCA) of all modern cellular life forms (Kyrpides et al., 1999; Leipe et al., 2002). This suggests that the P-loop domain originated long before the time of the LUCA and had undergone considerable structural and functional diversification prior to this period. Thus, understanding the natural history of P-loop NTPases is critical for understanding the key aspects of life’s evolution, ranging from the early phases to the radiation of major organismal lineages.
Most members of the P-loop NTPase fold hydrolyze the β–γ phosphate bond of a bound nucleoside triphosphate, most often, ATP or GTP. The free energy of this hydrolysis reaction is typically utilized to induce conformational changes in other molecules. This constitutes the basis of the biochemical activities and biological functions of most P-loop fold proteins. In contrast, members of one major lineage of P-loop proteins, the kinases, transfer the ATP γ-phosphate to diverse substrates (Leipe et al., 2003). Structurally, P-loop domains adopt a 3-layered α/β sandwich configuration that contains regularly recurring α–β units with the β-strands forming a central, mostly parallel β-sheet surrounded on both sides by α-helices (Milner-White et al., 1991) (see also the SCOP database (Murzin et al., 1995): http://scop.mrc-lmb.cam.ac.uk/scop/). At the sequence level, P-loop NTPases are generally characterized by two conserved sequence motifs, the Walker A and B motifs, which bind, respectively, the β and γ phosphate moieties of the bound NTP, and a Mg2+ cation (Saraste et al., 1990; Vetter and Wittinghofer, 1999; Walker et al., 1982).
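As a rough illustration of how the Walker A motif can be located at the sequence level, the sketch below scans a protein sequence with a regular expression. The consensus used here, G-x(4)-G-K-[ST], is the commonly quoted textbook form and is an assumption; it is not stated in the text above, and real family-level analyses rely on sequence profiles rather than a single pattern. The function name `find_walker_a` and the toy sequence are hypothetical.

```python
import re

# Assumed textbook consensus for the Walker A (P-loop) motif: G-x(4)-G-K-[ST].
# The lookahead wrapper lets overlapping candidates be reported as well.
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")


def find_walker_a(sequence):
    """Return (1-based start position, matched segment) for each Walker A-like hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in WALKER_A.finditer(seq)]


# Hypothetical toy sequence invented for illustration; not a real protein.
toy = "MSLVKPGLLLYGPPGTGKTLLAKAVAEDLL"
print(find_walker_a(toy))  # prints [(12, 'GPPGTGKT')]
```

A pattern like this only flags candidates; many unrelated proteins match it by chance, so a hit would normally be checked against the Walker B motif and the surrounding secondary structure.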
Sequence and structure analyses suggest that the primary diversification event in the evolution of the P-loop fold resulted in the two principal classes of P-loop domains. The first of these, the KG (Kinase–GTPase) division, includes the kinases and GTPases, which share a number of structural similarities, such as the adjacent placement of the P-loop and Walker B strands. The other class, the ASCE division (for additional strand, catalytic E), is characterized by an additional strand in the core sheet, which is located between the P-loop strand and the Walker B strand (Leipe et al., 2002, 2003; Fig. 1). As opposed to kinases and GTPases, ATP hydrolysis by the proteins of the ASCE group typically depends on a conserved catalytic (proton-abstracting) glutamate that primes a water molecule for the nucleophilic attack on the γ-phosphate group of ATP. The ASCE division includes the AAA+, ABC, PilT/VirD4, superfamily 1/2 (SF1/2) helicase, and RecA/F1/F0 superfamilies of ATPases, along with several additional, less confidently classified families.
Starting over a decade ago, the AAA ATPases (ATPases associated with a variety of cellular activities) were encountered in studies on an astonishing range of biochemical systems (Confalonieri and Duguet, 1995; Lupas and Martin, 2002; Ogura and Wilkinson, 2001). These included, among others, the eukaryotic proteasomal ATPases, CDC48, and FtsH, which are involved in processes related to protein stability and degradation in bacteria and eukaryotes; NSF, which is implicated in vesicular fusion; Pex1p, which is involved in peroxisome biogenesis; and Bcs1p, which participates in the assembly of mitochondrial membrane complexes. At around the same time, a detailed computational analysis of various cellular and viral proteins involved in nucleic acid metabolism, such as DnaA, the MCM proteins, NtrC-type transcription factors, and helicases of various RNA and DNA viruses comprising the helicase “superfamily 3” (SF3), suggested that all these proteins shared a conserved ATPase domain (Koonin, 1993).
Solution of the X-ray structure of the NSF protein and its comparison with the clamp loader subunit structure supported the unification of these ATPases into a monophyletic group (Guenther et al., 1997; Lenzen et al., 1998). Concomitantly, we conducted a systematic analysis of these ATPase domains using advanced sequence profile analysis methods and structural comparisons, which resulted in the unification of the bona fide AAA ATPase and the DnaA/MCM/NtrC/SFIII-related proteins into a single, monophyletic “AAA+” class (Neuwald et al., 1999). Additionally, this analysis showed that various other ATPase domain families, such as ClpAB/Hsp100, ClpX, HslU, and Lon, which are involved in protein folding and degradation, the eukaryotic motor protein dynein, a large, conserved eukaryotic protein with 6 ATPase domains (subsequently termed midasin), magnesium and cobalt chelatases, the bacterial DNA-replication clamp loaders, and eukaryotic replication factor C subunits, also belonged to the AAA+ class. It was also proposed that the AAA+ domain might be a common denominator in the catalytic assembly or disassembly of large cellular complexes of polypeptides and nucleic acids and that the majority of AAA+ ATPases function as oligomeric ring structures, which provide symmetric or quasi-symmetric surfaces for interactions with other molecules or a central pore for threading molecules in an extended conformation (Neuwald, 1999; Neuwald et al., 1999).
Since the publication of the original analysis of the AAA+ class, a wealth of structural and biochemical studies have been published that have strongly reinforced the monophyly of AAA+ ATPases and elucidated intricate functional details of how oligomeric rings of AAA+ proteins could be deployed in various biological contexts (Dougan et al., 2002; Lupas and Martin, 2002; Ogura and Wilkinson, 2001). Currently, over 15 structures of distinct types of the AAA+ domain are available (Fig. 1). These data, along with the genome sequences of diverse organisms from many of the principal phylogenetic lineages, provide a “post-genomic” vantage point from which to address several issues that have not been tractable previously: (1) A formal, unified definition of the AAA+ class that combines sequence and structural information. (2) The higher order relationships within the AAA+ class. (3) The earliest events in the evolution of AAA+ ATPases and their differentiation from the other ASCE ATPases. (4) The trends in colonization of various functional niches during the evolution of this class of ATPases. Here, we address these problems, particularly in light of the new information that has become available since the previous survey of the AAA+ class of ATPases (Neuwald et al., 1999).
The defining structural and catalytic features of the AAA+ ATPases
We collated all currently available structures of proteins that have been confidently assigned to the AAA+ class and prepared a multiple alignment of their sequences on the basis of their structure superposition. This allowed us to map all the major conserved sequence features to their 3D structural cognates in the core AAA+ domain. Furthermore, this structure-based alignment also enabled correction of the earlier alignments (Koonin, 1993; Neuwald et al., 1999), which were derived principally
General conclusions
Our understanding of the AAA+ ATPases, especially in structural and mechanistic terms, has vastly improved since the publication of the previous survey of this protein class (Neuwald et al., 1999). Using the wealth of structures and genomic information currently at our disposal, we identified the defining structural features of the AAA+ superclass and constructed an evolutionary classification along with a reconstruction of some major aspects of their evolutionary history. In particular, some
Materials and methods
Sequences of AAA+ proteins were extracted from the non-redundant (NR) protein sequence database (National Center for Biotechnology Information, NIH, Bethesda) using the PSI-BLAST program (Altschul et al., 1997), with the sequences of known AAA+ proteins as queries. Sequence similarity-based protein clustering was performed using the BLASTCLUST program (ftp://ftp.ncbi.nih.gov/blast/documents/README.bcl). Multiple alignments were constructed using the Clustal X (Thompson et al., 1997) or T-Coffee
Acknowledgements
On account of constraints of space we have, regrettably, been unable to cite a large number of primary papers on functions of AAA+ ATPases. We have mainly concentrated on works presenting structures and experimental papers that are directly relevant to the higher order classification. We apologize for the omissions of other important works. Supplementary material including all Genbank identifiers of AAA+ proteins identified in this work, a more inclusive sequence alignment, alignments for the
References (110)
Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains. J. Mol. Biol. (2001)
Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J. Mol. Biol. (1999)
FtsK Is a DNA motor protein that activates chromosome dimer resolution by switching the catalytic state of the XerC and XerD recombinases. Cell (2002)
A six-stranded double-psi beta barrel is shared by several protein superfamilies. Structure Fold Des. (1999)
The solution structure of VAT-N reveals a ‘missing link’ in the evolution of complex enzymes from a simple βαββ element. Curr. Biol. (1999)
Mechanisms of DNA replication. Curr. Opin. Chem. Biol. (2000)
FtsK: Maxwell’s Demon? Mol. Cell (2002)
AAA+ proteins and substrate recognition, it all depends on their partner in crime. FEBS Lett. (2002)
Archaea and the origin(s) of DNA replication proteins. Cell (1997)
A phylogenomic study of DNA repair genes, proteins, and processes. Mutat. Res. (1999)