Chapter 3 - Natural History of the Eukaryotic Chromatin Protein Methylation System
Introduction
The three superkingdoms of life utilize very distinct strategies for packaging their genomic DNA. Most bacteria utilize members of the IHF/HU family as their primary DNA-packaging proteins.1 In addition, certain bacteria, such as chlamydiae, have specialized DNA-packaging proteins of the HC1/HC2 family that function in establishing the condensed chromatin that is typical of certain stages of their life cycle.2, 3 Archaea show a surprising diversity of DNA-packaging proteins that include members of the Alba, MC1, Sac7/Cren7/Sso7, and histone fold families.4, 5 The histone fold proteins, which they share with eukaryotes, are observed primarily in two of the great divisions of archaea, namely the euryarchaea and the thaumarchaea. The archaeal histones represent a packaging strategy that appears to have been the precursor of the eukaryotic system. Currently characterized archaeal nucleosomes comprise one or two distinct histone subunits that assemble into a tetramer, which wraps ~80 base pairs (bp) of DNA around it (comparable to the eukaryotic histone H3–H4 tetrasomes).4 The origin of the eukaryotes was accompanied by a dramatic development of this ancestral histone template. First, the histones proliferated and then diverged, resulting in four distinct histones (H2A, H2B, H3, and H4) that are conserved throughout eukarya.4 Second, these histones assembled into an octamer, as opposed to the archaeal tetrasome, and wrapped nearly twice as much DNA (~146 bp).4 Third, the eukaryotic histones acquired extensions, enriched in positively charged residues, at the N-terminus and/or C-terminus of the globular DNA-binding histone fold.6 These extensions are known as the histone tails and provide additional surfaces that neutralize the negative charges of the DNA backbone.
Emergence of the histone octamer-based packaging in eukaryotes was also accompanied by several other major structural innovations pertaining to chromosomal organization.5 Already in the common ancestor of all extant eukaryotes, a transition was made from the predominantly circular chromosomes of prokaryotes to multiple linear ones whose ends are capped by telomeres. Further, the chromosomes were separated from the rest of the cell by a membrane bilayer, resulting in the quintessential feature of the eukaryotes, the nucleus.7, 8, 9 Emergence of the nucleus decoupled cytoplasmic translation from nuclear transcription and marked a major departure from the prokaryotic situation. This appears to have relaxed the constraints on the eukaryotic genes, allowing them to be colonized by introns, as mRNA was no longer translated during transcription.6 In turn, the emergence of introns favored the origin of a new set of large protein complexes: the spliceosomal complexes that associated with transcribed genes and acted on the intron-containing primary transcripts.10 Emergence of the nucleus also appears to have favored the emergence of a distinct subnuclear organelle, the nucleolus, where the ribosomal proteins could be combined with the freshly synthesized rRNAs to generate functional ribosomal subunits.11 Thus, the landscape of eukaryotic chromatin diverged considerably from that of the prokaryotes, with spliceosomal, rRNA processomal, and telomerase ribonucleoprotein complexes adding to the protein and nucleic acid mass of the chromosomes beyond just the genome and the histone octamers.
In terms of protein structure, the origin of the eukaryotes was characterized by an expansion of low-complexity sequences in proteins.6, 12 These form nonglobular segments of proteins that typically exist as disordered or unstructured random coils, and tend to be enriched in one or a few amino acids. In addition to histone tails, such low-complexity regions are also abundant in eukaryotic nuclear proteins such as transcription factors (TFs) and spliceosomal proteins (e.g., RGG and SR repeats), and might play roles in protein–protein interactions and low-specificity nucleic acid interactions.12 These low-complexity regions offered a niche for the diversification of a veritable ecosystem of enzymes in eukaryotes that catalyze addition of covalent modifications to the amino acid side chains or the N- and C-termini of polypeptides.13, 14, 15, 16, 17 There also arose a corresponding array of enzymes that catalyzed the removal of such covalent modifications, to restore the given peptide to its unmodified state. In addition to histone tails, the other targets of this flux of modifications were peptides from proteins that are more transient or long-term residents of chromosomes. These modifications span a dramatic range in terms of molecular weight and biochemical diversity.13, 14, 15, 16, 17 The simplest of these are low-molecular-weight adducts (methyl, phosphate, and acetyl groups). Somewhat higher molecular weight modifications include mono-ADP-ribosylation, biotinylation, and spermidinylation. The largest modifications involve addition of whole biopolymers such as branched or linear poly-ADP-ribose (up to 200 ADP-ribose units), peptides such as polyglutamate or polyglycine (up to 20 amino acids), and polypeptides of the ubiquitin family such as ubiquitin (Ub) and SUMO. In addition to these adducts, there are covalent modifications that directly modify the amino acid side chain. These include citrullination, which results from the deimination of the guanidino group of arginine (releasing the ammonium ion), and ornithination, which results from hydrolysis of the guanidino group (releasing urea).18, 19 Other direct modifications are hydroxylations of the side chains of proline, lysine, and asparagine, which generate the corresponding hydroxy amino acids.16, 20, 21
Among chromatin proteins, the ɛ-amino group of lysine is the most prominent target for modification, and receives adducts such as methyl, acetyl, biotinyl, and ubiquitin-like polypeptides.16 The target amino group can accept up to three methyl groups, resulting in distinct mono-, di-, and trimethyl forms of lysine. In contrast, the guanidino group of the other basic amino acid, arginine, is primarily the target for a single type of adduct, the methyl group. In this case, methylation can result in three distinct modifications, namely monomethylarginine and either asymmetric dimethylarginine, where both methyl groups are linked to a single nitrogen atom of the guanidino group, or symmetric dimethylarginine, with one methyl group on each of the two available nitrogens.19 The alcoholic amino acids serine and threonine are the primary targets of phosphorylation, but tyrosine is also similarly modified, predominantly in the animal lineage.15 Serine and threonine can also be glycosylated with N-acetylglucosamine, the significance of which is only beginning to be understood.22 The acidic side chain of glutamate is a target for several modifications such as mono- and poly-ADP-ribosylation, polyglutamylation, polyglycination, and potentially also methylation.13, 14 The amino termini of chromatin proteins are also often subject to processing followed by acetylation. These adducts, along with direct modifications of side chains (hydroxylation and citrullination), have profound consequences for the biochemical properties of histones and other chromatin proteins. The most prevalent modifications of histones are acetylation and methylation.15, 17 The former has been observed on at least 13 lysine side chains distributed across the four standard octameric histones. Methylation target sites are also distributed across the four core histones, with six of those being arginine and the remaining seven being lysine.15, 17, 23 These are followed by phosphorylation with at least six sites, ubiquitin-system modifications with at least five target sites, and poly-ADP-ribosylation on a single site across the classic core histones or their variants like centromeric H3.15, 24, 25 Other than the core histones, the linker histone H1 is also subject to various modifications, such as methylation (e.g., at H1.4K26).26
In a direct sense, all of these modifications can affect both the surface electrostatics and the net size of the modified polypeptide, and sterically affect its interactions with nucleic acids and proteins. For example, the acetylation of lysines can reduce the net positive charge, phosphorylation and polyglutamylation can increase the net negative charge, and ubiquitination and poly-ADP-ribosylation can drastically alter the size of the polypeptide.13, 14, 15, 17 Additionally, many of these modifications carry epigenetic information, commonly termed “the histone code.” The introduction of these modifications by specific enzymes can be seen as a coding step, in which extragenetic information is “written” into the histones and transmitted through subsequent cell divisions.15, 17 Discrimination between modified and unmodified peptides by specific peptide-binding domains, which might then recruit other chromatin remodeling or modifying activities to chromatin, can be conceptualized as the “interpretation” of the epigenetic code.17, 27 Finally, the removal of these marks by other enzymes can be conceived as “resetting” of the epigenetic information and usually accompanies major differentiation events or transitions such as postzygotic development. These protein-based marks also functionally interact with both DNA modifications and the RNAi system to comprise the complete complex of epigenetic coding in eukaryotes.17 Over the past two decades, biochemical and biological studies have unleashed an avalanche of information regarding the structural, mechanistic, and organismal dimensions of these systems of epigenetic information. In particular, a combination of computational analysis of protein sequences and structures and experimental investigations has identified most of the major enzyme classes involved in the generation and erasure of epigenetic marks on proteins, as well as the domains that discriminate among them.
A key realization from the studies on chromatin protein modifications has been that, though most eukaryotes possess sizeable complements of proteins catalyzing the major modifications, they can all be unified into a relatively small set of protein superfamilies. Likewise, a relatively small set of structural scaffolds has been used repeatedly among the binding domains that discriminate modified from unmodified peptides in chromatin. Protein acetyltransferases can be unified as members of the GCN5-like acetyltransferase (GNAT) fold.28, 29 The deacetylases belong to two major folds, namely the HDAC-arginase-like fold that contains the prototypical histone deacetylase Rpd3, and the classical Rossmann fold, which includes the deacetylases of the Sir2 superfamily.30, 31, 32 Among kinases, most belong to the eukaryote-type protein kinase fold, though the recently characterized WSTF (which phosphorylates H2A.X on tyrosine 142) appears to define a novel structural scaffold for protein kinases.33 Ubiquitin- and SUMO-conjugating systems follow a three-enzyme cascade (E1, E2, E3), of which all histone-modifying E3s contain a treble-clef domain of the RING finger superfamily as their catalytic element.34, 35, 36 The deubiquitinating isopeptidases acting on histones contain a catalytic domain with either the papain-like fold or the metal-dependent JAB domain of the deaminase-like fold. Likewise, the catalytic domains of methylating and demethylating enzymes, which are the focus of this chapter, belong to a small set of ancient structural scaffolds.
The chromatin protein methylation system can be defined as comprising lysine and arginine methylases, the corresponding demethylases, and the arginine deiminases that regulate arginine methylation by its conversion to citrulline. Domains that discriminate methylated peptides from their unmethylated counterparts (i.e., readers of the epigenetic code established by the above enzymes) may also be considered as an immediate extension of this system. In contrast to the several surveys that discuss chromatin protein methylation from a functional angle with a focus on human or yeast models, we adopt an evolutionary perspective and exploit the genomic information that has become available across the eukaryotic tree. We present a structural overview of the main types of protein methylases, demethylases and deiminases followed by an evolutionary consideration of each of the catalytic domains. We then briefly survey the structural diversity of the peptide-binding domains involved in discrimination of methyl marks and their potential role in recruiting other activities to chromatin. Thereafter, we consider the major trends in the domain architectures of enzymes belonging to the methylation system and discuss the emergent syntactical features in the context of the functions of these proteins. Finally, we try to place the evolutionary history of protein methylation in the context of the other major mediators of epigenetic information, namely the DNA methylation and the RNAi systems.
Section snippets
The Categories of Protein Methylases and Their Role in Chromatin Protein Methylation
Protein methylases have evolved within two structurally unrelated folds. The first group of protein methylases belongs to the classical methyltransferase superfamily along with numerous other methylases, and possesses the Rossmann fold.37, 38, 39 The second group of currently known protein methylases, the SET domain superfamily, contains the β-clip fold.40 Among the classical Rossmann fold-type methylases are several distinct protein methylase families, and two of these methylate histones and
Enzymatic Mechanisms That Preempt or Reverse the Action of Protein Methylases in Chromatin
Protein methylation in chromatin proteins, both by Rossmann fold and SET domain methylases, is modulated by catalytic activities that either preempt or reverse the methyl marks.15 The primary preemptive mechanism that has been characterized is citrullination of histone arginines. Demethylation affects both methylated arginines and lysines, and represents an important regulatory mechanism.
Domains Involved in Discrimination of Methylated Peptides
The discrimination of the methylation states of modified peptides in chromatin proteins is mediated by a number of structurally diverse domains. A comprehensive discussion of the proteins containing these domains is beyond the scope of this work, as it would involve most major groups of chromatin proteins. Hence, in this chapter, we briefly discuss the major structural scaffolds involved in binding modified peptides and discriminating their methylation status. Currently, methylated peptide
Associations with DNA-Binding and Modified-Peptide-Recognition Domains
Analysis of domain architectures, and the network representation of the total set of architectures that are found among enzymes involved in the methylation system, reveal certain interesting patterns with considerable functional significance (Fig. 6). First, there is a striking difference in the architectures of the Rossmann fold methylases and deiminases on one side, and the SET domain methylases and JOR/JmjC and LSD1 demethylases on the other side (Fig. 6). The former show practically no
Evolutionary Considerations
Complementary lines of evidence from comparative genomics, sequence analysis, and structural biology have uncovered several key aspects of the provenance and the history of the eukaryotic protein methylation system and its integration with other regulatory mechanisms such as DNA methylation and RNAi. The evidence from comparative genomics indicates that many key players in each of these mechanisms have emerged in the bacterial world, as a part of different systems that were under selection for
General Conclusions
The past 15 years have witnessed an extraordinary expansion of studies pertaining to chromatin protein methylation and its functional significance.17 In the face of the enormous literature that has accumulated in this field, it is difficult to discern key new directions that might help in filling major lacunae. However, one important direction of study would be to create comprehensive regulatory networks that link all methylated proteins to their corresponding methylating or demethylating enzymes
Acknowledgments
Work by the authors is supported by the intramural funds of the National Library of Medicine, National Institutes of Health, USA. We would like to acknowledge the numerous contributions of various researchers in the protein methylation and chromatin field, which we were regrettably unable to cite due to the sheer enormity of the literature under review.
References (267)
Nucleoid-associated proteins and bacterial physiology. Adv Appl Microbiol (2009)
Comparative genomics and structural biology of the molecular innovations of eukaryotes. Curr Opin Struct Biol (2006)
Comparative genomics of transcription factors and chromatin proteins in parasitic protists and other eukaryotes. Int J Parasitol (2008)
Loopy proteins appear conserved in evolution. J Mol Biol (2002)
Polyglutamylation is a post-translational modification with a broad range of substrates. J Biol Chem (2008)
Chromatin modifications and their function. Cell (2007)
High-resolution profiling of histone methylations in the human genome. Cell (2007)
Psh1 is an E3 ubiquitin ligase that targets the centromeric histone variant Cse4. Mol Cell (2010)
An E3 ubiquitin ligase prevents ectopic localization of the centromeric histone H3 variant via the centromere targeting domain. Mol Cell (2010)
Dynamic Histone H1 Isotype 4 Methylation and Demethylation by Histone Lysine Methyltransferase G9a/KMT1C and the Jumonji Domain-containing JMJD2/KDM4 Proteins. J Biol Chem (2009)