Journal of Molecular Biology
Volume 325, Issue 4, 24 January 2003, Pages 595-622

Review
N-terminal Acetyltransferases and Sequence Requirements for N-terminal Acetylation of Eukaryotic Proteins

https://doi.org/10.1016/S0022-2836(02)01269-X

Abstract

Nα-terminal acetylation occurs in the yeast Saccharomyces cerevisiae by any of three N-terminal acetyltransferases (NAT), NatA, NatB, and NatC, which contain Ard1p, Nat3p and Mak3p catalytic subunits, respectively. The N-terminal sequences required for N-terminal acetylation, i.e. the NatA, NatB, and NatC substrates, were evaluated by considering over 450 yeast proteins previously examined in numerous studies, and were compared to the N-terminal sequences of more than 300 acetylated mammalian proteins. In addition, acetylated sequences of eukaryotic proteins were compared to the N termini of 810 eubacterial and 175 archaeal proteins, which are rarely acetylated. Protein orthologs of Ard1p, Nat3p and Mak3p were identified in the sequenced genomes of eukaryotic model organisms, including Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Mus musculus and Homo sapiens. Those and other putative acetyltransferases were assigned by phylogenetic analysis to the following six protein families: Ard1p; Nat3p; Mak3p; CAM; BAA; and Nat5p. The first three families correspond to the catalytic subunits of the three major yeast NATs; these orthologous proteins were identified in eukaryotes, but not in prokaryotes; the CAM family includes mammalian orthologs of the recently described Camello1 and Camello2 proteins, whose substrates are unknown; the BAA family comprises bacterial and archaeal putative acetyltransferases whose biochemical activity has not been characterized; and the new Nat5p family was assigned on the basis of the putative yeast NAT, Nat5p (YOR253W). Overall patterns of N-terminal acetylated proteins and the orthologous genes possibly encoding NATs suggest that yeast and higher eukaryotes have the same systems for N-terminal acetylation.

Introduction

During protein synthesis and maturation, the N-terminal protein sequences of both intracellular and extracellular proteins undergo a number of modifications. Proteins from prokaryotes, mitochondria and chloroplasts initiate with formylmethionine, whereas proteins from the cytosol of eukaryotes initiate with methionine. The initial methionine may be deformylated; it may be removed; and the N-terminal residue may be acetylated or modified with another chemical group. In the case of extracellular proteins and certain mitochondrial, endoplasmic reticulum, Golgi, vacuolar or vesicular proteins, i.e. proteins targeted to specific cell compartments, a portion of the N-terminal protein sequence, usually 15–30 amino acid residues long, may be cleaved off, exposing new N-terminal residues that may be further modified. Methionine cleavage and N-terminal acetylation are two major types of protein modifications.1., 2. Additional modifications of protein N termini include the following: methylation, mostly of alanine, methionine and proline residues; myristoylation of glycine residues; and the addition of rarer blocking groups, including α-amino acyl, pyroglutamate, pyruvoyl, α-ketobutyl, glucuronyl, glucose and murein.3., 4. There are also some examples of double N-terminal modifications, particularly acetylation and phosphorylation, involving serine and threonine residues.5 Many of these reactions take place cotranslationally, when the N terminus of the nascent polypeptide emerges from the ribosome and is only 20–50 residues long or still attached to the ribosome,6., 7. indicating that the susceptibility for these modifications is determined primarily by the N-terminal region of the protein.

The methionine at N termini is cleaved from nascent chains of most prokaryotic and eukaryotic proteins. Cleavage of N-terminal methionine residues is by far the most common modification and is catalyzed cotranslationally by methionine aminopeptidases (MAP); one enzyme is described in bacteria and archaea, MAP type I and MAP type II, respectively. The archaeal MAP is not highly homologous to the bacterial enzyme. The bacteria and archaea MAPs have similar specificity, and resemble, respectively, MAP I and MAP II type enzymes found in eukaryotes.2., 8. It is also possible that eukaryotic organelles with their own translation machinery might contain different MAP isoforms, as was shown recently for chloroplasts and mitochondria of Arabidopsis thaliana.9 Removal of N-terminal methionine is an essential function in yeast, as in prokaryotes, but the process can be carried out by either of two enzymes.8 Experiments with altered iso-1-cytochromes c (iso-1) from yeast were the basis for the hypothesis that methionine is cleaved from penultimate residues having radii of gyration of 1.29 Å or less (glycine, alanine, serine, cysteine, threonine, proline, and valine residues),10 a hypothesis that was confirmed in other studies with prokaryotic systems in vivo and in vitro11., 12. and other eukaryotic systems in vivo and in vitro.13., 14. The lack of methionine aminopeptidase action on proteins with large penultimate residues, as discussed above, can be now explained by steric hindrance, as deduced from the crystal structure of MAP.15
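The penultimate-residue rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `mature_n_terminus` and the one-letter sequence encoding are assumptions introduced here, and the rule deliberately ignores any sequence context beyond the second residue.

```python
# Penultimate residues that permit removal of the initiator methionine,
# as listed in the text (radius of gyration of 1.29 Å or less):
# Gly, Ala, Ser, Cys, Thr, Pro, Val.
SMALL_PENULTIMATE = frozenset("GASCTPV")


def mature_n_terminus(sequence: str) -> str:
    """Apply the simplified methionine-cleavage rule to a protein sequence.

    Hypothetical helper: `sequence` is a one-letter protein sequence that
    is expected to start with the initiator methionine.
    """
    seq = sequence.upper()
    if len(seq) >= 2 and seq[0] == "M" and seq[1] in SMALL_PENULTIMATE:
        return seq[1:]  # methionine cleaved, penultimate residue exposed
    return seq  # methionine retained (or no initiator Met present)
```

Under this rule a Met-Thr- terminus loses its methionine, whereas a Met-Leu- or Met-Glu- terminus retains it.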

N-terminal acetylation is an enzyme-catalyzed reaction in which the protein α-amino group accepts the acetyl group from acetyl-CoA. The enzyme N-terminal acetyltransferase (NAT) has been found in all kingdoms, prokaryotes, archaea and eukaryotes, but N-terminal acetylation is likely to be cotranslational only in eukaryotes (see also below). There are some examples of viral protein acetylation but it normally employs the host cell NAT system. In vitro studies indicated that N-terminal acetylation of eukaryotic proteins occurs when there are between 25 and 50 residues extruding from the ribosome.6., 16. Similar to methionine cleavage, N-terminal acetylation is one of the most common protein modifications in eukaryotes, occurring on approximately 80–90% of the different varieties of cytosolic mammalian proteins16., 17., about 50% in yeast,18 but rarely on prokaryotic19 or archaeal proteins. The percentage of acetylated proteins in plants is not known, but the number of known N-terminal sequences of mature proteins in SWISS-PROT is relatively small and includes mostly cytochromes c, histones and some metabolic enzymes. Similarly, in invertebrates the number of characterized proteins is limited and their N-terminal acetylation appears to be rare. Although both protein sets from yeast and mammals considered herein obviously do not include the entire proteomes they nevertheless represent a large variety of all possible N termini and may be considered as a basis to generalize to all proteins.

Eukaryotic proteins susceptible to N-terminal acetylation have a variety of different N-terminal sequences, with no simple consensus motifs, and with no dependence on a single type of residue.1., 6. Proteins with serine and alanine termini are the most frequently acetylated, and these residues, along with methionine, glycine, and threonine, account for over 95% of the N-terminal acetylated residues.6., 16., 17., 20. Our studies with N-terminally altered iso-1 from yeast and the identification of three different NATs and their substrates helped to establish the basic patterns for acetylation.1

In normal yeast strains, the N-terminal methionine of iso-1-cytochrome c is cleaved and the newly exposed threonine residue is not acetylated. However, during the course of numerous studies spanning three decades, many mutant forms of iso-1 were found to have N termini processed in different ways,21., 22. as illustrated in Table 1. Because of the dispensability of the N-terminal region, and the ease of generating altered sequences by transformation with synthetic oligonucleotides, the iso-1 system has been used to systematically investigate N-terminal processing.21., 22., 23., 24., 25. The study of mutationally altered forms of iso-1-cytochrome c was critical in deciphering the amino acid requirements for the two N-terminal processes, methionine cleavage and acetylation, as well as for identifying the substrate specificities for each of Ard1p, Nat3p and Mak3p, the catalytic subunits of the three N-terminal acetyltransferases.24., 26., 27.

We present here a comprehensive analysis of N-terminal mature sequences for more than 450 yeast proteins and over 300 mammalian proteins, mostly of human, bovine and mouse origin, and we also compare these eukaryotic proteins to the mature sequences of a large subset of over 810 eubacterial and 175 archaeal proteins. This protein database mainly comprises cytosolic soluble proteins or other cellular proteins whose N termini are not processed by cotranslocational cleavage of signal sequences or by secretion. Also, by using BLAST programs and amino acid sequence alignment, we have identified the orthologs of N-terminal acetyltransferases from the genomes of the completely or partially sequenced model organisms representing all living kingdoms; and we have constructed a NAT phylogenetic tree. We have uncovered six NAT protein families: three of them, Ard1p, Nat3p and Mak3p, are related to each of the yeast catalytic subunits, Ard1p, Nat3p and Mak3p, respectively; a fourth, CAM, is composed of Camello1 and Camello2 putative acetyltransferase proteins, and most likely is evolutionarily related to the Mak3p family; a fifth, the BAA family, is composed of diverged bacterial and archaeal NATs, some being related to Escherichia coli Rim acetyltransferases, which act on certain ribosomal subunits; and a new, hypothetical Nat5p family, with unknown substrate specificity.

N-terminal amino acid sequences of yeast proteins, presented in Table 2, were taken from the results of acetylation analysis of mutationally altered iso-1-cytochromes c;21., 22., 23., 24., 25. mutationally altered β-galactosidases;28 abundant proteins;29., 30. ribosomal proteins;31 and 19 S and 20 S proteasome subunits.32 Also, some sequences, mostly for non-acetylated N termini, were taken from Proteome, Inc. database (now Incyte Genomics)33 for individual proteins with N termini determined experimentally and from the original published reports. N-terminal protein sequences of known acetylated mammalian, bacterial and archaeal proteins are shown in Table 3, Table 4, respectively, and were taken from SWISS-PROT database34 and from the original literature references. To search the acetylated protein subset from each organism in SWISS-PROT, FtDescription (Feature) option from Sequence Retrieval System has been used. Since N-terminal processing is determined predominantly by the sequence at the beginning of the protein, only the first six amino-terminal residues were taken into account. Proteins that were assumed to be acetylated on the basis of similarity or homology to orthologous acetylated proteins were not included. In isolated cases, we included the proteins that have been described as N-terminally blocked, although the nature of the blocking group was not identified. In these cases, the proteins are designated in the Tables by a footnote. The protein N-terminal sequences in Table 2, Table 3, Table 4 are organized in alphabetical order. We combined acetylated and non-acetylated protein sequences in one table for Saccharomyces cerevisiae, representing lower eukaryotes, or grouped the proteins from related species of the same kingdom, such as human, bovine and mouse, into one mammalian proteins subset, acetylated or non-acetylated. The summary tables for both, yeast and mammalian proteins, are also presented (Table 5, Table 6, respectively).

In order to avoid duplications for different isomers, we considered all proteins from one organism with the same protein name and having the same N-terminal sequence as one unique entry (for example, numerous human MTA1, MTA2 or HTA2 proteins were counted as one unique sequence for each case). It should also be mentioned that multiple isoforms of many eukaryotic proteins are often observed due to differential gene expression, splicing, posttranslational modifications, like phosphorylation and glycosylation, or partial modification, and this fact complicates acetylation analysis, especially in higher eukaryotes. Finally, we did not take into account the acetylation of regulatory peptides or hormones, like β-endorphin and melanotropic hormone, α-MSH, or other small polypeptides because they normally undergo extra proteolytic cleavage steps and their acetylation is posttranslational. Some of these regulatory macromolecules are synthesized enzymatically without ribosomes.
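The two bookkeeping rules used to assemble the data set (only the first six residues are considered, and proteins from one organism with the same name and the same N-terminal sequence count as one entry) can be sketched as follows. This is a minimal illustration, not the procedure actually used by the authors; the `Entry` record and the function `unique_n_termini` are hypothetical names introduced here.

```python
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical record for one protein with a known mature N terminus."""
    organism: str
    name: str
    sequence: str  # mature N-terminal sequence, one-letter code


def unique_n_termini(entries: Iterable[Entry], window: int = 6) -> List[Entry]:
    """Keep one entry per (organism, protein name, first `window` residues)."""
    seen = set()
    unique = []
    for entry in entries:
        n_term = entry.sequence[:window].upper()
        key = (entry.organism, entry.name, n_term)
        if key not in seen:
            seen.add(key)
            # Store only the truncated N terminus that was compared.
            unique.append(Entry(entry.organism, entry.name, n_term))
    return unique
```

Entries that differ only beyond the sixth residue collapse into a single record, while the same protein name in a different organism is kept as a separate entry.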

The analysis of the mature N termini of yeast proteins presented in Table 2 indicates that 43% of all proteins are acetylated, which is comparable to about the 50% estimate made previously by 2D-gel technique for cytosolic soluble proteins.18 The small difference could be due to the fact that, in our protein set, the abundant proteins are over-represented and might not reflect the random protein population. In addition to amino acid sequences, Table 2 contains the identified or suggested NAT substrate types for all acetylated proteins, the NAT deficient mutants used in the analysis, the method used to detect acetylation, and the original reference. The data presented in Table 2 and summarized in Table 5 showed that in N-terminal sequences of yeast, acetylated proteins have termini predominantly of serine (124), methionine (29), alanine (19) and threonine (15) residues. Serine and alanine residues together contribute more than 74% of all acetylated proteins. Besides the four mentioned amino acid residues, only a few examples are found for other acetylated N termini; three for glycine and one each for cysteine and valine, with the two latter residues most likely being only partially modified. Notably, methionine is clearly the second, after serine, most common acetylated residue in yeast, in contrast to mammalian proteins (see below) where serine and alanine are the most preferentially modified. Also, the effect of penultimate residue on acetylation is profound; acidic aspartic or glutamic residues stimulate acetylation, whereas proline inhibits acetylation and positively charged lysine and arginine usually but not always inhibit acetylation (Table 5). All methionine residues of Met-Asp- or Met-Glu- N termini (NatB substrates) were acetylated, as well as all serine and alanine residues in the same context. 
Hydrophobic aromatic or branched residues like leucine, isoleucine, phenylalanine, and tryptophan at the penultimate position cause methionine acetylation in about 50% of the cases (NatC substrates); other structural features may interfere with NAT action. However, we observed that in such sequences the presence of an acidic residue at the third position often inhibits acetylation.1., 25.
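The share of serine and alanine termini among acetylated yeast proteins can be checked directly from the counts quoted above. The short sketch below assumes that the seven residue counts given in the text account for all acetylated entries of Table 2; the variable names are introduced here for illustration only.

```python
# Counts of acetylated N-terminal residues in yeast, as quoted in the text.
acetylated_counts = {
    "Ser": 124,
    "Met": 29,
    "Ala": 19,
    "Thr": 15,
    "Gly": 3,
    "Cys": 1,
    "Val": 1,
}

# Assumption: these seven residues cover every acetylated entry.
total_acetylated = sum(acetylated_counts.values())

# Combined share of serine and alanine termini.
ser_ala_share = (acetylated_counts["Ser"] + acetylated_counts["Ala"]) / total_acetylated

print(f"total acetylated termini: {total_acetylated}")   # 192
print(f"Ser + Ala share: {100 * ser_ala_share:.1f}%")     # 74.5%
```

Under that assumption the total is 192 acetylated termini, of which 143 (74.5%) begin with serine or alanine, in line with the "more than 74%" stated in the text.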

The majority of mammalian proteins presented in Table 3 are acetylated, totaling about 89%, which is in good agreement with an earlier estimate.6., 16. The data summarized in Table 6 show that in N-terminal sequences of mammalian acetylated proteins, alanine (103), serine (67) and methionine (33) are the predominant terminal residues, followed by much smaller numbers of glycine, threonine, valine and cysteine residues. In the entire set of mammalian proteins, a substantially larger number of mature sequences begin with serine, alanine, or methionine residues, which are most often acetylated, 78% compared to 58% in yeast. Actin proteins with N-terminal glutamic acid and aspartic acid are acetylated by a unique protein processing system and will be considered in a separate section.

N-terminal serine residues are almost equally well acetylated in lower and higher eukaryotes. However, a significantly higher proportion of alanine residues is acetylated in the mammalian protein set than in the yeast set, 99% versus 30%, respectively. The same is true for glycine and threonine residues. Also, while in yeast cysteine and valine residues are rarely modified by an acetyl group, in mammals this occurs more often. Both yeast and mammalian Met-Glu- and Met-Asp- proteins are always acetylated, but in mammals other types of N-terminal sequences with a retained initial methionine (shown in Table 5, Table 6) are much less common, with only ten such proteins as compared to 33 in yeast. This is consistent with an earlier view that retention of methionine and the lack of its acetylation are more characteristic of evolutionarily simpler genomes, especially bacterial and archaeal.6 Particularly, prokaryotic proteins with retained methionine often have Met-Lys- sequences that are not observed frequently in mammals. Overall, eukaryotic proteins appear less prone to retain their N-terminal methionine residues. The stimulating effect of acidic residues, like aspartic and glutamic acids, on N-terminal acetylation and the inhibitory effect of basic residues, like lysine and arginine, and of proline at the penultimate position can be clearly seen from the Tables for both lower and higher eukaryotes.

It is also interesting to note that the larger proportion of acetylated proteins in higher eukaryotes could be explained, at least in part, by a higher representation of acidic residues at the penultimate position. Specifically, in yeast the N-terminal sequences X-Glu- or X-Asp-, where X designates Ser-, Ala-, Thr- or Met- termini, are almost always acetylated except for a few cases of Thr-Glu- and Thr-Asp- proteins; these X-Glu- and X-Asp- termini comprise only 17% of all mature Ser-, Ala-, Thr- and Met- N termini in yeast, but that number is more than 39% in mammals (Table 5, Table 6). More frequent acetylation of N-terminal cysteine and valine residues in mammals may occur for the same reason. On the other hand, in yeast the N-terminal X-Lys-, X-Arg- or X-Pro- sequences, where X designates Ser-, Ala-, Thr- or Met-, are seldom acetylated, and these comprise about 21% of all Ser-, Ala-, Thr- and Met- N termini, while in mammals they comprise only 3%. Nevertheless, the NAT substrate specificities for yeast and mammals still appear to be the same.

In general, the acetylation patterns in yeast and mammals are very similar and may be evolutionarily conserved. However, a greater number of N-terminal protein sequences from higher eukaryotes are acetylated, probably reflecting some form of selection during evolution. Three lines of evidence, discussed above, support this conclusion, namely that mammalian proteins contain the following: (1) a higher representation of the residues most likely to be acetylated, serine, alanine, or methionine, at the first position; (2) a much higher representation of stimulating acidic residues at the penultimate position; and (3) a significantly lower representation of inhibitory basic residues at the penultimate position. The biological significance of such an evolved difference remains to be determined.

The N-terminal amino acid composition of the soluble proteins from a cell-free extract of E. coli determined by dinitrophenyl- and phenyl-thiohydantoin methods showed that methionine, alanine, serine, threonine and aspartic and glutamic acid residues, with the latter in minor amounts, account for close to 95% of the end groups recovered.19 N-terminal acetylation does not appear to be widespread in prokaryotes. However, systematic N-terminal characterization of bacterial and archaeal proteins has not been undertaken and the counterparts to eukaryotic NATs have not been identified. In E. coli, three NATs, RimI, RimJ and RimL, specifically modify single ribosomal proteins S18, S5 and L12, with Ala-Arg-, Ala-His- and Ser-Ile- N termini, respectively,35., 36. but there is no evidence that they act cotranslationally. For example, the family of large subunit ribosomal proteins L7/L12 is present in each 50 S subunit in four copies organized as two dimers and together with L10 is assembled in E. coli ribosomes on the conserved region of 23 S rRNA, termed the GTPase-associated domain.37 The L7/L12 dimer probably interacts with elongation factor EFTu. Because the L7 and L12 proteins have identical amino acid sequences in the N-terminal region, and because only L7 is N-terminally acetylated, this modification apparently occurs posttranslationally after partial or complete ribosome assembly, and RimL most likely recognizes some certain protein structure and not just the very N terminus. It is also not known if Rim enzymes act on other substrates in vivo.

We have searched the SWISS-PROT database for E. coli and other bacterial N-terminally acetylated proteins by the same procedure that was applied for mammalian proteins. We found only five such examples, which include the three ribosomal proteins mentioned above, E. coli EFTu elongation factor and one acetylated ribosomal protein, L7, from Micrococcus luteus that normally is present in the cell in two almost identical forms, one of which is acetylated and the other is not38 (Table 4). This M. luteus protein probably corresponds to the L7/L12 protein of E. coli. It is interesting to note that its acetylated N-terminal sequence, Met-Asn-Lys-Glu-Gln, is different from that of the E. coli L7/L12, Ser-Ile-Thr-Lys-Asp, suggesting that L7 protein acetylation is a conserved function in bacteria, but that it does not depend on the first few N-terminal amino acid residues. We also found only the following three archaeal N-terminally acetylated proteins in SWISS-PROT: ribosomal proteins S7 and L31e from Haloarcula marismortui; and glutamate dehydrogenase, DHE2, from Sulfolobus solfataricus (Table 4). Eight other archaeal and six E. coli proteins were annotated as N-terminally blocked but the nature of the blocking groups was not known. Some of the blocking groups could well be modifications other than acetylation; for example, the L11 protein of E. coli is actually trimethylated.39

However, we were able to find a relatively large number of archaeal proteins with experimentally determined and non-acetylated N-terminal sequences, and many of them are ribosomal proteins. From a total of 97 mature N-terminal sequences, 28 started with alanine, 26 with methionine, 16 with serine, ten with proline, seven with threonine, six with glycine and four with valine. More importantly, 807 out of 810 E. coli proteins with verified N-terminal sequences and listed in the EcoGene Web Site† were not acetylated. Thus, most bacterial and archaeal proteins with characterized N-terminal sequences obviously are not acetylated, even though their counterparts are acetylated in eukaryotes. The few acetylated bacterial and archaeal proteins probably reflect an important functional requirement for the resulting charge at the amino terminus.

In addition, the SWISS-PROT database was specifically searched for N-terminal acetylation of Met-Glu- and Met-Asp- proteins from bacteria, archaea and eukaryotes. As stressed above, all Met-Glu- and Met-Asp- proteins from eukaryotes are acetylated. The search, presented in Table 7, revealed that out of 47 mature N-terminal protein sequences from bacteria and archaea only one protein was found to be acetylated, DHE2 from the archaeon S. solfataricus, with the sequence Met-Glu-Glu-Val-Leu-. In contrast, all 13 yeast and 51 mammalian proteins with Met-Glu- and Met-Asp- termini were acetylated. These results add to the conviction that N-terminal acetylation of eukaryotic proteins fundamentally differs from the N-terminal acetylation of bacterial and archaeal proteins.
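The contrast between prokaryotic and eukaryotic proteins can be restated as simple proportions computed from the counts quoted in the two preceding paragraphs. The sketch below is illustrative only; the dictionary labels and the helper `percent_acetylated` are names introduced here, and the figure of three acetylated E. coli proteins is inferred from the statement that 807 of 810 were not acetylated.

```python
# (acetylated, total) counts taken from the text.
observations = {
    # 810 - 807 = 3 acetylated E. coli proteins (inferred from the text)
    "E. coli, verified N termini": (810 - 807, 810),
    "bacteria and archaea, Met-Glu-/Met-Asp-": (1, 47),
    "yeast, Met-Glu-/Met-Asp-": (13, 13),
    "mammals, Met-Glu-/Met-Asp-": (51, 51),
}


def percent_acetylated(acetylated: int, total: int) -> float:
    """Percentage of acetylated proteins in a set."""
    return 100.0 * acetylated / total


for label, (acetylated, total) in observations.items():
    share = percent_acetylated(acetylated, total)
    print(f"{label}: {acetylated}/{total} = {share:.1f}%")
```

This gives roughly 0.4% for the E. coli set and 2.1% for the prokaryotic Met-Glu-/Met-Asp- set, against 100% for both eukaryotic sets.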

Studies with yeast S. cerevisiae so far revealed three different N-terminal acetyltransferases, NatA, NatB and NatC, that act on groups of substrates, with each group containing degenerate motifs.1 Polevoda et al.24 characterized their substrate specificity in vivo by investigation of acetylation of several subsets of yeast proteins from various NAT deletion mutants. As described above, Ard1p, Nat3p and Mak3p are related to each other by amino acid sequence, and are believed to be the catalytic subunits of three NATs, NatA, NatB, and NatC, respectively, with each NAT acting on different sets of proteins having different N-terminal regions (Table 8). NatA is a major NAT in yeast cells with multiple substrates in vivo.18 Ard1p activity requires at least two subunits, Ard1p itself, and Nat1p.26 The MAK3 gene encodes a NAT that is required for the N-terminal acetylation of the killer viral major coat protein, gag, with a Met-Leu-Arg-Phe- terminus,28 two subunits of the 20 S proteasome32 and probably some mitochondrial proteins. The co-purification of Mak3p, Mak10p and Mak31p suggests that these three subunits form a complex that is required for N-terminal acetylation. Recently we have shown that all three subunits are required for NatC activity but not for acetylation of NatA or NatB substrate types.25 Nat3p was originally identified on the basis of similarities of its amino acid sequence to those of Ard1p and Mak3p, and Nat3p complex contains three other subunits, Mdm20p and proteins with molecular masses about 47 kDa and 16 kDa (B.P., T. Cardillo, G. Bedi & F.S., unpublished results). NatB substrates in vivo include actin, Act1p, and Rnr4p,25 two ribosomal proteins31 and three subunits of 26 S proteasome.32 All acetylated proteins in yeast can be assigned to one of the NatA, NatB or NatC substrates. Furthermore, we do not know of any acetylated proteins in yeast that could not reasonably be a NatA, NatB or NatC substrate. 
Nevertheless, it remains to be seen if there are other NATs, acting on rarer substrates.

The similarity in the pattern of N-terminally acetylated proteins from higher eukaryotes and S. cerevisiae suggests that the same systems may operate in all eukaryotes, including the presence of homologous N-terminal acetyltransferases that are members of a larger acetyltransferase family, PF00583 (GNAT).40 Although the three different NATs in yeast are not highly similar in their amino acid sequences, the similarity in the regions of their putative Ac-CoA-binding motifs A–D is much stronger, indicative of a conserved protein function. On the other hand, the protein sequences of the yeast NATs are sufficiently diverged to allow the identification of proteins corresponding to sets of the same ortholog from other species. We have used the general BLAST server from the National Center for Biotechnology Information (NCBI) to identify such orthologs in different model organisms. In some cases, to limit the search options or to identify the candidates with the highest similarity, we ran BLAST searches against individual organism proteomes, which were completely or incompletely sequenced. Protein sequence alignments and phylogenetic analysis were undertaken after the candidate proteins with the closest homology to a particular NAT were identified. If necessary, some corrections were made at this point and less likely candidate proteins were discarded. The Multalin program41 was used for protein alignment and the MegAlign program from the LaserGene99 package (DNAStar, Madison, WI) was used for phylogenetic analysis.

The presence of the orthologous genes encoding the three different N-terminal acetyltransferases in worms, flies, plants and mammals serves as additional evidence that the same or a similar N-terminal acetylation system may be operating in higher eukaryotes as in yeast. Species containing orthologs of the yeast Ard1p include Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, A. thaliana, Trypanosoma brucei, Dictyostelium discoideum, Mus musculus and Homo sapiens; those with orthologs of the yeast Nat3p include S. pombe, C. elegans, D. melanogaster, A. thaliana, Leishmania donovani, M. musculus and H. sapiens; and those with orthologs of the yeast Mak3p include S. pombe, C. elegans, D. melanogaster and A. thaliana (Figure 1). Several highly homologous proteins, the so-called Camello proteins, from rat, mouse and human form a special NAT group that evolutionarily could be linked to Mak3p. Bacterial and archaeal proteins are generally not very similar to eukaryotic NATs and are even more diverged between themselves. The presence of multiple bacterial enzymes for antibiotic inactivation by acetylation, for example chloramphenicol acetyltransferases, sometimes complicates the NAT homology searches because the amino acid sequences of motifs A–D responsible for acetyl-CoA binding in such proteins are very close, as was noted earlier.27

The identified NATs from different species were also analyzed by a phylogenetic approach. The following six NAT families were detected on the basis of their protein similarity: Ard1p, the Ard1p related group; Nat3p, the Nat3p related group; Mak3p, the Mak3p related group; CAM, the Camello1 and Camello2 related group;42 BAA, bacterial and archaeal putative acetyltransferases; and Nat5p, the newly uncovered hypothetical yeast Nat5p (YOR253W) related group (Figure 2). All of these groups are distantly related to each other, except for the CAM family, which is phylogenetically related more closely to the Mak3p family and which most likely diverged from an ancestral Mak3p. Although it was recently shown that Camello proteins play an essential role in embryo development in Xenopus laevis,43 no substrate for Camello enzymes has so far been identified.

The BAA family forms a well isolated branch in the NAT phylogenetic tree with a broader diversity of eubacterial and archaeal NATs, but none of the proteins has been shown to have biochemical activity. Thus, the substrate specificity of those proteins also is unknown. Although some members of the BAA family are annotated in databases as acetyltransferases related to E. coli Rim proteins, primarily RimI, which act on ribosomal proteins, none of the Rim proteins themselves is present in our NAT phylogenetic tree. Instead, another E. coli protein, accession number P46854, was identified phylogenetically as the closest to the three eukaryotic NATs (Figure 1). Although initially we included all three Rim proteins on the basis of amino acid sequence alignment, only the RimI protein showed significant homology to the eukaryotic NATs. RimI and P46854 are more similar to each other, but are relatively dissimilar to the three major yeast NATs (Table 9), although the P46854 protein had a higher match in the conserved NAT region. It appears as if known eukaryotic NATs evolved from primordial forms of RimI and P46854. The analysis of the putative bacterial acetyltransferases was strengthened by the fact that the overall homology between the three major eukaryotic NATs is low and may reflect the diversity of the substrates they act on. There is no information on which domains or residues are involved in protein substrate binding or whether any other subunit of NAT complexes specifies the substrate recognition. Although we have considered the BAA family as putative acetyltransferases, obviously further analyses are required for definitive conclusions concerning their activity, function and relationship to eukaryotic NATs.

Nat5p represents a family of putative NATs with orthologous proteins identified in yeast, S. pombe, C. elegans, D. melanogaster, A. thaliana and H. sapiens. The finding of this new family is based only on the sequence similarity of Nat5p (YOR253Wp) to other NATs. Our attempts to detect any Nat5p substrates in yeast by 2D-gel electrophoresis have so far been unsuccessful, but this may reflect the rarity of the substrates in vivo or that Nat5p acts on smaller polypeptides with mobility parameters undetectable by our regular 2D-gel procedure (R. Svensson, B.P., F.S. & A. Blomberg, unpublished result).

With availability of the increased number of completely sequenced eukaryotic genomes and powerful computer search programs, it is now possible to search for the presence of NAT isoforms for particular organisms. Recently such an approach was applied for identification of MAPs in the A. thaliana genome.9 Six new MAP cDNAs were found, MAP1A–MAP1D, which are located at different genomic loci, and which are closely related to yeast Map1p (and E. coli MetAP) in their protein sequences; and the duplicated MAP2A and MAP2B, which are closely related to yeast Map2p and nearly identical in protein sequences, but are located on different chromosomes. Three MAP isoforms were expressed and localized in cytoplasm, MAP1A and both MAP2s; one, MAP1B, was detected exclusively in plastids; and the others, MAP1C and MAP1D, localized in both mitochondria and plastids. The three MAP1B–MAP1D enzymes that localize to organelles possess the unique N-terminal pre-sequences to direct each protein to its proper cell compartment, but otherwise they are very similar to each other in catalytic domain. Multiple isoforms of another N-terminal processing enzyme, protein deformylase, that localize to mitochondria and plastids, also were detected in the A. thaliana genome.9

These findings with A. thaliana encouraged us to search for NAT isoforms located in cellular compartments where de novo protein synthesis occurs, even though eukaryotic organelles were derived from ancestral endosymbiotic eubacteria that lacked cotranslational N-terminal acetylation. Using regular BLAST searches, we were unable to find NAT isoforms in human or mouse genomes, unlike those multiple MAPs in A. thaliana. However, it is still possible that distinct NATs may be found in mammalian and plant organelles that acetylate individual proteins posttranslationally, similar, for example, to E. coli Rim enzymes. In support of this, three proteins synthesized in spinach chloroplasts were described as both N-terminally acetylated and phosphorylated.5

Previous attempts to predict N-terminal acetylation on the basis of the properties of amino acid residues distributed along the N-terminal region were unsuccessful. A computer program, Pattern Learn, was used in an attempt to distinguish the patterns in 56 Ac-Ala- acetylated and 104 Ala- non-acetylated eukaryotic proteins by comparing the first 40 amino acid residues for their statistical assignment as secondary structure formers, breakers or neutrals.44 Some distinguishing features were found in the sequences, mainly within residues 1–10, with smaller features at residues 16–24 and 30–40, but the precise nature of these features was not determined. However, new insight into this problem has been provided by using yeast mutants deleted in one or another of the NAT genes. The substrate specificities for each of the Ard1p, Nat3p and Mak3p enzymes were deduced from the lack of acetylation of different protein subsets, and the corresponding substrate types were designated NatA, NatB and NatC.1,23 As was summarized earlier,1 subclasses of proteins with Ser-, Ala-, Gly- or Thr- termini were not acetylated in ard1-Δ mutants (NatA substrates); proteins with Met-Glu- or Met-Asp- termini and subclasses of proteins with Met-Asn- and Met-Met- N termini were not acetylated in nat3-Δ mutants (NatB substrates); and subclasses of proteins with Met-Ile-, Met-Leu-, Met-Trp- or Met-Phe- termini were not acetylated in mak3-Δ mutants (NatC substrates). In addition, a special subclass of NatA substrates with Ser-Glu-, Ser-Asp-, Ala-Glu-, or Gly-Glu- termini, designated NatA′ substrates, was also only partially acetylated in nat3-Δ and mak3-Δ mutants.
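
The substrate types summarized above amount to a lookup on the first one or two residues of the mature N terminus. The following Python sketch illustrates this under stated assumptions: the function name `candidate_nat_substrate_type`, the constant names, and the use of one-letter residue codes are all introduced here for illustration and are not part of the original analysis. Because only subclasses of proteins with most of these termini are acetylated, a match indicates a candidate substrate type, not a prediction that the protein is acetylated.

```python
from typing import Optional

# N-terminal residues taken from the summary in the text, in one-letter code.
NATA_FIRST = {"S", "A", "G", "T"}             # Ser-, Ala-, Gly-, Thr-
NATA_PRIME_PAIRS = {"SE", "SD", "AE", "GE"}   # Ser-Glu-, Ser-Asp-, Ala-Glu-, Gly-Glu-
NATB_PAIRS = {"ME", "MD", "MN", "MM"}         # Met-Glu-, Met-Asp-, Met-Asn-, Met-Met-
NATC_PAIRS = {"MI", "ML", "MW", "MF"}         # Met-Ile-, Met-Leu-, Met-Trp-, Met-Phe-


def candidate_nat_substrate_type(mature_n_terminus: str) -> Optional[str]:
    """Return the candidate substrate type for a mature N-terminal sequence.

    The input is assumed (for this sketch) to be the N terminus after any
    methionine cleavage. Returns "NatA", "NatA'", "NatB", "NatC", or None
    when the terminus matches none of the listed types.
    """
    seq = mature_n_terminus.strip().upper()
    if not seq:
        return None
    pair = seq[:2]
    # Check the special NatA' subclass before the general NatA rule.
    if pair in NATA_PRIME_PAIRS:
        return "NatA'"
    if seq[0] in NATA_FIRST:
        return "NatA"
    if pair in NATB_PAIRS:
        return "NatB"
    if pair in NATC_PAIRS:
        return "NatC"
    return None
```

For example, `candidate_nat_substrate_type("MDSE")` returns `"NatB"` for a Met-Asp-Ser-Glu- terminus, whereas a Cys-Asp- terminus matches none of the listed types and returns `None`.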

The NatA substrates appear to be the most degenerate, encompassing a wide range of sequences, especially those with N-terminal residues of serine or alanine. Nevertheless, it has not been excluded that new NATs may be discovered, especially for proteins with unusual and rare N-terminal sequences that are not substrates for NatA, NatB, or NatC. For example, Cys-Asp- actin, which is acetylated in yeast,45 is not, as expected, a NatA substrate.

Generally, acetylation cannot be definitively predicted from the primary amino acid sequence. Only the NatB substrates have common sequences that can be easily deciphered and that normally are acetylated. But even NatB substrate acetylation can be diminished by the presence of inhibitory residues. For example, an altered iso-1-cytochrome c, Ac-Met-Asp-Pro-, was only 67% acetylated, and one can assume that the adjacent proline residue diminished the action of Nat3p. While the reason for the lack of acetylation is unclear, the N-terminal regions of many of the non-acetylated proteins related to both NatA and NatB substrates contain basic residues (lysine, arginine, and histidine), as well as proline residues. At the same time, the N termini of non-acetylated proteins related to yeast NatC substrates contain acidic residues, such as glutamic acid. As we mentioned above, acidic residues normally stimulate acetylation of NatA and NatB substrates. Moreover, the stimulating and inhibitory residues may occupy sites further than the fifth amino acid position from the N terminus.1
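
The residue classes noted above can be flagged mechanically when candidate substrates are inspected. In the following Python sketch, the function name `flag_possible_inhibitors`, the one-letter codes, the inclusion of aspartic acid among the acidic residues, and the default window of ten residues are assumptions made for illustration; the text defines no window and notes only that such residues may lie beyond the fifth position. A flag marks a residue of a class found in non-acetylated proteins and is not a prediction of the degree of acetylation.

```python
from typing import List, Tuple

# Residue classes associated in the text with non-acetylated proteins.
BASIC_OR_PRO = set("KRHP")  # Lys, Arg, His, Pro: NatA- and NatB-related proteins
ACIDIC = set("DE")          # Glu (and, by assumption, Asp): NatC-related proteins


def flag_possible_inhibitors(
    seq: str, nat_type: str, window: int = 10
) -> List[Tuple[int, str]]:
    """Return (1-based position, residue) pairs for possibly inhibitory residues.

    `window` is a hypothetical parameter: the number of N-terminal residues
    to inspect. The source does not specify such a limit.
    """
    if nat_type in ("NatA", "NatB"):
        suspects = BASIC_OR_PRO
    elif nat_type == "NatC":
        suspects = ACIDIC
    else:
        raise ValueError(f"unknown substrate type: {nat_type!r}")
    region = seq.strip().upper()[:window]
    return [(pos, aa) for pos, aa in enumerate(region, start=1) if aa in suspects]
```

Applied to the Met-Asp-Pro- example above, `flag_possible_inhibitors("MDP", "NatB")` reports the proline at position 3.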

We suggested earlier that NATs act on substrates with specific but degenerate sequences, and that the activities can be diminished by suboptimal residues.1,22 We further suggested that acetylation can be diminished by inhibitory residues situated anywhere on the nascent chain at the time the acetyl group is added. Thus, the degree of acetylation is the net effect of positive optimal or suboptimal residues and negative inhibitory residues, and a lack of acetylation could be due to the absence of required residues or the presence of inhibitory residues. Because the identities of required and inhibitory residues are not completely understood, because these residues may affect acetylation to various degrees, and because inhibitory residues may occupy various sites in the nascent chain, predicting acetylated and non-acetylated sequences is still not absolutely reliable; however, considering our new studies presented herein, the acetylation of many proteins can now be predicted with a high degree of accuracy.

The biological significance of N-terminal modification varies with the particular protein, with some proteins requiring acetylation for function, whereas others do not. For example, the 30-fold increased dissociation of the HbF1 form of human fetal hemoglobin compared with normal HbF is most likely due to the presence of an N-terminal acetyl group at the juncture where αγ dimers assemble to form the tetramer.46 Also, N-terminal acetylation of tropomyosin is required for its binding to actin (also see below).47 The recombinant enzyme rat glycine N-methyltransferase (GNMT), expressed in E. coli and lacking an N-terminal acetyl group, exhibited kinetic patterns similar to those of GNMT purified from liver, but showed hyperbolic kinetics at low pH in contrast to the sigmoidal behavior of the native protein.48 In some cases, a loss of acetylation leads to decreased thermal stability of the protein, altered kinetic parameters or less efficient complex assembly. An earlier suggestion was made that N-terminal acetylation protects proteins from degradation, but in those examples, the proteins lacking acetylated termini also had other differences in amino acid sequence. Clearly, N-terminal acetylation does not necessarily protect proteins from degradation, as often supposed, nor does it play any obvious role in protection of proteins from degradation by the “N-end rule” pathway.

A significant means for assessing the general importance of N-terminal acetylation comes from the phenotypic defects in mutants lacking one or another of the NATs. The lack of N-terminal acetylation of the viral major coat protein, gag, in mak3 strains prevents assembly or maintenance of the viral particle.27 Also, mak3 strains do not utilize non-fermentable carbon sources at 37 °C, probably because of the lack of acetylation of a still unidentified mitochondrial protein.25,27

We have previously reported that nat3-Δ mutants exhibit multiple defective phenotypes, including slow growth, lack of growth on YPG medium at 37 °C, reduced growth on medium containing NaCl, and reduced mating.24 Such defects could arise from the lack of acetylation of any number of proteins essential for different processes. While the unacetylated proteins responsible for these defects are not easily identified, the temperature and NaCl sensitivity could be attributed to lack of acetylation of actin (Act1p), which contains a normal N-terminal sequence Ac-Met-Asp-Ser-Glu-. In addition, it has been shown that acetylation at the N terminus of actin strengthens weak interaction between actin and myosin.49

Actin cable formation requires tropomyosin for stability. The N-terminal tail of tropomyosin and its acetylation status are very important for protein function.47 Furthermore, yeast tropomyosins Tpm1p and Tpm2p, with N termini Met-Asp- and Met-Glu-, respectively, very likely are substrates for NatB. It was found that Mdm20p, a NatB subunit (Table 8), is necessary for actin–tropomyosin interaction, but the role of the protein was not determined.50 Previous work by Hermann et al.51 revealed that mdm20-Δ strains were defective in mitochondrial inheritance and actin cables (bundles of actin filaments), and that extra copies of TPM1, a gene encoding the actin filament-binding protein tropomyosin, suppress mitochondrial inheritance defects and partially restore actin cables in mdm20-Δ cells. Synthetic lethality was also observed between mdm20 and tpm1 mutant strains, and certain dominant alleles of ACT1 and TPM1 suppressed mdm20-Δ. Interestingly, one of the suppressors of the mdm20 deletion mutant was the TPM1-5 allele, which encodes a protein whose N terminus is extended by seven amino acid residues through use of an earlier ATG start, resulting in a Met-His- terminus instead of the native Met-Asp- terminus. Although Mdm20p does not co-localize with actin or tropomyosin in the growing cables,50 it is nevertheless required for association of these proteins.52 Using the TAP-protocol, we recently found that Mdm20p is a subunit of NatB (Table 8) (B.P., T. Cardillo, G. Bedi & F.S., unpublished results) and we suggest that protein acetylation is required for proper actin–tropomyosin interaction.

In contrast, many non-acetylated recombinant proteins are fully active. For example, the N-terminal acetylation of the chaperonin Hsp10 protein is not necessary for the correct folding of the protein and also is not important for chaperonin activity or mitochondrial import.53 Similarly, other proteins that normally contain an acetylated N terminus, such as alcohol dehydrogenase, are stable and fully functional without it.54 Results with annexin II tetramer (AIIt) indicate that N-terminal acetylation does not affect the in vitro activities or conformational stability of the protein.55 The number of examples of proteins either requiring or not requiring N-terminal acetylation undoubtedly will continue to grow. Not only can the lack of acetylation result in various defects, but abnormal acetylation also can prevent normal functions. For example, the acetylation of the N-terminal catalytic threonine residue of various 20 S proteasome subunits causes the loss of specific peptidase activities.56

Obviously, both N-terminal acetylation and the lack of N-terminal acetylation have evolved to meet the individual requirements of specific proteins. The viability of ard1-Δ, nat1-Δ, mak3-Δ and nat3-Δ mutants lacking NATs suggests that the role of acetylation may be subtle and not absolute for most proteins. Possibly only a subset of proteins actually requires this modification for activity or stability, whereas the remainder are acetylated only because their amino termini fortuitously correspond to consensus sequences.

Actin is a major contractile protein in both muscle and non-muscle eukaryotic cells. All actins are highly homologous and contain several acidic amino acid residues at their N termini, which are required for function (see above). Apparently all actin isoforms from all eukaryotes undergo the normal cotranslational processing of methionine cleavage and acetylation as described above for typical proteins. However, extensive studies by Rubenstein and colleagues revealed that at least some actins from many eukaryotes undergo additional specific posttranslational processing, including actins from the slime mold D. discoideum,57 the fruit fly D. melanogaster,58 birds, and mammals.59–61 This additional posttranslational processing of actin does not occur, however, in the protozoan Acanthamoeba castellanii,62 or in the fungi S. cerevisiae, Aspergillus nidulans, S. pombe, and Candida albicans.45 The posttranslational processing of actin requires an N-acetylaminopeptidase (ANAP), which specifically removes N-terminal Ac-Met or Ac-Cys from actin to leave an acidic N-terminal residue, and which has been isolated from rat liver and partially characterized.63,64

The specific posttranslational processing of actin can now be assigned to the following two types, Type I and Type II, which are shown above, and which consider the more recently studied general cotranslational systems: (While only single examples of the actins are depicted, Type I actins include both Met-Asp- and Met-Glu- proteins.)
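
As a rough reconstruction from the surrounding text, and not a reproduction of the original scheme, the two processing types can be written as ordered lists of N-terminal states. In this Python sketch the names `ACTIN_PROCESSING`, `final_terminus` and `describe` are hypothetical, and the Met-Cys-Asp- starting sequence for Type II, together with the ordering of the intermediate steps, is inferred from the statements that actins undergo normal cotranslational methionine cleavage and acetylation, that ANAP removes Ac-Met or Ac-Cys, and that the products carry Ac-Asp- or Ac-Glu- termini.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (processing event, resulting N terminus)

# Only the Met-Asp- example is listed for Type I; Met-Glu- actins are
# assumed to follow the same sequence of events.
ACTIN_PROCESSING: Dict[str, List[Step]] = {
    "Type I": [
        ("primary translation product", "Met-Asp-"),
        ("cotranslational acetylation", "Ac-Met-Asp-"),
        ("removal of Ac-Met by ANAP", "Asp-"),
        ("acetylation by an unidentified NAT", "Ac-Asp-"),
    ],
    "Type II": [
        ("primary translation product", "Met-Cys-Asp-"),
        ("cotranslational methionine cleavage", "Cys-Asp-"),
        ("cotranslational acetylation", "Ac-Cys-Asp-"),
        ("removal of Ac-Cys by ANAP", "Asp-"),
        ("acetylation by an unidentified NAT", "Ac-Asp-"),
    ],
}


def final_terminus(actin_type: str) -> str:
    """Return the N terminus remaining after the last listed step."""
    return ACTIN_PROCESSING[actin_type][-1][1]


def describe(actin_type: str) -> str:
    """Join the successive N termini of one processing type with arrows."""
    return " -> ".join(terminus for _, terminus in ACTIN_PROCESSING[actin_type])
```

Both types converge on the same acetylated acidic terminus: `final_terminus("Type I")` and `final_terminus("Type II")` each return `"Ac-Asp-"`.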

These specific posttranslational processing events have obviously evolved to produce actin with Ac-Asp- or Ac-Glu- termini, reflecting the requirement for an acidic amino acid at the N terminus. It should be noted that proteins with just Asp- or Glu- at the N terminus would be unstable, as they would be degraded by the N-end rule degradation system;65 thus, acetylation may be required in part for stabilization of the actins in some but not all organisms with acidic residues at the termini (see below). So far, no NAT specifically acting on actins with aspartic acid, glutamic acid or cysteine termini has been identified. On the other hand, actins from S. cerevisiae (Ac-Met-Asp-Ser-Glu-), other fungi, and A. castellanii (Ac-Gly-Asp-Glu-) have evolved without requiring acidic residues at the immediate N terminus and without requiring posttranslational processing, although nearby acidic residues are required for normal function. Thus, the different actins have different N-terminal sequence requirements. In this regard, as discussed above, we previously suggested that the slow growth phenotype, lack of growth on non-fermentable carbon sources, and temperature and salt sensitivity of nat3-Δ yeast mutants, which lack NatB, could all be attributed primarily to the lack of actin acetylation.24

However, the ACT88F actin isoform from D. melanogaster is normally N-terminally processed in vivo by the cleavage of Ac-Cys, but the resulting N-terminal aspartic acid residue is not acetylated.66 Nevertheless, the actin with the free α-amino aspartic acid residue is stable. Furthermore, Schmitz et al.66 reported that D. melanogaster carrying the mod mutation failed to complete post-translational processing of the ACT88F actin. They proposed that the mod gene product is normally responsible for removing Ac-Cys from actin, and may correspond to an ANAP. The biological significance of this process was demonstrated by observations that retention of the Ac-Cys- at the terminus of ACT88F affected the flight muscle function of mod flies. Clearly, the N terminus requirement varies with different actins.

In addition to the N-terminal acetylation occurring cotranslationally, there are numerous examples of acetylation of the ε-amino group of lysine residues at various positions occurring posttranslationally.67 The most studied example is histones H2A, H2B and H3, in which the modification occurs at multiple sites in the N-terminal domains. In contrast to N-terminal acetylation, ε-Lys acetylation of histones is reversible, due to the action of histone deacetylases.68 There is no evidence for deacetylases that act on N-terminal acetylated proteins.

However, there are acylamino acid-releasing enzymes (AARE) (also designated acylaminoacyl-peptide hydrolase), which cleave Ac-Ala, Ac-Thr, Ac-Met, Ac-Gly, and Ac-Ser from the N-terminal end of short peptides, but are not known to act on N-terminal acetylated proteins.69–71 AARE have been isolated from eukaryotes and an archaeon, but not from prokaryotes.72 On the basis of their in vitro properties, AARE have been suggested to act possibly on short nascent chains during translation, although their physiological function is unknown. It is also unknown how AARE are related to the acetyl-Met and acetyl-Cys hydrolases that are involved in Type I and Type II actin processing, although they are clearly different. We favor the view that AARE play an important role in the recycling of amino acid residues for protein synthesis, but are not involved in cotranslational or posttranslational processing of N-terminal acetylated proteins.

Acknowledgements

Supported by National Institutes of Health Grant R01 GM12702.

References (76)

  • B. Polevoda et al.

    N-terminal acetylation of eukaryotic proteins

    J. Biol. Chem.

    (2000)
  • R.A. Bradshaw et al.

    N-terminal processing: the methionine aminopeptidase and Nα-acetyl transferase families

    Trends Biochem. Sci.

    (1998)
  • F. Wold

    In vivo chemical modifications of proteins

    Annu. Rev. Biochem.

    (1981)
  • K.-K. Han et al.

    Post-translational chemical modifications of proteins. III. Current developments in analytical procedures of identification and quantitation of post-translational chemically modified amino acid(s) and its derivatives

    Int. J. Biochem.

    (1993)
  • H. Michel et al.

    Tandem mass spectrometry reveals that three photosystem II proteins of spinach chloroplasts contain N-acetyl-O-phosphothreonine at their NH2 termini

    J. Biol. Chem.

    (1988)
  • H.P. Driessen et al.

    The mechanism of N-terminal acetylation of proteins

    CRC Crit. Rev. Biochem.

    (1985)
  • R.L. Kendall et al.

    Cotranslational amino-terminal processing

    Methods Enzymol.

    (1990)
  • X. Li et al.

    Amino-terminal protein processing in Saccharomyces cerevisiae is an essential function that requires two distinct methionine aminopeptidases

    Proc. Natl Acad. Sci.

    (1995)
  • C. Giglione et al.

    Identification of eukaryotic peptide deformylases reveals universality of N-terminal protein processing mechanisms

    EMBO J.

    (2000)
  • F. Sherman et al.

    Methionine or not methionine at the beginning of a protein

    BioEssays

    (1985)
  • H.-P. Hirel et al.

    Extent of N-terminal methionine excision from Escherichia coli proteins is governed by the side-chain length of the penultimate amino acid

    Proc. Natl Acad. Sci. USA

    (1989)
  • H. Dalbøge et al.

    In vivo processing of N-terminal methionine in E. coli

    FEBS Letters

    (1990)
  • S. Huang et al.

    Specificity of cotranslational amino-terminal processing of proteins in yeast

    Biochemistry

    (1987)
  • J.P. Boissel et al.

    Cotranslational amino-terminal processing of cytosolic proteins: cell-free expression of site-directed mutants of human hemoglobin

    J. Biol. Chem.

    (1988)
  • W.T. Lowther et al.

    Structure and function of the methionine aminopeptidases

    Biochim. Biophys. Acta

    (2000)
  • B. Persson et al.

    Structures of N-terminally acetylated proteins

    Eur. J. Biochem.

    (1985)
  • H. Jörnvall

    Acetylation of protein N-terminal amino groups: structural observations on α-amino acetylated proteins

    J. Theor. Biol.

    (1973)
  • F.J. Lee et al.

    Nα-acetyltransferase deficiency alters protein synthesis in Saccharomyces cerevisiae

    FEBS Letters

    (1989)
  • J.-P. Waller

    The NH2-terminal residues of the proteins from cell-free extracts of E. coli

    J. Mol. Biol.

    (1963)
  • C. Flinta et al.

    Sequence determinants of cytosolic N-terminal protein processing

    Eur. J. Biochem.

    (1986)
  • S. Tsunasawa et al.

    Amino-terminal processing of mutant forms of yeast iso-1-cytochrome c: the specificities of methionine aminopeptidase and acetyltransferase

    J. Biol. Chem.

    (1985)
  • R.P. Moerschell et al.

    The specificities of yeast methionine aminopeptidase and acetylation of amino-terminal methionine in vivo: processing of altered iso-1-cytochromes c created by oligonucleotide transformation

    J. Biol. Chem.

    (1990)
  • F. Sherman et al.

    N-terminal acetylation of mutationally altered form of iso-1-cytochromes c in normal and nat1 strains deficient in the major N-terminal acetyl transferase of the yeast Saccharomyces cerevisiae

  • B. Polevoda et al.

    Identification and specificities of N-terminal acetyltransferases from Saccharomyces cerevisiae

    EMBO J.

    (1999)
  • B. Polevoda et al.

    NatC N-terminal acetyltransferase of yeast contains three subunits, Mak3p, Mak10p, and Mak31p

    J. Biol. Chem.

    (2001)
  • J.R. Mullen et al.

    Identification and characterization of genes and mutants for an N-terminal acetyltransferase from yeast

    EMBO J.

    (1989)
  • J.C. Tercero et al.

    MAK3 encodes an N-acetyltransferase whose modification of the L-A gag NH2 terminus is necessary for virus particle assembly

    J. Biol. Chem.

    (1992)
  • J.C. Tercero et al.

    Specificity of the yeast MAK3 N-acetyltransferase that modifies gag of the L-A dsRNA virus

    J. Bacteriol.

    (1993)
  • M. Perrot et al.

    Two-dimensional gel protein database of Saccharomyces cerevisiae (update 1999)

    Electrophoresis

    (1999)
  • J.I. Garrels et al.

    Proteome studies of Saccharomyces cerevisiae: identification and characterization of abundant proteins

    Electrophoresis

    (1997)
  • R. Arnold et al.

    The action of N-terminal acetyltransferases on yeast ribosomal proteins

    J. Biol. Chem.

    (1999)
  • Y. Kimura et al.

    Nα-Acetylation and proteolytic activity of the yeast 20 S proteasome

    J. Biol. Chem.

    (2000)
  • M.C. Costanzo et al.

    YPD™, PombePD™, and WormPD™: model organism volumes of the BioKnowledge™ library, an integrated resource for protein information

    Nucl. Acids Res.

    (2001)
  • A. Bairoch et al.

    The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000

    Nucl. Acids Res.

    (2000)
  • A. Yoshikawa et al.

    Cloning and nucleotide sequencing of the genes rimI and rimJ which encode enzymes acetylating ribosomal proteins S18 and S5 of Escherichia coli K12

    Mol. Gen. Genet.

    (1987)
  • S. Tanaka et al.

    Cloning and molecular characterization of the gene rimL which encodes an enzyme acetylating ribosomal protein L12 of Escherichia coli K12

    Mol. Gen. Genet.

    (1989)
  • T. Uchiumi et al.

    Replacement of L7/L12.L10 protein complex in Escherichia coli ribosomes with the eukaryotic counterpart changes the specificity of elongation factor binding

    J. Biol. Chem.

    (1999)
  • T. Itoh

    Primary structure of an acidic ribosomal protein from Micrococcus lysodeikticus

    FEBS Letters

    (1981)