Analysis of oligonucleotide AUG start codon context in eukaryotic mRNAs
Introduction
The modulation of translation efficiency is one of the major post-transcriptional mechanisms regulating gene expression. Features of the 5′ untranslated regions of the mRNA, such as length, the presence of complex secondary structures that may interact with specific RNA-binding proteins, and the presence of upstream AUGs (uAUGs) or upstream open reading frames (uORFs), may strongly affect the efficiency of translation initiation, which an increasing number of reports describe as the crucial step in the regulation of translation (Gallie, 1996; Gray and Wickens, 1998; Kozak, 1999; van der Velden and Thomas, 1999).
Translation initiation of the majority of eukaryotic cellular mRNAs starts with the binding of the small 40S ribosomal subunit and of some eukaryotic initiation factors (eIFs) to the 7-methylguanosine cap structure (m7G) at the 5′ end of the mRNA. After binding the mRNA, the 40S complex, which also includes the Met-tRNA and GTP, scans the 5′ untranslated region (5′UTR) for the AUG initiation codon, where it stops and assembles with the large ribosomal subunit to form the 80S ribosome, which begins protein synthesis. Long 5′UTRs, uAUGs, uORFs or stable secondary structures can severely hamper the scanning process, thus down-regulating translation efficiency (Gray and Wickens, 1998; Kozak, 1999; van der Velden and Thomas, 1999). It is noteworthy that these suboptimal features of 5′UTRs are often observed in messengers encoding proteins involved in developmental processes, such as growth factors, proto-oncogenes and transcription factors, which require fine tuning of gene expression. Specific secondary structures located in the 5′UTR, denoted as ‘internal ribosome entry sites’ (IRES), can mediate cap-independent internal ribosome binding, thus circumventing unfavorable features of 5′UTRs and allowing more efficient translation under specific physiological conditions (Le and Maizel, 1997; Martinez-Salas, 1999). More recently it has been observed that short oligonucleotides complementary to the 18S rRNA can also function as IRES (Chappell et al., 2000).
The recognition of the correct AUG start codon relies on its oligonucleotide context, which presents some conserved features in all eukaryotes. Previous analyses have shown that the most frequent start codon context in vertebrate genes, and therefore the one regarded as optimal, is (G/A)CCaugG (Kozak, 1987; Kozak, 1999). The most crucial positions within the context consensus are considered to be a purine, usually A, three nucleotides (nt) before the AUG start codon (p−3) and a G at p+4, with the remaining context positions, mainly p−2 and p−1, contributing only marginally.
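As an illustration of the position numbering used above, the following minimal Python sketch reports the bases at p−3 and p+4 around a given AUG and checks them against the (G/A)CCaugG consensus. The function name, its arguments and the returned dictionary are hypothetical choices made for this example and are not taken from the original study.

```python
def context_features(mrna, aug_index):
    """Describe the start codon context of the AUG beginning at aug_index.

    Numbering follows the text: the A of AUG is +1 and there is no position 0,
    so p-3 lies three nucleotides upstream of the A and p+4 is the base
    immediately after the G of AUG. (Hypothetical helper, for illustration.)
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    if aug_index < 0 or seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")

    def base_at(i):
        # Return None when the position falls outside the sequence.
        return seq[i] if 0 <= i < len(seq) else None

    p_minus3 = base_at(aug_index - 3)
    p_minus2 = base_at(aug_index - 2)
    p_minus1 = base_at(aug_index - 1)
    p_plus4 = base_at(aug_index + 3)

    purine_at_minus3 = p_minus3 in ("A", "G")
    g_at_plus4 = p_plus4 == "G"
    return {
        "p-3": p_minus3,
        "p+4": p_plus4,
        "purine_at_p-3": purine_at_minus3,
        "G_at_p+4": g_at_plus4,
        # Full (G/A)CCaugG match: purine at p-3, C at p-2 and p-1, G at p+4.
        "full_consensus": (purine_at_minus3 and p_minus2 == "C"
                           and p_minus1 == "C" and g_at_plus4),
    }


result = context_features("GGACCAUGGCU", 5)
print(result["p-3"], result["p+4"], result["full_consensus"])
```

A context lacking both the purine at p−3 and the G at p+4 would be reported with both flags set to False, which corresponds to the weakest class of contexts described in the text.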
A start codon context that deviates from the optimal consensus at one or more of the crucial positions may arrest scanning less efficiently and, together with the particular features of the 5′ leader sequence, allow translation to be regulated according to cellular conditions and requirements.
We have previously shown, by analyzing a non-redundant set of 5814 human genes, that mRNAs corresponding to genes located in different isochores show specific features of the 5′UTRs and of the start codon context (Pesole et al., 1999). In particular, genes located in GC-rich isochores have shorter 5′UTRs and a stronger avoidance of upstream AUGs than genes located in GC-poor isochores, thus suggesting a higher incidence of translationally regulated genes in GC-poor isochores.
We report here the analysis of the 5′UTRs and of the start codon context in eukaryotic mRNAs subdivided into several taxonomic groups. In particular, the functional relevance of each context position has been measured as a function of the shift between the base composition observed at that position and the composition averaged over the entire 5′UTR. Our analysis shows a strong compositional shift, particularly evident from p−3 to p+4, in all eukaryotic mRNAs, thus suggesting some functional relevance also for p−2 and p−1, which have so far been considered to contribute only marginally to the context quality.
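The idea of a position-specific compositional shift can be sketched in a few lines of Python. The example below is only one possible formulation: it uses half the sum of absolute frequency differences (the total variation distance) between the bases observed at one upstream position and the pooled composition of all 5′UTRs. This statistic, the function names and the input layout (each 5′UTR ending immediately before the AUG) are assumptions made for illustration; the measure actually used in the study is not specified in this excerpt.

```python
from collections import Counter

BASES = "ACGU"


def base_frequencies(bases):
    """Relative frequency of A, C, G and U in an iterable of single bases."""
    counts = Counter(b for b in bases if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/U bases to count")
    return {b: counts[b] / total for b in BASES}


def compositional_shift(utrs, offset):
    """Shift between the composition at one upstream position and the 5'UTR average.

    utrs   -- 5'UTR sequences, each assumed to end right before the AUG
    offset -- negative context position, e.g. -3 for p-3
    Returns a value between 0 (identical composition) and 1.
    Assumption: total variation distance stands in for the paper's statistic.
    """
    if offset >= 0:
        raise ValueError("only upstream positions (offset < 0) are handled")
    normalized = [u.upper().replace("T", "U") for u in utrs]
    background = base_frequencies("".join(normalized))
    # Column of bases found at the requested position in every long-enough UTR.
    column = [u[offset] for u in normalized if len(u) >= -offset]
    site = base_frequencies(column)
    return sum(abs(site[b] - background[b]) for b in BASES) / 2


print(compositional_shift(["ACGU", "ACGU"], -1))
```

Positions downstream of the AUG, such as p+4, would require the start of the coding sequence as additional input and are left out of this sketch.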
Furthermore, the same analysis, carried out separately on human genes belonging to different isochores, confirms the previously observed isochore-specific features and provides a measure of the functional constraints, under the control of natural selection, acting on each site of the start codon context.
Materials and methods
The UTRdb specialized database (Pesole et al., 2000b) was used as the source of sequence data for the present study. It contains non-redundant collections of 5′ and 3′ untranslated sequences of eukaryotic mRNAs split into seven taxonomic divisions following the structure of the EMBL data library, namely human, other mammals, rodents, other vertebrates, invertebrates, plants and fungi. We further divided the collection of ‘other vertebrates’ into warm- and cold-blooded vertebrates, and the
Occurrence of uAUG, uORFs and IRES in 5′UTRs
The scanning model (Kozak, 1987; Kozak, 1989; Kozak, 1999) predicts that in most natural eukaryotic mRNAs translation initiates at the first AUG encountered by the 40S ribosomal subunit scanning from the 5′ m7G cap. In order to evaluate the general validity of this principle, denoted as the ‘first AUG rule’, we calculated the percentage of AUG-containing 5′UTRs in mRNAs belonging to different taxonomic groups (Table 1). It is evident that such a rule is not obeyed in a remarkable fraction of
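The kind of count described here, the percentage of 5′UTRs that contain at least one AUG, can be sketched as follows. This is a minimal illustration and not the procedure used to build Table 1; the function names and the example sequences are hypothetical.

```python
def has_upstream_aug(utr):
    """True if the 5'UTR contains at least one AUG in any reading frame."""
    return "AUG" in utr.upper().replace("T", "U")


def percent_with_uaug(utrs):
    """Percentage of 5'UTR sequences that contain an upstream AUG."""
    if not utrs:
        raise ValueError("empty collection of 5'UTRs")
    hits = sum(has_upstream_aug(u) for u in utrs)
    return 100.0 * hits / len(utrs)


# Made-up sequences, used only to show the call.
example_utrs = ["GGAUGCC", "GGCCGG", "ccatgg", "UUUU"]
print(percent_with_uaug(example_utrs))
```

Applied separately to the 5′UTRs of each taxonomic group, such a count gives the fraction of mRNAs in which the annotated start codon is not the first AUG met by a scanning ribosome.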
Acknowledgements
This work was supported by Training and Mobility Research European project ERB-FMRX-CT98-0221, Programma Biotecnologie legge 95/95 (MURST 5%), ‘Progetto Strategico Genetica Molecolare’ and PRIN project ‘Bioinformatics and Genomics’ (MURST).
References (18)
Isochores and the evolutionary genomics of vertebrates. Gene (2000)
Initiation of translation in prokaryotes and eukaryotes. Gene (1999)
Internal ribosome entry site biology and its use in expression vectors. Curr. Opin. Biotechnol. (1999)
et al. Isochore specificity of AUG initiator context of human genes. FEBS Lett. (1999)
et al. Statistical analysis of the 5′ untranslated region of human mRNA using ‘Oligo-Capped’ cDNA libraries. Genomics (2000)
et al. The role of the 5′ untranslated region of an mRNA in translation regulation during development. Int. J. Biochem. Cell Biol. (1999)
The human genome: organization and evolutionary history. Annu. Rev. Genet. (1995)
et al. A 9-nt segment of a cellular mRNA can function as an internal ribosome entry site (IRES) and when present in linked multiple copies greatly enhances IRES activity. Proc. Natl. Acad. Sci. USA (2000)
Translational control of cellular and viral mRNAs. Plant Mol. Biol. (1996)