Elsevier

Gene

Volume 261, Issue 1, 30 December 2000, Pages 85-91

Analysis of oligonucleotide AUG start codon context in eukariotic mRNAs

https://doi.org/10.1016/S0378-1119(00)00471-6

Abstract

The AUG start codon context features have been investigated by analyzing eukaryotic mRNAs belonging to various taxonomic groups. The functional relevance of each position surrounding the AUG start codon has been estimated as a function of the measured shift between the base composition observed at that position and the base composition averaged over all the 5′ untranslated regions. A more detailed analysis of human genes belonging to different isochores showed significant isochore-specific features that cannot be explained by a mutational bias effect alone. The most represented heptamers spanning positions −3 to +4 with respect to the initiator AUG have been determined for mRNAs belonging to different taxonomic groups, and a web utility has been set up (http://bigarea.area.ba.cnr.it:8000/BioWWW/ATG.html) to determine the relative abundance of a user-submitted oligonucleotide context in a given species or taxon.
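The heptamer counting and the relative-abundance lookup described above can be sketched as follows. This is a minimal illustration, not the authors' web utility: it assumes each mRNA is given as a (sequence, start) pair, where `start` is the 0-based index of the A of the annotated AUG, and it takes "relative abundance" to mean the fraction of all counted contexts in a taxon that equal the query. All function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def start_context(seq: str, start: int) -> Optional[str]:
    """Return the -3..+4 heptamer around the AUG whose A is at 0-based `start`.

    Convention: the A of AUG is +1 and there is no position 0, so
    p-3..p-1 are seq[start-3:start] and p+4 is seq[start+3].
    Returns None if the window falls off the sequence or no AUG is there.
    """
    s = seq.upper().replace("U", "T")  # work in DNA letters, as sequence databases do
    if start < 3 or start + 4 > len(s) or s[start:start + 3] != "ATG":
        return None
    return s[start - 3:start + 4]


def heptamer_counts(records: Iterable[Tuple[str, int]]) -> Counter:
    """Count start-codon heptamers over (sequence, start) records."""
    counts: Counter = Counter()
    for seq, start in records:
        ctx = start_context(seq, start)
        if ctx is not None:  # skip records with an invalid or truncated context
            counts[ctx] += 1
    return counts


def relative_abundance(query: str, counts: Counter) -> float:
    """Fraction of counted contexts equal to `query` (hypothetical definition)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts[query.upper().replace("U", "T")] / total


records = [("GCCACCATGGCT", 6), ("AAAACCATGGAA", 6), ("TTTGCCATGGTT", 6)]
counts = heptamer_counts(records)
print(counts.most_common(1))                  # [('ACCATGG', 2)]
print(relative_abundance("ACCAUGG", counts))  # 0.6666666666666666
```

Accepting both U and T in queries mirrors the fact that mRNA contexts are usually written with U while database entries use T.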

Introduction

The modulation of translation efficiency is one of the major post-transcriptional mechanisms regulating gene expression. Features of the mRNA 5′ untranslated region, such as its length, complex secondary structures possibly interacting with specific RNA-binding proteins, and the presence of upstream AUGs (uAUGs) or upstream open reading frames (uORFs), may strongly affect the efficiency of translation initiation, which an increasing number of reports describe as the crucial step in translational regulation (Gallie, 1996; Gray and Wickens, 1998; Kozak, 1999; van der Velden and Thomas, 1999).

Translation initiation of most eukaryotic cellular mRNAs starts with the binding of the small 40S ribosomal subunit and of several eukaryotic initiation factors (eIFs) to the 7-methylguanosine (m7G) cap structure at the 5′ end of the mRNA. After binding the mRNA, the 40S complex, which also carries Met-tRNA and GTP, scans the 5′ untranslated region (5′UTR) for the AUG initiation codon, where it stops and joins the large (60S) ribosomal subunit to form the 80S ribosome that begins protein synthesis. Long 5′UTRs, uAUGs, uORFs or stable secondary structures can severely hamper scanning and thus downregulate translation efficiency (Gray and Wickens, 1998; Kozak, 1999; van der Velden and Thomas, 1999). Notably, these suboptimal 5′UTR features are often observed in messengers encoding proteins involved in developmental processes, such as growth factors, proto-oncogenes and transcription factors, which require fine tuning of gene expression. Specific secondary structures located in the 5′UTR, termed ‘internal ribosome entry sites’ (IRES), can mediate cap-independent internal ribosome binding, thus circumventing unfavorable features of 5′UTRs and allowing more efficient translation under specific physiological conditions (Le and Maizel, 1997; Martinez-Salas, 1999). More recently, short oligonucleotides complementary to the 18S rRNA have also been observed to function as IRES (Chappell et al., 2000).
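The scanning logic above suggests a simple scan of a 5′UTR for uAUGs and uORFs. The sketch below is illustrative only: it assumes the 5′UTR is supplied as a plain string and, following one common convention (an assumption here), defines a uORF as a uAUG followed by an in-frame stop codon within the 5′UTR itself; uAUGs that read into the main coding region are reported as uAUGs but not as uORFs. Function names are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_atgs(utr5: str) -> List[int]:
    """0-based positions of every ATG (AUG) triplet in a 5'UTR, in any frame."""
    s = utr5.upper().replace("U", "T")
    return [i for i in range(len(s) - 2) if s[i:i + 3] == "ATG"]


def obeys_first_aug_rule(utr5: str) -> bool:
    """True if the annotated start codon is the first AUG, i.e. the 5'UTR has no uAUG."""
    return not upstream_atgs(utr5)


def upstream_orfs(utr5: str) -> List[Tuple[int, int]]:
    """(start, end) of uORFs: a uAUG plus an in-frame stop codon inside the 5'UTR.

    `end` is exclusive and includes the stop codon. uAUGs without such a stop
    would overlap the main ORF and are not reported here.
    """
    s = utr5.upper().replace("U", "T")
    orfs = []
    for start in upstream_atgs(s):
        for j in range(start + 3, len(s) - 2, 3):  # walk codons in the uAUG frame
            if s[j:j + 3] in STOP_CODONS:
                orfs.append((start, j + 3))
                break
    return orfs


utr = "GGATGAAATAGCCATGCC"
print(upstream_atgs(utr))         # [2, 13]
print(upstream_orfs(utr))         # [(2, 11)]
print(obeys_first_aug_rule(utr))  # False
```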

The recognition of the correct AUG start codon relies on its oligonucleotide context, which shows some conserved features in all eukaryotes. Previous analyses have shown that the most frequent, and therefore regarded as optimal, start codon context in vertebrate genes is (G/A)CCaugG (Kozak, 1987; Kozak, 1999). The most crucial positions within this consensus are considered to be p−3, three nucleotides (nt) upstream of the AUG start codon, which carries a purine (usually A), and p+4, which carries a G; the remaining context positions, mainly p−2 and p−1, contribute only marginally.
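The two key positions can be checked programmatically. The sketch below assumes the −3 to +4 heptamer has already been extracted (the A of AUG is +1 and there is no position 0, so in the 7-character string p−3 is index 0 and p+4 is index 6). The "strong/adequate/weak" labels follow a classification often used in the literature and are an assumption here, not part of this study; `context_strength` is a hypothetical name.

```python
def context_strength(heptamer: str) -> str:
    """Classify a -3..+4 context such as 'ACCAUGG' by its two key positions.

    String index 0 is p-3, indices 3-5 are the AUG and index 6 is p+4.
    """
    h = heptamer.upper().replace("U", "T")
    if len(h) != 7 or h[3:6] != "ATG":
        raise ValueError("expected a 7-nt context of the form NNNAUGN")
    purine_at_minus3 = h[0] in ("A", "G")
    g_at_plus4 = h[6] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


for ctx in ("ACCAUGG", "GCCAUGA", "UCCAUGG", "UCCAUGA"):
    print(ctx, context_strength(ctx))
# ACCAUGG strong
# GCCAUGA adequate
# UCCAUGG adequate
# UCCAUGA weak
```

Note that this binary check deliberately ignores p−2 and p−1, which is exactly the assumption the present analysis puts to the test.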

A start codon whose context deviates from the optimal consensus at one or more of the crucial positions may arrest scanning less efficiently; together with the particular features of the 5′ leader sequence, this can allow translation to be regulated according to cellular conditions and requirements.

We have previously shown, by analyzing a non-redundant set of 5814 human genes, that mRNAs corresponding to genes located in different isochores show specific features of the 5′UTRs and of the start codon context (Pesole et al., 1999). In particular, genes located in GC-rich isochores have shorter 5′UTRs and a stronger avoidance of upstream AUGs than genes located in GC-poor isochores, suggesting a higher incidence of translationally regulated genes in GC-poor isochores.
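One way to quantify "avoidance of upstream AUG" is an observed/expected ratio, where the expected number of AUG triplets is derived from base composition under an independent-nucleotide model. This is a hedged sketch of that idea, not necessarily the statistic used by Pesole et al. (1999); the assignment of genes to isochores is assumed to be done beforehand, and `aug_obs_exp` is a hypothetical name. A ratio well below 1 would indicate avoidance.

```python
import math
from collections import Counter
from typing import Iterable


def aug_obs_exp(utrs: Iterable[str]) -> float:
    """Observed/expected AUG count pooled over a set of 5'UTRs.

    Expected = (number of triplet windows) * f(A) * f(T) * f(G), using the
    pooled mononucleotide composition (independent-nucleotide model).
    Returns NaN when the expectation is zero.
    """
    seqs = [u.upper().replace("U", "T") for u in utrs]
    comp: Counter = Counter()
    for s in seqs:
        comp.update(s)
    total = sum(comp[b] for b in "ACGT")  # ignore ambiguity codes such as N
    if total == 0:
        return math.nan
    f_a, f_t, f_g = (comp[b] / total for b in "ATG")
    observed = sum(s[i:i + 3] == "ATG" for s in seqs for i in range(len(s) - 2))
    windows = sum(max(len(s) - 2, 0) for s in seqs)
    expected = windows * f_a * f_t * f_g
    return observed / expected if expected > 0 else math.nan


print(aug_obs_exp(["GGGAAATTT"]))  # 0.0 (no AUG although A, U and G are present)
```

Computing the ratio separately for the 5′UTRs of GC-rich and GC-poor isochore genes would then allow the two groups to be compared directly, with the base-composition difference between them already factored into the expectation.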

We report here the analysis of the 5′UTRs and of the start codon context in eukaryotic mRNAs subdivided into several taxonomic groups. In particular, the functional relevance of each context position has been measured as a function of the shift between the base composition at that position and the composition averaged over the entire 5′UTR. Our analysis shows a strong compositional shift, particularly evident from p−3 to p+4, in all eukaryotic mRNAs, suggesting some functional relevance also for p−2 and p−1, which have so far been considered to contribute only marginally to context quality.
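The positional compositional shift can be illustrated as follows. The exact shift measure is not stated in this section, so the sketch assumes one reasonable choice, the relative entropy (Kullback–Leibler divergence, in bits) between the base composition at a context position and the pooled 5′UTR composition. The record format (sequence, 0-based index of the A of AUG), the default window of positions, the small pseudocount and all names are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence, Tuple

BASES = "ACGT"


def frequencies(chars: Iterable[str]) -> Optional[Dict[str, float]]:
    """Base frequencies over A/C/G/T; None if no valid base is present."""
    c = Counter(ch for ch in chars if ch in BASES)
    n = sum(c.values())
    return {b: c[b] / n for b in BASES} if n else None


def position_index(start: int, p: int) -> int:
    """0-based index of context position p (p != 0; the A of AUG is +1)."""
    return start + p if p < 0 else start + p - 1


def compositional_shift(
    records: Iterable[Tuple[str, int]],
    positions: Sequence[int] = (-6, -5, -4, -3, -2, -1, 4, 5, 6),
    pseudo: float = 1e-9,
) -> Dict[int, float]:
    """KL divergence (bits) of each position's composition from the pooled 5'UTR one."""
    seqs = [(s.upper().replace("U", "T"), st) for s, st in records]
    background = frequencies("".join(s[:st] for s, st in seqs))  # all 5'UTR bases
    shifts: Dict[int, float] = {}
    if background is None:
        return shifts
    for p in positions:
        column = [s[position_index(st, p)] for s, st in seqs
                  if 0 <= position_index(st, p) < len(s)]
        freq = frequencies(column)
        if freq is None:  # no sequence covers this position
            continue
        # the pseudocount only guards against division by zero for bases absent from the UTRs
        shifts[p] = sum(freq[b] * math.log2((freq[b] + pseudo) / (background[b] + pseudo))
                        for b in BASES if freq[b] > 0)
    return shifts


shifts = compositional_shift([("CGTACGTAATGG", 8)])
print({p: round(v, 3) for p, v in shifts.items()})
# {-6: 2.0, -5: 2.0, -4: 2.0, -3: 2.0, -2: 2.0, -1: 2.0, 4: 2.0}
```

With a single toy sequence every covered position is fully determined against a uniform background, hence 2 bits each; on real data, positions under selection would stand out as larger shifts than their neighbours.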

Furthermore, the same analysis, carried out separately on human genes belonging to different isochores, confirms previously observed isochore-specific features and provides a measure of the functional constraints imposed by natural selection on each site of the start codon context.


Materials and methods

The UTRdb specialized database (Pesole et al., 2000b) was used as the source of sequence data for the present study. It contains non-redundant collections of 5′ and 3′ untranslated sequences of eukaryotic mRNAs split into seven taxonomic divisions following the structure of the EMBL Data Library, namely human, other mammals, rodents, other vertebrates, invertebrates, plants and fungi. We further divided the collection of ‘other vertebrates’ into warm- and cold-blooded vertebrates, and the

Occurrence of uAUG, uORFs and IRES in 5′UTRs

The scanning model (Kozak, 1987; Kozak, 1989; Kozak, 1999) predicts that in most natural eukaryotic mRNAs translation initiates at the first AUG encountered by the 40S ribosomal subunit as it moves downstream from the 5′ m7G cap. To evaluate the general validity of this principle, known as the ‘first AUG rule’, we calculated the percentage of 5′UTRs containing an AUG in mRNAs belonging to different taxonomic groups (Table 1). It is evident that such a rule is not obeyed in a remarkable fraction of

Acknowledgements

This work was supported by Training and Mobility Research European project ERB-FMRX-CT98-0221, Programma Biotecnologie legge 95/95 (MURST 5%), ‘Progetto Strategico Genetica Molecolare’ and PRIN project ‘Bioinformatics and Genomics’ (MURST).

References (18)


Cited by (63)

  • Conservation and Variability of the AUG Initiation Codon Context in Eukaryotes

    2019, Trends in Biochemical Sciences
    Citation Excerpt :

    Whereas monocot mRNAs contain higher frequencies of −3G/+4G nucleotides, −3A/+4G are more frequent in dicot mRNAs. It should be pointed out that the 5′-UTRs of monocot mRNAs are GC-rich, whereas those of dicots are AU-rich sequences [31–33,40,41,43]. Moreover, we have noticed that, among the consensus sequences of the plant species here scrutinized, the −2M position is well conserved (Table 2).

  • Direct Head-to-Head Evaluation of Recombinant Adeno-associated Viral Vectors Manufactured in Human versus Insect Cells

    2017, Molecular Therapy
    Citation Excerpt :

    Randomly modifying nucleotides up- or downstream of the AUG would not be a realistic approach because the complexity of the possible TIS sequences spanning the relevant stretch of eight residues is 65,536 possible permutations. Moreover, the consensus Kozak sequence appears to be different for yeast,14 higher plants,15 invertebrates,16 or vertebrates.17 Therefore, one way to rationalize the screening of attenuated TISs was to utilize the empirical heatmap of all possible mammalian TIS permutations derived by Noderer et al.18 whereby all possible combinations of TISs were assigned “initiation efficiency” values relative to the consensus Kozak sequence.

  • Transcription and translation in a package deal: The TISU paradigm

    2012, Gene
    Citation Excerpt :

    A weak stem–loop(s) downstream of the AUG may enhance translation fidelity (Elfakess et al., 2011; Kozak, 1990) as it causes 40S subunit pausing which provides sufficient time for the P site to be properly arranged over the AUG codon. The prevalence of uAUGs in mRNAs is higher than one would expect, as it has been estimated that nearly 50% of all human and Drosophila mRNAs contains uAUG(s) in their 5′ UTR (Davuluri et al., 2000; Medenbach et al., 2011; Pesole et al., 2000; Rogozin et al., 2001; Suzuki et al., 2000). Considering the cap‐dependent ribosome scanning mechanism, at least some of these uAUGs would be expected to inhibit translation from the major ORF, and indeed this has been demonstrated in several specific cases (see for example Hinnebusch, 1997; Medenbach et al., 2011; Meijer and Thomas, 2002; Morris and Geballe, 2000).
