Messenger RNA processing sites in Trypanosoma brucei

https://doi.org/10.1016/j.molbiopara.2005.05.008

Abstract

In Kinetoplastids, protein-coding genes are transcribed polycistronically by RNA polymerase II. Individual mature mRNAs are generated from polycistronic precursors by 5′ trans splicing of a 39-nt capped leader RNA and 3′ polyadenylation. It was previously known that trans splicing generally occurs at an AG dinucleotide downstream of a polypyrimidine tract, and that polyadenylation is coupled to downstream trans splicing. The few polyadenylation sites that had been examined were 100–400 nt upstream of the polypyrimidine tract that marks the adjacent trans splice site. We wished to define the sequence requirements for trypanosome mRNA processing more precisely and to generate a predictive algorithm.

By scanning all available Trypanosoma brucei cDNAs for splicing and polyadenylation sites, we found that trans splicing generally occurs at the first AG following a polypyrimidine tract of 8–25 nt, giving rise to 5′-UTRs of a median length of 68 nt. We also found that in general, polyadenylation occurs at a position with one or more A residues located between 80 and 140 nt from the downstream polypyrimidine tract. These data were used to calibrate free parameters in a grammar model with distance constraints, enabling prediction of polyadenylation and trans splice sites for most protein-coding genes in the trypanosome genome. The data from the genome analysis and the program are available from: http://web.cgb.ki.se/daniel/splicemodel.php.
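The two rules summarised above can be illustrated with a minimal sketch. This is not the published grammar model, whose calibrated parameters are not given here. It assumes an uninterrupted pyrimidine run of at least 8 nt (the 25-nt upper bound is not enforced), measures the polyadenylation distance from the start of the tract, and uses a hypothetical function name, `predict_sites`.

```python
import re

# Thresholds taken from the abstract; treating them as hard cut-offs is an assumption.
PPT_MIN = 8            # minimum polypyrimidine tract length (nt)
POLYA_MIN_DIST = 80    # closest poly(A) site, measured upstream of the tract start (nt)
POLYA_MAX_DIST = 140   # farthest poly(A) site, measured upstream of the tract start (nt)


def predict_sites(seq):
    """Hypothetical helper: list candidate processing sites for a genomic sequence.

    For every uninterrupted run of C/T of at least PPT_MIN nt, report
    - the first AG at or after the end of the run (candidate trans splice site), and
    - every A located 80-140 nt upstream of the run (candidate poly(A) sites).
    Coordinates are 0-based.
    """
    seq = seq.upper().replace("U", "T")
    predictions = []
    for tract in re.finditer(r"[CT]{%d,}" % PPT_MIN, seq):
        ag = seq.find("AG", tract.end())
        if ag == -1:
            continue  # no acceptor dinucleotide downstream of this tract
        lo = max(0, tract.start() - POLYA_MAX_DIST)
        hi = max(0, tract.start() - POLYA_MIN_DIST + 1)
        polya = [i for i in range(lo, hi) if seq[i] == "A"]
        predictions.append(
            {"ppt": (tract.start(), tract.end()),
             "splice_ag": ag,
             "polya_candidates": polya}
        )
    return predictions
```

A real predictor would also need to tolerate purine interruptions in the tract and to choose among competing tracts, neither of which this sketch attempts.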

Introduction

The Kinetoplastid protists and their close relatives are the only eukaryotes known in which trans splicing is an obligatory step in maturation of all mRNAs. In Kinetoplastids, transcription of protein-coding genes by RNA polymerase II is polycistronic [1], [2], [3] and polycistrons can be many tens of kb long [3], [4], [5]. Although RNA polymerase II appears to initiate transcription at the boundaries between divergent polycistrons, or between polycistrons and regions transcribed by other RNA polymerases, attempts to identify consensus promoter sequences have failed [6], [7] and multiple initiation sites can be used for transcription of a single locus [5]. In addition, some protein-coding gene arrays, such as the VSG and procyclin expression sites, are transcribed by RNA polymerase I ([8] and references therein).

Mature mRNAs must be monocistronic, and require a cap and a poly(A) tail, in order to be exported from the nucleus and translated efficiently. In Kinetoplastids monocistronic mRNAs are generated by processing of the polycistronic precursors. Capped 5′-ends are formed by a trans splicing reaction in which a capped spliced leader (SL) sequence [9], [10], [11] of about 40 nt is added to the 5′-end of each mRNA [12]. As in cis splicing, the acceptor site is an AG dinucleotide, which is preceded by a polypyrimidine tract of variable length [12], [13], [14]. Addition of a poly(A) tail is required to complete mRNA processing. There is no consensus polyadenylation signal in the 3′-untranslated region (3′-UTR). Instead, evidence obtained from a very small number of loci suggests that polyadenylation occurs within a short region (rather than at a single defined site) 100–400 nt upstream of the next polypyrimidine trans splice signal [13], [15], [16], [17], [18]. In these experiments, plasmids bearing reporter genes were introduced into trypanosomes, and the exact sites of polyadenylation of the reporter transcripts were determined. Changes in the sequences downstream of the reporter gene consistently showed dependence of polyadenylation on the position of a downstream trans splicing acceptor site.

Experiments with permeabilised trypanosomes indicate that trans splicing of a transcript precedes the polyadenylation of its upstream partner and that inhibition of trans splicing leads to concomitant loss of polyadenylation [19]. Abrogation of either one of these mutually dependent RNA processing reactions, either through heat shock [20], RNA interference directed against processing factors [21], or sinefungin treatment (which inhibits trans splicing by preventing methylation of the SL RNA cap [22]) results in the accumulation of dicistronic or longer transcripts [23].

Although the broad outlines of the requirements for mRNA processing have been recognised, the details have not yet been examined: the number of loci studied in detail is very small and restricted to highly abundant transcripts. Moreover, features other than polypyrimidine tracts are known to be required for trans splicing and polyadenylation. The intergenic regions of trypanosome transcripts very often contain multiple polypyrimidine tracts, yet a preference for the tract immediately preceding an open reading frame (ORF) has been observed [24]. It is not clear whether the ORF, the 5′-UTR [25], or both are important in determining this preference. However, trans splicing to AG sites lacking any obvious downstream ORF can also occur and direct polyadenylation of the upstream transcript [15]. This can result in the generation of ORF-less processed RNAs and provides a possible explanation for how the most 3′ mRNA in a polycistron is polyadenylated.

Conventional gene-recognition programmes identify many potential trypanosome ORFs which will never be translated into a functional protein because they lack the requisite mRNA processing signals. Identification of these signals will enable genome-wide recognition of the splice sites that serve as indicators of the functionality of potential ORFs [26]. A good algorithm for the prediction of trans splicing signals would therefore be a very useful tool for annotating kinetoplastid genomes as well as determining untranslated regions of mature transcripts.

In the work presented here, we have analysed all available splicing and polyadenylation sites from T. brucei for common sequence signals and used the information to generate a prediction algorithm for trypanosome mRNA processing sites.

Section snippets

In silico mapping of mRNA processing sites

Our starting dataset consisted of all available trypanosome cDNA sequences in the EMBL database (downloaded from http://srs.ebi.ac.uk using “mRNA” and “Trypanosoma brucei” as keywords) and raw cDNA sequences generated by Shahi and co-workers [27]. Poly(A) tails were identified as stretches of at least 10 contiguous A or T residues at the ends of sequences or bordering polylinkers, while trans splicing acceptor sites were identified by the presence of at least 11
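The poly(A) tail criterion described above (at least 10 contiguous A or T residues at a sequence end) might be sketched as follows. The function name `find_polya_tail` and its return format are hypothetical. Tails bordering polylinkers and the spliced leader search are not handled, since those details are not fully given in this excerpt.

```python
import re

MIN_TAIL = 10  # minimum run of contiguous A (or T) residues, as stated in the text


def find_polya_tail(cdna):
    """Hypothetical helper: locate a poly(A) tail at either end of a cDNA read.

    Returns ("3prime_A", index) when the read ends in a run of A, where index is
    the 0-based position at which the run starts; returns ("5prime_T", index)
    when the read begins with a run of T (tail read from the opposite strand),
    where index is the position just past the run; returns None otherwise.
    """
    seq = cdna.upper()
    tail = re.search(r"A{%d,}$" % MIN_TAIL, seq)
    if tail:
        return ("3prime_A", tail.start())
    head = re.match(r"T{%d,}" % MIN_TAIL, seq)
    if head:
        return ("5prime_T", head.end())
    return None
```

The returned index marks the boundary between transcript and tail, which is the position that would be mapped back to the genome as the polyadenylation site.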

Characteristics of trans splice acceptor sites and 5′-UTRs

We found a total of 124 trans splice acceptor sites by searching all available T. brucei cDNA sequences for at least 9 nt of the spliced leader sequence (or its complement), or for other evidence that the cDNA was full-length (see Section 2). The 5′-UTR lengths ranged from 7 nt (40S ribosomal protein S16, Tb07.29K4.40) to 1030 nt (conserved hypothetical protein, Tb07.22O10.6200). However, 92% of the 5′-UTRs were shorter than 300 nt; the median length was 68 nt and the mean was 129 nt (Fig. 1A). Two

Conclusions

By analysing all available cDNA sequences from T. brucei, we found that trans splicing generally occurs at the first AG following a polypyrimidine tract of 8–25 nt, as for cis splicing in other species. From previous experimental data, we do not know the minimum length of the polypyrimidine tract, to what extent interrupting purines are tolerated, or how variations affect splicing efficiency. We also do not know if there is a requirement for a degenerate branch-point consensus sequence, as too

Acknowledgements

We thank Christiane Hertz-Fowler for help with GeneDB. This work was supported by the Deutsche Forschungsgemeinschaft (D.L.G. & C.B.) and the Swedish Technology Research Council (D.N.).

References (41)

  • C. Tschudi et al. Polygene transcripts are precursors to calmodulin mRNAs in trypanosomes. EMBO J (1988)
  • P.D. McDonagh et al. The unusual gene organization of Leishmania major chromosome 1 may reflect novel transcription processes. Nucleic Acids Res (2000)
  • P.J. Myler et al. Leishmania major Friedlin chromosome 1 has an unusual distribution of protein-coding genes. Proc Natl Acad Sci USA (1999)
  • N.M. El-Sayed et al. The sequence and analysis of Trypanosoma brucei chromosome II. Nucleic Acids Res (2003)
  • N. Hall et al. The DNA sequence of chromosome I of an African trypanosome: gene content, chromosome organisation, recombination and polymorphism. Nucleic Acids Res (2003)
  • A. Gunzl et al. RNA polymerase I transcribes procyclin genes and variant surface glycoprotein gene expression sites in Trypanosoma brucei. Eukaryot Cell (2003)
  • K.L. Perry et al. Trypanosome mRNAs have unusual “cap 4” structures acquired by addition of a spliced leader. Proc Natl Acad Sci USA (1987)
  • X. Liang et al. Trans and cis splicing in trypanosomatids: mechanism, factors, and regulation. Eukaryot Cell (2003)
  • K.R. Matthews et al. A common pyrimidine-rich motif governs trans-splicing and polyadenylation of tubulin polycistronic pre-mRNA in trypanosomes. Genes Dev (1994)
  • J. Huang et al. Requirement of a poly-pyrimidine tract for trans-splicing in trypanosomes: discriminating the PARP promoter from the immediately adjacent 3′ splice acceptor site. EMBO J (1991)