Messenger RNA processing sites in Trypanosoma brucei
Introduction
The Kinetoplastid protists and their close relatives are the only eukaryotes known in which trans splicing is an obligatory step in maturation of all mRNAs. In Kinetoplastids, transcription of protein-coding genes by RNA polymerase II is polycistronic [1], [2], [3] and polycistrons can be many tens of kb long [3], [4], [5]. Although RNA polymerase II appears to initiate transcription at the boundaries between divergent polycistrons, or between polycistrons and regions transcribed by other RNA polymerases, attempts to identify consensus promoter sequences have failed [6], [7] and multiple initiation sites can be used for transcription of a single locus [5]. In addition, some protein-coding gene arrays, such as the VSG and procyclin expression sites, are transcribed by RNA polymerase I ([8] and references therein).
Mature mRNAs must be monocistronic, and require a cap and a poly(A) tail, in order to be exported from the nucleus and translated efficiently. In Kinetoplastids monocistronic mRNAs are generated by processing of the polycistronic precursors. Capped 5′-ends are formed by a trans splicing reaction in which a capped spliced leader (SL) sequence [9], [10], [11] of about 40 nt is added to the 5′-end of each mRNA [12]. As in cis splicing, the acceptor site is an AG dinucleotide, which is preceded by a polypyrimidine tract of variable length [12], [13], [14]. Addition of a poly(A) tail is required to complete mRNA processing. There is no consensus polyadenylation signal in the 3′-untranslated region (3′-UTR). Instead, evidence obtained from a very small number of loci suggests that polyadenylation occurs within a short region (rather than at a single defined site) 100–400 nt upstream of the next polypyrimidine trans splice signal [13], [15], [16], [17], [18]. In these experiments, plasmids bearing reporter genes were introduced into trypanosomes, and the exact sites of polyadenylation of the reporter transcripts were determined. Changes in the sequences downstream of the reporter gene consistently showed dependence of polyadenylation on the position of a downstream trans splicing acceptor site.
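The positional rules described above can be illustrated with a minimal Python sketch. It assumes a DNA-alphabet sequence, treats any uninterrupted run of at least `min_tract` pyrimidines (C or T) as a polypyrimidine tract, takes the first AG downstream of the tract as the candidate trans splice acceptor, and reports the region 100–400 nt upstream of the tract as the candidate polyadenylation window. The function name, the default `min_tract` value and the refusal to tolerate interrupting purines are illustrative assumptions, not parameters taken from the experiments cited.

```python
import re

def find_candidate_sites(seq, min_tract=8, window=(100, 400)):
    """Scan seq for polypyrimidine tracts and derive candidate processing sites.

    min_tract and window are illustrative assumptions (see lead-in).
    Coordinates are 0-based; tract and window are half-open intervals.
    """
    seq = seq.upper()
    results = []
    # An uninterrupted run of C/T of at least min_tract residues.
    tract_re = re.compile(r"[CT]{%d,}" % min_tract)
    for m in tract_re.finditer(seq):
        # First AG dinucleotide at or after the end of the tract
        # (index of the A), or None if there is no AG downstream.
        ag = seq.find("AG", m.end())
        # Candidate polyadenylation region: 100-400 nt upstream of the
        # tract, clipped at the start of the sequence.
        lo = max(0, m.start() - window[1])
        hi = max(0, m.start() - window[0])
        results.append({
            "tract": (m.start(), m.end()),
            "acceptor_ag": ag if ag != -1 else None,
            "polya_window": (lo, hi),
        })
    return results
```

Each returned record pairs one tract with one acceptor and one upstream window, which mirrors the observed dependence of polyadenylation on the position of the downstream acceptor site; it makes no attempt to rank competing tracts within an intergenic region.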
Experiments with permeabilised trypanosomes indicate that trans splicing of a transcript precedes the polyadenylation of its upstream partner and that inhibition of trans splicing leads to concomitant loss of polyadenylation [19]. Abrogation of either of these mutually dependent RNA processing reactions, whether through heat shock [20], RNA interference directed against processing factors [21], or sinefungin treatment (which inhibits trans splicing by preventing methylation of the SL RNA cap [22]), results in the accumulation of dicistronic or longer transcripts [23].
Although the broad outlines of the requirements for mRNA processing have been recognised, the details have not been examined: only a few loci have been studied in detail, and these are restricted to highly abundant transcripts. Moreover, features other than polypyrimidine tracts are known to be required for trans splicing and polyadenylation. The intergenic regions of trypanosome transcripts very often contain multiple polypyrimidine tracts, yet a preference for the tract immediately preceding an open reading frame (ORF) has been observed [24]. It is not clear whether the ORF, the 5′-UTR [25], or both are important in determining this preference. Nevertheless, trans splicing to AG sites lacking any obvious downstream ORF can also occur and can direct polyadenylation of the upstream transcript [15]. This can generate ORF-less processed RNAs and offers a possible explanation for how the most 3′ mRNA in a polycistron is polyadenylated.
Conventional gene-recognition programmes identify many potential trypanosome ORFs that will never be translated into a functional protein because they lack the requisite mRNA processing signals. Identifying these signals will enable genome-wide recognition of the splice sites that serve as indicators of the functionality of potential ORFs [26]. A reliable algorithm for predicting trans splicing signals would therefore be useful both for annotating kinetoplastid genomes and for determining the untranslated regions of mature transcripts.
In the work presented here, we have analysed all available splicing and polyadenylation sites from T. brucei for common sequence signals and used the information to generate a prediction algorithm for trypanosome mRNA processing sites.
In silico mapping of mRNA processing sites
Our starting dataset consisted of all available trypanosome cDNA sequences in the EMBL database (downloaded from http://srs.ebi.ac.uk using “mRNA” and “Trypanosoma brucei” as keywords) and raw cDNA sequences generated by Shahi and co-workers [27]. Poly(A) tails were identified as stretches of at least 10 contiguous A or T residues at the ends of sequences or bordering polylinkers, while trans splicing acceptor sites were identified by the presence of at least 11
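The poly(A) criterion stated above (a run of at least 10 contiguous A or T residues at a sequence end) can be sketched in Python as follows. This is a simplified illustration, not the authors' pipeline: the function name and return labels are hypothetical, a terminal A run is assumed to mark a sense-strand 3′ end and a leading T run a reverse-complemented clone, and runs bordering polylinkers are not handled.

```python
import re

def find_poly_a_tail(cdna, min_run=10):
    """Return (label, run_length) for a terminal poly(A)/poly(T) run, else None.

    label is '3prime_A' for an A run at the end of the sequence, or
    '5prime_T' for a T run at the start (assumed reverse-complemented clone).
    """
    s = cdna.upper()
    # A run of at least min_run A residues anchored at the 3' end.
    tail = re.search(r"A{%d,}$" % min_run, s)
    if tail:
        return ("3prime_A", len(tail.group()))
    # A run of at least min_run T residues anchored at the 5' end.
    head = re.match(r"T{%d,}" % min_run, s)
    if head:
        return ("5prime_T", len(head.group()))
    return None
```

Trimming the detected run from the cDNA and aligning the remainder to the genome would then give the polyadenylation site; that alignment step is outside the scope of this sketch.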
Characteristics of trans splice acceptor sites and 5′-UTRs
We found a total of 124 trans splice acceptor sites by searching all available T. brucei cDNA sequences for at least 9 nt of the spliced leader sequence (or its complement), or for other evidence that the cDNA was full-length (see Section 2). The 5′-UTR lengths ranged from 7 nt (40S ribosomal protein S16, Tb07.29K4.40), to 1030 nt (conserved hypothetical protein, Tb07.22O10.6200). However, 92% of the 5′-UTRs were smaller than 300 nt; the median length was 68 nt and the mean was 129 (Fig. 1A). Two
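The search for spliced leader sequence in cDNAs described above can be approximated by the following Python sketch. It assumes that the informative part of the spliced leader is its 3′ end (the part adjacent to the mRNA body), interprets "complement" as the reverse complement of the cDNA, and requires an exact match of `min_match` nucleotides. The function names are hypothetical, the `sl` argument must be supplied by the user, and the other evidence for full-length cDNAs mentioned in the text is not modelled.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string; unknown characters are left unchanged."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def find_sl_junction(cdna, sl, min_match=9):
    """Locate the end of a spliced leader match in a cDNA.

    Looks for the last min_match nucleotides of sl in the cDNA, then in its
    reverse complement. Returns (oriented_cdna, junction), where junction is
    the 0-based index of the first base after the match, or None if absent.
    """
    probe = sl.upper()[-min_match:]
    cdna = cdna.upper()
    for strand in (cdna, reverse_complement(cdna)):
        pos = strand.find(probe)
        if pos != -1:
            return strand, pos + len(probe)
    return None
```

Given the index of the start codon in the oriented cDNA, the 5′-UTR length is simply that index minus the returned junction.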
Conclusions
By analysing all available cDNA sequences from T. brucei, we found that trans splicing generally occurs at the first AG following a polypyrimidine tract of 8–25 nt, as for cis splicing in other species. From previous experimental data, we do not know the minimum length of the polypyrimidine tract, to what extent interrupting purines are tolerated, or how variations affect splicing efficiency. We also do not know if there is a requirement for a degenerate branch-point consensus sequence, as too
Acknowledgements
We thank Christiane Hertz-Fowler for help with GeneDB. This work was supported by the Deutsche Forschungsgemeinschaft (D.L.G. & C.B.) and the Swedish Technology Research Council (D.N.).
References (41)
- et al. Post-transcriptional control of the differential expression of phosphoglycerate kinase genes in Trypanosoma brucei. J Mol Biol (1988)
- et al. Transcription of Leishmania major Friedlin chromosome 1 initiates in both directions within a single region. Mol Cell (2003)
- et al. Discontinuously synthesized RNA from Trypanosoma brucei contains the highly methylated 5′ cap structure, m7GpppA*A*C(2′(-O)mU*A. J Biol Chem (1988)
- et al. Identification of a small RNA containing the trypanosome spliced leader donor of the shared 5′ sequences of trypanosome mRNAs. Cell (1984)
- et al. Heat-shock disruption of trans-splicing in trypanosomes: effect on hsp70, hsp85 and tubulin mRNA. Gene (1989)
- et al. TbCPSF30 depletion by RNA interference disrupts polycistronic RNA processing in Trypanosoma brucei. J Biol Chem (2003)
- et al. Effect of multiple downstream splice sites on polyadenylation in Trypanosoma brucei. Mol Biochem Parasitol (1998)
- et al. Post-transcriptional elements regulating expression of mRNAs from the amastin/tuzin gene cluster of Trypanosoma cruzi. J Biol Chem (1995)
- et al. Biochemical and functional characterization of the cis-spliceosomal U1 small nuclear RNP from Trypanosoma brucei. Mol Biochem Parasitol (2002)
- et al. Tests of heterologous promoters and intergenic regions in Leishmania major. Mol Biochem Parasitol (2000)
- Polygene transcripts are precursors to calmodulin mRNAs in trypanosomes. EMBO J
- The unusual gene organization of Leishmania major chromosome 1 may reflect novel transcription processes. Nucleic Acids Res
- Leishmania major Friedlin chromosome 1 has an unusual distribution of protein-coding genes. Proc Natl Acad Sci USA
- The sequence and analysis of Trypanosoma brucei chromosome II. Nucleic Acids Res
- The DNA sequence of chromosome I of an African trypanosome: gene content, chromosome organisation, recombination and polymorphism. Nucleic Acids Res
- RNA polymerase I transcribes procyclin genes and variant surface glycoprotein gene expression sites in Trypanosoma brucei. Eukaryot Cell
- Trypanosome mRNAs have unusual “cap 4” structures acquired by addition of a spliced leader. Proc Natl Acad Sci USA
- Trans and cis splicing in trypanosomatids: mechanism, factors, and regulation. Eukaryot Cell
- A common pyrimidine-rich motif governs trans-splicing and polyadenylation of tubulin polycistronic pre-mRNA in trypanosomes. Genes Dev
- Requirement of a poly-pyrimidine tract for trans-splicing in trypanosomes: discriminating the PARP promoter from the immediately adjacent 3′ splice acceptor site. EMBO J