Intragenomic spliced leader RNA array analysis of kinetoplastids reveals unexpected transcribed region diversity in Trypanosoma cruzi
Introduction
Comprising free-living, commensal and parasitic organisms, members of the order Kinetoplastida are found associated with a broad range of plants and animals. Among the many peculiarities of kinetoplastid biology is their reliance on a process known as trans-splicing, wherein a 39-nt spliced leader (SL) is transferred from the SL RNA to the 5′ end of every nucleus-encoded mRNA, allowing the generation of monocistronic mRNA from polycistronic precursors (Campbell et al., 2003). SL RNA genes are found as tandem repeats, present on the order of one hundred copies per array (Aksoy et al., 1992).
Each SL RNA repeat contains a well-defined promoter, a transcribed region including an exon and an intron, a T-tract transcription terminator, and a non-transcribed intergenic spacer (Campbell et al., 2003). The length and sequence of the 39-nt SL are conserved among all but four characterized kinetoplastids. The intron is variable in both sequence and length but forms a consistent secondary structure including three stem-loops (Bruzik et al., 1988). The intergenic region is divergent in sequence content and size, and in some species contains an active 5S rRNA (Bruzik et al., 1988).
The SL RNA (a.k.a. mini-exon) repeat is a PCR marker for kinetoplastid diversity (Podlipaev et al., 2004, Westenberger et al., 2004). SL RNA makes an ideal target for preliminary analyses within the trypanosomatids by PCR as it is a multicopy gene with a highly conserved sequence. In general, PCR is performed on the query species using universal SL RNA primers based on the nearly invariant SL sequence (Murthy et al., 1992), from which products are typically cloned and sequenced. Minimal sequence variation among repeats within the array is assumed, and when a single repeat unit is sequenced the exon sequence underlying the amplification primers is obscured. When multiple or tandem repeats were analyzed, slight variations were documented in the SL RNA repeat sequence in Leishmania donovani (Fernandes et al., 1994) and Phytomonas (Dollet et al., 2001).
Since each cloned repeat represents approximately 1% of the SL RNA sequence information in a cell, the use of a single random sequence could skew cluster analysis in an environment of high intragenomic variability (Harris and Crandall, 2000). A straightforward way to evaluate this possibility is to analyze all repeats within an array.
In this study we obtained data generated by the Leishmania major Genome Project and the Trypanosoma cruzi Genome Project, collaborative efforts of The Sanger Institute, The Institute for Genomic Research, Seattle Biomedical Research Institute, and the Karolinska Institute, to screen for L. major and T. cruzi SL RNA repeats. All repeats from L. major or T. cruzi arrays cluster within their species and discrete typing units (DTUs) (Tibayrenc, 1995), respectively, indicating that SL RNA is a reliable marker for kinetoplastid diversity. Unexpectedly, a heterogeneous SL RNA population composed of two major sequence classes and a spectrum of intermediate classes was found in T. cruzi strain CL Brener.
Sources of sequence information
The L. major sequences used in this study were obtained through the Sanger FTP website as it stood in October 2003 (http://www.sanger.ac.uk/), two months after sequencing was frozen at its final nine-fold coverage. Data were organized into rough draft chromosomes and obtained in FASTA format. T. cruzi sequences were obtained in November 2003 through the TIGR public FTP website (http://www.tigr.org/) as whole genome shotgun reads in FASTA format.
Searching for SL RNA sequences
A perl
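As a rough illustration of this kind of search, the Python sketch below scans FASTA records for exact matches to a query sequence on both strands. It is not the script used in the study: the function names, the exact-match criterion, and the toy query and reads are all assumptions made for the example, and a real screen would use the actual SL exon sequence and might tolerate mismatches.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_hits(records, query):
    """Yield (read name, strand, 0-based start) for every exact match."""
    probes = (("+", query), ("-", reverse_complement(query)))
    for name, seq in records:
        for strand, probe in probes:
            start = seq.find(probe)
            while start != -1:
                yield name, strand, start
                start = seq.find(probe, start + 1)


# Toy data only: a hypothetical 5-nt query, not the real 39-nt SL sequence.
toy_fasta = """>read1
GGACGTTGG
>read2
TTAACGTCC
>read3
GGGGGG
""".splitlines()

hits = list(find_hits(read_fasta(toy_fasta), "ACGTT"))
```

Searching both strands matters for shotgun reads, since a read may cover either orientation of the array.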
General overview
For the purposes of this paper, a ‘site’ is defined as a nucleotide position or range of contiguous nucleotide positions in an SL RNA that is variable (i.e., SNPs and indels) when compared to any other copy found in the same array. ‘Sequence classes’ are SL RNA with unique genotype combinations for the collection of sites. Sites are numbered from zero to n − 1, where n equals the number of sites analyzed by the χ² analysis program (e.g., 0, 1, 2, 3, 4, 5, …, n − 1).
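The definitions above can be made concrete with a minimal Python sketch. It assumes the SL RNA copies have already been aligned to equal length with '-' marking gaps; the function names and the toy alignment are hypothetical and are not taken from the study's own program. Contiguous variable columns are merged into one site, sites are indexed from 0, and each distinct combination of genotypes across the sites is counted as one sequence class.

```python
from collections import Counter


def find_sites(aligned):
    """Return half-open (start, end) column ranges that vary among the copies.

    Adjacent variable columns are merged into a single site, so a
    multi-base indel counts once. The list index of each range is its
    site number (0 .. n - 1).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned copies must have the same length")
    sites, start = [], None
    for col in range(length):
        is_variable = len({seq[col] for seq in aligned}) > 1
        if is_variable and start is None:
            start = col
        elif not is_variable and start is not None:
            sites.append((start, col))
            start = None
    if start is not None:
        sites.append((start, length))
    return sites


def sequence_classes(aligned):
    """Return (sites, Counter mapping each genotype combination to its copy count)."""
    sites = find_sites(aligned)
    genotypes = (tuple(seq[a:b] for a, b in sites) for seq in aligned)
    return sites, Counter(genotypes)


# Toy alignment of four hypothetical repeat copies (not real SL RNA data).
copies = ["ACGTAC-TT", "ACGTAC-TT", "ACCTACGTT", "ACCTAC-TT"]
sites, classes = sequence_classes(copies)
```

In this toy alignment, columns 2 and 6 vary, giving two sites (numbered 0 and 1) and three sequence classes, one of which is shared by two copies.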
All L. major SL RNA sequences were
Discussion
In this study SL RNA sequences from L. major and T. cruzi were identified from genome project data and examined to determine whether the use of the SL RNA repeat as a marker for kinetoplastids is affected by sequence variation within the array. Variability within SL RNA arrays was found to be concentrated at particular sites, and the analysis confirmed that SL RNA is a reliable marker for kinetoplastid classification. Some linkage was observed for mutations in close proximity to one another in
Acknowledgements
We thank Bob Hitchcock, Jesse Zamudio, and Gusti Zeiner for critical reading of the manuscript; Peter Myler and Najib El-Sayed for personal communications and helpful discussions; and the authors of ActiveState Perl and www.perlmonks.org for creating valuable tools and references. Sequence data for L. major were obtained from The Sanger Institute website at http://www.sanger.ac.uk/. Sequencing of L. major was accomplished by the Leishmania Genome Network with support from The Wellcome Trust.
References (29)
- et al. Spliced leader RNA sequences of Trypanosoma rangeli are organized within the 5S rRNA-encoding genes. Gene (1992)
- et al. Overexpression of miniexon gene decreases virulence of Leishmania major in BALB/c mice in vivo. Mol. Biochem. Parasitol. (2000)
- et al. A phylogenetic analysis of the Trypanosoma cruzi genome project CL Brener reference strain by multilocus enzyme electrophoresis and multiprimer random amplified polymorphic DNA fingerprinting. Mol. Biochem. Parasitol. (1998)
- et al. Transcription in kinetoplastid protozoa: why be normal? Microbes Infect. (2003)
- et al. Spliced leader RNA gene promoter sequence heterogeneity in CL-Brener Trypanosoma cruzi reference strain. Infect. Genet. Evol. (2004)
- et al. Mini-exon gene variation in human pathogenic Leishmania species. Mol. Biochem. Parasitol. (1994)
- et al. Structural alterations of chromosome 2 in Leishmania major as evidence for diploidy, including spontaneous amplification of the mini-exon array. Mol. Biochem. Parasitol. (1989)
- et al. Developmental regulation of spliced leader RNA gene in Leishmania donovani amastigotes is mediated by specific polyadenylation. J. Biol. Chem. (1999)
- et al. On the role of exon and intron sequences in trans-splicing utilization and cap 4 modification of the trypanosomatid Leptomonas collosoma SL RNA. J. Biol. Chem. (2002)
- et al. Characterization of the spliced leader genes and transcripts in Trypanosoma cruzi. Gene (1989)