Gene

Volume 352, 6 June 2005, Pages 100-108
Intragenomic spliced leader RNA array analysis of kinetoplastids reveals unexpected transcribed region diversity in Trypanosoma cruzi

https://doi.org/10.1016/j.gene.2005.04.002

Abstract

The spliced leader RNA gene (SL RNA) repeat is present in large multicopy arrays and has been used as a marker for the diversity of kinetoplastid protozoans. Intra-array variation could affect conclusions made using a randomly isolated repeat as a marker. We examined the Leishmania major (Friedlin) and Trypanosoma cruzi (CL Brener) genome projects for SL RNA repeat sequences in order to assess their homogeneity and the possible effects of sequence variation on taxonomic interpretation. Of the dozens of distinct sequence classes examined, no single copy would bias clustering analyses with regard to other closely related species or isolates. Six dimorphic sites within the T. cruzi transcribed region were found to be linked and are predicted to yield a heterogeneous SL RNA population. The variation that exists among the repeats paints a picture of the broad mechanisms of array maintenance and evolution where site-specific mutations in a single repeat may be spread throughout the array and recombined with existing repeats to create new sequence classes, all occurring under selective pressure to maintain or increase the fitness of the cell line in which these events occur.

Introduction

Comprising free-living, commensal and parasitic organisms, members of the order Kinetoplastida are found associated with a broad range of plants and animals. Among the many peculiarities of kinetoplastid biology is their reliance on a process known as trans-splicing, wherein a 39-nt spliced leader (SL) is transferred from the SL RNA to the 5′ end of every nucleus-encoded mRNA, allowing the generation of monocistronic mRNA from polycistronic precursors (Campbell et al., 2003). SL RNA genes are found as tandem repeats, with on the order of one hundred copies per array (Aksoy et al., 1992).

Each SL RNA repeat contains a well-defined promoter, a transcribed region including an exon and an intron, a T-tract transcription terminator, and a non-transcribed intergenic spacer (Campbell et al., 2003). The 39-nt SL length and sequence are conserved among all but four characterized kinetoplastids. The intron is variable in both sequence and length but forms a consistent secondary structure including three stem-loops (Bruzik et al., 1988). The intergenic region is divergent in sequence content and size, and in some species contains an active 5S rRNA gene (Bruzik et al., 1988).

The SL RNA (a.k.a. mini-exon) repeat is a PCR marker for kinetoplastid diversity (Podlipaev et al., 2004; Westenberger et al., 2004). SL RNA makes an ideal target for preliminary PCR analyses within the trypanosomatids because it is a multicopy gene with a highly conserved sequence. In general, PCR is performed on the query species using universal SL RNA primers based on the nearly invariant SL sequence (Murthy et al., 1992), and the products are typically cloned and sequenced. Minimal sequence variation among repeats within the array is assumed, and when a single repeat unit is sequenced the exon sequence underlying the amplification primers is obscured. When multiple or tandem repeats were analyzed, slight variations were documented in the SL RNA repeat sequence in Leishmania donovani (Fernandes et al., 1994) and Phytomonas (Dollet et al., 2001).

Since each cloned repeat represents approximately 1% of the SL RNA sequence information in a cell, the use of a single random sequence could skew cluster analysis in an environment of high intragenomic variability (Harris and Crandall, 2000). A straightforward way to evaluate this possibility is to analyze all repeats within an array.

In this study we obtained data generated by the Leishmania major Genome Project and the Trypanosoma cruzi Genome Project, collaborative efforts of The Sanger Institute, The Institute for Genomic Research, Seattle Biomedical Research Institute, and the Karolinska Institute, to screen for L. major and T. cruzi SL RNA repeats. All repeats from L. major or T. cruzi arrays cluster within their species and discrete typing units (DTUs) (Tibayrenc, 1995), respectively, indicating that SL RNA is a reliable marker for kinetoplastid diversity. Unexpectedly, a heterogeneous SL RNA population composed of two major sequence classes and a spectrum of intermediate classes was found in T. cruzi strain CL Brener.

Sources of sequence information

The L. major sequences used in this study were obtained through the Sanger FTP website (http://www.sanger.ac.uk/) as it stood in October 2003, two months after sequencing was frozen at its final nine-fold coverage. Data were organized into rough draft chromosomes and obtained in FASTA format. T. cruzi sequences were obtained in November 2003 through the TIGR public FTP website (http://www.tigr.org/) as whole genome shotgun reads in FASTA format.

Searching for SL RNA sequences

A perl

General overview

For the purposes of this paper, a ‘site’ is defined as a nucleotide position or range of contiguous nucleotide positions in an SL RNA that is variable (i.e., SNPs and indels) when compared to any other copy found in the same array. ‘Sequence classes’ are SL RNA repeats with unique genotype combinations for the collection of sites. Sites are numbered from zero to n − 1, where n equals the number of sites analyzed by the χ² analysis program (e.g., 0, 1, 2, 3, 4, 5, …, n − 1).
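The definitions above can be illustrated with a minimal sketch. It is written in Python for illustration and is not the authors' program. It assumes the repeats have already been aligned to equal length with `-` marking gaps, and it treats a run of adjacent variable columns as one site, which is one reading of "range of contiguous nucleotide positions". The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def variable_columns(aligned):
    """Return alignment columns at which the repeats do not all agree."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [col for col in range(length)
            if len({seq[col] for seq in aligned}) > 1]


def merge_contiguous(columns):
    """Merge runs of adjacent variable columns into (start, end) sites.

    Assumption: adjacent variable columns form a single site (e.g. an indel).
    """
    sites = []
    for col in columns:
        if sites and col == sites[-1][1] + 1:
            sites[-1] = (sites[-1][0], col)   # extend the current run
        else:
            sites.append((col, col))          # start a new site
    return sites


def sequence_classes(aligned):
    """Return the sites (list index = site number 0..n-1) and class counts."""
    sites = merge_contiguous(variable_columns(aligned))
    genotypes = [tuple(seq[start:end + 1] for start, end in sites)
                 for seq in aligned]
    return sites, Counter(genotypes)


# Toy, made-up aligned repeats; not data from the study.
repeats = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACCTACGTAC",
    "ACCTAC--AC",
    "ACGTAC--AC",
]

sites, classes = sequence_classes(repeats)
for number, (start, end) in enumerate(sites):
    print(f"site {number}: columns {start}-{end}")
for genotype, count in classes.most_common():
    print(genotype, count)
```

In this toy array one SNP and one two-column indel give two sites, numbered 0 and 1, and the five repeats fall into four sequence classes.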

All L. major SL RNA sequences were

Discussion

In this study SL RNA sequences from L. major and T. cruzi were identified from genome project data and examined to determine whether the use of the SL RNA repeat as a marker for kinetoplastids is affected by sequence variation within the array. Variability within SL RNA arrays was concentrated at particular sites, and the analysis confirmed that SL RNA is a reliable marker for kinetoplastid classification. Some linkage was observed for mutations in close proximity to one another in

Acknowledgements

We thank Bob Hitchcock, Jesse Zamudio, and Gusti Zeiner for critical reading of the manuscript; Peter Myler and Najib El-Sayed for personal communications and helpful discussions; and the authors of ActiveState Perl and www.perlmonks.org for creating valuable tools and references. Sequence data for L. major were obtained from The Sanger Institute website at http://www.sanger.ac.uk/. Sequencing of L. major was accomplished by the Leishmania Genome Network with support from The Wellcome Trust.
