Frédéric Bringaud, Daniella C. Bartholomeu, Gaëlle Blandin, Arthur Delcher, Théo Baltz, Najib M. A. El-Sayed, Elodie Ghedin, The Trypanosoma cruzi L1Tc and NARTc Non-LTR Retrotransposons Show Relative Site Specificity for Insertion, Molecular Biology and Evolution, Volume 23, Issue 2, February 2006, Pages 411–420, https://doi.org/10.1093/molbev/msj046
Abstract
The trypanosomatid protozoan Trypanosoma cruzi contains long autonomous (L1Tc) and short nonautonomous (NARTc) non–long terminal repeat retrotransposons. NARTc (0.25 kb) probably derived from L1Tc (4.9 kb) by 3′-deletion. It has been proposed that their apparent random distribution in the genome is related to the L1Tc-encoded apurinic/apyrimidinic endonuclease (APE) activity, which repairs modified residues. To address this question we used the T. cruzi (CL-Brener strain) genome data to analyze the distribution of all the L1Tc/NARTc elements present in contigs larger than 10 kb. This data set, which represents 0.91× sequence coverage of the haploid nuclear genome (∼55 Mb), contains 419 elements, including 112 full-length L1Tc elements (14 of which are potentially functional) and 84 full-length NARTc. Approximately half of the full-length elements are flanked by a target site duplication, which in most cases (87%) is 12 bp long. Statistical analyses of sequences flanking the full-length elements show the same highly conserved pattern upstream of both the L1Tc and NARTc retrotransposons. The two most conserved residues are a guanine and an adenine, which flank the site where first-strand cleavage is performed by the element-encoded endonuclease activity. This analysis clearly indicates that the L1Tc and NARTc elements display relative site specificity for insertion, which suggests that the APE activity is not responsible for first-strand cleavage of the target site.
Introduction
Retrotransposons are ubiquitous mobile genetic elements that transpose through an RNA intermediate and are found in the genome of most eukaryotes (Capy et al. 1998). They can be divided into two lineages that utilize completely different mechanisms of integration. Those elements with long terminal repeats (LTR), called LTR-retrotransposons, are similar both in structure and retrotransposition mechanism to retroviruses (Whitcomb and Hughes 1992). Those elements that lack LTR, called non-LTR retrotransposons or retroposons, use a simpler mechanism of transposition. The current model for transposition of non-LTR retrotransposons was proposed based on the analysis of the insect R2 element (Luan et al. 1993). This model predicts that an element-encoded endonuclease performs a single-strand nick of the target DNA, generating an exposed 3′-hydroxyl that serves as a primer for reverse transcription of the element's RNA. The complementary strand of the new DNA copy of the element is thus directly synthesized onto the chromosome by the element-encoded reverse transcriptase. The second single-strand nick is carried out on the other strand, a few base pairs downstream of the first nick, by the same element-encoded endonuclease, generating a primer for the second-strand synthesis of the retroelement. Consequently, the non-LTR retroelements are flanked by a direct repeat corresponding to the sequence between the two single-strand nicks performed by the element-encoded endonuclease, called target site duplication (TSD). They also have a variable length poly(A) or A-rich 3′-tail due to the involvement of an RNA intermediate.
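The way two staggered nicks produce a TSD can be illustrated with a minimal single-strand sketch. It is a simplification of the model described above, not a simulation of the enzymology: the function name, the toy target sequence, and the `[element]` placeholder are all hypothetical, and only the top strand is represented.

```python
def insert_with_tsd(target: str, first_nick: int, tsd_len: int, element: str) -> str:
    """Return the top-strand sequence after a simplified TPRT insertion.

    `first_nick` is the 0-based index of the first base that ends up
    duplicated; the second nick is assumed to lie `tsd_len` bases
    downstream. The bases between the two nicks are copied on both
    sides of the inserted element.
    """
    if not 0 <= first_nick <= len(target) - tsd_len:
        raise ValueError("both nicks must fall inside the target")
    second_nick = first_nick + tsd_len
    # Upstream flank plus TSD, then the element, then TSD plus downstream flank.
    return target[:second_nick] + element + target[first_nick:]


# Toy target (hypothetical): 4 bp upstream, a 12-bp stretch, 3 bp downstream.
toy_target = "TATG" + "ACGTTGCAAGTC" + "GGA"
print(insert_with_tsd(toy_target, first_nick=4, tsd_len=12, element="[element]"))
```

In the printed result the 12-bp stretch appears once on each side of the placeholder, which is the direct repeat the text calls the TSD.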
Non-LTR retroelements are very diverse in structure, can insert into a wide variety of DNA target sequences, and have been divided into five groups depending on phylogenetic analyses (Eickbush and Malik 2002). Members of the R2 group integrate within very specific sequences, such as rDNA genes (R2 and R4) and the spliced leader (SL) RNA genes (NeSL-1, SLACS, CZAR, CRE1, and CRE2) (Craig 1997). The site specificity is due to the element-encoded integrase-like domain, which presents characteristics of restriction enzymes (Yang, Malik, and Eickbush 1999; Volff et al. 2001). In contrast, most of the non-LTR retroelements constituting the four other groups as exemplified by the human L1 element are considered to be randomly distributed in the genome. All these retroelements encode an endonuclease domain homologous to apurinic/apyrimidinic endonucleases (APE), not related to the integrase-like domain of the R2 group. However, the observed bias in the base composition at the L1 insertion site correlates with the relative sequence specificity of the L1-encoded APE-like domain, indicating that the distribution of these retroelements is not random (Feng et al. 1996; Jurka 1997; Cost and Boeke 1998; Tatout, Lavie, and Deragon 1998).
Trypanosomatids are unicellular protists including human pathogens responsible for Chagas' disease (Trypanosoma cruzi), African sleeping sickness (Trypanosoma brucei), and leishmaniasis (Leishmania spp.). Recently, the genome sequences of these three trypanosomatid parasites have been completed (Berriman et al. 2005; El-Sayed et al. 2005; Ivens et al. 2005). The non-LTR retrotransposons constitute the most abundant mobile elements described in the genome of T. cruzi and T. brucei (∼3% of nuclear genome), while no potentially active mobile elements have been characterized so far in Leishmania major. The few T. cruzi CZAR (7.25 kb) and T. brucei SLACS (6.3 kb) are site-specific retroelements only found in the SL RNA genes (Aksoy et al. 1987; Villanueva et al. 1991). However, the most abundant non-LTR elements are L1Tc and NARTc in T. cruzi (Martin et al. 1995; Bringaud et al. 2002) and ingi and RIME in T. brucei (Hasan, Turner, and Cordingley 1984; Kimmel, Ole-MoiYoi, and Young 1987; Murphy et al. 1987), with 320 (L1Tc), 133 (NARTc), 115 (ingi), and 86 (RIME) copies per haploid genome (El-Sayed et al. 2005). In T. cruzi, the first 250 bp of the autonomous L1Tc (4.9 kb) and the nonautonomous NARTc (0.25 kb) elements share the first 78 residues and other conserved blocks (fig. 1A), suggesting that NARTc was derived from L1Tc by a 3′-deletion. Similarly, the nonautonomous T. brucei RIME (0.5 kb) appears as a truncated version of the autonomous T. brucei ingi (5.25 kb) by deletion of the central 4.7 kb fragment. The potentially functional L1Tc and ingi encode a large single protein (1,574 and 1,657 amino acids, respectively) responsible for their retrotransposition. This protein contains the central reverse transcriptase (Garcia-Perez et al. 2003) and RNAse H (Olivares et al. 2002) domains, a C-terminal DNA-binding domain (Pays and Murphy 1987), and an N-terminal APE-like domain (Olivares, Alonso, and Lopez 1997) (fig. 1B).
In contrast, the nonautonomous NARTc and RIME elements presumably use the L1Tc- or ingi-encoded enzymatic activities for their own transposition, as previously shown for the nonautonomous human Alu and eel UnaSINE1 elements, which take advantage of the L1 and UnaL2 machinery, respectively (Kajikawa and Okada 2002; Dewannieux, Esnault, and Heidmann 2003).
The T. cruzi L1Tc/NARTc and T. brucei ingi/RIME were considered to be randomly distributed in the genome (S. Bhattacharya, Bakre, and A. Bhattacharya 2002). However, it has recently been observed that the T. brucei ingi and RIME elements display a relative site specificity for insertion (Bringaud et al. 2004). Here, we show that T. cruzi L1Tc and NARTc are inserted downstream of a highly conserved motif. According to the current model for retrotransposition of non-LTR retrotransposons, this conserved motif is probably the binding site of the L1Tc-encoded APE-like domain.
Materials and Methods
Database Mining
To identify all the L1Tc/NARTc elements in T. cruzi assembled genome segments larger than 10 kb (or contigs >10 kb), we used complementary Blast approaches. A BlastN search was performed on contigs >10 kb with the NARTc nucleotide sequence (250 bp), which shares with L1Tc the first 78 bp and the last 13 bp (plus the poly(A) tail) (fig. 1), to identify all NARTc and full-length or 3′-truncated L1Tc. To detect the 5′-truncated L1Tc and confirm the other L1Tc elements previously detected, a TBlastN search was performed with the full-length L1Tc protein sequence as query (1,574 amino acids).
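As an illustration only, hits from a search with the 250-bp NARTc query could be sorted into full-length and truncated copies by checking how much of the query each alignment covers. The sketch below assumes the standard 12-column tabular BLAST output (query start and end in columns 7 and 8); the `tolerance` value, the function names, and the example line are hypothetical and are not taken from the procedure used here.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    contig: str   # subject sequence identifier
    q_start: int  # first query position covered (1-based)
    q_end: int    # last query position covered (1-based)


def parse_tabular_hit(line: str) -> Hit:
    """Parse one line of 12-column tabular BLAST output."""
    fields = line.rstrip("\n").split("\t")
    return Hit(contig=fields[1], q_start=int(fields[6]), q_end=int(fields[7]))


def classify(hit: Hit, query_len: int, tolerance: int = 5) -> str:
    """Label a hit by which end of the query is missing (assumed rule)."""
    missing_5 = hit.q_start - 1 > tolerance
    missing_3 = query_len - hit.q_end > tolerance
    if missing_5 and missing_3:
        return "5'- and 3'-truncated"
    if missing_5:
        return "5'-truncated"
    if missing_3:
        return "3'-truncated"
    return "full length"


# Hypothetical example line: a hit covering positions 60-250 of the query.
example = "NARTc\tcontig_1\t97.9\t191\t4\t0\t60\t250\t5000\t5190\t1e-90\t330"
print(classify(parse_tabular_hit(example), query_len=250))
```

A full-length L1Tc would match such a query only at its two ends, so separating L1Tc from NARTc needs the additional protein-level search described above; the sketch covers only the coverage check.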
Statistical Analysis
The consensus sequence located upstream of the trypanosome retroelements was also shown using a graphic representation called “sequence logo” (Schneider and Stephens 1990; Crooks et al. 2004). This analysis was performed on the same data set as for the chi-square test with the online program (http://weblogo.berkeley.edu/logo.cgi).
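For readers unfamiliar with sequence logos, the height of each stack reflects the information content of that alignment column, which for DNA is 2 bits minus the Shannon entropy of the observed base frequencies. The sketch below computes that quantity for a toy alignment; the function names and input are hypothetical, and the small-sample correction that logo tools typically apply is omitted.

```python
import math
from collections import Counter
from typing import List


def information_content(column: str) -> float:
    """Return 2 - H for one alignment column, in bits (no small-sample correction)."""
    total = len(column)
    entropy = 0.0
    for count in Counter(column).values():
        p = count / total
        entropy -= p * math.log2(p)
    return 2.0 - entropy


def logo_heights(aligned: List[str]) -> List[float]:
    """Information content at every position of equal-length aligned sequences."""
    return [information_content("".join(col)) for col in zip(*aligned)]


# Toy alignment (hypothetical): position 0 is invariant, position 1 is split evenly.
print(logo_heights(["GA", "GA", "GC", "GC"]))
```

An invariant position scores the maximum of 2 bits and a position with all four bases equally frequent scores 0, which is why strongly conserved residues dominate a logo.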
Results
Identification of All the L1Tc and NARTc in the T. cruzi Genome
The nuclear genome of T. cruzi (CL-Brener strain) was recently completed using a whole-genome shotgun strategy (El-Sayed et al. 2005). The current draft of the T. cruzi genome assembly at 15× sequence coverage consists of 784 scaffolds greater than 5 kb built from 3,954 contigs, totaling 60 Mb. The longest contig is 256 kb (the smallest is 215 bp) and the contig N50 size is only 25.8 kb (i.e., half of the nucleotides incorporated into contigs are in contigs greater than 25.8 kb) due to the repetitive and hybrid nature of the CL-Brener genome (El-Sayed et al. 2005). Because the L1Tc non-LTR retrotransposon is ∼4.9 kb long, contigs longer than 10 kb have been retained for this analysis. This data set is composed of 1,701 contigs, totaling 49.9 Mb (83% of the annotated data set) and representing approximately 0.91× sequence coverage of the haploid nuclear genome (∼55 Mb). To identify all the L1Tc (4.9 kb) and NARTc (0.25 kb) retroelements in this data set, we have used Blast approaches, as described in Materials and Methods. We identified 419 L1Tc and NARTc elements, including 24 truncated elements, which could be either L1Tc or NARTc because they only contain the 78-bp 5′-terminal conserved sequence (table 1). Among the 296 identified L1Tc elements, 59 are not complete due to their location at one extremity of the contigs. Approximately one half of the completely sequenced L1Tc are truncated at their 5′, 3′, or both extremities (118, 4, and 3 elements, respectively). Interestingly, most of the incomplete L1Tc are 5′-truncated, as observed for other non-LTR retrotransposons as a result of the low processivity of the element-encoded reverse transcriptase (George, Burke, and Eickbush 1996; Kazazian and Moran 1998). The other half is full-length L1Tc (112 elements).
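The contig N50 quoted above follows the usual definition: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the assembly. A minimal sketch of that calculation is given below; the function name and the toy lengths are hypothetical and are not the T. cruzi contig lengths.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running total to half the assembly.
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Toy lengths in kb (hypothetical).
print(n50([6, 5, 4, 3, 2]))
```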
Although there is no certainty as to what represents a functional L1Tc element, we consider that the few elements that contain a single long open reading frame (ORF) encoding a 1,574-amino acid protein are probably functional (14 elements). Contigs larger than 10 kb also contain 98 NARTc elements, most of them being full length (84 elements), the others being 5′-truncated (7 elements) or 3′-truncated (7 elements). This data set contains an additional NARTc element truncated because of its location at the extreme end of the contig. The analyzed data set containing 419 elements represents 0.91× sequence coverage of the haploid nuclear genome, indicating that the T. cruzi haploid genome (1×) contains ∼460 non-LTR retrotransposons (∼345 L1Tc and ∼115 NARTc), including ∼15 L1Tc that potentially code for functional retrotransposition enzymes. These data are consistent with our previous estimate based on the analysis of the T. cruzi genome survey sequences (GSS) database (286 L1Tc and 140 NARTc per haploid genome) (Bringaud et al. 2002). The genome contains three times more L1Tc than NARTc and about half of the elements are full length. It is noteworthy that the genome of the hybrid T. cruzi CL-Brener strain contains two distinct diploid haplotypes (El-Sayed et al. 2005); consequently, the number of non-LTR retrotransposons per nuclear genome would range between 1,000 and 1,500.
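The per-genome estimates above can be reproduced approximately by dividing the observed counts by the 0.91× coverage. The sketch below makes one assumption that is not stated in the text: the 24 copies that cannot be assigned to L1Tc or NARTc are split between the two families in proportion to the assigned counts (296 and 99). The function name is hypothetical.

```python
from typing import Dict


def extrapolate(observed: Dict[str, int], unassigned: int, coverage: float) -> Dict[str, float]:
    """Scale observed counts to a 1x genome.

    Unassigned copies are shared out in proportion to the assigned
    counts (an assumption made for this sketch).
    """
    assigned_total = sum(observed.values())
    estimates = {}
    for family, count in observed.items():
        share = unassigned * count / assigned_total
        estimates[family] = (count + share) / coverage
    return estimates


estimates = extrapolate({"L1Tc": 296, "NARTc": 99}, unassigned=24, coverage=0.91)
for family, value in estimates.items():
    print(family, round(value))
print("total", round(sum(estimates.values())))
print("potentially functional L1Tc", round(14 / 0.91))
```

Under that assumption the rounded values agree with the figures quoted in the text (∼345 L1Tc, ∼115 NARTc, ∼460 in total, and ∼15 potentially functional L1Tc).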
Table 1 (a)

| | L1Tc | NARTc | UK (b) | Total |
|---|---|---|---|---|
| Full length | 112 (c) | 84 (d) | | 196 |
| 5′-truncated | 118 (e) | 7 (f) | | 125 |
| 3′-truncated | 4 | 7 | 24 | 35 (g) |
| 5′- and 3′-truncated | 3 | | | 3 |
| End of contig (h) | 59 | 1 | | 60 |
| Total | 296 | 99 | 24 | 419 |

(a) The data set analyzed covers 0.91-times the T. cruzi haploid genome.
(b) This retroelement sequence could be either NARTc or L1Tc elements because they only contain the conserved first 78 bp.
(c) Fourteen of them are potentially functional because they code for the full-length protein, and 31 are flanked by a TSD.
(d) Sixty-two of them are flanked by a TSD.
(e) Five of them are flanked by a TSD. The relative abundance of 5′-truncated L1Tc is related to the expansion of two groups of 31 and 55 elements lacking the first 178 and 3,560 residues, respectively.
(f) All of them are flanked by a TSD.
(g) None of them are flanked by a TSD.
(h) Incomplete retroelements located at one extremity of a contig.
Most L1Tc and NARTc Are Flanked by a 12-bp Target Site Duplication
According to the current model for retrotransposition of non-LTR retrotransposons, the target-primed reverse transcription (TPRT) process is initiated by the element-encoded endonuclease, which performs a single-strand cleavage (Luan et al. 1993). The same endonuclease also cleaves the other strand, a few nucleotides downstream of the first cleavage site. Consequently, the sequence located between both single-strand cleavages is duplicated to form the target site duplication (TSD) flanking the newly retrotransposed element. To study the insertion site of the T. cruzi retroelements, we compared the 5′- and 3′-adjacent sequence of all full-length L1Tc (112 elements) and NARTc (84 elements) identified in contigs larger than 10 kb. We have also included in this analysis all full-length NARTc (24 elements) present in the other contigs (smaller than 10 kb). Based on their flanking regions (120 bp upstream and downstream of each element), these 220 full-length elements (112 L1Tc and 108 NARTc) form 124 groups of nearly identical sequences (fig. 2). Eight groups contain both L1Tc and NARTc elements, suggesting that both elements are using the same retrotransposition machinery, as previously observed for the human L1/Alu elements (Dewannieux, Esnault, and Heidmann 2003) and the UnaL2/UnaSINE1 elements from eel (Kajikawa and Okada 2002). Among the 220 full-length elements analyzed, 117 are flanked by a TSD (fig. 2A). Interestingly, the TSD is composed of 12 bp for 105 L1Tc/NARTc (87%), and the other 12 elements (13%) show a 13-bp TSD. The same situation was observed for the T. brucei ingi/RIME non-LTR retrotransposons, that is, 34 of 36 analyzed TSD are 12 bp long (94.4%) and the two others are 11 bp long (5.6%) (Bringaud et al. 2004). As far as we know, the size conservation of the TSD (12 bp) is unique to trypanosome retroelements.
Indeed, all of the other nonsite-specific non-LTR retrotransposons characterized so far have polymorphic flanking direct repeats, as exemplified by human L1, Alu, and rodent ID elements, whose sizes range between 4 and 26 bp (Jurka 1997). The size of the TSD primarily depends on the position of the second-strand cleavage. The mechanism of the second-strand cleavage at the downstream site is poorly understood; however, it is commonly accepted that the element-encoded endonuclease is responsible for the first and second single-strand nicks of the target DNA. Thus, it is tempting to propose that the conservation of the TSD size, resulting from the retrotransposition of the T. brucei and T. cruzi retroelements, is due to mechanistic properties shared by the ingi- and L1Tc-encoded endonuclease.
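The comparison of 5′- and 3′-adjacent sequences described above amounts to looking for a direct repeat that ends the upstream flank and begins the downstream flank. A minimal sketch of such a check follows; it requires an exact match, and the function name, the length bounds, and the toy flanks are hypothetical rather than the criteria used in this study.

```python
def find_tsd(upstream: str, downstream: str, min_len: int = 4, max_len: int = 30) -> str:
    """Return the longest exact repeat ending `upstream` and starting `downstream`.

    An empty string means no repeat of at least `min_len` bases was found.
    """
    longest = min(max_len, len(upstream), len(downstream))
    # Try the longest candidate first so the first match is the longest repeat.
    for size in range(longest, min_len - 1, -1):
        if upstream[-size:] == downstream[:size]:
            return upstream[-size:]
    return ""


# Toy flanks (hypothetical) sharing a 12-bp direct repeat.
tsd = find_tsd("CCTATGACGTTGCAAGTC", "ACGTTGCAAGTCGGATT")
print(tsd, len(tsd))
```

Older insertions may have accumulated point mutations in one copy of the repeat, so a practical version would probably tolerate a small number of mismatches.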
About half of the L1Tc/NARTc elements are truncated, most of them showing a 5′-deletion (∼80% of the truncated elements) (table 1). In humans, greater than 95% of L1 elements are variably 5′-truncated during retrotransposition by TPRT (Kazazian and Moran 1998). These 5′-truncated L1 elements, as well as full-length elements, are flanked by a TSD (Morrish et al. 2002). To determine whether the same feature is present in trypanosomatid non-LTR retrotransposons, we searched for TSD flanking all of the 3′- and/or 5′-truncated L1Tc/NARTc elements. None of the 3′-truncated elements have a TSD; however, all the 5′-truncated NARTc (7 elements) and a few 5′-truncated L1Tc (five out of the 118 elements) are flanked by a 12-bp TSD (11 elements) or 11-bp TSD (1 element) (table 1 and fig. 2D), as observed for the full-length elements.
Among the 220 full-length elements analyzed, 103 are not flanked by a TSD (fig. 2B and C). However, 32 of these TSD-less elements are preceded by a sequence found upstream of L1Tc/NARTc flanked by a TSD (14.5% of the full-length elements) (fig. 2B); these were probably generated by homologous recombination between retroelements, which produces chimeric retrotransposons flanked by unrelated regions. Homologous recombination events imply that an equivalent number of the TSD-less elements should share their 3′-flanking region with elements flanked by a TSD. This was indeed observed because 33 TSD-less elements are followed by a sequence found downstream of L1Tc/NARTc flanked by a TSD (15% of the full-length elements) (data not shown). Because the probability of homologous recombination events increases with the size of the homologous sequences, our hypothesis implies that the proportion of elements flanked by a TSD should be higher for the short NARTc elements (0.25 kb) than for the long L1Tc elements (4.9 kb). Indeed, approximately three times more full-length NARTc are flanked by a TSD, as compared to the full-length L1Tc (79.6% vs. 27.7%). Similarly, 100% of 5′-truncated NARTc and 4% of 5′-truncated L1Tc are flanked by a TSD (see table 1 and fig. 2). The relatively low abundance of 5′-truncated L1Tc flanked by a TSD, as compared to full-length L1Tc flanked by a TSD (4% vs. 27.7%), is probably related to the expansion of two groups of 31 and 55 elements lacking the first 178 and 3,560 residues, respectively, which are located in the same genomic environment (data not shown). If we consider that these 31 and 55 related elements result from duplication and are representative of two different 5′-truncated L1Tc, the proportion of L1Tc flanked by a TSD is in the same range for 5′-truncated and full-length elements (16% vs. 27.7%). We previously observed the same features with the T. brucei ingi/RIME elements, that is, one-tenth of the T. brucei full-length elements are flanked by a TSD and an unknown sequence, and 85% and 48% of RIME and ingi, respectively, are flanked by a TSD (Bringaud et al. 2004).
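Several of the percentages in this paragraph can be checked from counts given elsewhere in the text and in table 1. In the sketch below, the number of full-length NARTc flanked by a TSD (86) is not stated directly; it is inferred here as the 117 full-length elements with a TSD minus the 31 full-length L1Tc with a TSD, and should be read as an assumption. The helper name is hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


full_length = {"L1Tc": 112, "NARTc": 108}
# 31 L1Tc with a TSD is taken from table 1; 117 - 31 = 86 NARTc is inferred.
with_tsd = {"L1Tc": 31, "NARTc": 117 - 31}

for family in full_length:
    print(family, "full length with TSD:", percent(with_tsd[family], full_length[family]), "%")

print("5'-truncated L1Tc with TSD:", percent(5, 118), "%")
print("TSD-less, shared upstream flank:", percent(32, 220), "%")
print("TSD-less, shared downstream flank:", percent(33, 220), "%")
```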
L1Tc and NARTc Are Preceded by a Conserved Motif
To further study the insertion site of the T. cruzi retroelements, we determined the conservation of nucleotides flanking the full-length L1Tc and NARTc elements flanked by a TSD. For this analysis, only a single sequence of each group of nearly identical sequences (67 groups) has been considered (fig. 2A). The sequence downstream of the retroelements does not show a conserved pattern. However, a well-conserved motif is located in the vicinity of the first-strand cleavage (fig. 3A). The most conserved residues are a guanine and an adenine (91% and 99% of conservation, respectively), which flank the first-strand cleavage (positions −12 and −13 upstream of the retroelements). In addition, nine other residues between positions −14 and −31 show more than 50% of conservation. To determine whether the consensus pattern present upstream of the L1Tc/NARTc retroelements is statistically significant, we performed a chi-square test on the same data set (fig. 4A). This analysis clearly demonstrates that a conserved motif (GAxxAxGaxxxxxtxTATG↑Axxxxxxxxxxx; the arrow indicates the first-strand cleavage site) precedes both the NARTc and L1Tc retrotransposons. The presence of the same conserved pattern upstream of both L1Tc (fig. 4B) and NARTc (fig. 4C) confirms that both elements use the same machinery for their retrotransposition.
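A per-position chi-square statistic of the kind mentioned above can be sketched as a goodness-of-fit test of the observed base counts in each alignment column against expected background frequencies. The set-up of the test is not detailed in this section, so the sketch assumes a uniform background (25% per base) unless another is supplied; the function name and the toy alignment are hypothetical.

```python
from typing import Dict, List, Optional

# Chi-square critical value for 3 degrees of freedom at P = 0.001.
CRITICAL_3DF_P001 = 16.27


def chi_square_per_position(
    aligned: List[str], background: Optional[Dict[str, float]] = None
) -> List[float]:
    """Goodness-of-fit statistic for each column of equal-length sequences.

    Characters other than A, C, G, and T contribute to the column size
    but not to any observed count.
    """
    bases = "ACGT"
    if background is None:
        background = {base: 0.25 for base in bases}  # assumed uniform background
    stats = []
    for column in zip(*aligned):
        n = len(column)
        stat = 0.0
        for base in bases:
            expected = n * background[base]
            observed = column.count(base)
            stat += (observed - expected) ** 2 / expected
        stats.append(stat)
    return stats


# Toy alignment (hypothetical): position 0 is invariant, position 1 is mixed.
toy = ["GA", "GC", "GG", "GT"] * 3
for position, stat in enumerate(chi_square_per_position(toy)):
    print(position, round(stat, 2), stat > CRITICAL_3DF_P001)
```

Because trypanosome intergenic sequence is unlikely to have a perfectly uniform base composition, passing measured background frequencies through the `background` argument would probably be the more appropriate choice in practice.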
Approximately half of the T. cruzi L1Tc/NARTc elements are not flanked by a TSD (103 out of 220) (fig. 2). As discussed above, about one-third of these TSD-less elements are preceded by a sequence found upstream of L1Tc/NARTc flanked by a TSD (fig. 2B) and may result from homologous recombination. The other 71 sequences are flanked by unrelated and unknown sequences (fig. 2C). However, as observed for the retroelements flanked by TSD, the latter group of elements representing approximately one-third of all the full-length L1Tc/NARTc is preceded by the same conserved pattern (fig. 3B). This indicates that most, if not all, NARTc and L1Tc elements are preceded by the conserved motif described above. Similarly, among the 76 analyzed full-length T. brucei elements, 40 (52.6%) are TSD-less and most of them show the upstream conserved sequence (Bringaud et al. 2004). Altogether, these observations demonstrate that the T. cruzi, as well as the T. brucei, non-LTR retrotransposons are not randomly distributed in the genome as previously proposed, but instead show a relative site specificity probably dictated by the retroelement-encoded endonuclease.
Discussion
The non-LTR retrotransposons, L1Tc and NARTc, of the protozoan parasite T. cruzi were thought to be randomly distributed in the nuclear genome. Here we show that these elements present a relative insertion site specificity. Indeed, the 220 full-length L1Tc/NARTc elements identified in the sequenced T. cruzi genome (CL-Brener strain) are preceded by a conserved pattern (GAxxAxGaxxxxxtxTATG↑Axxxxxxxxxxx), which may be the binding site of the element-encoded endonuclease. According to the current model, the retroelement-encoded endonuclease domain dictates whether the site of insertion of non-LTR retrotransposons is specific or not (Luan et al. 1993). Most of the nonsite-specific non–LTR retrotransposons contain an APE-like domain, which is thought to determine the site of retroelement insertion. Olivares et al. (2003) showed that the T. cruzi L1Tc APE-like domain contains an APE activity (Olivares, Alonso, and Lopez 1997). Furthermore, overexpression of the L1Tc APE-like domain protects T. cruzi against DNA damaging stresses (Olivares et al. 2003), indicating that the L1Tc-encoded APE-like domain is active in vivo and may have a protective role. Consequently, it has been proposed that this APE-repair activity could act as a signal for new retrotransposition events. APE recognizes modified purine and pyrimidine residues, which are randomly generated in the genome. After excision of the damaged DNA base by APE, the DNA repair machinery replaces the excised residue by the equivalent unmodified nucleotide (Mol, Hosfield, and Tainer 2000). This implies that, if the APE activity determines the site of insertion, the T. cruzi non-LTR retrotransposons should be randomly distributed and flanked by nonconserved sequences. This hypothesis is not in agreement with the relative site specificity of insertion we observed for most, if not all, T. cruzi retroelements. 
For example, the first-strand cleavage is performed between two highly conserved G and A residues (91% and 99% of conservation, respectively), which is not compatible with insertion at apurinic/apyrimidinic sites via APE-mediated repair. Furthermore, it has been observed that the human L1-encoded APE-like domain shows no preference for apurinic/apyrimidinic sites and preferentially cleaves unmodified DNA molecules within AT-rich sequences (Feng et al. 1996). This preferred experimentally defined integration site is similar to the target consensus sequence located upstream of the Alu elements, which use the L1 machinery for retrotransposition (Jurka 1997; Dewannieux, Esnault, and Heidmann 2003). Altogether this indicates that, as observed for the L1 and Alu elements, the apurinic/apyrimidinic sites are not the preferred insertion site of the T. cruzi L1Tc and NARTc elements.
To tentatively explain the presence of a consensus sequence upstream of the L1Tc and NARTc retroelements, two alternative hypotheses should be considered. First, it has been shown that mobilization of the human L1 elements can be mediated by endonuclease-independent retrotransposition to repair double-strand break DNA (Morrish et al. 2002). This hypothesis cannot be retained because endonuclease-independent retrotransposition generates retroelements that lack the TSD and are not preceded by a conserved motif. Second, it has also been proposed that the T. brucei ingi element encodes multiple endonuclease functions, including the N-terminal APE domain and a C-terminal domain homologous to integrases and histidine-asparagine-histidine endonucleases (McClure, Donaldson, and Corro 2002). However, this endonuclease-like C-terminal domain is not present in the L1Tc element. Consequently, the L1Tc element encodes a single endonuclease domain, the N-terminal APE-like domain, known to be responsible for the target site recognition in other non-LTR retrotransposons.
Trypanosoma cruzi and T. brucei non-LTR retrotransposon pairs (L1Tc/NARTc and ingi/RIME, respectively), composed of long autonomous (L1Tc and ingi) and short nonautonomous (NARTc and RIME) elements, share many characteristics (Bringaud et al. 2004), such as (1) equivalent copy number per haploid genome (in the same range), (2) conservation of the 5′-extremity (78 bp for T. cruzi and 250 bp for T. brucei), (3) conservation of the TSD size (12 bp), (4) equivalent proportion of TSD-less elements, (5) a relative site specificity for insertion, and (6) the same conserved motif preceding the autonomous and nonautonomous elements, suggesting that they share the same retrotransposition machinery. However, there is a major difference between the consensus sequence observed upstream of T. cruzi (GAxxAxGaxxxxxtxTATG↑Axxxxxxxxxxx) and T. brucei (AxxxxxxxTtgxGTxGGxTxxx↑tTxTxxTxxxxxx) non-LTR retrotransposons (fig. 5). This target site difference is probably related to the divergent L1Tc and ingi APE-like domains, which are only 23.8% identical.
Pierre Capy, Associate Editor
We thank Bill Wickstead for sharing information. F.B. and T.B. were supported by the Centre National de la Recherche Scientifique, the Conseil Régional d'Aquitaine, and the Ministère de l'Education Nationale de la Recherche et de la Technologie. N.M.A.E.S. and colleagues were supported by National Institutes of Health grants AI43062 and AI45038.
References
Aksoy, S., T. M. Lalor, J. Martin, L. H. Van der Ploeg, and F. F. Richards.
Bhattacharya, S., A. Bakre, and A. Bhattacharya.
Berriman, M., E. Ghedin, C. Hertz-Fowler et al. (90 co-authors).
Bringaud, F., N. Biteau, E. Zuiderwijk, M. Berriman, N. M. El-Sayed, E. Ghedin, S. E. Melville, N. Hall, and T. Baltz.
Bringaud, F., J. L. García-Pérez, S. R. Heras, E. Ghedin, N. M. El-Sayed, B. Andersson, T. Baltz, and M. C. Lopez.
Capy, P., C. Bazin, D. Higuet, and T. Langin.
Cost, G. J., and J. D. Boeke.
Crooks, G. E., G. Hon, J. M. Chandonia, and S. E. Brenner.
Dewannieux, M., C. Esnault, and T. Heidmann.
Eickbush, T. H., and H. S. Malik.
El-Sayed, N. M. A., P. Myler, D. C. Bartholomeu et al. (76 co-authors).
Feng, Q., J. V. Moran, H. H. Kazazian, and J. D. Boeke.
Garcia-Perez, J. L., C. I. Gonzalez, M. C. Thomas, M. Olivares, and M. C. Lopez.
George, J. A., W. D. Burke, and T. H. Eickbush.
Hasan, G., M. J. Turner, and J. S. Cordingley.
Ivens, A. C., C. Peacock, E. A. Worthey et al. (102 co-authors).
Jurka, J.
Kajikawa, M., and N. Okada.
Kazazian, H. H. Jr., and J. V. Moran.
Kimmel, B. E., O. K. Ole-MoiYoi, and J. R. Young.
Luan, D. D., M. H. Korman, J. L. Jakubczak, and T. H. Eickbush.
Martin, F., C. Maranon, M. Olivares, C. Alonso, and M. C. Lopez.
McClure, M. A., E. Donaldson, and S. Corro.
Mol, C. D., D. J. Hosfield, and J. A. Tainer.
Morrish, T. A., N. Gilbert, J. S. Myers, B. J. Vincent, T. D. Stamato, G. E. Taccioli, M. A. Batzer, and J. V. Moran.
Murphy, N. B., A. Pays, P. Tebabi, H. Coquelet, M. Guyaux, M. Steinert, and E. Pays.
Olivares, M., C. Alonso, and M. C. Lopez.
Olivares, M., J. L. Garcia-Perez, M. C. Thomas, S. R. Heras, and M. C. Lopez.
Olivares, M., M. C. Lopez, J. L. Garcia-Perez, P. Briones, M. Pulgar, and M. C. Thomas.
Olivares, M., M. C. Thomas, A. Lopez-Barajas, J. M. Requena, J. L. Garcia-Perez, S. Angel, C. Alonso, and M. C. Lopez.
Pays, E., and N. B. Murphy.
Schneider, T. D., and R. M. Stephens.
Tatout, C., L. Lavie, and J. M. Deragon.
Villanueva, M. S., S. P. Williams, C. B. Beard, F. F. Richards, and S. Aksoy.
Volff, J. N., C. Korting, A. Froschauer, K. Sweeney, and M. Schartl.
Whitcomb, J. M., and S. H. Hughes.
Author notes
*Laboratoire de Génomique Fonctionnelle des Trypanosomatides, UMR-5162 Centre National de la Recherche Scientifique, Université Victor Segalen Bordeaux 2, Bordeaux Cedex, France; †The Institute for Genomic Research, Rockville; and ‡Department of Microbiology and Tropical Medicine, George Washington University