A survey of metazoan selenocysteine insertion sequences
Introduction
In most living organisms, selenoprotein genes are interrupted by in-frame UGA codons, which usually act as Stop codons but are here translated into the amino acid selenocysteine. A detailed review of the mechanisms in different species is presented by Krol in this issue 〚1〛. Selenoprotein mRNAs contain a conserved hairpin structure that is required to distinguish UGA selenocysteine codons from UGA Stop codons. This structure, called the Selenocysteine Insertion Sequence (SECIS), is sufficiently large and constrained that it can be used as a screen for the computational identification of selenoprotein genes. Identifying the RNA component of the gene is essential in this case, since conventional gene annotation methods do not take in-frame UGA codons into account and therefore usually misinterpret or incorrectly assign the coding sequence. In animal selenoprotein genes, the SECIS occurs in the 3′ untranslated region (UTR) of the mRNA and its secondary structure has been characterized in detail 〚2〛, 〚3〛. Several bioinformatics studies have been conducted in recent years to identify SECIS elements in transcribed sequences, successfully identifying several new selenoproteins in mammalian 〚4〛, 〚5〛 and Drosophila 〚6〛, 〚7〛 genomes.
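The annotation problem described above can be illustrated with a minimal Python sketch. It assumes a toy mRNA string and a hypothetical helper, `reading_frame_end`, neither of which comes from the original study. The helper walks a reading frame codon by codon and shows how treating UGA as a Stop truncates the predicted coding sequence, whereas reading through UGA extends it to the next UAA or UAG.

```python
def reading_frame_end(mrna, start, read_through_uga=False):
    """Walk the reading frame that opens at `start`.

    Returns (end_index, uga_positions), where end_index is the position of
    the terminating codon (or len(mrna) if none is found) and uga_positions
    lists in-frame UGA codons that were read through.
    """
    uga_positions = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA":
            if not read_through_uga:
                return i, uga_positions      # conventional annotation stops here
            uga_positions.append(i)          # candidate selenocysteine codon
        elif codon in ("UAA", "UAG"):
            return i, uga_positions
    return len(mrna), uga_positions


# Toy transcript (illustrative only): AUG GCU UGA GGC UAA
toy_mrna = "AUGGCUUGAGGCUAA"
print(reading_frame_end(toy_mrna, 0))                         # conventional reading
print(reading_frame_end(toy_mrna, 0, read_through_uga=True))  # UGA read as Sec
```

In a real pipeline, the decision to read through a UGA would depend on the presence of a SECIS element downstream, which is what the searches discussed in this article aim to establish.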
The SECIS comprises two nested helical regions of about 5 and 14 base pairs (Fig. 1). The larger helix begins with four non-canonical base pairs comprising a central 5′GA3′/5′GA3′ tandem, often flanked by homopyrimidine pairs on each side. The apical loop is characterized by the presence of 2–4 adenosines on the 5′ side followed, in some cases, by an extra stem of about three base pairs. SECIS elements lacking this apical stem are termed “Form 1”, while the others are termed “Form 2” 〚8〛, 〚9〛. Although constraining, the SECIS structure alone does not permit an unambiguous identification of selenoprotein genes in large sequence databases. Using an RNAMOT descriptor representing the constraints in Fig. 1, Lescure et al. 〚4〛 estimated the frequency of false positive hits to be three per 10 Mb. Kryukov et al. 〚5〛 used looser constraints and obtained about 650 hits per 10 Mb, before applying a free energy screening procedure that lowered the number of hits to about 15 per 10 Mb. Subsequent studies also combined raw structure detection with additional criteria such as free energy or techniques for coding exon recognition 〚6〛. To date, the use of additional screening parameters besides the SECIS structure has been a requirement in all selenoprotein gene identification studies.
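To give a sense of scale, the hit frequencies quoted above can be extrapolated to a larger database. The short Python sketch below does this arithmetic for a database of 3,000 Mb, a size chosen here purely for illustration (roughly that of a mammalian genome) and not taken from the cited studies. The function name `expected_hits` is likewise hypothetical.

```python
# Hit frequencies per 10 Mb as quoted in the text.
HITS_PER_10MB = {
    "RNAMOT descriptor, false positives (Lescure et al.)": 3,
    "looser constraints (Kryukov et al.)": 650,
    "looser constraints + free energy screen (Kryukov et al.)": 15,
}


def expected_hits(rate_per_10mb, database_mb):
    """Linearly scale a per-10-Mb hit rate to a database of `database_mb` Mb."""
    return rate_per_10mb * database_mb / 10


DATABASE_MB = 3000  # assumption: illustrative database size, not from the source

for method, rate in HITS_PER_10MB.items():
    print(f"{method}: ~{expected_hits(rate, DATABASE_MB):.0f} hits")
```

Even the most selective of these figures scales to hundreds of hits at this size, which is consistent with the need for additional screening criteria noted above. The linear scaling assumes that hit density is uniform across the database.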
We recently introduced a new computational tool for the identification of RNA motifs that could constitute a more selective means to detect SECIS elements. This program, called ERPIN 〚10〛, is based on a position weight matrix or “profile” model, specially adapted to handle base-paired structures. Each single-stranded and helical element in an RNA molecule is represented by a profile, and profiles are instantiated onto database sequences using a dynamic programming algorithm. This approach requires an initial “training set” of the RNA sequences under study and offers several advantages: it does not require writing any descriptor, it is usually more specific than descriptor-based programs, and it provides an objective scoring of solutions based on their similarity to training set sequences and structures.
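The idea of a profile adapted to base-paired structures can be sketched as follows. This is a simplified Python illustration and not ERPIN's actual implementation: it assumes an ungapped helix, a uniform background over the 16 possible base pairs, a pseudocount of 1, and a toy training set. The names `helix_profile` and `helix_score` are hypothetical. Each helix position gets a log-odds weight for every possible pair, so that covariation between the two strands contributes to the score.

```python
import math
from itertools import product

BASES = "ACGU"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 possible pairs


def helix_profile(training, pseudocount=1.0):
    """Build a log-odds profile for a helix.

    `training` is a list of (five_prime, three_prime) strand strings of equal
    length, written so that position k of one strand pairs with position k of
    the other.
    """
    length = len(training[0][0])
    background = 1.0 / len(PAIRS)  # assumption: uniform background frequency
    profile = []
    for k in range(length):
        counts = {p: pseudocount for p in PAIRS}
        for five, three in training:
            counts[five[k] + three[k]] += 1
        total = sum(counts.values())
        profile.append(
            {p: math.log2(counts[p] / total / background) for p in PAIRS}
        )
    return profile


def helix_score(profile, five, three):
    """Sum the log-odds weights of the pairs formed by the two strands."""
    return sum(col[a + b] for col, a, b in zip(profile, five, three))


# Toy training set (illustrative only, not real SECIS data).
toy_training = [("GGA", "CCU"), ("GGA", "CCU"), ("GCA", "CGU")]
toy_profile = helix_profile(toy_training)
print(helix_score(toy_profile, "GGA", "CCU"))  # resembles the training set
print(helix_score(toy_profile, "AAA", "AAA"))  # does not
```

Single-stranded elements could be handled in the same way with one weight per base rather than per pair. Placing the profiles onto a database sequence with variable gaps is the part that requires dynamic programming, and it is not shown here.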
For this study, we built an initial alignment of 44 SECIS sequences and used this training set and the ERPIN program to scan a database of eukaryotic transcripts and genomic sequences. The training set was iteratively enriched with homologous SECIS structures collected during five successive rounds of database searches. The final collection of 117 aligned SECIS elements is the largest available to date and should be helpful in the assessment of base and base pair constraints for structural or functional studies. In addition, we can now use this enhanced collection to identify new selenoprotein gene candidates. Some high-scoring candidates are provided here, selected on purely statistical criteria and/or on the presence of homologous SECIS sequences in different species.
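The iterative enrichment of the training set can be outlined in a few lines of Python. The sketch below is schematic and makes several assumptions: the search step is a toy stand-in (`toy_search`, which accepts sequences within one mismatch of any training member) rather than an ERPIN run, the early exit when a round yields nothing new is added here for convenience, and any validation of hits between rounds is omitted.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def toy_search(training, database, max_mismatches=1):
    """Hypothetical stand-in for a profile search: keep database sequences
    within `max_mismatches` of at least one training sequence."""
    return [
        s for s in database
        if any(len(s) == len(t) and hamming(s, t) <= max_mismatches
               for t in training)
    ]


def iterative_enrichment(initial_set, database, search, rounds=5):
    """Repeat the search, adding newly found sequences to the training set."""
    training = list(initial_set)
    for _ in range(rounds):
        new_hits = [h for h in search(training, database) if h not in training]
        if not new_hits:      # assumption: stop early if nothing new is found
            break
        training.extend(new_hits)
    return training


# Toy data (illustrative only).
toy_database = ["AAAA", "AAAU", "AAUU", "AUUU", "GGGG"]
print(iterative_enrichment(["AAAA"], toy_database, toy_search))
```

Each round widens the set of sequences that the next search can recognize, which is how more divergent homologs are reached. The same mechanism can also propagate false positives, so in practice some form of checking between rounds is advisable.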
The ERPIN program
The basic algorithm in ERPIN has been published elsewhere 〚10〛. We used version 2 of the program, which presents significant improvements in the handling of multi-helix motifs. ERPIN 1 used four basic elements for searches: helix, strand, hairpin and pairs of helices. Basic elements could be combined, but optimal matches were guaranteed only within each basic element. ERPIN 2.1 handles more complex elements by creating a set of “configurations” based on the gaps present in the training set. If
Confirmed SECIS
The initial training set contained representative sequences from all animal selenoprotein SECIS elements, except for the newly discovered SelM SECIS 〚16〛, which deviates significantly from other animal SECIS elements (CCC instead of AAA in the 5′ apical loop) and would therefore greatly reduce search specificity. A specific training set would be more appropriate for the detection of this particular element. The first two rounds of iterative search, performed against the HGI databases, yielded 46 SECIS
Conclusion
We have introduced a computational screen for SECIS elements based on the ERPIN program, which differs from previously published protocols by its new search algorithm and by the introduction of a statistical evaluation of candidates. Potential SECIS elements are scored based on their resemblance to SECIS elements in a training set, and this score S is converted into an E-value expressing the number of hits of the same or higher score expected in a random database. The mean score for SECIS elements in the training set
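The relation between a score S and an E-value can be illustrated with a simple Monte Carlo estimate in Python. How the E-values in this study are actually derived is not shown in this excerpt, so the sketch below is only one plausible approach under stated assumptions: random sequences with uniform base composition, a hypothetical scoring function (`toy_score`), and a database size expressed as a number of candidate windows.

```python
import random


def empirical_evalue(score, score_fn, window, database_windows,
                     n_random=10000, seed=0):
    """Estimate the number of hits scoring >= `score` expected by chance.

    Scores `n_random` random sequences of length `window`, takes the fraction
    that reach `score`, and scales it by the number of windows in the database.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_random):
        seq = "".join(rng.choice("ACGU") for _ in range(window))
        if score_fn(seq) >= score:
            exceed += 1
    return exceed / n_random * database_windows


def toy_score(seq):
    """Hypothetical scoring function: number of adenosines in the window."""
    return seq.count("A")


# Illustrative call: higher scores should give lower E-values.
for s in (2, 4, 6):
    print(s, empirical_evalue(s, toy_score, window=8, database_windows=1_000_000))
```

A sampling estimate of this kind cannot resolve very small E-values without a very large number of random sequences, and real databases are not uniform in composition. Both limitations would have to be addressed in a production setting.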
Acknowledgements
We thank Dr. Alain Krol for critical reading of the manuscript.
References (17)
- et al., A. Krol. Novel selenoproteins identified in silico and in vivo based on an RNA structural tag. J. Biol. Chem. (1999)
- et al. New mammalian selenocysteine-containing proteins identified with an algorithm that searches for selenocysteine insertion sequence elements. J. Biol. Chem. (1999)
- et al. Selenium metabolism in Drosophila: selenoproteins, selenoprotein mRNA expression, fertility, and mortality. J. Biol. Chem. (2001)
- et al. Direct RNA definition and identification from multiple sequence alignments using secondary structure profiles. J. Mol. Biol. (2001)
- Evolutionarily different RNA motifs and RNA-protein complexes to achieve selenoprotein synthesis. Biochimie (2002)
- et al. A novel RNA structural motif in the selenocysteine insertion element of eukaryotic selenoprotein mRNAs. RNA (1996)
- et al. An essential non-Watson-Crick base pair motif in 3′ UTR to mediate selenoprotein translation. RNA (1998)
- et al. In silico identification of novel selenoproteins in the Drosophila melanogaster genome. EMBO Rep. (2001)