A survey of metazoan selenocysteine insertion sequences

doi:10.1016/S0300-9084(02)01441-4

Biochimie

Volume 84, Issue 9, September 2002, Pages 953-959

Abstract

The computational detection of novel selenoproteins in genomic sequences is usually achieved through identification of SECIS, a conserved secondary structure element found in the 3′ UTR of animal selenoprotein mRNAs. Previous studies have used “descriptors” specifying the number of base pairs and the conserved nucleotides in SECIS to identify this element. A major drawback of the “descriptor” approach is that the number of detections in current genomic or transcript databases largely exceeds the number of true selenoproteins. In this study, we instead use the ERPIN program to detect SECIS elements. ERPIN is based on a lod-score profile algorithm that uses a training set of aligned RNA sequences as input. Starting from an initial alignment of 44 animal SECIS sequences, we performed a series of iterative searches in which the training set was progressively enriched up to 117 confirmed SECIS elements from a large collection of metazoan species. About 200 high-scoring candidates were also detected. We show that ERPIN scores for these candidates can be converted into expect values, thus enabling their statistical evaluation. The most interesting SECIS candidates are presented.

Introduction

In most living organisms, selenoprotein genes are interrupted by in-frame UGA codons (normally Stop codons) that are translated into the amino acid selenocysteine. A detailed review of the mechanisms in different species is presented by Krol in this issue 〚1〛. Selenoprotein mRNAs contain a conserved hairpin structure that is required to distinguish UGA Stop codons from UGA selenocysteine codons. This structure, called the Selenocysteine Insertion Sequence (SECIS), is sufficiently large and constrained that it can be used as a screen for the computational identification of selenoprotein genes. Identifying the RNA component of the gene is essential in this case, since the coding sequence is usually misinterpreted or incorrectly assigned by conventional gene annotation methods, which do not take in-frame UGA codons into account. In animal selenoprotein genes, SECIS occurs in the 3′ untranslated region (UTR) of the mRNA and its secondary structure has been characterized in detail 〚2〛, 〚3〛. Several bioinformatics studies have been conducted in recent years to identify SECIS elements in transcribed sequences, successfully identifying several new selenoproteins in mammalian 〚4〛, 〚5〛 and Drosophila 〚6〛, 〚7〛 genomes.

SECIS comprises two nested helical regions of about 5 and 14 base pairs (Fig. 1). The larger helix begins with four non-canonical base pairs comprising a central 5′GA3′/5′GA3′ tandem, often flanked by homopyrimidine pairs on each side. The apical loop is characterized by the presence of 2–4 adenosines on the 5′ side followed, in some cases, by an extra stem of about three base pairs. SECIS elements lacking this apical stem are termed “Form 1”, while the others are termed “Form 2” 〚8〛, 〚9〛. Although constraining, the SECIS structure alone does not permit an unambiguous identification of selenoprotein genes in large sequence databases. Using an RNAMOT descriptor representing the constraints in Fig. 1, Lescure et al. 〚4〛 estimated the frequency of false positive hits to be three per 10 Mb. Kryukov et al. 〚5〛 used looser constraints and obtained about 650 hits per 10 Mb, before applying a free energy screening procedure that lowered the number of hits to about 15 per 10 Mb. Subsequent studies also combined raw structure detection with additional criteria such as free energy or techniques for coding exon recognition 〚6〛. To date, the use of additional screening parameters besides SECIS has been a requirement in all selenoprotein gene identification studies.
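To put these hit rates in perspective, the short calculation below scales them to a larger database. It is an illustration only: the 3000 Mb database size is a hypothetical value that does not come from the studies cited, and linear scaling of hit counts with database size is an assumption.

```python
# Hit rates per 10 Mb as quoted in the paragraph above.
RATES_PER_10MB = {
    "RNAMOT descriptor, false positives": 3,
    "looser descriptor, raw hits": 650,
    "looser descriptor after free energy screening": 15,
}


def expected_hits(rate_per_10mb: float, database_mb: float) -> float:
    """Scale a per-10-Mb hit rate linearly to a database of `database_mb` megabases."""
    return rate_per_10mb * database_mb / 10.0


# Hypothetical database size, chosen for illustration only.
DATABASE_MB = 3000

for label, rate in RATES_PER_10MB.items():
    hits = expected_hits(rate, DATABASE_MB)
    print(f"{label}: about {hits:.0f} hits in {DATABASE_MB} Mb")
```

Under these assumptions, even the most selective rate translates into hundreds of hits, which is consistent with the need for additional screening criteria noted above.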

We recently introduced a new computational tool for the identification of RNA motifs that could constitute a more selective means to detect SECIS elements. This program, called ERPIN 〚10〛, is based on a position weight matrix or “profile” model, specially adapted to handle base-paired structures. Each single-stranded and helical element in an RNA molecule is represented by a profile, and profiles are instantiated onto database sequences using a dynamic programming algorithm. This approach requires an initial “training set” of the RNA sequences under study and offers several advantages: it does not require writing any descriptor, it is usually more specific than descriptor-based programs, and it provides an objective scoring of solutions based on their similarity to training set sequences and structures.
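The profile idea can be sketched in a few lines of Python. This is a simplified illustration and not ERPIN's actual implementation: it assumes an ungapped alignment, a uniform background distribution, and a pseudocount of 1, and it scores a single fixed window instead of searching a database with dynamic programming. The toy sequences, the helix and loop coordinates, and all function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGU"
PAIRS = [a + b for a in BASES for b in BASES]  # 16 possible base-pair symbols


def lod_profile(columns, alphabet, pseudocount=1.0):
    """Build one lod-score table per column: log2(observed frequency / uniform background)."""
    background = 1.0 / len(alphabet)
    profile = []
    for column in columns:
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        profile.append({
            symbol: math.log2(((counts[symbol] + pseudocount) / total) / background)
            for symbol in alphabet
        })
    return profile


def score(profile, symbols):
    """Sum the lod scores of the observed symbols, one per profile column."""
    return sum(table[symbol] for table, symbol in zip(profile, symbols))


# Toy training set (invented sequences, not real SECIS elements):
# a 3 bp helix (positions 0-2 paired with 9-7) closing a 4 nt loop.
TRAINING = ["GGCAAAAGCC", "GACAAGAGUC", "GGCAAAUGCC", "CGCAAAAGCG"]
HELIX = [(0, 9), (1, 8), (2, 7)]
LOOP = [3, 4, 5, 6]

# Helices are profiled over base pairs, single strands over single bases.
helix_profile = lod_profile([[s[i] + s[j] for s in TRAINING] for i, j in HELIX], PAIRS)
loop_profile = lod_profile([[s[i] for s in TRAINING] for i in LOOP], BASES)


def score_candidate(seq):
    """Score a 10 nt window against the helix and loop profiles."""
    helix = score(helix_profile, [seq[i] + seq[j] for i, j in HELIX])
    loop = score(loop_profile, [seq[i] for i in LOOP])
    return helix + loop


print(score_candidate("GGCAAAAGCC"))  # resembles the training set
print(score_candidate("GGCAAAAGGG"))  # same loop, two helix pairs disrupted
```

Scoring helices as pairs rather than as independent columns is what lets a profile of this kind reward covariation: a G-C to A-U substitution seen in the training set scores well, whereas a G-G mismatch does not.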

For this study, we built an initial alignment of 44 SECIS sequences and used this training set with the ERPIN program to scan a database of eukaryotic transcripts and genomic sequences. The training set was iteratively enriched with homologous SECIS structures collected during five successive rounds of database searches. The final collection of 117 aligned SECIS elements is the largest available to date and should be helpful in the assessment of base and base pair constraints for structural or functional studies. In addition, we can now use this enhanced collection to identify new selenoprotein gene candidates. Some high-scoring candidates are provided here, selected on purely statistical criteria and/or the presence of homologous SECIS sequences in different species.
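The iterative enrichment described above follows a simple search, validate, retrain loop, which might be sketched as follows. The functions `search` and `is_confirmed` are hypothetical placeholders standing in for a database scan with the current training set and for whatever validation step confirms a hit; the toy database and the Hamming-distance search are invented purely to make the sketch runnable.

```python
def enrich_training_set(initial, search, is_confirmed, rounds=5):
    """Repeat search -> validation -> retraining for at most `rounds` rounds."""
    training = list(initial)
    for _ in range(rounds):
        hits = search(training)
        new = [h for h in hits if h not in training and is_confirmed(h)]
        if not new:
            break  # nothing new was found, so further rounds would not help
        training.extend(new)
    return training


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


# Toy stand-ins (assumptions for illustration only).
DATABASE = ["AAAA", "AAAC", "AACC", "ACCC", "GGGG"]


def toy_search(training):
    """Return database entries within one mismatch of any training sequence."""
    return [s for s in DATABASE if any(hamming(s, t) <= 1 for t in training)]


result = enrich_training_set(["AAAA"], toy_search, lambda hit: True)
print(result)
```

In this toy run each round reaches sequences one step further from the starting point, which mirrors how an enriched training set can recover more divergent homologs than the initial alignment alone.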

Section snippets

The ERPIN program

The basic algorithm in ERPIN has been published elsewhere 〚10〛. We used version 2 of the program, which presents significant improvements in the handling of multi-helix motifs. ERPIN 1 used four basic elements for searches: helix, strand, hairpin and pairs of helices. Basic elements could be combined, but optimal matches were guaranteed only within each basic element. ERPIN 2.1 handles more complex elements by creating a set of “configurations” based on the gaps present in the training set. If

Confirmed SECIS

The initial training set contained representative sequences from all animal selenoprotein SECIS elements, except for the newly discovered SelM SECIS 〚16〛, which presents a significant deviation from other animal SECIS (CCC instead of AAA in the 5′ apical loop) that would greatly reduce search specificity. A specific training set would be more appropriate for the detection of this particular element. The first two rounds of iterative search, performed against the HGI databases, yielded 46 SECIS

Conclusion

We have introduced a computational screen for SECIS elements based on the ERPIN program, differing from previously published protocols by a new search algorithm and the introduction of a statistical evaluation of candidates. Potential SECIS are scored based on their resemblance to SECIS elements in a training set, and this score S is converted into an E-value expressing the number of hits of the same or higher score expected in a random database. The mean score for SECIS elements in the training set
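The score-to-E-value conversion mentioned in this snippet can be illustrated with a minimal sketch. The snippet does not state how the distribution of scores in random sequence is modeled, so the Gaussian null distribution used here is an assumption made for illustration, as are the function name and all parameters.

```python
import math


def e_value(score, mean_random, sd_random, n_positions):
    """Expected number of random hits scoring >= `score`.

    Assumes (for illustration) that scores at random database positions
    are normally distributed with the given mean and standard deviation.
    """
    z = (score - mean_random) / sd_random
    p_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(random score >= score)
    return n_positions * p_tail


# Hypothetical numbers: null scores centered at 0 with unit spread,
# evaluated over one million candidate positions.
for s in (2.0, 4.0, 6.0):
    print(s, e_value(s, mean_random=0.0, sd_random=1.0, n_positions=1_000_000))
```

Whatever the null model, the interpretation is the same: an E-value well below 1 means that a hit with this score is unlikely to arise by chance in a database of that size.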

Acknowledgements

We thank Dr. Alain Krol for critical reading of the manuscript.

References (17)

A. Lescure et al., Novel selenoproteins identified in silico and in vivo based on an RNA structural tag, J. Biol. Chem. (1999)
G.V. Kryukov et al., New mammalian selenocysteine-containing proteins identified with an algorithm that searches for selenocysteine insertion sequence elements, J. Biol. Chem. (1999)
F.J. Martin-Romero et al., Selenium metabolism in Drosophila: selenoproteins, selenoprotein mRNA expression, fertility, and mortality, J. Biol. Chem. (2001)
D. Gautheret et al., Direct RNA definition and identification from multiple sequence alignments using secondary structure profiles, J. Mol. Biol. (2001)
A. Krol, Evolutionarily different RNA motifs and RNA-protein complexes to achieve selenoprotein synthesis, Biochimie (2002)
R. Walczak et al., A novel RNA structural motif in the selenocysteine insertion element of eukaryotic selenoprotein mRNAs, RNA (1996)
R. Walczak et al., An essential non-Watson-Crick base pair motif in 3′ UTR to mediate selenoprotein translation, RNA (1998)
S. Castellano et al., In silico identification of novel selenoproteins in the Drosophila melanogaster genome, EMBO Rep. (2001)

There are more references available in the full text version of this article.

