Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency

PETER CLOTE,¹ FABRIZIO FERRÉ,¹ EVANGELOS KRANAKIS,² and DANNY KRIZANC³

¹Department of Biology, Boston College, Chestnut Hill, Massachusetts 02467, USA
²School of Computer Science, Carleton University, Ottawa, Ontario, K1S 5B6, Canada
³Department of Mathematics and Computer Science, Wesleyan University, Middletown, Connecticut 06459, USA

Abstract

We present results of computational experiments indicating that several classes of RNA whose native state (minimum free energy secondary structure) is functionally important (type III hammerhead ribozymes, signal recognition particle RNAs, U2 small nuclear spliceosomal RNAs, certain riboswitches, etc.) have lower folding energy than random RNAs of the same length and dinucleotide frequency. In contrast, we find that whole mRNAs, as well as their 5′-UTR, 3′-UTR, and coding (cds) regions, have folding energies comparable to those of random RNA, although there may be a statistically insignificant trace signal in the 3′-UTR and cds regions. Various authors have used (approximate) nucleotide pattern matching and minimum free energy computation as filters to detect potential RNAs in ESTs and genomes. We introduce the concept of an asymptotic Z-score and describe a fast whole-genome scanning algorithm that computes asymptotic minimum free energy Z-scores of moving-window contents. Asymptotic Z-scores provide an additional filter, to be used along with nucleotide pattern matching and minimum free energy computation, for detecting potential functional RNAs in ESTs and genomic regions.
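The comparison in the abstract, a native RNA's folding energy against random RNAs of the same length and dinucleotide frequency, can be sketched in Python under stated assumptions. The shuffle below follows the Eulerian-walk construction of Altschul and Erickson: letters are vertices, each adjacent pair is a directed edge, and a random Eulerian path through that multigraph yields a sequence with exactly the same dinucleotide counts and the same first and last nucleotide. It is not claimed to match the paper's Algorithm 4 in every detail. The Z-score is (x − μ)/σ against the shuffled background. The `energy` argument is a hypothetical caller-supplied placeholder (in practice it would wrap a minimum free energy folding program); the function names are illustrative only.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, stdev
from typing import Callable, Dict, List, Optional


def _tree_to_root(last_edge: Dict[str, str], root: str) -> bool:
    """True if following each vertex's designated last edge always reaches `root`,
    i.e. the last edges form a tree directed toward root with no cycles."""
    for start in last_edge:
        seen = set()
        u = start
        while u != root:
            if u in seen:
                return False  # cycle among last edges: reject this choice
            seen.add(u)
            u = last_edge[u]
    return True


def dinucleotide_shuffle(seq: str, rng: Optional[random.Random] = None) -> str:
    """Random sequence with the same dinucleotide counts, first and last letter as seq."""
    if rng is None:
        rng = random.Random()
    if len(seq) < 3:
        return seq  # nothing to shuffle
    succ: Dict[str, List[str]] = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        succ[a].append(b)  # one entry per dinucleotide occurrence
    first, root = seq[0], seq[-1]

    # 1. For each vertex except the final letter, pick the edge it will leave by last;
    #    retry until these edges form a tree directed toward the final letter.
    while True:
        last_edge = {u: rng.choice(vs) for u, vs in succ.items() if u != root}
        if _tree_to_root(last_edge, root):
            break

    # 2. Randomly order each vertex's remaining edges; the chosen last edge goes at the end.
    order: Dict[str, List[str]] = {}
    for u, vs in succ.items():
        rest = list(vs)
        if u != root:
            rest.remove(last_edge[u])
        rng.shuffle(rest)
        if u != root:
            rest.append(last_edge[u])
        order[u] = rest

    # 3. Walk from the first letter, always taking the next unused outgoing edge.
    used: Counter = Counter()
    out, u = [first], first
    for _ in range(len(seq) - 1):
        v = order[u][used[u]]
        used[u] += 1
        out.append(v)
        u = v
    return "".join(out)


def z_score(x: float, samples: List[float]) -> float:
    """(x - mu) / sigma, with mu and sigma estimated from the samples (needs >= 2)."""
    return (x - mean(samples)) / stdev(samples)


def shuffle_z_score(seq: str, energy: Callable[[str], float],
                    n_shuffles: int = 100, seed: int = 0) -> float:
    """Z-score of energy(seq) against dinucleotide shuffles of seq.

    `energy` is a hypothetical caller-supplied MFE function; none is provided here.
    """
    rng = random.Random(seed)
    background = [energy(dinucleotide_shuffle(seq, rng)) for _ in range(n_shuffles)]
    return z_score(energy(seq), background)
```

One design consequence is worth noting: any toy energy that depends only on dinucleotide counts gives the same value for every shuffle, so σ = 0 and `z_score` raises `ZeroDivisionError`. The background only carries information when `energy` is a genuine folding energy.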

Footnotes

  • 4 Zuker’s algorithm was first implemented in Zuker’s mfold, subsequently in Hofacker et al.’s Vienna RNA Package RNAfold, and most recently in Mathews and Turner’s RNAstructure.

  • 5 The Z-score of x (with respect to a histogram or probability distribution) is the number of standard deviation units to the left or right of the mean at which x lies, that is, (x − μ)/σ, where μ and σ are the mean and standard deviation of the distribution.

  • 6 Figures 12 and 13 of Rivas and Eddy (2000) are similar to some of the graphs presented in this paper; however, unlike our work, Rivas and Eddy (2000) use mononucleotide shuffles to produce random sequences. As previously observed by Workman and Krogh (1999) when computing Z-scores for minimum free energies of RNA, it is important to generate random sequences that preserve the dinucleotide frequency of the given RNA. Our work presents a careful analysis of a large class of RNAs using the dinucleotide shuffling Algorithm 4.

  • 7 SECIS abbreviates “selenocysteine insertion sequence,” a small (30–45 nt) element of the 3′-UTR that forms a stem–loop structure required for the UGA stop codon to be read as a selenocysteine codon, allowing selenocysteine incorporation.

  • 8 After completion of this paper, we learned of the more general Web server Shufflet (Coward 1999).

  • 9 The work of Workman and Krogh (1999) focuses on mRNA, and only at the end of their article do they consider a small collection of five tRNAs, where 100 random RNAs are generated per tRNA.

  • 10 Bonnet et al. (2004) compute p-values of minimum free energies, not p-values of Z-scores as is done in this paper.

  • 11 The average Z-score is 0, whereas the average asymptotic Z-score is positive, which gives a sharper contrast with the negative scores of functional RNAs in computational experiments.

  • 12 The Z-score is a standard statistical measure of deviation from the mean in units of standard deviation; see Materials and Methods for the formal definition.

  • 13 By structural RNA, we mean naturally occurring classes of RNA whose functionality depends on the native state, where we identify the native state with the minimum free energy secondary structure if the structure is not experimentally determined.

  • 14 For a genome of length N, applying Zuker’s algorithm successively to window contents of size L requires time $O(NL^3)$. By reusing partial computations from previous window contents, Hofacker et al. (2004) describe an improvement to $O(NL^2)$.

  • 15 In this paper, we present a proof of concept. In work in progress, we are computing dinucleotide frequencies, to two decimal places, of viral and bacterial genomes, together with the tables necessary for a general application of our method; these will be reported elsewhere.

  • 16 In order to obtain this last inequality, we needed $Z_{s,t} \geq 0$. This is the reason for working with $Z_{s,t}$ rather than $X_{s,t}$.

  • Article and publication are at http://www.rnajournal.org/cgi/doi/10.1261/rna.7220505.

    • Accepted February 12, 2005.
    • Received October 28, 2004.
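Footnote 14 notes that refolding every window from scratch costs $O(NL^3)$ overall for a genome of length N and window length L. A minimal Python sketch of the moving-window scan itself is given here, assuming a caller-supplied `score` function. That callable is a hypothetical placeholder: for the method described in the abstract it would return an (asymptotic) minimum free energy Z-score of the window contents, but any function of a string works. The helper `naive_cost` only evaluates (N − L + 1)·L³ to make the scale concrete; it is an order-of-magnitude estimate, not a measured running time.

```python
from typing import Callable, Iterator, Tuple


def window_starts(n: int, window: int, step: int = 1) -> range:
    """Start positions of every full window of length `window` in a sequence of length n."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return range(0, max(n - window + 1, 0), step)


def scan(genome: str, window: int, step: int,
         score: Callable[[str], float]) -> Iterator[Tuple[int, float]]:
    """Yield (start, score(contents)) for each moving window.

    `score` is a caller-supplied placeholder; in a real scan it would be an
    MFE-based Z-score of the window contents.
    """
    for start in window_starts(len(genome), window, step):
        yield start, score(genome[start:start + window])


def naive_cost(n: int, window: int) -> int:
    """Step count, up to constant factors, for refolding every window (step 1)
    from scratch with an O(L^3) folding algorithm: (N - L + 1) * L^3."""
    return max(n - window + 1, 0) * window ** 3


if __name__ == "__main__":
    # Toy score for demonstration only: GC fraction, NOT a folding energy.
    def gc_fraction(s: str) -> float:
        return sum(c in "GC" for c in s) / len(s)

    for start, value in scan("ACGUACGUGGCC", window=4, step=4, score=gc_fraction):
        print(start, value)
    print(naive_cost(10**6, 100))
```

For example, with N = 10⁶ and L = 100, `naive_cost` gives 999,901,000,000, roughly 10¹² elementary steps, whereas an $O(NL^2)$ scheme such as the one attributed to Hofacker et al. (2004) would be on the order of 10¹⁰, a factor of L smaller.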