Massively parallel sequencing of the polyadenylated transcriptome of C. elegans

  1. LaDeana W. Hillier (1)
  2. Valerie Reinke (2)
  3. Philip Green (1, 3)
  4. Martin Hirst (4)
  5. Marco A. Marra (4)
  6. Robert H. Waterston (1, 5)
  1. Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195-5065, USA
  2. Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06520-8005, USA
  3. Howard Hughes Medical Institute, Seattle, Washington 98195, USA
  4. British Columbia Cancer Agency (BCCA), Genome Sciences Centre, Vancouver, British Columbia V5Z 4S6, Canada

    Abstract

    Using massively parallel sequencing by synthesis methods, we have surveyed the polyA+ transcripts from four stages of the nematode Caenorhabditis elegans to an unprecedented depth. Using novel statistical approaches, we evaluated the coverage of annotated features of the genome and of candidate processed transcripts, including splice junctions, trans-spliced leader sequences, and polyadenylation tracts. The data provide experimental support for >85% of the annotated protein-coding transcripts in WormBase (WS170) and confirm additional details of processing. For example, the total number of confirmed splice junctions was raised from 70,911 to over 98,000. The data also suggest thousands of modifications to WormBase annotations and identify new spliced junctions and genes not part of any WormBase annotation, including at least 80 putative genes not found in any of three predicted gene sets. The quantitative nature of the data also suggests that mRNA levels may be measured by this approach with unparalleled precision. Although most sequences align with protein-coding genes, a small fraction falls in introns and intergenic regions. One notable region on the X chromosome encodes a noncoding transcript of >10 kb localized to somatic nuclei.

    Footnotes

    • 5 Corresponding author.

      E-mail waterston{at}gs.washington.edu; fax (206) 685-7301.

    • [Supplemental material is available online at www.genome.org. These short-read sequence data have been submitted to the NCBI Short Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi) under accession no. SRA003622.7. Alignments, confirmed sequence features, and relevant data have been submitted to the modENCODE Data Coordinating Center/WormBase.]

    • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.088112.108.

      • Received October 21, 2008.
      • Accepted January 16, 2009.
