Impact of Alternative Initiation, Splicing, and Termination on the Diversity of the mRNA Transcripts Encoded by the Mouse Transcriptome

  1. Mihaela Zavolan [1,7],
  2. Shinji Kondo [2],
  3. Christian Schönbach [3],
  4. Jun Adachi [2],
  5. David A. Hume [4],
  6. RIKEN GER Group [2],
  7. GSL Members [5,6],
  8. Yoshihide Hayashizaki [2,5], and
  9. Terry Gaasterland [1]

  [1] Laboratory of Computational Genomics, The Rockefeller University, New York, New York 10021-6399, USA
  [2] Laboratory for Genome Exploration Research Group, Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  [3] Biomedical Knowledge Discovery Team, Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  [4] ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane Q4072, Australia
  [5] Genome Science Laboratory, RIKEN, Hirosawa, Wako, Saitama 351-0198, Japan

Abstract

We analyzed the FANTOM2 clone set of 60,770 RIKEN full-length mouse cDNA sequences and 44,122 public mRNA sequences. We developed a new computational procedure to identify and classify the forms of splice variation evident in this data set and organized the results into a publicly accessible database that can be used for future expression array construction, structural genomics, and analyses of the mechanism and regulation of alternative splicing. Statistical analysis shows that at least 41% and possibly as much as 60% of multiexon genes in mouse have multiple splice forms. Of the transcription units with multiple splice forms, 49% contain transcripts in which the apparent use of an alternative transcription start (or stop) site is accompanied by alternative splicing of the initial (or terminal) exon. This implies that alternative transcription may frequently induce alternative splicing. The fact that 73% of all exons with splice variation fall within the annotated coding region indicates that most splice variation is likely to affect the protein form. Finally, we compared the set of constitutive exons (present in all transcripts) with the set of cryptic exons (present only in some transcripts) and found statistically significant differences in their length distributions, the nucleotide distributions around their splice junctions, and the frequencies of occurrence of several short sequence motifs.
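The distinction between constitutive and cryptic exons can be illustrated with a minimal sketch. It is not the authors' procedure, which is not described in this abstract. It assumes that each transcript of a transcription unit is given as a list of exon coordinates and that two exons are the same only if their coordinates match exactly; the function name `classify_exons` and the toy coordinates are hypothetical.

```python
from collections import Counter


def classify_exons(transcripts):
    """Split the exons of one transcription unit into constitutive and cryptic sets.

    transcripts: dict mapping a transcript name to a list of (start, end)
    exon coordinates. Exact coordinate matching is a simplifying assumption.
    """
    counts = Counter()
    for exons in transcripts.values():
        # Count each exon at most once per transcript.
        counts.update(set(exons))

    n_transcripts = len(transcripts)
    # Constitutive: present in all transcripts.
    constitutive = {exon for exon, c in counts.items() if c == n_transcripts}
    # Cryptic: present only in some transcripts.
    cryptic = {exon for exon, c in counts.items() if c < n_transcripts}
    return constitutive, cryptic


# Hypothetical toy transcription unit with two transcripts;
# the middle exon is skipped in the second one.
toy_unit = {
    "transcript_a": [(100, 200), (300, 400), (500, 600)],
    "transcript_b": [(100, 200), (500, 600)],
}
constitutive, cryptic = classify_exons(toy_unit)
```

In this toy case the first and last exons are classified as constitutive and the skipped middle exon as cryptic. Real data would also need to handle exons that overlap but differ at one splice junction, which exact matching treats as unrelated.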

Footnotes

  • [Supplemental material is available online at www.genome.org.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1017303.

  • 7 Corresponding author. E-MAIL mihaela{at}genomes.rockefeller.edu; FAX (212) 327-7765.

  • 6 Takahiro Arakawa, Piero Carninci, and Jun Kawai.

    • Accepted February 25, 2003.
    • Received November 19, 2002.