Whole Genome Sequence Comparisons and “Full-Length” cDNA Sequences: A Combined Approach to Evaluate and Improve Arabidopsis Genome Annotation

  1. Vanina Castelli¹,
  2. Jean-Marc Aury¹,
  3. Olivier Jaillon¹,
  4. Patrick Wincker¹,
  5. Christian Clepet²,
  6. Manuella Menard¹,
  7. Corinne Cruaud¹,
  8. Francis Quétier¹,
  9. Claude Scarpelli¹,
  10. Vincent Schächter¹,
  11. Gary Temple³,⁴,
  12. Michel Caboche²,
  13. Jean Weissenbach¹, and
  14. Marcel Salanoubat¹,⁵

  ¹ Genoscope-Centre National de Séquençage and Centre National de la Recherche Scientifique Unité Mixte de Recherche-3080, 91000 Evry, France
  ² Institut National de la Recherche Agronomique, Unité de Recherche en Génomique Végétale, 91000 Evry, France
  ³ Life Technologies, a Division of Invitrogen, Carlsbad, California 92008 USA

Abstract

To evaluate the existing annotation of the Arabidopsis genome further, we generated a collection of evolutionarily conserved regions (ecores) between Arabidopsis and rice. The ecore analysis provides evidence that the gene catalog of Arabidopsis is not yet complete, and that a number of the existing annotations require re-examination. To improve the Arabidopsis genome annotation further, we used a novel “full-length” enriched cDNA collection prepared from several tissues. An additional 1931 genes were covered by new “full-length” cDNA sequences, raising the number of annotated genes with a corresponding “full-length” cDNA sequence to about 14,000. Detailed comparisons between these “full-length” cDNA sequences and annotated genes show that this resource is very helpful in determining the correct structure of genes, in particular those not yet supported by “full-length” cDNAs. In addition, this cDNA resource detected a total of 326 genomic regions not previously included in the Arabidopsis genome annotation, providing clues for new gene discovery. Because the two data sets overlap only partially, as expected, their combination produces very useful information for improving the Arabidopsis genome annotation.

Footnotes

  • [Supplemental material is available online at www.genome.org. The cDNA sequences have been released to the EMBL. The data produced during this analysis and accession nos. are available at http://www.genoscope.cns.fr/Arabidopsis/. The GSLT cDNA clones are available at Genoscope. The results can be visualized at http://www.genoscope.cns.fr/cgi-bin/ggb/ggb?source=Arabidopsis/.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1515604.

  • ⁴ Present address: Intronn, Inc., Gaithersburg, MD 20878, USA.

  • ⁵ Corresponding author. E-MAIL salanou{at}genoscope.cns.fr; FAX 33-01-60-87-25-14.

    • Accepted December 27, 2003.
    • Received May 7, 2003.