The Drosophila Gene Collection: Identification of Putative Full-Length cDNAs for 70% of D. melanogaster Genes

  1. Mark Stapleton (1,2,6),
  2. Guochun Liao (1,3),
  3. Peter Brokstein (1,3),
  4. Ling Hong (1,3),
  5. Piero Carninci (4),
  6. Toshiyuki Shiraki (4),
  7. Yoshihide Hayashizaki (4),
  8. Mark Champe (1,2),
  9. Joanne Pacleb (1,2),
  10. Ken Wan (1,2),
  11. Charles Yu (1,2),
  12. Joe Carlson (1,2),
  13. Reed George (1,2),
  14. Susan Celniker (1,2), and
  15. Gerald M. Rubin (1,3,5)
(1) Berkeley Drosophila Genome Project, (2) Genome Sciences Department, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; (3) Department of Molecular and Cell Biology, University of California, Berkeley, California 94720-3200, USA; (4) Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; (5) Howard Hughes Medical Institute, University of California, Berkeley, California 94720, USA

Abstract

Collections of full-length nonredundant cDNA clones are critical reagents for functional genomics. The first step toward these resources is the generation and single-pass sequencing of cDNA libraries that contain a high proportion of full-length clones. The first release of the Drosophila Gene Collection (DGCr1) was produced from six libraries representing various tissues, developmental stages, and the cultured S2 cell line. Nearly 80,000 random 5′ expressed sequence tags (ESTs) from these libraries were collapsed into a nonredundant set of 5849 cDNAs, corresponding to ∼40% of the 13,474 predicted genes in Drosophila. To obtain cDNA clones representing the remaining genes, we have generated an additional 157,835 5′ ESTs from two previously existing and three new libraries. One new library is derived from adult testis, a tissue we had not previously exploited for gene discovery; the other two are cap-trapped normalized libraries derived from 0–22-h embryos and adult heads. Taking advantage of the annotated D. melanogaster genome sequence, we clustered the ESTs by aligning them to the genome. Clusters that overlapped genes not already represented by cDNA clones in DGCr1 were analyzed further, and putative full-length clones were selected for inclusion in the new DGC. This second release of the DGC (DGCr2) contains 5061 additional clones, extending the collection to 10,910 cDNAs representing >70% of the predicted genes in Drosophila.
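The abstract describes two computational steps: clustering ESTs by their alignments to the genome, then picking candidate clones from clusters that overlap genes not yet represented in DGCr1. The Python below is a minimal sketch of that logic, not the authors' pipeline. It assumes each EST has already been reduced to a single alignment interval (chromosome, strand, half-open start/end). The record types `EstHit` and `Gene`, the functions `cluster_ests` and `select_candidates`, and the rule "the EST reaching farthest 5′ is the best full-length candidate" are all hypothetical illustrations. The paper's further analysis of each cluster is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class EstHit:
    """One 5' EST reduced to a single genomic alignment (hypothetical layout)."""
    clone_id: str
    chrom: str
    strand: str  # '+' or '-'
    start: int   # 0-based, inclusive
    end: int     # exclusive


@dataclass(frozen=True)
class Gene:
    """A predicted gene span from the genome annotation (hypothetical layout)."""
    gene_id: str
    chrom: str
    strand: str
    start: int
    end: int


def cluster_ests(hits: List[EstHit]) -> List[List[EstHit]]:
    """Group ESTs whose alignments overlap on the same chromosome and strand."""
    clusters: List[List[EstHit]] = []
    current: List[EstHit] = []
    current_end = -1
    # Sorting by (chrom, strand, start) makes overlapping hits adjacent.
    for h in sorted(hits, key=lambda x: (x.chrom, x.strand, x.start, x.end)):
        same_locus = bool(current) and h.chrom == current[0].chrom and h.strand == current[0].strand
        if same_locus and h.start < current_end:  # half-open overlap test
            current.append(h)
            current_end = max(current_end, h.end)
        else:
            if current:
                clusters.append(current)
            current = [h]
            current_end = h.end
    if current:
        clusters.append(current)
    return clusters


def select_candidates(clusters: List[List[EstHit]], genes: List[Gene],
                      already_covered: Set[str]) -> Dict[str, str]:
    """Map each gene not already covered to one clone from an overlapping cluster."""
    picks: Dict[str, str] = {}
    for cluster in clusters:
        chrom, strand = cluster[0].chrom, cluster[0].strand
        c_start = min(h.start for h in cluster)
        c_end = max(h.end for h in cluster)
        for g in genes:
            if g.gene_id in already_covered or g.gene_id in picks:
                continue
            if g.chrom == chrom and g.strand == strand and g.start < c_end and c_start < g.end:
                # Assumed heuristic: the most 5' EST is the likeliest full-length clone.
                # On '+' that is the smallest start; on '-' it is the largest end.
                if strand == '+':
                    best = min(cluster, key=lambda h: (h.start, h.clone_id))
                else:
                    best = min(cluster, key=lambda h: (-h.end, h.clone_id))
                picks[g.gene_id] = best.clone_id
    return picks
```

For example, with ESTs `c1` (2L, +, 100 to 600), `c2` (2L, +, 50 to 400), `c3` (2L, +, 5000 to 5400), `c4` (3R, -, 200 to 700) and `c5` (3R, -, 300 to 900), `cluster_ests` yields three clusters. Given genes `gA` (2L, +, 40 to 1200), `gB` (2L, +, 4900 to 6000, already covered) and `gC` (3R, -, 150 to 1000), `select_candidates` returns `{"gA": "c2", "gC": "c5"}`. A real pipeline would also have to handle spliced alignments, multiple hits per EST, and clusters spanning several genes. This sketch deliberately ignores those cases.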

[The sequence data described in this paper have been submitted to the GenBank data library under accession nos. BF485518-BF503517, BF503521-BF506780, BG631888-BG631996, BG633696-BG637540, BG640063-BG641469, BI141709-BI142246, BI161485-BI173971, BI212109-BI216987, BI227448-BI233322, BI234009-BI243989, BI351612-BI354228, BI354231-BI355901, BI355935-BI358751, BI361285-BI376197, BI481532-BI487261, BI563331-BI593695, BI604243-BI620155, BI620158-BI635012, BI635064-BI638027, and BI638030-BI642053. The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: J. Pringle and M. Fuller.]

Footnotes

  • 6 Corresponding author.

  • E-MAIL staple{at}fruitfly.org; FAX (510) 486-6798.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.269102.

    • Received March 13, 2002.
    • Accepted June 12, 2002.