Connecting Sequence and Biology in the Laboratory Mouse

  1. Richard M. Baldarelli1,7,
  2. David P. Hill1,
  3. Judith A. Blake1,6,
  4. Jun Adachi3,
  5. Masaaki Furuno3,
  6. Dirck Bradt1,
  7. Lori E. Corbani1,
  8. Sharon Cousins1,
  9. Kenneth S. Frazer1,4,
  10. Dong Qi1,
  11. Longlong Yang1,5,
  12. Sridhar Ramachandran1,
  13. Deborah Reed1,
  14. Yunxia Zhu1,
  15. Takeya Kasukawa3,
  16. Martin Ringwald1,6,
  17. Benjamin L. King1,
  18. Lois J. Maltais1,
  19. Louise M. McKenzie1,
  20. Lynn M. Schriml2,
  21. Donna Maglott2,
  22. Deanna M. Church2,
  23. Kim Pruitt2,
  24. Janan T. Eppig1,6,
  25. Joel E. Richardson1,6,
  26. Jim A. Kadin1,6, and
  27. Carol J. Bult1,6
  1. Mouse Genome Informatics Group, The Jackson Laboratory, Bar Harbor, Maine 04609, USA
  2. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
  3. Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
  4. Zebrafish Information Network (ZFIN), the Zebrafish International Resource Center, University of Oregon, Eugene, Oregon 97403, USA

Abstract

The Mouse Genome Sequencing Consortium and the RIKEN Genome Exploration Research group have generated large sets of sequence data representing the mouse genome and transcriptome, respectively. These data provide a valuable foundation for genomic research. The challenges for the informatics community are how to integrate these data with the ever-expanding knowledge about the roles of genes and gene products in biological processes, and how to provide useful views to the scientific community. Public resources, such as the National Center for Biotechnology Information (NCBI; http://www.ncbi.nih.gov), and model organism databases, such as the Mouse Genome Informatics database (MGI; http://www.informatics.jax.org), maintain the primary data and provide connections between sequence and biology. In this paper, we describe how the partnership of MGI and NCBI LocusLink contributes to the integration of sequence and biology, especially in the context of the large-scale genome and transcriptome data now available for the laboratory mouse. In particular, we describe the methods and results of integration of 60,770 FANTOM2 mouse cDNAs with gene records in the databases of MGI and LocusLink.

Footnotes

  • 8 The Mouse Genome Informatics group at The Jackson Laboratory is a consortium of multiple investigators who work cooperatively to provide a comprehensive information resource on the genetics, genomics, and biology of the laboratory mouse.

  • 9 Since that time, RefSeq has added two other accession types for mouse, NR_123456 for noncoding RNAs and NG_123456 for genomic segments (primarily for pseudogenes).

  • 10 We note that the total number of genes annotated by NCBI's computed annotation of the mouse assembly sequence (Build 2) is 36,976.

  • 11 In 228 cases, RefSeq or SWISS-PROT sequences that correspond to single curated genes in LocusLink and MGI were associated with more than one gene model from the first Ensembl annotation of the assembly. These may represent instances of paralogs, and are being targeted for manual curation.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.991003.

  • 7 Corresponding author. E-MAIL rmb@informatics.jax.org; FAX (207) 288-6132.

  • 6 Mouse Genome Informatics Consortium Principal Investigator

  • 5 Present address: Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24060.

    • Accepted April 11, 2003.
    • Received December 3, 2002.