  • Letter

Gene Index analysis of the human genome estimates approximately 120,000 genes

A Correction to this article was published on 01 December 2000

Abstract

Although sequencing of the human genome will soon be completed, gene identification and annotation remain a challenge. Early estimates suggested that there might be 60,000–100,000 human genes (ref. 1), but recent analyses of the available data from EST sequencing projects have estimated as few as 45,000 (ref. 2) or as many as 140,000 (ref. 3) distinct genes. The Chromosome 22 Sequencing Consortium estimated a minimum of 45,000 genes based on their annotation of the complete chromosome, although their data suggest there may be additional genes (ref. 4). The nearly 2,000,000 human ESTs in dbEST provide an important resource for gene identification and genome annotation, but these single-pass sequences must be carefully analysed to remove contaminating sequences, including those from genomic DNA, spurious transcription, and vector and bacterial sequences. We have developed a highly refined and rigorously tested protocol for cleaning, clustering and assembling EST sequences to produce high-fidelity consensus sequences for the represented genes (F.L. et al., manuscript submitted) and used this to create the TIGR Gene Indices (ref. 5), databases of expressed genes for human, mouse, rat and other species (http://www.tigr.org/tdb/tgi.html). Using these algorithms for EST analysis, we have arrived at two independent estimates indicating that the human genome contains approximately 120,000 genes.
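The clustering step mentioned above can be illustrated with a toy sketch. This is not the TIGR protocol, whose details are not given here; it only shows the general idea of grouping ESTs that appear to come from the same transcript. The function names, the shared k-mer criterion and the thresholds `k` and `min_shared` are all assumptions made for illustration, and a real pipeline would compare cleaned sequences by alignment and then assemble each cluster into a consensus.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=8, min_shared=3):
    """Toy single-linkage clustering of ESTs (hypothetical criterion).

    ests maps a sequence name to its sequence. Two ESTs are linked when
    they share at least min_shared distinct k-mers; clusters are the
    connected components of those links.
    """
    parent = {name: name for name in ests}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Index each k-mer to the ESTs that contain it.
    index = defaultdict(set)
    for name, seq in ests.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)

    # Count distinct shared k-mers for every pair that shares any.
    shared = defaultdict(int)
    for names in index.values():
        ordered = sorted(names)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                shared[(a, b)] += 1

    # Merge pairs that pass the threshold.
    for (a, b), count in shared.items():
        if count >= min_shared:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for name in ests:
        clusters[find(name)].append(name)
    return sorted(sorted(members) for members in clusters.values())
```

Under these assumptions, two reads that overlap by ten bases fall into one cluster with the default settings, while an unrelated read stays in a cluster of its own. Counting such clusters gives a crude gene tally only if contaminating sequences have been removed first, which is the reason the cleaning step matters.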

References

  1. Fields, C., Adams, M.D., White, O. & Venter, J.C. How many genes in the human genome? Nature Genet. 7, 345–346 (1994).

  2. Green, P. Interpreting the genome. Presentation at Bridging the Gap Between Sequence and Function, Cold Spring Harbor Laboratory, NY, September, 1999.

  3. Scott, R. The future in understanding the molecular basis of life. Presentation at Eleventh International Genome Sequencing and Analysis Conference, Miami, 1999.

  4. Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).

  5. Quackenbush, J., Liang, F., Holt, I., Pertea, G. & Upton, J. The TIGR Gene Indices: reconstruction and representation of expressed gene sequences. Nucleic Acids Res. 28, 141–145 (2000).

  6. Huang, X., Adams, M.D., Zhou, H. & Kerlavage, A.R. A tool for analyzing and annotating genomic sequence. Genomics 46, 37–45 (1997).

  7. Huang, X. & Madan, A. CAP3: a DNA sequence assembly program. Genome Res. 9, 868–877 (1999).

  8. Deloukas, P. et al. A physical map of 30,000 human genes. Science 282, 744–746 (1998).

  9. Schuler, G.D. Sequence mapping by electronic PCR. Genome Res. 7, 541–550 (1997).


Acknowledgements

We thank the remaining members of the TIGR Gene Index Team, T. Hansen and J. Upton; A. Glodek for database development efforts; M. Heaney and S. Lo for database support; V. Sapiro, B. Lee, S. Gregory, R. Karamchedu, C. Irwin, L. Fu and E. Arnold for computer system support; and C. Ronning, R. Buell, J. White, and C.M. Fraser for thoughtful comments and suggestions. This work was supported by a grant from the U.S. Department of Energy.

Author information

Correspondence to John Quackenbush.

Cite this article

Liang, F., Holt, I., Pertea, G. et al. Gene Index analysis of the human genome estimates approximately 120,000 genes. Nat Genet 25, 239–240 (2000). https://doi.org/10.1038/76126
