  • Review Article

Tag-based approaches for transcriptome research and genome annotation

Abstract

With the increasing number of whole genome sequences available, genomic research has shifted toward the annotation of functional elements and transcribed regions. Thus, the related field of transcriptome research requires accurate methods for the profiling of genes that are not biased by known sequence information, and that also allow for the identification of promoter regions. Starting with serial analysis of gene expression (SAGE), methods making use of short sequencing tags have greatly contributed to transcriptome studies. Here we review recent developments in the use of short sequencing tags in expression profiling, gene discovery and genome annotation. These tags are obtained from the 5′ end of mRNAs, both terminal ends of mRNAs, or genomic regions. The 5′ end–specific tags, with their ability to identify transcripts along with their transcriptional start sites, will be of particular interest for gene network studies and may become one of the most important approaches in systems biology.
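To illustrate how a short tag can stand in for a whole transcript in expression profiling, the following minimal sketch extracts a classic SAGE-style tag and counts tag occurrences. The anchoring site CATG (the NlaIII recognition site) and the 10-base tag length are typical of standard SAGE protocols but are not stated in this abstract, so both are assumptions here. The function names and the toy sequences are hypothetical, and real pipelines also handle ditags, linkers and sequencing errors.

```python
from collections import Counter

# Assumptions (not taken from the abstract): the anchoring enzyme is NlaIII,
# which cuts at CATG, and the tag is the 10 bases directly 3' of that site.
ANCHOR_SITE = "CATG"
TAG_LENGTH = 10


def extract_sage_tag(transcript, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag adjacent to the 3'-most anchor site, or None.

    None is returned when the transcript has no anchor site or when fewer
    than tag_length bases follow the 3'-most site.
    """
    seq = transcript.upper()
    pos = seq.rfind(anchor)  # 3'-most occurrence of the anchor site
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def count_tags(transcripts):
    """Count how often each tag is observed across a set of transcripts."""
    counts = Counter()
    for seq in transcripts:
        tag = extract_sage_tag(seq)
        if tag is not None:
            counts[tag] += 1
    return counts


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not real mRNAs.
    library = [
        "GGGCATGAAACCCGGGTTTAAAA",
        "gggcatgaaacccgggtttaaaa",
        "CATGTTTTTTTTTTCATGACGTACGTACGG",
    ]
    for tag, n in count_tags(library).most_common():
        print(tag, n)
```

The tag counts serve as a digital estimate of transcript abundance. Mapping each tag back to a genome or transcript database, which is not shown here, is what links a count to a gene.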

Figure 1: Use of sequencing tags in transcript identification.
Figure 2: Standard approach for the preparation of SAGE tags.
Figure 3: Principle of a CAGE library preparation as an example for 5′-end tag cloning.
Figure 4: Preparation of gene identification signature (GIS) libraries.

Acknowledgements

We thank Y. Hayashizaki and T. Hayashi for their support and encouragement in the development and application of CAGE and GSC, and the entire RIKEN GSL and GSC as well as Dnaform teams, who helped to make the CAGE and GSC projects possible. In particular we are grateful to T. Shiraki, R. Kodzius, H. Nishiyori, M. Nakamura, Y. Kojima, H. Sato, T. Kawazu, K. Waki, S. Fukuda, S. Katayama and A. Hasegawa for their contribution to CAGE-related projects, and to Y. Shibata, S. Takaku and M. Suzuki for their contribution to GSC development. We also thank Y. Ruan for fruitful discussions and our collaboration within the FANTOM 3 project, and M. Dushay for critically reading the manuscript.

Author information

Corresponding authors

Correspondence to Matthias Harbers or Piero Carninci.

Ethics declarations

Competing interests

M.H. is an employee of the company K.K. Dnaform.

About this article

Cite this article

Harbers, M., Carninci, P. Tag-based approaches for transcriptome research and genome annotation. Nat Methods 2, 495–502 (2005). https://doi.org/10.1038/nmeth768

  • DOI: https://doi.org/10.1038/nmeth768
