  • Analysis

Discovery of regulatory elements in vertebrates through comparative genomics

Abstract

We have analyzed issues of reliability in studies in which comparative genomic approaches have been applied to the discovery of regulatory elements at a genome-wide level in vertebrates. We point out some potential problems with such studies, including difficulties in accurately identifying orthologous promoter regions. Many of these subtle analytical problems have become apparent only when studying the more complex vertebrate genomes. By determining motif reliability, we compared existing tools when applied to the discovery of vertebrate regulatory elements. We then used a statistical clustering method to produce a computational catalog of high-quality putative regulatory elements from vertebrates, some of which are widely conserved among vertebrates and many of which are novel regulatory elements. The results provide a glimpse into the wealth of information that comparative genomics can yield and suggest the need for further improvement of genome-wide comparative computational techniques.

Figure 1: Potential hazards in choosing orthologous promoter regions in vertebrates.
Figure 2: Performance of alignment tools.
Figure 3

Acknowledgements

We thank Mathieu Blanchette, Nan Li, Michal Linial, Larry Ruzzo, Saurabh Sinha, Zasha Weinberg, Zizhen Yao, the Ensembl Help Desk (in particular, Michael Schuster and Ewan Birney) and the anonymous reviewers for their contributions to this work. This material is based upon work supported in part by the National Science Foundation under grant DBI-0218798 and by the National Institutes of Health under grant R01 HG02602.

Author information

Corresponding author

Correspondence to Martin Tompa.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Prakash, A., Tompa, M. Discovery of regulatory elements in vertebrates through comparative genomics. Nat Biotechnol 23, 1249–1256 (2005). https://doi.org/10.1038/nbt1140

  • DOI: https://doi.org/10.1038/nbt1140
