Assessing computational tools for the discovery of transcription factor binding sites

Tompa, Martin; Li, Nan; Bailey, Timothy L; Church, George M; De Moor, Bart; Eskin, Eleazar; Favorov, Alexander V; Frith, Martin C; Fu, Yutao; Kent, W James; Makeev, Vsevolod J; Mironov, Andrei A; Noble, William Stafford; Pavesi, Giulio; Pesole, Graziano; Régnier, Mireille; Simonis, Nicolas; Sinha, Saurabh; Thijs, Gert; van Helden, Jacques; Vandenbogaert, Mathias; Weng, Zhiping; Workman, Christopher; Ye, Chun; Zhu, Zhou

doi:10.1038/nbt1053

Analysis
Published: 01 January 2005

Martin Tompa¹,²,
Nan Li¹,
Timothy L Bailey³,
George M Church⁴,
Bart De Moor⁵,
Eleazar Eskin⁶,
Alexander V Favorov⁷,⁸,
Martin C Frith⁹,
Yutao Fu⁹,
W James Kent¹⁰,
Vsevolod J Makeev⁷,⁸,
Andrei A Mironov⁷,¹¹,
William Stafford Noble¹,²,
Giulio Pavesi¹²,
Graziano Pesole¹³,
Mireille Régnier¹⁴,
Nicolas Simonis¹⁵,
Saurabh Sinha¹⁶,
Gert Thijs⁵,
Jacques van Helden¹⁵,
Mathias Vandenbogaert¹⁴,
Zhiping Weng⁹,
Christopher Workman¹⁷,
Chun Ye¹⁸ &
Zhou Zhu⁴

Nature Biotechnology volume 23, pages 137–144 (2005)

Abstract

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.

**Figure 1: Representative statistics comparing the accuracy of the 13 tools assessed in this analysis.**

Acknowledgements

We thank Mathieu Blanchette, Ari Frank, Phil Green, Susan Hewitt, S.N. Maheshwari, Larry Ruzzo, Terry Speed, Gary Stormo and the organizers and participants of the 2002 Bellairs Workshop on Computational Biology for their important contributions to this project. Martin Tompa and Nan Li were supported by National Science Foundation (NSF) grant DBI-0218798 and by National Institutes of Health (NIH) grant R01 HG02602. Alexander Favorov, Andrei Mironov and Vsevolod Makeev were supported by Howard Hughes Medical Institute grant 55000309, Ludwig Cancer Research Institute grant CRDF RBO-1268-MO-02, Russian Fund of Basic Research grant 04-07-90270 and support from the Russian Academy of Sciences Presidium Program in Molecular and Cellular Biology, project no. 10. Yutao Fu, Martin C. Frith and Zhiping Weng were supported by NSF grant DBI-0116574 and NIH NHGRI grant 1R01HG03110. Giulio Pavesi and Graziano Pesole were supported by the Italian Ministry of University and Scientific Research's Fondo Italiano per la Ricerca di Base project 'Bioinformatica per la Genomica e la Proteomica' and by Telethon. Nicolas Simonis and Jacques van Helden were supported by the European Communities grant QLRI-199-01333, by the Action de Recherches Concertées de la Communauté Française de Belgique and by the Government of the Brussels Region. Saurabh Sinha was supported by a Keck Foundation Fellowship. Gert Thijs and Bart De Moor were supported by Geconcerteerde Onderzoeks-Acties Mefisto-666 and Ambiorics, InterUniversity Attraction Pole V-22, and several funded projects of the Institut voor de aanmoediging van Innovatie door Wetenschap en Technologie in Vlaanderen, the Fonds voor Wetenschappelijk Onderzoek and the European Union. Zhou Zhu is a Howard Hughes Medical Institute predoctoral fellow. Zhou Zhu and George Church were supported by the Department of Energy and the Lipper Foundation.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, 98195-2350, Washington, USA
Martin Tompa, Nan Li & William Stafford Noble
Department of Genome Sciences, Box 357730, University of Washington, Seattle, 98195-7730, Washington, USA
Martin Tompa & William Stafford Noble
Institute for Molecular Biosciences, University of Queensland, Brisbane, Australia
Timothy L Bailey
Department of Genetics and Lipper Center for Computational Genetics, Harvard Medical School, Boston, 02115, Massachusetts, USA
George M Church & Zhou Zhu
ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Leuven, B-3001, Belgium
Bart De Moor & Gert Thijs
Department of Computer Science and Engineering, University of California, San Diego, 92093, California, USA
Eleazar Eskin
State Scientific Centre 'GosNIIGenetica,' 1st Dorozhny pr. 1, Moscow, 117545, Russia
Alexander V Favorov, Vsevolod J Makeev & Andrei A Mironov
Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilova 32, Moscow, 119991, Russia
Alexander V Favorov & Vsevolod J Makeev
Bioinformatics Program, Boston University, Boston, 02215, Massachusetts, USA
Martin C Frith, Yutao Fu & Zhiping Weng
Center for Biomolecular Science and Engineering, University of California, Santa Cruz, 95064, California, USA
W James Kent
Department of Bioengineering and Bioinformatics, Moscow State University, Lab. Bldg B, Vorobiovy Gory 1-33, Moscow, 119992, Russia
Andrei A Mironov
Department of Computer Science and Communication (D.I.Co), University of Milan, Milan, Italy
Giulio Pavesi
Department of Biomolecular Science and Biotechnology, University of Milan, Milan, Italy
Graziano Pesole
INRIA Rocquencourt, Domaine de Voluceau B.P. 105, Le Chesnay, 78153, France
Mireille Régnier & Mathias Vandenbogaert
SCMB-Université Libre de Bruxelles, Campus Plaine, CP 263, Boulevard du Triomphe, Bruxelles, 1050, Belgium
Nicolas Simonis & Jacques van Helden
Center for Studies in Physics and Biology, The Rockefeller University, New York, 10021, New York, USA
Saurabh Sinha
Department of Bioengineering, University of California, San Diego, 92093, California, USA
Christopher Workman
Bioinformatics Program, University of California, San Diego, 92093, California, USA
Chun Ye

Corresponding author

Correspondence to Martin Tompa.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Tompa, M., Li, N., Bailey, T. et al. Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 23, 137–144 (2005). https://doi.org/10.1038/nbt1053

Published: 01 January 2005
Issue Date: 01 January 2005
DOI: https://doi.org/10.1038/nbt1053
