  • Technical Report
High-throughput SELEX–SAGE method for quantitative modeling of transcription-factor binding sites

Abstract

The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile [1] was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro–selected ligands using standard hidden Markov model training algorithms [2,3]. Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX) [4] and serial analysis of gene expression (SAGE) [5] protocols was coupled to an automated quality-controlled sequence extraction procedure based on Phred quality scores [6]. This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a high degree of accuracy not achieved earlier and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.
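The two computational ideas in the abstract, quality-controlled sequence extraction and a quantitative binding-site model, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it swaps the generalized profile and hidden Markov model training for a plain log-odds position weight matrix, and the quality threshold (`min_q=20`), the pseudocount, the uniform background, and the toy reads are all assumptions made for illustration. The only fixed fact used is the Phred definition Q = -10 log10(P_error).

```python
import math

def phred_error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def passes_quality(quals, min_q=20):
    """Keep a read only if every base call meets the threshold (assumed value)."""
    return all(q >= min_q for q in quals)

def build_log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Estimate a log2-odds weight matrix from aligned, equal-length sites.

    This is a simplified stand-in for a generalized profile; it assumes
    independent positions and a uniform background (both assumptions).
    """
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2(counts[base] / total / background) for base in "ACGT"}
        )
    return matrix

def score(matrix, seq):
    """Sum the per-position log-odds weights for a candidate site."""
    return sum(column[base] for column, base in zip(matrix, seq))

# Toy reads (invented): sequence plus per-base Phred quality values.
reads = [
    ("TTGGCA", [30, 32, 35, 31, 30, 28]),
    ("TTGGCT", [30, 30, 30, 30, 30, 30]),
    ("TTGGCA", [30, 12, 30, 30, 30, 30]),  # rejected: one base below min_q
    ("ATGGCA", [25, 25, 25, 25, 25, 25]),
]

accepted = [seq for seq, quals in reads if passes_quality(quals)]
pwm = build_log_odds_matrix(accepted)

print(accepted)
print(score(pwm, "TTGGCA") > score(pwm, "CCCCCC"))
```

A matrix of this kind treats positions as independent, which is exactly the assumption that the covariance analysis mentioned in the abstract calls into question; capturing such dependencies would require a richer model.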

Figure 1: CTF/NFI sequence-specific DNA–protein interaction profiles.
Figure 2: Use of a SELEX experiment with a SAGE-inspired multimerization step to construct a new CTF/NFI binding-site model.

References

  1. Bucher, P., Karplus, K., Moeri, N. & Hofmann, K. A flexible motif search technique based on generalized profiles. Comput. Chem. 20, 3–29 (1996).

  2. Durbin, R., Eddy, S., Krogh, A. & Mitchison, G. Biological Sequence Analysis. Probabilistic Models of Proteins and Nucleic Acids (Cambridge Univ. Press, Cambridge, United Kingdom, 1998).

  3. Ehret, G.B. et al. DNA binding specificity of different STAT proteins. Comparison of in vitro specificity with natural target sites. J. Biol. Chem. 276, 6675–6688 (2001).

  4. Klug, S.J. & Famulok, M. All you wanted to know about SELEX. Mol. Biol. Rep. 20, 97–107 (1994).

  5. Velculescu, V.E., Zhang, L., Vogelstein, B. & Kinzler, K.W. Serial analysis of gene expression. Science 270, 484–487 (1995).

  6. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).

  7. Roulet, E., Fisch, I., Junier, T., Bucher, P. & Mermod, N. Evaluation of computer tools for the prediction of transcription factor binding sites on genomic DNA. In Silico Biol. 1, 21–28 (1998).

  8. Roulet, E. et al. Experimental analysis and computer prediction of CTF/NF-I transcription factor DNA binding sites. J. Mol. Biol. 297, 833–848 (2000).

  9. Berg, O.G. & von Hippel, P.H. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 193, 723–750 (1987).

  10. Goodman, S.D., Velten, N.J., Gao, Q., Robinson, S. & Segall, A.M. In vitro selection of integration host factor binding sites. J. Bacteriol. 181, 3246–3255 (1999).

  11. Fields, D.S., He, Y.Y., Al-Uzri, A.Y. & Stormo, G.D. Quantitative specificity of the Mnt repressor. J. Mol. Biol. 271, 178–194 (1997).

  12. Vant-Hull, B., Payano-Baez, A., Davis, R.H. & Gold, L. The mathematics of SELEX against complex targets. J. Mol. Biol. 278, 579–597 (1998).

  13. Meisterernst, M., Gander, I., Rogge, L. & Winnacker, E.L. A quantitative analysis of nuclear factor I/DNA interactions. Nucleic Acids Res. 16, 4419–4435 (1988).

  14. Perier, R.C., Praz, V., Junier, T. & Bucher, P. The eukaryotic promoter database EPD. Nucleic Acids Res. 28, 302–303 (2000).

  15. Man, T.K. & Stormo, G.D. Non-independence of Mnt repressor-operator interactions determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res. 29, 2471–2478 (2001).

  16. Zhang, M.Q. & Marr, T.G. A weight array method for splicing signal analysis. Comput. Appl. Biosci. 9, 499–509 (1993).

  17. Burge, C.B. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).

  18. Hughey, R. & Krogh, A. Hidden Markov models for sequence analysis. Extension and analysis of the basic method. Comput. Appl. Biosci. 12, 95–107 (1996).

Acknowledgements

We thank Victor Jongeneel for support and suggestions, Roman Chrast and Stylianos Antonarakis for help with the SAGE procedure, Khalil Kadaoui for assistance, and Alan McNair for helpful comments on the manuscript. The financial support of the Ludwig Institute for Cancer Research, the Etat de Vaud, and the Swiss National Science Foundation (grants 31-63933.00 and 31-59370.99) is gratefully acknowledged.

Author information

Corresponding authors

Correspondence to Nicolas Mermod or Philipp Bucher.

About this article

Cite this article

Roulet, E., Busso, S., Camargo, A. et al. High-throughput SELEX–SAGE method for quantitative modeling of transcription-factor binding sites. Nat Biotechnol 20, 831–835 (2002). https://doi.org/10.1038/nbt718

  • DOI: https://doi.org/10.1038/nbt718
