Trends in Genetics
Volume 25, Issue 10, October 2009, Pages 434-440

Update
Genome Analysis
Different gene regulation strategies revealed by analysis of binding motifs

https://doi.org/10.1016/j.tig.2009.08.003

Coordinated regulation of gene expression relies on transcription factors (TFs) binding to specific DNA sites. Our large-scale information–theoretical analysis of >950 TF-binding motifs demonstrates that prokaryotes and eukaryotes use strikingly different strategies to target TFs to specific genome locations. Although bacterial TFs can recognize a specific DNA site in the genomic background, eukaryotic TFs exhibit widespread, nonfunctional binding and require clustering of sites to achieve specificity. We find support for this mechanism in a range of experimental studies and in our evolutionary analysis of DNA-binding domains. Our systematic characterization of binding motifs provides a quantitative assessment of the differences in transcription regulation in prokaryotes and eukaryotes.

Section snippets

DNA binding and gene regulation

Classical experiments have demonstrated that strong binding of a TF to its cognate site in a promoter is sufficient to alter gene expression [1]. Significant effort has been put into experimentally determining [2–6] and computationally inferring [7–10] motifs recognized by TFs, and determining the occupancy of promoters by TFs [11]. The motifs and binding locations of a TF have in turn been used to predict which genes it regulates and their expression levels [12]. Such studies rely

An information–theoretical approach to binding-motif recognition

To bind its cognate site, a TF has to recognize it among $\sim 10^6$ alternative sites in bacteria or $\sim 10^9$ sites in eukaryotes. Using information theory, we ask whether individual TFs possess enough information for such remarkably precise recognition. The application of information theory to protein–DNA recognition has a rich history [16–18] and provides a theoretical basis for current efforts to characterize motifs recognized by DNA-binding proteins using a range of in vivo and in vitro techniques
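The two quantities compared in this analysis can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code: it assumes that the required information is the base-2 logarithm of the number of alternative sites, and that a motif's information content is the relative entropy of its per-position base frequencies against a background distribution (uniform by default). The function names and the matrix layout (one `[A, C, G, T]` frequency list per position) are hypothetical.

```python
import math

def min_information_bits(n_sites):
    """Bits needed to single out one site among n_sites alternatives
    (assumed here to be log2 of the number of alternative sites)."""
    return math.log2(n_sites)

def motif_information_bits(pfm, background=None):
    """Information content of a motif, in bits.

    pfm is a list of columns, one per motif position, each holding the
    frequencies of A, C, G, T (summing to 1). The score is the sum over
    positions of the relative entropy against the background, which is
    assumed uniform unless given.
    """
    if background is None:
        background = [0.25, 0.25, 0.25, 0.25]
    total = 0.0
    for column in pfm:
        for p, q in zip(column, background):
            if p > 0:  # terms with p = 0 contribute nothing
                total += p * math.log2(p / q)
    return total

# Toy 3-position motif: one invariant base, one two-fold degenerate
# position, one uninformative position.
toy_motif = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
info = motif_information_bits(toy_motif)
needed_bacteria = min_information_bits(1e6)
needed_eukaryote = min_information_bits(1e9)
print(info, needed_bacteria, needed_eukaryote)
```

Under these assumptions, a motif can address a unique site only when its information content is at least the required minimum for the genome in question.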

Motifs of bacterial and eukaryotic TFs are markedly different

Using this metric, we find that the motifs of prokaryotic and eukaryotic TFs are strikingly different (Figure 2, Tables S5–6 in the online supplementary material). The average information content of a prokaryotic motif, $I \approx 23$ bits, is slightly above the required $I_{\min} = 22$ bits, demonstrating that a single cognate site is generally sufficient to address a TF to a specific location in prokaryotes, though there still might be an overlap between the background and some weak but functional sites (

Widespread nonfunctional binding in multicellular eukaryotes

The significant information deficiency in eukaryotes, which emerges because of their large genomes and the degeneracy of the motifs, has several biologically important consequences. Primarily, it suggests that numerous sites as strong as the cognate ones are expected to be present in eukaryotic genomes by chance. Using information theory and simulations, we estimate the lower bound of the number of such spurious sites or hits as $h \approx 2^{I_{\min} - I}$, with an average spacing $s \approx 2^{I}$ between them (Figure S1c
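The estimate of spurious hits can be made concrete with a short calculation. The sketch below assumes the relations read as h ≈ 2^(I_min − I) for the number of chance hits and s ≈ 2^I for their average spacing, with I_min taken as log2 of the number of candidate sites; the function names and the example numbers are illustrative, not values from the article.

```python
import math

def expected_spurious_hits(n_sites, info_bits):
    """Rough number of chance sites as strong as a cognate one:
    h ~ 2**(I_min - I), with I_min = log2(n_sites) (assumed)."""
    i_min = math.log2(n_sites)
    return 2.0 ** (i_min - info_bits)

def mean_hit_spacing(info_bits):
    """Rough average distance between chance hits, s ~ 2**I
    (in units of candidate positions)."""
    return 2.0 ** info_bits

# Illustrative case: 2**30 candidate sites and a 20-bit motif.
hits = expected_spurious_hits(2**30, 20.0)
spacing = mean_hit_spacing(20.0)
print(hits, spacing)
```

The two quantities are tied together: the number of hits times their spacing recovers the number of candidate sites, so a motif that is 10 bits short of the required minimum is expected to find roughly a thousand chance matches.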

Widespread nonfunctional binding is consistent with diverse experimental data

Evidence of this landscape has been found in several large-scale experiments. Our estimate of $\sim 10^3$ spurious hits in the chromatinized D. melanogaster genome is consistent with the $10^3$–$10^4$ experimentally observed binding events for several TFs [14]. Moreover, our results explain the large number of binding events detected by ChIP-chip [11] and ChIP-seq experiments [24], suggesting that the majority of these events reflect the widespread binding to sites that arise by chance and are likely to be

Clustering of cognate sites can provide specificity of eukaryotic TFs

Although high TF copy numbers are necessary to cope with spurious sites, they are not sufficient to provide specificity (i.e. to allow cellular machinery to distinguish regulatory binding sites from equally strong decoys). However, the presence of multiple sites in proximity to each other can specify a regulatory region. Many regulatory regions in eukaryotes contain multiple sites of the same or different TFs [7, 27–35], a property commonly used in bioinformatics to
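Why clustering helps can be illustrated with a back-of-the-envelope model. The sketch below is an assumption-laden toy, not the article's analysis: it treats chance hits as independent events occurring with probability 2^(−I) per position, approximates the number of hits in a window by a Poisson distribution, and asks how likely a window is to contain at least k hits by chance. The function name, the window size and the 12-bit motif are hypothetical choices for illustration.

```python
import math

def prob_at_least_k_hits(window_bp, info_bits, k):
    """Chance that a window of window_bp positions holds >= k spurious
    hits, assuming independent hits at rate 2**(-info_bits) per position
    (Poisson approximation)."""
    lam = window_bp * 2.0 ** (-info_bits)  # expected hits per window
    below_k = sum(
        math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k)
    )
    return 1.0 - below_k

# Illustrative numbers: a 12-bit motif scanned in 500-position windows.
for k in (1, 2, 3, 4):
    print(k, prob_at_least_k_hits(500, 12.0, k))
```

In this toy model each additional site required in the window lowers the chance probability sharply, which is the intuition behind using clusters of individually weak sites to mark a regulatory region.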

Eukaryotic and bacterial TFs use different repertoires of DNA-binding domains

Our study shows that combinatorial regulation is rooted in the way eukaryotic TFs recognize DNA, but how did this difference from prokaryotes arise? The gradual modifications of DNA-binding residues, expansion or contraction of the DNA-binding interface and/or re-invention of DNA-binding domains might have contributed to this difference. To investigate the possible evolutionary trajectory, we compare sequences of prokaryotic and eukaryotic DNA-binding domains of TFs available in the PFAM

Energy-based considerations of TF binding

As demonstrated in the seminal paper by Berg and von Hippel [16] and in later papers, this information–theoretical approach is closely related to the energy-based analysis of TF-binding motifs. The constraints on the information content of motifs considered here can be interpreted as constraints on the sequence-specific protein–DNA binding energy. Gerland et al. [26] and Lassig [39] have considered these constraints and demonstrated that the energy contribution of each consensus base pair to the

Concluding remarks

We asked whether individual TF-binding motifs possess enough information to find a cognate site in the genome. The promiscuity of eukaryotic TFs leads to widespread, likely nonfunctional, binding to decoy sites. If supported by direct experimental evidence, this conclusion will challenge our understanding of gene regulation, which was gained largely from experiments in bacterial systems and can be summarized as: one site – one TF – one binding event. In multicellular eukaryotes, this paradigm

Acknowledgments

We thank Daniel Fisher, Shamil Sunyaev, Mikhail Gelfand, Shaun Mahoney, Sharad Ramanathan and Alex Shpunt for insightful discussions and Michael Schnall for the interpretation of the information cutoff. ZW was supported by a Howard Hughes Medical Institute Predoctoral Fellowship. LM acknowledges the support of i2b2, the NIH-supported Center for Biomedical Computing at the Brigham and Women's Hospital.

References (44)

  • G. Badis. Diversity and complexity in DNA recognition by transcription factors. Science (New York, N.Y.) (2009)
  • A.M. Sengupta. Specificity and robustness in transcription control networks. Proc. Natl. Acad. Sci. U. S. A. (2002)
  • K.D. MacIsaac. An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics (2006)
  • J.B. Kinney. Precise physical models of protein–DNA interaction from high-throughput data. Proc. Natl. Acad. Sci. U. S. A. (2007)
  • C.T. Harbison. Transcriptional regulatory code of a eukaryotic genome. Nature (2004)
  • F. Gao. Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data. BMC Bioinformatics (2004)
  • Z. Hu. Genetic reconstruction of a functional transcriptional regulatory network. Nat. Genet. (2007)
  • X.Y. Li. Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm. PLoS Biol. (2008)
  • A. Gitter. Backup in gene regulatory networks explains differences between binding and knockout results. Mol. Syst. Biol. (2009)
  • O.G. Berg et al. Selection of DNA binding sites by regulatory proteins. Statistical–mechanical theory and application to operators and promoters. J. Mol. Biol. (1987)
  • T.M. Cover et al. Elements of Information Theory (1991)
  • D. Vlieghe. A new generation of JASPAR, the open-access repository for transcription factor binding site profiles. Nucleic Acids Res. (2006)