Cross-species de novo identification of cis-regulatory modules with GibbsModule: Application to gene regulation in embryonic stem cells

Dan Xie (1), Jun Cai (1), Na-Yu Chia (2), Huck H. Ng (2,3), and Sheng Zhong (1,4,5,6,7)

  1. Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  2. Gene Regulation Laboratory, Genome Institute of Singapore, Singapore 138672
  3. Department of Biological Sciences, National University of Singapore, Singapore 117543
  4. Department of Statistics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  5. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  6. Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA

Abstract

We introduce the GibbsModule algorithm for de novo detection of cis-regulatory motifs and modules in eukaryotic genomes. GibbsModule models the coexpressed genes within one species as sharing a core cis-regulatory motif and each homologous gene group as sharing a homologous cis-regulatory module (CRM), characterized by a similar composition of motifs. Without relying on a predetermined alignment, GibbsModule iteratively updates the core motif shared by coexpressed genes and traces the homologous CRMs that contain the core motif. GibbsModule achieved substantial improvements in both precision and recall compared with peer algorithms on a number of synthetic and real data sets. Applying GibbsModule to the binding regions of the Krüppel-like factor (KLF) transcription factor in embryonic stem cells (ESCs), we discovered a motif that differs from a previously published KLF motif identified by a SELEX experiment but is consistent with mutagenesis analysis. The SOX2 motif was found to be a collaborating motif to the KLF motif in ESCs. We used quantitative chromatin immunoprecipitation (ChIP) analysis to test whether GibbsModule could distinguish functional from nonfunctional binding sites. All seven tested binding sites in GibbsModule-predicted CRMs had higher ChIP signals than the seven tested binding sites located outside predicted CRMs. GibbsModule is available at http://biocomp.bioen.uiuc.edu/GibbsModule.

Footnotes

  • 7 Corresponding author. E-mail szhong@uiuc.edu; fax (217) 265-0246.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.072769.107.

  • 8 Although we use the term “upstream sequence” to describe the method, in practice, the method should be applied to any regions that may contain cis-regulatory elements.

  • 9 Throughout this article we use 1, −0.33, −1, −0.33, and 10 as the match score, mismatch penalty, gap open penalty, gap extension penalty, and conservation threshold, respectively. With these values, a 10-bp transcription factor binding site (TFBS) must be perfectly conserved to reach the threshold, whereas a 14-bp TFBS may carry no more than three mutations.

  • 10 The mouse data were obtained with ChIP-PET technology, which is similar to ChIP-chip but, instead of hybridizing the ChIP sequences onto a microarray, uses sequencing to count the immunoprecipitated sequences. In this article, we use the term ChIP-chip to refer to both ChIP-chip and ChIP-PET data.

  • 11 SOX2 ChIP-PET sequences in murine ESCs (H.H. Ng, unpubl.).

    • Received October 31, 2007.
    • Accepted May 5, 2008.
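
The arithmetic behind footnote 9 can be checked with a short calculation. The sketch below is an illustration only, not the GibbsModule implementation: it assumes an ungapped alignment in which every mutation is a substitution, and it assumes the conservation threshold is inclusive (score >= 10). The function names are hypothetical.

```python
# Scoring parameters quoted in footnote 9.
MATCH = 1.0
MISMATCH = -0.33
GAP_OPEN = -1.0      # listed for completeness; not used in this ungapped check
GAP_EXTEND = -0.33   # listed for completeness; not used in this ungapped check
THRESHOLD = 10.0     # conservation threshold (assumed inclusive)


def ungapped_score(length, mismatches):
    """Score an ungapped alignment of `length` columns with `mismatches` substitutions."""
    if not 0 <= mismatches <= length:
        raise ValueError("mismatches must be between 0 and length")
    return (length - mismatches) * MATCH + mismatches * MISMATCH


def max_mismatches(length):
    """Largest substitution count that still reaches THRESHOLD, or -1 if none does."""
    best = -1
    for m in range(length + 1):
        if ungapped_score(length, m) >= THRESHOLD:
            best = m
    return best


if __name__ == "__main__":
    for site_length in (10, 14):
        print(site_length, max_mismatches(site_length))
```

Under these assumptions a 10-bp site tolerates no substitutions, while a 14-bp site with three substitutions scores 11 − 0.99 = 10.01 and a fourth substitution drops it to 8.68, which agrees with the footnote.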