A computational genomics approach to identify cis-regulatory modules from chromatin immunoprecipitation microarray data—A case study using E2F1

Victor X. Jin¹, Alina Rabinovich¹, Sharon L. Squazzo¹, Roland Green², and Peggy J. Farnham¹,³

¹ Department of Pharmacology and the Genome Center, University of California–Davis, Davis, California 95616, USA;
² NimbleGen Systems Inc., Madison, Wisconsin 53711, USA

Abstract

Advances in high-throughput technologies, such as ChIP–chip, and the completion of human and mouse genomic sequences now allow analysis of the mechanisms of gene regulation on a systems level. In this study, we have developed a computational genomics approach (termed ChIPModules), which begins with experimentally determined binding sites and integrates positional weight matrices constructed from transcription factor binding sites, a comparative genomics approach, and statistical learning methods to identify transcriptional regulatory modules. We began with E2F1 binding site information obtained from ChIP–chip analyses of ENCODE regions, from both HeLa and MCF7 cells. Our approach not only distinguished targets from nontargets with a high specificity, but it also identified five regulatory modules for E2F1. One of the identified modules predicted a colocalization of E2F1 and AP-2α on a set of target promoters with an intersite distance of <270 bp. We tested this prediction using ChIP–chip assays with arrays containing ∼14,000 human promoters. We found that both E2F1 and AP-2α bind within the predicted distance to a large number of human promoters, demonstrating the strength of our sequence-based, unbiased, and universal protocol. Finally, we have used our ChIPModules approach to develop a database that includes thousands of computationally identified and/or experimentally verified E2F1 target promoters.
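The abstract combines two technical ideas: scoring promoter sequences with position weight matrices, and predicting that E2F1 and AP-2α sites co-occur within 270 bp. The following minimal sketch illustrates those two ideas only. It assumes a log2-odds PWM with a uniform background and a pseudocount of 0.5, forward-strand scanning, a relative score cutoff of 0.8, and start-to-start distance as the intersite distance. It is not the ChIPModules implementation, which also uses comparative genomics and statistical learning. All function names and the toy binding sites are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PWM from equal-length, aligned binding sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(width):
        # Pseudocounts keep unseen bases from producing log(0).
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i].upper()] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pwm


def score(pwm, kmer):
    """Sum of per-position log-odds for a k-mer as long as the PWM."""
    return sum(col[base] for col, base in zip(pwm, kmer))


def max_score(pwm):
    """Best achievable score: the top-scoring base at every position."""
    return sum(max(col.values()) for col in pwm)


def scan(pwm, seq, rel_threshold=0.8):
    """Start positions (forward strand only) scoring >= rel_threshold * max score."""
    seq = seq.upper()
    width = len(pwm)
    cutoff = rel_threshold * max_score(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        kmer = seq[i:i + width]
        # Skip windows containing N or other non-ACGT characters.
        if all(b in BASES for b in kmer) and score(pwm, kmer) >= cutoff:
            hits.append(i)
    return hits


def colocalized(hits_a, hits_b, max_distance=270):
    """True if some pair of sites lies closer than max_distance bp (start to start)."""
    return any(abs(a - b) < max_distance for a, b in product(hits_a, hits_b))


if __name__ == "__main__":
    # Hypothetical toy sites, not the matrices used in the study.
    e2f = build_pwm(["TTTCGCGC", "TTTGGCGC", "TTTCCCGC", "TTTGCCGC"])
    ap2 = build_pwm(["GCCTGAGGC", "GCCCGCGGC", "GCCAAGGGC"])
    promoter = "A" * 20 + "TTTCGCGC" + "A" * 100 + "GCCTGAGGC" + "A" * 20
    e2f_hits, ap2_hits = scan(e2f, promoter), scan(ap2, promoter)
    print(e2f_hits, ap2_hits, colocalized(e2f_hits, ap2_hits))  # prints: [20] [128] True
```

A realistic analysis would also scan the reverse complement and would calibrate score cutoffs against background sequences rather than fixing a relative threshold.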

Footnotes

  • ³ Corresponding author. E-mail pjfarnham@ucdavis.edu; fax (530) 754-9658.

  • [Supplemental material is available online at www.genome.org. The E2F1 and AP-2α ChIP–chip data are deposited in GEO (GEO series # GSE5175, which includes GPL3930 and GSM116738–GSM116742)].

  • Article published online before print. Article and publication data are at http://www.genome.org/cgi/doi/10.1101/gr.5520206

    • Received May 17, 2006.
    • Accepted August 9, 2006.