Predicting tissue-specific enhancers in the human genome

  1. Len A. Pennacchio (1,2)
  2. Gabriela G. Loots (3)
  3. Marcelo A. Nobrega (4)
  4. Ivan Ovcharenko (2,5,6)

  1. Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
  2. U.S. Department of Energy, Joint Genome Institute, Walnut Creek, California 94598, USA
  3. Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
  4. Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
  5. Computation Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA

Abstract

Determining how transcriptional regulatory signals are encoded in vertebrate genomes is essential for understanding the origins of multicellular complexity, yet the genetic code of vertebrate gene regulation remains poorly understood. To elucidate this code, we combined genome-wide gene-expression profiling, vertebrate genome comparisons, and transcription factor binding-site analysis to define sequence signatures characteristic of candidate tissue-specific enhancers in the human genome. We applied this strategy to microarray-based gene expression profiles from 79 human tissues and identified 7187 candidate enhancers that defined their flanking gene expression, the majority of which were located outside known promoters. We cross-validated this method for its ability to predict tissue-specific gene expression de novo and confirmed its reliability in 57 of the 79 available human tissues, with an average precision in enhancer recognition ranging from 32% to 63% and a sensitivity of 47%. We used the sequence signatures identified by this approach to assign tissue-specific predictions to ∼328,000 human–mouse conserved noncoding elements in the human genome. By overlapping these genome-wide predictions with a data set of enhancers validated in vivo in transgenic mice, we confirmed our results with a 28% sensitivity and 50% precision. These results indicate the power of combining complementary genomic data sets as an initial computational foray into a global view of tissue-specific gene regulation in vertebrates.
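
The precision and sensitivity figures quoted above can be read with the standard definitions: precision is the fraction of predicted enhancers that are confirmed, and sensitivity is the fraction of confirmed enhancers that were predicted. The following is a minimal sketch of that arithmetic only. It assumes predictions and validated enhancers can be matched by a shared identifier; the function name and the toy identifiers are hypothetical, and the abstract does not specify how overlap between genomic elements was actually scored.

```python
def precision_and_sensitivity(predicted, validated):
    """Return (precision, sensitivity) for two collections of element IDs.

    Assumption: an element counts as a true positive when the same ID
    appears in both collections. Real genomic overlap rules may differ.
    """
    predicted = set(predicted)
    validated = set(validated)
    true_positives = len(predicted & validated)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(validated) if validated else 0.0
    return precision, sensitivity


# Toy identifiers invented for illustration; not data from the study.
predicted_elements = {"e1", "e2", "e3", "e4"}
validated_elements = {"e2", "e3", "e5", "e6", "e7"}

precision, sensitivity = precision_and_sensitivity(
    predicted_elements, validated_elements
)
print(f"precision={precision:.2f}, sensitivity={sensitivity:.2f}")
```

In this toy case two of the four predictions are confirmed and two of the five validated elements are recovered, which shows how a method can have moderate precision while still missing many true enhancers.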

Footnotes

  • 6 Corresponding author. E-mail ovcharenko1{at}llnl.gov; fax (925) 422-2099.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5972507

    • Received September 20, 2006.
    • Accepted November 28, 2006.
  • Freely available online through the Genome Research Open Access option.
