Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome

  1. Nathan D. Trinklein (1,6,7),
  2. Ulaş Karaöz (2,6),
  3. Jiaqian Wu (3,6),
  4. Anason Halees (2,6),
  5. Shelley Force Aldred (1,7),
  6. Patrick J. Collins (1),
  7. Deyou Zheng (4),
  8. Zhengdong D. Zhang (4),
  9. Mark B. Gerstein (4),
  10. Michael Snyder (3,4),
  11. Richard M. Myers (1,8), and
  12. Zhiping Weng (2,5,8)
  1. Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA;
  2. Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA;
  3. Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520, USA;
  4. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA;
  5. Biomedical Engineering Department, Boston University, Boston, Massachusetts 02215, USA
  6. These authors contributed equally to this work.
  7. Presently at SwitchGear Genomics, Menlo Park, CA 94025, USA.

Abstract

The regulation of transcriptional initiation in the human genome is a critical component of global gene regulation, but a complete catalog of human promoters currently does not exist. In order to identify regulatory regions, we developed four computational methods to integrate 129 sets of ENCODE-wide chromatin immunoprecipitation data. They collectively predicted 1393 regions. Roughly 47% of the regions were unique to one method, as each method makes different assumptions about the data. Overall, predicted regions tend to localize to highly conserved, DNase I hypersensitive, and actively transcribed regions in the genome. Interestingly, a significant portion of the regions overlaps with annotated 3′-UTRs, suggesting that some of them might regulate anti-sense transcription. The majority of the predicted regions are >2 kb away from the 5′-ends of previously annotated human cDNAs and hence are novel. These novel regions may regulate unannotated transcripts or may represent new alternative transcription start sites of known genes. We tested 163 such regions for promoter activity in four cell lines using transient transfection assays, and 25% of them showed transcriptional activity above background in at least one cell line. We also performed 5′-RACE experiments on 62 novel regions, and 76% of the regions were associated with the 5′-ends of at least two RACE products. Our results suggest that there are at least 35% more functional promoters in the human genome than currently annotated.
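
The novelty criterion stated in the abstract (a predicted region counts as novel when it lies more than 2 kb from the 5′-end of any previously annotated cDNA) can be illustrated with a minimal Python sketch. The function names, the tuple and dictionary layouts, and the coordinates below are hypothetical and chosen only for illustration; they are not taken from the authors' pipeline, which may measure distance differently.

```python
from bisect import bisect_left

def distance_to_nearest_5p_end(start, end, sorted_ends):
    """Distance in bp from the interval [start, end] to the closest
    annotated 5'-end; 0 if an annotated end falls inside the interval.
    `sorted_ends` must be sorted in ascending order."""
    if not sorted_ends:
        return float("inf")
    i = bisect_left(sorted_ends, start)  # first annotated end >= start
    if i < len(sorted_ends) and sorted_ends[i] <= end:
        return 0
    best = float("inf")
    if i < len(sorted_ends):             # nearest end downstream of the region
        best = sorted_ends[i] - end
    if i > 0:                            # nearest end upstream of the region
        best = min(best, start - sorted_ends[i - 1])
    return best

def is_novel(region, ends_by_chrom, min_distance=2000):
    """Assumed rule: novel if > min_distance bp from every annotated 5'-end
    on the same chromosome. `region` is (chrom, start, end)."""
    chrom, start, end = region
    ends = ends_by_chrom.get(chrom, [])
    return distance_to_nearest_5p_end(start, end, ends) > min_distance

# Hypothetical annotation: 5'-ends at positions 1000 and 10000 on chr1.
annotated = {"chr1": [1000, 10000]}
print(is_novel(("chr1", 4000, 4500), annotated))  # 3000 bp from nearest end
print(is_novel(("chr1", 2500, 2900), annotated))  # 1500 bp from nearest end
```

Keeping the annotated 5′-ends sorted per chromosome lets each lookup use binary search, which matters when many predicted regions are screened against a large cDNA annotation.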

Footnotes

  • 8 Corresponding authors.

    8 E-mail zhiping{at}bu.edu; fax (617) 353-6766.

    8 E-mail myers{at}shgc.stanford.edu; fax (650) 725-9689.

  • [Supplemental material is available online at www.genome.org.]

  • Article is online at http://www.genome.org/cgi/doi/10.1101/gr.5716607

    • Received July 3, 2006.
    • Accepted February 5, 2007.
  • Freely available online through the Genome Research Open Access option.
