Close sequence comparisons are sufficient to identify human cis-regulatory elements

Shyam Prabhakar1,2,4, Francis Poulin1,3, Malak Shoukry1, Veena Afzal1, Edward M. Rubin1,2, Olivier Couronne1,2, and Len A. Pennacchio1,2,4

1 Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA;
2 U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA

Abstract

Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons. To address this problem, we identified evolutionarily conserved noncoding regions in primate, mammalian, and more distant comparisons using a uniform approach (Gumby) that facilitates unbiased assessment of the impact of evolutionary distance on predictive power. We benchmarked computational predictions against previously identified cis-regulatory elements at diverse genomic loci and also tested numerous extremely conserved human–rodent sequences for transcriptional enhancer activity using an in vivo enhancer assay in transgenic mice. Human regulatory elements were identified with acceptable sensitivity (53%–80%) and true-positive rate (27%–67%) by comparison with one to five other eutherian mammals or six other simian primates. More distant comparisons (marsupial, avian, amphibian, and fish) failed to identify many of the empirically defined functional noncoding elements. Our results highlight the practical utility of close sequence comparisons, and the loss of sensitivity entailed by more distant comparisons. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole-genome comparative analysis that explains most of the observations from empirical benchmarking. Lastly, we determined that, in addition to strength of conservation, genomic location and/or density of surrounding conserved elements must also be considered in selecting candidate enhancers for in vivo testing at embryonic time points.
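The two benchmarking figures quoted above, sensitivity and true-positive rate, can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not spell out: that a known element counts as detected if any predicted conserved region overlaps it, and that true-positive rate means the fraction of predictions overlapping at least one known element. The function names, the half-open coordinate convention, and the toy intervals are hypothetical and are not taken from Gumby or from the study's data.

```python
from typing import List, Tuple

# Assumption: half-open [start, end) coordinates on a single chromosome.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def benchmark(predicted: List[Interval], known: List[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, true_positive_rate) under the assumed definitions.

    sensitivity        = known elements overlapped by a prediction / all known elements
    true_positive_rate = predictions overlapping a known element / all predictions
    """
    known_hit = sum(any(overlaps(k, p) for p in predicted) for k in known)
    predicted_hit = sum(any(overlaps(p, k) for k in known) for p in predicted)
    sensitivity = known_hit / len(known) if known else 0.0
    true_positive_rate = predicted_hit / len(predicted) if predicted else 0.0
    return sensitivity, true_positive_rate


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    known_elements = [(100, 200), (500, 600), (900, 1000)]
    predictions = [(150, 250), (550, 560), (700, 750), (2000, 2100)]
    sens, tpr = benchmark(predictions, known_elements)
    print(f"sensitivity={sens:.2f} true_positive_rate={tpr:.2f}")
```

The quadratic scan is adequate for locus-scale benchmarks; a genome-scale comparison would more plausibly sort both interval lists and sweep them once.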

Footnotes

  • 3 Present address: Department of Integrative Biology, University of California, Berkeley, CA 94720, USA.

  • 4 Corresponding authors.

    4 E-mail SPrabhakar{at}lbl.gov; fax (510) 486-4229. E-mail LAPennacchio{at}lbl.gov; fax (510) 486-4229.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4717506

    • Received September 23, 2005.
    • Accepted April 10, 2006.