Eukaryotic Regulatory Element Conservation Analysis and Identification Using Comparative Genomics

Yueyi Liu (1,6), X. Shirley Liu (4,6), Liping Wei (5), Russ B. Altman (2), and Serafim Batzoglou (3,7)

  1. Stanford Medical Informatics, Stanford University, Stanford, California 94305, USA
  2. Department of Genetics, Stanford University, Stanford, California 94305, USA
  3. Department of Computer Science, Stanford University, Stanford, California 94305, USA
  4. Department of Biostatistics, Harvard School of Public Health, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
  5. Nexus Genomics, Inc., Mountain View, California 94043, USA

Abstract

Comparative genomics is a promising approach to the challenging problem of eukaryotic regulatory element identification, because functional noncoding sequences may be conserved across species owing to evolutionary constraints. We systematically analyzed known human and Saccharomyces cerevisiae regulatory elements and discovered that human regulatory elements are more conserved between human and mouse than are background sequences. Although S. cerevisiae regulatory elements do not appear to be more conserved than background when S. cerevisiae is compared with Schizosaccharomyces pombe, they are more conserved when compared with multiple other yeast genomes (Saccharomyces paradoxus, Saccharomyces mikatae, and Saccharomyces bayanus). Based on these analyses, we developed a sequence-motif-finding algorithm called CompareProspector, which extends Gibbs sampling by biasing the search toward regions conserved across species. Using human–mouse comparison, CompareProspector identified known motifs for transcription factors Mef2, Myf, Srf, and Sp1 from a set of human-muscle-specific genes. It also discovered the NFAT motif from genes up-regulated by CD28 stimulation in T-cells, which implies the direct involvement of NFAT in mediating the CD28 stimulatory signal. Using Caenorhabditis elegans–Caenorhabditis briggsae comparison, CompareProspector found the PHA-4 motif and the UNC-86 motif. CompareProspector outperformed many other computational motif-finding programs, demonstrating the power of comparative genomics-based biased sampling in eukaryotic regulatory element identification.
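
The idea of biasing a Gibbs sampler toward conserved regions can be illustrated with a minimal sketch. This is not the CompareProspector implementation, whose details are not given in this abstract. It assumes a per-base conservation score between 0 and 1 for each sequence (for example, derived from a pairwise alignment) and simply multiplies the usual motif-versus-background likelihood ratio by the mean conservation of each candidate window. All function names, the pseudocount, the uniform background, and the small floor on the conservation weight are hypothetical choices made for illustration.

```python
import random

def window_conservation(scores, width):
    """Mean conservation score of every length-`width` window."""
    return [sum(scores[i:i + width]) / width
            for i in range(len(scores) - width + 1)]

def build_pwm(sites, pseudocount=1.0):
    """Column-wise base frequencies (with pseudocounts) from aligned sites."""
    pwm = []
    for col in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in "ACGT"})
    return pwm

def site_weights(seq, cons, pwm, background=0.25, floor=1e-6):
    """Unnormalized sampling weight for each start position.

    Weight = motif/background likelihood ratio * window conservation.
    The multiplicative bias and the floor are assumptions of this sketch.
    """
    width = len(pwm)
    prior = window_conservation(cons, width)
    weights = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for offset in range(width):
            ratio *= pwm[offset][seq[start + offset]] / background
        weights.append(ratio * (prior[start] + floor))
    return weights

def biased_gibbs(seqs, cons, width, iterations=200, seed=0):
    """Site sampler with one motif site per sequence (needs >= 2 sequences)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the motif model from the current sites in all other sequences.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, starts)) if j != i]
            pwm = build_pwm(others)
            # Resample this sequence's site, favoring conserved windows.
            weights = site_weights(seq, cons[i], pwm)
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Calling `biased_gibbs(seqs, cons, width=6)` with a list of upstream sequences and matching lists of conservation scores returns one sampled start position per sequence. Windows that lie entirely in unconserved sequence receive almost no weight, so the sampler spends its iterations on conserved regions.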

Footnotes

  • [Supplemental data are available at www.genome.org and at http://compareprospector.stanford.edu. The program CompareProspector is available at http://compareprospector.stanford.edu.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1327604.

  • 6 These authors contributed equally to this work.

  • 7 Corresponding author. E-MAIL serafim{at}cs.stanford.edu; FAX (650) 725-1449.

    • Accepted December 27, 2003.
    • Received March 10, 2003.