Reliable prediction of regulator targets using 12 Drosophila genomes

  Pouya Kheradpour (1,4), Alexander Stark (1,2,4,5), Sushmita Roy (3), and Manolis Kellis (1,2,5)

  1. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA;
  2. Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02141, USA;
  3. Department of Computer Science, University of New Mexico, Albuquerque, New Mexico 87131, USA
  4. These authors contributed equally to this work.

Abstract

Gene expression is regulated pre- and post-transcriptionally via cis-regulatory DNA and RNA motifs. Identifying individual functional instances of such motifs in genome sequences is a major goal for inferring regulatory networks, yet it has been hampered by the motifs’ short lengths, which lead to many chance matches and poor signal-to-noise ratios. In this paper, we develop a general methodology for the comparative identification of functional motif instances across many related species, using a phylogenetic framework that accounts for the evolutionary relationships between species, allows for motif movements, and is robust against missing data due to artifacts in sequencing, assembly, or alignment. We also provide a robust statistical framework for evaluating motif confidence, which enables us to translate evolutionary conservation into a confidence measure for each motif instance, correcting for varying motif length, composition, and background conservation of the target regions. We predict targets of fly transcription factors and miRNAs in alignments of 12 recently sequenced Drosophila species. When compared to extensive genome-wide experimental data, predicted targets are of high quality, matching and surpassing ChIP-chip microarrays and recovering miRNA targets with high sensitivity. The resulting regulatory network suggests significant redundancy between pre- and post-transcriptional regulation of gene expression.
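
The abstract does not spell out how conservation is turned into a confidence value, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' implementation. It assumes each motif instance has already been given a single conservation score between 0 and 1, that a set of control motifs (for example, shuffled versions matched for length and composition) has been scored the same way in the same regions, and that confidence at a score threshold is taken as one minus the ratio of the conserved fractions of control and real instances. The function name, the parameter names, and the example scores are all hypothetical.

```python
def confidence_at_threshold(motif_scores, control_scores, threshold):
    """Estimate confidence for instances scoring at or above `threshold`.

    Assumption (not from the source): confidence is
    1 - (fraction of control instances conserved) /
        (fraction of real motif instances conserved).
    """
    if not motif_scores or not control_scores:
        raise ValueError("both score lists must be non-empty")

    # Fraction of instances whose conservation score reaches the threshold.
    motif_frac = sum(s >= threshold for s in motif_scores) / len(motif_scores)
    control_frac = sum(s >= threshold for s in control_scores) / len(control_scores)

    # With no conserved real instances there is no signal to evaluate.
    if motif_frac == 0:
        return 0.0

    # Clamp at zero: controls conserved as often as the motif means no confidence.
    return max(0.0, 1.0 - control_frac / motif_frac)


# Made-up scores, purely for illustration.
example_motif = [0.9, 0.8, 0.7, 0.2, 0.1]
example_control = [0.9, 0.3, 0.2, 0.1, 0.05]
example_confidence = confidence_at_threshold(example_motif, example_control, 0.6)
```

Because the control motifs are scored in the same regions as the real motif, a ratio of this kind is one way to discount background conservation of the target regions; other normalizations are possible.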

Footnotes

  • 5 Corresponding authors.

    5 E-mail manoli{at}mit.edu; fax (617) 253-7512.

    5 E-mail alex.stark{at}mit.edu; fax (617) 253-7512.

  • [Supplemental material is available online at www.genome.org. All data and predicted transcription factor and miRNA targets are freely available at http://compbio.mit.edu/fly/motif-instances/.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.7090407

    • Received August 29, 2007.
    • Accepted October 10, 2007.
  • Freely available online through the Genome Research Open Access option.
