The Ensembl Analysis Pipeline

Simon C. Potter; Laura Clarke; Val Curwen; Stephen Keenan; Emmanuel Mongin; Stephen M.J. Searle; Arne Stabenau; Roy Storey; Michele Clamp

doi:10.1101/gr.1859804

¹ The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
² EMBL European Bioinformatics Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
³ The Broad Institute, Cambridge, Massachusetts 02141, USA

Abstract

The Ensembl pipeline is an extension to the Ensembl system which allows automated annotation of genomic sequence. The software comprises two parts. First, there is a set of Perl modules (“Runnables” and “RunnableDBs”) which are “wrappers” for a variety of commonly used analysis tools. These retrieve sequence data from a relational database, run the analysis, and write the results back to the database. They inherit from a common interface, which simplifies the writing of new wrapper modules. On top of this sits a job submission system (the “RuleManager”) which allows efficient and reliable submission of large numbers of jobs to a compute farm. Here we describe the fundamental software components of the pipeline, and we also highlight some features of the Sanger installation which were necessary to enable the pipeline to scale to whole-genome analysis.
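
The wrapper pattern described above (retrieve sequence from a database, run an analysis, write the results back, all behind one shared interface) can be illustrated with a minimal sketch. This is not the pipeline's Perl code. It is written in Python for brevity, a plain dictionary stands in for the relational database, and every name in it (`RunnableDB`, `fetch_input`, `run`, `write_output`, `execute`, `GCContent`) is an assumption chosen for illustration.

```python
from abc import ABC, abstractmethod


class RunnableDB(ABC):
    """Hypothetical common interface: fetch input, run, write output."""

    def __init__(self, db, input_id):
        self.db = db              # stand-in for the relational database
        self.input_id = input_id  # identifies the sequence to analyse
        self.sequence = None
        self.results = []

    def fetch_input(self):
        # Retrieve the sequence for this job from the database.
        self.sequence = self.db["sequence"][self.input_id]

    @abstractmethod
    def run(self):
        """Run the wrapped analysis and store its output in self.results."""

    def write_output(self):
        # Write the results back to the database, keyed by input id.
        self.db["features"].setdefault(self.input_id, []).extend(self.results)

    def execute(self):
        # The three steps every wrapper performs, in order.
        self.fetch_input()
        self.run()
        self.write_output()


class GCContent(RunnableDB):
    """Toy analysis standing in for an external tool (illustrative only)."""

    def run(self):
        seq = self.sequence.upper()
        gc = sum(seq.count(base) for base in "GC")
        self.results = [("gc_fraction", gc / len(seq))]


# Usage: one job on one sequence, with an in-memory "database".
db = {"sequence": {"contig1": "ATGCGC"}, "features": {}}
GCContent(db, "contig1").execute()
print(db["features"])
```

Under this arrangement a new wrapper only has to supply `run`, which is one way to read the abstract's remark that a common interface simplifies the writing of new wrapper modules.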

Footnotes

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1859804.
⁴ Corresponding author. E-MAIL mclamp@broad.mit.edu; FAX (617) 258-0903.
- Accepted January 12, 2004.
- Received August 8, 2003.
Cold Spring Harbor Laboratory Press
