Assemblathon 1: A competitive assessment of de novo short read assembly methods

  1. Benedict Paten1,2,33
  1. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA;
  2. Biomolecular Engineering Department, University of California, Santa Cruz, California 95064, USA;
  3. Genome Center, University of California, Davis, California 95616, USA;
  4. Bioinformatics Core, Genome Center, University of California, Davis, California 95616, USA;
  5. Computational & Mathematical Biology Group, Genome Institute of Singapore, Singapore 119077;
  6. School of Computing, National University of Singapore, Singapore 119077;
  7. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom;
  8. EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom;
  9. CRACS-INESC Porto LA, Universidade do Porto, 4169-007 Porto, Portugal;
  10. Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada V5Z 4E6;
  11. DOE Joint Genome Institute, Walnut Creek, California 94598, USA;
  12. Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA;
  13. Computer Science Department, ENS Cachan/IRISA, 35042 Rennes Cedex, France;
  14. CNRS/Symbiose, IRISA, 35042 Rennes Cedex, France;
  15. INRIA, Rennes Bretagne Atlantique, 35042 Rennes Cedex, France;
  16. Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA;
  17. Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA;
  18. National Biodefense Analysis and Countermeasures Center, Frederick, Maryland 20702, USA;
  19. Monsanto Company, Chesterfield, Missouri 63017, USA;
  20. Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, USA;
  21. Department of Biochemistry and Biophysics, University of California, San Francisco, California 94143, USA;
  22. Biological and Medical Informatics Program, University of California, San Francisco, California 94143, USA;
  23. Howard Hughes Medical Institute, Bethesda, Maryland 20814, USA;
  24. Department of Computer Science, Royal Holloway, University of London, London WC1E 7HU, United Kingdom;
  25. Softberry Inc., Mount Kisco, New York 10549, USA;
  26. The Genome Analysis Centre, Norwich Research Park, Norwich NR4 7UH, United Kingdom;
  27. The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7IH, United Kingdom;
  28. Computation Institute, University of Chicago, Chicago, Illinois 60637, USA;
  29. BGI-Shenzhen, Shenzhen 518083, China;
  30. Broad Institute, Cambridge, Massachusetts 02142, USA;
  31. Department of Computer Science, Iowa State University, Ames, Iowa 50011, USA;
  32. Molecular and Cellular Biology, Genome Center, University of California, Davis, California 95064, USA

    Abstract

    Low-cost short-read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvement in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.

    Footnotes

    • Received May 20, 2011.
    • Accepted September 8, 2011.

    Freely available online through the Genome Research Open Access option.
