  • Review Article

Application of 'next-generation' sequencing technologies to microbial genetics

Key Points

  • New sequencing technologies, such as Solexa, 454 pyrosequencing and SOLiD, developed by Illumina, Roche and Applied Biosystems, respectively, are set to revolutionize microbiology by dramatically increasing throughput and reducing costs of DNA sequencing.

  • These new technologies present new technical and computational challenges, as well as new research opportunities.

  • Applications include de novo genome sequence assembly, metagenomics, sRNA discovery, detection of polymorphisms, expression profiling and epigenetics.

  • Many software packages are freely available for dealing with the large datasets generated by these applications.

  • As well as sequence alignment and assembly, there is a need for downstream processing of data into a form that is accessible to biologists.

  • Standards are emerging for analysis and archiving of data generated by the new technologies.

Abstract

New sequencing methods generate data that can allow the assembly of microbial genome sequences in days. With such revolutionary advances in technology come new challenges in methodologies and informatics. In this article, we review the capabilities of high-throughput sequencing technologies and discuss the many options for getting useful information from the data.

Figure 1: High-throughput sequencing technologies.
Figure 2: Selecting a technology for an experiment.
Figure 3: Road map for planning software solutions for experiments with different data sources and different goals.

Acknowledgements

We are grateful to S. Kamoun, E. Kemen, S. Foster and M. Pallen for useful discussions and suggestions on the manuscript. This work was supported by Gatsby Foundation core funding to The Sainsbury Laboratory.

Author information

Corresponding author

Correspondence to David J. Studholme.

Related links

DATABASES

Entrez Genome Project

Arabidopsis thaliana

Bacillus subtilis

Chlamydomonas reinhardtii

Escherichia coli

Francisella tularensis subsp. holarctica

Mycobacterium tuberculosis

Myxococcus xanthus

Neurospora crassa

Phytophthora infestans

Salmonella enterica subsp. enterica serovar Typhi

Sinorhizobium meliloti

FURTHER INFORMATION

David J Studholme's homepage

1001 Genomes Project

454 sequencing

Chlamydomonas reinhardtii silencing RNA database

EMBL–EBI

Ensembl

Genomes OnLine Database

Helicos BioSciences

Pacific Biosciences

SEQanswers

SHRiMP

Solexa

SOLiD Systems Sequencing

SourceForge.net (sequence read format description)

Glossary

De novo assembly

Construction of longer sequences, such as contigs or genomes, from shorter sequences, such as sequence reads, without prior knowledge of the order of the reads or reference to a closely related sequence.

Contig

A fragment of genome sequence derived by assembling shorter sequence reads into larger constructs on the basis of overlap between the sequence reads.
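
The overlap-based construction described above can be sketched for a single pair of reads. This is a minimal illustration only, assuming error-free reads and exact suffix-to-prefix matches; the function names `overlap` and `merge` and the `min_length` parameter are hypothetical and are not taken from any assembler mentioned in the article.

```python
def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_length` bases exists.
    """
    for n in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge(a: str, b: str, min_length: int = 3):
    """Join two reads into one longer sequence if they overlap, else None."""
    n = overlap(a, b, min_length)
    return a + b[n:] if n else None


# The last four bases of the first read match the first four of the second.
print(merge("ATGGCGT", "GCGTACC"))  # ATGGCGTACC
```

Real assemblers must also tolerate sequencing errors, reverse-complement reads and repeats, none of which this sketch attempts.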

Paired-end read

A sequence read known to come from a genomic region within a limited number of nucleotides of another. The extra information puts constraints on how far apart the reads can be placed during assembly or alignment, allowing more accurate placement and construction of contigs.
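
As a rough illustration of how the distance constraint narrows down placement, the sketch below keeps only those candidate positions for two mates that lie close enough together. The function name `consistent_placements`, the use of simple integer coordinates and the `max_separation` parameter are assumptions made for this example.

```python
from itertools import product


def consistent_placements(hits_a, hits_b, max_separation):
    """Return (a, b) position pairs that are at most `max_separation` apart.

    `hits_a` and `hits_b` are candidate start coordinates for each mate.
    """
    return [
        (a, b)
        for a, b in product(hits_a, hits_b)
        if abs(a - b) <= max_separation
    ]


# Each mate matches two places; only one combination respects the constraint.
print(consistent_placements([100, 5000], [350, 9000], max_separation=500))
# [(100, 350)]
```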

Epigenetics

The study of inherited changes in gene function that cannot be explained by changes in DNA sequence.

de Bruijn graph

In mathematics, a network structure is properly called a graph. The entities that are connected are called nodes and the connections are called edges. A de Bruijn graph is a graph in which the nodes are strings of symbols (such as the nucleotides in a sequence read) and the edges represent overlaps between those strings. This is a convenient way to represent data such as overlapping sequence reads.
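
A minimal sketch of such a graph, assuming error-free reads: each node is a k-mer, and an edge joins two k-mers that appear consecutively in a read, so that they overlap by k - 1 symbols. The function name `de_bruijn_graph` is hypothetical, and some formulations instead place (k - 1)-mers on the nodes and k-mers on the edges.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each k-mer node to the set of k-mers that directly follow it."""
    graph = defaultdict(set)
    for read in reads:
        # Consecutive k-mers in a read overlap by k - 1 symbols.
        for i in range(len(read) - k):
            node = read[i:i + k]
            successor = read[i + 1:i + k + 1]
            graph[node].add(successor)
    return dict(graph)


# Two overlapping reads become linked through the shared k-mer "GGC".
for node, successors in de_bruijn_graph(["ATGGC", "GGCGT"], k=3).items():
    print(node, "->", sorted(successors))
```

Because identical k-mers from different reads collapse onto the same node, overlapping reads join up without any pairwise comparison of reads.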

k-mer

A piece of nucleotide sequence of length k. A k-mer is usually used to indicate a computationally selected subsequence of an experimentally derived sequence, such as a read or a genome.
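
For illustration, the k-mers of a sequence can be enumerated with a sliding window. The function name `kmers` is hypothetical.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return every overlapping k-mer of `sequence`, from left to right."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmers("ATGGCGT", 3))  # ['ATG', 'TGG', 'GGC', 'GCG', 'CGT']
```

A sequence of length n yields n - k + 1 k-mers, or none if it is shorter than k.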

N50

A measure of contig length. If all contigs generated in an assembly are placed end to end in order of length (longest first), then the N50 is the length of the contig that, when added, causes the total length of the chain to exceed half of the length of the genome being sequenced. The longer the contigs in an assembly, the longer the contig that crosses this threshold, so a larger N50 indicates a more contiguous assembly.
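
The calculation can be sketched as follows. Two assumptions are made here that the definition above does not state: the threshold counts as crossed when the running total reaches or exceeds half the genome length, and the summed contig length stands in for the genome length when the latter is not supplied. The function name `n50` and its parameters are hypothetical.

```python
def n50(contig_lengths, genome_length=None):
    """Length of the contig at which the running total crosses the halfway mark.

    Returns None if the contigs never cover half of `genome_length`.
    """
    lengths = sorted(contig_lengths, reverse=True)  # longest first
    total = genome_length if genome_length is not None else sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running >= total / 2:
            return length
    return None


contigs = [80, 70, 50, 40, 30, 20]  # total 290, halfway mark 145
print(n50(contigs))                     # 70, because 80 + 70 = 150
print(n50(contigs, genome_length=400))  # 50, because 80 + 70 + 50 = 200
```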

BLAST

(Basic local alignment search tool). A computer program for finding sequences in a database that are similar to a query sequence. BLAST has been available since 1990 and is the most widely used sequence search tool.

MIGS

(Minimum information about a genome sequence). A proposed metadata standard that aims to capture the essential information about a sequenced organism: its species, the source of the strain and other phylogenetic and experimental data. Collecting such data facilitates the cataloguing and searching of species in large-scale databases.

Finished genome

A genome sequence that has been shotgun sequenced and subjected to post-assembly procedures, such as long PCR, to close the gaps that occur between contigs.

Cite this article

MacLean, D., Jones, J. & Studholme, D. Application of 'next-generation' sequencing technologies to microbial genetics. Nat Rev Microbiol 7, 287–296 (2009). https://doi.org/10.1038/nrmicro2088