  • Review Article
Genome-wide genetic marker discovery and genotyping using next-generation sequencing

Key Points

  • New methods that make use of high-throughput sequencing are enabling the simultaneous discovery and sequencing of thousands of genetic markers across whole genomes.

  • These methods can be used to study wild populations of tens or hundreds of individuals for which genomic resources were not previously available.

  • They also enable the rapid genotyping of hundreds of individuals in a mapping cross, for quantitative trait locus (QTL) mapping and marker-assisted selection.

  • We describe best practices and make recommendations for a group of methods involving the use of restriction enzymes, namely reduced-representation libraries, complexity reduction of polymorphic sequences, restriction-site-associated DNA sequencing, multiplexed shotgun genotyping and genotyping by sequencing.

  • We discuss the impact of several factors — such as the availability of genomic resources, the levels of polymorphism, the pooling of samples and the choice of restriction enzyme — on the design and implementation of high-throughput marker discovery and genotyping experiments.

  • Analysing data from these methods can be challenging; we describe new methods for processing high-throughput marker data.

  • At present these methods are far more economical than whole-genome sequencing. We discuss how this situation is likely to change over the next few years, as sequencing costs continue to fall rapidly.

Abstract

The advent of next-generation sequencing (NGS) has revolutionized genomic and transcriptomic approaches to biology. These new sequencing tools are also valuable for the discovery, validation and assessment of genetic markers in populations. Here we review and discuss best practices for several NGS methods for genome-wide genetic marker development and genotyping that use restriction enzyme digestion of target genomes to reduce the complexity of the target. These new methods include reduced-representation sequencing using reduced-representation libraries (RRLs) or complexity reduction of polymorphic sequences (CRoPS), restriction-site-associated DNA sequencing (RAD-seq) and low-coverage genotyping. They are applicable both to model organisms with high-quality reference genome sequences and, excitingly, to non-model species with no existing genomic data.

Figure 1: Methods for high-throughput marker discovery.
Figure 2: Effects of restriction enzyme selection on reference genomes of different size and with different levels of polymorphism.
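
The dependence of marker number on enzyme choice and genome size can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the article: it assumes that bases occur independently along the genome with a given GC content, which real genomes violate (repeats, base-composition bias and methylation all shift the counts), so the output is only a rough guide. The function names are hypothetical; the recognition sequences shown are those of EcoRI (GAATTC) and SbfI (CCTGCAGG).

```python
def site_probability(site, gc_content=0.5):
    """Probability that a random genome position starts an exact match to `site`.

    Assumes independent bases: G and C each occur with probability gc/2,
    A and T each with probability (1 - gc)/2.
    """
    p_gc = gc_content / 2.0
    p_at = (1.0 - gc_content) / 2.0
    probability = 1.0
    for base in site.upper():
        if base in "GC":
            probability *= p_gc
        elif base in "AT":
            probability *= p_at
        else:
            raise ValueError(f"unsupported base in recognition site: {base!r}")
    return probability


def expected_cut_sites(genome_size_bp, site, gc_content=0.5):
    """Expected number of recognition sites in a genome of the given size."""
    return genome_size_bp * site_probability(site, gc_content)


if __name__ == "__main__":
    # Compare a 6-base cutter with an 8-base cutter on two genome sizes.
    for genome_size in (100_000_000, 1_000_000_000):
        for name, site in (("EcoRI", "GAATTC"), ("SbfI", "CCTGCAGG")):
            count = expected_cut_sites(genome_size, site)
            print(f"{genome_size:>13,} bp  {name:<6} {count:>12,.0f} sites")
```

Under these assumptions, moving from a 6-base to an 8-base recognition site reduces the expected number of sites 16-fold at 50% GC content, which is why rarer cutters suit larger genomes or lower sequencing budgets.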

Acknowledgements

We are grateful to P. Andolfatto, E. Buckler, W. Cresko, R. Elshire, E. Johnson, S. Mitchell, D. Stern and four anonymous referees for reviewing and discussing drafts of this manuscript. We thank S. Bassham, S. Baxter, C. Eland, K. Gharbi, M. Liu, J. Taggart, and P. Fuentes Utrilla for discussions that have improved our understanding of these methods. J.W.D. and M.L.B. are funded by the UK Natural Environment Research Council, grant NE/H019804/1. P.A.H. and J.M.C. received funding support from the US National Institutes of Health (NIH) grant 1R24GM079486-01A1, the US National Science Foundation grant IOS-0843392 and a Keck Foundation grant to W. Cresko. J.M.C. was also funded by the NIH National Research Service Award Ruth L. Kirschstein postdoctoral fellowship 1F32GM095213-01. P.D.E. was supported by grants R21HG003834 and R21HG006036 from the US National Human Genome Research Institute awarded to E. Johnson.

Author information

Corresponding author

Correspondence to John W. Davey.

Ethics declarations

Competing interests

J.Q.B. is an employee of Floragenex, Inc., an organization that offers RAD-seq and associated consulting as a commercial service. The other authors declare no competing interests.

Related links

FURTHER INFORMATION

Paul A. Hohenlohe's homepage

Julian M. Catchen's homepage

Mark L. Blaxter's homepage

Ensembl

Floragenex, Inc. (commercial RAD-seq provider)

Genotyping by sequencing in the Buckler laboratory

Nature Reviews Genetics series on Study designs

Nature Reviews Genetics series on Applications of next-generation sequencing

Software packages for analysing NGS marker data

BAMOVA

MSG

PoPoolation

RADtools (within the United Kingdom RAD-seq Wiki page)

Stacks

TASSEL

Glossary

Quantitative trait locus

(QTL). A locus that controls a quantitative phenotypic trait, identified by showing a statistical association between genetic markers surrounding the locus and phenotypic measurements.

Marker-assisted selection

The use of genetic markers to predict the inheritance of alleles at a closely linked trait locus.

Restriction fragment length polymorphism

(RFLP). A fragment-length variant that is generated through the presence or absence of a restriction enzyme recognition site. Restriction sites can be gained or lost by base substitutions, insertions or deletions.
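
As a small illustration of how a single base substitution produces a fragment-length variant, the following sketch digests two toy alleles in silico. It is a simplification under stated assumptions: the hypothetical `digest` function cuts at the first base of each recognition site rather than at the enzyme's true cleavage position, and the allele sequences are invented for the example.

```python
def digest(sequence, site):
    """Return fragment lengths after cutting at the start of every `site` match.

    Simplification: a real enzyme cuts at a fixed offset inside its site;
    cutting at the first base keeps the arithmetic easy to follow.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments (for example, a site at position 0).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


SITE = "GAATTC"  # EcoRI recognition sequence

# Allele A carries two sites; allele B has lost the second one
# through an A-to-G substitution (GAATTC -> GAGTTC).
allele_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
allele_b = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TTTT"

if __name__ == "__main__":
    print("allele A fragments:", digest(allele_a, SITE))
    print("allele B fragments:", digest(allele_b, SITE))
```

The two alleles yield different sets of fragment lengths from the same total sequence length, which is the signal that an RFLP assay detects.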

Amplified fragment length polymorphism

(AFLP). A mapping method in which genomic DNA from different strains is PCR amplified using arbitrary primers. DNA fragments that are amplified in one strain, but not the other, are cloned, sequenced and used as polymorphic markers.

Microsatellite

A class of repetitive DNA that is made up of repeats that are 2–8 nucleotides in length. They can be highly polymorphic and are frequently used as molecular markers in population genetics studies.

Optical mapping

A method for creating a map of a genome by stretching DNA in microfluidic channels on a slide for visualization on a fluorescent microscope. The DNA is then digested by restriction enzymes and the sizes of these fragments are inferred by the integrated intensity of the fluorescent intercalator dye.

FST

(Wright's fixation index). The fraction of the total genetic variation that is distributed among subpopulations in a subdivided population.
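
One common way to make this definition concrete is the heterozygosity form FST = (HT − HS) / HT, in which HT is the expected heterozygosity of the pooled population and HS is the mean expected heterozygosity within subpopulations. The sketch below applies it to a single biallelic marker and is only one of several estimators in use. It assumes equally weighted subpopulations and known allele frequencies, with no correction for sample size; the function name is hypothetical.

```python
import math


def fst_biallelic(subpop_freqs):
    """FST for one biallelic locus from per-subpopulation allele frequencies.

    Assumes equal subpopulation weights and exact frequencies.
    Returns NaN when the locus is monomorphic overall (FST undefined).
    """
    if not subpop_freqs:
        raise ValueError("at least one subpopulation frequency is required")
    if any(not 0.0 <= p <= 1.0 for p in subpop_freqs):
        raise ValueError("allele frequencies must lie between 0 and 1")

    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    h_total = 2.0 * p_bar * (1.0 - p_bar)                       # pooled heterozygosity
    h_within = sum(2.0 * p * (1.0 - p) for p in subpop_freqs) / n  # mean within-subpopulation
    if h_total == 0.0:
        return math.nan
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    for freqs in ([0.5, 0.5], [0.2, 0.8], [0.0, 1.0]):
        print(freqs, round(fst_biallelic(freqs), 3))
```

Identical frequencies in every subpopulation give a value of zero, and subpopulations fixed for alternative alleles give a value of one.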

Imputation

A statistical method for handling missing data in which the missing values are replaced by estimated values.

Recombinant inbred lines

(RILs). A population of fully homozygous individuals that is obtained through the repeated selfing of F1 hybrids, and that comprises, on average, 50% of each original parental genome in different combinations.
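
The reason repeated selfing leads to homozygosity can be shown with simple arithmetic: each generation of selfing is expected to halve the proportion of heterozygous loci. The sketch below assumes fully inbred parents (so the F1 is heterozygous at every locus that differs between them) and no selection; both function names are hypothetical.

```python
def expected_heterozygosity(generations_of_selfing):
    """Expected fraction of initially heterozygous loci still heterozygous.

    Generation 0 is the F1 (fraction 1.0); each selfing halves the fraction.
    """
    if generations_of_selfing < 0:
        raise ValueError("number of generations cannot be negative")
    return 0.5 ** generations_of_selfing


def generations_needed(threshold):
    """Smallest number of selfing generations with heterozygosity <= threshold."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    generations = 0
    while expected_heterozygosity(generations) > threshold:
        generations += 1
    return generations


if __name__ == "__main__":
    for n in range(0, 9, 2):
        print(n, expected_heterozygosity(n))
    print("generations to reach <= 1%:", generations_needed(0.01))
```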

Hidden Markov model

A statistical approach that is used to estimate a series of hidden states (for example, ancestry at loci along a chromosome). The method is based on observations of the states that have uncertainty (for example, the ancestral assignment of sequence reads) and the expected probability of transitions between states (for example, recombination breakpoints).
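
A minimal sketch of this idea, using the standard Viterbi algorithm to recover the most probable ancestry path along a chromosome, is given below. It is not the implementation used by any of the software packages listed on this page. The two ancestries ("A" and "B"), the 10% switching probability between adjacent markers and the 10% read-assignment error rate are all illustrative assumptions.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path (computed in log space)."""
    if not observations:
        return []
    # best[t][s]: log-probability of the best path that ends in state s at marker t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: best predecessor of state s at marker t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((r, best[-1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Illustrative parameters (assumptions, not estimates from real data).
STATES = ("A", "B")
START = {"A": 0.5, "B": 0.5}
TRANS = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
EMIT = {"A": {"a": 0.9, "b": 0.1}, "B": {"a": 0.1, "b": 0.9}}

if __name__ == "__main__":
    reads = "aaabaabbbb"  # per-marker read assignments; the lone 'b' mimics an error
    print("".join(viterbi(reads, STATES, START, TRANS, EMIT)))
```

With these parameters, the isolated "b" in the fourth position is treated as a misassigned read rather than as a pair of recombination breakpoints, because two state switches are less probable than one emission error.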

Soft ancestry calls

Assigning probabilities to ancestral (for example, parental or grandparental) genotypes, rather than making explicit, 'hard' calls. This approach appropriately propagates uncertainty (which often arises around recombination breakpoints) in individual ancestry assignments, thus enabling a more accurate inference of breakpoint location.
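
Soft calls of this kind can be obtained from a hidden Markov model with the standard forward-backward algorithm, which returns a posterior probability for each ancestry at each marker instead of a single best path. The sketch below is a generic illustration rather than the procedure of any particular package; the two ancestries, the transition probabilities and the read-assignment error rate are assumed values.

```python
def posterior_ancestry(observations, states, start_p, trans_p, emit_p):
    """Posterior probability of each hidden state at each marker."""
    n = len(observations)
    if n == 0:
        return []

    # Forward pass, rescaled at each marker to avoid numerical underflow.
    forward = []
    for t, obs in enumerate(observations):
        row = {}
        for s in states:
            if t == 0:
                prior = start_p[s]
            else:
                prior = sum(forward[-1][r] * trans_p[r][s] for r in states)
            row[s] = prior * emit_p[s][obs]
        total = sum(row.values())
        forward.append({s: v / total for s, v in row.items()})

    # Backward pass, also rescaled.
    backward = [None] * n
    backward[-1] = {s: 1.0 for s in states}
    for t in range(n - 2, -1, -1):
        nxt = observations[t + 1]
        row = {s: sum(trans_p[s][r] * emit_p[r][nxt] * backward[t + 1][r]
                      for r in states)
               for s in states}
        total = sum(row.values())
        backward[t] = {s: v / total for s, v in row.items()}

    # Combine and normalise per marker; the scaling constants cancel here.
    posteriors = []
    for f, b in zip(forward, backward):
        unnormalised = {s: f[s] * b[s] for s in states}
        total = sum(unnormalised.values())
        posteriors.append({s: v / total for s, v in unnormalised.items()})
    return posteriors


# Illustrative parameters (assumptions, not estimates from real data).
STATES = ("A", "B")
START = {"A": 0.5, "B": 0.5}
TRANS = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
EMIT = {"A": {"a": 0.9, "b": 0.1}, "B": {"a": 0.1, "b": 0.9}}

if __name__ == "__main__":
    for marker, post in enumerate(posterior_ancestry("aaaaabbbbb", STATES, START, TRANS, EMIT)):
        print(marker, f"P(A) = {post['A']:.3f}")
```

The posterior probabilities are closest to 0.5 at the markers flanking the breakpoint, which is how the uncertainty in breakpoint location is carried forward into downstream analyses.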

Scaffold

A genomic unit composed of one or more contigs that have been ordered and orientated using end-read information.

Sliding window averaging

The averaging of statistics, such as nucleotide diversity or FST, for all markers in a chosen size of overlapping genomic region (window). When applied across the genome, this method smoothes out variation within regions so that genome-wide patterns can be observed.
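
The procedure can be sketched in a few lines. The version below is a plain unweighted mean over fixed-width, overlapping windows; published analyses often use kernel-weighted averages instead, so this should be read as an illustration under that simplifying assumption. The function name, window size and step are hypothetical choices.

```python
def sliding_window_mean(positions, values, window_size, step):
    """Average per-marker statistics in overlapping windows along a chromosome.

    Windows are half-open intervals [start, start + window_size) that advance
    by `step`; windows containing no markers are skipped.
    Returns a list of (window_start, window_end, mean_value) tuples.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    if len(positions) != len(values):
        raise ValueError("positions and values must have the same length")

    markers = sorted(zip(positions, values))
    if not markers:
        return []

    results = []
    last_position = markers[-1][0]
    start = 0
    while start <= last_position:
        end = start + window_size
        in_window = [v for p, v in markers if start <= p < end]
        if in_window:
            results.append((start, end, sum(in_window) / len(in_window)))
        start += step
    return results


if __name__ == "__main__":
    marker_positions = [100, 250, 400, 900]  # base-pair coordinates (toy data)
    marker_fst = [0.1, 0.3, 0.2, 0.8]        # per-marker statistic (toy data)
    for window in sliding_window_mean(marker_positions, marker_fst, 500, 250):
        print(window)
```

Choosing a step smaller than the window size is what makes the windows overlap; a larger window smooths more strongly at the cost of resolution.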

lod score

(Base 10 'logarithm of the odds' or 'log-odds'). A statistical estimate of whether two loci are likely to lie near each other on a chromosome and are therefore likely to be inherited together. A lod score of three or more is generally considered to indicate that the two loci are close.
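
For the simplest case of phase-known meioses, the score can be written as the base 10 logarithm of the likelihood of the data at a recombination fraction θ divided by the likelihood under free recombination (θ = 0.5). The sketch below implements that textbook two-point form only; the counts in the example are invented, and the function name is hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point lod score for phase-known meioses.

    LOD(theta) = log10( theta^R * (1 - theta)^(N - R) / 0.5^N ),
    where R is the number of recombinants among N informative meioses.
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")

    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


if __name__ == "__main__":
    # Toy example: 1 recombinant observed in 20 informative meioses.
    for theta in (0.05, 0.1, 0.2, 0.5):
        print(theta, round(lod_score(1, 20, theta), 2))
```

In this toy example the score at θ = 0.05 exceeds the conventional threshold of three, whereas at θ = 0.5 it is zero by construction.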

Major histocompatibility complex

(MHC). A complex locus on human chromosome 6p, which comprises numerous genes, including the human leukocyte antigen genes, which are involved in the immune response. MHC molecules bind peptide fragments that are derived from pathogens and display them on the cell surface for recognition by the appropriate T cells. The organizations of the MHC gene clusters are similar in many species.

Solid-phase reversible immobilization

(SPRI). The purification of nucleic acids using magnetic beads, thus avoiding gel extraction, filtration and centrifugation.

About this article

Cite this article

Davey, J., Hohenlohe, P., Etter, P. et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 12, 499–510 (2011). https://doi.org/10.1038/nrg3012
