Elsevier

Gene

Volume 234, Issue 2, 8 July 1999, Pages 177-186
Review
The essence of SNPs

https://doi.org/10.1016/S0378-1119(99)00219-X

Abstract

Single nucleotide polymorphisms (SNPs) are an abundant form of genome variation, distinguished from rare variations by a requirement for the least abundant allele to have a frequency of 1% or more. A wide range of genetics disciplines stand to benefit greatly from the study and use of SNPs. The recent surge of interest in SNPs stems from, and continues to depend upon, the merging and coincident maturation of several research areas, i.e. (i) large-scale genome analysis and related technologies, (ii) bio-informatics and computing, (iii) genetic analysis of simple and complex disease states, and (iv) global human population genetics. These fields will now be propelled forward, often into uncharted territories, by ongoing discovery efforts that promise to yield hundreds of thousands of human SNPs in the next few years. Major questions are now being asked, experimentally, theoretically and ethically, about the most effective ways to unlock the full potential of the upcoming SNP revolution.

Introduction

The Human Genome Project (HGP) is progressing rapidly, with over one million partial cDNA sequences and approximately 10% of a ‘reference’ genomic sequence now in public databases. With this advance has come an appreciation of the need to also study naturally occurring sequence variations, i.e. to understand human DNA polymorphism, about 90% of which is single nucleotide polymorphism (SNP) (Collins et al., 1998). Significant efforts towards large-scale characterisation of human SNPs have been initiated in the last year or so, a somewhat late stage given that almost two decades ago the original incarnation of SNPs [as restriction fragment length polymorphisms (RFLPs)] clearly indicated the existence of widespread subtle genome variation. Now, the renewed and extensive interest in genome polymorphism signifies a development in human genetics research that will have a major impact upon population genetics, drug development, forensics, cancer and genetic disease research. One consequence of all this activity is that the acronym ‘SNP’ (pronounced ‘S’ ‘N’ ‘P’ or ‘SNiP’) has appeared in many diverse articles and reviews, leading many to ponder “what are SNPs and why all the fuss?”. This review is an attempt to answer these questions.

We can start with a working definition — SNPs are single base pair positions in genomic DNA at which different sequence alternatives (alleles) exist in normal individuals in some population(s), wherein the least frequent allele has an abundance of 1% or greater. Thus, single base insertion/deletion variants (indels) would not formally be considered to be SNPs. In principle, SNPs could be bi-, tri-, or tetra-allelic polymorphisms. However, in humans, tri-allelic and tetra-allelic SNPs are rare almost to the point of non-existence, and so SNPs are sometimes simply referred to as bi-allelic markers (or di-allelic to be etymologically correct). This is somewhat misleading because SNPs are only a subset of all possible bi-allelic polymorphisms (e.g. indels, multiple base variations).
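
The working definition above can be written as a simple decision rule. The Python sketch below is an illustration only: the function name, labels and input format (one allele string per sampled chromosome, with "-" or a multi-base string marking an indel) are hypothetical. It applies just the single-base and 1% criteria. It cannot check the 'normal individuals' or 'some population(s)' parts of the definition, and the frequency it uses is an estimate from the sample rather than the true population value.

```python
from collections import Counter
from typing import Iterable

SNP_MAF_THRESHOLD = 0.01  # working definition: least frequent allele >= 1%
BASES = frozenset("ACGT")

def classify_site(alleles: Iterable[str]) -> str:
    """Classify one site from the alleles observed on sampled chromosomes.

    Each element is the allele carried by one chromosome, e.g. "C" or "T".
    A "-" or multi-base allele marks an indel, which is not formally a SNP.
    Works for bi-, tri- or tetra-allelic sites: the least frequent allele decides.
    """
    counts = Counter(alleles)
    if not counts:
        raise ValueError("no chromosomes observed")
    if any(allele not in BASES for allele in counts):
        return "not a SNP (indel or multi-base variant)"
    if len(counts) == 1:
        return "monomorphic in this sample"
    maf = min(counts.values()) / sum(counts.values())  # sample estimate
    return "SNP" if maf >= SNP_MAF_THRESHOLD else "rare variant (<1%)"
```

For example, `classify_site(["C"] * 95 + ["T"] * 5)` gives a minor allele frequency of 5% and is labelled a SNP, whereas 5 T alleles among 1000 chromosomes (0.5%) would count only as a rare variant.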

In practice, the term SNP is typically used more loosely than the above definition requires. Single base variants in cDNAs (cSNPs) are usually classed as SNPs since most of these will reflect underlying genomic DNA variants. This, however, ignores the possibility that they may be the result of RNA editing. Genomic DNA indels involving single or multiple bases are commonly discovered in SNP search efforts and so can become deposited in SNP lists and databases. In a similar way, such data-sets also contain SNP variants with an allele frequency of less than 1%. The above definition also has complications of its own. Specifically, some people might not want to consider disease-predisposing single base variants to be SNPs, but the above definition would encompass recessively acting, low-penetrance dominant, quantitative trait locus (QTL) or risk-associated alleles, since all of these will occur in some normal (non-diseased) individuals. Also, the ‘some population’ component of the definition is limited by the practical challenges of obtaining and surveying representative global population samples. Consequently, claims that a sequence is non-polymorphic should always be accompanied by statements of the actual populations and the numbers of chromosomes tested. Overall, it is therefore apparent that the term ‘SNP’ is being widely and imprecisely used as a catch-all label for many different types of subtle sequence variation. To maintain clarity within this review, I shall restrict myself to the SNP definition given above. I shall also use the term polymorphism consistently and correctly to refer to the set of alleles at a locus, rather than to any one allele alone.
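
The number of chromosomes tested matters because a rare allele is easily missed. Suppose, as an idealisation, that chromosomes are sampled independently and at random from a single population (relatives and population structure both violate this). Then the chance of seeing at least one copy of an allele of frequency p among n chromosomes is 1 - (1 - p)^n. The sketch below, whose function names are hypothetical, uses this formula to find the smallest adequate sample. At the 1% threshold and 95% confidence it gives 299 chromosomes, roughly 150 diploid individuals. Seeing a single copy only detects the allele and does not by itself establish its frequency.

```python
import math

def prob_allele_seen(freq: float, n_chromosomes: int) -> float:
    """P(at least one copy among n random chromosomes) = 1 - (1 - freq)**n."""
    return 1.0 - (1.0 - freq) ** n_chromosomes

def chromosomes_needed(freq: float, confidence: float = 0.95) -> int:
    """Smallest n such that prob_allele_seen(freq, n) >= confidence."""
    if not (0.0 < freq < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("freq and confidence must lie strictly between 0 and 1")
    # Closed-form estimate: n >= log(1 - confidence) / log(1 - freq)
    n = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))
    # Nudge n to the exact minimum in case of floating-point rounding
    while prob_allele_seen(freq, n) < confidence:
        n += 1
    while n > 1 and prob_allele_seen(freq, n - 1) >= confidence:
        n -= 1
    return n

if __name__ == "__main__":
    print(chromosomes_needed(0.01))  # 299
```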

Section snippets

SNP basics

Bi-allelic SNPs comprise four distinct types. If X⇔Y (X1⇔Y1) denotes allelic nucleotides X and Y of an SNP on one DNA strand, with their base-pairing nucleotides X1 and Y1 on the second strand shown in parentheses, then the four SNP alternatives are one transition, C⇔T (G⇔A), and three transversions, C⇔A (G⇔T), C⇔G (G⇔C) and T⇔A (A⇔T). This four-way classification is valid if one considers each DNA strand to be equivalent, so C⇔T (G⇔A) is an identical ‘mirror image’
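
The strand-equivalence argument can be made concrete with a short Python sketch. The function and table names are hypothetical and do not come from the review. It maps any ordered pair of different bases to one of the four classes by also reading the pair on the complementary strand. It also tests whether the change is a transition, meaning purine to purine or pyrimidine to pyrimidine.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = frozenset("AG")

# The four strand-independent classes, labelled as in the text: X⇔Y (X1⇔Y1)
SNP_CLASSES = {
    frozenset("CT"): "C⇔T (G⇔A)",  # the single transition class
    frozenset("CA"): "C⇔A (G⇔T)",
    frozenset("CG"): "C⇔G (G⇔C)",
    frozenset("TA"): "T⇔A (A⇔T)",
}

def snp_class(x: str, y: str) -> str:
    """Map alleles X/Y, read on either strand, to one of the four classes."""
    x, y = x.upper(), y.upper()
    if x not in COMPLEMENT or y not in COMPLEMENT or x == y:
        raise ValueError("need two different bases from A, C, G, T")
    this_strand = frozenset((x, y))
    other_strand = frozenset((COMPLEMENT[x], COMPLEMENT[y]))
    # Exactly one strand's view is a key, except C/G and T/A pairs,
    # which read as the same allele pair on both strands.
    return SNP_CLASSES.get(this_strand) or SNP_CLASSES[other_strand]

def is_transition(x: str, y: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine (assumes x != y)."""
    return (x.upper() in PURINES) == (y.upper() in PURINES)

if __name__ == "__main__":
    bases = "ACGT"
    classes = {snp_class(x, y) for x in bases for y in bases if x != y}
    print(len(classes))         # 4
    print(snp_class("G", "A"))  # C⇔T (G⇔A)
```

All twelve ordered base pairs collapse to four classes, matching the count given in the text.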

SNP discovery and scoring

Significant efforts towards large-scale SNP discovery have now begun, in what started as something of a hectic race between industry and academia. Both camps appreciate the functional importance and practical utility of SNPs, and whilst the former is keen to secure intellectual property protection on them, the latter would generally like them to be available to all as a research tool. With so many SNPs out there to be gathered and no real indication as to which will be the most useful (with the

Population genetics and linkage disequilibrium

Population genetics is the study of the genetic composition and inter-relationships between populations. The major research tool it uses is DNA polymorphism. Unfortunately, population genetics and human molecular genetics have in some ways been running along parallel research paths, with much population genetics effort over the last few decades being directed towards non-human organisms. With the new SNP era, these fields are beginning to interact far more closely. Population genetics
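
The snippet breaks off before linkage disequilibrium (LD) is defined. For orientation, the Python sketch below computes the standard pairwise measures for two bi-allelic SNPs: D = p_AB - p_A p_B, Lewontin's |D'| (D scaled by its largest possible value given the allele frequencies) and r². It is not presented as the review's own analysis, and the function name is hypothetical. It also assumes phased two-locus haplotypes are available, which genotype data alone do not provide.

```python
from typing import Sequence, Tuple

def pairwise_ld(haplotypes: Sequence[Tuple[str, str]], a: str, b: str):
    """Return (D, |D'|, r^2) for two bi-allelic SNPs.

    haplotypes: phased two-locus haplotypes, one per chromosome, e.g. ("C", "G").
    a, b: the allele counted at locus 1 and at locus 2, respectively.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic in the sample")
    r2 = d * d / denom
    # Largest |D| attainable with these allele frequencies, for the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, r2
```

With only two haplotypes present (C-G and T-A in equal numbers) the sketch returns D = 0.25, |D'| = 1 and r² = 1. If all four haplotypes are equally common, all three measures are 0.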

Complex phenotypes and genome variation

The myriad human phenotypic variations one might wish to study are likely to be caused by genetic and non-genetic (environmental) factors, by interplay between the two, and even by a sprinkling of chance events. Clearly, many clinical phenotypes do seem to have a considerable genetic component. The relevant underlying genetic factors will be encoded in the spectrum of genomic variation, which consists primarily of SNPs. Thus, risks of major common diseases such as cancer, cardiovascular

SNP based association studies

If a factor contributes an increased risk of disease, then that factor should be found at higher frequency in individuals with that disease than in non-diseased controls, i.e. it should be associated with the phenotype. A non-genetic example is smoking, which is associated with lung cancer (Vial, 1986), and a good genetic example is the ε4 allele of the apolipoprotein E gene (APOE4), which is associated with Alzheimer's disease (Strittmatter and Roses, 1995). In common diseases
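
For a SNP, the simplest form of this comparison is a 2 × 2 table of allele counts in cases versus controls. The Python sketch below computes the odds ratio and a 1-degree-of-freedom Pearson chi-square test for such a table. This is one common choice and not necessarily the method the review goes on to discuss. The function name is hypothetical, and the counts in the usage note are invented. The test assumes unrelated individuals drawn from the same population, since population stratification alone can produce spurious associations. The p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math

def allelic_association(case_risk: int, case_other: int,
                        ctrl_risk: int, ctrl_other: int):
    """Allele-count 2x2 association test.

    Counts are alleles (two per diploid person), not people.
    Returns (odds_ratio, chi_square, p_value) for a 1-df Pearson test.
    """
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in rows or 0 in cols:
        raise ValueError("each row and column needs at least one allele")
    # Pearson chi-square: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    denominator = case_other * ctrl_risk
    odds_ratio = (case_risk * ctrl_other) / denominator if denominator else math.inf
    return odds_ratio, chi2, p_value
```

With invented counts of 40 risk and 160 other alleles in cases and 20 and 180 in controls, the risk allele frequency is 20% versus 10%. Calling `allelic_association(40, 160, 20, 180)` then gives an odds ratio of 2.25 and a chi-square of about 7.84 (p ≈ 0.005).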

Conclusions

An SNP revolution has begun which promises to challenge and stimulate DNA technologists, population geneticists and molecular genetics researchers alike, and should bring them closer together than ever before. The field is new and important; consequently, much money is being spent, with some very different ideas about which initial experiments are best to perform. Industry is a major player, but joining forces with academia could be the most effective way to reach their goals, as

Acknowledgments

Thanks are given to J.D. Terwilliger for his critical reading of this manuscript. Input from members of our research group is recognised and appreciated. Generous financial support for research activities that brought the author into this field, provided by the Swedish Medical Research Council, Professor U. Pettersson and the Beijer Foundation, is gratefully acknowledged.

References (73)

  • J.D. Terwilliger et al.

    Linkage disequilibrium mapping of complex disease: fantasy or reality?

    Curr. Opin. Biotechnol.

    (1998)
  • S.A. Tishkoff et al.

    A global haplotype analysis of the myotonic dystrophy locus: implications for the evolution of modern humans and for the origin of myotonic dystrophy mutations

    Am. J. Human Genet.

    (1998)
  • G. Barbujani et al.

    An apportionment of human DNA diversity

    Proc. Natl. Acad. Sci. USA

    (1997)
  • K.H. Buetow et al.

    Reliable identification of large numbers of candidate SNPs from public EST data

    Nature Genet.

    (1999)
  • F.S. Collins

    Positional cloning: let's not call it reverse anymore

    Nature Genet.

    (1992)
  • F.S. Collins et al.

    A DNA polymorphism discovery resource for research on human genetic variation

    Genome Res.

    (1998)
  • J.F. Crow

    Spontaneous mutation as a risk factor

    Exp. Clin. Immunogenet.

    (1995)
  • S. Folstein et al.

    Genetic influences and infantile autism

    Nature

    (1977)
  • T.B. Friedman et al.

    A gene for congenital, recessive deafness DFNB3 maps to the pericentromeric region of chromosome 17

    Nature Genet.

    (1995)
  • M. Gatz et al.

    Heritability for Alzheimer's disease: the study of dementia in Swedish twins

    J. Gerontol.

    (1997)
  • A. Goate et al.

    Segregation of a missense mutation in the amyloid precursor protein gene with familial Alzheimer's disease

    Nature

    (1991)
  • D.A. Greenberg

    Linkage analysis of ‘necessary’ disease loci versus ‘susceptibility’ loci

    Am. J. Human Genet.

    (1993)
  • Z. Gu et al.

    Single nucleotide polymorphism hunting in cyberspace

    Human Mutat.

    (1998)
  • T. Guillaudeux et al.

    The complete genomic sequence of 424 015 bp at the centromeric end of the HLA class I region: gene content and polymorphism

    Proc. Natl. Acad. Sci. USA

    (1998)
  • J.G. Hacia et al.

    Applications of DNA chips for genomic analysis

    Mol. Psychiatry

    (1998)
  • M.F. Hammer et al.

    The role of the Y chromosome in human evolutionary studies

    Evol. Anthropol.

    (1996)
  • R.M. Harding et al.

    Archaic African and Asian lineages in the genetic ancestry of modern humans

    Am. J. Human Genet.

    (1997)
  • S.E. Hodge

    What association analysis can and cannot tell us about the genetics of complex disease

    Am. J. Med. Genet.

    (1994)
  • R.H. Houwen et al.

    Genome screening by searching for shared segments: mapping a gene for benign recurrent intrahepatic cholestasis

    Nature Genet.

    (1994)
  • W.M. Howell et al.

    Dynamic allele-specific hybridization. A new method for scoring single nucleotide polymorphisms

    Nature Biotechnol.

    (1999)
  • S. Ikuta et al.

    Dissociation kinetics of 19 base paired oligonucleotide–DNA duplexes containing different single mismatched base pairs

    Nucleic Acids Res.

    (1987)
  • L.B. Jorde et al.

    Linkage disequilibrium predicts physical distance in the adenomatous polyposis coli region

    Am. J. Human Genet.

    (1994)
  • B. Kerem et al.

    Identification of the cystic fibrosis gene: genetic analysis

    Science

    (1989)
  • M. Kestila et al.

    Congenital nephrotic syndrome of the Finnish type maps to the long arm of chromosome 19

    Am. J. Human Genet.

    (1994)
  • M.J. Khoury

    Case-parental control method in the search for disease-susceptibility genes

    Am. J. Human Genet.

    (1994)
  • K.K. Kidd et al.

    A global survey of haplotype frequencies and linkage disequilibrium at the DRD2 locus

    Human Genet.

    (1998)