  • Review Article
Assessing the function of genetic variants in candidate gene association studies

Key Points

  • A large proportion of genomic variation might be associated with human disease phenotypes. New approaches should improve our understanding of this variation and its functional significance.

  • Because there is insufficient guidance for molecular epidemiologists to optimally select variants for an epidemiology study, methods that prioritize the choice of genetic variants need to be included in molecular epidemiological studies.

  • Laboratory-based studies can provide the strongest evidence for the functional role of a genetic variant, but they are difficult to mount on the scale that may be required to characterize all human genetic variants, and their results might not always reflect in vivo genotype function in humans.

  • Novel approaches to assessing the function of genetic variants are required to give molecular epidemiologists the information needed to choose candidate genes, and variants in these genes, for association studies, and to optimally interpret observed associations.

  • Novel experimental approaches that might be informative include the HaploCHIP method, gene tagging, gene trapping, N-ethyl-N-nitrosourea (ENU) mutagenesis, proteomics methods and evaluation of epigenetic mechanisms that assess genotype function.

  • Non-laboratory-based approaches for assessing SNP function should also be considered. These approaches include those that use evolutionary similarity or structural effects of genetic variants, such as those implemented in the SIFT, PolyPhen or CODDLE algorithms.

  • Population and evolutionary genetics data can be directly incorporated into association studies to optimize the identification of genes that are causally related to disease risk. The 'set association' approach is one example of this method.

  • We propose an algorithm that can be useful in determining when a genetic variant might be functionally significant. This algorithm can be applied in the design and interpretation of molecular epidemiological association studies to maximize the potential that truly causative associations can be identified.

Abstract

Knowledge of inherited genetic variation has a fundamental impact on understanding human disease. Unfortunately, our understanding of the functional significance of many inherited genetic variants is limited. New approaches to assessing functional significance of inherited genetic variation, which combine molecular genetics, epidemiology and bioinformatics, promise to enhance reproducibility and plausibility of associations between genotypes and disease.

Figure 1: In vitro studies of the effect of CYP3A4*1B compared with CYP3A4*1A.
Figure 2: Relationship of in silico indices of SNP function and associations (measured by log2-transformed odds ratios) taken from the literature.

Acknowledgements

Some of the work discussed in this review was supported by grants from the Public Health Service and the University of Pennsylvania Cancer Center.

Author information

Corresponding author

Correspondence to Timothy R. Rebbeck.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

DATABASES

Entrez Gene

CYP3A4

TNF

FURTHER INFORMATION

CODDLE

PolyPhen

SIFT

Glossary

LINKAGE DISEQUILIBRIUM

The observation that two or more alleles, usually at loci that are physically close together on a chromosome, are not inherited independently but are observed to occur together more frequently than predicted under Mendel's law of independent assortment.
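As an illustration of this definition, linkage disequilibrium between two biallelic loci is commonly summarized by the coefficient D (the observed haplotype frequency minus the frequency expected under independence) and by r². The following is a minimal sketch; the function name and the example frequencies are hypothetical and are not taken from the article.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # D is zero when the alleles occur together exactly as often as
    # expected if they were inherited independently.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Hypothetical frequencies: the A-B haplotype is seen more often (0.40)
# than independence predicts (0.5 * 0.5 = 0.25).
d, r2 = linkage_disequilibrium(p_ab=0.40, p_a=0.5, p_b=0.5)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

A value of D close to zero suggests that the two alleles are inherited approximately independently in the sampled population.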

NIFEDIPINE

A calcium-channel blocker (also called Procardia) that was one of the first drugs recognized to be metabolized by CYP3A4, and after which a regulatory element specific to the CYP3A4 gene was named.

MENARCHE

The first occurrence of menstruation in a woman.

HARDY-WEINBERG PROPORTIONS

The binomial distribution of genotypes (that is, the frequencies of genotypes AA, Aa and aa are p², 2pq and q², respectively, where p is the frequency of allele A and q is the frequency of allele a) that results in a population when no external pressures cause deviations from p², 2pq and q².
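The proportions in this definition can be computed directly from an allele frequency. The sketch below assumes a single biallelic locus with q = 1 − p; the function name and the example frequency are hypothetical.

```python
def hardy_weinberg_proportions(p: float) -> dict[str, float]:
    """Expected genotype frequencies for a biallelic locus, given the
    frequency p of allele A (so that q = 1 - p is the frequency of a)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# Hypothetical allele frequency of 0.7 for allele A.
proportions = hardy_weinberg_proportions(0.7)
for genotype, freq in proportions.items():
    print(f"{genotype}: {freq:.2f}")
```

The three proportions always sum to 1, because p² + 2pq + q² = (p + q)².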

TEST STATISTIC

A quantity whose value is used to decide whether or not the null hypothesis should be rejected, usually based on quantities computed using observed data.
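One familiar example of a test statistic is the Pearson chi-square statistic for deviation from Hardy-Weinberg proportions, which compares observed genotype counts with the counts expected from the estimated allele frequency. This is a minimal sketch under that assumption; the function name and the genotype counts are hypothetical, and the threshold 3.841 is the standard 5% critical value of the chi-square distribution with one degree of freedom.

```python
def hwe_chi_square(n_hom_a: int, n_het: int, n_hom_b: int) -> float:
    """Pearson chi-square statistic comparing observed genotype counts
    (AA, Aa, aa) with Hardy-Weinberg expectations (1 degree of freedom)."""
    n = n_hom_a + n_het + n_hom_b
    if n == 0:
        raise ValueError("no genotypes observed")
    # Estimate the frequency of allele A from the observed counts.
    p = (2 * n_hom_a + n_het) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the statistic is undefined")
    observed = (n_hom_a, n_het, n_hom_b)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_VALUE_5_PERCENT = 3.841  # chi-square, 1 degree of freedom

# Hypothetical genotype counts from 1,000 individuals.
chi2 = hwe_chi_square(298, 489, 213)
reject = chi2 > CRITICAL_VALUE_5_PERCENT
print(f"chi-square = {chi2:.3f}; reject null hypothesis at 5%: {reject}")
```

The null hypothesis here is that genotypes are in Hardy-Weinberg proportions; it is rejected only when the statistic exceeds the chosen critical value.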

RESTENOSIS

The constriction, narrowing or blockage of a coronary artery after an initial treatment such as angioplasty aimed at removing this blockage.

ANGIOPLASTY

An operation that is used to repair a damaged blood vessel or unblock a coronary artery.

ADMIXTURE

Combining two or more populations into a single group. Combining two populations has implications for studies of genotype–disease associations if the component populations have different genotypic distributions.
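The implication for association studies can be illustrated numerically. In the sketch below, two hypothetical populations each show no genotype–disease association (odds ratio of 1 within each), but they differ in both carrier frequency and case-to-control ratio, so pooling them produces an apparent association. All counts are invented for illustration only.

```python
def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int) -> float:
    """Odds ratio from the four cells of a 2x2 case-control table."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


# Hypothetical counts: (case carriers, case non-carriers,
#                       control carriers, control non-carriers)
population_1 = (20, 180, 10, 90)    # 10% carriers, twice as many cases as controls
population_2 = (50, 50, 100, 100)   # 50% carriers, twice as many controls as cases

# Pool the two populations cell by cell, ignoring ancestry.
pooled = tuple(a + b for a, b in zip(population_1, population_2))

or_1 = odds_ratio(*population_1)
or_2 = odds_ratio(*population_2)
or_pooled = odds_ratio(*pooled)
print(f"population 1: {or_1:.2f}, population 2: {or_2:.2f}, pooled: {or_pooled:.2f}")
```

In this constructed example the pooled odds ratio differs from 1 even though neither component population shows an association, which is why the genotypic distributions of the component populations matter.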

CELL CYCLE CHECKPOINTS

Steps in the normal sequence of development and division in the cell. Disruption can lead to uncontrolled cell growth, and possibly cancer.

ODDS RATIO

A measure of relative risk that is usually estimated from case-control studies.
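As a worked illustration, the odds ratio from a 2×2 case-control table is the cross-product ratio of its cells, and a log2 transformation (the scale used in the caption of Figure 2) places increased and decreased risk symmetrically around zero. The function name and the counts below are hypothetical.

```python
import math


def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Cross-product odds ratio from a 2x2 case-control table."""
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical study: 40 of 100 cases and 25 of 100 controls carry the variant.
or_value = odds_ratio(exposed_cases=40, unexposed_cases=60,
                      exposed_controls=25, unexposed_controls=75)
log2_or = math.log2(or_value)
print(f"odds ratio = {or_value:.2f}, log2(odds ratio) = {log2_or:.2f}")
```

On the log2 scale, an odds ratio of 2 maps to +1 and an odds ratio of 0.5 maps to −1, whereas no association maps to 0.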

TYPE I ERROR

Incorrectly rejecting a null hypothesis when the null hypothesis is true; a false positive. The probability of this error is the false-positive rate.

About this article

Cite this article

Rebbeck, T., Spitz, M. & Wu, X. Assessing the function of genetic variants in candidate gene association studies. Nat Rev Genet 5, 589–597 (2004). https://doi.org/10.1038/nrg1403

  • DOI: https://doi.org/10.1038/nrg1403
