  • Perspectives

Predicting genetic predisposition in humans: the promise of whole-genome markers

Abstract

Although genome-wide association studies have identified markers that are associated with various human traits and diseases, our ability to predict such phenotypes remains limited. A perhaps overlooked explanation lies in the limitations of the genetic models and statistical techniques commonly used in association studies. We propose that alternative approaches, largely borrowed from animal breeding, offer the potential to improve prediction. We review selected methods and discuss the challenges and opportunities ahead.

References

  1. Guttmacher, A. E. & Collins, F. S. Genomic medicine — a primer. N. Engl. J. Med. 347, 1512–1520 (2002).

  2. Dominiczak, A. F. & McBride, M. W. Genetics of common polygenic stroke. Nature Genet. 35, 116–117 (2003).

  3. Maher, B. Personal genomes: the case of the missing heritability. Nature 456, 18–21 (2008).

  4. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).

  5. Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4, e1000008 (2008).

  6. Lander, E. S. & Schork, N. J. Genetic dissection of complex traits. Science 265, 2037–2048 (1994).

  7. Goddard, M. E. & Hayes, B. J. Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nature Rev. Genet. 10, 381–391 (2009).

  8. Falconer, D. S. & Mackay, T. F. C. Introduction to Quantitative Genetics 4th edn (Longman, Harlow, UK, 1996).

  9. Hill, W. G. Understanding and using quantitative genetic variation. Philos. Trans. R. Soc. Lond. B 365, 73–85 (2010).

  10. Fisher, R. A. The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edinb. Earth Sci. 52, 399–433 (1918).

  11. Wright, S. Systems of mating. Parts I.–V. Genetics 6, 111–178 (1921).

  12. Henderson, C. R. Estimation of genetic parameters. Ann. Math. Stat. 21, 309–310 (1950).

  13. Henderson, C. R. Best linear unbiased estimation and prediction under a selection model. Biometrics 31, 423–447 (1975).

  14. Meuwissen, T. H., Hayes, B. J. & Goddard, M. E. Prediction of total genetic values using genome-wide dense marker maps. Genetics 157, 1819–1829 (2001).

  15. Habier, D., Fernando, R. L. & Dekkers, J. C. M. The impact of genetic relationships information on genome-assisted breeding values. Genetics 177, 2389–2397 (2007).

  16. González-Recio, O. et al. Non-parametric methods for incorporating genomic information into genetic evaluations: an application to mortality in broilers. Genetics 178, 2305–2313 (2008).

  17. VanRaden, P. M. et al. Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92, 16–24 (2009).

  18. Hayes, B. J., Bowman, P. J., Chamberlain, A. J. & Goddard, M. E. Genomic selection in dairy cattle: progress and challenges. J. Dairy Sci. 92, 433–443 (2009).

  19. de los Campos, G. et al. Predicting quantitative traits with regression models for dense molecular markers and pedigrees. Genetics 182, 375–385 (2009).

  20. Weigel, K. A. et al. Predictive ability of direct genomic values for lifetime net merit of Holstein sires using selected subsets of single nucleotide polymorphism markers. J. Dairy Sci. 92, 5248–5257 (2009).

  21. Vazquez, A. et al. Predictive ability of subsets of SNP with and without parent average in US Holsteins. J. Dairy Sci. 2010 (doi:10.3168/jds.2010-3335).

  22. Hoerl, A. E. & Kennard, R. W. Ridge regression: biased estimation for non-orthogonal problems. Technometrics 12, 55–67 (1970).

  23. Tibshirani, R. Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. Series B 58, 267–288 (1996).

  24. Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Series B 67, 301–320 (2005).

  25. Park, T. & Casella, G. The Bayesian LASSO. J. Am. Stat. Assoc. 103, 681–686 (2008).

  26. Wahba, G. Spline Models for Observational Data (Society for Industrial and Applied Mathematics, Philadelphia, 1990).

  27. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction 2nd edn (Springer-Verlag, New York, 2009).

  28. Gianola, D., Fernando, R. L. & Stella, A. Genomic-assisted prediction of genetic value with semiparametric procedures. Genetics 173, 1761–1776 (2006).

  29. Gianola, D. & van Kaam, J. B. Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits. Genetics 178, 2289–2303 (2008).

  30. Kimeldorf, G. S. & Wahba, G. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Ann. Math. Stat. 41, 495–502 (1970).

  31. de los Campos, G., Gianola, D. & Rosa, G. J. M. Reproducing kernel Hilbert spaces regression: a general framework for genetic evaluation. J. Anim. Sci. 87, 1883–1887 (2009).

  32. de los Campos, G., Gianola, D., Rosa, G. J. M., Weigel, K. & Crossa, J. Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces regressions. Genetics Res. 92, 295–308 (2010).

  33. Shawe-Taylor, J. & Cristianini, N. Kernel Methods for Pattern Analysis (Cambridge Univ. Press, UK, 2004).

  34. Schaid, D. J. Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations. Hum. Hered. 70, 109–131 (2010).

  35. Garrick, D. J. The nature, scope and impact of some whole-genome analyses in beef cattle in 9th World Congress on Genetics Applied to Livestock (Leipzig, Germany, 2010).

  36. Long, N. et al. Radial basis function regression methods for predicting quantitative traits using SNP markers. Genetics Res. 92, 209–225 (2010).

  37. Crossa, J. et al. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 2 Sep 2010 (doi:10.1534/genetics.110.118521).

  38. Piepho, H. P. Ridge regression and extensions for genomewide selection in maize. Crop Sci. 49, 1165–1176 (2009).

  39. Legarra, A., Robert-Granié, C., Manfredi, E. & Elsen, J. M. Performance of genomic selection in mice. Genetics 180, 611–618 (2008).

  40. Jannink, J. L., Lorenz, A. J. & Iwata, H. Genomic selection in plant breeding: from theory to practice. Brief. Funct. Genomics 9, 166–177 (2010).

  41. Goddard, M. E. Genomic selection: prediction of accuracy and maximization of long term response. Genetica 136, 245–257 (2009).

  42. Zhong, S., Dekkers, J. C., Fernando, R. L. & Jannink, J. L. Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: a barley case study. Genetics 182, 355–364 (2009).

  43. Gianola, D. Theory and analysis of threshold characters. J. Anim. Sci. 54, 1079–1096 (1982).

  44. Holzapfel, C. et al. Genes and lifestyle factors in obesity: results from 12462 subjects from MONICA/KORA. Int. J. Obes. 1–8 (2010).

  45. Seshadri, S. et al. Genome-wide analysis of genetic loci associated with Alzheimer disease. JAMA 303, 1832–1840 (2010).

  46. Valenzuela, R. K. et al. Predicting phenotype from genotype: normal pigmentation. J. Forensic Sci. 55, 315–322 (2010).

  47. Willer, C. J. et al. Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nature Genet. 41, 25–34 (2008).

  48. Zhao, J. et al. The role of obesity-associated loci identified in genome-wide association studies in the determination of pediatric BMI. Obesity 17, 2254–2257 (2009).

  49. van Hoek, M. et al. Predicting type 2 diabetes based on polymorphisms from genome-wide association studies: a population-based study. Diabetes 57, 3122–3128 (2008).

  50. Wray, N. R., Goddard, M. E. & Visscher, P. M. Prediction of individual genetic risk to diseases from genome-wide association studies. Genome Res. 17, 1520–1528 (2007).

  51. Purcell, S. M. et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).

  52. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nature Genet. 42, 565–569 (2010).

  53. Witten, D. M. & Tibshirani, R. Survival analysis with high-dimensional covariates. Stat. Methods Med. Res. 19, 29–51 (2010).

  54. Box, G. E. P. & Draper, N. R. Empirical Model-Building and Response Surfaces (Wiley, New York, 1987).

  55. Cockerham, C. C. An extension of the concept of partitioning hereditary variance for analysis of covariance among relatives when epistasis is present. Genetics 39, 859–882 (1954).

  56. Kempthorne, O. The correlation between relatives in a random mating population. Proc. R. Soc. Lond. B 143, 103–113 (1954).

  57. Lynch, M. & Ritland, K. Estimation of pairwise relatedness with molecular markers. Genetics 152, 1753–1766 (1999).

  58. Eding, J. H. & Meuwissen, T. H. Marker based estimates of between and within population kinships for the conservation of genetic diversity. J. Anim. Breed. Genet. 118, 141–159 (2001).

  59. Visscher, P. M. et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2, e41 (2006).

  60. Hayes, B. J. & Goddard, M. E. Prediction of breeding values using marker-derived relationship matrices. J. Anim. Sci. 86, 2089–2092 (2008).

  61. Feng, R., McClure, L. A., Tiwari, H. K. & Howard, G. A new estimate of family disease history providing improved prediction of disease risks. Stat. Med. 28, 1269–1283 (2009).

Acknowledgements

We are grateful to K. Grimes, A. Vazquez, Y. Klimentidis and S. Cofield for their helpful comments on this paper.

Ethics declarations

Competing interests

Gustavo de los Campos has served as a consultant to CIMMYT and Aviagen; both organizations work with genomic-enabled prediction of genetic values for plant and poultry breeding, respectively. Daniel Gianola serves on the International Scientific Advisory Board of Aviagen. David Allison has received numerous grants, consulting fees and donations from non-profit and for-profit entities, some of which may have interests in the genomic prediction of phenotypes.

Supplementary information

Supplementary information S1 (box)

Online Box: Probit Model (PDF 99 kb)

Related links

FURTHER INFORMATION

Gustavo de los Campos's homepage

dbGaP

Nature Reviews Genetics series on study designs

Nature Reviews Genetics series on Modelling

Nature Reviews Genetics series on Genome-wide association studies

Glossary

Bayesian estimation

Bayesian inferences are based on the posterior distribution of the unknowns given the data. Following Bayes' rule, this distribution is proportional to the product of the distribution of the data given the unknowns (the likelihood) and the prior distribution of the unknowns.
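
The proportionality in Bayes' rule can be checked numerically. The sketch below assumes a single unknown mean, a normal likelihood with known standard deviation and a normal prior. It evaluates likelihood times prior on a grid, normalizes the result and compares the grid posterior mean with the known closed-form answer for this conjugate case. The data and function names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    # Density of a normal distribution with the given mean and standard deviation
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def grid_posterior(data, sigma, prior_mean, prior_sd, grid):
    # Bayes' rule on a grid: posterior is proportional to likelihood times prior
    unnormalized = []
    for beta in grid:
        likelihood = 1.0
        for y in data:
            likelihood *= normal_pdf(y, beta, sigma)
        unnormalized.append(likelihood * normal_pdf(beta, prior_mean, prior_sd))
    total = sum(unnormalized)
    return [u / total for u in unnormalized]  # weights summing to 1

def conjugate_posterior_mean(data, sigma, prior_mean, prior_sd):
    # Closed-form posterior mean for a normal likelihood (known sigma) and a normal prior
    data_precision = len(data) / sigma ** 2
    prior_precision = 1.0 / prior_sd ** 2
    ybar = sum(data) / len(data)
    return (data_precision * ybar + prior_precision * prior_mean) / (data_precision + prior_precision)

data = [1.2, 0.8, 1.5, 1.1]  # hypothetical records
grid = [i / 1000 for i in range(-3000, 3001)]
posterior = grid_posterior(data, 1.0, 0.0, 1.0, grid)
grid_mean = sum(b * w for b, w in zip(grid, posterior))
exact_mean = conjugate_posterior_mean(data, 1.0, 0.0, 1.0)
print(round(grid_mean, 4), round(exact_mean, 4))
```

With these inputs the posterior mean is 0.92. It lies between the prior mean (0) and the sample mean (1.15).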

Basis function

In regression analysis, basis functions are functions of the predictors that are used to construct the regression function. Polynomials, exponentials and logarithms are examples of basis functions commonly used in parametric regression.
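
A regression function built from basis functions is a weighted sum f(x) = Σ_k β_k φ_k(x). This sum is linear in the coefficients even when the φ_k are not linear in x. The sketch below assumes a hypothetical basis (intercept, x, x² and log x) and shows how each record becomes one row of a design matrix.

```python
import math

# Hypothetical basis: intercept, x, x^2 and log(x); log requires x > 0
BASIS = [
    lambda x: 1.0,
    lambda x: x,
    lambda x: x ** 2,
    lambda x: math.log(x),
]

def design_row(x, basis=BASIS):
    # One row of the design matrix: every basis function evaluated at x
    return [phi(x) for phi in basis]

def regression_function(x, coefs, basis=BASIS):
    # f(x) = sum_k beta_k * phi_k(x)
    return sum(beta * phi(x) for beta, phi in zip(coefs, basis))

X = [design_row(x) for x in (1.0, 2.0, 4.0)]
print(X)
print(regression_function(2.0, [0.5, 1.0, 0.0, 2.0]))
```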

Censored phenotype

Censoring occurs when, for some individuals, the phenotypic information consists only of bounds and the actual phenotypic value is unknown. This is common in longevity studies in which some patients are still alive at the time of analysis, so only a lower bound on their lifespan is known.
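
Censored records still carry information through the likelihood. The sketch below assumes an exponential survival model, chosen for simplicity rather than taken from the text, and uses hypothetical data. An observed death contributes the log density. A patient still alive contributes only the log probability of surviving beyond the recorded time.

```python
import math

# Hypothetical records: (time, died). died=False means still alive at analysis (right-censored)
records = [(2.0, True), (3.5, False), (1.0, True), (4.0, False), (2.5, True)]

def exponential_loglik(rate, records):
    ll = 0.0
    for t, died in records:
        ll += -rate * t           # log survival beyond t: every record contributes this
        if died:
            ll += math.log(rate)  # observed deaths add log(rate), giving the log density
    return ll

def exponential_mle(records):
    # Maximizer of the censored log-likelihood: deaths / total follow-up time
    deaths = sum(1 for _, died in records if died)
    follow_up = sum(t for t, _ in records)
    return deaths / follow_up

rate_hat = exponential_mle(records)
# Ignoring censoring (treating every recorded time as a death) overstates the mortality rate
naive_rate = len(records) / sum(t for t, _ in records)
print(rate_hat, naive_rate)
```

Here the censored-data estimate is 3/13. Treating the censored times as deaths gives 5/13.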

Genomic medicine

The use of genome information in the prevention, diagnosis and treatment of disorders.

Goodness of fit

A measure of how well a model fits the data in a training sample. The log-likelihood and the R-squared statistic are commonly used measures of goodness of fit; the residual sum of squares is a commonly used measure of lack of fit.
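
Both kinds of measure are simple to compute from observed and fitted values. The sketch below uses hypothetical data, and the helper names are illustrative.

```python
def residual_sum_of_squares(observed, fitted):
    # Lack of fit: smaller is better
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))

def r_squared(observed, fitted):
    # Goodness of fit: 1 - RSS / total sum of squares about the mean
    mean = sum(observed) / len(observed)
    total_ss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - residual_sum_of_squares(observed, fitted) / total_ss

observed = [2.0, 4.0, 6.0, 8.0]  # hypothetical training records
fitted = [2.5, 3.5, 6.5, 7.5]    # hypothetical fitted values
print(residual_sum_of_squares(observed, fitted), r_squared(observed, fitted))
```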

LASSO

The Least Absolute Shrinkage and Selection Operator (ref. 23) is a penalized estimation method commonly used in regression. The penalty function in LASSO is the sum of the absolute values of the regression coefficients. Because this penalty can set some coefficients exactly to zero, LASSO performs variable selection and shrinkage simultaneously.
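
One common way to compute LASSO estimates is cyclic coordinate descent, in which each coefficient is updated in turn by soft-thresholding. This is not necessarily the algorithm used in ref. 23. The sketch below is a minimal version under stated assumptions: the objective is scaled as 0.5 × RSS + λ Σ|β_j|, no intercept is fitted, and the data and function names are hypothetical.

```python
def soft_threshold(z, gamma):
    # Moves z toward zero by gamma; returns exactly 0 when |z| <= gamma
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize 0.5 * sum_i (y_i - sum_j X_ij b_j)^2 + lam * sum_j |b_j|.

    X is a list of rows; no intercept is fitted and no column may be all zeros.
    Each coordinate update is an exact minimization, so the objective never increases.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Inner product of predictor j with the partial residual that excludes j
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            beta[j] = soft_threshold(rho, lam) / col_ss[j]
    return beta

# Hypothetical design with orthogonal columns
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [3.0, 1.0, 3.0, 1.0]
for lam in (0.0, 1.0, 3.0):
    print(lam, lasso_coordinate_descent(X, y, lam))
```

With λ = 3, the second coefficient is set exactly to zero (selection) and the first is shrunk from its OLS value of 3 to 1.5 (shrinkage).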

Objective function

The function whose value is minimized or maximized in an optimization problem.

Ordinary least squares

Ordinary least squares (OLS) estimates of the parameters in a regression model are obtained by minimizing the residual sum of squares of the regression.
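
For a single predictor with an intercept, minimizing the residual sum of squares has a closed-form solution: slope b = S_xy / S_xx and intercept a = ȳ − b x̄. The sketch below assumes noise-free hypothetical data, so OLS recovers the generating line (a = 1, b = 2) exactly.

```python
def ols_simple(x, y):
    # Closed-form minimizer of sum_i (y_i - a - b * x_i)^2
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]  # hypothetical, exactly y = 1 + 2x
a, b = ols_simple(x, y)
print(a, b)
```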

Over-fitting

A term used to describe the situation in which a model fits the training data well but fails to perform well when used to predict outcomes of a collection of subjects (testing data) that was not used to fit the model.
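
Over-fitting can be illustrated with a predictor that simply memorizes its training data. The sketch below applies a one-nearest-neighbour rule to simulated, hypothetical data. Its training error is zero by construction, yet its error on independent testing data is not.

```python
import random

random.seed(1)

def simulate(n):
    # Hypothetical data: linear signal plus Gaussian noise (sd = 2)
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [0.5 * x + random.gauss(0, 2) for x in xs]
    return xs, ys

def nearest_neighbour_predict(train_x, train_y, x):
    # Returns the response of the closest training point: the training data are memorized
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]

def mse(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

train_x, train_y = simulate(50)  # training data
test_x, test_y = simulate(50)    # testing data, not used to fit

train_mse = mse(train_y, [nearest_neighbour_predict(train_x, train_y, x) for x in train_x])
test_mse = mse(test_y, [nearest_neighbour_predict(train_x, train_y, x) for x in test_x])
print(train_mse, round(test_mse, 2))
```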

Parametric regression model

A regression model in which the regression function is set to have a known functional form (for example, a polynomial).

Penalized estimation

Penalized estimates are commonly used in situations in which the number of unknowns is large relative to the number of records. They are obtained by solving an optimization problem whose objective function balances a goodness-of-fit measure against a measure of model complexity (the penalty function).
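
Ridge regression (ref. 22) is a standard example of penalized estimation. Its objective is the residual sum of squares plus λ times the sum of squared coefficients, and its solution satisfies (X'X + λI)β = X'y. The pure-Python sketch below assumes small dense problems with no intercept. The helper names and data are hypothetical.

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A x = b
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge(X, y, lam):
    # Minimizes sum_i (y_i - sum_j X_ij b_j)^2 + lam * sum_j b_j^2 (no intercept)
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    for j in range(p):
        XtX[j][j] += lam  # the penalty adds lam to the diagonal
    return solve(XtX, Xty)

X = [[1, 0], [0, 1], [1, 1], [1, 2]]
y = [2.0, -1.0, 1.0, 0.0]  # hypothetical, generated without noise from b = (2, -1)
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge(X, y, lam))
```

As λ grows, the coefficients shrink toward zero. For λ > 0 the system remains solvable even when there are more unknowns than records. For example, ridge([[1, 2, 3]], [1.0], 1.0) returns approximately [1/15, 2/15, 3/15].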

Quantitative genetic theory

Genetic, mathematical and statistical models used to study traits that are affected by a large number of genes.

Regression model

A statistical model used to describe relationships (for example, a conditional mean) between a response variable and a set of predictors through a regression function involving some parameter(s) to be estimated from data.

Semi-parametric regression model

A regression model in which the regression function is not assumed to be a member of a parametric family.
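
Reproducing kernel Hilbert spaces (RKHS) regression (refs 28–32) is one such approach. The regression function is expressed as f(x) = Σ_i α_i K(x, x_i) for a kernel K, rather than being restricted to a fixed parametric form. The sketch below assumes a Gaussian kernel with a hypothetical bandwidth h and the penalized solution α = (K + λI)⁻¹y. The names and data are illustrative.

```python
import math

def gaussian_kernel(a, b, h=1.0):
    # Similarity between two predictor values; h is a hypothetical bandwidth
    return math.exp(-((a - b) ** 2) / h)

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A x = b
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def kernel_ridge_fit(xs, y, lam, h=1.0):
    # Penalized RKHS fit: alpha = (K + lam * I)^{-1} y
    n = len(xs)
    K = [[gaussian_kernel(xs[i], xs[j], h) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def kernel_ridge_predict(x, xs, alpha, h=1.0):
    # f(x) = sum_i alpha_i K(x, x_i)
    return sum(a * gaussian_kernel(x, xi, h) for a, xi in zip(alpha, xs))

xs = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 0.0, -1.0]  # hypothetical, non-linear pattern
for lam in (1e-6, 1.0, 1e6):
    alpha = kernel_ridge_fit(xs, y, lam)
    print(lam, [round(kernel_ridge_predict(x, xs, alpha), 3) for x in xs])
```

A small λ lets the fitted function pass almost exactly through the data. A large λ shrinks it toward zero.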

Shrinkage

In standard estimation methods (for example, maximum likelihood or OLS), estimates are obtained by optimizing a goodness-of-fit or lack-of-fit measure. Relative to these estimates, Bayesian and penalized estimates are shrunk towards some value (typically zero). This prevents over-fitting and, under certain conditions, may reduce the mean-squared error of estimates and predictions.
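
For a single predictor without an intercept, the connection between penalized and Bayesian shrinkage can be made explicit. The ridge estimate Σxy / (Σx² + λ) equals the posterior mean under a normal prior β ~ N(0, τ²) and normal residuals with variance σ² when λ = σ²/τ². The sketch below checks this numerically with hypothetical data.

```python
def ols_slope(x, y):
    # Unpenalized estimate: sum(xy) / sum(x^2)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def ridge_slope(x, y, lam):
    # Penalized estimate: minimizes sum((y - b x)^2) + lam * b^2
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def posterior_mean_slope(x, y, sigma2, tau2):
    # Bayesian estimate: prior b ~ N(0, tau2), residuals ~ N(0, sigma2)
    precision = sum(a * a for a in x) / sigma2 + 1.0 / tau2
    return (sum(a * b for a, b in zip(x, y)) / sigma2) / precision

x = [1.0, 2.0, 3.0]
y = [1.1, 1.9, 3.2]  # hypothetical data
sigma2, tau2 = 1.0, 0.5
print(ols_slope(x, y), ridge_slope(x, y, sigma2 / tau2), posterior_mean_slope(x, y, sigma2, tau2))
```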

Training data

The data set used to fit a model.

About this article

Cite this article

de los Campos, G., Gianola, D. & Allison, D. Predicting genetic predisposition in humans: the promise of whole-genome markers. Nat Rev Genet 11, 880–886 (2010). https://doi.org/10.1038/nrg2898
