Genes predict village of origin in rural Europe

O'Dushlaine, Colm; McQuillan, Ruth; Weale, Michael E; Crouch, Daniel J M; Johansson, Åsa; Aulchenko, Yurii; Franklin, Christopher S; Polašek, Ozren; Fuchsberger, Christian; Corvin, Aiden; Hicks, Andrew A; Vitart, Veronique; Hayward, Caroline; Wild, Sarah H; Meitinger, Thomas; van Duijn, Cornelia M; Gyllensten, Ulf; Wright, Alan F; Campbell, Harry; Pramstaller, Peter P; Rudan, Igor; Wilson, James F

doi:10.1038/ejhg.2010.92

Short Report
Published: 23 June 2010

European Journal of Human Genetics volume 18, pages 1269–1270 (2010)

Abstract

The genetic structure of human populations is important in population genetics, forensics and medicine. Using genome-wide scans and individuals with all four grandparents born in the same settlement, we here demonstrate remarkable geographical structure across 8–30 km in three different parts of rural Europe. After excluding close kin and inbreeding, village of origin could still be predicted correctly on the basis of genetic data for 89–100% of individuals.

Introduction

High-density genome-wide scans have revealed a considerable degree of geographical structure among populations across the globe.¹ Even within Europe, the continent with the lowest among-population genetic diversity, populations separated by <500 km, such as the English and Irish,² Italians from Lombardy and Tuscany,¹ Finns from neighbouring regions³ and Estonians from different counties,⁴ can be differentiated. However, it remains to be seen whether the populations of villages only a few kilometres apart can be distinguished.

Methods

Data are from Illumina Human Hap300 genome-wide scans (Illumina, San Diego, CA, USA). We made use of only the subset of each present-day population with all four grandparents from one location and this was reduced further when exclusions were made on the basis of kinship and inbreeding. First-, second- and third-degree relatives were removed, using genomic sharing estimates based on identity-by-state with a cutoff of 0.1 (in R). We also used more stringent thresholds until no more individuals remained in each subgroup. Principal component analysis (PCA) was performed using Eigensoft⁵ and model-based clustering using Frappe.⁶
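The relative-removal step can be illustrated with a minimal sketch. The text states only that identity-by-state sharing estimates were computed in R with a cutoff of 0.1; the estimator below (a standardised genomic relationship matrix, on which third-degree relatives are expected at roughly 0.125) and the greedy pruning rule are assumptions made for illustration, and the function names `genomic_relationship` and `prune_relatives` are hypothetical.

```python
import numpy as np

def genomic_relationship(genotypes):
    """Relationship matrix from a (individuals, SNPs) array of 0/1/2 allele counts.

    Assumed estimator: mean over SNPs of standardised genotype products.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # allele frequency per SNP
    keep = (p > 0) & (p < 1)                 # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ z.T / g.shape[1]

def prune_relatives(relationship, cutoff=0.1):
    """Greedily drop individuals until no remaining pair exceeds the cutoff.

    Returns the indices of the individuals that are kept.
    """
    k = np.array(relationship, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(k, 0.0)                 # ignore self-relationship
    related = k > cutoff
    kept = np.ones(len(k), dtype=bool)
    while True:
        # Number of still-kept relatives for each still-kept individual.
        counts = (related & kept[:, None] & kept[None, :]).sum(axis=1)
        if counts.max() == 0:
            break
        kept[counts.argmax()] = False        # remove the most-connected individual first
    return np.flatnonzero(kept)
```

Removing the individual with the most relatives first tends to retain a larger sample than removing one member of each pair arbitrarily; lowering `cutoff` corresponds to the more stringent thresholds mentioned above.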

We used PCA plus linear discriminant analysis (LDA) to predict subpopulation membership using the genetic data.⁷ We used all single-nucleotide polymorphisms (SNPs) except for those on the X chromosome and regions of high linkage disequilibrium identified in Table 1 of Price et al.⁸ Principal component (PC) scores were obtained according to described methods,⁵ substituting each SNP with the residuals produced by linear regression on the three previous SNPs. A double cross-validation procedure was used to correct for overfitting of the validation set, using the ratio of PC scores between the validation and training samples to calculate a scaling factor. A second validation cycle trained an LDA step, which was used to classify a separate outgroup of individuals, corrected with the scaling factor. The predicted classes (using the first three PCs) were compared with the known geographical groups to calculate an error rate. Written informed consent was obtained from all subjects.
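The core of the PCA-plus-LDA classifier can be sketched as follows. This is a simplified illustration on simulated genotypes, not the authors' pipeline: it omits the regression of each SNP on the three preceding SNPs and the double cross-validation scaling factor, and uses a single hold-out split instead. All function names, the simulated allele frequencies and the sample sizes are assumptions.

```python
import numpy as np

def fit_pca(x_train, n_components=3):
    """Return column means and the top principal axes of a (samples, SNPs) matrix."""
    mean = x_train.mean(axis=0)
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    return mean, vt[:n_components].T

def fit_lda(scores, labels):
    """Estimate class means, priors and the inverse pooled within-class covariance."""
    classes = np.unique(labels)
    means = np.array([scores[labels == c].mean(axis=0) for c in classes])
    priors = np.array([(labels == c).mean() for c in classes])
    centred = scores - means[np.searchsorted(classes, labels)]
    pooled = centred.T @ centred / (len(scores) - len(classes))
    return classes, means, priors, np.linalg.inv(pooled)

def predict_lda(model, scores):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, means, priors, precision = model
    linear = scores @ precision @ means.T
    offset = -0.5 * np.einsum("kd,de,ke->k", means, precision, means) + np.log(priors)
    return classes[np.argmax(linear + offset, axis=1)]

# Simulated data (assumption): three villages whose allele frequencies drift
# slightly around a shared baseline.
rng = np.random.default_rng(1)
n_snps, per_village = 2000, 40
base = rng.uniform(0.2, 0.8, n_snps)
freqs = np.clip(base + rng.normal(0, 0.1, (3, n_snps)), 0.05, 0.95)
labels = np.repeat(np.arange(3), per_village)
genotypes = rng.binomial(2, freqs[labels]).astype(float)

# Hold out 30 individuals; PCA axes and LDA are fitted on the rest only.
order = rng.permutation(len(labels))
test_idx, train_idx = order[:30], order[30:]
mean, axes = fit_pca(genotypes[train_idx], n_components=3)
model = fit_lda((genotypes[train_idx] - mean) @ axes, labels[train_idx])
predicted = predict_lda(model, (genotypes[test_idx] - mean) @ axes)
accuracy = (predicted == labels[test_idx]).mean()
print(f"hold-out accuracy: {accuracy:.2f}")
```

Projecting held-out individuals onto axes fitted on the training set shrinks their scores towards zero when SNPs greatly outnumber samples, which is the effect the scaling factor described above is designed to correct.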

Results

Using 300 000 SNPs and only unrelated, non-inbred individuals with all four grandparents from the same valley, village or isle, we here show the genomic differentiation across 8–30 km in three disparate areas of rural Europe, using genetic information alone (Figure 1). PCA of genomic sharing and model-based clustering (not shown) both allow separation of individuals with grandparents from each of three small Scottish isles, three alpine valleys in the north of Italy and two villages on one small island in Croatia. We used a supervised classification approach to predict subpopulation membership. Highly reliable levels of prediction were achieved with 100, 96 and 89% of individuals correctly classified on the basis of their genetic data for Italy, the Scottish Isles and Croatia, respectively. In each area, when individuals with grandparents from more than one village were included in PCA, they were scattered among the clusters, consistent with mixed origins (not shown).

Discussion

It is interesting to consider the time depth of this differentiation. By removing first-, second- and third-degree relatives, we controlled for structure arising from mating behaviour in the past ∼120 years, and thus focused on the patterning arising earlier than this. The signal of structure is stronger when we include close relatives, but also persists when we use more stringent thresholds for relatedness (not shown), indicating that the patterns arise from ancient shared ancestry within the villages compared to their neighbouring subpopulations. Inbreeding and more-distant shared parental ancestry will also contribute to among-population differences. We used the genomic measure F_ROH⁹ to remove all individuals with total shared parental ancestry equivalent to one second-cousin pedigree loop (F_ROH >0.015), whereas including inbred subjects led to more obvious structuring (not shown). Thus, the observed structure in each of the populations arises partly from recent relatedness and shared parental ancestry and partly from deeper patterns of kinship within the subpopulations, overlaid with mating among them and now with immigrants.
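The threshold of 0.015 can be checked against the pedigree expectation with a short sketch. It assumes Wright's path-counting formula for non-inbred common ancestors and the definition of F_ROH as the summed length of runs of homozygosity divided by the autosomal length covered; the function names and the example lengths are hypothetical.

```python
from fractions import Fraction

def pedigree_f(paths):
    """Wright's inbreeding coefficient for non-inbred common ancestors.

    Each (n1, n2) pair gives the number of generations from the father and
    from the mother back to one shared ancestor.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)

# Offspring of second cousins: the parents share two great-grandparents,
# each three generations above either parent.
second_cousin_f = pedigree_f([(3, 3), (3, 3)])
print(float(second_cousin_f))  # 0.015625

def f_roh(roh_lengths_kb, autosome_length_kb):
    """Fraction of the autosomal genome lying in runs of homozygosity."""
    return sum(roh_lengths_kb) / autosome_length_kb

def non_inbred(f_values, threshold=0.015):
    """Indices of individuals whose F_ROH does not exceed the threshold."""
    return [i for i, f in enumerate(f_values) if f <= threshold]
```

One second-cousin loop gives an expected coefficient of 1/64, so the cutoff of F_ROH > 0.015 sits just below that pedigree value.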

To explore how many markers are required to recover these fine scale patterns of structure, we ranked SNPs by F_ST among villages and repeated the PCA for the most differentiated subsets of 30 000, 10 000, 3000 and 300 SNPs in each population. In all three populations, 10 000 or more high F_ST SNPs recovered an essentially identical picture to that using the full data set, and even 3000 SNPs preserved considerable separation between the villages (not shown). Using only the most discriminating 300 SNPs, little structure could be observed between the two Croatian villages; however, in Scotland and Italy one of the three settlements included in each location remained completely differentiated from the other two (not shown). We note that these results are only indicative of the minimum number of SNPs required to separate these populations, as by necessity SNPs have been selected intrinsically on the basis of F_ST within the same data set, rather than extrinsically from other data.
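The marker-selection step can be sketched as below. The text does not say which F_ST estimator was used, so this example assumes a simple per-SNP form, (H_T - H_S) / H_T with villages weighted equally; the function names `per_snp_fst` and `top_snps` are hypothetical.

```python
import numpy as np

def per_snp_fst(genotypes, labels):
    """Per-SNP F_ST among groups from a (individuals, SNPs) array of 0/1/2 counts."""
    g = np.asarray(genotypes, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    # Allele frequency of every SNP within each village: shape (villages, SNPs).
    p = np.array([g[labels == k].mean(axis=0) / 2 for k in groups])
    h_s = (2 * p * (1 - p)).mean(axis=0)     # mean within-village heterozygosity
    p_bar = p.mean(axis=0)
    h_t = 2 * p_bar * (1 - p_bar)            # heterozygosity of the pooled frequencies
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)

def top_snps(fst, n):
    """Column indices of the n SNPs with the highest F_ST."""
    return np.argsort(fst)[::-1][:n]
```

Calling `top_snps(fst, 3000)` and passing only those columns to the PCA would mirror the subsetting described above. Because the ranking and the PCA use the same individuals, the separation obtained this way is optimistic, which is the caveat the authors raise about intrinsic selection.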

The slightly lower differentiation of the Croatian villages is not surprising given the fact that they are physically the closest of those considered here, being 8 km apart, with only low hills separating them. In contrast, the settlements in the Scottish Isles and Italy are separated by 15–30 km of sea in the former case, and of 3000 m mountains in the latter, although there are deep connecting valleys.

Such fine-scale differentiation is consistent with the highly non-random nature of human mate choice over the millennia. The average distance between the birthplaces of spouses in rural parts of Finland, the Po valley in northern Italy and the isles of Scotland in the nineteenth century was ∼1.5–3 km.¹⁰ Such close endogamy was probably the norm in rural Europe due to lack of transport or economic opportunities. The breakdown of these isolates has since dramatically altered the population structure.¹¹

The exquisite structure preserved in the genomes of people with all grandparents from the same settlement demonstrates that very detailed genetic and geographical ancestry information can be obtained by genome-wide SNP analyses. This provides novel opportunities, under certain circumstances, to predict the micro-geographical origin of an individual. Genetic association studies that include rural populations must also model this genetic structure, but it is not a barrier to gene discovery.¹² When whole-genome sequences become widely available, the ability to use many more variants, including rarer ones, to identify short shared genomic segments will perhaps allow routine identification of regional ancestries, given a suitably large and carefully collected reference sample.

References

1. Li JZ, Absher DM, Tang H et al: Worldwide human relationships inferred from genome-wide patterns of variation. Science 2008; 319: 1100–1104.
2. Novembre J, Johnson T, Bryc K et al: Genes mirror geography within Europe. Nature 2008; 456: 98–101.
3. Sabatti C, Service SK, Hartikainen AL et al: Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat Genet 2009; 41: 35–46.
4. Nelis M, Esko T, Mägi R et al: Genetic structure of Europeans: a view from the North-East. PLoS One 2009; 4: e5742.
5. Patterson N, Price AL, Reich D: Population structure and eigenanalysis. PLoS Genet 2006; 2: e190.
6. Tang H, Peng J, Wang P, Risch NJ: Estimation of individual admixture: analytical and study design considerations. Genet Epidemiol 2005; 28: 289–301.
7. Egeland T, Bøvelstad HM, Storvik GO, Salas A: Inferring the most likely geographical origin of mtDNA sequence profiles. Ann Hum Genet 2004; 68: 461–471.
8. Price AL, Weale ME, Patterson N et al: Long-range LD can confound genome scans in admixed populations. Am J Hum Genet 2008; 83: 132–135.
9. McQuillan R, Leutenegger AL, Abdel-Rahman R et al: Runs of homozygosity in European populations. Am J Hum Genet 2008; 83: 359–372.
10. Cavalli-Sforza LL, Menozzi P, Piazza A: The History and Geography of Human Genes. Princeton: Princeton University Press, 1994.
11. Rudan I, Carothers AD, Polasek O et al: Quantifying the increase in average human heterozygosity due to urbanisation. Eur J Hum Genet 2008; 16: 1097–1102.
12. Vitart V, Rudan I, Hayward C et al: SLC2A9 is a newly identified urate transporter influencing serum urate concentration, urate excretion and gout. Nat Genet 2008; 40: 437–442.

Acknowledgements

We thank the study volunteers in each population. EUROSPAN (European Special Populations Research Network) was supported by European Union FP6 Grant number 018947 (LSHG-CT-2006-01947). For the MICROS study, we thank the primary-care practitioners and the Department of Laboratory Medicine, Hospital of Silandro. The study was supported by the Ministry of Health and Department of Educational Assistance, University and Research of the Autonomous Province of Bolzano and the South Tyrolean Sparkasse Foundation. ORCADES was supported by the Scottish Government Chief Scientist Office and the Royal Society. DNA extractions were performed at the Wellcome Trust Clinical Research Facility (WTCRF) in Edinburgh. We acknowledge the invaluable contributions of L Anderson and the research nurses and the administrative team in Edinburgh. The Croatian study was supported through grants from the Medical Research Council, UK, and the Ministry of Science, Education and Sport of the Republic of Croatia (number 108-1080315-0302). We thank Professor P Rudan and the staff of the Institute for Anthropological Research in Zagreb; genotyping of the Croatian samples was carried out at the WTCRF, Edinburgh. CO’D was funded by a postdoctoral fellowship from the Irish Research Council for Science Engineering and Technology and AC from Science Foundation Ireland.

Author information

Authors and Affiliations

Department of Psychiatry, Neuropsychiatric Genetics Research Group, Trinity College Dublin, Dublin, Ireland
Colm O'Dushlaine & Aiden Corvin
Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
Ruth McQuillan, Christopher S Franklin, Sarah H Wild, Harry Campbell, Igor Rudan & James F Wilson
Department of Medical and Molecular Genetics, King's College London, London, UK
Michael E Weale & Daniel J M Crouch
Department of Genetics and Pathology, Uppsala University, Uppsala, Sweden
Åsa Johansson & Ulf Gyllensten
Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands
Yurii Aulchenko & Cornelia M van Duijn
Gen-Info Ltd, Zagreb, Croatia
Ozren Polašek & Igor Rudan
Institute of Genetic Medicine, European Academy (EURAC), Bozen/Bolzano, Italy
Christian Fuchsberger, Andrew A Hicks & Peter P Pramstaller
MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, Edinburgh, UK
Veronique Vitart, Caroline Hayward & Alan F Wright
Institute of Human Genetics, Helmholtz Zentrum München, German Research Centre for Environmental Health, Neuherberg, Germany
Thomas Meitinger
Institute of Human Genetics, Klinikum rechts der Isar, Technische Universität München, München, Germany
Thomas Meitinger
Croatian Centre for Global Health, University of Split, Split, Croatia
Igor Rudan

Corresponding author

Correspondence to James F Wilson.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

About this article

Cite this article

O'Dushlaine, C., McQuillan, R., Weale, M. et al. Genes predict village of origin in rural Europe. Eur J Hum Genet 18, 1269–1270 (2010). https://doi.org/10.1038/ejhg.2010.92

Received: 07 October 2009
Revised: 21 January 2010
Accepted: 09 April 2010
Published: 23 June 2010
Issue Date: November 2010
DOI: https://doi.org/10.1038/ejhg.2010.92
