Introduction

Mitochondrial DNA (mtDNA) has proved to be a useful tool for studying the human settlement of different European regions,1 as well as population migrations within the subcontinent.2 Nevertheless, the successive settlements of Europe have often been inferred from imprecise samples, usually collected from geographically broad regions without taking regional diversity into account. This is particularly true of the French regions addressed in this paper. The French territory is centrally located in Western Europe; in particular, it links Northern Europe to the Mediterranean and Iberian areas. This geographic position certainly had a strong impact on human settlement processes and on the genetic structure of the settled populations. Yet the French mitochondrial gene pool remains poorly described and has been explored almost solely from a forensic point of view.3,4 Although previous studies covered a relatively large number of individuals (n=185 in all), these individuals were not regionally localised and were surveyed only for polymorphisms in the D-Loop. In addition, diagnostic RFLPs from coding regions, although crucial for any discussion of haplogroups, were neglected. These gaps make the published data difficult to use in further studies of the human settlement of France. Moreover, the missing information could be a distorting factor in research on the settlement of Europe, especially since studies of blood genetic markers have revealed great inter-regional variability among French populations5 and a complex migratory prehistory and history.6 These studies underlined the importance of precise geographical information on the samples analysed in genetic studies, in order to map migratory patterns accurately.
To highlight the importance of sampling strategy in studies of the human settlement of France, we report mitochondrial data that are better suited to this purpose. The features of the sampling procedure allowed us (i) to test the inter-regional genetic homogeneity of the French mitochondrial gene pool and its implications for potential ancient population movements; and (ii) to test the ability of mtDNA to detect admixture events that are unquestionably established by history and were previously detected by analyses of classical marker data.6

Materials and methods

Subjects and sampling procedure

Our sample consisted of 210 maternally unrelated volunteers originating from five French locations (Figure 1). The sampled area in each region was about 2000 km². Most of the selected subjects were born in the first part of the 20th century, and pedigree investigations were conducted over the last three (for about 50% of the subjects) or more generations to confirm the regional maternal origin of the sampled individuals. These precautions should minimise genetic disruption related to population movements within the French territory, which became substantial from the 19th century onwards (but can be neglected for earlier periods).7 Moreover, because large towns and coastal belts are known to have been more involved in population movements than rural areas, we verified that the oldest known ancestor of each selected subject originated from a rural, noncoastal part of the region. Disruption related to potential gene flow predating the birth of the oldest known ancestor should therefore also be minimised. In order to test the impact of historical migrations on the mtDNA pool of Brittany, we included previously published European data (see Figure 1).

Figure 1

Localisation of the analysed samples. Our samples include individuals from the central ‘department’ of Var (VR, n=37), the region of Périgord-Limousin (PL, n=72) straddling three western-central ‘departments’ (northern Dordogne, western Corrèze and southern Haute-Vienne), the regions of Caux and Bray located in Normandy (CB, n=39) and the ‘departments’ of Morbihan (BM, n=40) and Finistère (BF, n=22), both located in Brittany. Previously published data were also included in the analyses, and they concern 139 individuals from England2 (ENG), 70 individuals from Wales (WA), 50 individuals from Cornwall8 (CNW), 101 individuals from western Ireland9 (IRL), 47 individuals from north-eastern France9 (NEF) and 185 French individuals who remain regionally unlocalised3,4,9 (FRA).

mtDNA extraction and genotyping

DNA was extracted from peripheral blood mononuclear cells or hair roots as described elsewhere.10 Hypervariable segments (HVS-1 and HVS-2) of the control region were simultaneously amplified by PCR using either L16025 and HV2AS,10 or L15832 (light strand, nps 15838–15858) and HV2AS. Amplifications were performed under standard PCR conditions, except that dTTP was replaced by dUTP in the PCR reaction mix (as in the subsequent coding-region amplifications) in order to avoid potential contamination from earlier amplifications (a particular risk in a routine PCR laboratory). dUTP allows the use of uracil-N-glycosylase (UNG), which destroys potential PCR products carried over from earlier amplifications by degrading uracil-containing DNA. The purification of PCR products and the sequencing conditions were those described in Mogentale-Profizi et al.10 The light strand of HVS-1 and HVS-2 was sequenced, yielding HVS-1 and HVS-2 sequences spanning at least nucleotides 16055 to 16410 (details in Supplemental Material) and nucleotides 00037 to 00222, respectively. Since the traces were of excellent quality and unambiguous, sequencing a single strand was sufficient. Coding-region polymorphisms were typed as in Mogentale-Profizi et al,10 except for six additional relevant polymorphic sites that were typed by sequencing: nps 4216 (for haplogroup J), 4646 (for U4), 10238 (for N1b), 11719 (for pre-HV), 15904 (for pre-V and V) and 15907 (for U2), according to Richards et al.9 Moreover, we report here a new method for checking the state of positions 4580 and 12308, based on allele-specific PCR (primers and conditions reported in Table 1). Our method was tested on samples previously typed by enzymatic restriction and sequencing, and the results of the three methods were fully concordant. It was strictly reproducible, cheaper than sequencing and faster than the enzymatic restriction method; moreover, it does not depend on specific endonucleases. Thus, this method could be advantageously extended to other diagnostic RFLP sites.

Table 1 Typing of nps 4580 and 12308 with allele-specific PCR

As recommended,12 polymorphisms that appeared incongruent were double-checked by amplification, sequencing and/or enzymatic restriction. Accordingly, we amplified and sequenced HVS-1 and HVS-2 twice in both directions for individuals assigned to haplogroup H and not presenting the 00073 G to A transition, or assigned to haplogroup U with a 00073 A transition. In these cases, the state of sites 7028 and 12308 was also double-checked. Deletions and/or insertions, and haplotypes with unusual polymorphism patterns, were likewise double-checked.

Statistical analysis

DNA sequence alignment and haplotype identification were conducted as in Mogentale-Profizi et al10 using CLUSTAL X 1.81 (ref. 13) and MEGA 2.0 (ref. 14). To maximise the number of British populations available for comparison with the French data, statistics and analyses were computed on the HVS-1 sequence between sites 16090 and 16365. Statistical analysis focused on the historical movements that led to the constitution of the present-day Breton mtDNA gene pool. The population genetic structure of the potentially implicated regions was analysed through AMOVA15 using the Arlequin 2.000 package.16 Taking the historical framework into account, AMOVA analyses were applied to different but plausible population groupings. Gene diversity (H) and nucleotide diversity (πn)17 were estimated. As the samples from Britain and Ireland were perceptibly less diverse than the French ones, these statistics were used in a bivariate analysis in order to detect a potential effect on Breton diversity. We also report the estimated mean number of pairwise differences (MNP).18
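To make the three summary statistics concrete, the following minimal sketch computes them for a toy alignment. It assumes the standard unbiased estimators (haplotype-based gene diversity, mean pairwise differences, and nucleotide diversity as pairwise differences per site); whether these match the exact options used in Arlequin here is an assumption. The sequences and function names are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import combinations


def gene_diversity(seqs):
    """Unbiased gene (haplotype) diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def mean_pairwise_differences(seqs):
    """Mean number of differing sites over all pairs of aligned sequences (MNP)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Mean pairwise differences divided by the alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Hypothetical toy alignment (four individuals, four sites), not real data.
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
H = gene_diversity(toy)
mnp = mean_pairwise_differences(toy)
pi_n = nucleotide_diversity(toy)
```

In this sketch gene diversity depends only on how many distinct haplotypes occur and how often, whereas nucleotide diversity also depends on how different those haplotypes are from one another, which is why the two statistics can respond differently to admixture.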

Results

Molecular data

The mtDNA polymorphisms found in the D-Loop analysis of the French samples, along with the status of the samples at np 16519 and at the typed diagnostic restriction sites, are reported in Supplemental Material. Seven individuals were found to be heteroplasmic. Six of them present a heteroplasmy at np 16093. This observation is the somatic manifestation of the particularly fast mutation rate of this site, previously demonstrated in human pedigree studies and evolutionary analyses.19,20

The haplogroup frequencies observed in each sample are summarised in Table 2. Most of them fall within the range of variability of European populations (see Simoni et al21) and cannot differentiate the studied samples (although these appear quite heterogeneous). However, some peculiar frequency distributions across our French samples can be noted. First, the frequency of haplogroup K is higher in Morbihan (17.5%) and in Périgord-Limousin (15.3%) than the highest frequency previously observed in Europe (13.3% in Norwegians and Bulgarians; the average European K frequency is 5.6% across the 21 populations taken into account; data from Simoni et al21). Second, haplogroup V is particularly infrequent, and is notably absent from Périgord-Limousin. Third, haplogroup U8 (as defined in Finnilä et al22) is present only in the sample from Var, where it encompasses three different haplotypes. It is interesting to note that haplogroup U8 is rare in Europe, and absent from the Northern Caucasus and the Near East.9 Moreover, it has rarely been observed with associated polymorphisms (only three Finnish individuals, presenting the same transition at the hypervariable site 16093, have been described),22 whereas two of the three U8 haplotypes in Var presented associated polymorphisms. Fourth, in north-western France (Normandy and Morbihan), we typed three haplotypes that are phylogenetic intermediates between two clusters differentiated by two substitutions: cluster U5a1 and its phylogenetically related subcluster U5a1a (Figure 2). For convenience, we grouped these three haplotypes into an intermediate cluster, U5a1a#. Fifth, Périgord-Limousin also presents a high frequency (about 15%) of individuals belonging to haplogroups [T1, J, and (pre-HV)1] considered to have been introduced into Europe during the Neolithic.1 For comparison, the sample from Var, located in the Mediterranean basin, presents a lower frequency of such haplogroups (2.7%; Fisher's exact test: P=0.04).
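The comparison of Neolithic haplogroup frequencies between Périgord-Limousin and Var can be illustrated with a short sketch of Fisher's exact test on a 2×2 table. The counts used below (11 of 72 in Périgord-Limousin, 1 of 37 in Var) are back-calculated from the reported percentages and sample sizes and are therefore an assumption, as is the choice of a one-sided test; the function name is hypothetical.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing a or more
    carriers in the first sample, with all table margins held fixed.
    """
    row1, row2 = a + b, c + d   # sample sizes
    col1 = a + c                # total number of carriers
    denominator = comb(row1 + row2, col1)
    upper = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return numerator / denominator


# Assumed counts: carriers / non-carriers of the Neolithic haplogroups.
pl_carriers, pl_others = 11, 61   # Périgord-Limousin, n=72 (about 15%)
vr_carriers, vr_others = 1, 36    # Var, n=37 (2.7%)
p_value = fisher_exact_one_sided(pl_carriers, pl_others, vr_carriers, vr_others)
```

An exact test is a natural choice here because one cell of the table contains a single individual, which makes large-sample approximations such as the chi-square test unreliable.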

Table 2 Frequency of haplogroups in French samples
Figure 2

Median-joining network23 of French Haplogroup U5a1. Two Norman haplotypes (CB26 and CB33) and one haplotype from Morbihan (BM39) present the derived state at nps 16399, but lack the reversion at nps 16192. Although position 16192 is known to exhibit a high substitution rate, the occurrence of three mutations at this site, on the same lineage and over 24 300 years (the lower limit of the Confidence Interval of the estimated age of haplogroup U5a1)9 is unlikely. They thus have been grouped into haplogroup U5a1a#.

Statistical analysis

AMOVA analyses (Table 3) showed genetic homogeneity among the French, British and Irish populations. However, it should be noted that combination D (Table 3) produces the highest variation among groups (0.46%) and the lowest variation within groups (0.26%), associated with the strongest significance (P=0.00489 and 0.00098, respectively). Gene diversity, nucleotide diversity and MNP of the studied samples are reported in Table 4. Correlation tests showed that sample size has no influence on either the nucleotide diversity or the gene diversity (data not shown). On the other hand, we found a strong correlation between the two diversities (Pearson R²=0.7586, P=0.001; Figure 3). The sample from Morbihan occupies an intermediate position between the samples from Britain and Ireland and the samples from the continent, especially with regard to its gene diversity. Moreover, its nucleotide diversity appears relatively high compared with its gene diversity.

Table 3 Results of AMOVA analyses
Table 4 Gene diversity (H), nucleotide diversity (πn) and mean number of pairwise differences (MNP) for each sample
Figure 3

Bivariate analysis of H and πn (abbreviations as in Figure 1).
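The bivariate analysis of H and πn can be sketched as a Pearson correlation followed by a least-squares regression, with the residual of each sample indicating whether its nucleotide diversity is higher or lower than its gene diversity would predict. The function names and the toy values below are hypothetical; they are not the values of Table 4.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def regression_residuals(xs, ys):
    """Residuals (observed minus fitted) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical toy values standing in for (H, πn) pairs of three samples.
toy_h = [0.0, 1.0, 2.0]
toy_pi = [0.0, 2.0, 1.0]
r_squared = pearson_r(toy_h, toy_pi) ** 2
residuals = regression_residuals(toy_h, toy_pi)
```

A sample with a clearly positive residual lies above the regression line, that is, it has more nucleotide diversity than expected from its gene diversity, which is the pattern the text associates with an admixed population.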

Discussion

Although our French samples could not be statistically differentiated by their haplogroup frequencies, each one harbours qualitative and quantitative peculiarities that may reflect a rather different history. Although located in a region considered to have been strongly affected during the Neolithic,24 the sample from Var exhibits a pattern that does not appear to have been genetically influenced by potential ‘Neolithic waves’ from the Near East. This observation could be the consequence of human demographic and ecological peculiarities of the Mediterranean coast at the end of the Neolithic period, as proposed by Simoni et al,21 or could reflect a sex-specific pattern of migration (in favour of men) during this period, as suggested by Semino et al.25 In contrast, the Périgord-Limousin sample exhibits a fairly large Neolithic component (15%), estimated from the frequency of Neolithic-specific haplogroups.1 Furthermore, according to Richards et al,9 all European regions experienced a large immigration at the end of the Upper Palaeolithic. In western France (represented here by the samples from Brittany and Périgord-Limousin), the high observed frequency of haplogroup K, introduced into Europe from the Near and/or Middle East, suggests that the majority of Late Upper Palaeolithic immigrants belonged to this haplogroup. Finally, two previous and successive studies26,27 detected a re-expansion of populations from south-western Europe to north-eastern Europe after the last glacial maximum (LGM). Although this postglacial re-expansion undoubtedly involved French regions because of their immediate proximity, haplogroup V is very rare in our samples and absent from Périgord-Limousin (even though this region lies in the northern part of the refugium formed during the LGM). We therefore postulate that the traces of the re-expansion have been at least partially erased by later migrations, during and/or after the Mesolithic period. The relatively large Neolithic component (see above) in this region could partially explain the erasure of the immediate postglacial pattern.

Brittany was affected by well-established historical migrations from Britain and Ireland. The earliest massive historical migrations from Britain occurred during the fourth century and involved about 40 000 individuals. These migrations essentially concerned soldiers, women being largely in the minority.28 A second migration wave occurred during the sixth and seventh centuries, this time involving a genuine settlement from Britain in Brittany, the immigrants becoming numerous among the native Continentals.28 The principal immigration that followed involved Irish people (during the War of Ireland, 1641–1651), bringing small Irish groups (about 35 000 persons in all) fleeing to Lower Brittany29 (present-day Finistère, most of the Morbihan territory and western Côtes d'Armor). Marriage, birth and death certificates from this period indicate that these Irish immigrants integrated with the native people.29 Altogether, the historical evidence shows that the Breton gene pool has undoubtedly been affected by British and Irish genes. This case gave us an opportunity to test methods that could enable the detection of such an admixture event. As inferring admixture proportions and processes from single-locus analysis does not seem feasible for only slightly differentiated human populations,30 indirect methods, which could be less dependent on the degree of differentiation of the parental populations, should be proposed in such a homogeneous context. The results of the bivariate analysis of H and πn are compatible with the historical data on successive British and Irish migrations into Brittany. Indeed, the position of the sample from Morbihan relative to the regression line is what can be expected from an admixed population (Figure 3): the gene diversity of a population resulting from an admixture event should average the gene diversities of the parental populations, whereas the nucleotide diversity should tend to increase.
Because the probability that two separated populations (even recently separated ones) have generated by mutation and/or fixed by drift the same alleles is low, combining the gene pools of these populations will result in a higher nucleotide diversity than that of either population before admixture. Here, the nucleotide diversity of Morbihan is higher than those measured in the insular populations and in the nonlocalised French sample, and is quite high with regard to the gene diversity of the entire sample. Successive migrations of British/Irish populations (usually less diverse than the French ones, see Figure 3 and Table 4) appear to have reduced the gene diversity of the indigenous French mitochondrial gene pool in Morbihan. Moreover, AMOVA analyses showed that Morbihan is more similar to the British/Irish populations than to the French ones (see combination D, Table 3), supporting the historically reported migrations. Considering the results of the bivariate and AMOVA analyses, this Breton sample appears to stem from an admixture event. Also, if one considers that some migrations from Britain and Ireland involved only (military) men,28,31 the signal detected through mitochondrial diversity would underestimate the real impact of these flows. In contrast, the Breton sample from Finistère occupies a different position from the Morbihan sample: it is located at the extremity of the distribution in the bivariate analysis. Neither the AMOVA nor the bivariate analysis shows that this population stems from admixture between British/Irish and French populations. This observation moderates the impact of immigration from Britain and Ireland on the mitochondrial gene pool of Brittany. These migrations should have affected the entire Lower Brittany mtDNA pool; nonetheless, our results tend to show that mtDNA from Finistère was less affected, if at all, by the historical migrations considered.
These results underline the importance of sampling when micro-geographical questions are investigated. Indeed, if only people from Finistère had been sampled, or if Brittany had been considered as a whole and Bretons sampled without regard to their precise regional origins, the historically well-documented immigrations into Brittany would not have been detected.

Our French study also provides information on the spread and/or origin of specific European haplogroups. This is the case for haplogroup U5a1a, whose regional origin remains unclear and which is reported to have differentiated very recently (between 2200 and 12 800 BP).9 The fact that three different haplotypes belonging to U5a1a# were found in Normandy and Morbihan suggests that this haplogroup arose in north-western Europe and subsequently spread across Europe from this centre. One part of U5a1a would have reached the Near East, and certainly subsequently back-migrated into Europe, as suggested by Richards et al.9 Haplogroup U8, for its part, has been found principally in Alpine and north-eastern European regions.9,22 Although it was previously described as lacking diversity in most cases, U8 exhibits diversity in our Var sample. The age estimate of this cluster is 44 400±27 010 years BP (calculated as in Saillard et al32 with Network 3.0, http://www.fluxus-engineering.com, including all 12 haplotypes previously described9,22 and the three haplotypes from Var), which shows that this cluster is of Palaeolithic origin. Haplogroup U8 most probably differentiated on the west-central Mediterranean coast during the Upper Palaeolithic and subsequently migrated into north-eastern Europe (possibly via the Alpine region) during the population re-expansions that occurred after the LGM. Thus, this cluster suggests that the French Mediterranean coast acted as a refugium during the LGM, and indicates that, like haplogroup V, other clusters differentiated in Western Europe during the Upper Palaeolithic and could be taken into account when evaluating the impact of the post-LGM expansion on the present-day European gene pool.
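For readers unfamiliar with this kind of age estimate, the point estimate can be sketched as follows: the average number of mutational steps separating the sampled sequences from the founder haplotype (the rho statistic) is multiplied by a mutation-rate calibration. The calibration used below, one transition per 20 180 years for HVS-1 between nps 16090 and 16365, is a commonly used value and is an assumption here, since the text does not state it; the step counts are hypothetical, and the standard error reported in the text requires the network structure and is not reproduced.

```python
# Assumed calibration (not stated in the text): one transition per 20,180 years
# in HVS-1 between nps 16090 and 16365.
YEARS_PER_MUTATION = 20180


def rho(steps_from_founder):
    """Average number of mutational steps from the founder haplotype.

    steps_from_founder: list of (steps, number_of_individuals) pairs.
    """
    total_individuals = sum(count for _, count in steps_from_founder)
    total_steps = sum(steps * count for steps, count in steps_from_founder)
    return total_steps / total_individuals


def age_estimate(steps_from_founder, years_per_mutation=YEARS_PER_MUTATION):
    """Point estimate of cluster age in years: rho times the calibration."""
    return rho(steps_from_founder) * years_per_mutation


# Hypothetical cluster: three founder sequences, two sequences one step away
# and one sequence two steps away. These are not the U8 data.
toy_cluster = [(0, 3), (1, 2), (2, 1)]
toy_rho = rho(toy_cluster)
toy_age = age_estimate(toy_cluster)
```

Because the estimate rests on a small number of haplotypes, its uncertainty is large, which is consistent with the wide interval reported for U8.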

Our results show that the quality of the sampling procedure can affect the qualitative and quantitative results obtained from the analysis of mtDNA data. For recent migratory events, we suggest that a sample better reflects regional history when it is well localised geographically and takes the maternal ancestry of individuals into account. We therefore conclude that the previous nonlocalised French sample, used to represent the French population, is not suitable for reconstructing population flows into the French territory. Owing to the heterogeneous migratory history of France, data from various regional samples are necessary to elucidate the successive settlements of the French territory and, more generally, of Europe. We also conclude that precise regional sampling can permit the detection of dispersed or very localised haplotypes, and can help to better understand the phylogeographical history of some clusters (here U5a1a and U8). Altogether, these elements are essential for understanding the history of the European mitochondrial gene pool. Finally, studies of other DNA markers would enrich and refine the picture given by mtDNA, giving access to paternal parameters (with Y-chromosome polymorphisms) and to the outcome of the interaction between paternal and maternal gene pools (with autosomal markers). For instance, for Var, it would be interesting to test whether, contrary to mtDNA, the analysis of Y-chromosome polymorphisms would permit the detection of a Neolithic impact on the French Mediterranean coast.