Introduction

Sequence variants on the male-specific region of the human Y chromosome have been useful in examining the evolution and migration patterns of world populations, including Pakistan (Qamar et al. 2002). To further delineate the population sub-structure in Pakistan, 8.5 kb of Y-chromosomal DNA were screened in 93 Pakistani and 2 African samples for novel single nucleotide polymorphisms (SNPs) using denaturing high-performance liquid chromatography (DHPLC).

Materials and methods

Using sequences available in public databases, the non-recombining portion of the human Y chromosome was screened in silico. RepeatMasker2 software (Smit 1996) was used to exclude human repeat DNA sequences, and primers were designed to amplify 150–700 bp of unique male-specific sequence using Primer3 software (Rozen and Skaletsky 2000). Heteroduplex analyses were carried out using a WAVE DNA Fragment Analysis System (Transgenomic, Crewe, UK) as described elsewhere (Underhill et al. 2000). Novel Y-SNPs were identified by the appearance of two or more peaks in the elution profiles and confirmed by DNA sequencing. The ancestral state of each SNP was determined in two chimpanzee samples. ARMS or RFLP-PCR assays were designed for rapid screening of these novel SNPs in the Pakistani population (supplementary Table 1).
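The ancestral-state step can be illustrated with a minimal sketch. It assumes a biallelic SNP and one allele call per outgroup sample; the function name `polarize_snp` and the example alleles are hypothetical and are not taken from the study. A site is reported as ambiguous when the outgroup samples disagree or match neither human allele.

```python
from typing import Iterable


def polarize_snp(human_alleles: tuple[str, str], outgroup_alleles: Iterable[str]) -> dict:
    """Assign ancestral and derived states to a biallelic SNP from outgroup calls.

    Hypothetical helper for illustration only; it is not the authors' pipeline.
    """
    a, b = (x.upper() for x in human_alleles)
    observed = {x.upper() for x in outgroup_alleles}
    # Only outgroup alleles that match one of the two human alleles are informative.
    informative = observed & {a, b}
    if len(informative) != 1:
        # No informative outgroup call, or the outgroup samples carry both alleles.
        return {"ancestral": None, "derived": None, "status": "ambiguous"}
    ancestral = informative.pop()
    derived = b if ancestral == a else a
    return {"ancestral": ancestral, "derived": derived, "status": "resolved"}


# Placeholder alleles: two outgroup samples that agree, then two that disagree.
print(polarize_snp(("C", "T"), ["C", "C"]))
print(polarize_snp(("C", "T"), ["C", "T"]))
```

An ambiguous result does not by itself distinguish a recurrent mutation from a genotyping error; a more distant outgroup would be needed for that.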

Results

Four novel Y-SNPs (PK2–PK5) were identified in the Pakistani samples and one (PK1) in the Africans (Table 1, Fig. 1). The African individuals belonged to haplogroup A2, which is restricted to southern Africa (Underhill et al. 2000). The PK2 polymorphism was observed in clade C3 Y chromosomes in the Hazara and Burusho populations at frequencies of 43% and 9%, respectively. The PK3, PK4 and PK5 polymorphisms represent new branches (L/-4, O2a1a and R1a1/-d, respectively) of the Y phylogeny (Fig. 1). PK3 was found exclusively in the Kalash (23%). PK4 was detected in 4% of the Pathan samples, all from the AusoKhel sub-tribe. The PK5 transition was found in two unrelated Burusho individuals. This seems to be a recurrent mutation, because it was detected in the chimpanzee samples but was absent from a gorilla sample.
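Because population frequencies such as these rest on modest sample sizes, it can help to attach an interval to each one. The sketch below computes a Wilson score interval for a binomial proportion. The counts in the example are placeholders chosen only to give proportions near 43% and 9%; the per-population sample sizes are not stated in this text, so they should not be read as the study's data.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for k derived chromosomes out of n typed.

    z = 1.96 gives an approximate 95% interval.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts (not from the study).
for label, k, n in [("population A", 10, 23), ("population B", 9, 100)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.2f} (95% CI {low:.2f}-{high:.2f})")
```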

Table 1 Description of the novel Y SNPs
Fig. 1

Y chromosome phylogeny indicating the positions (black arrows) of the five novel Y-SNPs. The name of each haplogroup is given at the tip of its lineage, along with its frequency (in parentheses) in 869 Pakistani individuals. The haplogroup nomenclature follows the Y Chromosome Consortium (2002) and Jobling and Tyler-Smith (2003). The name of each polymorphism is shown along the branches.

Discussion

In this study, five novel Y-SNPs were identified. The PK2 polymorphism, found in the Burusho and Hazara, distinguished between the northern and southern clade C3 lineages within Pakistan. The derived allele for this marker was found in individuals that were part of a previous study in which the Hazara samples formed a star cluster with 16 different populations (Zerjal et al. 2003). The Burusho samples did not fall within this cluster. We extrapolate that all chromosomes within the star cluster should be derived for this mutation. The Mongolian origin of the Hazara is well documented historically and genetically (Zerjal et al. 2003; Qamar et al. 2002), whereas not much is known about the origins of the Burusho. According to some, the Burusho are descendants of Greek soldiers who came to this area with Alexander the Great. Others describe them as descendants of Dards from Central Asia (Biddulph 1977). Haplogroup C chromosomes are not found in Greece (Francalacci et al. 2003; Rootsi et al. 2004), and studies with autosomal genetic markers suggest that the Burusho are genetically closer to their geographical neighbours (Mansoor et al. 2004; Ayub et al. 2003). In an earlier study (Wells et al. 2001), populations from Tajikistan were shown to cluster with the Hunza Burusho. The presence of the PK2 polymorphism at high frequencies in both the Hazara and Burusho (43% and 9%, respectively) suggests that this Y-SNP may be an ancient polymorphism that probably arose in Central Asia before the separation of these two populations. This is corroborated by a BATWING analysis (Wilson and Balding 1998) which, incorporating data from 19 Y-SNPs (including the novel Y-SNPs) and 16 microsatellite loci, gave a TMRCA estimate of 9,400 (5,200–17,200) YBP for the PK2 polymorphism.
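To give a feel for how Y-STR variation translates into a lineage age, the sketch below uses a much simpler technique than BATWING: the average squared distance (ASD) between sampled haplotypes and an assumed founder haplotype, which under a strict stepwise mutation model grows as mutation rate times time. It is not the Bayesian coalescent method used in the study and would not reproduce its estimates or intervals. The function name, the repeat counts, the mutation rate (0.002 per locus per generation) and the generation time (25 years) are all assumptions made for illustration.

```python
from typing import Sequence


def asd_tmrca(
    haplotypes: Sequence[Sequence[int]],
    founder: Sequence[int],
    mu_per_generation: float = 0.002,   # assumed per-locus Y-STR mutation rate
    years_per_generation: float = 25.0,  # assumed generation time
) -> float:
    """Estimate lineage age in years from ASD to an assumed founder haplotype."""
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    n_loci = len(founder)
    total_squared = 0
    for hap in haplotypes:
        if len(hap) != n_loci:
            raise ValueError("every haplotype must have one value per locus")
        total_squared += sum((r - f) ** 2 for r, f in zip(hap, founder))
    # Mean squared repeat difference per locus per chromosome.
    asd = total_squared / (len(haplotypes) * n_loci)
    generations = asd / mu_per_generation
    return generations * years_per_generation


# Hypothetical repeat counts at four loci (placeholder data).
founder = [10, 11, 12, 13]
sample = [
    [10, 11, 12, 13],
    [10, 11, 13, 13],
    [10, 12, 12, 13],
    [10, 11, 12, 13],
]
print(f"ASD-based age: {asd_tmrca(sample, founder):.0f} years")
```

In practice the founder haplotype is itself uncertain (the modal or median haplotype is a common stand-in), and the result scales directly with the assumed mutation rate, which is one reason full coalescent methods report wide intervals.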

PK3 was found solely in the Kalash population of Pakistan. The Kalash inhabit remote valleys in the Hindu Kush Mountains in the Northern Areas of Pakistan. Previous studies (Mansoor et al. 2004; Rosenberg et al. 2002) reported a Eurasian influence in this isolated population. In this study, principal-component analyses (Cavalli-Sforza et al. 1994) carried out using a larger number of markers clustered the Kalash population with the Yadhavas (data obtained from Wells et al. 2001), a Dravidian-speaking group from south India (Fig. 2). This could be because of shared Eurasian ancestry, as demonstrated in the earlier study (Wells et al. 2001) in which the Yadhavas grouped together with other Central Asian populations. Y-STR variation across 16 loci enabled estimation of the median TMRCA as approximately 3,400 (1,400–8,100) YBP, which corresponds to the time of invasion of the Indo-Pak subcontinent by Indo-European-speaking tribes from Central Asia (Wolpert 2000). Analysis of the PK3 polymorphism in the Indian population could shed further light on this relationship.

Fig. 2

Principal-components analysis of Y haplogroup frequencies in fourteen Pakistani populations, three Indian populations, and a Greek population. Indo-European speakers are indicated by black triangles, Sino-Tibetan speakers by a white triangle, Dravidian speakers by a circle, and the language-isolate Burusho by a diamond. The principal-component plots were constructed using SPSS version 10.0 software, and the first and second principal components were plotted using Microsoft Excel for Windows.
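The kind of analysis summarised in this figure can be sketched with NumPy in place of SPSS. The example centres a population-by-haplogroup frequency matrix and takes its singular value decomposition, which corresponds to a covariance-based PCA; whether the original analysis used the covariance or the correlation matrix is not stated, so that choice is an assumption. The population labels and frequencies are placeholders, not data from the study.

```python
import numpy as np


def pca_scores(freqs: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Return component scores and explained-variance ratios.

    freqs: rows are populations, columns are haplogroup frequencies.
    """
    # Centre each haplogroup column so components describe variation around the mean.
    centred = freqs - freqs.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Hypothetical haplogroup frequencies for four populations (placeholder data).
populations = ["PopA", "PopB", "PopC", "PopD"]
freqs = np.array([
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.10, 0.20, 0.70],
    [0.15, 0.15, 0.70],
])

scores, explained = pca_scores(freqs)
for name, (pc1, pc2) in zip(populations, scores):
    print(f"{name}: PC1={pc1:+.3f}, PC2={pc2:+.3f}")
print("explained variance ratios:", np.round(explained, 3))
```

Populations with similar haplogroup profiles land close together on the first two components. The sign of each component is arbitrary, so only relative positions are meaningful.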

PK4, detected in four Pathan samples, represents a new branch, O2a1a, of clade O (Fig. 1). It is absent from the remaining haplogroup O samples from Pakistan. The four Pathan individuals carrying this SNP belong to a sub-tribe (AusoKhel) of one of the major Pathan tribes (Yousafzai) from the Dir area (between 71°20′ and 72°30′E; 34°22′ and 35°50′N) in the North West Frontier Province (NWFP) of Pakistan (Bokhari 1993). The presence of several male lineages in Pathans reflects the diversity that exists in this population, contrary to oral traditions that claim that the Pathans have a single male ancestor (Dorn 1999).

The PK5 transition, found in two Burusho individuals, represents a new branch, R1a1/-d, on the M17 background. These Burusho individuals, although unrelated and from different villages, shared the same haplotype across all 16 Y-STRs, indicating that this may be a recent population-specific polymorphism. A TMRCA of 350 (14–1,790) YBP was obtained for this polymorphism.

In this study, 874 Pakistani individuals were analyzed for a large number (82) of Y-chromosomal markers that included five novel Y-SNPs. Three of these SNPs identify population-specific lineages within Pakistan. Typing of these novel Y-SNPs (PK2, PK3, and PK4) in Eurasian populations will provide a comprehensive portrait of the complex genetic architecture of extant Pakistani ethnic groups and shed light on their origins and migration patterns.