Introduction

The birth of the Lapita Cultural Complex in the southwestern Pacific approximately 3,500 years before present (ybp) enabled the discovery and/or settlement of every island of the world’s largest ocean by 1,000 ybp. The Lapita Cultural Complex evolved from the interaction of two distinct cultures, both capable of horticulture and advanced interisland transport. How these two cultures collided, integrated local and intrusive materials and technology, and generated the innovative Lapita Cultural Complex (Green 1991a) along the north coast and offshore islands of New Guinea has yet to be examined at the local level of intervillage genetic exchange.

Human settlement of New Guinea began in the late Pleistocene. The earliest archeological sites confirm human presence on the island by 40,000 ybp (Groube et al. 1986). Settlers first reached the Bismarck Archipelago between 35,000 and 39,000 ybp (Allen et al. 1988; Leavesley and Chappell 2004) and the northern Solomon Islands by 32,000 ybp (Wickler and Spriggs 1988). By 20,000 ybp, humans were transporting resources, including obsidian and tree kangaroos, among New Guinea, the Bismarck Archipelago, and the northern Solomon Islands (Summerhayes and Allen 1993). Also by 20,000 ybp, the island of Manus had been settled (Ambrose 2002) in the first blind ocean crossing in human prehistory, and by 9,000 ybp, horticulture had developed independently in the Highlands of New Guinea (Allen 1970).

The Pleistocene-settled area extending from New Guinea to the Solomon Islands, known collectively as Near Oceania (Green 1991b), is coincident with the region’s current distribution of Papuan languages, a noncohesive group of 23 language families (Ross 2005) consisting of some 950 mutually unintelligible languages (Gordon 2005).

By 3,500 ybp, a second wave of migrants arrived in Near Oceania from Island Southeast Asia (ISEA) (Diamond 1988; Kirch and Hunt 1988). With the arrival of this intrusive group came the introduction of new domestic animals from Southeast Asia, including the dog, pig, chicken, and rat (Matisoo-Smith and Robins 2004; Lum et al. 2006; Matisoo-Smith 2007; Storey et al. 2007). The intrusive Holocene settlers incorporated the horticultural (e.g., tree and root crops) and technological achievements of the original Pleistocene settlers, giving rise to the innovative Lapita Cultural Complex that would eventually settle the rest of the Pacific. Evidence of this innovation is seen in the first appearance of Lapita pottery, a red-slipped, lime-infilled ceramic thought to derive from similar forms found in ISEA (Kirch 2000) but stylistically unique to the southwestern Pacific. The earliest Lapita pottery has been found in the Bismarck Archipelago (Kirch and Hunt 1988; Summerhayes 2001), but within a few generations, nearly identical pottery was distributed as far east as the Tongan and Samoan archipelagoes (Hendrick 1971; Green 1976; Davidson et al. 1990; Sand 1997). From the Bismarck Archipelago to western Polynesia, the distribution of Lapita pottery coincides with the current distribution of Austronesian languages, a cohesive language family with origins in ISEA that is spoken today as far west as Madagascar and as far east as Rapa Nui (Easter Island) (Kirch and Hunt 1988; Bellwood 1989; Spriggs 1989). The Austronesian language family includes more than 1,200 unambiguously related languages (Gordon 2005), thought to have been dispersed by oceanic voyaging within the past 5,000 years (Pawley and Green 1973; Bellwood 1991).

Since the mid-Holocene arrival of the presumed Austronesian speakers and the birth and spread of the Lapita Cultural Complex, more recent localized migrations have taken place. Boiken speakers from the Sepik Plain are thought to have expanded northward, displacing some previous occupants of the Prince Alexander Mountains and the coastal regions of today's East Sepik Province, Papua New Guinea (PNG) (Roscoe 1989). Linguistic and archeological data also support this expansion (Swadling and Hide 2005).

Previous genetic studies of mitochondrial deoxyribonucleic acid (mtDNA) from Pacific Island populations show that three major haplogroups (B, P, and Q; formerly I, IV, and II, respectively; Lum et al. 1994; Lum and Cann 2000) characterize the majority of people from Melanesia (PNG, the Solomon Islands, Vanuatu, New Caledonia, and Fiji). Two of these haplogroups (P and Q) decrease in frequency from New Guinea eastward to Polynesia, whereas the third (B) increases along the same trajectory. Haplogroups P and Q account for approximately 98% of lineages in the Highlands of PNG (Stoneking et al. 1990), decline in frequency eastward through Melanesia, and ultimately account for only approximately 3.5% (range 0–13%) of Polynesian lineages (Lum et al. 1994; Redd et al. 1995; Sykes et al. 1995; Lum and Cann 1998, 2000). Haplogroups P and Q are likewise very rare (< 3%) in ISEA (Hill et al. 2007).

Haplogroup B frequencies range from 2% to 45% throughout ISEA (Sykes et al. 1995; Hill et al. 2007), and the haplogroup accounts for 34% of Aboriginal Taiwanese lineages (Sykes et al. 1995; Trejaut et al. 2005). Throughout coastal and island Melanesia, haplogroup B frequencies range from 10% to 25% (Stoneking et al. 1990; Lum and Cann 1998; Friedlaender et al. 2007), yet the haplogroup is absent from the New Guinea Highlands (Stoneking et al. 1990; Lum and Cann 1998; Forster et al. 2001). In Fiji, the easternmost archipelago of Melanesia, haplogroup B accounts for 64% of lineages (Lum and Cann 1998). Throughout Melanesia, haplogroup B lineages are found predominantly in coastal regions and on offshore islands associated with prehistoric Lapita settlements and the current distribution of Austronesian languages (Merriwether et al. 1999). Haplogroup B approaches fixation throughout Polynesia, as well as on the central-eastern Micronesian islands settled through the expansion of the Lapita Cultural Complex (Lum et al. 1994; Sykes et al. 1995; Lum and Cann 1998, 2000), with variable frequencies observed in western Micronesia (Lum and Cann 1998, 2000; Lynch et al. 2008).

The formation of the Lapita Cultural Complex and the consequent settlement of the Pacific are thought to have resulted from innovations born of the cultural and biological integration of the Pleistocene settlers and the Holocene intruders (Green 1991a). Previous studies compared genetic data from Austronesian- and Papuan-speaking populations dispersed across the Pacific to construct scenarios of prehistoric cultural interaction. Here we examined Austronesian- and Papuan-speaking populations separated by realistic prehistoric interaction distances along a transect from the offshore islands to the Sepik Plain. Our study addresses the following questions, which larger-scale studies could not answer: (1) Do genetic barriers remain between neighboring Austronesian and Papuan speakers after 3,500 years of potential gene flow? (2) Do linguistic similarities predict genetic affinities? (3) Does geographic proximity predict genetic affinity? (4) To what extent do current genetic patterns reflect the recent migration of Boiken speakers from the East Sepik Plain, and did this migration obscure earlier population movements?

Materials and methods

To address these questions, we analyzed mtDNA sequences from 219 volunteers from eight villages in the Wewak and Yangoru districts of East Sepik Province on the north coast of PNG (Fig. 1a). Two villages were selected from each of four ecological zones (offshore islands, coast, Prince Alexander Range, and Sepik Plain) along a 69-km transect from 3.15°S to 4°S, with intervillage distances averaging 21 km. Four indigenous languages from three linguistic families, one Austronesian and two Papuan (Torricelli and Sepik-Ramu; Fig. 1b), are spoken in our study area. Two of these languages are each spoken in only one village of our study area: Kairiru, the Austronesian (A) language of St. Martin, and Arapesh, the Torricelli (T) language of Dagua. Abelam, a Sepik-Ramu (S-R) language, is spoken in Wingei and Witupe. Boiken, also an S-R language, is spoken in four villages, one in each ecological zone: Walis, Boiken, Warabung, and Kiniambu (Fig. 1).

Fig. 1

a Map of Papua New Guinea and the East Sepik Province. The eight villages in our study are located two each in four ecological zones: islands (circles), coast (squares), mountains (triangles), and plains (diamonds). b The distribution of the four languages in our study area: Kairiru (white), Arapesh (striped), Abelam (gray), and Boiken (black)

Approval for this study was obtained from the National Department of Health Medical Research Advisory Committee of PNG, the Tokyo Women’s Medical University Ethical Committee, and the Binghamton University Human Subjects Research Review Committee. In addition, the research participants gave consent prior to their inclusion in this study.

Laboratory methods

DNA was extracted from 219 blood samples with the chaotrope guanidinium thiocyanate and size-fractionated silica particles, following an abbreviated version of the protocol of Boom et al. (1990) as described by Tsukahara et al. (2006). A 405-base pair (bp) segment of the mtDNA hypervariable region 1 (HVS1) was amplified by polymerase chain reaction (PCR) with primers L15996 and H16401 as described previously (Vigilant et al. 1989; Lum and Cann 2000). The PCR products were purified with a Manu03050 Filter Plate (Millipore Corporation, Billerica, MA, USA) and sequenced in both directions with the BigDye Terminator Kit version 3.1 on an Applied Biosystems 3730xl 96-capillary DNA analyzer (Applied Biosystems, Foster City, CA, USA).

Sequence analysis

In total, 333 bp of unambiguous mtDNA HVS1 sequence was determined for each participant and used for phylogenetic analyses. The 219 sequences (GenBank accession numbers EU312981–EU313199) were aligned using ClustalX (Thompson et al. 1997). A median-joining network was constructed from the sequences using Network 4.500 (www.fluxus-engineering.com), and times of coalescence were estimated using a mutation rate of one substitution every 20,180 years (Richards et al. 1998).
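The conversion from network distances to coalescence time can be sketched as follows. We assume here the ρ (rho) statistic, the mean number of mutational steps between each sampled sequence and the inferred root of a network cluster; the text above specifies only the calibration of one substitution per 20,180 years (Richards et al. 1998), not the estimator, so this is an illustration rather than a record of the Network computation. The function names are hypothetical, and the standard errors reported with our estimates are not reproduced.

```python
# Sketch: dating a network cluster with the rho statistic.
# Assumption: rho is our illustrative choice; the text gives only the calibration.

YEARS_PER_SUBSTITUTION = 20_180  # one substitution per 20,180 years (Richards et al. 1998)


def rho(distances_to_root):
    """Mean number of mutational steps between each sampled sequence and the cluster root."""
    if not distances_to_root:
        raise ValueError("at least one sequence is required")
    return sum(distances_to_root) / len(distances_to_root)


def coalescence_time(distances_to_root, years_per_substitution=YEARS_PER_SUBSTITUTION):
    """Convert rho into an estimated time to the most recent common ancestor, in years."""
    return rho(distances_to_root) * years_per_substitution


# Hypothetical cluster: two sequences at the root node, one 1 step away, one 2 steps away.
steps = [0, 0, 1, 2]
print(rho(steps))               # 0.75
print(coalescence_time(steps))  # 15135.0
```

Read in reverse, the same calibration implies that the 4,386-year estimate for haplogroup B corresponds to ρ ≈ 0.22 (4,386 / 20,180).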

The estimated time of coalescence derived from the median-joining network of all individuals was 123,700 years, a time depth approximately three times greater than the initial peopling of New Guinea (Groube et al. 1986; Hudjashov et al. 2007) and approximately twice the estimated age of the dispersal of anatomically modern humans out of Africa (Mellars 2006). The times of coalescence for haplogroups P and Q roughly matched the archeological dates for the first arrival of humans in New Guinea, whereas the estimated age of haplogroup B was consistent with the more recent arrival of the presumed Austronesian speakers and the subsequent expansion of the descendant Lapita Cultural Complex. Furthermore, the events of interest here occurred within the past 3,500 years, less than one fifth of the average time (20,180 years) required for a single substitution to accumulate in the analyzed HVS1 region (Richards et al. 1998). Because few new mutations are therefore expected to have arisen since the formation of the Lapita Cultural Complex and the Boiken expansion, we treated distinct mtDNA HVS1 lineages as alleles and estimated FST genetic distances among villages from the frequencies of shared alleles (Lum and Cann 2000).
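Treating each distinct HVS1 lineage as an allele means that a pairwise FST can be computed from lineage counts alone. The sketch below assumes a Hudson, Slatkin, and Maddison (1992) style estimator (one minus the ratio of mean within-village to between-village lineage diversity), substituted here for illustration; the AMOVA-based FST computed by Arlequin in this study may give somewhat different values. All function names and the example villages are hypothetical.

```python
from collections import Counter


def within_diversity(lineages):
    """Probability that two distinct sequences from one village carry different lineages."""
    n = len(lineages)
    if n < 2:
        raise ValueError("each village needs at least two sequences")
    same = sum(c * (c - 1) for c in Counter(lineages).values())
    return 1 - same / (n * (n - 1))


def between_diversity(pop1, pop2):
    """Probability that one sequence from each village carries different lineages."""
    c1, c2 = Counter(pop1), Counter(pop2)
    same = sum(c1[h] * c2[h] for h in c1)
    return 1 - same / (len(pop1) * len(pop2))


def pairwise_fst(pop1, pop2):
    """Hudson-style FST from lineage identity (assumed estimator, not Arlequin's AMOVA)."""
    h_between = between_diversity(pop1, pop2)
    if h_between == 0:  # both villages fixed for the same single lineage
        return 0.0
    h_within = (within_diversity(pop1) + within_diversity(pop2)) / 2
    return 1 - h_within / h_between


# Hypothetical villages: labels stand in for distinct HVS1 sequences.
village_a = ["L1", "L1", "L2", "L2"]
village_b = ["L3", "L3", "L4", "L4"]
print(round(pairwise_fst(village_a, village_b), 3))  # 0.333 (no shared lineages)
print(round(pairwise_fst(village_a, village_a), 3))  # -0.333 (identical samples)
```

Negative values, as in the second call, arise because within-village diversity is computed from distinct pairs of sequences; they are conventionally read as zero differentiation.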

Statistical analysis

The 219 sequences were first sorted into haplogroups B, P, and Q by suites of shared mtDNA mutations, and the frequencies of these haplogroups were determined (Table 1). Intervillage FST genetic distances were estimated from lineage frequencies using Arlequin 2.0 (Schneider et al. 2000) and were used to construct a neighbor-joining tree (Fig. 3a) (Felsenstein 2005). To test the significance of each pairwise FST, individuals were randomly permuted between populations and FST was recalculated for each of 10,000 permutations (Schneider et al. 2000). The P value, the proportion of permutations yielding an FST higher than the observed value, was used as an index of gene flow. Seven of the 28 population pairs (25%) were not significantly different at the 0.10 level (0.44 > P > 0.10) and were thus categorized as having high gene flow (Lum et al. 2002). Lines representing high intervillage gene flow (P > 0.10) were drawn on a map of the East Sepik Province (Fig. 3b). In contrast, populations were considered significantly distinct and genetically isolated when FST P values were < 0.05. The pairwise FST matrix is available from the authors upon request.
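The permutation procedure described above can be sketched as follows. This illustrates the logic rather than Arlequin's implementation: it assumes a Hudson-style FST computed from lineage counts, counts permutations whose FST is at least as high as the observed value (treating ties as extreme is our assumption), and applies the P thresholds of 0.10 and 0.05 used in the text. The function names and the "intermediate" label for 0.05 ≤ P ≤ 0.10 are hypothetical, and each village must contain at least two sequences.

```python
import random
from collections import Counter


def fst(pop1, pop2):
    """Hudson-style FST treating distinct HVS1 lineages as alleles (assumed estimator)."""
    def h_within(pop):  # requires at least two sequences per village
        n = len(pop)
        return 1 - sum(c * (c - 1) for c in Counter(pop).values()) / (n * (n - 1))

    c1, c2 = Counter(pop1), Counter(pop2)
    h_between = 1 - sum(c1[h] * c2[h] for h in c1) / (len(pop1) * len(pop2))
    if h_between == 0:
        return 0.0
    return 1 - (h_within(pop1) + h_within(pop2)) / 2 / h_between


def permutation_p(pop1, pop2, n_perm=10_000, seed=1):
    """Shuffle individuals between two villages; return (observed FST, P value)."""
    observed = fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Ties count as at least as extreme (our assumption).
        if fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return observed, hits / n_perm


def classify(p):
    """Thresholds from the text; the 'intermediate' label is hypothetical."""
    if p > 0.10:
        return "high gene flow"
    if p < 0.05:
        return "genetically distinct"
    return "intermediate"


# Hypothetical villages of ten individuals each.
separate = (["L1"] * 5 + ["L2"] * 5, ["L3"] * 5 + ["L4"] * 5)
mixed = (["L1"] * 5 + ["L2"] * 5, ["L1"] * 5 + ["L2"] * 5)
for a, b in (separate, mixed):
    observed, p = permutation_p(a, b)
    print(round(observed, 3), p, classify(p))
```

Because the test is one-sided, pairs of villages that share lineages in similar proportions yield P values near 1, which is why P > 0.10 can serve as an index of gene flow rather than merely of non-significance.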

Table 1 Geographic location, language spoken, number of participants, and haplogroup frequencies of the eight studied villages

The correlation between haplogroup B frequency and south latitude was evaluated using Spearman's rank correlation (Fig. 4). Lastly, the 219 sequences were compared with previously published sequences in GenBank to place our local interpretations in a regional context. Lineages were considered ancestral if they were found at high frequencies throughout the region and derived if they were found at low frequencies in restricted geographic locations.
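When several villages share the same haplogroup B frequency (for example, zero), the frequencies contain ties, and Spearman's coefficient is then computed as the Pearson correlation of average ranks. The sketch below assumes that tie convention; it is plain Python with hypothetical function names, computes only the coefficient (not its P value), and its usage values are invented placeholders rather than the Table 1 data.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5  # undefined if either variable is constant


def spearman(x, y):
    """Spearman's rank correlation as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical placeholders, NOT the Table 1 values: south latitude (degrees)
# and haplogroup B frequency for eight villages, with tied zeros inland.
latitude = [3.2, 3.3, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0]
freq_b = [0.20, 0.25, 0.15, 0.10, 0.0, 0.0, 0.0, 0.05]
print(round(spearman(latitude, freq_b), 2))
```

With only eight villages, the significance of the coefficient would normally be read from exact tables or obtained by permutation rather than from a large-sample approximation.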

Results

Haplogroups and lineages

Of the 219 participants, 93% (n = 203) were assigned to one of the three haplogroups described above: B (11%, n = 23), P (20%, n = 43), and Q (63%, n = 137). Haplogroup frequencies varied substantially across the eight villages (Table 1). Other haplogroups common in ISEA were also present, but only in low numbers (e.g., E: n = 2; M: n = 1) (Trejaut et al. 2005; Merriwether et al. 2005; Hill et al. 2007).
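The percentages above follow directly from the reported counts; the short check below recomputes them, rounding to whole percentages (the rounding convention is our assumption).

```python
# Haplogroup counts reported above (n = 219 participants).
TOTAL = 219
counts = {"B": 23, "P": 43, "Q": 137}


def to_percent(n, total=TOTAL):
    """Whole-number percentage (rounding convention assumed)."""
    return round(100 * n / total)


assigned = sum(counts.values())
print(assigned, to_percent(assigned))                   # 203 93
print({hg: to_percent(n) for hg, n in counts.items()})  # {'B': 11, 'P': 20, 'Q': 63}
print(TOTAL - assigned)                                 # 16 sequences outside B, P, and Q
```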

The 219 sequences yielded 91 unique lineages (Table 2). The median-joining network of the 91 lineages (Fig. 2) was characterized by three nonoverlapping monophyletic clusters corresponding to the three main haplogroups (B, P, and Q). Haplogroups B and Q each contained at least one high-frequency node (n > 7). These larger nodes are inferred to be ancestral and were found in either three or all four ecological zones (Table 2). Haplogroup Q also had multiple terminal tips of derived lineages branching from several large, serially ancestral nodes. These smaller tips were derived relatively recently from the larger ancestral nodes and are therefore expected to be geographically restricted. This star-like pattern of large ancestral nodes with multiple radiating tips is consistent with sustained growth of an expanding population. In contrast, haplogroup P formed a lattice of dispersed nodes of relatively similar frequencies, suggesting that haplogroup P has had a restricted or reduced population size over time. The times of coalescence for the individual monophyletic clusters were 42,237 ± 13,818 years for P, 37,052 ± 10,022 years for Q, and 4,386 ± 3,617 years for B. Of the 91 unique lineages, 25 were shared between villages. Every village shared at least one lineage with every other village, and some village pairs shared more than one. Based on the network (Fig. 2), nine of the 25 shared lineages were inferred to be derived tips: small, localized lineages one or two mutations away from larger, more widespread nodal lineages. Moreover, eight of these nine lineages did not match any sequence previously submitted to GenBank, whereas the large ancestral nodes did match sequences previously reported outside our study area. Together, these patterns suggest that the network tips are localized and unique, and therefore derived, so that tips shared among populations indicate recent gene flow.
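The criteria used above to separate ancestral nodes from derived tips can be expressed as a simple rule. The sketch assumes that each lineage is represented by the set of HVS1 mutations it carries, treats nodes with more than seven sequences as ancestral (the n > 7 threshold in the text), and labels as derived any other lineage one or two mutations from such a node. The mutation labels, village codes, and function names are hypothetical, and the rule ignores the network topology and geographic restriction that also informed our inferences.

```python
def steps_between(a, b):
    """Mutations separating two lineages: the symmetric difference of their mutation sets."""
    return len(a ^ b)


def classify_lineages(lineages, high=7, max_steps=2):
    """lineages maps a name to (frozenset of mutations, number of sequences)."""
    ancestral = {name for name, (_, n) in lineages.items() if n > high}
    labels = {}
    for name, (motif, _) in lineages.items():
        if name in ancestral:
            labels[name] = "ancestral node"
        elif any(0 < steps_between(motif, lineages[a][0]) <= max_steps for a in ancestral):
            labels[name] = "derived tip"
        else:
            labels[name] = "unclassified"
    return labels


def recent_flow_markers(labels, villages_by_lineage):
    """Derived tips carried in more than one village, read as signs of recent gene flow."""
    return sorted(
        name for name, label in labels.items()
        if label == "derived tip" and len(villages_by_lineage.get(name, ())) > 1
    )


# Hypothetical lineages and village codes, for illustration only.
lineages = {
    "core": (frozenset({"m1", "m2"}), 12),
    "tip1": (frozenset({"m1", "m2", "m3"}), 2),
    "tip2": (frozenset({"m1", "m2", "m3", "m4"}), 1),
    "distant": (frozenset({"x1", "x2", "x3"}), 3),
}
villages = {"core": {"V1", "V2", "V3"}, "tip1": {"V1", "V2"}, "tip2": {"V1"}}
labels = classify_lineages(lineages)
print(labels)
print(recent_flow_markers(labels, villages))  # ['tip1']
```

In this toy example the widespread "core" node carries no information about recent contact, whereas the shared tip "tip1" does, which mirrors the reasoning applied to the nine shared tips above.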

Table 2 List of 91 unique lineages and their distribution by village
Fig. 2

Median-joining network of the 91 lineages: haplogroup B (n = 3), haplogroup P (n = 18), haplogroup Q (n = 59), and Others (n = 11). Lineages found in more than one village (shared) have assigned numbers corresponding to Table 2

Intervillage FST values

The village of Boiken (S-R) was linked by high gene flow (P > 0.10) to each of the three other Boiken-speaking villages. Likewise, the two Abelam-speaking villages (Wingei and Witupe) were not significantly distinct (P > 0.05) from each other or from Warabung (S-R), the neighboring ridge village (Fig. 3b), suggesting high gene flow among villages where the same or related Sepik-Ramu languages are spoken (Fig. 3a). St. Martin (A) was significantly different (P < 0.05) from all other villages except the coastal village of Dagua (T).

Fig. 3

a Neighbor-joining tree based on FST genetic distances. The scale represents the proportion of difference among populations. b Bold lines indicate high intervillage gene flow, inferred from pairwise FST values with P > 0.10

FST genetic distances between the two neighboring coastal villages, Dagua (T) and Boiken (S-R), were not significantly distinct (P > 0.05). In contrast, the two Sepik Plain villages, Witupe (S-R) and Kiniambu (S-R), although only 13 km apart, were significantly distinct from each other (P < 0.05).

Haplogroup B and distance inland

A significant inverse correlation was observed between haplogroup B frequencies and south latitude (r = −0.7, P < 0.05) (Fig. 4), reflecting the higher frequency of this haplogroup in the coastal and offshore island villages and its absence in three of the four inland villages.

Fig. 4

Significant inverse correlation (r = −0.7, P < 0.05) between south latitude (distance inland) and haplogroup B frequencies

Discussion

In this study, we addressed several questions pertinent to the formation of the Lapita Cultural Complex, the settlement of the Pacific, and regional migrations within northern New Guinea: (1) Do genetic patterns today still reflect the integration of the Pleistocene and Holocene populations that gave rise to the Lapita Cultural Complex? (2) Do linguistic similarities predict genetic affinities? (3) Does geographic proximity predict genetic affinity? (4) To what extent do genetic patterns reflect regional migratory events, such as the seaward expansion of the Boiken speakers of the Sepik Plain?

Gene flow and language

Overall, language was a good predictor of genetic affinities (Figs. 1b, 3b). FST genetic distances between the two Abelam-speaking villages, and between the village of Boiken (S-R) and each of the three other Boiken-speaking villages spread across the 69-km north–south transect, were not significantly distinct (P > 0.05), indicating substantial gene flow within both language groups. In contrast, St. Martin (A) was significantly different (P < 0.05) from all Sepik-Ramu villages, suggesting genetic isolation. This combination of substantial gene flow among the Sepik-Ramu villages and limited gene flow between the Austronesian-speaking village and each Sepik-Ramu village suggests that linguistic similarities have encouraged gene flow, whereas linguistic barriers have restricted it. Thus, linguistic similarities predicted genetic affinities among these closely spaced villages, as they do among populations dispersed at greater distances across the Pacific (Lum and Cann 1998; Lum et al. 1998; Merriwether et al. 1999).

Gene flow and geography

We observed substantial gene flow between neighboring villages in two of our four ecological zones. FST genetic distances between the coastal villages of Dagua (T) and Boiken (S-R), and between the mountain villages of Wingei (S-R) and Warabung (S-R), were not significantly distinct (P > 0.10), suggesting substantial gene flow between populations speaking different languages. The two mountain villages shared six lineages (Table 2), two of which were derived and inferred to result from recent gene flow. In contrast, the FST genetic distances between the two island villages, St. Martin (A) and Walis (S-R), and between the two plains villages, Kiniambu (S-R) and Witupe (S-R), were significantly distinct (P < 0.01). The two plains villages shared only one lineage, a high-frequency and presumably ancestral Q lineage (Table 2). This restricted gene flow within the Sepik Plain is consistent with oral traditions that describe long-standing animosity between the people of Kiniambu (S-R) and Witupe (S-R) (Kazim 1983). This animosity may reflect resource competition in one of the island's most densely populated areas and may have fomented the northward, seaward expansion of the Sepik Plain's Boiken speakers. Because substantial gene flow was observed in only half of the ecological zones of our study area, geographic proximity did not consistently predict genetic affinity, contrary to the expectations of Terrell et al. (1997, 2001).

Gene flow and the Boiken expansion

The Boiken-speaking villages of Walis (S-R) and Kiniambu (S-R), the two most distant sites of our study (69 km apart), shared five lineages (Table 2) and had the most similar haplogroup frequencies of any two villages (Table 1). These villages were also not significantly distinct (P > 0.05) from the coastal village of Boiken (S-R) as evaluated by the FST genetic distances. These data suggest a genetic continuity among the Boiken-speaking villages, even across great distances and ecological barriers (mountains and ocean). This pattern is consistent with archeological and linguistic data describing an expansion of the Boiken speakers (Roscoe 1989) from the inland plain out to the offshore islands. It is also consistent with an oral account of the settlement of the island of Walis. The story tells of an intrusive movement from the East Sepik Plain [specifically the Kiniambu (S-R) area] across the Prince Alexander Mountains to the coasts and islands, displacing and replacing people along the way (Anurim and Kabaru 1980).

Haplogroup B and south latitude

The significant inverse correlation between haplogroup B frequencies and south latitude (distance inland) (r = −0.7, P < 0.05) is consistent with a coastal dispersal of haplogroup B followed by limited dispersal inland. This pattern parallels the restricted distribution of Austronesian languages on the offshore islands and along the north coast of New Guinea. Tsukahara et al. (2006) examined two loci in a subset of the same populations and observed a similar pattern. Like haplogroup B, the 27-bp deletion responsible for Southeast Asian ovalocytosis is thought to have been introduced with the mid-Holocene arrival of Austronesian speakers from ISEA (Tsukahara et al. 2006). In both studies, alleles and lineages thought to have been introduced by the mid-Holocene settlers remained largely limited to the coast and islands, occurring inland only in Boiken-speaking Kiniambu (S-R). This exception may result from gene flow from the offshore islands and coast to the Sepik Plain during the Boiken expansion, combined with limited gene flow within the Sepik Plain.

Data in local and regional context

A GenBank BLAST search of our 91 unique lineages identified eight exact matches with previously published sequences. Six haplogroup P and Q lineages had previously been identified in high numbers in other Melanesian populations (Stoneking et al. 1990; Redd and Stoneking 1999; Tommaseo-Ponzetta et al. 2002; Ingman and Gyllensten 2003; Friedlaender et al. 2005). Four of those six lineages had also been found in small numbers outside Melanesia, specifically in eastern Indonesia (Hill et al. 1997), Polynesia (Lum et al. 1994; Sykes et al. 1995), and Micronesia (Lum et al. 2000). The Q lineages found outside Melanesia were consistently the larger, presumably ancestral, nodes (n > 7) at the centers of the network clusters (Fig. 2). In contrast, only one of the nine shared tips inferred to be recently derived matched a previously published GenBank sequence, from a single individual from the Bismarck Archipelago of PNG (Friedlaender et al. 2007).

Two haplogroup B lineages (lineages 1 and 2) were also found in high frequency throughout the Pacific. Lineage 1, referred to as the Polynesian Motif (Redd et al. 1995) and the Austronesian Motif (Lum and Cann 2000), is the most common Micronesian and Polynesian lineage, whereas lineage 2 has also been found across Asia and the Americas. This pattern of dispersed central/ancestral lineages and localized derived tip lineages (private polymorphisms) is consistent with an early expansion across the Pacific followed by more restricted movement and the accumulation of locally specific variants.

Conclusion

Analyses of mtDNA sequences from the East Sepik Province of PNG allowed us to reconstruct patterns of gene flow associated with several events taking place over the past 3,500 years. We inferred from our coalescence time estimates that haplogroups P and Q arrived in New Guinea in the Pleistocene, potentially at different times and with distinct demographic histories. Haplogroup B appears to have been introduced from ISEA in the mid-Holocene. Like its associated Austronesian languages, this haplogroup was limited to the coast and islands, where it did not replace, but instead integrated with, the gene pool of the Pleistocene settlers. The resulting combination of haplogroups (B, P, and Q) was then dispersed with the Lapita Cultural Complex, producing high frequencies of B lineages with traces of Q and P lineages throughout Polynesia and Micronesia (Sykes et al. 1995; Lum et al. 1998, 2000). Within the past 3,000 years, the Boiken expansion (Roscoe 1989) created a zone of genetic and linguistic homogeneity from the Sepik Plain to the offshore islands. Localized gene flow, inferred from both shared derived tip lineages and nonsignificant FST genetic distances (P > 0.05), followed the expansion within two of the four ecological zones (coast and mountains). This localized gene flow was absent on the Sepik Plain, as inferred from within-zone FST genetic distances (P < 0.01) and from the retention in Kiniambu (S-R) of lineages from all three haplogroups that are shared with other Boiken-speaking villages. Thus, genetic patterns associated with linguistic differences have been maintained for more than 3,000 years, allowing us to reconstruct the patterns of biological interaction that resulted in the Lapita Cultural Complex and the settlement of the Pacific.

The patterns observed and the conclusions inferred in this study are based on maternally inherited mtDNA. If warfare, differential migration, patterns of land tenure, and regional conquest involved greater male than female gene flow, as suggested by oral histories of the settlement of Walis (Anurim and Kabaru 1980), we would expect paternally inherited Y-chromosome diversity to show even stronger genetic affinities among the Boiken speakers. To explore these possibilities, we have begun analyzing Y-chromosome SNP variation in these and five additional East Sepik Province villages. These forthcoming analyses will help complete the picture of regional gene flow between the two parental populations that gave rise to the Lapita Cultural Complex.