Abstract

DNA contamination arising from the manipulation of ancient calcified tissue samples is a poorly understood, yet fundamental, problem that affects the reliability of ancient DNA (aDNA) studies. We have typed the mitochondrial DNA hypervariable region I of the only 6 people involved in the excavation, washing, and subsequent anthropological and genetic study of 23 Neolithic remains excavated from Granollers (Barcelona, Spain) and searched for their presence among the 572 clones generated during the aDNA analyses of teeth from these samples. Of the cloned sequences, 17.13% could be unambiguously identified as contaminants, with those derived from the people involved in the retrieval and washing of the remains present in higher frequencies than those of the anthropologist and genetic researchers. This finding confirms, for the first time, previous hypotheses that teeth samples are most susceptible to contamination at their initial excavation. More worrying, the cloned contaminant sequences exhibit substitutions that can be attributed to DNA damage after the contamination event, and we demonstrate that the level of such damage increases with time: contaminants that are >10 years old have approximately 5 times more damage than those that are recent. Furthermore, we demonstrate that in this data set, the damage rate of the old contaminant sequences is indistinguishable from that of the endogenous DNA sequences. As such, the commonly used argument that miscoding lesions observed among cloned aDNA sequences can be used to support data authenticity is misleading in scenarios where the presence of old contaminant sequences is possible. We argue therefore that the typing of those involved in the manipulation of the ancient human specimens is critical in order to ensure that generated results are accurate.

Introduction

A major problem facing ancient DNA (aDNA) studies (particularly those on human and microorganism DNA) is sample contamination with exogenous (i.e., not derived from the sample) sources of DNA. Degradation of the endogenous (authentic and original) DNA reduces polymerase chain reaction (PCR)–amplifiable templates to very low levels, leaving specimens susceptible to contamination with higher levels of contaminant DNA, whether derived from contact with living tissue containing similar DNA or from previously PCR-amplified DNA. As the younger contaminant DNA is likely to be less degraded than the endogenous DNA, it has been argued that it may be preferentially amplified in subsequent PCR analyses, to the detriment of the endogenous DNA (Handt, Hoss, et al. 1994). Historically, the field of aDNA has focused on the problems associated with laboratory-derived contamination, and a number of guidelines have been suggested to help deal with this issue. Examples include independent replication of results in a second laboratory, separation of pre- and post-PCR laboratories, adherence to sterile techniques, blank controls in amplifications and extractions, cloning PCR products, and quantitation of the number of DNA templates (Handt et al. 1996; Cooper and Poinar 2000). Recently, however, a number of studies have raised the possibility that a more problematic and harder-to-resolve issue is contamination of the analyzed specimen prior to its preparation for genetic analysis (here referred to as prelaboratory contamination) (Pääbo et al. 2004; Gilbert, Bandelt, et al. 2005; Willerslev and Cooper 2005). For example, a large proportion of human European archeological material is handled, washed, and labeled during excavation and subjected to subsequent archiving by ethnic Europeans, who may have DNA sequences that are closely related to, if not indistinguishable from, those of the specimen. This, therefore, renders such studies particularly subject to criticism (e.g., Gilbert, Bandelt, et al. 2005), even when additional supporting evidence such as biochemical preservation data or analysis of associated fauna is presented (e.g., Caramelli et al. 2003).

Prelaboratory contamination has been reported in a number of studies. For example, several authors report the presence of human DNA in samples that are not expected to naturally contain modern human DNA. Examples include human DNA PCR amplified from archeological and historical specimens of calcified tissues from pigs (Richards et al. 1995), cave bears (Hofreiter, Serre et al. 2001), foxes (Wandeler et al. 2003), dogs (Malmström et al. 2005), and Neandertals (Krings et al. 1997; Serre et al. 2004; Lalueza-Fox et al. 2005). In other situations, prelaboratory contaminant DNA sequences have been directly identified in human remains, normally through observations of the presence of multiple DNA haplotypes among cloned sequences from one sample or inconsistent results between samples from one skeleton (e.g., Handt et al. 1996; Kolman and Tuross 2000; Gilbert, Rudbeck, et al. 2005; Gilbert et al. 2006). However, despite evidence that such contamination is a problem, it has proved more difficult to investigate the actual nature of prelaboratory contamination in detail, particularly, the processes through which contamination enters ancient samples and how it persists once there.

Prelaboratory contamination of bones and teeth does appear to depend upon sample preservation and porosity, with better preserved samples being more resistant than less well-preserved samples from specific data sets that have undergone similar treatments (Gilbert, Rudbeck, et al. 2005). Furthermore, studies on poorly preserved Neandertal remains, where PCR amplification of conserved human genetic markers enables co-amplification of contaminant and authentic DNA, reveal that modern human mitochondrial DNA (mtDNA) makes up a very large proportion of the DNA, and several studies report endogenous Neandertal sequences constituting only around 5% of the total sequences retrieved (Serre et al. 2004; Lalueza-Fox et al. 2005). Furthermore, it seems that prelaboratory contamination can occur at any stage prior to genetic analyses. For example, in a Neandertal from El Sidrón (Astúrias, Spain), some modern contaminant sequences could be attributed to a researcher who handled the remains prior to the arrival of the specimen to the laboratory (Lalueza-Fox et al. 2005). However, other studies attempting to “freshly” contaminate human bones and teeth excavated many years before have noted that contamination seems to be most problematic in the period immediately after excavation (Gilbert, Rudbeck, et al. 2005; Gilbert et al. 2006). As for how contaminants enter the specimens, anecdotal evidence suggests that the likely route is through direct handling and washing, presumably due to DNA derived from the handler permeating throughout dentinal tubules into the pulp cavity (in teeth) and the Haversian system (in bone) (Gilbert, Rudbeck, et al. 2005), although possibly not permeating as far as the osteocytes (Malmström et al. 2005; Salamon et al. 2005).

In summary, despite over 20 years of aDNA studies, surprisingly little is known about this complex, yet fundamental, area. The reason is simple: further understanding requires studies on controlled samples for which the complete handling history is known, and such samples are rarely available. In this study, we have attempted to address this problem through the extraction, PCR amplification, and cloning of mtDNA from teeth sampled from 23 human Neolithic skeletons. These remains are unique as we have been able to monitor all those involved in the manipulations of the specimens, both before and after their sampling for genetic analyses. As we therefore know exactly who retrieved, washed, and studied the skeletal remains, and when, handling-derived contaminant DNA could be tracked down in the resulting cloned DNA sequences. In addition, as the manipulations were made over a 10-year period prior to the current genetic study, we were also able to test whether subsequent degradation of the contaminants has produced any sequence modifications, something that has previously been postulated (Willerslev and Cooper 2005), although not definitively observed. Furthermore, as the 3 people involved in the excavation and anthropological study of the remains and the 3 people involved in the genetic analyses have different mtDNA haplotypes, we have also been able to address directly, for the first time, where and when the risks of contamination are highest.

Materials and Methods

The Neolithic remains studied were excavated in 1994 at the Camí de Can Grau site (Granollers, Barcelona, Spain). The site comprised 23 tombs that have been C14 dated to between 3,500 and 3,000 years BC. As with most archeological excavations of human samples that have previously been used in aDNA analyses, no particular precautions were taken by the excavators to prevent direct contact between the handlers' skin or other sources of DNA (e.g., sweat) with the material. Immediately after excavation, the remains were washed under running water and allowed to dry naturally. Subsequently, the remains underwent an anthropological investigation, before being stored for 10 years in sealed plastic bags, within closed boxes, in a storage room of the local museum of Granollers. Roser Pou (R.P.) and Miquel Martí (M.M.) were the archeologists who excavated, cleaned, and washed the remains. Elisenda Vives (E.V.) undertook the anthropological study (table 1), during which the cranial and dental fragments were glued together and the bones and skulls measured with standard anthropological instruments.

Table 1

Mitochondrial Haplotypes of the Only 6 Researchers Who Have Been in Contact with the Samples

| Researcher | Task | HVR1 Haplotype | MtDNA Lineage | % Found in Iberian Data Set |
| --- | --- | --- | --- | --- |
| R.P. | Excavation, washing of remains | 16069 T, 16126 C, 16278 T, 16366 T | J* | 0.0% |
| M.M. | Excavation, washing of remains | 16129 A | H* | 2.82% |
| E.V. | Anthropological study | 16298 C | V* | 3.13% |
| M.L.S. | Laboratory analysis | 16069 T, 16126 C, 16185 T, 16189 C | J* | 0.0% |
| C.L.-F. | Laboratory analysis | 16126 C, 16294 T, 16296 T, 16304 C | T2 | 1.04% |
| D.C. | Laboratory analysis | 16193 T, 16278 T | J2 | 0.0% |

Hair samples were taken from all 3 of the above-mentioned handlers for DNA analysis, and DNA was extracted using a Chelex protocol. In brief, 2–3 cm of hair was mixed with 5% Chelex, proteinase K, and dithiothreitol and heated for 1 h at 56 °C; the sample was then boiled for 10 min and the supernatant retrieved. The hypervariable region I (HVR1) of the mtDNA was determined by PCR amplification using the primers L15996 (5′-CTCCACCATTAGCACCCAAAGC-3′) and H408 (5′-CTGTTAAAAGTGCATACCGCC-3′) (table 1). The mtDNA haplotypes of the researchers involved in the genetic analyses (M.L.S., C.L.-F., and D.C.) had been previously determined (table 1).

The analysis of the Neolithic remains was undertaken in a dedicated aDNA laboratory at the University Pompeu Fabra (Barcelona) that has a physically isolated pre-PCR area, with nightly UV irradiation, positive air pressure, and routine cleaning of surfaces with bleach. Recommended authentication criteria were adopted to prevent sample contamination during the DNA extractions, such as partial independent replication of results, blank controls, multiple extractions from the same specimens, amino acid analysis, quantification of starting templates, uracil-N-glycosylase treatment, and systematic cloning of all PCR products (Cooper and Poinar 2000; Willerslev and Cooper 2005).

The aDNA was recovered by C.L.-F. and M.L.S. from the roots of teeth that were pulled directly from the alveolus. This method is commonly used in aDNA studies as roots from embedded teeth have been suggested to be better protected from contamination than associated bone samples that may have undergone more handling (Oota et al. 1995). For 2 skeletons, it was possible to obtain 2 teeth, one of which was used for independent replication of the DNA sequences by D.C. at the University of Florence. In accordance with standard tooth decontamination protocols (e.g., Richards et al. 1995), teeth root surfaces were first cleaned with bleach and then ground to powder. The extraction method has been described elsewhere in detail (Caramelli et al. 2003; Sampietro et al. 2005). In brief, the tooth powder was incubated overnight with 0.5 M ethylenediaminetetraacetic acid to remove mineral salts; after centrifugation, the remaining sample was incubated overnight at 50 °C in a lysis solution (1 ml sodium dodecyl sulfate 5%, 0.5 ml TRIS 1 M, 8.5 ml sterile water and proteinase K). Following digestion, the samples were extracted with phenol, phenol–chloroform, and chloroform–isoamylic alcohol, followed by concentration using Centricon centrifugal filters (Millipore, Billerica, USA) to a final volume of 50–100 μl.

The mtDNA HVR1 region was amplified from the Neolithic specimens using a number of overlapping fragments with sizes ranging from 78 to 192 bp (excluding primers), combining several primer pairs (table 2). PCR amplifications were performed in 25-μl reactions with 1 μl of extract, 1.2 U of Taq DNA polymerase (Ecogen, Madrid, Spain), 1× reaction buffer (Ecogen), 1.4 mg/ml bovine serum albumin, 2.5–2.1 mM MgCl2, 0.2 mM dNTP, and 1 μM of each primer. The PCR reactions were subjected to 40 amplification cycles (1 min at 94 °C, 1 min at 50 °C, and 1 min at 72 °C) with an initial denaturation step at 94 °C for 5 min and a last elongation step at 72 °C for 7 min. PCR products were verified using agarose gel electrophoresis and subsequently cloned using the pMOSBlue blunt-end cloning kit (Amersham Biosciences, Uppsala, Sweden) following the manufacturer's instructions. Cloned inserts of the correct size were purified and then sequenced using an ABI 3100 DNA sequencer (Applied Biosystems, Foster City, USA).

Table 2

Contaminant Clones Observed for Each PCR Fragment

| Fragment | Total Clones | Contaminated Clones | Expected Contaminant Clones |
| --- | --- | --- | --- |
| 16131–16209 | 24 | 3 | 4.11 |
| 16131–16211 | 62 | 6 | 10.62 |
| 16055–16142 | 127 | 27 | 21.76 |
| 16122–16218 | 41 | 7 | 7.02 |
| 16247–16378 | 148 | 23 | 25.36 |
| 16122–16261 | 67 | 24 | 11.48 |
| 16209–16356 | 8 | 1 | 1.37 |
| 16225–16383 | 5 | 2 | 0.86 |
| 16186–16377 | 5 | 4 | 0.86 |
| 16209–16401 | 9 | 1 | 1.54 |
| Other fragments | 76 | 0 | 13.02 |

NOTE: Other fragments (where no contaminants were detected): 16185–16261, 16311–16401, 16023–16142, 16023–16156, and 16084–16280. Expected contaminant clones were calculated under the assumption that the occurrence of contaminants is equally likely among different PCR products.
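The expected counts in table 2 can be reproduced with a few lines of arithmetic. The sketch below assumes, as the note states, that contaminants are equally likely in every PCR product, so each fragment's expected count is its clone total multiplied by the overall contamination rate (98 of 572 clones). The dictionary and function names are illustrative and do not come from the original analysis.

```python
# Total clones sequenced per PCR fragment, transcribed from table 2.
TOTAL_CLONES = {
    "16131-16209": 24,
    "16131-16211": 62,
    "16055-16142": 127,
    "16122-16218": 41,
    "16247-16378": 148,
    "16122-16261": 67,
    "16209-16356": 8,
    "16225-16383": 5,
    "16186-16377": 5,
    "16209-16401": 9,
    "Other fragments": 76,
}
DETECTED_CONTAMINANTS = 98  # contaminant clones detected across all fragments


def expected_per_fragment(total_clones, detected):
    """Spread the detected contaminants over fragments in proportion to
    the number of clones sequenced for each fragment."""
    grand_total = sum(total_clones.values())
    rate = detected / grand_total  # overall contamination rate
    return {frag: round(n * rate, 2) for frag, n in total_clones.items()}


expected = expected_per_fragment(TOTAL_CLONES, DETECTED_CONTAMINANTS)
for fragment, value in expected.items():
    print(f"{fragment:>16}: {value:6.2f}")
```

Comparing these values with the observed counts shows at a glance which fragments carry more or fewer contaminants than a uniform model predicts.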

Results

Two samples yielded no amplification products and were subsequently discarded; 6 produced partial HVR1 sequences. In the remaining 15 samples, it was possible to determine the complete HVR1 sequence in overlapping amplifications. A total of 572 clones were sequenced (table 3) of which 98 (17.13%) could be definitely identified as being contaminant DNA sequences derived from 1 of the 6 handlers, the only people to have ever had access to the samples (table 1). Each handler's HVR1 sequence is unique, facilitating the identification of contaminant sequences. Moreover, the percentage of each of those sequences found in a data set of 957 Iberian HVR1 mtDNA sequences is very low (table 1); even the most frequent one (that of E.V.) is only found in 3.13% of the Iberian population. This facilitates the assignment of the contaminant sequences found to the specific persons involved in this study. Among all the cloned sequences that were analyzed, we did not observe any haplotypes other than those that could be assigned to 1 of the 6 sources of contamination or to the endogenous sequence. The endogenous sequences themselves were preliminarily identified as those left following the removal of the contaminant sequences. Several aspects of the data set suggest that they were accurately identified. First, the only possible contaminants are known, unless it can be argued that the bones were somehow initially contaminated prior to excavation through some as yet unknown mechanism. However, in this case, as the endogenous sequences were rarely found in more than a single individual specimen, such an argument would require that during such contamination, individual samples were contaminated by unique DNA sources, which adds further to the implausibility of the explanation. Second, the generation of the endogenous sequences was reproducible: the different PCRs on the extracts from single specimens repeatedly yielded the same putative endogenous sequence. Third, the putative endogenous sequences were found at higher ratios than the contaminants. Fourth, the putative endogenous sequences also make phylogenetic sense, that is, they are found in modern populations (notably in the Iberian Peninsula) and are thus unlikely to represent mosaic or phantom mutation data sets (e.g., Bandelt 2005; Brandstatter et al. 2005).

Table 3

Detected and Expected Contaminants for Each Researcher

| Researcher | Detected Contaminants | % in Total of Detected Contaminant Clones | % in Total of Sequenced Clones | Expected Contaminants |
| --- | --- | --- | --- | --- |
| C.L.-F. | 11 | 11.22 | 1.92 | 17.79 |
| D.C. | 1 | 1.02 | 0.17 | not estimated |
| E.V. | 7 | 7.14 | 1.22 | 10.68 |
| M.L.S. | 13 | 13.27 | 2.27 | 18.32 |
| M.M. | 20 | 20.41 | 3.50 | 33.58 |
| R.P. | 14 | 14.29 | 2.45 | 18.76 |
| Combined R.P.–M.L.S. | 32 | 32.65 | 5.59 | not estimated |

NOTE: Total detected contaminant clones, N = 98 clones; total sequenced clones, N = 572 clones. The expected contaminants for D.C. and combined R.P.–M.L.S. were not estimated.

We did face one problem, however, that arose from the small size of the PCR products amplified from the ancient extracts. Specifically, in a number of situations, we were not able to discriminate between several of the different contaminants. For example, the DNA sequences of one excavating archeologist (R.P.) and one geneticist (M.L.S.) are identical over the first section of the HVR1, up to nucleotide position 16185 with reference to the Cambridge Reference Sequence (CRS) (Anderson et al. 1981), rendering them indistinguishable from each other in PCR products amplified using primers 16055–16142. Furthermore, in other PCR fragments, endogenous sequences exhibited no differences from some contaminants. For instance, PCR amplification of the 16209–16400 fragment would not enable the M.M. contaminant (which differs from the CRS only in the substitution 16129 A) to be discriminated from a potentially endogenous sequence identical to the CRS; likewise, amplification of the 16055–16142 fragment would not allow the E.V. contaminant (which differs from the CRS only at position 16298) to be distinguished from a CRS sequence.
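The discrimination problem can be made concrete with a small sketch that asks whether two haplotypes differ inside a given fragment. The haplotypes are transcribed from table 1 as differences from the CRS; the helper names are hypothetical, the fragment is treated as an inclusive coordinate range, and the endogenous sequence is assumed to be identical to the CRS purely for illustration.

```python
# HVR1 differences from the CRS for each handler, transcribed from table 1.
HANDLER_HAPLOTYPES = {
    "R.P.": {16069: "T", 16126: "C", 16278: "T", 16366: "T"},
    "M.M.": {16129: "A"},
    "E.V.": {16298: "C"},
    "M.L.S.": {16069: "T", 16126: "C", 16185: "T", 16189: "C"},
    "C.L.-F.": {16126: "C", 16294: "T", 16296: "T", 16304: "C"},
    "D.C.": {16193: "T", 16278: "T"},
}
CRS = {}  # illustrative assumption: an endogenous sequence identical to the CRS


def variants_in_fragment(haplotype, start, end):
    """Return the substitutions that fall inside the fragment (inclusive)."""
    return {pos: base for pos, base in haplotype.items() if start <= pos <= end}


def distinguishable(hap_a, hap_b, start, end):
    """Two sequences can be told apart only if they differ within the fragment."""
    return variants_in_fragment(hap_a, start, end) != variants_in_fragment(hap_b, start, end)


rp, mls = HANDLER_HAPLOTYPES["R.P."], HANDLER_HAPLOTYPES["M.L.S."]
print(distinguishable(rp, mls, 16055, 16142))  # False: both carry 16069 T and 16126 C
print(distinguishable(rp, mls, 16122, 16218))  # True: 16185 T and 16189 C are covered
print(distinguishable(HANDLER_HAPLOTYPES["M.M."], CRS, 16209, 16400))  # False
print(distinguishable(HANDLER_HAPLOTYPES["E.V."], CRS, 16055, 16142))  # False
```

A check of this kind, run for each handler against each sample's endogenous sequence, identifies the fragments in which a given contaminant could go undetected.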

The problems described above were not, however, intractable. The ability to distinguish between an endogenous sequence and a contaminant sequence depends on the number of nucleotides that the two sequences share within the amplified fragment. Because this value depends on each individual, it is unavoidable that the detection of contaminant sequences will be differentially underestimated in some fragments depending on the studied contaminant. One way to correct this bias, therefore, is to calculate the expected frequency of each possible source of contamination (i) using the formula:
\[\mathrm{Expected}_{i} = \frac{\mathrm{Detected}_{i}}{P(i \neq e)},\]
where Expected_i = total number of expected contaminants from source i and Detected_i = total number of detected contaminants from source i.
\[P(i \neq e) = \frac{\sum_{F=1}^{n} N_{F} \times g(x)}{\mathrm{Total\ number\ of\ clones}\ (572)},\]
where F = amplified fragment, n = number of amplified fragments, and N_F = number of clones for amplified fragment F.
\[g(x) = \left\{\begin{array}{ll} 1 & x \rightarrow i \neq e \\ 0 & x \rightarrow i = e \end{array}\right.,\]
where i = mtDNA sequence of each handler and e = endogenous sequence of each Neolithic sample, so that g(x) = 1 when the two sequences differ within the fragment and g(x) = 0 when they are identical.
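The correction can be sketched in a few lines. The code below is a simplified illustration rather than the original implementation: it assumes that the clone counts and the distinguishability indicator are supplied per amplification (one entry per sample and fragment), and the toy numbers are invented solely to show the arithmetic.

```python
def p_distinguishable(clones, can_distinguish):
    """P(i != e): the share of all clones that come from amplifications in
    which handler i differs from the endogenous sequence (g = 1)."""
    total = sum(clones.values())
    informative = sum(n for key, n in clones.items() if can_distinguish[key])
    return informative / total


def expected_contaminants(detected, p):
    """Expected_i = Detected_i / P(i != e)."""
    if not 0 < p <= 1:
        raise ValueError("P(i != e) must lie in the interval (0, 1]")
    return detected / p


# Toy data (hypothetical): two amplifications, only one of which can
# separate the handler's sequence from the endogenous sequence.
toy_clones = {"amplification A": 50, "amplification B": 50}
toy_indicator = {"amplification A": True, "amplification B": False}

p = p_distinguishable(toy_clones, toy_indicator)
print(p)                             # 0.5
print(expected_contaminants(10, p))  # 20.0
```

Read in reverse, the values reported in table 3 for M.M. (20 detected, 33.58 expected) imply that this handler's sequence could be told apart from the endogenous sequence in roughly 60% of the clones.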

A chi-square exact test comparing the observed contaminant frequencies with the expected contaminant frequencies calculated for each of the handlers using our formula (table 3) demonstrates that the difference between the frequency of detected and expected contaminants is statistically significant (P = 0.008); in this test, we have excluded contaminants derived from D.C., who only extracted DNA from 2 teeth, and also the sequences shared between M.L.S. and R.P. that cannot be discriminated from each other. This confirms our argument that the subsample of contaminants that we could detect is not a representative subsample of the total amount of contaminants that really exist. Therefore, we have used the calculated expected frequencies in the subsequent comparisons and statistical tests.

The expected frequency of each contaminant DNA source varied significantly in our data (table 3) (chi-square exact test P = 0.006), indicating that biases exist during the contamination process. Furthermore, the expected contaminants derived from those people who were involved in the initial washing and cleaning of the remains (M.M. and R.P.) are represented at a higher frequency (33.58 and 18.76 clones, representing 33.88% and 18.92% of the expected contaminants, respectively) than the expected contaminants derived from the other participants. In contrast, DNA from the person who conducted the anthropological study and handled the already washed and dried remains (E.V.) accounts for only 10.68 expected contaminant clones. This is significantly lower than the expected frequency of contaminants derived from initial handling and washing (Fisher exact test P < 0.005), even if R.P.'s contribution is taken to be an underestimate because of the discriminatory problems. Therefore, this finding supports hypotheses from previous contamination studies that samples are most susceptible to contamination when initially excavated (Gilbert, Rudbeck, et al. 2005; Gilbert et al. 2006). In addition, the researcher who performed the independent replication of a few samples (D.C.) is found only as a residual contaminant (one clone) in the clones generated in Florence and is absent from the clones that were generated in Barcelona. This marked difference illustrates the effectiveness of independent replication in avoiding or minimizing intralaboratory contamination.

The frequency of the detected contaminant sequences is not distributed randomly among the different PCR fragments that were investigated (chi-square exact test, P < 0.0005), with some appearing to have more contaminants than expected (e.g., 16122–16261), whereas others contain less than expected (e.g., 16131–16211) (table 2). The length of the PCR products analyzed in this study ranges from 82 to 192 bp. It has been previously postulated that PCRs on longer fragments should be at greater risk of contamination due to the inverse correlation that exists between template copy number and fragment size in degraded DNA (e.g., Handt, Richards, et al. 1994; Malmström et al. 2005). Although we observe a slight increase in contaminant molecules in fragments over 100 bp (55.11% of contaminants) with respect to fragments shorter than 100 bp (44.89% of contaminants), no significant trend is apparent in our data. However, we note that the size range of fragments analyzed in our data set is very limited, and a study on larger fragments (e.g., 300, 400 bp+) may indeed demonstrate that such a correlation exists. Finally, we compared the number of contaminants in the 15 samples that yielded complete HVR1 sequences with those observed in the 6 samples with partial HVR1 sequences (this being a potential indicator of worse endogenous DNA preservation); however, the differences were not statistically significant (Mann–Whitney U test, P = 0.413).
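To see which fragments drive the departure from a random distribution, one can compute each fragment's contribution to a Pearson chi-square statistic from the observed and expected counts in table 2. This is only an illustrative sketch and not the exact test used in the study: several expected counts are below 5, so the usual asymptotic approximation would be unreliable here and no P value is derived. The variable and function names are assumptions made for this example.

```python
# Observed and expected contaminant clones per fragment, from table 2.
FRAGMENTS = [
    "16131-16209", "16131-16211", "16055-16142", "16122-16218",
    "16247-16378", "16122-16261", "16209-16356", "16225-16383",
    "16186-16377", "16209-16401", "Other fragments",
]
OBSERVED = [3, 6, 27, 7, 23, 24, 1, 2, 4, 1, 0]
EXPECTED = [4.11, 10.62, 21.76, 7.02, 25.36, 11.48, 1.37, 0.86, 0.86, 1.54, 13.02]


def chi_square_contributions(observed, expected):
    """Per-category terms (O - E)^2 / E of the Pearson statistic."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


contributions = chi_square_contributions(OBSERVED, EXPECTED)
statistic = sum(contributions)

# List fragments from the largest to the smallest contribution.
ranked = sorted(zip(FRAGMENTS, OBSERVED, EXPECTED, contributions),
                key=lambda row: row[3], reverse=True)
for name, obs, exp, term in ranked:
    direction = "excess" if obs > exp else "deficit"
    print(f"{name:>16}: observed {obs:2d}, expected {exp:5.2f}, "
          f"term {term:5.2f} ({direction})")
print(f"Pearson statistic: {statistic:.2f}")
```

Ranking the terms highlights the 16122–16261 fragment as the largest excess and the pooled "other fragments" as the largest deficit.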

A further interesting feature of the contaminant sequences is that some of them display additional substitutions, forming phylogenetically incoherent haplotypes. Given that the underlying contaminant sequences are known, and that the observed base change frequencies (2 transversions and 42 transitions, with a 19:23 ratio of type I to type II transitions; see Supplementary Material online) resemble those reported in previously published analyses of postmortem damage (Hansen et al. 2001; Hofreiter, Jaenicke et al. 2001; Gilbert et al. 2003; Binladen et al. 2006), these changes are likely attributable to postmortem damage in the contaminant sequences (Pääbo 1989), although it is possible that an unknown but small fraction of them are due to cloning artifacts. The damage rate of the contaminant sequences was calculated as the number of transitions per cloned nucleotide (following Gilbert et al. 2003) and compared between the “old” contaminants (those derived from the excavators and the anthropologist, deposited over 10 years before the genetic analysis) and the “new” contaminants (those derived from the laboratory researchers). The results (table 4) indicate that new contaminants contain significantly fewer transitions per nucleotide than old contaminants (Mann–Whitney U test, P = 0.005). However, when the damage rate of old contaminant sequences is compared with that of the endogenous sequences (specifically, only those sequences that we can be completely certain are endogenous due to the presence of specific diagnostic single-nucleotide polymorphisms that distinguish them from the 6 potential contaminant sequences) (table 5), we find no statistically significant difference between the two (Mann–Whitney U test, P = 0.425). Thus, the same pattern of damage is observed in very ancient endogenous DNA sequences and in contaminant DNA sequences that are more than 10 years old.
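The damage-rate comparison reduces to simple ratios. The sketch below defines the rate as transitions per cloned nucleotide and then compares the per-nucleotide transition rates reported in table 5. The function name and the example clone length are assumptions made for illustration; the significance tests reported in the text are not reproduced.

```python
def transitions_per_nucleotide(n_transitions, n_clones, mean_clone_length):
    """Damage rate: observed transitions divided by the number of cloned
    nucleotides (clones multiplied by their mean length in bp)."""
    return n_transitions / (n_clones * mean_clone_length)


# Hypothetical example: 5 transitions over 10 clones of 100 bp each.
print(transitions_per_nucleotide(5, 10, 100))  # 0.005

# Transitions per nucleotide as reported in table 5.
RATES = {"endogenous": 0.005340, "old": 0.005447, "new": 0.000992}

old_vs_new = RATES["old"] / RATES["new"]
old_vs_endogenous = RATES["old"] / RATES["endogenous"]
print(f"old vs. new contaminants: {old_vs_new:.1f}-fold")
print(f"old contaminants vs. endogenous: {old_vs_endogenous:.2f}-fold")
```

The first ratio corresponds to the roughly 5-fold difference between old and new contaminants, whereas the second is close to 1, in line with the lack of a significant difference between old contaminants and endogenous sequences.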

Table 4

Number of Clones of Each Type of Contaminant (modern, ancient, and undistinguishable) Detected

| Type of Contaminant | Number of Clones | Transitions/Clone | Transitions/Nucleotide |
| --- | --- | --- | --- |
| Modern (C.L.-F., M.L.S., and D.C.) | 25 | 0.12 | 0.00099 |
| Ancient (M.M., R.P., and E.V.) | 41 | 0.61 | 0.00545 |
| Undistinguishable (R.P.–M.L.S.) | 32 | 0.44 | 0.00377 |

NOTE: Transitions: A → G, C → T, G → A, and T → C changes.

Table 5

Damage Ratio Per Nucleotide in Endogenous Sequences and Contaminants (old and new)

| Sequence Type | Transversions/Nucleotide | Transitions/Nucleotide | Type I/Nucleotide | Type II/Nucleotide |
| --- | --- | --- | --- | --- |
| Endogenous sequences | 0 | 0.005340 | 0.001958 | 0.003382 |
| Contaminants | 0.000177 | 0.003708 | 0.001678 | 0.002031 |
| Old | 0.000218 | 0.005447 | 0.003050 | 0.002397 |
| New | 0 | 0.000992 | 0 | 0.000992 |

NOTE: Transversions: A → C, A → T, C → A, C → G, G → C, G → T, T → A, and T → G changes. Transitions: A → G, C → T, G → A, and T → C changes. Type I transitions: A → G and T → C changes. Type II transitions: C → T and G → A changes.

Discussion

The results of this study corroborate previous observations of high levels of prelaboratory-derived DNA contamination in human and animal calcified remains, even when routine decontamination and sample prepreparation guidelines to remove such contaminants are followed (e.g., Richards et al. 1995; Handt et al. 1996; Kolman and Tuross 2000; Hofreiter, Serre et al. 2001; Wandeler et al. 2003; Gilbert, Rudbeck, et al. 2005; Malmström et al. 2005; Gilbert et al. 2006). Therefore, these results add further evidence of the inadequacy of current methods used to ensure the generation of authentic aDNA recovered from teeth. Naturally, the specific results of this study will differ from those derived from other data sets, as no two data sets will have undergone exactly the same handling treatment, particularly with respect to the extent of handling (the total time handled and the number of people in contact with the specimens). Furthermore, as it has been demonstrated that sample preservation, in particular porosity, correlates with susceptibility to prelaboratory contamination (Gilbert, Rudbeck, et al. 2005), specific samples may be more or less susceptible than others.

The results of this study were generated from teeth and as such may not accurately reflect contamination in bone. However, previous studies have demonstrated correlations between contamination levels in paired teeth and bone samples (Gilbert, Rudbeck, et al. 2005; Gilbert et al. 2006). Further, it has been demonstrated that femora are easier to contaminate than teeth from the same skeletons (Gilbert et al. 2006). We therefore argue that, with regard to bone, these results present a best-case scenario: the susceptibility to prelaboratory contamination, and thus the associated problems, is probably greater in bone than in teeth (Gilbert et al. 2006).

Although, as discussed above, our results depend on factors such as sample preservation and the extent of sample manipulation at excavation and during subsequent anthropological analyses, our data provide the first direct support for previous hypotheses (Gilbert, Rudbeck, et al. 2005; Gilbert et al. 2006) that human teeth samples are most susceptible to contamination at initial excavation and washing. Enamel is impermeable to water-based liquids (Hillson 2005) and as such is believed to be an unlikely route of contaminant DNA entry into teeth. Moreover, because the teeth in this study were sampled directly from the alveolus, the permeable roots were not handled prior to sampling for genetic analyses. Contamination from the initial excavators must therefore have been introduced during sample washing. The data thus also confirm previous hypotheses that washing the specimens is a critical step in their contamination (Gilbert et al. 2006). The exact mechanism of contamination by handling is still poorly understood, but the transport of DNA molecules from the exogenous source to the interior of the specimen seems to be related to sample moisture content. In fact, it has been argued elsewhere that once the remains are excavated and dried, mechanisms such as collapse of collagen bundles within bones and teeth or sedimentation and precipitation of minerals in water might block further entry of waterborne contaminant DNA deep within the sample (Gilbert, Rudbeck, et al. 2005). However, more research is clearly needed to explore this mechanism further, including the possibility of differentiating between contaminants and endogenous molecules through histological or other examinations.

In this study, we also demonstrate directly that contaminant sequences can undergo observable levels of miscoding lesion damage posthandling, in particular transitions. Furthermore, not only is the damage level found in old (approximately 10 years old) contaminant sequences greater than that in new contaminants, thus demonstrating a time-dependent occurrence, but the damage is also found at levels that are indistinguishable from those in the putatively endogenous DNA sequences. These findings are extremely important because one commonly used argument for data authenticity is the presence of such damage: as damage accumulates roughly with time, authentic, and thus old, sequences will be damaged, whereas new contaminants will remain undamaged. Clearly, the data presented here demonstrate that this argument is flawed and must be used with caution. Furthermore, because deamination rates are exponentially linked to absolute temperature (Lindahl and Nyberg 1974), one further worrying possibility arises: if excavated samples are contaminated and then stored at a temperature that is substantially higher than that of their original archeological environment and for a nonnegligible amount of time, then a similar amount of posthandling damage–driven miscoding lesions might be observed in the contaminants as in the endogenous DNA.
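
To give a sense of scale for the temperature argument, the following sketch computes how much faster a thermally activated reaction proceeds at a storage temperature than at a burial temperature. It assumes a simple Arrhenius relation, k = A·exp(−Ea/(RT)), so that the ratio of rates is exp[(Ea/R)(1/T_burial − 1/T_storage)]. The activation energy used in the example is an illustrative placeholder rather than a value taken from this study, and the temperatures are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol * K)


def relative_rate(storage_celsius, burial_celsius, activation_energy_j_per_mol):
    """Ratio k(storage) / k(burial) under an assumed Arrhenius relation."""
    t_storage = storage_celsius + 273.15
    t_burial = burial_celsius + 273.15
    if t_storage <= 0 or t_burial <= 0:
        raise ValueError("temperatures must be above absolute zero")
    exponent = activation_energy_j_per_mol / GAS_CONSTANT * (1 / t_burial - 1 / t_storage)
    return math.exp(exponent)


# Hypothetical scenario: burial at 10 °C, storage at 20 °C.
# ASSUMPTION: 120 kJ/mol is a placeholder activation energy for illustration only.
EXAMPLE_ACTIVATION_ENERGY = 120_000.0
speedup = relative_rate(20.0, 10.0, EXAMPLE_ACTIVATION_ENERGY)
```

Under these assumed values, a 10 °C increase raises the rate several-fold, so a decade of warm storage could in principle correspond to several decades of damage accumulation at the burial temperature. The actual factor depends entirely on the activation energy and temperatures that apply to a given sample.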

In conclusion, it is clear that contamination is a serious problem in aDNA studies, and as such all possible authentication criteria are needed when studying ancient human samples. A potential guideline to control prelaboratory contaminants would be that followed in the present study: to type every single person involved in the manipulation and study of the remains. In the future, a possible way to study samples that could have DNA molecules identical to those of modern humans (e.g., Cro-Magnon specimens) and overcome these problems would be to excavate the samples under strictly controlled conditions, including the use of sterile gloves, face masks, and coveralls; the placement of excavated samples intended for later DNA analyses in sterile, sealed DNA-free containers; and the avoidance of any sample washing or, if washing cannot be avoided, the use of sterile water under controlled conditions.
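
The guideline of typing every handler can be illustrated with a minimal screening sketch. It assumes that each handler and each clone is summarized as a set of (position, base) differences from a common reference sequence, and that a clone is attributed to a handler when it carries that handler's complete motif, with additional substitutions tolerated because contaminants may themselves be damaged. The matching rule, function names, and haplotypes below are hypothetical and do not reproduce the procedure or data of this study.

```python
def matching_handlers(clone_variants, handler_motifs):
    """Names of handlers whose complete motif is present in the clone."""
    clone = set(clone_variants)
    return sorted(
        name
        for name, motif in handler_motifs.items()
        # A handler identical to the reference (empty motif) cannot be
        # recognised by this rule, so such handlers are skipped.
        if motif and set(motif) <= clone
    )


def screen(clones, handler_motifs):
    """Sort clones into contaminant, ambiguous, and unmatched groups."""
    report = {"contaminant": [], "ambiguous": [], "unmatched": []}
    for clone_id, variants in clones.items():
        hits = matching_handlers(variants, handler_motifs)
        if len(hits) == 1:
            report["contaminant"].append((clone_id, hits[0]))
        elif hits:
            report["ambiguous"].append((clone_id, hits))
        else:
            report["unmatched"].append(clone_id)
    return report


# Hypothetical handler motifs and clones (positions are illustrative only).
handler_motifs = {
    "excavator_A": {(16126, "C"), (16294, "T")},
    "geneticist_B": {(16069, "T"), (16126, "C")},
}
clones = {
    "clone_01": {(16126, "C"), (16294, "T")},
    "clone_02": {(16126, "C"), (16294, "T"), (16189, "C")},  # motif plus one extra change
    "clone_03": {(16223, "T")},
    "clone_04": {(16069, "T"), (16126, "C"), (16294, "T")},  # carries both motifs
}
report = screen(clones, handler_motifs)
```

An unmatched clone is not thereby shown to be endogenous: it may derive from an untyped handler or from a handler whose sequence cannot be distinguished from that of the specimen, which is why typing everyone involved matters.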

Supplementary Material

A supplementary table is available at Molecular Biology and Evolution online (http://mbe.oxfordjournals.org/).

Anne Stone, Associate Editor

We are grateful to M.M., R.P., and E.V. for granting us access to the Neolithic samples and for allowing us to analyze their hairs. We thank Monica Valles (Universitat Pompeu Fabra) for providing technical support. This research was supported by Projects BFU2004-02002 and CGL2006-03987 from the Ministry of Education and Science, Grup de Recerca Consolidat, and Distinció a la Recerca Universitària from the Generalitat de Catalunya and a special grant from the Institut d'Estudis Catalans. M.L.S. has a PhD fellowship (AP2002-1065).

References

Anderson S, Bankier AT, Arrell BG, et al. (14 co-authors). 1981. Sequence and organisation of the human mitochondrial genome. Nature 290:457–65.

Bandelt HJ. 2005. Mosaics of ancient mitochondrial DNA: positive indicators of nonauthenticity. Eur J Hum Genet 13:1106–12.

Binladen J, Wiuf C, Gilbert MTP, Bunce M, Larson G, Barnett R, Hansen AJ, Willerslev E. 2006. Comparing miscoding lesion damage in mitochondrial and nuclear ancient DNA. Genetics 172:733–41.

Brandstatter A, Sanger T, Lutz-Bonengel S, Parson W, Beraud-Colomb E, Wen B, Kong QP, Bravi CM, Bandelt HJ. 2005. Phantom mutation hotspots in human mitochondrial DNA. Electrophoresis 18:3414–29.

Caramelli D, Lalueza-Fox C, Vernesi C, et al. (11 co-authors). 2003. Evidence for a genetic discontinuity between Neanderthal and 24,000-year-old anatomically modern Europeans. Proc Natl Acad Sci USA 100:6593–7.

Cooper A, Poinar H. 2000. Ancient DNA: do it right or not at all. Science 289:1139.

Gilbert MTP, Bandelt HJ, Hofreiter M, Barnes I. 2005. Assessing ancient DNA studies. Trends Ecol Evol 20:541–4.

Gilbert MTP, Hansen AJ, Willerslev E, Rudbeck L, Barnes I, Lynnerup N, Cooper A. 2003. Characterisation of genetic miscoding lesions caused by post mortem damage. Am J Hum Genet 72:48–61.

Gilbert MTP, Hansen AJ, Willerslev E, Turner-Walker G, Collins M. 2006. Insights into the processes behind the contamination of degraded human teeth and bone samples with exogenous sources of DNA. Int J Osteoarchaeol 16:156–64.

Gilbert MTP, Rudbeck L, Willerslev E, et al. (15 co-authors). 2005. Biochemical and physical correlates of DNA contamination in archaeological human bones and teeth excavated at Matera, Italy. J Archaeol Sci 32:783–95.

Handt O, Hoss M, Krings M, Pääbo S. 1994. Ancient DNA: methodological challenges. Experientia 15:524–9.

Handt O, Krings M, Ward R, Pääbo S. 1996. The retrieval of ancient human DNA sequences. Am J Hum Genet 59:368–76.

Handt O, Richards M, Trommsdorff M, et al. (13 co-authors). 1994. Molecular genetic analyses of the Tyrolean Ice Man. Science 264:1775–8.

Hansen A, Willerslev E, Wiuf C, Mourier T, Arctander P. 2001. Statistical evidence for miscoding lesions in ancient DNA templates. Mol Biol Evol 18:262–5.

Hillson S. 2005. Teeth. 2nd ed. Cambridge: Cambridge University Press.

Hofreiter M, Jaenicke V, Serre D, von Haeseler A, Pääbo S. 2001. DNA sequences from multiple amplifications reveal artefacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res 29:4693–799.

Hofreiter M, Serre D, Poinar HN, Kuch M, Pääbo S. 2001. Ancient DNA. Nat Rev Genet 2:353–8.

Kolman CJ, Tuross N. 2000. Ancient DNA analysis of human populations. Am J Phys Anthropol 111:5–23.

Krings M, Stone A, Schmitz R, Krainitzki H, Stoneking M, Pääbo S. 1997. Neanderthal DNA sequences and the origin of modern humans. Cell 90:19–30.

Lalueza-Fox C, Sampietro ML, Caramelli D, et al. (12 co-authors). 2005. Neandertal evolutionary genetics: mitochondrial DNA data from the Iberian Peninsula. Mol Biol Evol 22:1077–81.

Lindahl T, Nyberg B. 1974. Heat induced deamination of cytosine residues in deoxyribonucleic acid. Biochemistry 12:3405–10.

Malmström H, Stora J, Dalen L, Holmlund G, Gotherstrom A. 2005. Extensive human DNA contamination in extracts from ancient dog bones and teeth. Mol Biol Evol 22:2040–7.

Oota H, Saitou N, Matsushita T, Ueda S. 1995. A genetic study of 2,000 year old human remains from Japan using mitochondrial DNA sequences. Am J Phys Anthropol 98:133–45.

Pääbo S. 1989. Ancient DNA: extraction, characterization, molecular cloning, and enzymatic amplification. Proc Natl Acad Sci USA 86:1939–43.

Pääbo S, Poinar H, Serre D, Jaenicke-Despres V, Hebler J, Rohland N, Kuch M, Krause J, Vigilant L, Hofreiter M. 2004. Genetic analyses from ancient DNA. Annu Rev Genet 38:645–79.

Richards M, Sykes B, Hedges R. 1995. Authenticating DNA extracted from ancient skeletal remains. J Archaeol Sci 22:291–9.

Salamon M, Tuross N, Arensburg B, Weiner S. 2005. Relatively well preserved DNA is present in the crystal aggregates of fossil bones. Proc Natl Acad Sci USA 102:13783–8.

Sampietro ML, Caramelli D, Lao O, Calafell F, Comas D, Lari M, Agusti B, Bertranpetit J, Lalueza-Fox C. 2005. The genetics of the pre-Roman Iberian Peninsula: a mtDNA study of ancient Iberians. Ann Hum Genet 69:535–48.

Serre D, Langaney A, Chech M, Teschler-Nicola M, Paunovic M, Mennecier P, Hofreiter M, Possnert G, Pääbo S. 2004. No evidence of Neandertal mtDNA contribution to early modern humans. PLoS Biol 2:E57.

Wandeler P, Smith S, Morin PA, Pettifor RA, Funk SM. 2003. Patterns of nuclear DNA degeneration over time—a case study in historic teeth samples. Mol Ecol 12:1087–93.

Willerslev E, Cooper A. 2005. Ancient DNA. Proc R Soc B 272:3–16.

Author notes

*Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain; †Ancient DNA Group, Niels Bohr Institute, Copenhagen, Denmark; ‡Department of Forensic Molecular Biology, Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands; §Dipartimento di Biologia Animale e Genetica, Laboratori di Antropologia, Università di Florence, Florence, Italy; and ‖Unitat d'Antropologia, Departament de Biologia Animal, Universitat de Barcelona, Barcelona, Spain