Patricia M.R. Aldred, Edward J. Hollox, John A.L. Armour, Copy number polymorphism and expression level variation of the human α-defensin genes DEFA1 and DEFA3, Human Molecular Genetics, Volume 14, Issue 14, 15 July 2005, Pages 2045–2052, https://doi.org/10.1093/hmg/ddi209
Abstract
We have defined unexpectedly extensive copy number variation at the human anti-microbial α-defensin genes DEFA1 and DEFA3, encoding human neutrophil peptides HNP-1, HNP-2 and HNP-3. There was variation in both number and position of DEFA1/DEFA3 genes in arrays of 19 kb tandem repeats on 8p23.1, so that the DEFA1 and DEFA3 genes appear to be interchangeable variant cassettes within tandem gene arrays. For this reason, the official symbol for this locus has been revised to DEFA1A3. The total number of gene copies per diploid genome varied between four and 11 in a sample of 111 control individuals from the UK, with ∼10% (11/111) of people lacking DEFA3 completely. DEFA1 appeared to be at high copy number in all great apes studied; at one variable site in the repeat unit, both variants have persisted in humans, chimpanzees and gorillas since their divergence. Analysis of expression levels in human white blood cells showed a clear correlation between the relative proportions of DEFA1:DEFA3 mRNA and corresponding gene numbers. However, there was no relationship between total (DEFA1+DEFA3) mRNA levels and total gene copy number, suggesting the superimposed influence of trans-acting factors. The persistence of DEFA1 at high copy number in other apes suggests an alternative model for the early stages of the evolution of novel genes by duplication and divergence. Duplicated genes present in variant tandem arrays may have greater potential than simple duplications for the combinatorial creation of new functions by recombination and gene conversion, while still preserving pre-existing functions on the same haplotype.
INTRODUCTION
Defensins are small cationic peptides that form an important part of the innate immune system (1). Hexamers of defensins create voltage-dependent ion channels in the target cell membrane causing permeabilization and, ultimately, cell death (2,3). The six human α-defensins characterized to date show broad anti-bacterial activity (4,5) as well as anti-HIV properties (6,7). The anti-HIV-1 activity of DEFA1 was recently shown to consist of a direct effect on the virus, combined with a serum-dependent effect on infected cells (8). DEFA1 to DEFA4 are constitutively produced by neutrophils (9), whereas DEFA5 and DEFA6 are produced in the Paneth cells of the small intestine. DEFA1 and DEFA3 (also called HNP-1 and HNP-3, respectively) differ only by the first amino acid of the mature peptides, whereas DEFA2 (HNP-2) lacks this residue. Given that no gene for DEFA2 has been discovered, it is thought that DEFA2 is a proteolytic product of one or both of DEFA1 and DEFA3 (OMIM 125220 and 604522, respectively). DEFA1 and DEFA2 are active against Candida albicans (10) and are chemotactic for T-cells, whereas DEFA3 is not (11). DEFA1 and DEFA2 have recently been observed to be more potent than DEFA3 against the Gram negative bacteria Enterobacter aerogenes and Escherichia coli as well as the Gram positive Staphylococcus aureus and Bacillus cereus (12).
Recent work has shown that large-scale copy number polymorphisms are a major source of genetic polymorphism (13–15). One of these large-scale polymorphisms involves the β-defensin cluster on 8p23.1, which is a copy number polymorphism with a repeat size of at least 240 kb. Carriers of a euchromatic variant that is visible cytogenetically (16,17) have nine to 12 copies of the region whereas most other normal people have two to seven copies (18). β-Defensin copy number is correlated with expression level, raising the possibility that variable expression levels could lead to differing susceptibility to infectious diseases. The human α-defensin genes DEFA1 and DEFA3 and the θ-defensin pseudogene DEFT1 are present in a cluster that is close to but independent of the β-defensin cluster on 8p23.1. Previous work suggested that DEFA1 and DEFA3 have a complex organization that includes copy number variation (19) but the extent and precise nature of this variation were unknown.
Here, we use a combination of multiplex amplifiable probe hybridization (MAPH), pulsed-field gel electrophoresis (PFGE), Southern blotting and restriction digest ratio assays to clarify α-defensin and θ-defensin genomic organization. We examine the evolutionary history of this region in humans and other primates and characterize a common copy number polymorphism in a sample of 111 normal individuals from a UK population. We also use semi-quantitative RT–PCR to investigate the relationship between α-defensin genomic copy number and α-defensin gene expression.
RESULTS
Copy number measurement
Initial investigation by MAPH showed probe DEFA1.2 to be of variable copy number independent of the β-defensin cluster (18). We determined α-defensin copy number in a set of seven reference samples using PFGE and Southern blotting. Three common HpaI restriction fragments were observed, corresponding to the lengths of alleles containing three (68 kb), four (87 kb) or five (106 kb) copies of DEFA1/DEFA3 predicted by analysis of BAC sequences including DEFA1/DEFA3 repeat units. Calibrated against these seven reference samples, 111 samples from a normal UK population were tested in duplicate by MAPH, in conjunction with measurements of the ratios of variable restriction sites for HaeIII, BsmI and MseI (see Figs 1 and 2 and Materials and Methods section). The relationship between probes DEFA1.1 and DEFA1.2 conformed with that predicted from GenBank sequence analysis (Fig. 1) with DEFA1.1 being present at two copies fewer than DEFA1.2 per diploid genome. Total DEFA1/DEFA3 copy number was found to range between four and 11 copies per diploid genome with five to nine copies being most common (Fig. 3). The HaeIII digest ratio assay distinguishes DEFA1 from DEFA3 via a single base difference (G–T) between the coding sequences of the two genes, allowing the putative copy numbers of each gene to be inferred. DEFA1 ranged from three to nine copies with the modal value being five per diploid genome. DEFA3 copy number was also variable, ranging from zero to four copies with the majority of individuals having one or two copies (Fig. 3). Eleven of the 111 individuals tested with the HaeIII digest ratio assay lacked DEFA3 altogether, consistent with previous reports where ∼10% of people lack the DEFA3 peptide (20). Codominant segregation of DEFA1 and DEFA3 copy numbers and of haplotypes of variant BsmI (+/−) sites, was confirmed in CEPH pedigree 12 (Fig. 4).
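The three HpaI fragment sizes are consistent with a simple linear relationship between fragment length and repeat number. The following is a minimal sketch of that arithmetic, not part of the original analysis. It assumes the 19 kb repeat unit given in the text and a flanking length of about 11 kb, which is inferred here from the reported sizes (68 − 3 × 19 = 11); the function name and tolerance are illustrative.

```python
REPEAT_KB = 19.0  # repeat unit length reported in the text
FLANK_KB = 11.0   # assumption: inferred from 68 kb - 3 * 19 kb, not stated in the paper


def copies_from_fragment(length_kb, tolerance_kb=5.0):
    """Estimate the number of DEFA1/DEFA3 repeats on one allele
    from the length of its HpaI fragment (in kb)."""
    estimate = (length_kb - FLANK_KB) / REPEAT_KB
    copies = round(estimate)
    # Reject lengths that do not sit close to a whole number of repeats.
    if copies < 1 or abs(estimate - copies) * REPEAT_KB > tolerance_kb:
        raise ValueError(f"fragment of {length_kb} kb does not fit the repeat model")
    return copies


for length in (68, 87, 106):
    print(length, "kb ->", copies_from_fragment(length), "copies")
```

Under these assumptions, the diploid copy number of a reference sample would be the sum of the values for its two HpaI fragments.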
Organization of the repeat array
To sample the repeat array structure, we typed the genes present at each end of the array in 100 DNA samples. Long PCRs specifically amplified either the 5′ or the 3′ end repeat, followed by a secondary PCR using the HaeIII digest ratio assay to distinguish DEFA1 and DEFA3. Taking into consideration the bias due to shorter alleles having a higher proportion of end repeat positions, significantly more DEFA3 genes were present at the 5′ end repeat position (χ²-test, P = 7 × 10⁻⁵) than would be expected if the genes were distributed at random within the repeat array. Nevertheless, about half of DEFA3 genes (54%) are not at end positions. DEFA1 and DEFA3 thus appear to be interspersed along repeat arrays, suggesting that recombination has been active in shaping diversity. Similarly, although most DEFA3 genes are associated with an adjacent MseI(+) site, we found 5/106 associated with MseI(−), suggesting recombination of an ancestral DEFA3/MseI(+) with other repeats.
α-Defensin evolution in primates
Restriction enzyme tests and sequence analysis showed that chimpanzee, bonobo, gorilla and orangutan all have DEFA1 and not DEFA3, indicating that DEFA3 is probably human specific, and established MseI(+) as the ancestral state for this variant. However, one chimpanzee and one gorilla sample showed both BsmI (+) and (−) repeat units [(+):(−) ratios of 1 and 0.75, respectively], indicating the long-term persistence of BsmI (+)/(−) variation. MAPH confirmed that DEFA1 has been present in multiple copies since before the divergence of great apes from gibbons around 25 million years ago, with eight to 11 copies in chimpanzee and orangutan and three to five copies in gorilla and gibbon.
Active θ-defensins are found in the bone marrow of Rhesus macaques (21) but the human DEFT1 pseudogene sequence found in DEFA1/3 repeat units contains a stop codon in the signal peptide that prevents processing to produce the mature peptide. Sequencing of θ-defensin genes showed human, chimpanzee, bonobo and gorilla to have the same stop codon, whereas orangutan, gibbon and long-tailed macaque have sequences capable of encoding active peptides, suggesting that θ-defensins are present in Old World monkeys and have since been lost in the ape lineage (22).
Copy number association with α-defensin expression levels
We used duplex semi-quantitative RT–PCR to compare defensin cDNA levels with those of the housekeeping gene SDHA. cDNA was obtained from 17 individuals with known DEFA1/DEFA3 genomic copy numbers. The ratio of DEFA1 and DEFA3 in the cDNA was calculated and compared with the genomic ratios obtained previously. A significant positive correlation (r² = 0.71, P < 0.01) was observed between genomic and cDNA ratios of the two defensins (Fig. 5) with an ∼2-fold relative over-expression of DEFA3. All samples known to have DEFA3 were seen to express the gene, indicating that all genes present in the repeat array are expressed. The total defensin level was calibrated with the SDHA level of each sample and plotted against the total copy number. No significant relationship was observed between defensin levels and genomic copy number (r² = 0.02) and variation in expression levels between individuals within the same copy number class was seen (Fig. 5). Confirmation of the results was obtained by calibration of defensin cDNA levels against a different housekeeping gene: transferrin receptor (TFRC) (data not shown). On repeat sampling of five individuals, the relative levels of total α-defensin were consistent with those observed previously. Total defensin expression was compared with BsmI status, whereas only DEFA1 expression was compared with MseI status [as all DEFA3 repeats in this sample set were known to be on MseI(+) backgrounds]. No association between alleles and expression levels was observed (P = 0.61, total versus BsmI status; P = 0.80, DEFA1 versus MseI status, two-tailed Mann–Whitney U-test).
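The relationship between genomic and cDNA ratios can be illustrated with a short calculation. This is only a sketch: it assumes that every copy of a given gene is expressed equally and that each DEFA3 copy contributes about twice the transcript of a DEFA1 copy, as the ∼2-fold figure suggests. The function name and the example copy numbers are hypothetical and are not data from the study.

```python
def expected_cdna_ratio(defa1_copies, defa3_copies, defa3_relative_expression=2.0):
    """Expected DEFA1:DEFA3 cDNA ratio for given genomic copy numbers,
    assuming equal expression per copy within each gene and a fixed
    per-copy over-expression of DEFA3 relative to DEFA1."""
    if defa3_copies == 0:
        return None  # ratio is undefined when DEFA3 is absent
    genomic_ratio = defa1_copies / defa3_copies
    return genomic_ratio / defa3_relative_expression


# Hypothetical individual with six DEFA1 and two DEFA3 copies:
# genomic ratio 3:1, expected cDNA ratio 1.5:1 under the assumptions above.
print(expected_cdna_ratio(6, 2))
```

A model of this kind predicts only the relative proportions of the two transcripts; it says nothing about the total α-defensin level, which showed no relationship with copy number.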
DISCUSSION
The α-defensin gene DEFA1 has been in a multi-copy array for at least 25 million years, with DEFA3 being a more recent (human-specific) variant. Both genes vary in copy number between normal individuals and can differ in location within the repeat array with respect to each other. We have also demonstrated that DEFA3 is likely to have arisen at the 5′ end repeat position and has transferred to other positions within the array through unequal recombination between alleles. Thus, rather than regarding DEFA1 and DEFA3 as distinct loci, or even alleles at a single locus, it is more realistic to view them as variant repeats in gene array haplotypes. For this reason, the HUGO Gene Nomenclature Committee (http://www.gene.ucl.ac.uk/nomenclature/) has recommended that this gene locus should now be designated as DEFA1A3. The associated θ-defensin sequence here designated as DEFT1 will be renamed as DEFT1P to clarify its status as a pseudogene.
Approximately 10% of individuals sampled did not have the DEFA3 gene and it is not yet clear whether this affects the innate immune response in these individuals. The documented differences between DEFA1, DEFA2 and DEFA3 peptides show DEFA3 to be the least active of these three defensins (10,11). DEFA3 is expressed at approximately twice the level of DEFA1, so higher expression levels may compensate for the lower specific activity. DEFA3 may have another function, and it is not clear whether it may be the precursor of the biologically more active peptide DEFA2. Otherwise, DEFA3 may simply be a less active variant of DEFA1 that has little or no effect on carriers and at one extreme may be an incipient pseudogene. Further functional analysis of the protein is required to assess the phenotypic impact of haplotypes lacking DEFA3. The independent copy number polymorphism at both α- and β-defensin genes, together with the polymorphic expression levels attributable to variable gene copy number of the CCL3L1 chemokine (23,24), suggests that copy number change may be a frequent mode of adaptation for genes involved in immune responses.
Our results suggest that all the α-defensin genes present on the repeat array are expressed and the simplest relationship would be for all to be expressed equally. This would result in the gene copy number being directly proportional to the level of protein. We show that the genomic copy number ratio of DEFA1 to DEFA3 accurately predicts the expression ratio of these defensins, suggesting that all copies of DEFA1 genes are expressed at the same level, as are all copies of DEFA3. Indeed, studies of the DEFA1 and DEFA3 promoter sequences have found a 200 bp region, which is identical in both genes, to be sufficient for gene expression (25). This region contains a variety of potential transcription factor binding sites and PU.1, CCAAT/enhancer binding protein (C/EBP) and c-Myb sites were found to be essential for efficient transcription (25,26). C/EBP is involved in the immune and inflammatory response.
Each copy of DEFA3 appears to be expressed at about twice the level of DEFA1, which suggests that the single nucleotide variant distinguishing DEFA3 (or another variant associated with it) causes this upregulation. No association between either BsmI or MseI status with defensin expression level was observed, showing that these variants (or any unknown common variants associated with them) do not affect expression levels of these genes. Nevertheless, combined expression levels of DEFA1 and DEFA3 are not correlated with genomic copy number, indicating that variation in expression of both these genes may be modulated by an independent trans-acting factor. Indeed, heritable trans-acting factor variation has been found to be common and is expected to account for a high proportion of gene expression variability between individuals (27).
Large-scale copy number variation has recently been found to be more common in the human genome than previously appreciated (14). In contrast to large-scale copy number polymorphisms that can have repeat units of several hundred kilobase pairs, the human α- and θ-defensin genes are present on a smaller repeat unit of just 19 kb. Although one recent modification of array-comparative genome hybridization allows the definition of copy number changes at single-exon resolution equivalent to MAPH, most genome-wide surveys of copy number have relied on genomic clones, and even the best have effective resolution limits in the 15–30 kb range (28).
Representational oligonucleotide microarray analysis compares the relative concentration of DNA in two samples by hybridizing differentially labelled samples with a set of probes. DNA complexity is reduced by restriction enzyme digestion leading to an average resolution of 30 kb (29). Such methods would be unlikely to detect the copy number variation observed here and methods such as MAPH, which has a resolution of a few hundred base pairs, may reveal that medium scale (100 bp–100 kb) copy number polymorphism is more common than currently thought.
Polymorphisms in medium-scale repeats are yet another under-appreciated source of genetic variation with potentially important consequences in gene expression. In the DEFA1/DEFA3 array, the persistence of high copy number throughout great ape evolution, and the evidence that both coding and non-coding repeat variants can be shuffled by recombination into novel repeat sequences, suggests an alternative paradigm for the creation of evolutionary novelty by duplication and divergence of genes. Arrays of genes may preserve a more diverse pool of variant sequences than simple duplication and may provide the opportunity for the combinatorial emergence of novel genes by recombining together variants that have originated on distinct repeats. Although our current picture of DEFA1/DEFA3 diversity and evolution is still compatible with neutral copy number mutation and random drift in allele frequency, variable multi-copy arrays like these may be common intermediates in the early history of clustered gene families.
MATERIALS AND METHODS
DNA and RNA extractions
Genomic DNA was extracted from whole blood from a normal UK population using a standard phenol/chloroform method adapted from the Nucleon DNA extraction kit. DNAs from chimpanzee (Pan troglodytes), bonobo (Pan paniscus), gorilla (Gorilla gorilla), orangutan (Pongo pygmaeus), gibbon (Hylobates lar) and long-tailed macaque (Macaca fascicularis) were obtained from the European Collection of Animal Cell Cultures (ECACC, http://www.ecacc.org.uk). Total RNA was extracted from the leukocyte fraction of whole blood using a standard Trizol method (Invitrogen).
MAPH and analysis
All MAPH probes were generated by PCR amplification and subcloning into pZero-2 vector (Invitrogen) and were fully sequenced to confirm identity. Probe DEFA1.2 was designed to hybridize specifically to the DEFA1 and DEFA3 gene sequences (18), whereas probe DEFA1.1 targets a region close to the DEFT1 pseudogene (Fig. 1). The final probe set was a mixture of these test probes with probes from a set previously used to screen subtelomeric regions for deletions and duplications (30). These subtelomeric probes do not report common polymorphisms and can therefore be used as a reference framework against which to normalize α-defensin probes.
The full experimental details of MAPH have been published previously (31) and updates to the protocol are available at http://www.maph.info. Hybridization of the α-defensin probe set was performed as previously described using other probe sets. After stringent washing, amplifiable probes bound to the genomic DNA were released by heating to 95°C in a 50 µl solution [75 mM Tris–HCl (pH 8.8 at 25°C), 20 mM (NH4)2SO4 and 0.01% (v/v) Tween-20 (1 ×PCR buffer IV from Abgene)]. About 1 µl of this solution was used to seed a PCR reaction using primers PZA (5′-AGTAACGGCCGCCAGTGTGCTG-3′) and PZB (5′-CGAGCGGCCGCCAGTGTGATG-3′) for 25 cycles of 95°C for 30 s, 60°C for 1 min and 70°C for 1 min followed by a final step of 72°C for 20 min. About 1.5 µl of this PCR reaction was added to 10 µl HiDi containing ROX500 marker (Applied Biosystems) and run on an ABI 3100 capillary system with a 30 s injection time.
After electrophoretic separation of the 54 different probes, Genescan and Genotyper software (Applied Biosystems) were used to quantify the area under each peak. Each probe was normalized against its four nearest reference peaks within each sample. The peaks for the α-defensin probes showed such marked variation that the average copy number was unclear. Seven reference samples, for which length of the α-defensin repeat array had been directly measured by PFGE and Southern blot (discussed subsequently), were used to calibrate the MAPH data. Given standard deviations of 10–15% observed within the reference probe set, we combined MAPH data with restriction fragment digest ratio assays and a maximum likelihood calculation to predict the most likely copy number for each sample (discussed subsequently).
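The within-sample normalization step can be sketched in a few lines. This is an illustration rather than the authors' implementation: it assumes that "nearest" means nearest in fragment size on the electrophoretic trace and that normalization divides the test peak area by the mean area of those reference peaks. The function name and the example peak values are hypothetical.

```python
def normalize_probe(test_size, test_area, reference_peaks, n_nearest=4):
    """Normalize one test probe against its nearest reference peaks.

    reference_peaks is a list of (size_bp, peak_area) tuples for
    reference probes measured in the same sample.
    """
    if len(reference_peaks) < n_nearest:
        raise ValueError("not enough reference peaks")
    # Pick the reference peaks closest in size to the test probe
    # (assumed meaning of "nearest").
    nearest = sorted(reference_peaks, key=lambda p: abs(p[0] - test_size))[:n_nearest]
    mean_reference_area = sum(area for _, area in nearest) / n_nearest
    return test_area / mean_reference_area


# Hypothetical peak table: (size in bp, area). The distant 300 bp peak is ignored.
references = [(100, 1000), (120, 1100), (140, 900), (160, 1000), (300, 5000)]
print(normalize_probe(130, 2000, references))
```

The normalized value is a relative signal only; converting it to a copy number still requires calibration against samples of known copy number, as was done here with the seven PFGE-typed references.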
PFGE and Southern blotting
GenBank sequences AF238378 and AF233439 were used to reconstruct a predicted sequence spanning the whole α-defensin region of 8p23.1, and HpaI was identified as cutting close to the repeat array on either side but not within it. Peripheral blood leukocytes were taken from seven reference samples and DNA was prepared in agarose blocks using standard methods (32). Restriction endonuclease digestion, PFGE using a CHEF-DR III system (BioRad), Southern blotting onto uncharged nylon filter (MSI MAGNA) and hybridization with an [α-32P]dCTP-labelled probe for DEFA1/DEFA3 were performed using standard methods (32).
Restriction variant digest ratio assays
All α-defensin repeat unit sequences in GenBank were aligned, and restriction sites that were present in some repeats but absent in others were identified. PCR products were designed around BsmI, HaeIII and MseI sites using fluorescently labelled primers (FAM or HEX). Fragments were PCR amplified and digested with the appropriate enzyme. About 3 µl of digestion product was added to 10 µl HiDi containing ROX500 marker (Applied Biosystems) and run on an ABI 3100 capillary system. Peak areas were quantified using Genescan and Genotyper software (Applied Biosystems) and the ratio of cut to uncut product was calculated. For each sample, the three restriction variant digest ratio assays were completed in duplicate and the resulting six ratios were used in a maximum likelihood calculation to predict the copy number that maximized the joint likelihood of those particular ratios, assuming flat priors. For example, average digest ratio results of 6.8:1 (HaeIII), 0.9:1 (MseI) and 1.7:1 (BsmI) would be most consistent with a copy number of 8 (7:1, 4:4 and 5:3).
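The worked example can be reproduced with a small search over candidate copy numbers. The sketch below is not the authors' calculation, whose error model is not described here. It assumes independent Gaussian errors of equal variance on the cut fraction, so that maximizing the likelihood reduces to least squares, and it takes the averaged ratios rather than the six duplicate measurements. The function names and the candidate range of 2 to 12 copies are illustrative choices.

```python
def cut_fraction(ratio):
    """Convert a cut:uncut ratio into the fraction of cut product."""
    return ratio / (1.0 + ratio)


def most_likely_copy_number(ratios, candidates=range(2, 13)):
    """Return (copy_number, cut_copies_per_assay) giving the best joint fit.

    ratios maps assay name -> observed cut:uncut ratio. For each candidate
    copy number n, every assay is matched to the whole number of cut repeats
    k (0..n) whose expected fraction k/n is closest to the observation.
    """
    best = None
    for n in candidates:
        total_error = 0.0
        cut_counts = {}
        for assay, ratio in ratios.items():
            observed = cut_fraction(ratio)
            k = min(range(n + 1), key=lambda c: (c / n - observed) ** 2)
            total_error += (k / n - observed) ** 2
            cut_counts[assay] = k
        if best is None or total_error < best[0]:
            best = (total_error, n, cut_counts)
    return best[1], best[2]


observed = {"HaeIII": 6.8, "MseI": 0.9, "BsmI": 1.7}
copy_number, cut_copies = most_likely_copy_number(observed)
print(copy_number, cut_copies)
```

For the ratios in the example this selects a copy number of 8 with 7, 4 and 5 cut repeats, matching 7:1, 4:4 and 5:3. Ratios alone cannot distinguish a copy number from its multiples (16 copies would fit equally well), which is one reason to restrict the candidate range and to combine the ratios with the MAPH measurement.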
cDNA generation and semi-quantitative RT–PCR
Reverse transcriptase (Reverse-IT™ RTase blend, ABgene) and anchored oligo dT primers (5′-TTTTTTTTTTTTTVN-3′) were used to make cDNA in accordance with instructions provided by the enzyme manufacturer (ABgene). Human succinate dehydrogenase (SDHA, OMIM 600857) was used as a housekeeping reference gene. An intron-spanning PCR product was designed around the HaeIII site in the DEFA1/DEFA3 cDNA so that a 322 bp fragment could be amplified from cDNA. The fragment was amplified using a fluorescently labelled primer in a duplex reaction with a fragment of similar size from SDHA. In all amplifications, a control reaction was included using solution from a ‘no reverse transcriptase’ control. After HaeIII digestion and capillary electrophoretic separation, peak areas could be used to calculate DEFA1:DEFA3 and (DEFA1+DEFA3):SDHA cDNA ratios.
End-repeat PCR
Long PCR assays were designed to specifically amplify the 5′ or 3′ end repeat units of the array. In each case, one primer was of unique sequence outside the array and the other was positioned within the repeat, giving products of 6.3 and 6.8 kb for the respective end repeats. PCR components were the same as for MAPH probe set amplification with the addition of 0.3% (v/v) glycerol and 30 mM Tris-base. Reaction conditions were 25 cycles of 95°C for 30 s, 62°C and 60°C (3′ and 5′ end repeats, respectively) for 1 min and 70°C for 10 min. Primary PCR products were used to seed HaeIII restriction digest ratio assay reactions and processed accordingly (see Restriction variant digest ratio assays).
Allele-specific PCR
Two allele-specific PCR assays, each using a fluorescent primer, were designed to be specific for DEFA1 and DEFA3, respectively. The products were designed to incorporate the variable MseI site within the same repeat units as the specified gene. PCR components and conditions were as for MAPH probe set amplification except for using an annealing temperature of 63°C for 30 cycles. Primary products were digested with MseI (see Restriction variant digest ratio assays).
ACKNOWLEDGEMENTS
We are grateful to John Brookfield and Tamsin Majerus for helpful suggestions. P.M.R.A. was supported by a University of Nottingham PhD studentship and E.J.H. was supported by a Wellcome Trust Bioarchaeology Postdoctoral Fellowship (grant no. 071024).
Conflict of Interest statement. None declared.
Present Address: Department of Genetics, University of Leicester, University Road, Leicester LE1 7RH, UK.
References
Kagan, B.L., Selsted, M.E., Ganz, T. and Lehrer, R.I. (
Wimley, W.C., Selsted, M.E. and White, S.H. (
Ghosh, D., Porter, E., Shen, B., Lee, S.K., Wilk, D., Drazba, J., Yadav, S.P., Crabb, J.W., Ganz, T. and Bevins, C.L. (
Salzman, N.H., Ghosh, D., Huttner, K.M., Paterson, Y. and Bevins, C.L. (
Zhang, L., Yu, W., He, T., Yu, J., Caffrey, R.E., Dalmasso, E.A., Fu, S., Pham, T., Mei, J., Ho, J.J. et al. (
Zhang, L., Lopez, P., He, T., Yu, W. and Ho, D.D. (
Chang, T.L., Vargas, J.J., DelPortillo, A. and Klotman, M.E. (
Yount, N.Y., Wang, M.S., Yuan, J., Banaiee, N., Ouellette, A.J. and Selsted, M.E. (
Lehrer, R.I., Ganz, T., Szklarek, D. and Selsted, M.E. (
Chertov, O., Michiel, D.F., Xu, L., Wang, J.M., Tani, K., Murphy, W.J., Longo, D.L., Taub, D.D. and Oppenheim, J.J. (
Ericksen, B., Wu, Z., Lu, W. and Lehrer, R.I. (
Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., Maner, S., Massa, H., Walker, M., Chi, M. et al. (
Iafrate, A.J., Feuk, L., Rivera, M.N., Listewnik, M.L., Donahoe, P.K., Qi, Y., Scherer, S.W. and Lee, C. (
Fredman, D., White, S.J., Potter, S., Eichler, E.E., Den Dunnen, J.T. and Brookes, A.J. (
Barber, J.C., Joyce, C.A., Collinson, M.N., Nicholson, J.C., Willatt, L.R., Dyson, H.M., Bateman, M.S., Green, A.J., Yates, J.R. and Dennis, N.R. (
O'Malley, D.P. and Storto, P.D. (
Hollox, E.J., Armour, J.A. and Barber, J.C. (
Mars, W.M., Patmasiriwat, P., Maity, T., Huff, V., Weil, M.M. and Saunders, G.F. (
Tang, Y.Q., Yuan, J., Osapay, G., Osapay, K., Tran, D., Miller, C.J., Ouellette, A.J. and Selsted, M.E. (
Nguyen, T.X., Cole, A.M. and Lehrer, R.I. (
Irving, S.G., Zipfel, P.F., Balke, J., McBride, O.W., Morton, C., Burd, P.R., Siebenlist, U. and Kelly, K. (
Townson, J.R., Barcellos, L.F. and Nibbs, R.J.B. (
Tsutsumi-Ishii, Y., Hasebe, T. and Nagaoka, I. (
Ma, Y., Su, Q. and Tempst, P. (
Morley, M., Molony, C.M., Weber, T.M., Devlin, J.L., Ewens, K.G., Spielman, R.S. and Cheung, V.G. (
Dhami, P., Coffey, A.J., Abbs, S., Vermeesch, J.R., Dumanski, J.P., Woodward, K.J., Andrews, R.M., Langford, C. and Vetrie, D. (
Lucito, R., Healy, J., Alexander, J., Reiner, A., Esposito, D., Chi, M., Rodgers, L., Brady, A., Sebat, J., Troge, J. et al. (
Hollox, E.J., Atia, T., Cross, G., Parkin, T. and Armour, J.A. (
Armour, J.A., Sismani, C., Patsalis, P.C. and Cross, G. (