Comparative sequence analyses reveal rapid and divergent evolutionary changes of the WFDC locus in the primate lineage

Belen Hurle¹, Willie Swanson², NISC Comparative Sequencing Program¹,³, and Eric D. Green¹,³,⁴

  ¹ Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
  ² Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA;
  ³ NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, Maryland 20852, USA

Abstract

The initial comparison of the human and chimpanzee genome sequences revealed 16 genomic regions with an unusually high density of rapidly evolving genes. One such region is the whey acidic protein (WAP) four-disulfide core domain locus (or WFDC locus), which contains 14 WFDC genes organized in two subloci on human chromosome 20q13. WAP protease inhibitors have roles in innate immunity and/or the regulation of a group of endogenous proteolytic enzymes called kallikreins. In humans, the centromeric WFDC sublocus also contains the rapidly evolving seminal genes, semenogelin 1 and 2 (SEMG1 and SEMG2). The rate of SEMG2 evolution in primates has been proposed to correlate with female promiscuity and semen coagulation, perhaps related to post-copulatory sperm competition. We mapped and sequenced the centromeric WFDC sublocus in 12 primate species that collectively represent four different mating systems. Our analyses reveal a 130-kb region with a notably complex evolutionary history that has included nested duplications, deletions, and significant interspecies divergence of both coding and noncoding sequences; together, these events have led to striking differences in this region among primates and between primates and rodents. Further, this region contains six closely linked genes (WFDC12, PI3, SEMG1, SEMG2, SLPI, and MATN4) that show strong patterns of adaptive selection, although an unambiguous correlation between gene mutation rates and mating systems could not be established.

Footnotes

  • ⁴ Corresponding author. E-mail egreen@nhgri.nih.gov; fax (301) 402-2040.

  • [Supplemental material is available online at www.genome.org. Genomic sequences reported in this manuscript have been submitted to GenBank under accession numbers DP000036 to DP000048.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6004607

    • Received September 30, 2006.
    • Accepted December 7, 2006.