Introduction

Classical MHC genes (i.e. class I and II) are the most diverse genes identified so far in vertebrates. These genes are textbook examples for the maintenance of genetic polymorphism by diversifying (positive Darwinian) selection (e.g. Klein 1986). They encode MHC class I and II molecules situated at the cell surface that are central for the function of the adaptive immune system. MHC molecules present antigens to T cells, triggering the adaptive immune response. Recently, bony fish (teleosts) have considerably advanced our understanding of how selection by parasites in conjunction with sexual selection drives the polymorphism of the classical major histocompatibility complex class I and II genes (Landry et al. 2001; Langefors et al. 2001a; Reusch et al. 2001; Wegner et al. 2003a). However, a detailed picture of the underlying genomic organisation of teleosts is lacking, although our knowledge on the organisation of the MHC in non-mammalian taxa is rapidly improving (Hess and Edwards 2002; Stet et al. 2003). Several hallmarks of MHC genetics and phylogeny that were originally derived from mammalian studies may not hold in fish, suggesting a major split between the genomic organisation of the MHC in bony fishes and other vertebrates, early in vertebrate evolution (Flajnik and Kasahara 2001; Shum et al. 2001; Stet et al. 2003).

The most notable differences in the organisation of the MHC of the bony fishes (Actinopterygii) are the non-linkage of the MHC class I and II regions (Sato et al. 2000) and the wide dispersion of many genes that are associated with MHC class I and II in mammals across several linkage groups (Nonaka et al. 2001; Phillips et al. 2003; Sambrook et al. 2002). As a consequence, Stet and coworkers proposed to adopt the original terminology for immunity-related genes or regions in bony fish, major histocompatibility (MH) (Stet et al. 2003), which is followed here.

Within bony fishes, the genomic architecture of the classical MH genes (class I and II) is more variable than in mammals. In Atlantic salmon (Salmo salar), for example, there is only one expressed classical class I and II locus (Langefors et al. 2001b) that resembles the situation of the “minimal essential MHC” in chicken (Stet et al. 2003). In zebrafish, there are several linkage groups with classical class II genes, yet most of these are silenced and truncated pseudogenes (Sültmann et al. 1994). Other teleosts, such as the cichlids, have >10 class II gene loci (Malaga-Trillo et al. 1998).

Here, we present a genomic analysis of a 99.5 kb segment comprising parts of the MH class II region in the three-spined stickleback, Gasterosteus aculeatus. Along with mice (Penn et al. 2002), experiments with sticklebacks currently provide the most convincing evidence for a direct role for parasite selection in the maintenance of MH(C) polymorphism (Wegner et al. 2003a,b). Moreover, in sticklebacks, sexual selection during mate choice has also been shown to enhance the polymorphism of MH genes in offspring (Reusch et al. 2001). Both selection processes interact positively, ultimately increasing the numbers of MH sequence polymorphisms that can be maintained in populations over longer time periods.

As a modification of the earlier findings of simple heterozygote advantage (Penn and Potts 1999), in sticklebacks there seems to be an optimal number of MH variants that is correlated with a minimal parasite load (Wegner et al. 2003a,b) and with the highest attractiveness of potential male mating partners (Aeschlimann et al. 2003), depending on the individual MH diversity of the choosing female fish. Different allele numbers can be produced by both heterozygosity at single loci and differences in MH class II beta gene duplication number across haplotypes (e.g. Malaga-Trillo et al. 1998). At the moment it is not clear whether or not this selection pattern is confined to species with a relatively flexible genomic architecture such as the stickleback and other teleosts with haplotype variation in their MH locus duplication numbers, or whether it represents a more general feature that has been overlooked in previous experiments. Therefore, further sequence information on contiguous segments of the genomic region surrounding the classical MH class II loci is highly warranted, along with data on the possible age and origin of MH gene duplication. To this end, we present a genomic analysis of a 99.5 kb segment spanning two classical MH class II alpha and beta loci.

Materials and methods

BAC library and identification of target clone

A genomic bacterial artificial chromosome (BAC) library was constructed in collaboration with Amplicon Express (Pullman, USA) using a single female fish from a lake (Vierer See, Germany). This individual was chosen because it revealed eight distinct SSCP signals for exon 2 of the class II beta genes (Reusch et al. 2001). Eight detectable SSCP signals are above the average individual MHC diversity found in northern Germany (Wegner et al. 2003a,b), suggesting eight sequence variants, and hence probably four distinct loci in the target individual. High molecular weight DNA was partially digested using MboI, and cloned into the pECBAC1 vector with an average insert size of 145 kb. A total of 38,400 colonies were obtained, yielding an average genomic coverage of 8-fold, assuming a genome size of 700 Mb for G. aculeatus. High-density colony nylon filters (obtainable from Amplicon Express) each contained 19,200 colonies in duplicate. The filters were screened using a radioactive probe that encompasses 207 bp of exon 2 of the stickleback MHC class II beta genes. The probe was obtained by using a mixture of two different PCR products to maximise the number of variable exon 2 sequences covered during screening (reverse primer R02: CGGACTTAGTCAGCACATTG; reverse primer R08: ATATTGTTGTAATCGATCTGGA). In both PCR reactions, the forward primer was identical (GA11: AAC TCC ACT GAG CTG AAG GAC ATC). The primers utilised in the PCR were designed based on a sample of 83 MHC class II beta sequences (accession numbers AF395709–730 and T.B.H. Reusch, unpublished data). The PCR product was cut from the agarose gel and re-amplified using the same primers and PCR conditions. Approximately 25 ng of the purified probe were labelled with the Rediprime II DNA random labelling system with α-[32P]dCTP (Amersham Biosciences) according to the manufacturer’s protocol. Labelled probe was purified with MicroSpin HR Columns (Amersham Biosciences). 
The hybridisation reaction took place in 25 ml ExpressHyb hybridisation buffer (Clontech) overnight at 60°C. Washing steps were three times each in 2× standard sodium citrate (SSC) and 0.1× SSC (with 0.1% SDS) at 60°C. Exposure to X-ray film was overnight. The screen produced 17 positive signals. This number of positive clones corresponds to an approximate total size of genomic segments containing MH class II genes of 300 kb, given an average insert size of 145 kb and an 8-fold coverage of the BAC library.
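The coverage and target-region figures quoted above follow from simple arithmetic on the library parameters. The sketch below reproduces them using only numbers given in the text; the function names are illustrative and not part of the original workflow.

```python
def library_coverage(n_clones, mean_insert_kb, genome_size_mb):
    """Expected fold-coverage of a genomic library."""
    return n_clones * mean_insert_kb / (genome_size_mb * 1000.0)


def target_region_kb(n_positive, mean_insert_kb, coverage):
    """Approximate total size (kb) of the genomic region hit by the probe."""
    return n_positive * mean_insert_kb / coverage


# Values taken from the text: 38,400 clones, 145 kb inserts, 700 Mb genome.
coverage = library_coverage(38400, 145, 700)
# 17 positive clones at a nominal 8-fold coverage.
region = target_region_kb(17, 145, 8)

print(f"coverage: {coverage:.1f}-fold")
print(f"target region: {region:.0f} kb")
```

Under these assumptions the library covers the genome roughly 8-fold, and the 17 positive clones correspond to a region of roughly 300 kb.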

Among the 17 candidate clones, the presence of MH class II genes was confirmed using a PCR approach with subsequent analysis of different MH variants by single-strand conformation polymorphism (SSCP) on a capillary sequencer (Binz et al. 2001). Details are provided elsewhere (Reusch et al. 2001). In brief, diluted plasmid DNA from the positive clones was subjected to PCR reactions with fluorescently labelled primers. Two primer combinations were used (forward: always 6-Fam-CAG CAG CTC AGT GGG GAA G, reverse standard: GTG GTT CAG ACA GTA AAC CTC CTT C; reverse mod: GTT GTG CAG ACA GTA AAC CTC CTT C-3). Separation of the sequence variants and fluorescent detection took place under native capillary electrophoresis on an ABI 3100 genetic analyser (Applied Biosystems). Using native GeneScan polymer, a 120-bp segment (excluding primers) of the most polymorphic section of exon 2 encompassing 50% of the peptide binding region (PBR) was separated according to its sequence polymorphisms. In this way, we identified the target clone containing the maximal number of different SSCP signals, i.e. covering a maximal portion of the region of interest.

Shotgun cloning and sequence assembly

The BAC clone was propagated using the Large Construct Kit (Qiagen) and then shotgun cloned into the plasmid pUC19. For this purpose, 20 μg of BAC DNA was sonicated and the resulting fragments were purified by agarose gel electrophoresis. The fraction containing segments of 1,200–1,800 bp was eluted from the gel and the DNA fragments were blunted with T4 DNA polymerase. These fragments were subcloned into a SmaI-digested, alkaline phosphatase-treated pUC19 sequencing vector. The subclones were sequenced using BigDye terminator chemistry (Applied Biosystems). Data were collected using ABI 3730 automated sequencers (Applied Biosystems). A total of approximately 1,200 subclones were sequenced to cover the complete BAC approximately 9-fold. Sequence data were further processed by the program PHRED (Phil Green) and assembled using the program GAP4 (Roger Staden, MRC Cambridge). To enter the assembly, all sequences had to pass a Phred score >20. After an initial round of sequencing, 17 safe contiguous segments of >2 kb in length could be assembled. The remaining gaps were closed using primer walking with custom-made sequencing primers, either on pUC19 templates or directly on BAC DNA, in combination with the BigDye v3.1 terminators (Applied Biosystems).

Expression analysis

Except for a small quantity of genomic DNA, the entire fish was used for the preparation of the BAC library, leaving no material for the construction of a cDNA library as would have been desirable. Therefore, we had to indirectly test for expression of the putative MH gene loci. Total RNA was extracted from the spleen tissue of a sample of 14 additional fish from the same location (Vierer See), by the use of the NucleoSpin RNA II kit (Macherey-Nagel, Düren) including on-column DNase I digestion. Approximately 1 μg of total RNA was used for reverse transcription with Omniscript Reverse Transcriptase (Qiagen, Hilden) and anchored dV(T)21 primers. After transcription into cDNA, the 14 genotypes plus the BAC clone were subjected to SSCP as outlined above. Expression of MH class II beta genes was detected by similar fluorescent SSCP signals generated from the cDNA of wild-caught fish, using the BAC plasmid DNA as a PCR template.

Data analysis and bioinformatics tools

No full-length gene sequences are available for the classical MH genes of the three-spined stickleback. Hence, the position and the exon/intron organisation of the class II alpha and beta genes were identified via alignment using a combination of partial cDNA sequences from an EST library of the three-spined stickleback (D. Kingsley et al. unpublished, available at http://cegs.stanford.edu/blast/blast_stickleback.html), plus full-length cDNA sequences of a close relative, Stizostedion vitreum (walleye, K. Fujiki et al. unpublished, accession numbers AY158870–AY158873 for the class II alpha genes, and AY158837 and AY158838 for the beta genes).

Open reading frames other than those belonging to MH class II alpha and beta genes were identified using the gene prediction software GENSCAN (available at http://genes.mit.edu/GENSCAN.html), in combination with BLAST-P on the inferred amino acid sequences. Protein functions, if available, were predicted using ScanProsite (http://kr.expasy.org/tools/scnpsite.html), and by homology to human gene function (Weizmann Institute, http://bioinfo.weizmann.ac.il/cards/). Signal peptides of putative proteins and other domains were predicted via the PSORT_II prediction program (available at http://psort.nibb.ac.jp/). Repetitive elements were identified and quantified using the RepeatMasker software (available at http://repeatmasker.genome.washington.edu/cgi-bin/repeatmasker). CpG islands were located and quantified using CpG island searcher (Takai and Jones 2003; available at http://www.cpgislands.com), using the criteria outlined in Takai and Jones (2002).

We also compared the MH class II genomic region of the zebrafish (Danio rerio; http://www.ensembl.org/Danio_rerio/), located on Chromosome 8, with the partial MH class II region of the three-spined stickleback. We used the local BLAST-P option to locate genes that were homologous to the candidate genes identified on the genomic segment of the three-spined stickleback.

Detection of gene conversion

The importance of (inter-locus) gene conversion for driving the diversity at MH(C) loci is still under debate (e.g. Martinsohn et al. 1999; Ohta 1999). Hence, we were interested in whether there is evidence for inter-locus gene conversion in the putative MH class II loci (Ohta 1999). We use the terms inter-locus recombination and gene conversion interchangeably because the resulting sequence pattern, the segmental exchange of motifs, does not allow the two processes to be distinguished (Richman et al. 2001; Sawyer 1999). Testing for gene conversion requires >3 sequence samples (Posada 2002). Therefore, we combined the MH sequences characterised by BAC sequencing with an independently derived sequence data set from populations of three-spined sticklebacks from the Schleswig-Holstein area of Northern Germany (accession numbers AY687829–AY687855). The additional data comprised 27 partial class II beta sequences of 210 bp ranging from codon positions 28 to 89 of the PBR according to Brown et al. (1993), and reaching further into intron 2 up to base pairs 390–414 (depending on the presence/absence of indels). The presence of gene conversion was tested using a substitution model implemented in the software GENECONV (Sawyer 1999). This method has one of the highest probabilities of correctly inferring gene conversion when it is present (Posada 2002). Global and pairwise statistical tests were performed with 10,000 permutation runs. In the case of multiple tests, the software has an integrated Bonferroni correction that protects against the inflation of type I errors across the many pairwise tests that are performed in a reasonably large sequence sample. In our case, the 27 sequences analysed resulted in 27×26/2=351 pairwise comparisons. Each uncorrected pairwise P-value was thus multiplied by 351 before being compared with a nominal (global) α-value of 0.05, which is a very conservative procedure.
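The multiple-testing arithmetic described above can be sketched as follows. This is only an illustration of a standard Bonferroni adjustment; GENECONV's internal procedure may differ in detail, and the raw P-value used here is hypothetical rather than a result from this study.

```python
from math import comb


def n_pairwise(n_sequences):
    """Number of unordered sequence pairs, n(n-1)/2."""
    return comb(n_sequences, 2)


def bonferroni(p_raw, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_raw * n_tests)


n_tests = n_pairwise(27)          # 27 partial class II beta sequences
p_raw = 1e-4                      # hypothetical uncorrected pairwise P-value
p_adj = bonferroni(p_raw, n_tests)

print(n_tests)                    # number of pairwise comparisons
print(p_adj < 0.05)               # significant at the global alpha level?
```

With 351 comparisons, an uncorrected pairwise P-value must fall below roughly 0.05/351 ≈ 0.00014 to remain significant after adjustment.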

Results and discussion

MH class II alpha genes and beta genes: structure and the age of duplication

On the 99.5 kb segment of the partial MH region (deposited in GenBank under AY713945), we found two clusters of MH class II alpha genes and class II beta genes in a tandem arrangement 1,449 and 2,919 bp distant from one another, respectively (Fig. 1a). Both copies of the class II alpha genes and the beta genes were divided into introns and exons based on alignment with partial cDNA sequences from the stickleback, and complete coding sequence from a related teleost, the walleye Stizostedion vitreum. Other criteria were recognition sequences of splice sites. The inferred amino acid sequences of the class II alpha genes, designated Gaac-DAA*01 and Gaac-DBA*01, contained six protein domains dispersed over four exons, a leader peptide, an alpha 1 and alpha 2 domain, a connecting peptide, a transmembrane region, and a cytoplasmatic tail (Fig. 2a). In one of the copies of the class II alpha genes, the intron 3/exon 4 splice site of the Gaac-DAA gene was 27 bp downstream, resulting in a protein with a connecting peptide that is 9 amino acids shorter than the Gaac-DBA gene (Fig. 2a). Alternatively, the Gaac-DAA gene may be non-functional because of a frameshift mutation at amino acid position 200 near the 5′-end of exon 4. We can only rule this out once full-length cDNA sequences have been obtained, preferentially from the same stickleback population where the individual used for construction of the BAC library was isolated. A sequencing error in the questionable segment is unlikely as there was >12-fold sequence coverage in this specific area from the shotgun cloning.

Fig. 1

Diagrammatic depiction of (a) gene location, (b) CpG-island density and (c) repetitive elements in a 99.5 kb segment of the major histocompatibility class II region in three-spined stickleback (Gasterosteus aculeatus). In (a), genes on the plus-strand are above, while those on the complementary strand are below the line. In (b), the width of the bars is proportional to CpG-island length, their height to the observed/expected CpG ratio

Fig. 2

(a) Inferred amino acid sequence and gene domains in two MH class II alpha genes of the three-spined stickleback (Gasterosteus aculeatus), designated Gaac-DAA and Gaac-DBA. Dashes are alignment gaps, while dots refer to identity with the topmost sequence. For comparison, the full-length amino acid sequences of the walleye, Stizostedion vitreum (based on transcribed mRNA), are also given. (b) The same information as in (a) for two MH class II beta genes, designated Gaac-DAB and Gaac-DBB. CP = connecting peptide, TM = transmembrane region, CY = cytoplasmatic tail

Both class II beta gene copies (Gaac-DAB*01 and Gaac-DBB*01) were composed of six protein domains distributed on six exons, similar to the alpha gene except for the beta 1 and beta 2 domains that were found on exons 2 and 3, respectively (Fig. 2b). At the genomic level, we found promoter sites (TATA box) and poly(A)+ signals at the 3′ end of the gene.

Copies of the complete alpha and beta genes, including all introns plus a 200- to 300-bp segment upstream and downstream of the respective genes, were aligned. Since there was an unalignable segment of approximately 2 kb between both tandem duplications of the class II alpha and beta loci, the gene duplication was probably not made in one block, but occurred independently for each alpha and beta gene. The alignment had four and 29 indels for the alpha genes and the beta genes, respectively, most of which (70%) were single nucleotides. Most indels in the comparison of the beta gene copies were at the 3′ end of intron 2. Intron 2 also shows pronounced length variation (1,478–1,534 bp in Gaac-DAB*01 and Gaac-DBB*01; see also Sato et al. 1998). This is in line with findings by Stet and coworkers (2002), who also identified pronounced intron length polymorphism in MH class II genes of Atlantic salmon (S. salar). In the case of the stickleback, this intron length variation is probably caused by the repetitive nature of intron 2. Here, we find the motif (CCTTTTAGAA) or its modifications (CCTTGTAGAA or CC-TGTAGAA) repeated over 25 times. Additions or deletions of such minisatellite-like repeats by unequal crossing over are probably one of the reasons for pronounced length polymorphism in intron 2 (as opposed to the other introns).

When comparing both copies of the alpha genes and the beta genes, the former were considerably less polymorphic than the latter, in line with other recent findings from teleost species (Dixon et al. 1995; Stet et al. 2002). In class II beta genes only, the rate of replacement substitutions (dN) over silent substitutions (dS) was elevated 9-fold in the codons of the PBR found on exon 2, indicating strong positive selection (Hughes and Nei 1988; Table 1) that was statistically highly significant (P=0.009). For both paralogous class II alpha genes, we were unable to correlate the stickleback residues with those of the human PBR in HLA class II genes (Hosa-DRA; Brown et al. 1993). The silent nucleotide diversity (synonymous substitutions per synonymous site, dS) varied consistently among exons of both genes. While it was relatively high in areas of exon 2 that form parts of the peptide binding pocket (dS=0.073 and 0.097, for the alpha genes and the beta genes, respectively), silent substitution rates were considerably lower outside these areas (dS=0.014 and 0.007, respectively, Table 1).

Table 1 Number of nonsynonymous substitutions per nonsynonymous site (dN) and synonymous substitutions per synonymous site (dS) estimated for two recently diverged paralogous MHC class II alpha and beta genes, respectively, identified on a genomic segment of the three-spined stickleback. N is the number of codons, values in parentheses are standard errors estimated by bootstrapping. P is the probability value for a test of positive selection (Z-test) implemented in MEGA2, conducted for codons of exon 2 only. For the class II beta genes only (i.e. Gaac-DAB and Gaac-DBB), values were calculated separately for exon 2 codons inside and outside the peptide binding region (PBR). PBR codons were inferred from putative peptide binding residues in human class II beta genes (HLA-DRB1; Brown et al. 1993). n.t. Not tested

It is well known that silent nucleotide polymorphisms are elevated due to hitchhiking effects in the vicinity of codons subjected to diversifying selection (Hughes 2000). Therefore, we consider the dS values in the non-PBR codons of both genes to be the most reliable for divergence time estimates. Divergence times (T in MYA) were estimated from T=dAB/2r, where dAB is the pairwise divergence of both gene copies and r is the substitution rate (r=2.85×10⁻⁹ per site per year; Dixon et al. 1996). Accordingly, we estimate that both class II alpha copies originated from a gene duplication approximately 2.5±1.6 MYA, while the class II beta genes diverged approximately 1.2±1.2 MYA. Since the confidence intervals of both estimates overlap, the divergence times may in fact not be different. It is clear, however, that both are considerably lower than in many other species (see below). These findings of a recent age for the Gaac-DAA/DBA loci and Gaac-DAB/DBB loci are consistent with the phylogenetic analysis of 27 partial MH class II beta sequence variants from northern Germany. Here, no phylogenetically supported allelic lineages were apparent that may indicate locus affiliation of MHC class II beta sequence variants as is the case in other fish (e.g. Figueroa et al. 2000; Sültmann et al. 1994). This applied to both the phylogenetic analysis of exon polymorphisms, as well as to the intron 2 point substitution and indel polymorphisms (T.B.H. Reusch, unpublished data).
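The divergence-time estimates can be reproduced from T=dAB/2r with the dS values reported for the codons outside the peptide binding region (0.014 for the alpha genes, 0.007 for the beta genes). The sketch below is a minimal illustration; the function name is hypothetical, and the standard errors quoted in the text are not derived here.

```python
RATE_PER_SITE_PER_YEAR = 2.85e-9   # substitution rate r (Dixon et al. 1996)


def divergence_time_mya(d_ab, rate=RATE_PER_SITE_PER_YEAR):
    """Divergence time in millions of years from pairwise divergence d_ab.

    The factor 2 accounts for substitutions accumulating along both
    lineages since the duplication.
    """
    return d_ab / (2.0 * rate) / 1e6


t_alpha = divergence_time_mya(0.014)   # class II alpha, non-PBR dS
t_beta = divergence_time_mya(0.007)    # class II beta, non-PBR dS

print(f"alpha genes: {t_alpha:.1f} MYA")
print(f"beta genes:  {t_beta:.1f} MYA")
```

Both values fall within the low single-digit MYA range stated in the text, which illustrates how sensitive the estimate is to small absolute differences in dS.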

The estimated age of gene duplication in class II genes of the three-spined stickleback is the shortest that we are aware of for any classical MH(C) gene. Admittedly, the standard errors of divergence time estimates are high because they are only based on 162 and 161 codons for the alpha genes and the beta genes, respectively. Nevertheless, even when assuming a true value at the upper limit of our confidence interval, we are still one order of magnitude below the age of other known paralogous MHC class II beta genes. The very young age of both paralogous gene copies is unexpected since in many other species, different MHC class II genes are much older than the class I genes within species (Piontkivska and Nei 2003). The deepest evolutionary lineages of MHC class II are found among mammalian DR-DQ and DN loci with a time to the most recent common ancestor of >100 MYA (Takahashi et al. 2000). Note that in mammals, even allelic lineages within the same class II locus are often older than the divergence times identified for the different class II gene loci of sticklebacks (>50 MYA, Takahashi et al. 2000). Relatively old divergence times for class II gene loci (of the order of tens of MYA) have also been identified in some teleosts, namely zebrafish (Sültmann et al. 1994) and barbels (Kruiswijk et al. 2004). In contrast, in salmon and brown trout, lineages of the same class I gene (UBA) are much older than the class II allelic lineages of the DAB gene (Shum et al. 2001). In conclusion, many generalisations regarding the age of the classical MH genes will need modification once more genomic data have been accumulated within teleosts, and across all vertebrates.

MH class II gene expression

We found that both MH class II beta genes identified on the genomic segment were expressed in a sample of 14 wild-caught fish captured from the same site as the individual used for the construction of the BAC library. For the Gaac-DAB and Gaac-DBB genes, SSCP signals obtained from the cDNA of four and two individual fish, respectively, were identical to those obtained with the BAC plasmid DNA as the template, indicating transcription into mRNA. Thus, in contrast to the situation in zebrafish (Sültmann et al. 1994), where there are numerous silenced pseudogenes, there is no evidence for silencing of duplicated MH class II loci in stickleback. This was also shown recently in an experimental study that assessed the resistance to parasite infection in sticklebacks (Wegner et al. 2003a). In this study, cDNA was used as a PCR template for genotyping MH class II beta variants. Interestingly, the observed numerical relationship of optimal individual diversity matches almost exactly the findings from field populations that were typed based on their genomic DNA (Wegner et al. 2003b).

Other genes associated with MH region of three-spined stickleback

In addition to the two MH class II alpha genes and the beta genes, we identified seven additional putative genes using the GENSCAN prediction software (URL at http://genes.mit.edu/cgi-bin/genscanw.cgi) in combination with a homology search based on translated amino acid sequences using BLAST-P (Table 2, Fig. 1a). Assuming functional homology with other vertebrate taxa, two of these putative genes (cytokine and complement regulatory factor) are associated with immunity-related functions (Table 2). We also identified a protein of unknown function similar to an immune-associated protein (IMP4) in humans. We are aware that similarity searches without an in-depth, phylogenomic analysis may cause misleading functional predictions (Eisen and Fraser 2003), in particular when hits concern phylogenetically distant taxa. Nevertheless, a homology search is the starting point in any functional annotation with or without a phylogenomic perspective (Sjölander 2004), while studying possible functional switches of the non-classical MH candidate genes found in the stickleback MH region is a project in itself and beyond the scope of this paper.

Table 2 Identification of putative genes on a 99.5 kb segment of the major histocompatibility locus of the three-spined stickleback Gasterosteus aculeatus. Open reading frames were identified using GENSCAN, predicted proteins were subjected to homology searches using BLAST2. Except for complement regulatory factor, all putative genes had promoter and poly(A)+ signals. The P-value is the BLAST-P score based on the protein search, the taxon in brackets is that with the lowest P-value. + Gene orientation on the sense strand, c on the complementary strand. The GeneCard identification is available via the Weizmann Institute (http://bioinfo.weizmann.ac.il/cards/)

We then compared the genomic arrangement of the MH class II alpha and beta genes in the stickleback with the corresponding region in zebrafish (Danio rerio), which is the genomically best characterised other teleost species. The functional Dare-DAA and DAB genes are located on Chr 8 and are surrounded by silenced and truncated pseudogenes (Dare-DCA, DCB, DDB; Kuroda et al. 2002). In contrast, in the stickleback, there is no evidence for pseudogenes in the 3′ region of the two pairs of expressed class II alpha genes and beta genes, while we cannot exclude that they may occur in the 5′ direction. In a local BLAST-P search, none of the other putative genes associated with class II genes in stickleback (cf. Table 2) were associated with classical class II genes in zebrafish, indicating extensive genomic translocation (Kuroda et al. 2002). Interestingly, of the seven putative genes identified in the MH class II region in stickleback, only two occurred in a linkage group in zebrafish (DKK and IMP4 on Chr 12), while the others were dispersed over several chromosomes. Differences in the genomic organisation of segments carrying classical class II genes in the stickleback thus seem even more striking than differences among medaka, fugu, and zebrafish in terms of class I gene arrangement (Matsuo et al. 2002). Note that stickleback and zebrafish shared their most recent common ancestor 140–270 MYA, depending on the calibration of a cytochrome b molecular clock with tetrapod or mammalian evolution (K.M. Wegner and H. Schaschl, unpublished). This undoubtedly has left ample time for extensive genomic rearrangement to take place (Kuroda et al. 2002).
Not surprisingly, the same comparison with the MHC class II region in the mouse and in humans also revealed that none of the genes found to be associated with the MH region in sticklebacks occurred in the vicinity of the class II genes in these two mammals, supporting the view that in teleosts classical MH(C) genes are translocated to different chromosomal regions (Nonaka et al. 2001; Phillips et al. 2003; Sambrook et al. 2002).

The genomic segment carrying the classical MH class II genes of the stickleback is very gene-dense even in comparison with the MH(C) region of other species. The 99.5 kb sequence characterised, spanning the two class II alpha genes and the two beta genes of the three-spined stickleback, displayed one gene per 9 kb. This is considerably higher than the gene density in the MHC of the human genome (The MHC Sequencing Consortium 1999; one gene every 16 kb). This may partly be due to the relatively compact size of the stickleback genome, comprising an estimated 700 Mb, as compared to 3,000 Mb in humans.
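The gene-density figure can be checked with a short calculation. The sketch assumes that the count comprises the four MH class II genes plus the seven additional putative genes predicted by GENSCAN (11 in total); that breakdown is our reading of the text rather than an explicit statement in it.

```python
SEGMENT_KB = 99.5
N_MH_GENES = 4             # two class II alpha + two class II beta genes
N_OTHER_GENES = 7          # additional putative genes (Table 2)
HUMAN_MHC_KB_PER_GENE = 16.0


def kb_per_gene(segment_kb, n_genes):
    """Average spacing between genes in kb."""
    return segment_kb / n_genes


stickleback = kb_per_gene(SEGMENT_KB, N_MH_GENES + N_OTHER_GENES)
fold_denser = HUMAN_MHC_KB_PER_GENE / stickleback

print(f"{stickleback:.1f} kb per gene")
print(f"{fold_denser:.1f}-fold denser than the human MHC")
```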

Repetitive elements

In total, 3.87% of the 99.5 kb genomic sequence was composed of simple sequence repeats (microsatellites), comprising one mononucleotide, 31 dinucleotide, 13 trinucleotide and six tetranucleotide motifs (Fig. 1c, Table 3). Low complexity areas covered 2.57% of the sequence, composed mainly of AT-rich regions (Table 3). With more than one microsatellite location per 2 kb, the analysed MH class II segment is very rich in microsatellite sequence motifs compared to other vertebrates (Goldstein and Schlötterer 1999). A preliminary assay testing seven of these microsatellites in an array of 14 fish from a single population revealed polymorphisms in five fish that may be linked to MHC haplotype polymorphism (Table 4). All microsatellite primers occasionally yielded no amplification product at all, suggesting the presence of abundant null alleles. A more likely explanation, however, is that the neighbouring MH locus to which the specific microsatellite is linked may be absent. This may be an indication of inter-haplotype differences in locus duplication number. Microsatellite typing may represent an interesting alternative for haplotype identification and segregation analysis of MH class II genotypes in three-spined sticklebacks. A complete list of annotated repetitive elements accompanies the annotated sequence deposited in GenBank.
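To make the notion of a microsatellite concrete, the sketch below finds perfect tandem repeats of 1–4 bp units with a regular expression and then recomputes the repeat density from the counts given above. It is a simplified illustration, not the RepeatMasker procedure used in the study; the minimum of five repeat units and the function name are assumptions chosen for the example.

```python
import re


def find_microsatellites(seq, min_unit=1, max_unit=4, min_repeats=5):
    """Return (start, unit, n_repeats) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    so 'ACACACACAC' is reported as (AC)5 rather than as a longer unit.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Toy sequence: a (AC)6 dinucleotide repeat with short flanks.
hits = find_microsatellites("GGT" + "AC" * 6 + "TTG")

# Density from the motif counts reported in the text.
motif_counts = {"mono": 1, "di": 31, "tri": 13, "tetra": 6}
n_loci = sum(motif_counts.values())
kb_per_locus = 99.5 / n_loci
print(hits, n_loci, round(kb_per_locus, 2))
```

The 51 repeat loci on 99.5 kb correspond to slightly less than 2 kb per locus, in line with the density stated above.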

Table 3 Repetitive elements in a 99.5 kb segment of the major histocompatibility class II region of the three-spined stickleback
Table 4 Microsatellites located in intergenic parts close to MH class II alpha and beta genes in the three-spined stickleback. The location of Gaac-DAA/DBA (except the UTR) is between nucleotides 6,294–7,782 and 32,242–33,729; and of Gaac-DAB/DBB is between positions 9,231–12,017 and 36,648–39,472, respectively. Location: start of the repeat region on BAC; n = number of alleles observed in a sample of 14 wild-caught genotypes from Vierer See

We also identified four short interspersed nuclear elements (SINEs) that revealed homology to elements from cichlids (1), salmon (2), and shark (1). In addition, we found one 2,023-bp long LINE element composed of four pseudo-exonic regions that is homologous (BLAST-X P-value=6×10⁻¹²⁹) to the Pol-like protein recently identified in the green pufferfish (Tetraodon nigroviridis, accession number AB097135).

CpG islands and gene conversion

The MH class II segment we studied was rich in CpG islands, stretches of the genome in which CpG dinucleotides (a C followed by a G) are more frequent than in the bulk of the genome, where the observed to expected CpG ratio is normally suppressed to about 0.2. Thirty percent of the MH class II region sequenced was composed of a total of 69 CpG islands >200 bp in length, with a GC content >0.55, and an observed to expected CpG ratio of >0.65 (Fig. 1b). Although these criteria are relatively stringent (Takai and Jones 2002), we find 10- to 20-fold more CpG islands in the three-spined stickleback MH class II region than in the human, mouse or Drosophila genome. The frequency of CpG islands in the stickleback MH class II region is also approximately five times higher than in the medaka (Oryzias latipes) MH class I region (Matsuo et al. 2002), although the genome sizes of the two species are roughly similar (600 Mb for the medaka, 700 Mb for the three-spined stickleback). Whether a higher CpG island frequency is a mere consequence of the denser packing of genes in the stickleback class II region as compared to the medaka class I region remains to be elucidated.
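The three CpG-island criteria used here (length, GC content and the observed/expected CpG ratio, the latter as plotted in Fig. 1b) can be expressed compactly. The sketch below tests a single candidate stretch against the thresholds quoted in the text; it is not the sliding-window algorithm of CpG island searcher, and the function names are illustrative.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a sequence.

    Obs/Exp = (number of CpG * length) / (number of C * number of G).
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.55, min_obs_exp=0.65):
    """Apply the thresholds quoted in the text to one candidate stretch."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) > min_len and gc_content > min_gc and obs_exp > min_obs_exp


# Toy sequences: an artificial CpG-rich stretch and an AT-only stretch.
print(is_cpg_island("CG" * 150))
print(is_cpg_island("AT" * 150))
```

A real annotation would slide a window along the sequence and merge adjacent windows that pass, which is what dedicated tools handle.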

CpG islands are involved in the regulation of gene expression (Antequera 2003) and may thus indicate the presence of promoter sites in the MH class II segment sequenced. This may particularly apply to the putative genes SMS2, SCYE1 (cytokine), DKK2, and CFI-B (complement regulatory factor) identified on the genomic segment (Fig. 1a, b).

Even more interestingly, gene conversion in MH class II genes has been associated with CpG islands (Högstrand and Böhme 1999). These findings are consistent with an analysis of gene conversion in this study based on sequence substitution patterns. We used a previously generated sample of 27 MH class II beta sequences isolated from the same geographical region as the fish that was the source of the BAC library. The partial MH class II beta sequences comprised approximately 70% of the peptide binding region of the beta chain of the class II molecule, plus an adjacent 400-bp segment of intron 2. Note that the two MH class II variants found on the genomic segment were identified independently. In this representative sample of sequences we detected gene conversion according to a substitution model implemented in the software GENECONV (Sawyer 1999). This result was highly significant in 10,000 non-parametric permutation runs (P<0.0001). Further pairwise significance testing revealed that both the Gaac-DAB and Gaac-DBB genes identified here participated in a pairwise gene conversion event (Bonferroni-adjusted P=0.007). To our knowledge, this is the first direct evidence in any vertebrate species that two alleles from unambiguously defined paralogous loci have exchanged sequence motifs in the recent past. With better bioinformatics software at hand (Posada 2002), evidence for an important role of intragenic and possibly intergenic recombination in the origin of MHC sequence polymorphism is accumulating from population sequence data (Langefors et al. 2001b; Richman et al. 2001, 2003), supporting direct experimental observation using sperm typing (Högstrand and Böhme 1994). Inter-locus gene conversion will increase allelic diversity and divergence (Ohta 1999), thereby providing an additional mechanism for the generation of the striking polymorphism of the classical MH genes.
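
The logic of a permutation-based test for gene conversion can be illustrated with a toy example. The sketch below is a deliberate simplification and not the GENECONV algorithm: for one pair of aligned sequences it takes the longest run of identical sites as the test statistic and compares it with the same statistic after randomly shuffling the order of the alignment columns. The function names, the choice of statistic and the (hits + 1) / (runs + 1) convention for the empirical P-value are all illustrative assumptions. Under that convention, 10,000 permutation runs cannot yield a P-value below 1/10,001, which is just under 0.0001.

```python
import random
from typing import Sequence


def longest_shared_run(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest stretch of consecutive identical aligned sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def permutation_p(a: Sequence[str], b: Sequence[str],
                  n_perm: int = 10_000, seed: int = 0) -> float:
    """Empirical P-value for the observed longest shared run.

    Alignment columns are shuffled as units, so the number of matching
    sites is preserved and only their order along the sequence changes.
    Assumption: P = (hits + 1) / (n_perm + 1); other conventions exist.
    """
    rng = random.Random(seed)
    observed = longest_shared_run(a, b)
    columns = list(zip(a, b))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(columns)
        perm_a, perm_b = zip(*columns)
        if longest_shared_run(perm_a, perm_b) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical pair: identical first half, fully divergent second half.
    seq1 = "A" * 10 + "C" * 10
    seq2 = "A" * 10 + "G" * 10
    print(longest_shared_run(seq1, seq2), permutation_p(seq1, seq2, n_perm=1000))
```

A clustered block of identical sites, as in the hypothetical pair above, is rarely reproduced when site order is randomised, which is why such blocks are read as candidate conversion tracts.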

Conclusions

Birds also do not share many of the common features of mammalian MHC genomic organisation, such as the old age of MHC lineages and trans-specific evolution. The recently observed similarity among avian MHC loci was tentatively attributed to two different (non-exclusive) evolutionary scenarios: the “recent duplication” and the “concerted evolution” model (Hess and Edwards 2002). Unfortunately, the available data are not yet sufficient to discriminate between the two models in teleost fish. In the three-spined stickleback, the inter-locus gene conversion we detected will influence the pairwise nucleotide polymorphism and estimates of divergence times (Hughes 2000). It is not easy to predict whether the timing of the gene duplication will be biased upwards or downwards, because the selection regime within the genomic segment of interest determines whether recombination leads to genetic homogenisation (in the absence of selection or under purifying selection) or increases diversity (Ohta 1999). However, since low rates of synonymous substitutions are spread throughout the entire class II alpha gene and beta gene, respectively, we consider it unlikely that our divergence time estimates of the gene duplication event are strongly biased.

Given the relative recency of the gene duplication, in conjunction with evidence for inter-locus gene conversion, it becomes clear that the development of a single-locus genotyping technique in sticklebacks will be very difficult. The alternative is a motif-specific PCR typing technique that amplifies specific sequence motifs from the MH sequences, a route that has already been taken (Reusch et al. 2001; Richardson and Westerdahl 2003). MHC researchers are becoming increasingly aware that ignorance of exact locus designation may be the normal situation in non-model species, which do not share the ancient MHC gene and allele lineages found in mammals (see, for example, Kruiswijk et al. 2004; Westerdahl et al. 2000) and whose MHC may be evolutionarily even more dynamic than already acknowledged (Dawkins et al. 1999; Shum et al. 2001; Hess and Edwards 2002; Stet et al. 2003).