Introduction

Positive correlations between allozyme heterozygosity and fitness-related traits have been recognized for decades in organisms as diverse as plants (Ledig, 1986), vertebrates (Mitton & Grant, 1984; Danzmann et al., 1988) or marine molluscs (Zouros, 1987). Heterozygosity-fitness correlations (HFC) historically appeared as an opportunity to develop marker-based genetics, an approach boosted by the discovery of allozyme polymorphism (Lewontin & Hubby, 1966), to address two main issues. The first issue was the genetic basis of inbreeding depression and heterosis. Geneticists had known since as early as 1910–20 that manipulations of genomic heterozygosity (using inbred lines) could dramatically affect fitness components and crop yields (Shull, 1952). HFC provided a potential ‘microscopic’ basis of the ‘macroscopic’ effects observed using inbred strains. Although the relationship between heterozygosity at the genome level and fitness was not in doubt, the question came in identifying the particular genes involved, and how they affected fitness. Genes showing HFC were obviously good candidates. The second issue was the neutralist-selectionist controversy, focused on allozyme polymorphisms. If allozymes were indeed under selection, then Kimura's (1983) theory of transient neutral polymorphism could not apply. On the other hand, maintaining most allozyme polymorphisms by selection would imply a huge genetic load hardly sustainable by natural populations (Lewontin & Hubby, 1966).

The outcome of over 20 years of research seems disappointing as neither of the two issues has been solved. However, progress has been achieved by the accumulation of data, the clarification of concepts and the development of a theoretical background. Rather than an exhaustive compilation of the literature, I provide an account of this progress. I examine (i) the nature, importance and limitations of the evidence for HFC, and (ii) the possible origins of HFC.

Definitions

Terms such as ‘overdominant phenotype’ (Zouros et al., 1980) or ‘marker-associated heterosis’ (e.g. Zouros & Foltz, 1987; Houle, 1989; David et al., 1995), although widely used, are misleading as they tend to identify HFC with one of its possible causes (overdominance) or with a genomic property that may, or may not, reflect the same causes as HFC (heterosis). The definitions used here are in Box 1. The concept of ‘fitness-related trait’ (or fitness trait) is central but has unclear boundaries. Only estimators of growth, survival, or fecundity (including male mating success), assumed to affect fitness directly, will be considered. Other phenotypic variables, such as the deviation of a morphology from the population mean, do not belong obviously to this category. Therefore correlations between heterozygosity and morphological variance (including the case of fluctuating asymmetry) will not be treated.

The evidence for HFC

Reviewing the evidence for HFC means answering three questions. Does HFC exist? If so, is it quantitatively important? Finally is it consistent across samples? The existence of HFC seems widely accepted, based on previous reviews (Mitton & Grant, 1984; Zouros & Foltz, 1987). However, numerous published studies yielded null results (e.g. Houle, 1989; Booth et al., 1990; Elliott & Pierce, 1992; Whitlock, 1993; Savolainen & Hedrick, 1995) and there are possibly more due to publication bias in favour of significant correlations. Recently, Britten (1996) concluded from a meta-analysis that HFC was on the whole significant, and would remain so even considering a reasonable number of unpublished null results. However, most results are not significant or only weakly so (Britten, 1996). The overall significance results from a few major studies, mainly on bivalve molluscs and pine trees. Among trees, only Pinus rigida shows strong HFC, whereas among molluscs, a study on Mulinia lateralis (Koehn et al., 1988; Gaffney et al., 1990) dominates most other studies by its large sample size. Null results may have two nonexclusive origins: (i) the actual effects are small and often remain undetected or (ii) HFC is restricted to some species or to some samples of a given species.

HFC is quantified by the proportion of variance in a fitness trait explained by its regression on heterozygosity (r²). Most estimates are of the order of 0.01–0.05. However, sample size (N) must be considered because, even if the true correlation is zero, an average r² of 1/(N−1) is expected by chance (Sokal & Rohlf, 1995). As most samples are of the order of 10², 1/(N−1) is of the same magnitude as the r² values observed. For multiple regressions (L loci as separate ‘heterozygosity’ variables) the null expectation for R² is L/(N−1), a serious problem if L is not ≪ N (e.g. Bush et al., 1987). In conclusion, HFC is weak. Sample sizes of hundreds are insufficient; thousands are needed for the signal to be approximately one order of magnitude higher than the noise. Only a handful of studies have such sample sizes (including Samollow, 1980; Zouros et al., 1980; Koehn et al., 1988; David & Jarne, 1997b).
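The chance expectation of 1/(N−1) can be checked by simulation. The following is a minimal sketch, assuming five unlinked loci each heterozygous with probability 0.5 and a normally distributed trait that is independent of genotype; the function names, the number of loci and the number of replicates are illustrative choices, not taken from any of the studies cited.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate sample: no variation in one variable
    return sxy ** 2 / (sxx * syy)


def mean_null_r2(n, n_loci=5, reps=2000, seed=1):
    """Average r2 between multilocus heterozygosity and an unrelated trait.

    Assumption: each locus is heterozygous with probability 0.5 and the
    trait is standard normal, drawn independently of genotype.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        het = [sum(rng.random() < 0.5 for _ in range(n_loci)) for _ in range(n)]
        trait = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += r_squared(het, trait)
    return total / reps


if __name__ == "__main__":
    for n in (50, 100):
        print(n, round(mean_null_r2(n), 4), round(1 / (n - 1), 4))
```

With samples of 50 or 100 individuals the simulated mean should lie close to 1/(N−1), that is, within the range of r² values commonly reported for HFC.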

Whether HFC is restricted to some species is difficult to test, the taxonomic range explored being hugely biased. However, the overall lack of significant correlations in natural populations of such a model organism as Drosophila (Houle, 1989) and in many other species, even among bivalves or pines (Britten, 1996), strongly suggests that HFC is not universal. The fact that researchers concentrate their efforts on a few taxa reflects their belief that only some species show HFC. What characterizes these taxa is an interesting though difficult question. Houle (1989) suggests that they may be partially inbred. At first sight, this does not fit bivalves, whose external fertilization, large populations, and extensive larval migration, leave little room for inbreeding. However, the hypothesis that bivalves occur in large, panmictic populations has been recently challenged by observations of small-scale temporal (Hedgecock, 1994) and spatial (David & Jarne, 1997b) genetic heterogeneity.

HFC varies not only among species but also among samples of the same species. Gaffney et al. (1990) showed a striking lack of repeatability of HFC in Mytilus edulis, and David & Jarne (1997b) documented significant variation in HFC among cohorts and sites in the bivalve Spisula ovalis. Three sources may contribute to such variation: (i) environmental stress enhances HFC (Danzmann et al., 1988), although too much stress suppresses it (Scott & Koehn, 1990; Audo & Diehl, 1995); (ii) HFC decreases with age (David & Jarne, 1997b) because growth and survival differences are maximal early in life, and because unfit genotypes are selectively eliminated in ageing cohorts (Koehn & Gaffney, 1984); and (iii) different samples may have different genetic backgrounds and consequently different genetic variances for fitness traits (for example, they may not be inbred to the same degree). The effect of (ii) is controlled when age is known, but (i) and (iii) are hard to separate as natural populations differ both genetically and ecologically. Source (i) has received empirical support from artificial stress experiments (Scott & Koehn, 1990; Audo & Diehl, 1995). Source (iii) is illustrated, in bivalves, by the presence of HFC in wild populations but not in laboratory stocks founded from two or a few parents (see below).

Limits of the data: the phenotypic side

Another important limitation of the heterozygosity-fitness dataset is the lack of homogeneity in the fitness trait used. Growth and survival are important fitness components, especially in indeterminate growers with size-dependent fecundity (e.g. trees, marine bivalves). However, they are measured in many different ways. When age is unknown, body size reflects age as well as growth (David et al., 1995), except in determinate growers (e.g. insects). Even when age is known, the growth index used may be more or less informative. Many studies use ‘size-at-age’ (proportional to S/t, where S is size and t is age), which is an average growth rate from birth to collection. Growth rate is more precisely estimated when size is measured at two times, allowing the computation of absolute growth rate, dS/dt (Gaffney et al., 1990), or relative growth rate, (1/S)(dS/dt) (Diehl & Audo, 1995). The best indices summarize the whole individual growth history using growth rings (trees) or growth lines (bivalve shells), to compute, respectively, mean surface increments (e.g. Bush et al., 1987) or von Bertalanffy parameters (e.g. David et al., 1995).
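The three simplest growth indices mentioned above can be written out as follows. This is an illustrative sketch only: the function names are hypothetical, and the finite-interval estimators (a difference quotient for dS/dt, a difference of logarithms for the relative rate) are the usual textbook approximations, assumed here rather than taken from the studies cited.

```python
import math


def size_at_age(size, age):
    """Average growth rate from birth to collection, S/t."""
    return size / age


def absolute_growth_rate(s1, s2, t1, t2):
    """Finite-interval estimate of dS/dt from sizes s1, s2 at times t1 < t2."""
    return (s2 - s1) / (t2 - t1)


def relative_growth_rate(s1, s2, t1, t2):
    """Finite-interval estimate of (1/S)(dS/dt), i.e. d(ln S)/dt."""
    return (math.log(s2) - math.log(s1)) / (t2 - t1)


if __name__ == "__main__":
    # Hypothetical individual measured at 10 mm (year 1) and 20 mm (year 3).
    print(size_at_age(20.0, 3.0))
    print(absolute_growth_rate(10.0, 20.0, 1.0, 3.0))
    print(relative_growth_rate(10.0, 20.0, 1.0, 3.0))
```

Unlike size-at-age, the two interval-based indices do not require the date of birth, only two dated measurements of the same individual.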

Estimates of reproductive biomass (e.g. Savolainen & Hedrick, 1995) can also be used as fitness traits, although instantaneous estimates of fecundity may be obscured by variation in the timing of reproduction. Moreover, as fecundity is usually size-dependent, the same caveats apply as for growth estimates.

Survival is less easy to handle than growth, as individual survival can rarely be estimated. Studies recording individual time to death (Borsa et al., 1992) remain exceptional. Heterozygosity-viability correlations, reviewed in David & Jarne (1997a), are usually inferred from heterozygosity-age correlations among individuals caught at the same time (e.g. Schaal & Levin, 1976). These results are insufficient, as the differences between age classes may have been present from birth, reflecting temporal population structure rather than differential survival of genotypes. Indeed, slight but significant genetic differences among cohorts are present from recruitment in marine molluscs (Beaumont, 1982; Johnson & Black, 1984). Only when the same cohort is sampled at different times can one compute proper estimates of genotype-specific relative survival rates. In the few studies providing such evidence, genotypes clearly survived differently, but the effect of heterozygosity was weak and even absent during some time intervals (Samollow, 1980; David & Jarne, 1997a). Allele- or genotype-specific effects were prominent though unpredictable, with heterozygote advantage emerging as a slight overall tendency. Therefore heterozygosity-viability correlations are not as well documented as heterozygosity-growth correlations.

Limits of the data: the genotypic side

For a long time allozymes have been the most reliable and convenient genetic markers. This situation is now changing with the development of PCR-based codominant markers such as microsatellites, anonymous RFLP and intron length polymorphisms (Mitton, 1994). Two questions are of major concern: (i) what are the qualities of allozymes as genetic markers and (ii) is HFC allozyme-specific or does it affect any marker? For various reasons allozymes can provide incorrect information about genotypes. Heterozygosity may be underestimated due to lack of resolution (alleles with similar mobility appear as the same allele), somatic aneuploidy, or null alleles. Resolution is usually maximized by choosing an adequate buffer (Beaumont & Beveridge, 1983), but the two other problems have no technical solution. Aneuploidy, detected in bivalve gills (Thiriot-Quiévreux, 1986), could generate HFC, if the loss of a chromosomal segment at once reduces fitness and generates apparent homozygosity. Although negatively correlated with growth in Crassostrea virginica, aneuploidy, affecting small patches of cells, does not modify electrophoretic patterns in this species and therefore does not bias HFC (Zouros et al., 1996). Null alleles could similarly generate apparent HFC, if null heterozygotes (scored as homozygotes) have reduced fitness. Although detected in considerable frequencies in species showing HFC, null heterozygotes do not show decreased performances (Gaffney, 1994). Even so, they may partly obscure HFC as heterozygosity is underestimated.

Unfortunately, most PCR-based markers also have null alleles and, even worse, apparent heterozygosity may depend on PCR conditions (Hare et al., 1996). Most molecular markers have the same inherent defects as allozymes. However, PCR markers, unlike allozymes, allow scoring of very small organisms (such as bivalve larvae) at many polymorphic loci. Very polymorphic markers (microsatellites) seem unsuitable as almost all individuals will be heterozygous. However, microsatellites provide estimates of the quantity of divergence between two alleles (specifically, the difference in repeat number, under a stepwise mutation model), an interesting alternative to the traditional heterozygosity (in which alleles are classified as either identical or different). Moreover molecular markers can be used in conjunction with allozymes, in order to compare the latter with a priori neutral loci. Such experiments allow evaluation of possible causes of HFC other than null alleles or aneuploidy (see below).
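The contrast between the traditional heterozygosity score and a repeat-number measure can be illustrated with a short sketch. It assumes that each microsatellite genotype is recorded as a pair of repeat counts; the function names and the example genotypes are hypothetical, and the absolute difference is used for simplicity (other functions of the difference are possible).

```python
def is_heterozygous(genotype):
    """Traditional score: 1 if the two alleles differ, 0 otherwise."""
    a, b = genotype
    return int(a != b)


def repeat_difference(genotype):
    """Divergence between the two alleles, as a difference in repeat number."""
    a, b = genotype
    return abs(a - b)


def multilocus_scores(genotypes):
    """Return (number of heterozygous loci, mean repeat difference)."""
    n_het = sum(is_heterozygous(g) for g in genotypes)
    mean_diff = sum(repeat_difference(g) for g in genotypes) / len(genotypes)
    return n_het, mean_diff


if __name__ == "__main__":
    # Two hypothetical individuals scored at three loci (repeat counts).
    ind_a = [(12, 13), (20, 21), (8, 8)]
    ind_b = [(12, 18), (20, 27), (8, 8)]
    print(multilocus_scores(ind_a))
    print(multilocus_scores(ind_b))
```

Both individuals are heterozygous at two loci and are therefore indistinguishable by heterozygosity, yet their alleles differ by very different numbers of repeats.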

The origins of HFC: alternative hypotheses

Two questions can be asked. First, what is the relationship between the marker loci actually scored and the agent loci (i.e. loci directly contributing to the observed phenotypic variation)? Second, are the agent loci dominant or overdominant?

If marker and agent loci are the same, the allozyme polymorphism observed is under direct selection. Overdominance and partial or total dominance for the fitness trait may produce HFC, as in both cases the heterozygote has a larger fitness than the average homozygote. However, without overdominance, the polymorphism is quickly eliminated. The same holds in multilocus systems (Turelli & Ginzburg, 1983). Therefore, the hypothesis of direct effects for HFC reduces to direct overdominance at the loci scored. Direct dominance would be possible only with very high mutation rates (deleterious allozymes maintained at mutation-selection equilibrium), which could be the case for null alleles.

When marker and agent loci are distinct, the term ‘associative overdominance’ is often used (Ohta, 1971). In this case marker loci reflect heterozygosity at agent loci through a genetic correlation, which may be linkage disequilibrium (nonrandom associations of alleles in gametes) or identity disequilibrium (nonrandom association of diploid genotypes in zygotes, e.g. more multiheterozygotes and multihomozygotes than expected from single-locus heterozygosities). Weak linkage disequilibrium occurs in finite populations, as a result of genetic drift (Hill & Robertson, 1968), whereas identity disequilibria are mainly generated by partial inbreeding (Weir & Cockerham, 1973). The effects of linkage disequilibria are restricted mainly to a narrow chromosomal segment around the target locus and vanish with increasing genetic distance. This hypothesis was therefore categorized as a local effect (David et al., 1995). Identity disequilibria are relatively insensitive to linkage: at inbreeding equilibrium their value for two completely linked loci is only twice that of independent loci (Weir & Cockerham, 1973). Therefore, inbreeding generates correlations among all loci of the genome. This is referred to as a general effect. Local effects due to linkage disequilibria were modelled by Ohta (1971) when the agent locus is at equilibrium under selection and recurrent mutation to partially or fully recessive deleterious alleles. However, overdominant agent loci produce similar effects. Under general effects, the source of variation in fitness is inbreeding depression, which can also rely on overdominant or dominant loci (Charlesworth, 1991). The logic behind indirect effects is that heterozygosity at a single locus reflects variation of heterozygosity at the level of the genome (or a part of it).
However, with realistic values for linkage or identity disequilibrium, the representativity of one or a few marker loci remains very small (Chakraborty, 1987) and the observed weakness of HFC is therefore expected. If disequilibria are zero (large, random mating populations), marker loci only represent themselves, and HFC reduces to their direct contribution to the phenotype (Smouse, 1986).
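How partial inbreeding creates identity disequilibrium between unlinked loci can be illustrated by a deliberately simplified simulation. The sketch below assumes a single generation of mixed mating (each individual is either selfed from an outbred parent or outcrossed), two unlinked biallelic loci with equal allele frequencies, and no selection; all names and parameter values are illustrative, and the model is much cruder than the equilibrium treatment of Weir & Cockerham (1973).

```python
import random


def simulate_heterozygosity(n_ind, selfing_rate, n_loci=2, p=0.5, seed=1):
    """Return one list of 0/1 heterozygosity scores per individual.

    Assumptions: parents are outbred (Hardy-Weinberg genotypes), loci are
    unlinked, and an individual is selfed with probability `selfing_rate`.
    """
    rng = random.Random(seed)
    table = []
    for _ in range(n_ind):
        selfed = rng.random() < selfing_rate  # shared by all loci
        scores = []
        for _ in range(n_loci):
            if selfed:
                parent = (rng.random() < p, rng.random() < p)
                child = (rng.choice(parent), rng.choice(parent))
            else:
                child = (rng.random() < p, rng.random() < p)
            scores.append(int(child[0] != child[1]))
        table.append(scores)
    return table


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def het_correlation(selfing_rate, n_ind=50000, seed=1):
    """Correlation in heterozygosity between two unlinked loci."""
    table = simulate_heterozygosity(n_ind, selfing_rate, seed=seed)
    return pearson([row[0] for row in table], [row[1] for row in table])


if __name__ == "__main__":
    print("random mating:", round(het_correlation(0.0), 3))
    print("50% selfing:  ", round(het_correlation(0.5), 3))
```

Under random mating the correlation should be indistinguishable from zero, whereas with partial selfing it becomes positive although still small, which is consistent with the weak representativity of single marker loci discussed here.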

Empirical arguments for alternative hypotheses

The main alternatives discussed in the literature are direct overdominance vs. other hypotheses. A straightforward approach is functional dissection of allozymes. Many researchers have attempted to establish links between enzyme heterozygosity and phenotypic performance through cascades of enzymatic, metabolic, and ultimately physiological mechanisms. In some cases, enzymological parameters of heterozygotes lie outside the range between corresponding homozygotes, suggesting overdominance (Pogson, 1991; Sarver et al., 1992). However, overdominance for enzymatic activity is not overdominance for fitness, as shown by the complex metabolic models of Clark (1991). Zouros & Foltz (1987) even suggest that overdominance for fitness could be achieved by enzymological intermediacy of heterozygotes (buffering effect on metabolism). Enzymological studies are therefore insufficient. In a few cases the whole path from enzymological properties to fitness is known, such as LDH in Fundulus heteroclitus (DiMichele & Powers, 1982), PGI in Colias butterflies (Watt, 1977), and LAP in the mussel Mytilus edulis (Koehn et al., 1980). Clearly, these polymorphisms are not neutral. However, their relevance to HFC is unclear, first because we do not know how general these results are (non-neutral cases may have been studied preferentially) and second, because overdominance does not appear to be a major factor. In the examples cited, polymorphism is mostly maintained by contrasting selection regimes in different environments rather than by heterozygote advantage in a single environment. PGI (in Colias) is an exception, but only one pair of alleles (among six different alleles) actually shows overdominance. Therefore, the functional approach tells us that some allozyme polymorphisms are definitely non-neutral, but provides little support for the hypothesis that overdominance generates HFC.

The ‘functional’ logic can also be followed using a comparative approach. Direct effects of allozymes on growth may rely on energetic metabolism. Therefore, HFC should be detected mainly in enzymes playing key roles in energetic metabolism. In Mulinia lateralis, Koehn et al. (1988) found that enzymes with large effects on growth belong to glycolytic or protein-catabolic pathways. However, the notion of a key role in energetic metabolism is unclear and prone to a posteriori adjustment. This problem can now be addressed by extending the same comparative logic to new classes of marker loci. Unlike differences between unimportant enzymes and key enzymes, differences between genes and anonymous sequences are very clear a priori and can hardly be disputed. It is expected that DNA sequences with no known protein product will not show HFC, whereas enzymatic loci will. This has been tested by Pogson & Zouros (1994) using enzymatic and anonymous RFLP loci in scallops (Placopecten magellanicus). HFC was significant for enzymes (considered as a group), and not for RFLPs (likewise considered as a group), supporting the hypothesis of a direct effect of allozyme heterozygosity on growth. However, the difference between the two classes of markers, as tested by a permutation test over loci, is only marginally significant (P = 0.05), and a closer look at locus-specific effects shows that the largest effect is associated with an RFLP locus. A further significant problem with the comparative approach is that the hypothesis that differences in HFC among loci represent mere sampling variance has not been rejected. A simple statistical test of this hypothesis has now been designed (test A in David, 1997). Its application to a large sample of the bivalve Spisula ovalis showed nonsignificant variations in HFC among allozyme loci (David & Jarne, 1997b). More tests of this kind should be performed. It would be interesting as well to test the consistency of locus-specific effects across samples or populations.
In summary, the comparative approach has not given a definitive answer but we now have the necessary tools (molecular markers and statistical tests) to obtain an answer. Experiments such as Pogson & Zouros's (1994) should be replicated and analysed in the proper statistical framework.

A different way to identify the causes of HFC is to consider the role of the genetic background. Under direct overdominance, background fitness variation is mere noise, decreasing the detectability of allozyme-dependent effects. Therefore, HFC should appear stronger in homogeneous genetic backgrounds, e.g. among offspring from a single pair-cross or within laboratory populations with reduced effective sizes. On the other hand, indirect effects rely on variation in genetic background. In first-generation offspring of pair-crosses or mass-matings, HFC due to indirect effects should be reduced or absent, as (i) all individuals tend to have similar genetic loads and (ii) inbreeding (and therefore identity disequilibrium) is generally excluded by the experimenter's control. The available evidence on noninbred laboratory strains or offspring of single pair-matings indeed shows that HFC is rare or absent under these conditions (Adamkewicz et al., 1984; Gaffney & Scott, 1984; Danzmann et al., 1988; Dubrova et al., 1995), suggesting indirect effects. However, the number of such studies with large sample sizes (1000) is small. The case of inbred pair-crosses such as controlled selfing (Strauss, 1986) is different, as a strong HFC is readily detected in this situation. These crosses interestingly allow the evaluation of inbreeding depression, and of possible variation in the linked load among loci. However, the contribution of the marker loci themselves remains unknown. Even if the load is mainly borne by neighbouring chromosomal regions (which is likely), this does not prove that it is involved in HFC (indirect effects) in natural populations, but merely shows its existence (necessary but not sufficient for indirect effects).

Modelling HFC

The causes of HFC can ultimately be addressed by comparing data with theoretical expectations based on mathematical models of different hypotheses. Although the possibility of indirect effects has been analytically demonstrated (Ohta, 1971) and computer-simulated (e.g. Charlesworth, 1991), the first model providing a statistical test usable on real datasets was based on the overdominance hypothesis. Under overdominance and random mating, Smouse (1986) predicted a linear relationship between fitness traits and a genotypic variable called adaptive distance (Table 1). This was later extended to nonrandom mating populations (Houle, 1994). This relationship can be detected by regressing the logarithm of the fitness trait on adaptive distances. Significant regressions (more precisely, a better fit for regressions on adaptive distances than on heterozygosity) were therefore initially thought to be evidence for overdominance (Bush et al., 1987). However, Houle (1994) showed that indirect effects should result in the same linear relationship in the one-locus case, casting doubt on the usefulness of the model to distinguish among competing hypotheses. Recently, David (1997) designed three tests (A, B and C) theoretically able to distinguish direct overdominance from general effects due to inbreeding. Test A is a formal test of the comparative approach, detecting differences in HFC among loci, expected under overdominance but not under inbreeding. Test B is derived from an extension of Houle's (1994) analysis to multiple loci. The adaptive distance model is optimized for overdominance. Symmetrically, another model (‘inbreeding’ model, see Table 1) was optimized for inbreeding. Although equivalent in the one-locus case (Houle, 1994), they differ as soon as multiple loci are involved. Test C is based on the relationship between heterozygosity and variance in the fitness trait, negative under inbreeding, but not predicted by the overdominant model.
These tests have been applied to a bivalve dataset (David & Jarne, 1997b). Although partly compromised by inherent defects of allozyme markers (null alleles) or of the models used (the need to pool alleles to compute adaptive distances), all three tests were consistent with the inbreeding hypothesis.
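The reason why overdominance leads to a linear relationship between log fitness and a function of allele frequencies can be sketched with the textbook two-allele equilibrium. The derivation below is an illustration under stated assumptions (one locus, two alleles, random mating, heterozygote fitness set to 1, small selection coefficients s_A and s_B, which are symbols introduced here); it is not a reproduction of the formulae of Table 1, whose exact scaling may differ.

```latex
% Genotypic fitnesses under overdominance (assumed parameterisation)
w_{AA} = 1 - s_A, \qquad w_{AB} = 1, \qquad w_{BB} = 1 - s_B .

% Equilibrium allele frequencies under random mating
\hat{p} = \frac{s_B}{s_A + s_B}, \qquad \hat{q} = \frac{s_A}{s_A + s_B}
\quad\Longrightarrow\quad
s_A \hat{p} = s_B \hat{q} = \frac{s_A s_B}{s_A + s_B} \equiv c .

% Hence each homozygote's disadvantage is inversely proportional to
% the equilibrium frequency of its allele
s_A = \frac{c}{\hat{p}}, \qquad s_B = \frac{c}{\hat{q}} .

% For small selection coefficients, \ln(1 - s) \approx -s, so
\ln w \approx -c\,x,
\qquad
x =
\begin{cases}
1/\hat{p} & \text{for } AA,\\
0         & \text{for } AB,\\
1/\hat{q} & \text{for } BB.
\end{cases}
```

Under these assumptions the logarithm of fitness decreases linearly with x, a quantity that is zero for heterozygotes and larger for homozygotes for rare alleles, which is why regressions use the logarithm of the fitness trait.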

Table 1 Genetic variables used to predict fitness from the genotype at a locus with two alleles A & B with estimated frequencies p and q, respectively. Formulae are given for populations where the inbreeding rate (s) is small. For the general case, Houle (1994) and David (1997) give formulae using s estimated from the data

In conclusion, it is too early to identify the causes of HFC, although empirical arguments often point in one direction or the other. Furthermore, a single explanation may not prevail in all cases. However, we are now in a better position than a decade ago. First, clear hypotheses have been phrased and the terminology clarified, avoiding confusion between hypotheses and observations. For example, the debate on the causes of heterosis is related to, but distinct from, the question of the neutrality of allozyme variation. The demonstration of fitness differences determined by allozymic genotypes is not a proof that these differences produce HFC. Second, new empirical (molecular markers) and statistical tools have been designed. Together with the use of large sample sizes (of the order of 10³) and precise phenotypic descriptors (especially those controlled for age differences between individuals), they may give clear answers in the next few years.