Main

Quantitative characters have been a major area of genetical study for over a century because they are a common feature of natural variation in populations of all eukaryotes. They are typical of commercially important traits in crop plants and domestic animals as well as in vital traits in humans from hypertension to intelligence. First attempts at studying them stem from the work of Galton (1889) on man before the rediscovery of Mendel, through the pioneering work of Fisher (1918), Wright (1934), Mather (1949) and Falconer (1960) to the new era opened up by developments in molecular biology over the last 15 years (Tanksley, 1993).

For most of the period up to 1980, the study of quantitative traits involved statistical techniques based on the means, variances and covariances of relatives, with no actual knowledge of the number and location of the underlying genes, termed polygenes by Mather (1949). It was sufficient to assume that there were several genes segregating in a given population and that these genes shared the properties of transmission, gene action and interaction of other genes, but that their individual allelic differences were small relative to the effects of the environment. Even on such a minimalist or ‘black box’ basis, however, considerable progress was made in advancing our knowledge of the genetics of these traits, in understanding processes such as heterosis and in predicting response to selection.

Much early progress was made in locating polygenes in Drosophila for characters such as bristle number and viability using major gene markers by Breese & Mather (1957), Spickett & Thoday (1966) and Shrimpton & Robertson (1988). They showed that there were several genes segregating, that their effects varied from gene to gene and that they interacted with each other. These genes were located with varying precision and some of them were close to known major candidate genes (Shrimpton & Robertson, 1988). Similar progress was made in wheat using aneuploidy as a device to manipulate and fix polygenes of interest, so transferring the techniques of the model organism to a crop plant (Law et al., 1983).

However, the methodologies were somewhat laborious and based on major mutants, either phenotypic or cytological, which made it difficult if not impossible to study populations. These impediments were removed by two developments in the 1980s. The first was the discovery of extensive, yet easily visualized, variability at the DNA level which could be used as markers in most natural populations. The second was a catchy acronym, QTL, for Quantitative Trait Loci, which appeared to liberate the subject from the earlier heavy statistical associations with polygenes (Gelderman, 1975). Initially RFLPs were used as markers (Beckmann & Soller, 1983; Lander & Botstein, 1989), but these were followed by PCR markers such as RAPDs, microsatellites and AFLPs which were cheaper, safer and provided more markers per unit of DNA (Westman & Kresovich, 1997). These polymorphic markers provided the framework map around which the polygenes/QTL could be located. Because they were polymorphic in (nearly) all populations, including humans, potentially any population was open to QTL analysis.

To what questions are these analyses expected to provide answers? Essentially they fall into two classes. Firstly, there are fundamental questions about the nature of the QTL, such as where they are, what they do, and how they act and interact; that is, questions that look inside the ‘black box’. Secondly, there are the applied questions of medicine and agriculture: can the QTL be manipulated, either by some form of diagnosis or therapy in humans or by marker-assisted breeding in crops and farm animals?

Techniques of QTL analysis

QTL analysis is predicated on looking for associations between the quantitative trait and the marker alleles segregating in the population. It has two essential stages: the mapping of the markers and the association of the trait with the markers. Both of these stages require accurate data plus statistical software. The basic theory underlying marker mapping has been available since the 1920s (see Mather, 1938), but had to be extended to handle hundreds of markers simultaneously. Although slightly different algorithms are used in the final stages to ‘smooth’ the results to fit the multiple marker information, the maps produced are very similar (Lander et al., 1987; Stam, 1993). The quality of the marker data from the segregating population is, however, critical for these analyses. There was considerable surprise when, with the advent of molecular markers, the map lengths of species such as maize and wheat suddenly increased beyond the lengths predicted from chiasma frequency (Nilsson et al., 1993). This was soon attributed to genotyping errors, particularly where laboratory workers misinterpreted ambiguous results. It has been shown that a 3 per cent error rate in genotyping can double the genetic map length (Brzustowicz et al., 1993).
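The effect of genotyping error on map length can be illustrated with a short calculation. The sketch below is only an idealization and is not the analysis of Brzustowicz et al. (1993): it assumes a population with two genotype classes per marker (a backcross or DH population), independent miscalls at every marker, and the Haldane mapping function. All function names are illustrative.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def haldane_cm(r):
    """Map distance in cM for a recombination fraction r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def apparent_distance_cm(true_cm, error_rate):
    """Apparent distance between two adjacent markers when each genotype call
    is independently wrong with probability error_rate (two genotype classes)."""
    r = haldane_r(true_cm)
    # Probability that exactly one of the two calls is wrong, which flips
    # an apparent non-recombinant into a recombinant and vice versa.
    one_error = 2.0 * error_rate * (1.0 - error_rate)
    r_apparent = r * (1.0 - one_error) + (1.0 - r) * one_error
    return haldane_cm(r_apparent)

for spacing in (2.0, 5.0, 10.0, 20.0):
    inflated = apparent_distance_cm(spacing, 0.03)
    print(f"true {spacing:5.1f} cM -> apparent {inflated:5.1f} cM "
          f"(x{inflated / spacing:.2f})")
```

Under these assumptions a 3 per cent error rate adds roughly 6 cM to every marker interval regardless of its true length, so a map with markers about 5 to 6 cM apart would approximately double in length, and denser maps would be inflated proportionally more.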

Most QTL analyses in plants have involved populations derived from pure lines and several approaches have been developed to associate QTL with molecular markers in such populations (Kearsey & Pooni, 1996). The basic problem is that the trait score of a particular genotype is a single value resulting from the combined allelic effects of many genes and the environment. Two individuals could have the same genotype but a different phenotype or vice versa. The earliest approach to the problem, and in many ways still a very useful first step, was to look at all individual associations between marker and phenotype. If at any particular marker, M, in an F2 for example, the individuals which were homozygous M1 M1 were significantly taller on average than those which were M2 M2, then it could be deduced that there was a QTL affecting height linked to this marker. There are three problems with this approach. First, false positives will occur if the significance threshold is not stringent enough. Second, because all genes on a chromosome will show some linkage among themselves, any one QTL will be associated with several markers. Third, because the QTL will not necessarily be allelic with any given marker, its exact position and its effect cannot be known, although the strongest association will be with the closest marker.
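The single-marker comparison described above can be sketched as follows. This is a minimal illustration, assuming an F2 in which genotypes at marker M are coded 0 = M1 M1, 1 = M1 M2 and 2 = M2 M2; the function name, the coding and the use of a Welch t statistic are assumptions made for the example, not part of any published package.

```python
import math
from statistics import mean, variance

def single_marker_test(trait, marker):
    """Compare the two homozygous classes at one marker in an F2.

    trait  : phenotype scores, one per individual
    marker : genotype codes, 0 = M1M1, 1 = M1M2, 2 = M2M2
    Returns (half the difference between homozygote means, Welch t statistic).
    """
    m1m1 = [y for y, g in zip(trait, marker) if g == 0]
    m2m2 = [y for y, g in zip(trait, marker) if g == 2]
    diff = mean(m1m1) - mean(m2m2)
    std_error = math.sqrt(variance(m1m1) / len(m1m1) + variance(m2m2) / len(m2m2))
    return diff / 2.0, diff / std_error

# Hypothetical heights for ten F2 plants and their genotypes at marker M.
heights = [12, 14, 13, 15, 10, 11, 8, 9, 7, 8]
genotypes = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
half_diff, t_stat = single_marker_test(heights, genotypes)
print(half_diff, round(t_stat, 2))
```

For a single linked QTL with additive effect a and recombination fraction r from the marker, the expected half-difference is a(1 - 2r) rather than a, which is why the marker contrast alone cannot separate the size of the effect from its distance. When the test is repeated over many markers, the threshold applied to the t statistic also has to be made more stringent to control false positives.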

Interval mapping was introduced to overcome many of these problems. Intervals between adjacent pairs of markers along a chromosome are scanned and the likelihood profile of a QTL being at any particular point in each interval is determined; or to be more precise, the log of the ratio of the likelihoods (LOD) of there being one vs. no QTL at a particular point (Lander & Botstein, 1989). Those maxima in the profile which exceed a specified significance level indicate the likely sites of QTL. Significance levels have to be adjusted to avoid false positives resulting from multiple tests, while confidence intervals are set as the map interval corresponding to a 1 LOD decline on either side of the peak. This has been the most widely used approach, particularly for those working with populations derived from inbred parents, because the software package MAPMAKER/QTL (Whitehead Institute, 1993) was freely available. There have been numerous theoretical studies of the efficiency and accuracy of the approach but these have resulted in no major revisions (van Ooijen, 1992; Churchill & Doerge, 1994; Mangin et al., 1994).
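The two quantities used above, the LOD score and the 1 LOD support interval, can be sketched in a few lines. The code assumes that the log-likelihoods and the LOD profile have already been computed by a mapping program; it does not reproduce MAPMAKER/QTL, and the function names and the example profile are hypothetical.

```python
import math

def lod_score(loglik_qtl, loglik_no_qtl):
    """LOD = log10 of the likelihood ratio, from natural-log likelihoods."""
    return (loglik_qtl - loglik_no_qtl) / math.log(10.0)

def one_lod_support_interval(positions, lods, drop=1.0):
    """Return (peak position, left limit, right limit) of the support interval.

    The limits are the outermost scanned positions, contiguous with the peak,
    whose LOD is within `drop` units of the maximum.
    """
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions[peak], positions[left], positions[right]

# Hypothetical LOD profile scanned every 5 cM along one chromosome.
positions_cm = [0, 5, 10, 15, 20, 25, 30]
profile = [0.5, 1.8, 3.1, 3.9, 3.2, 2.0, 0.7]
print(one_lod_support_interval(positions_cm, profile))
```

Because the limits are restricted to scanned positions, a coarse grid gives only a rough interval; a finer scan step, or interpolation between grid points, would be needed for reporting.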

An alternative approach using multiple regression was developed by Haley & Knott (1992). It produces very similar results to LOD mapping both in terms of accuracy and precision, but has the advantages of speed and simplicity of programming. It has been adapted to handle complex pedigrees and to include a wide range of fixed effects in the model such as sex differences and environments. Tests of significance and confidence intervals can be obtained by bootstrapping approaches (Visscher et al., 1996; Lebreton & Visscher, 1998).
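The idea behind the regression approach can be sketched for the simplest case. The example below is a deliberately reduced illustration and not the full procedure of Haley & Knott (1992): it assumes a backcross (two genotype classes coded 0 and 1), a single marker interval, no interference (Haldane mapping function) and a single additive effect. At each putative position the trait is regressed on the probability that an individual carries the QTL genotype coded 1, given its flanking markers, and the position with the smallest residual sum of squares is taken as the estimate. All names are illustrative.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def qtl_probability(left, right, r1, r2):
    """P(QTL genotype = 1 | flanking marker genotypes) in a backcross.
    r1: left marker to QTL, r2: QTL to right marker."""
    r = r1 + r2 - 2.0 * r1 * r2  # recombination between the two markers
    if left == 1 and right == 1:
        return (1.0 - r1) * (1.0 - r2) / (1.0 - r)
    if left == 1 and right == 0:
        return (1.0 - r1) * r2 / r
    if left == 0 and right == 1:
        return r1 * (1.0 - r2) / r
    return r1 * r2 / (1.0 - r)

def residual_ss(x, y):
    """Residual sum of squares and slope of the simple regression of y on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx if sxx > 0 else 0.0
    return syy - slope * sxy, slope

def scan_interval(left, right, trait, interval_cm, step_cm=1.0):
    """Return (position in cM from the left marker, residual SS, estimated effect)."""
    best = None
    for i in range(int(interval_cm / step_cm) + 1):
        pos = i * step_cm
        r1, r2 = haldane_r(pos), haldane_r(interval_cm - pos)
        x = [qtl_probability(l, r, r1, r2) for l, r in zip(left, right)]
        rss, slope = residual_ss(x, trait)
        if best is None or rss < best[1]:
            best = (pos, rss, slope)
    return best

# Hypothetical data: the trait is determined entirely by the left marker.
left_marker = [1, 1, 0, 0, 1, 0, 1, 0]
right_marker = [1, 0, 0, 1, 1, 0, 1, 0]
trait_scores = [10 + 4 * g for g in left_marker]
print(scan_interval(left_marker, right_marker, trait_scores, interval_cm=20.0))
```

The speed advantage mentioned above comes from the fact that each position requires only an ordinary least-squares fit rather than an iterative likelihood maximization. Fixed effects such as sex or environment would enter as additional regressors, which this sketch omits.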

A third approach, Marker Regression, fits a model to all the marker means on a given chromosome simultaneously, and obtains significance tests by weighted least-squares or by simulation (Kearsey & Hyne, 1994; http://web.bham.ac.uk/g.g.seaton/). It has the advantage of speed and of integrating all the marker information in a single test. It can be shown that, if there is just one QTL on a chromosome, all the necessary information to locate and measure the effects of that QTL is available from the markers which flank that QTL. However, because you do not know which markers flank the QTL nor that there is just one QTL per chromosome, the multiple marker approach does provide an overall test of the model, no matter how the QTL are organized on the chromosome. Recently a similar approach has been attempted with LODs, to see if a fitted QTL does match the overall chromosome LOD profile (Hackett, 1997).
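The principle of fitting all marker means at once can be sketched as below. The example is a simplified reading of the idea and not the published program: it assumes an F2, a single QTL, the Haldane mapping function, and unweighted rather than weighted least squares. It relies on the fact that, for a QTL with additive effect a at recombination fraction r_i from marker i, half the difference between the two marker homozygote means is expected to be a(1 - 2r_i). The function names and data are hypothetical.

```python
import math

def linkage_weight(distance_cm):
    """1 - 2r for a given map distance, using the Haldane mapping function."""
    return math.exp(-2.0 * abs(distance_cm) / 100.0)

def marker_regression(marker_cm, half_differences, step_cm=1.0):
    """Scan putative QTL positions along a chromosome.

    marker_cm        : marker positions in cM, in map order
    half_differences : observed (M1M1 mean - M2M2 mean) / 2 at each marker
    Returns (best position, estimated additive effect, residual SS).
    """
    best = None
    n_steps = int((marker_cm[-1] - marker_cm[0]) / step_cm)
    for i in range(n_steps + 1):
        pos = marker_cm[0] + i * step_cm
        x = [linkage_weight(pos - m) for m in marker_cm]
        # Regression through the origin: expected half-difference = a * (1 - 2r).
        a_hat = sum(xi * yi for xi, yi in zip(x, half_differences)) / sum(xi * xi for xi in x)
        rss = sum((yi - a_hat * xi) ** 2 for xi, yi in zip(x, half_differences))
        if best is None or rss < best[2]:
            best = (pos, a_hat, rss)
    return best

# Hypothetical error-free marker contrasts generated by a QTL at 30 cM with a = 2.
markers = [0.0, 20.0, 40.0, 60.0, 80.0]
contrasts = [2.0 * linkage_weight(30.0 - m) for m in markers]
print(marker_regression(markers, contrasts))
```

The residual sum of squares at the best position provides the overall test of the model: if one QTL does not adequately describe the pattern of marker contrasts along the chromosome, the residual remains large wherever the QTL is placed.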

It has long been clear that the confidence intervals (CI) associated with QTL locations in segregating populations are large (van Ooijen, 1992; Darvasi et al., 1993; Hyne et al., 1995). The reliability depends on the heritability of the individual QTL. Given a typical trait with an overall broad heritability of 50 per cent or less, the individual QTL will have heritabilities of but a fraction of this 50 per cent. Thus with 5 equally sized QTL, each can only have a heritability of 10 per cent. Simulations have shown that the 95 per cent CI of such a QTL in an F2 population of 300 individuals is more than 30 cM while it is very difficult to reduce the CI to much less than 10 cM even for a very highly heritable QTL; more markers beyond a density of one every 15 cM do not help much. These distances should be viewed in the context that, on average, a chromosome is about 100 cM long.

Several approaches have been explored to overcome this problem. Increasing the number of genotypes is the most efficient way of improving precision and is easy to achieve with F2 or backcross populations. However, for many agronomically important traits, like yield or quality, such populations are not very relevant because they would have to be scored as spaced plants in a heterogeneous mixture. Such scores would bear little similarity to those of similar genotypes raised as dense commercial monocultures. So it is necessary to use plot trials with Recombinant Inbred Lines (RILs) or Double Haploid (DH) lines where 500–1000 plots with replication would occupy excessive trial space.

The heritability of individual QTL can be enhanced in two ways. Firstly the environmental variation can be minimized by having many replicates of each individual, as can be achieved easily with RILs or DH lines. Secondly the residual variation due to other QTL can be identified and removed from the error. The latter is the basis of the ‘MQM’ approach of Jansen, in which after an initial scan of the genome to identify QTL, the effects of these QTL are absorbed and removed from the error, so increasing the power of the individual tests (Jansen, 1993; Jansen & Stam, 1994). These two approaches have been combined to analyse data on dormancy and flowering using RILs in Arabidopsis, with the result that CIs have been reduced and more QTL than normal identified (van der Schaar et al., 1997). However, even by these approaches, the CI have still not been reduced to much less than 10 cM, and then only for the QTL with the largest effects. These results were based on carefully controlled greenhouse and growth room trials. Such reductions are unlikely in plot trials of crop plants, where more replication can result in greater environmental heterogeneity. In Arabidopsis and wheat 10 cM equates to 300 kbp and 6000 kbp of DNA, respectively!
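The two routes to a higher individual QTL heritability can be put into numbers with a small calculation. This is an idealized sketch: it assumes variances expressed as proportions of the single-plant phenotypic variance, analysis on line means so that replication divides only the environmental variance, and cofactors that remove all of the variation due to the other QTL. The function and parameter names are illustrative.

```python
def qtl_heritability(v_qtl, v_other_qtl, v_env, replicates=1, cofactors_fitted=False):
    """Proportion of the residual variance of line means due to one QTL.

    v_qtl            : variance due to the QTL being tested
    v_other_qtl      : variance due to all other segregating QTL
    v_env            : environmental variance of a single plant or plot
    replicates       : number of replicates averaged per line
    cofactors_fitted : if True, the other QTL are assumed to be fully
                       absorbed by cofactors and removed from the error
    """
    background = 0.0 if cofactors_fitted else v_other_qtl
    return v_qtl / (v_qtl + background + v_env / replicates)

# Five equal QTL and a broad heritability of 50 per cent: each QTL has 10 per cent.
print(qtl_heritability(0.1, 0.4, 0.5))
# Removing the other four QTL from the error.
print(qtl_heritability(0.1, 0.4, 0.5, cofactors_fitted=True))
# Adding four replicates of each line as well.
print(qtl_heritability(0.1, 0.4, 0.5, replicates=4, cofactors_fitted=True))
```

In practice cofactors absorb only part of the background variation and replication in field plots adds its own heterogeneity, so these values are upper limits on what the two strategies can deliver.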

Despite considerable effort by statisticians, all methods of analysis yield essentially similar QTL locations and gene effects, while there is some slight variation in the CIs which can be obtained. It is difficult to demonstrate the existence of more than three QTL per chromosome because of these wide CIs, so it would be hard to distinguish linkage from pleiotropy (Lebreton & Haley, 1998). However, new approaches are being explored such as attempting to mimic replication by using correlated or multiple environments (Korol et al., 1995; Ronin et al., 1995) and using Markov Chain Monte Carlo approaches (Guo & Thompson, 1992). Although the results of these approaches are awaited with interest, it would seem unlikely that any quantum shift in reliability can be achieved from segregating populations of this sort, and at best we shall see small reductions in the CI and small gains in the precision of the tests of significance. These limitations are largely due to the low chiasma frequency per chromosome, around two on average, which limits recombination and hence QTL resolution (Kearsey & Pooni, 1996).

There is another problem associated with the methodologies considered above which is difficult to avoid: statistical bias. First, the true number of QTL will be underestimated because only the few QTL of large effect will be detected. If there are a large number of QTL, by definition they cannot all have a large effect or individually constitute a large proportion of the genetical variation. It can be shown that it is difficult to locate more than 12 QTL in any given population at any one time, and generally far fewer (Hyne & Kearsey, 1995). Second, because only significant effects are reported, published QTL effects will be biased towards larger values; the more stringent the significance level, the greater the bias (Georges et al., 1995; Burns, 1997). It is not the estimation procedures which are biased, it is the fact that only the significant estimates are used; the poorer the power of the test the greater the bias. This will cause a greater bias on estimates of dominance than on additive effects because dominance effects are more difficult to detect. They will be found less often but, when they are found, they will be exaggerated. Third, QTL at the ends of chromosomes will tend to be located further from the ends than they actually are because all locations beyond the end of the chromosome will be excluded (Hyne et al., 1995). Similar effects may occur in the vicinity of markers using any interval mapping approach. These biases are all larger with QTL of small effect. Put together, these biases imply that we will tend to underestimate the true number of QTL but exaggerate their additive and dominance effects. Actual data must be viewed with due awareness of these biases and limitations on reliability.
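The bias that arises from reporting only significant estimates is easy to demonstrate by simulation. The sketch below assumes that each experiment yields an unbiased, normally distributed estimate of the same true effect with a known standard error, and that an estimate is reported only if its ratio to the standard error exceeds a threshold in absolute value. The parameter values and names are hypothetical and are not taken from the studies cited.

```python
import random
from statistics import mean

def mean_reported_effect(true_effect, std_error, z_threshold,
                         n_experiments=20000, seed=1):
    """Average of the estimates that pass the significance filter,
    and the proportion of experiments in which the QTL is declared."""
    rng = random.Random(seed)
    reported = []
    for _ in range(n_experiments):
        estimate = rng.gauss(true_effect, std_error)  # unbiased on its own
        if abs(estimate / std_error) > z_threshold:   # only significant results are kept
            reported.append(estimate)
    return mean(reported), len(reported) / n_experiments

for threshold in (1.96, 3.0):
    avg, power = mean_reported_effect(true_effect=1.0, std_error=1.0,
                                      z_threshold=threshold)
    print(f"threshold {threshold}: power {power:.3f}, mean reported effect {avg:.2f}")
```

With a true effect equal to its standard error, the average reported effect is well above the true value, and raising the threshold lowers the power and increases the exaggeration further. The estimator itself is unbiased; the bias is entirely a consequence of the filter.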

Conclusions from analyses

Having considered the methods currently used for QTL analysis, what have we learnt so far from their application to experimental data? We have investigated the results from 47 studies involving QTL analysis in maize, cereals, brassicas and Arabidopsis, which include some 176 trial-trait combinations. The references for these will be made available on the WWW (see Fig. 1). The number of QTL located for particular traits in individual studies varies from 1 to ≈16 with a mean of ≈4 (Fig. 1a). In 94 per cent of studies, 8 or fewer QTL are found while studies finding none are seldom reported! As predicted above, very few (≈2 per cent) claim to have found more than 12. Few studies detect individual chromosomes with more than one statistically significant QTL, but there are examples with at least 3 (van der Schaar et al., 1997). Taken together, these results are entirely consistent with theoretical predictions.

Fig. 1. Summary of QTL properties from 176 trial-trait combinations in plants. (a) Distribution of numbers of QTL located; (b) proportion of phenotypic variation (Vp) explained; (c) relationship between number of QTL and proportion of Vp explained; (d) variance explained by individual QTL; (e) dominance ratios of QTL showing dominance; (f) proportions of additive, dominant and overdominant QTL. (For details of publications see: http://www.biology.bham.ac.uk/qtl-rev-papers/.)

It is often difficult to determine from the literature how much of the genetical variation is explained by the QTL either individually or together because only the total phenotypic variance is reported. It is thus not possible to decide whether that variation left unexplained is due to other QTL or the environment. Typically (Fig. 1b), ≈46 per cent of the variance is explained by the QTL identified, although it varies from 10 to 95 per cent in individual studies. One might expect that those studies finding fewer QTL would explain less variation than those finding more but this is not the case (Fig. 1c). Although the percentage variance explained does increase significantly with QTL number, the R² is only 6 per cent, while those cases in which most variation is explained involve just 1–6 QTL. If we consider the 19 cases where 70 per cent or more of the variance is explained, five are biochemical traits controlled by 2–5 genes, three concern disease resistance, two are pollen traits and the rest are agronomic traits. Thus both erucic acid and linolenic acid content in Brassica napus are largely explained by 2 QTL each (Ecke et al., 1995; Tanhuanpaa et al., 1995). Individual QTL explain from 1 to 50 per cent of the phenotypic variation (Fig. 1d).

Most studies with plants involve homozygous lines, RILs or DH lines, which do not provide information on dominance effects of QTL. Of those that do, some degree of dominance is found in 53 per cent. A total of 55 per cent of the dominant QTL showed apparent overdominance (i.e. d>a), while for 23 per cent the overdominance was significant (Fig. 1e, f). These overdominant QTL clearly require further investigation to confirm their status and, if genuine, to identify their genetic basis. Such extensive overdominance, if real, is not expected from previous quantitative genetic studies. It could possibly be due to closely linked dispersed genes but, as was said earlier, the size of significant dominance effects is likely to be exaggerated.
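The classes used in Fig. 1e, f can be expressed as a simple rule on the estimated additive effect a and dominance effect d. The sketch below is an illustration of the d>a criterion only: whether dominance is significant is supplied by the caller, and no test is made of whether d differs significantly from a, so the overdominant class here corresponds to apparent overdominance. The function names are hypothetical.

```python
def dominance_ratio(a, d):
    """Dominance ratio d/a of a QTL with additive effect a and dominance effect d."""
    return d / a

def classify_qtl(a, d, dominance_significant):
    """Label a QTL as 'additive', 'dominant' or (apparently) 'overdominant'."""
    if not dominance_significant:
        return "additive"
    # The heterozygote lies outside the range of the two homozygotes when |d| > |a|.
    return "overdominant" if abs(d) > abs(a) else "dominant"

# Hypothetical estimates for three QTL.
for a, d, significant in [(2.0, 0.3, False), (2.0, 1.5, True), (2.0, -2.6, True)]:
    print(a, d, round(dominance_ratio(a, d), 2), classify_qtl(a, d, significant))
```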

Many studies have identified candidate loci in the vicinity of QTL. For example, QTL controlling variation in heading date in cereals have been located close to known loci controlling photoperiod and vernalization (Laurie et al., 1994; Bezant et al., 1995). QTL have also been located in similar positions in different populations (Lin et al., 1995; Thomas et al., 1995) while QTL in wheat have been found in similar positions in homoeologous chromosomes, reinforcing the view that they are the same loci. Similarly, QTL controlling flowering time in brassicas have been found to map to similar regions in homoeologous chromosomes both within and between species (Lagercrantz et al., 1996; Osborn et al., 1997), and similar syntenous relationships are found in maize and sorghum (Lin et al., 1995). Further insight into such synteny is provided by the fact that some of the duplicated Brassica regions show very close physical similarity to one end of chromosome 5 of Arabidopsis thaliana, which is known to contain several flowering related genes, including Constans (Bohuon et al., 1998). Thus, three linkage groups of Brassica oleracea (O2, O3 and O9) and possibly O4, contain this same Arabidopsis region and all carry QTL for flowering time. Such syntenous regions encourage one to think that the same few QTL may be involved for at least some quantitative traits both within and between species. However, the large CI on QTL positions and the fact that physical identity of chromosome tracts often involves very different map distances, should encourage caution in nominating candidate loci.

Plant and animal breeders may not need to know the locations of their QTL with very great accuracy if they intend to introgress them by marker-assisted backcrossing. They will be mainly interested in those QTL which have a large effect and hence the approaches described so far in this review would allow them to ‘cherry pick’ useful QTL, which could well have been missed by conventional mass selection, and incorporate them in elite lines. The current techniques will also enable breeders to determine the ideal ideotype from QTL studies of several different crosses, and allow the possibility of constructing them. Some success has already been achieved but the jury is probably still out on the question of whether or not greater, more efficient advances will be made overall. Probably the greatest value of markers in this context is in the reduction of linkage drag during introgression of QTL by backcrossing.

On the other hand, map based gene cloning of QTL and their detailed analysis will require somewhat greater mapping precision than is currently available. The use of ‘engineered’ substitution lines, containing donor fragments of known length, will improve this accuracy and their construction by backcrossing is currently in progress in several species. Their results are awaited with interest (Tanksley, 1993; Ramsay et al., 1996).