Introduction

The spatial distribution and abundance of long-lived species, such as forest trees, is closely related to phenological events, and especially bud burst (Chuine and Beaubien, 2001). Changes in the date of bud burst may modify the length of the growing season, the flowering time and the reproductive success of trees. That bud burst is a fitness-related trait is confirmed by the important clinal variation that is observed in the provenance tests of many tree species (Savolainen et al., 2007). As the date of bud burst is mainly driven by temperature for temperate tree species (Chuine and Cour, 1999), global warming is expected to have a major impact on the phenology of trees. An increase in the length of the growing season has already been observed for trees in Europe and North America (Menzel and Fabian, 1999; Penuelas and Filella, 2001). Changes in the date of bud burst will modify exposures to late frost (Scheifinger et al., 2003) or to outbreaks of phytophagous insects (Van Asch and Visser, 2007). The adaptive response of trees to these rapid climatic changes will depend on the levels of genetic variation within natural populations. In this contribution, we explore the distribution of the genetic variation of bud burst in European white oaks, by focusing on the trait variation itself and on the diversity of the underlying putative genes controlling the trait. Large latitudinal and altitudinal clinal variations were observed in provenance tests (Ducousso et al., 1996). In a recent contribution, we showed that the geographical pattern of bud burst variation was a result of the local adaptation in response to the warming following the last glaciation (Kremer et al., 2009). The clinal pattern observed along the latitude and the altitude is therefore the consequence of natural selection mediated by the temperature gradient, which has occurred during the last 15 000 years, and is not imprinted by the earlier differentiation existing among the source populations of the glacial refugia (Kremer et al., 2002).

The between-population variation was shown to be associated with the maintenance of important genetic variation within populations (Scotti-Saintagne et al., 2004). Previous quantitative trait loci (QTLs) detection analysis suggested that the important genetic variation was likely because of a large number of genes, with rather small to moderate effects (Scotti-Saintagne et al., 2004). Technical limitations or reduced pedigree size in forest trees did not permit to implement QTL cloning. Indeed, positional cloning (Tanksley et al., 1995) and insertion mutagenesis (Bechtold et al., 1993) are not available for tree species. The research of the underlying genes residing within the QTLs was therefore oriented towards a candidate gene (CG) approach (Pflieger et al., 2001). In a recent study, accumulation of transcripts in apical buds was monitored during spring time in order to identify differentially expressed genes before and after bud flush of sessile oak, leading to the identification of expressional CGs that were either up- or downregulated during the transition from quiescent to developing buds (Derory et al., 2006). In this study, we compare the relationships between the diversity of the CGs and the variation of the time of bud burst in natural and segregating populations in order to validate the expressional CGs. In natural populations, we assessed nucleotide diversity and differentiation in populations showing contrasting timing of bud burst. The molecular imprint of natural selection for bud burst was tested in two different ways: (i) by comparing the differentiation of CG sequences with control neutral markers (for example, microsatellites), (ii) by comparing the differentiation of CG sequences between geographical groups of population and phenological groups of populations. In segregation populations we refined the QTL position of the timing of bud burst by compiling observations across environments and years and compared their position with the position of CG on the genetic map following earlier analyses conducted by Scotti-Saintagne et al. (2004) and Casasoli et al. (2006). Our overall objective was to strengthen the selection of CGs based on their variation in natural and segregation populations.

Materials and methods

Sequencing, genotyping and diversity analysis of microsatellites and CGs

Plant material

The study material used for the assessment of nucleotide polymorphisms in CGs and SSRs loci come from a common garden experiment (provenance test) that was conducted at four different plantations in the Central Part of France (from the west to the east). Nine oak populations were sampled within the provenance tests comprising 112 in total (Ducousso et al., 1996). The plantations were installed between 1990 and 1995, and each provenance was represented by 10 repetitions of 24 trees in each test (Table 1). Bud burst was assessed in spring after 3 years of plantation following a grading system ranging between 0 (quiescent bud) and 5 (fully developed leaves and elongated shoot). Population differentiation (QST) for bud burst, as assessed in these tests, amounted to 0.55 (Ducousso et al., 2005). The nine sampled populations were chosen according to their contrasting bud burst recorded in the tests (from early to late flushing), and their geographic distribution. Within this study, the populations were clustered in two different ways: according to the time of the bud burst as recorded in the provenance test (phenological groups) and according to their geographic origin (geographic groups) (Table 1). Within each of the nine sampled populations, six trees were selected at random for sequencing the DNA of the CGs.

Table 1 Descriptive data of the sampled populations

CGs and microsatellite loci

Nucleotide polymorphisms were assessed in partial fragments of 9 CGs and 15 microsatellite loci (Table 2). CGs were selected according to differential expression of ESTs before and after bud flush as assessed by cDNA-macroarray experiments and real-time reverse transcriptase-PCR (Derory et al., 2006), or according to their functional role as assessed in model plants (Table 2). EST sequences were blasted in gene databases such as NCBI (http://www.ncbi.nlm.nih.gov/BLAST/), to search for homologs in oak species (Derory et al., 2006). A gene product was assigned to each EST on the basis of sequence similarity to proteins with known function in the trEMBL and Swiss-prot databases, using BLASTX with an e-value 10−10. The fragments sequenced in this study corresponded to the nine following nine genes. Galactinol synthase (GALA), Dof Affecting Germination 2 (DAG2), α-amylase/subtilisin inhibitor (ASI) and metal-nicotianamine transporter (YSL1) were drastically downregulated during the transition from quiescent to developing buds (Derory et al., 2006) and thus constitute relevant expressional CGs for molecular signals regulating bud burst transition. The functional role of these genes was investigated in studies conducted on other plants. Indeed, GALA was reported to be involved in tolerance to drought, high salinity and cold (Pukacka and Wojkiewicz, 2002; Taji et al., 2002) in Arabidopsis thaliana. DAG2 was shown to act as a transcription factor specifically involved in the maternal control of seed germination (Gualberti et al., 2002). In buds, as in seeds, it may potentially act on dormancy release. ASI is expressed in the germinating seeds of rice, barley and wheat; the protein is multifunctional and is involved in hydrolysis of starch metabolism and in seed defence against pathogens (Furtado et al., 2003; Nielsen et al., 2004). In oaks, ASI was highly expressed in buds at the quiescent stage only, suggesting that hydrolysis of storage starch or glycogen is repressed in the quiescent bud (Derory et al., 2006). The observed reduction in the expression of this gene upon bud swelling would indicate the onset of starch mobilization at this developmental stage. Lastly, three CGs putatively encoding for an auxin-repressed protein (AUX-REP), a seed maturation protein (PM23) and a histone (H3) were also selected, owing to their downregulation in the expression study from quiescent to developing buds. Additional CGs were selected based on their functional role known from other plants. As gibberellins are known to be implied in bud flushing (Or et al., 2000; Falusi and Calamassi, 2003), we included Gibberellin 20-oxidase (GA20) and Gibberellin 3-β-hydroxylase (GA3) (Perez-Flores et al., 2003; Calvo et al., 2004; Israelsson et al., 2004), which are involved in the synthesis of GA1, the active form of gibberellins in plants.

Table 2 Summary data of candidate genes

Additionally, we selected 15 microsatellite loci as ‘control markers’ that would picture the existing geographic structure for neutral markers (Mariette et al., 2002). Genotyping for microsatellites was done on the same individuals as for CGs. The following microsatellite loci were scored on the sampled trees: QrZAG11, QrZAG39, QrZAG96, QrZAG112, QpZAG110, QrZAG5, QrZAG7, QrZAG20, QrZAG65, QrZAG87, QpZAG9, QpZAG15, QpZAG46, QpZAG36 and MSQ13 (Dow et al., 1995; Steinkellner et al., 1997; Kampfer et al., 1998).

DNA extraction and PCR amplification

Total genomic DNA was extracted from buds using a slightly modified protocol of Saghai-Maroof et al. (1984). Primers and PCR conditions to amplify CGs were those used either in Derory et al. (2006) or in Casasoli et al. (2006). Primer pairs for GA3 and GA20 were designed from published sequences in GenBank (accession numbers AJ006453 and AJ420192, respectively) and their characteristics are presented in Table 1. The amplified fragments of CG covered the domain of the protein or extended to the 3′ untranslated gene region (Table 2).

Microsatellite loci were amplified in four different sets by using multiplex PCRs as described in Lepais et al. (2006) for two sets of primers. The two other sets (Pemonge, unpublished results) used comprised QpZAG9, QpZAG15, QpZAG46, and QpZAG36 and MSQ13, respectively (Dow et al., 1995; Steinkellner et al., 1997).

CG sequencing and SSR genotyping

PCR products were cloned into the pCR4-TOPO vector using a TA cloning kit from Invitrogen (Carlsbad, CA, USA). Clones were sequenced using a DYEnamic ET Dye Terminator kit (Amersham Biosciences Inc, Buckinghamshire, UK) on a Megabace 1000 automated DNA sequencer (Amersham Biosciences Inc.). Three different clones of each fragment were sequenced to ensure a high sequencing quality. Microsatellite loci were revealed as described in Lepais et al. (2006) on a Megabace 1000 automated DNA sequencer (Amersham Biosciences Inc.) using a combination of three different dyes.

Single-nucleotide polymorphism detection

The overall sequences per locus were aligned and polymorphic sites were automatically identified using an informatic pipeline described by Le Dantec et al. (2004). Every polymorphic site was then manually verified with CodonCode Aligner v.1.5.1 (Codon Code Corporation, Dedham, MA, USA). In order to distinguish true polymorphisms from scoring errors, each polymorphic site was visually checked on the chromatograms and further validated for Phred scores (quality threshold above 30).

Diversity and differentiation

For CGs, genetic diversity statistics, including the number of single-nucleotide polymorphisms (SNPs), insertion–deletions (indels), synonymous and replacement mutations, were calculated using DnaSP 4.00.5 software (Rozas et al., 2003). Two measures of DNA polymorphism were computed: π, the average number of pairwise nucleotide differences per site in the sample (Nei, 1987), and S, the number of segregating nucleotide sites. These parameters were computed with DnaSP considering SNPs and indels, at three different levels: the whole sequenced region, noncoding regions (including introns, 3′ and 5′ untranslated regions) and coding regions. The number of haplotypes and the haplotypic diversity were also calculated using DnaSP software (Rozas et al., 2003). Tajima's D (Tajima, 1989) and Fu and Li's D and F statistics (Fu and Li, 1993) were estimated to detect deviations from neutrality using the DnaSP software. Differentiation indexes were estimated, on the whole sequence and at each polymorphic site as FST (Weir and Cockerham, 1984), using Arlequin ver 3.01 (Excoffier et al., 2005). We further tested whether SNP deviated significantly from neutral expectations by using the Bayesian approach for detecting outlier loci (Beaumont and Balding, 2004). The method is based on the posterior distribution of a locus-specific parameter (αi) related to population differentiation. Population differentiation is modelled as the coancestry coefficient by (FSTij/(1−FSTij), where FSTij is the probability that two randomly chosen alleles of locus i have a common ancestor within that population j. The model accounts for different FST values for different subpopulations. Hence FSTij is subdivided into a locus-specific effect (αi), owing to mutation or selection, into a population-specific effect (βj), such as population sizes and/or migration rates, and γij, an interaction effect between population and locus—for example, local adaptation of a given allele within a given population. The three components are estimated by logistic regression as log(FSTij/(1−FSTij)=αi+βj+γij. Under neutral expectations, α values are expected to be zero, whereas positive values would indicate directional selection and negative values balancing selection. At P=5% αi is significantly positive (directional selection) if its 2.5% quantile is positive, and is significantly negative (balancing selection) if its 97.5% quantile is negative. Similarly, a significant positive effect of βj (reduced population size or gene flow) would be detected if 2.5% of its posterior distribution is positive, and negative (very large population size or gene flow) if its 97.5th quantile is negative. Posterior distributions of the parameters were computed with BAYESFST using MCMC simulations (Beaumont and Balding, 2004). The mean and standard deviation of prior distributions were set by default at 0 and 1 for αi, and at −2 and 1.8 for βj, respectively. The Bayesian detection of outlier loci (Beaumont and Balding, 2004) was preferred to the frequentist approach (Beaumont and Nichols, 1996), as the latter is less robust when sample sizes are low and does not account for populations’ effects on FST and is limited to the infinite allele mutation model (and stepwise mutation model), which is not appropriate for biallelic loci as SNPs (Eveno et al., 2008).

For SSRs, the following diversity and differentiation statistics were assessed: H0 (observed heterozygosity), HS (mean population diversity), HT (overall diversity), FST (differentiation) and FIS (fixation index) (Nei, 1987). H0, HS and HT were estimated using Fstat v2.9.3 software (Goudet, 2001), and FST and FIS were estimated using Arlequin ver 3.01 (Excoffier et al., 2005).

Mapping CGs and detection of QTLs for bud burst

Plant material and assessments of bud burst

QTLs for bud burst were detected within the full sib family 3P*A4 comprising 278 full sibs (Scotti-Saintagne et al., 2004). Full sib seedlings were sown in spring 1995 and raised in a seed bed (field test 1) in the nursery of the INRA Research Station at Pierroton (near Bordeaux, France). The full sibs were further vegetatively propagated during two successive campaigns in 1997 (6.2 cuttings per full sib on average) and 1998 (10.6 cuttings per full sib) and cuttings were installed in two experimental plantations in the fall of 1998 (field test 2) and spring 2000 (field test 3). Details about the two experimental plantations are given in Scotti-Saintagne et al. (2004). Both field tests 2 and 3 are located at the INRA Froot Tree Domain located at Bourran (near the city of Agen in the southwest of France).

Bud burst was recorded in 1999 in the seed bed (field test 1) and over 6 years in each of the two cutting plantations (in 2000, 2003, 2004, 2005, 2006, 2007 for field test 2 and in 2002, 2003, 2004, 2005, 2006, 2007 for field test 3). In total, 13 observations of bud burst were made over the three test sites. Depending on the number of cuttings available per full sib, the number of observations per full sib reached a maximum of 224 for each full sib of the mapping pedigree. Assessments of bud burst consisted in recording bud development scores (ranging from 0 to 5) every 2 to 3 days, starting from when the first tree reached score 1. The data used for the detection of QTLs were the number of days necessary to reach stage 3.

QTL detection and mapping of CGs

Multiple interval mapping (Jansen and Stam, 1994; Zeng, 1994), was used to identify QTLs, their position (L in cM) and their contribution to the variance of the trait (percentage of the variation (PEV)) using the MultiQTL V2.5 software package (http://www.multiqtl.com). Confidence intervals at 95% were estimated after 1000 bootstrap resampling (Visscher et al., 1996). The position of a QTL was defined as the range between the location of the QTL found using the observed data and the mean of 1000 bootstrap samples. Confidence intervals at 95% were estimated after 1000 bootstrap resampling and were positioned relative to the mean of the bootstrap samples.

The QTL detection was done separately on each parental map in two different steps. The one-QTL (per linkage group) model was first used and then followed by a model assuming two QTLs. When both models revealed the existence of QTLs for a given linkage group, the two-QTL model was preferred to the one-QTL model when the presence of two QTLs was significant versus the null hypothesis of no QTL (H2>H0) and versus the null hypothesis of one QTL (H2>H1). Otherwise, the one-QTL model was preferred.

To declare the presence of a QTL, type I error was set to 0.05 at the genome level. Type I errors obtained at the chromosome level (Doerge and Churchill, 1996) by MultiQTL were transformed at the genome level following the method of Scotti-Saintagne et al. (2004). An overall analysis using all 13 observations of bud burst was conducted using the multi-environment model. By bulking the information over all field tests and years, the multi-environment approach allows to improve the QTL detection by decreasing the environmental variance.

CGs were mapped by using either single strand conformation polymorphism (SSCP, Orita et al., 1989) or SNP variation (Casasoli et al., 2006). A sample of 57 to 135 full sibs of the mapping pedigree was genotyped for the ESTs of the CGs. MapMaker V.2 (Lander et al., 1987) software were used for linkage analysis using a LOD threshold of 6.0 as a grouping criterion and the Kosambi (1944) function to estimate the genetic distances.

Results

Nucleotide diversity

For eight out of the nine studied loci, the sequence of 42–52 samples was obtained, except for GA20, for which only 21 sequences were obtained. The regions analysed covered a total number of 4.6 kb, corresponding to 2.7 kb of the coding sequence and 1.9 kb of the non-coding sequence (introns+3′ untranslated region) (Table 2). The fragment sizes ranged from 315 bp for YSL1 to 1104 bp for PM23. A total of 107 SNPs, 18 singletons and 23 indels were detected. In total, 148 mutations (including SNPs, singletons and indels) were detected, suggesting that on average there is one mutation every 31 bp. The indels varied in size from 1 to 28 bp, and were mostly located in intronic regions. The average total nucleotide diversity (6.15 × 10−3) varied from 1.09 × 10−3 for locus GA3 to 14.7 × 10−3 for locus GALA (Table 3). The level of diversity in non-coding regions (8.15 × 10−3) is twofold higher than in coding regions (4.81 × 10−3). In coding sequences, synonymous regions were more variable than replacement regions (12.67 × 10−3 versus 2.40 × 10−3), except for locus GA20. Interestingly, the ratio of replacement and silent nucleotide diversity was high for PM23 (0.51) and ASI (0.39) and notably different from the other seven genes. The average haplotypic diversity was 0.76, corresponding on average to 13.2 haplotypes per locus for the sample sizes analysed (Table 3). With the exception of one locus (QrZag112), the total genetic diversity of microsatellites varied between 0.70 and 0.94, and the mean number of alleles between 10 to 20 (data not shown), as previously reported in other genetic surveys (Mariette et al., 2002).

Table 3 Patterns of nucleotide variation

Population differentiation

The overall differentiation for CGs and microsatellites was extremely low (Table 4a and b) and highly variable across loci due to the low sample size. There is a slightly higher differentiation for genes, mainly because of DAG2. Differentiation computed among geographical or phenological groups remained at extremely low levels, regardless of whether they were microsatellites or CGs. We explored in more detail the comparison of geographical versus phenological differentiation by computing FST at each single SNP for CGs and at each allele for the microsatellites and comparing the distribution of FST values between the two categories of markers and clustering of populations.

Table 4 (a) and (b) Population differentiation of candidate genes (FST values)

The mean value of FST for SNPs and microsatellite alleles was located at 0 (Figure 1), and their distributions were slightly skewed to positive values. The overall range of distribution was larger in the case of SNPs than in the case of SSR alleles as a result of the lower overall diversity of SNPs in comparison with SSR alleles. There is no trend of larger differentiation between phenological groups than between geographical groups. However, a detailed analysis of the FST values of SNPs showed that 15 SNPs of GALA (out of 18) showed values >0.07 among phenological groups and only 2 SNPs showed values >0.07 among geographical groups. The Bayesian FST test did not reveal any significant SNP (locus-specific effect (αi)) deviating from neutral expectations among the nine populations. Two different runs of BAYESFST were conducted. This first analysis was based on 10 000 drawings from the posterior distribution, and the second on 2000 drawings. Both came to the same conclusion that no SNP showed any significant deviation from neutral expectations. The Bayesian FST test was then further carried out across the phenological groups and geographical groups using the same methodology (two independent analysis). No SNP showed either positive or negative significant deviations. However, BAYESFST runs indicated significant negative population-specific effects—for example, for which 97.5% of the posterior distribution was lower than the prior mean −2. Significant negative values of βi were obtained for all populations except Cochem, Klostermarienberg and Mölln. We further did the computation by changing the prior standard deviation of βj to 1 and 3 and obtained the same results. None of the analysis indicated any significant γ effect (interaction between gene and population).

Figure 1
figure 1

Distribution of FST values. (a) Candidate genes. Differentiation among geographic groups. (b) Candidate genes. Differentiation among phenological groups. (c) Microsatellites. Differentiation among geographic groups. (d) Microsatellites. Differentiation among phenological groups.

As the FST analysis strongly suggested a very low population differentiation, we considered the overall sample of trees as belonging to one single random mating population. Under these hypotheses, we calculated Tajima's D statistic and Fu and Li's D and F statistics. None of these tests showed any significant deviation from neutral expectations (data not shown). As an example, among the nine genes, four were positive to Tajima's D test, four negative and one close to zero, but none was significant. AUX-REP and ASI were the two genes that were closest to significance in Tajima's D tests, and both were negative, suggesting potential directional selection.

QTL mapping of bud burst

The QTL detection resulted in 19 QTLs located on 11 female linkage groups and as many on 11 male linkage groups (Table 5). As not all the QTLs are located in homologous regions in the male and female linkage groups, one could conclude that there are more than 19 QTLs controlling bud burst in oaks. Standard deviations of QTL positions were substantially reduced in comparison with previous analyses (Scotti-Saintagne et al., 2004; Casasoli et al., 2006), as a result of the multi year and site observations taken into account by the multienvironment option. On average, the standard deviation amounted to 5 cM, and reached <0.02 cM in the extreme cases (linkage group 2).

Table 5 Distribution of QTLs of bud burst

We further monitored the distribution of the contribution of the 38 QTLs across all 13 observations (sites*years), representing a total of 286 QTL detections, by estimating the PEV of the mean clonal value explained by the QTL for each observation (Figure 2). The distribution of the PEV values follows an L-shaped curve. Out of the 286 detections, 265 (about 90%) corresponded to QTLs, contributing 12% to the variation of bud burst. The tail of the distribution, for example, the 21 detections with contributions >12%, comprised mainly QTLs located at three linkage groups (eight detections for LG 2; six detections for LG 9; three detections for LG5). Eight CGs were polymorphic in the two parents of the pedigree and could be mapped on the existing genetic map. Five of them located within the confidence intervals of QTLs for bud burst (using the multienvironment option, Figure 3). Interestingly, these five were located within QTL regions contributing the most to bud burst: LG9 (ASI), LG2 (YSL1) and LG5 (AUX-REP, GA3).

Figure 2
figure 2

Distribution of the PEV values (percentage of variation explained by the QTLs) for all QTLs. Tags for QTLs located at the tail of the distribution indicate the linkage groups in which the QTL is located and the field test and year of the bud burst assessment. Example: LG5 T3Y07 stands for linkage group 5, field test 3, year 2007.

Figure 3
figure 3

Distribution of QTLs of bud burst and candidate genes along linkage groups. This figure represents the linkage groups for which QTLs were detected that collocate with at least one of the eight mapped candidate genes.

Discussion

Diversity of CGs of bud burst versus diversity of neutral markers

This study is the second one reporting on the nucleotide diversity in genes of oaks. Indeed, a recent paper on an Asian oak species (Quang et al., 2008), and using genes that were obtained from the same cDNA library as ours (Casasoli et al., 2006; Derory et al., 2006) made a population survey in 11 genes of Quercus crispula. Interestingly, the overall diversity was of similar magnitude in their study (π=6.93 × 10−3) as in ours (π=6.15 × 10−3). These figures are lower than those previously reported on Populus tremula (11.1 × 10−3) (Ingvarsson, 2005), but higher than in pine species, Pinus taeda (3.98 × 10−3) (Brown et al., 2004), Pinus pinaster (2.41 × 10−3 in Pot et al. (2005); 5.51 × 10−3 in Eveno et al. (2008)) or Pinus sylvestris (1.4 × 10−3) (Dvornyk et al., 2002) or other conifers (Gonzalez-Martinez et al., 2006; Savolainen and Pyhäjärvi, 2007). Larger diversity in broadleaves than in conifers is also observed when comparative analysis is conducted at the level of silent polymorphism: the level of diversity in oaks (πsilent=11.2 × 10−3) is higher than in earlier reports on pines (πsilent=7.7 × 10−3 in P. sylvestris, Wachowiak et al. (2009); πsilent=8.6 × 10−3 in P. pinaster, Eveno et al. (2008)). Although the number of genes is still low, our results confirm earlier findings obtained with other marker systems, suggesting that oaks are highly variable species (Kremer and Petit, 1993; Mariette et al., 2002). As oaks are outcrossing species exhibiting extensive pollen flow, they compose large populations, which contribute to maintain high levels of diversity. These results should, however, be considered with caution, as this overall picture overshadows the very large variation observed among genes (15-fold variation from the lower to the largest polymorphic gene) and within genes (between silent and replacement regions). Genes exhibiting the largest diversity were GALA, and ASI, and they also showed the largest ratio of replacement versus silent diversity (Table 3). These two genes are downregulated during the transition from quiescent to developing buds (Derory et al., 2006), and their functional role has been investigated in annual plants in germinating seeds, suggesting that similar metabolic pathways may be active in developing buds and germinating seeds. In Arabidopis thaliana (Taji et al., 2002), GALA contributes to the accumulation of galactinol under abiotic stress conditions (drought, salinity and cold), and acts as an osmoprotectant of the seed. In barley the expression of ASI in germinating seeds indicates a potential role in defence against pathogens of the developing seed and embryo. The gene is also involved in the inhibition of starch hydrolysis in the peripheral tissues of the seed (Furtado et al., 2003). Both genes therefore seem to have a multifunctional role, which may contribute to the maintenance of a larger diversity owing to balancing selection in natural conditions.

Differentiation of CGs versus trait differentiation

The overall population differentiation of expressional and functional CGs of bud burst was of the same level as that of genomic microsatellites. Furthermore, differentiation among phenological groups was not larger than among geographic groups, and the distributions of the single SNP FST values for the two subdivisions (phenological versus geographical) overlapped completely. Lastly, the FST values of SNPs did not deviate significantly from neutral expectations based on the Bayesian FST test. Our results are in line with recent data obtained in various species, in which differentiation of CGs was compared with differentiation of neutral markers and of the target trait (Eveno et al. (2008) in P. pinaster; Pyhäjärvi et al. (2008) in P. sylvestris; Heuertz et al. (2006) in Picea abies; Hall et al. (2007) and Luquez et al. (2007) in P. tremula). In all these case studies, the mean differentiation of genes was of the same level as differentiation of the neutral markers and far less than differentiation of the target trait. Recent theoretical developments suggested that this discrepancy may be related to the multilocus structure of a complex trait. Indeed, Latta (2003) and Le Corre and Kremer (2003) showed that phenotypic differentiation among populations (QST) is driven by two major components: covariances among allelic effects at the different loci controlling the trait and variances of allelic effects at each locus. Only the latter component is dependent on FST of the genes, whereas the former is generated by allelic associations between loci. Interestingly, these authors also showed that the contribution of these components to the phenotypic differentiation of the trait is unbalanced under a wide range of evolutionary scenarios. For tree species, exhibiting extensive gene flow and existing in large populations, differentiation at the trait level is mostly created by allelic associations, rather than changes in allelic frequencies. In other words, diversifying selection, creating population differentiation, is capturing the first beneficial allelic associations distributed among the loci contributing to the trait, before modifying the allele frequencies at these genes. Hence, strong phenotypic differentiation may coexist with very low differentiation at the genes controlling the trait. Furthermore, the discrepancy is even inflated when the trait is controlled by a large number of loci. Indeed, more loci offer more opportunities for allelic associations to build up. We identified in this study at least 19 QTLs (Table 5) that may be contributing to the trait in only one full-sib cross. There might be more QTLs in natural populations, suggesting that the high heritability of bud burst in oaks (Scotti-Saintagne et al., 2004) may result from the summing of allelic effects over many loci, hence increasing the relative contribution of allelic associations to the overall differentiation of the bud burst. An interesting extension of this study would be to test whether intergenic allelic associations among the CGs were generated by diversifying selection and were responsible for the large phenotypic differentiation that was observed in the provenance tests as suggested by simulations by Latta (1998) and Le Corre and Kremer (2003). Intergenic allelic associations could be detected by calculating multilocus FST taking into account intergenic disequilibria (Kremer et al., 1997), but would require larger population samples than those used in this study.

Searching for CGs of bud burst

Our investigations lead to paradoxical conclusions about the link between CGs diversity and bud burst variation. On the one hand, diversity and differentiation statistics show that expressional and functional CGs actually behave as neutral markers, whereas, at the same time, the collocation of CGs and QTLs was confirmed with stronger confidence in comparison with earlier experiments (Casasoli et al., 2006). Mapping positions of five CGs were located within the confidence interval of the three strongest QTLs. There are at least four interpretations to these contrasting results: (1) expressional and functional CGs that were selected for this study are not related to the variation of bud burst; (2) QTL mapping, and collocation of CGs is still imprecise; (3) as suggested in the previous paragraph, intergenic allelic associations may be the main component of the phenotypic differentiation of bud burst; (4) the causal mutation of bud burst variation is located outside the genomic region that was explored in this study. As for the first and second interpretations, we recently monitored the level of expression of ASI that is located within the confidence interval of the strongest QTL of bud burst within the same mapping population and identified a strong eQTL (expression QTL) located at the same spot, suggesting that the level of expression of ASI is correlated with earliness of flushing (data not published), and reinforcing its putative feature as CGs. Yet other genes located in the same region may as well be the main source of variation of bud burst. The mean range of the confidence interval of a QTL in this study varies from 2 to 20 cM (Figure 3). Given that the physical size of the oak genome amounts to 740 Mbp/C and the genetic size to 1200 cM (Kremer et al., 2007), 1 cM roughly represents 600 000bp. Hence, the confidence interval of a QTL may comprise from several tens to hundreds of genes, which includes the gene causing the observed variation of bud burst. Lastly, we explored nucleotide diversity within a limited region of the different CGs. Although the targeted region derived from EST sequences overlaps the active domain of the corresponding protein (Table 2), no sequence data were available within the promoting regions. As linkage disequilibrium decays quite rapidly in oaks (Quang et al., 2008), chances for detecting any causal association located several hundred base pairs away from the recorded mutation are extremely low.

In conclusion, our results comparing the nucleotide diversity of CGs and their collocation with QTL on the genetic maps lead to implement additional analyses in two different directions to confirm their role in bud burst variation. First, the exploration of nucleotide diversity needs to be extended across the full length of the genes, including the promoter regions. Second, SNP frequencies should be monitored in a larger set of populations in order to assess for intergenic allelic associations among populations that may account for the overall population differentiation of CGs.