Andreas Wagner, Asymmetric Functional Divergence of Duplicate Genes in Yeast, Molecular Biology and Evolution, Volume 19, Issue 10, October 2002, Pages 1760–1768, https://doi.org/10.1093/oxfordjournals.molbev.a003998
Abstract
Most duplicate genes are eliminated from a genome shortly after duplication, but those that remain are an important source of biochemical diversity. Here, I present evidence from genome-scale protein-protein interaction data, microarray expression data, and large-scale gene knockout data that this diversification is often asymmetrical: one duplicate usually shows significantly more molecular or genetic interactions than the other. I propose a model that can explain this divergence pattern if asymmetrically diverging duplicate gene pairs show increased robustness to deleterious mutations.
Introduction
Soon after a gene duplication, degenerative mutations are likely to eliminate duplicate genes from the genome (Li 1997 , pp. 284–287; Lynch and Conery 2000 ). But gene duplications occur continuously and at high rates in eukaryotes, which accounts for the fact that up to 50% of a eukaryotic genome may consist of duplicate genes (Lynch and Conery 2000 ; Rubin et al. 2000 ). These persisting duplicate genes are perhaps the most prominent source of biochemical innovation of gene products. But little is known about how this innovation occurs or about how gene duplicates diverge in general.
Studying functional divergence among duplicate genes requires a definition of gene function, but no such universal definition is possible. The reason is that there are several complementary ways of classifying gene functions (Ashburner et al. 2000 ). For instance, gene products can be characterized biochemically, e.g., as enzymes or transcription factors. Second, they can be characterized through their time and locus of expression, e.g., expression during a cell cycle stage, in the cytoplasm, or during brain development. Third, they can be characterized genetically through mutations and through other genes that these mutations affect. This list is not necessarily complete.
Functional genomics has added much information to each of these categories, especially in model organisms like the yeast Saccharomyces cerevisiae. First, monitoring expression through microarrays (Chu et al. 1998 ; Eisen et al. 1998 ; Spellman et al. 1998 ; Gasch et al. 2000 ) provides spatiotemporal expression information for thousands of genes at once. This information is indicative of the biological process a gene is involved in. Second, genome-wide protein-protein interactions can characterize physical interactions among thousands of gene products (Bartel et al. 1996 ; Fromont-Racine, Rain, and Legrain 1997 ; Uetz et al. 2000 ; Ito et al. 2001 ). Third, large-scale gene knockout screens in combination with microarray experiments indicate which genes' expression level is affected by a mutated gene (Hughes et al. 2000 ). Thus, even in the absence of a detectable phenotype—all too frequent in knockout experiments—a putative function can sometimes be assigned using genetic interactions with known genes.
Attempts to identify gene functions according to any of the above criteria, whether they use genomic or pregenomic techniques, yield one key message: most genes have more than one, if not many functions. They are expressed at multiple times and in multiple places, they affect multiple biological processes when mutated, or they interact with proteins with diverse biochemical and biological roles (Bender et al. 1983 ; Li and Noll 1994 ; Jack and Delotto 1995 ; Slusarski, Motzny, and Holmgren 1995 ; Kirchhamer, Yuh, and Davidson 1996 ; Schwikowski, Uetz, and Fields 2000 ; Wagner 2001 ). This multifunctionality has important implications for the divergence of duplicate genes: duplicate genes often diverge through loss of complementary (sub)functions in each duplicate (Force, Lynch, and Postlethwait 1999 ; Lynch and Force 2000 ; Wagner 2000 ). Examples abound. To name but two, the ZAG1 and ZMM2 genes are paralogues in the maize genome. They are orthologues of the Arabidopsis AGAMOUS gene, which is involved in carpel and stamen development. Each of them appears to have largely lost one of their ancestral expression domains: ZAG1 is expressed at high levels in developing carpels and ZMM2 is expressed in developing stamens. A null mutation in ZAG1 affects only early carpel development (Coen and Meyerowitz 1991 ; Schmidt et al. 1993 ; Mena et al. 1996 ). Force, Lynch, and Postlethwait (1999) report on the zebrafish engrailed genes eng1 and eng1b, the likely results of a teleost-specific gene duplication of the tetrapod En1 gene. In mice and chicken, En1 is expressed in the developing pectoral appendage bud and in specific neurons of the developing hindbrain and spinal cord. In zebrafish, eng1 retained expression in the pectoral appendage bud, whereas eng1b is only expressed in the hindbrain and the spinal cord. Similar patterns of divergence may be quite common in zebrafish (Ekker et al. 1995 ; Lee, Xu, and Breitbart 1996 ; Ekker et al. 1997 ).
Studies focussing on individual gene pairs fall short of identifying general divergence patterns of many duplicate genes. At first sight, analyzing functional divergence of many duplicate genes may seem like a hopeless task. Because it is not even straightforward to classify one gene's function, how would one compare the functions of many divergent duplicates? Functional genomic experiments provide a crude remedy for this problem. Despite their disadvantage of providing largely qualitative information about genetic and molecular interactions of genes, their great advantage is that they do so for thousands of genes at once. They thus yield insight about one aspect—however minute—of gene function, such as the protein interaction partners of a gene, gene expression patterns affected through mutating a gene, or the response of gene expression to environmental challenges. It is this aspect of gene function I will focus on.
Methods
Gene Duplication Data
Data on yeast gene duplicates were kindly provided by John Conery (Department of Computer Science, University of Oregon) and generated as described in Lynch and Conery (2000). Briefly, gapped BLAST (Altschul et al. 1997) was used for pairwise amino acid sequence comparisons of all yeast open reading frames as obtained from GenBank. All protein pairs with a BLAST alignment score greater than 10⁻² were retained for further analysis. Then, the following conservative approach was followed to retain only unambiguously aligned sequences. Using the protein alignment generated by BLAST as a guide, a sequence pair was scanned to the right of each alignment gap. All sequences from the end of the gap through the first “anchor” pair of matched amino acids were discarded. All subsequent sequences (exclusive of the anchor pair of amino acids) were retained if a second pair of matching amino acids was found within less than six amino acids from the first. This procedure was then repeated to the left of each alignment gap (see Lynch and Conery 2000 for a more detailed description and justification). The retained portion of each amino acid sequence alignment was then used jointly with DNA sequence information to generate nucleotide sequence alignments of genes. For each gene pair in this data set, the fraction Ks of synonymous (silent) substitutions per silent site as well as the fraction Ka of replacement substitutions per replacement site were estimated using the method of Li (1993).
Protein Interaction Data and Analysis
Data for 899 pairwise interactions among 985 yeast proteins, as reported in Uetz et al. (2000), were obtained from http://depts.washington.edu/sfields/projects/YPLM/Nature-plain.html on February 15, 2000. There are 43 proteins that have been reported to interact among themselves. Before further analysis all such self-interactions were eliminated. (Self-interactions are interactions between two protein products of the same gene, such as interactions that might occur for homodimerizing proteins.) The resulting protein interaction network was then represented as a graph using the Library of Efficient Data types and Algorithms (LEDA) (Mehlhorn and Naher 1999). Within this graph representation, common and different protein interactions among gene family members are easily analyzed (Wagner 2001). To analyze protein interaction data not generated by two-hybrid experiments, I used information on physical interactions among yeast proteins obtained from the Munich Information Center for Protein Sequences (MIPS) database (Mewes et al. 1999, http://mips.gsf.de/proj/yeast/CYGD/db/index.html). I eliminated from these data all protein interactions generated only by two-hybrid experiments. The remaining 899 interactions involve 680 proteins. I did not distinguish between genes with only one paralogue and genes that occur in multigene families in the analysis of either data set.
I used the following numerical approach to test (and reject) the null hypothesis that the number of interactions in products of paralogous genes has diverged symmetrically. (Notice that this hypothesis does not regard the mechanism of divergence, only its pattern.) The approach proceeds by (1) reconstructing the (identical) numbers of interactions of two proteins immediately after duplication of their encoding genes, and (2) emulating the process of symmetric divergence. Consider two proteins P and P* that have d1 and d2 protein interactions, respectively, and that share b of these interaction partners (fig. 1 ). It follows that P and P* have d1 − b and d2 − b nonshared interactions, respectively, adding to a total of d1 + d2 − 2b nonshared interactions. Each of these interactions might have arisen through the evolutionary loss of an interaction that was shared after duplication or through the evolutionary gain of an interaction since the duplication. To not restrict myself to only one of these possibilities, I assume that after duplication interactions are lost with some probability Pl and gained with probability (1 − Pl). Because interactions are gained or lost probabilistically, one cannot unambiguously reconstruct the ancestral state of interactions, that is, the number of interactions P and P* had immediately after duplication. But it is possible to reconstruct a likely ancestral state simply by noting that the number of lost interactions after duplication follows a binomial distribution B(d1 + d2 − 2b, Pl). This ancestral state, the number of interactions of each protein immediately after duplication, is simply given by b + nl, where nl is a random number distributed as B(d1 + d2 − 2b, Pl). (The total number of interactions gained by the two duplicates then immediately follows as ng = [d1 + d2 − 2b] − nl.) Equipped with these two numbers, I then applied the null hypothesis of symmetric divergence to emulate each protein's divergence from this ancestral state. 
According to the null hypothesis, the number of interactions lost and gained by protein P since duplication is given by random numbers nl1 with distribution B(nl, 0.5) and ng1 with distribution B(ng, 0.5), respectively. The factor 0.5 in these distributions reflects the assumption of symmetric divergence in the null hypothesis. Thus, according to the null hypothesis, protein P should have (b + nl) − nl1 + ng1 interactions. The number of interactions of protein P* immediately follows as (b + nl) − (nl − nl1) + (ng − ng1).
I numerically applied this approach, which I have explained for only one protein pair, to all protein pairs considered here. In this way, I generated a distribution of the number of interactions under the null hypothesis of symmetric divergence with a given probability Pl of interaction loss. I then asked whether the statistical association between the number of interactions is the same in the null model as in the empirical data. The answer was unequivocally no, regardless of the value of Pl used. But only three special cases are treated in the main text: (1) divergence through loss of interactions only (Pl = 1); (2) divergence through gain of interactions (Pl = 0); and (3) divergence through equiprobable loss and gain of interactions (Pl = 0.5).
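The resampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the original implementation: the function names, the use of the standard-library random module, and the toy input triples (d1, d2, b) are all assumptions made here for clarity.

```python
import random

def binomial(n, p, rng):
    """Draw one value from B(n, p) as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

def simulate_symmetric_pair(d1, d2, b, p_loss, rng):
    """Emulate symmetric divergence for one pair with observed counts d1, d2
    and b shared partners; p_loss plays the role of Pl in the text."""
    nonshared = d1 + d2 - 2 * b                  # total nonshared interactions
    n_lost = binomial(nonshared, p_loss, rng)    # nl ~ B(d1 + d2 - 2b, Pl)
    n_gained = nonshared - n_lost                # ng
    ancestral = b + n_lost                       # count right after duplication
    lost_by_p = binomial(n_lost, 0.5, rng)       # nl1 ~ B(nl, 0.5)
    gained_by_p = binomial(n_gained, 0.5, rng)   # ng1 ~ B(ng, 0.5)
    p = ancestral - lost_by_p + gained_by_p
    p_star = ancestral - (n_lost - lost_by_p) + (n_gained - gained_by_p)
    return p, p_star

def simulate_null(pairs, p_loss, seed=0):
    """Apply the null model to every (d1, d2, b) triple in `pairs`."""
    rng = random.Random(seed)
    return [simulate_symmetric_pair(d1, d2, b, p_loss, rng) for d1, d2, b in pairs]

if __name__ == "__main__":
    toy_pairs = [(10, 1, 0), (4, 4, 2), (7, 2, 1)]   # hypothetical data
    for p_loss in (1.0, 0.0, 0.5):                   # the three special cases
        print(p_loss, simulate_null(toy_pairs, p_loss))
```

Note that under this scheme the simulated counts of P and P* always sum to d1 + d2, so the null model redistributes the observed nonshared interactions between the two duplicates without changing their total.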
Environmental Stress and Gene Expression
To assay the differential expression response of yeast paralogues to environmental stresses, I used data provided by Gasch et al. (2000) for the following conditions: heat shock (25–37°C, after 30 min), reverse heat shock (37–25°C, 30′), H2O2 and Menadione exposure, both of which generate reactive oxygen species (60′ and 80′, respectively), dithiothreitol, a reducing agent interfering with protein folding (90′), diamide, an agent oxidizing sulfhydryl groups (40′), hyperosmotic shock mediated by 1 M sorbitol (60′), hypo-osmotic shock mediated by transfer of cells from 1 M sorbitol to medium lacking sorbitol (30′), amino acid starvation (2 h), nitrogen depletion (1 day), and stationary phase (7 days). I considered genes whose expression level changed at least threefold in response to a stressor to be affected significantly. Because the expression response to most environmental stresses is transient, I chose a time point (indicated above in parentheses) approximately halfway through the measured response time series for each environmental stress to assess significant change. I then counted the number of stressors to which each member of a paralogous gene pair responded and did so for all 5,460 duplicate pairs with Ka < 0.75. For 40.4% (2,210) of these gene pairs, neither gene in the pair showed a response to any of the stressors applied. Such gene pairs are not suitable for this analysis, and I have thus eliminated them. I also excluded 162 further gene pairs (2.96%), where at least one stress condition induced the expression of one gene but repressed that of the other. Because of cross-hybridization, very closely related duplicates cannot be distinguished through microarray analysis, but the analysis of Gasch et al. (2000, fig. 5) suggests that gene pairs with Ks > 0.5 are readily distinguishable. I thus excluded an additional 4.5% (247) of the paralogues with Ks < 0.5 from the analysis.
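The counting and filtering steps of this paragraph can be illustrated as follows. The sketch assumes that expression changes are supplied as linear fold changes (stressed over unstressed), one value per stressor; the function names and toy values are hypothetical and are not taken from the Gasch et al. (2000) data files.

```python
def response_direction(fold_change, threshold=3.0):
    """+1 if induced at least threshold-fold, -1 if repressed at least
    threshold-fold, 0 otherwise."""
    if fold_change >= threshold:
        return 1
    if fold_change <= 1.0 / threshold:
        return -1
    return 0

def classify_pair(folds_a, folds_b, threshold=3.0):
    """Return (d1, d2, b, opposite) for two paralogues measured under the
    same stressors. `opposite` is True if any stressor induces one gene
    and represses the other (such pairs were excluded in the text)."""
    dirs_a = [response_direction(f, threshold) for f in folds_a]
    dirs_b = [response_direction(f, threshold) for f in folds_b]
    d1 = sum(d != 0 for d in dirs_a)
    d2 = sum(d != 0 for d in dirs_b)
    b = sum(x != 0 and y != 0 for x, y in zip(dirs_a, dirs_b))
    opposite = any(x * y == -1 for x, y in zip(dirs_a, dirs_b))
    return d1, d2, b, opposite

# Bookkeeping of the exclusions reported above.
total_pairs = 5460
no_response = 2210       # neither gene responds to any stressor
opposite_response = 162  # induced in one gene, repressed in the other
too_similar = 247        # Ks < 0.5, prone to cross-hybridization
remaining = total_pairs - no_response - opposite_response - too_similar

if __name__ == "__main__":
    print(classify_pair([3.5, 1.0, 0.9], [1.2, 4.0, 1.0]))
    print(remaining)
```

The remaining count agrees with the 2,841 pairs analyzed in the Results.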
The null hypothesis of symmetric divergence was assessed in exactly the same way as that for protein-protein interactions, except that d1 and d2 now do not correspond to the number of protein interactions but instead to the number of expression responses that two duplicate genes show when exposed to the environmental stressors considered here (b is the number of environmental stressors to which both duplicate genes respond).
Gene Perturbations and Gene Expression
Data summarizing the effects of 271 gene deletions (and other treatments) on gene expression were made available as supplemental material to Hughes et al. (2000), file data_expts_1-300_ratios.txt. From this data set, which contains log10-transformed expression ratios of 6,312 genes for each mutation, I eliminated all data derived from haploid and aneuploid deletion strains, as well as data on nongenetic treatments. The remaining data contain information on null mutation effects for a total of 21 paralogous gene pairs, the most closely related 11 of which (Ka < 1) are discussed here. For each member of each paralogous pair, I determined which other genes were affected in their expression level by a synthetic-null mutation in the gene. I also determined the number of genes that were affected by a null mutation in each paralogue. I considered a gene as affected by a null mutation if its level of mRNA expression had changed by more than threefold in response to the mutation.
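Because the ratios are log10-transformed, a more than threefold change corresponds to an absolute log10 ratio above log10(3) ≈ 0.477. The following sketch shows how the sets of affected genes for two paralogues might be compared under that reading. The dictionary layout, function names, and gene labels are illustrative assumptions and do not reflect the format of the supplementary file.

```python
import math

LOG10_THREEFOLD = math.log10(3.0)  # about 0.477

def affected_genes(log10_ratios, threshold=LOG10_THREEFOLD):
    """Genes whose expression changed more than threefold in either direction.
    `log10_ratios` maps a gene name to its log10 expression ratio."""
    return {gene for gene, value in log10_ratios.items() if abs(value) > threshold}

def compare_knockouts(ratios_a, ratios_b):
    """Summarize how the effects of null mutations in two paralogues overlap."""
    a = affected_genes(ratios_a)
    b = affected_genes(ratios_b)
    return {
        "n_a": len(a),
        "n_b": len(b),
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }

if __name__ == "__main__":
    # Hypothetical log10 ratios for three target genes.
    knockout_a = {"g1": 0.6, "g2": -0.7, "g3": 0.1}
    knockout_b = {"g1": 0.5, "g2": 0.2, "g3": 0.0}
    print(compare_knockouts(knockout_a, knockout_b))
```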
The following are brief annotations (Mewes et al. 1999) of all genes listed in table 1 (in order of appearance), with the exception of genes with seven-letter names, which correspond to genes of completely uncharacterized functions. MBP1: subunit of the MBF transcription factor; SWI4: transcription factor; ERP2: p24 protein involved in membrane trafficking; ERP4: similarity to human COP-coated vesicle membrane protein; CLB6: B-type cyclin; CLB2: G2/M-specific cyclin; ISW1 and ISW2: strong similarities to Drosophila ISWI gene; RAD27: ssDNA endonuclease and 5′-3′ exonuclease; VPS21: GTP-binding protein; CAT8: transcription factor involved in gluconeogenesis; SIR2: silencing regulatory protein and DNA-repair protein; HST3: silencing protein; PAU2: strong similarity to members of the Srp1p/Tip1p family; ALD5: aldehyde dehydrogenase 2 (NAD+); DIG1 and DIG2: MAP kinase–associated proteins, down-regulator of invasive growth and mating. Further information on the genes affected by a particular perturbation is available at http://www.rosettainpharmatics.com/publications/cell_hughes.htm as well as at the Munich Information Center for Protein Sequences (http://mips.gsf.de/proj/yeast/).
Results
Asymmetric Divergence in Protein-Protein Interactions
Genome-scale screens of protein interactions using the yeast two-hybrid assay have been carried out in several organisms (Bartel et al. 1996 ; Fromont-Racine, Rain, and Legrain 1997 ; Ito et al. 2000 ; Uetz et al. 2000 ). Their results are comprehensive maps of protein-protein interactions comprising many proteins encoded by a genome. Interpreting these maps is still difficult because they may contain significant numbers of false-positive and false-negative interactions (Ito et al. 2001 ) and because they collapse the spatial and temporal dimensions of gene expression into a still-life image of protein interactions. But these maps have also demonstrated their usefulness in predicting the spatial expression domain and functional annotation of many proteins from their interaction partners (Schwikowski, Uetz, and Fields 2000 ). They can also answer questions about global patterns of interactions, questions whose answers do not depend on the veracity of each individual interaction but only on statistical interaction patterns.
More than 30% of yeast genes whose products interact with proteins have one or more gene duplicates in the yeast genome (Wagner 2001 ). How do gene duplications influence the structure of the protein interaction network? Figure 1 shows a hypothetical protein P that interacts with four other proteins. Immediately after duplication of the gene encoding P, P and its duplicate P* share all four interactions. As the duplicates diverge in sequence, they also diverge in their protein interactions. Each protein may occasionally gain new interactions. But if mutations are more likely to cause loss of an interaction, as suggested by the prevalence of degenerative mutations in general (Li 1997 ), then most divergences will be due to loss of originally common protein interactions. Here, I use the number of interaction partners a protein has as a crude one-dimensional indicator of protein function. The number of common and different interactions between two duplicates then indicates their functional divergence.
Figure 2a shows the number of interaction partners for 1,734 pairs of paralogous genes in the network described by Uetz et al. (2000) . These comprise all paralogous pairs with Ka < 1 nonsynonymous substitutions per nonsynonymous site, corresponding to genes with less than 60% amino acid divergence. The abscissa and ordinate axes show the number of protein interactions for the first and second protein member of each pair. The number of common interactions in these pairs is small: even among the most recent paralogues (synonymous substitutions per synonymous site Ks < 0.5) less than 60% share any interactions at all, and this number dwindles to less than 15% for more distant paralogues (Ks > 1) (Wagner 2001 ).
Figure 2a shows a distinct L-shape, indicating that in many protein pairs, where one partner has many interactions, the other one has disproportionately few. This negative correlation in the number of interaction partners between duplicates is statistically highly significant (Spearman rs = −0.58, P ≪ 10⁻³; Pearson r = −0.15, P ≪ 10⁻³, df = 1,732). Could it have occurred by chance alone, that is, through random symmetric (equiprobable) loss or gain of interactions in either member of a pair? To find out, I numerically tested this null hypothesis of symmetric divergence as described in Methods. I did so under multiple scenarios distinguished by the relative importance given to evolutionary gains and losses of protein interactions after gene duplications. More specifically, each scenario assumes that the probabilities of interaction loss and gain are equal to some probability Pl and 1 − Pl, respectively. I report results only for three representative scenarios, although others yielded qualitatively identical results. The first scenario assumes that all divergence of interactions in duplicate genes is due to loss of interactions since the duplication (Pl = 1). For illustration, figure 2b shows the distribution of interactions expected under this scenario, as generated from a stochastic simulation of the divergence of 1,734 pairs of paralogous genes. The L-shape of the plot in figure 2a disappears in this scenario of symmetric divergence, as does the highly negative statistical association (Spearman rs = −0.08, P ≪ 10⁻³; Pearson r = 0.44, P ≪ 10⁻³, df = 1,732).
The second scenario assumes that all divergence is due to symmetric (equiprobable) gain of interactions (Pl = 0) in the two duplicates. It yields qualitatively identical results (Spearman rs = −0.1, P ≪ 10⁻³; Pearson r = 0.46, P ≪ 10⁻³, df = 1,732). The third scenario assumes that divergence is due to a mix of both loss and gain of interactions (Pl = 0.5), where both duplicates lose or gain interactions symmetrically, that is, with equal probability. It also leads to a fundamentally different distribution of interactions compared with that observed in the data (Spearman rs = −0.08, P ≪ 10⁻³; Pearson r = 0.46, P ≪ 10⁻³, df = 1,732). Similar to the simulated data shown in figure 2b, the L-shape observed in the data also disappears under the latter two scenarios.
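The two association measures reported for the observed and simulated data can be computed as in the sketch below, which uses only the Python standard library. It is an illustration under stated assumptions: ties receive average ranks, the order of the two members within each pair is taken as given, and the toy data are invented to mimic an L-shaped scatter.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

if __name__ == "__main__":
    # Hypothetical L-shaped data: one member has many partners, the other few.
    first = [10, 8, 6, 1, 1, 2]
    second = [1, 1, 2, 9, 7, 5]
    print(spearman(first, second), pearson(first, second))
```

Applying the same two functions to the output of a symmetric-divergence simulation and to the empirical counts gives the kind of comparison summarized in the text.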
Independent genome-scale two-hybrid experiments using different experimental designs (Uetz et al. 2000; Ito et al. 2001) show limited overlap in the interactions they detect. It is thus advisable to ensure that the observed patterns of divergence are not artifacts of a particular experimental technique. I have repeated the above analysis with yeast protein interaction data taken from the MIPS database (Mewes et al. 1999), from which I eliminated all protein interaction information generated by two-hybrid experiments. The remaining 899 interactions among 680 yeast proteins have been experimentally confirmed using techniques ranging from Western blotting to coimmunoprecipitation. The global pattern of interactions among paralogues follows closely that of the two-hybrid data, an L-shaped distribution indicating asymmetry (fig. 2c) and a highly negative statistical association (Spearman rs = −0.52, P ≪ 10⁻³; Pearson r = −0.15, P ≪ 10⁻³, df = 1,357). This pattern is not explicable through symmetric loss of interactions (Spearman rs = 0.12, P ≪ 10⁻³; Pearson r = 0.54, P ≪ 10⁻³, df = 1,357), symmetric gain of interactions (Spearman rs = 0.10, P ≪ 10⁻³; Pearson r = 0.49, P ≪ 10⁻³, df = 1,357), or symmetric gain and loss of interactions (Spearman rs = 0.09, P ≪ 10⁻³; Pearson r = 0.53, P ≪ 10⁻³, df = 1,357).
In summary, protein interactions among products of duplicate genes diverge asymmetrically, i.e., one paralogue has more protein interactions than the other. This asymmetry is statistically highly significant and is not explicable through independent (equiprobable) loss or gain of function in the duplicates.
Asymmetric Response to Environmental Stresses
Unicellular organisms like yeast have evolved elaborate cellular responses, allowing them to adapt to drastic environmental changes. They can not only withstand fluctuations in temperature, osmolarity, environmental acidity, and types and quantity of nutrients but also survive the influence of radiation and toxic chemicals. During environmental change, many genes alter their transcriptional activity. Such changes in mRNA expression profile provide valuable insights into gene functions (Chu et al. 1998; Eisen et al. 1998; Spellman et al. 1998; Gasch et al. 2000). A recent study examined the genomic mRNA expression response of most yeast genes to a variety of environmental stressors (Gasch et al. 2000). To assess the differential response of duplicate genes to these stressors, I analyzed data from 11 different stress responses, including heat shock, hyperosmotic shock, amino acid starvation, and nitrogen depletion (Gasch et al. 2000). I excluded the most closely related paralogues (Ks < 0.5) from the analysis because cross-hybridization does not allow them to be distinguished by microarray analysis. For the remaining 2,841 paralogous gene pairs, with Ka < 0.75 and Ks > 0.5, I identified the number of stressors to which each member of the pair responds.
There is again a pronounced asymmetry in the response of gene duplicates to these stresses, as indicated by a significantly negative statistical association between the number of stresses the first and second gene respond to (Spearman rs = −0.33, P ≪ 10⁻³; Pearson r = −0.1, P ≪ 10⁻³, df = 2,839). Completely analogous to the tests for symmetric divergence in protein interactions, I analyzed whether this association is consistent with the null hypothesis that the paralogues originally responded identically to these 11 stresses but that divergence occurred symmetrically for the two gene duplicates. This hypothesis must be rejected, regardless of whether divergence occurs through loss of responses (Spearman rs = −0.003, P > 0.5; Pearson r = 0.21, P ≪ 10⁻³, df = 2,839), gain of responses (Spearman rs = −0.0004, P > 0.5; Pearson r = 0.21, P ≪ 10⁻³, df = 2,839), or a mix of loss and gain of responses (Spearman rs = −0.002, P > 0.5; Pearson r = 0.21, P ≪ 10⁻³, df = 2,839). In summary, the distinct asymmetry in divergence observed for protein interactions also holds for another aspect of gene function, the response to environmental stress.
Asymmetric Response to Genetic Perturbations
The results of a large-scale gene perturbation experiment in yeast, involving several hundred gene-knockout mutations in combination with microarray measurements of changes in the expression of 6,312 yeast genes, have been reported (Hughes et al. 2000 ). Measuring the effect of a null mutation in a gene on the expression of all other genes does not distinguish between direct and indirect effects of the mutation. Its advantage, however, is that it is a very comprehensive means to assay genetic interactions.
For the purpose of this article it is relevant that the available data (Hughes et al. 2000 ) contain information on the knockout effect of 11 paralogous gene pairs with Ka < 1. For these 11 gene pairs, I compared the number of genes whose expression is affected by a null mutation in each member of the pair (table 1 ). Interpreting differences between paralogues in the number of affected genes is complicated because these differences are not only the result of divergence between the paralogues but also include effects from the divergence of genes interacting with each paralogue. But the advantage of a perturbation approach is that it provides a more comprehensive assessment of functional differences between paralogues than a mere analysis of direct physical protein interactions. It exposes how the effects of a mutation ripple through a transcriptional regulation network.
Similar to the analysis discussed above, one can ask whether the observed differences between paralogues can be attributed to independent and equiprobable loss or gain of genetic interactions. For seven out of 11 gene pairs in table 1 , both these null hypotheses must be rejected, that is, these seven gene pairs show statistically significant asymmetries in divergence. Eliminating one of two paralogous genes affects a substantially greater number of other genes than eliminating the other.
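The text does not spell out the per-pair test, so the following is only one plausible way to check a single gene pair against equiprobable divergence. It assumes that, under the null hypothesis, each nonshared genetic interaction is equally likely to belong to either paralogue, and it applies an exact two-sided binomial test to the counts of genes affected by only one of the two null mutations. The function name and the example counts are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k out of n trials.
    Intended for modest n (floating-point limits apply for very large n)."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so that outcomes tied with the observed one are included.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

if __name__ == "__main__":
    # Hypothetical pair: 40 genes affected only by knocking out paralogue A,
    # 8 genes affected only by knocking out paralogue B.
    only_a, only_b = 40, 8
    print(binomial_two_sided_p(only_a, only_a + only_b))
```

A small p-value under this assumption indicates that the split of nonshared interactions between the two paralogues is more lopsided than equiprobable divergence would readily produce.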
Discussion
In all three data sets, evidence for asymmetric divergence is unequivocal. Gene perturbations affect the expression of a moderate to large number of genes. This makes it possible to derive statistical evidence for asymmetric divergence of paralogous genes from individual gene pairs. Seven out of 11 perturbed gene pairs show such evidence. The number of environmental stresses to which a gene responds is typically smaller and so is the number of protein interactions of gene products. These smaller numbers make it more difficult to derive solid evidence for asymmetric divergence from individual gene pairs. But such evidence emerges when analyzing multiple gene pairs.
If the proposed model of increased mutational robustness is correct, we see asymmetrically diverged gene pairs because organisms harboring them have survived preferentially in the past. Importantly, natural selection would act in an indirect, second-order manner on such gene pairs. In a population polymorphic for gene duplicates at different stages of divergence, different individuals would not necessarily have different fitness levels; rather, the propensity of such individuals to suffer deleterious mutations would be different. Individuals with symmetrically diverged duplicates would thus be preferentially eliminated from the population through deleterious mutations.
One might assume that the selective advantage of having asymmetrically diverged gene duplicates must be minute. After all, differences in fitness do not manifest themselves until new loss-of-function mutations arise. For any organism, the expected waiting time for such a new loss-of-function mutation is proportional to the inverse of the mutation rate μ (Hartl and Clark 1989 , p. 98). During this time, symmetrically diverged gene duplicates are free to go to fixation via random drift. Formal population genetic analysis (Wagner 2000 ) shows that for sufficiently large population sizes (N > 1/μ) the lens of natural selection has sufficient resolving power to perceive differences in mutational robustness and to act on them. For microorganisms like yeast, attainable population sizes may well be in the required range. In addition, this minimally required population size is based on the evolution of only one diverging gene pair (Wagner 2000 ). It may be much smaller for multiple gene pairs and their cumulative effects on mutational robustness.
The requirement for large effective population sizes suggests a test for the model. In organisms with small effective population sizes, such as many higher vertebrates, we would not expect asymmetric divergence of gene duplicates. (The necessary data are not yet available.) A requirement for persistently large population sizes may also be one of the reasons why the asymmetry observed is not perfect and does not hold for all genes. Depending on a gene and its functions, a loss-of-function mutation may have very subtle fitness effects. In conjunction with fluctuating effective population sizes, the selection pressures for asymmetrical divergence may fluctuate as well. Some genes thus diverge symmetrically, whereas others do not.
The foundation of this speculative model is the assumption that gene duplicates diversify mostly through loss of common functions. The model is thus a neutral model in the sense that adaptive mutations providing fitness benefits play no role in it. Although neutral divergence of gene duplicates has received much attention in recent work (Nowak et al. 1997 ; Gibson and Spring 1998 ; Force, Lynch, and Postlethwait 1999 ; Wagner 1999 ; Lynch and Force 2000 ; Wagner 2000 ) and is probably an important mode of gene evolution, the importance of beneficial mutations must not be neglected (Hughes 1994 ; Kreitman and Akashi 1995 ; Walsh 1995 ; Ludwig, Patel, and Kreitman 1997 ; Cirera and Aguade 1998 ; Tsaur, Ting, and Wu 1998 ). Recent evidence using fully sequenced genomes further underscores the abundance of beneficial mutations and thus the importance of scenarios of sequence divergence that involve such mutations (Fay, Wyckoff, and Wu 2002 ). Although it is not clear how adaptive mutations might lead to asymmetric functional divergence of gene duplicates, the cause may be as simple as that one adaptive mutation leads to a cascade of further such mutations and consequent functional change. To distinguish between neutral models of asymmetric functional divergence and models involving adaptive mutations will be a major task for future work.
Herve Philippe, Reviewing Editor
Keywords: protein interaction networks; microarrays; gene knockout; biochemical innovation
Address for correspondence and reprints: Andreas Wagner, Department of Biology, University of New Mexico, 167A Castetter Hall, Albuquerque, New Mexico 87131-1091. wagnera@unm.edu
Financial support through NIH grant GM63882 is gratefully acknowledged.
References
Altschul S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, D. J. Lipman,
Ashburner M., C. A. Ball, J. A. Blake, et al. (20 co-authors)
Bartel P. L., J. A. Roecklein, D. SenGupta, S. Fields,
Bender W., M. Akam, F. Karch, P. A. Beachy, M. Peifer, P. Spierer, E. B. Lewis, D. S. Hogness,
Chu S., J. Derisi, M. Eisen, J. Mulholland, D. Botstein, P. O. Brown, I. Herskowitz,
Cirera S., M. Aguade,
Coen E. S., E. M. Meyerowitz,
Eisen M. B., P. T. Spellman, P. O. Brown, D. Botstein,
Ekker M., M. A. Akimenko, M. L. Allende, R. Smith, G. Drouin, R. M. Langille, E. S. Weinberg, M. Westerfield,
Ekker S. C., A. R. Ungar, P. Greenstein, D. P. Vonkessler, J. A. Porter, R. T. Moon, P. A. Beachy,
Fay J. C., G. J. Wyckoff, C. I. Wu,
Force A., M. Lynch, J. Postlethwait,
Fromont-Racine M., J. C. Rain, P. Legrain,
Gasch A. P., P. T. Spellman, C. M. Kao, O. Carmel-Harel, M. B. Eisen, G. Storz, D. Botstein, P. O. Brown,
Gibson T. J., J. Spring,
Hartl D. L., A. G. Clark,
Hughes A. L.,
Hughes T. R., M. J. Marton, A. R. Jones, et al. (22 co-authors)
Ito T., T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, Y. Sakaki,
Ito T., K. Tashiro, S. Muta, R. Ozawa, T. Chiba, M. Nishizawa, K. Yamamoto, S. Kuhara, Y. Sakaki,
Jack J., Y. Delotto,
Kirchhamer C. V., C. H. Yuh, E. H. Davidson,
Kreitman M., H. Akashi,
Lee K. H., Q. H. Xu, R. E. Breitbart,
Li W.-H.,
Li X. L., M. Noll,
Ludwig M. Z., N. H. Patel, M. Kreitman,
Lynch M., J. S. Conery,
Lynch M., A. Force,
Mehlhorn K., S. Naher,
Mena M., B. A. Ambrose, R. B. Meeley, S. P. Briggs, M. F. Yanofsky, R. J. Schmidt,
Mewes H. W., K. Heumann, A. Kaps, K. Mayer, F. Pfeiffer, S. Stocker, D. Frishman,
Nowak M. A., M. C. Boerlijst, J. Cooke, J. Maynard-Smith,
Rubin G. M., M. D. Yandell, J. R. Wortman, et al. (54 co-authors)
Schmidt R. J., B. Veit, M. A. Mandel, M. Mena, S. Hake, M. F. Yanofsky,
Schwikowski B., P. Uetz, S. Fields,
Slusarski D. C., C. K. Motzny, R. Holmgren,
Spellman P. T., G. Sherlock, B. Futcher, P. O. Brown, D. Botstein,
Tsaur S. C., C. T. Ting, C. I. Wu,
Uetz P., L. Giot, G. Cagney, et al. (20 co-authors)
———.
———.