Feng-Chi Chen, Sheng-Shun Wang, Chuang-Jong Chen, Wen-Hsiung Li, Trees-Juen Chuang, Alternatively and Constitutively Spliced Exons Are Subject to Different Evolutionary Forces, Molecular Biology and Evolution, Volume 23, Issue 3, March 2006, Pages 675–682, https://doi.org/10.1093/molbev/msj081
Abstract
There has been a controversy on whether alternatively spliced exons (ASEs) evolve faster than constitutively spliced exons (CSEs). Although it has been noted that ASEs are subject to weaker selective constraints than CSEs, so they evolve faster, there have also been studies that indicated slower evolution in ASEs than in CSEs. In this study, we retrieve more than 5,000 human-mouse orthologous exons and calculate the synonymous (KS) and nonsynonymous (KA) substitution rates in these exons. Our results show that ASEs have higher KA values and higher KA/KS ratios than CSEs, indicating faster amino acid–level evolution in ASEs. The faster evolution may be in part due to weaker selective constraints. It is also possible that the faster rate is in part due to faster functional evolution in ASEs. On the other hand, the majority of ASEs have lower KS values than CSEs. With reference to the substitution rate in introns, we show that the KS values in ASEs are close to the neutral substitution rate, whereas the synonymous substitution rate in CSEs has likely been accelerated. The elevated synonymous rate in CSEs is not related to CpG dinucleotides or low-complexity regions of protein but may be weakly related to codon usage bias. The overall trends of higher KA and lower KS in ASEs than in CSEs are also observed in human-rat and mouse-rat comparisons. Therefore, our observations hold for mammals of different molecular clocks.
Introduction
In complex organisms such as mammals, the frequency of alternative splicing (AS) is high. Genomic studies have suggested that as many as 40–60% of human genes undergo AS (Mironov, Fickett, and Gelfand 1999; Kan et al. 2001; Modrek et al. 2001; Kan, States, and Gish 2002). AS has been shown to be associated with nonsense-mediated decay (Wachtel et al. 2004; Stamm et al. 2005), programmed cell death (Wu, Tang, and Havlioglu 2003), and many other important biological processes. Some AS events were shown to be highly related to human diseases (Orban and Olah 2003; Sazani and Kole 2003; Garcia-Blanco, Baraniak, and Lasda 2004; Rossi 2004; Venables 2004). Therefore, AS has become an important topic in a variety of fields such as oncology, molecular medicine, and developmental biology.
In evolution, AS is thought to be one of the major mechanisms of increasing transcriptome complexity (Hanke et al. 1999; Modrek and Lee 2002). It allows the generation of different transcript/protein isoforms from the same genes, thus increasing functional diversity of the proteome without increasing the number of genes. Several studies have suggested that AS plays a major role in genome evolution because of relatively weaker negative selection pressure (Kan, States, and Gish 2002; Boue, Letunic, and Bork 2003; Modrek and Lee 2003; Xing and Lee 2004). Because new gene functions may arise from insertion or deletion of an exon, it was suggested that alternatively spliced exons (ASEs) can accelerate gene evolution. This hypothesis has been supported by recent studies. Examples include the following: (1) a significant proportion of ASEs is species specific and not conserved between human and rodents in contrast with high conservation of orthologous gene structures between these species (Nurtdinov et al. 2003); (2) AS is associated with exon gain or loss events, implying faster evolution in ASEs than in constitutively spliced exons (CSEs) (Modrek and Lee 2003); (3) ASEs tend to insert or delete complete protein domains more frequently than expected by chance, leading to an increase in functional diversity (Kriventseva et al. 2003); and (4) ASEs tend to have higher KA/KS (nonsynonymous to synonymous substitution rate) ratios than CSEs as revealed from comparison of orthologous exons between human and other species (Iida and Akashi 2000; Hurst and Pal 2001; Filip and Mundy 2004; Xing and Lee 2005a, 2005b). However, it has also been suggested that ASEs evolve at a slower pace than CSEs. Indeed, ASEs have been found to be better conserved than CSEs (Sorek and Ast 2003; Sorek et al. 2004; Sugnet et al. 2004), to be under stronger selection to conserve reading frame (Resch et al. 2004), and to have fewer single-nucleotide polymorphisms (Yeo et al. 2005).
It was also observed that the introns flanking ASEs are better conserved than introns flanking CSEs (Sorek and Ast 2003; Philipps, Park, and Graveley 2004), implying that ASEs are under stronger selection pressure than CSEs. In addition, Cusack and Wolfe (2005) suggested that genes undergoing AS by exon skipping were more constrained than the genome average. In view of these conflicting conclusions, we conducted an extensive analysis to compare evolutionary rates in ASEs and CSEs and to explore possible selection forces that underlie ASE evolution.
Materials and Methods
Retrieval of CSEs and ASEs
Human CSEs and ASEs were available from the online database ASAP (the Alternative Splicing Annotation Project [Lee et al. 2003]) at http://www.bioinfo.mbi.ucla.edu/ASAP/. By mapping the ASAP-provided homolog table to the ASAP genomic data set or the corresponding mouse UniGene expressed sequence tag (EST) sequences (ftp://ftp.ncbi.nih.gov/repository/UniGene/, March 2005), 4,630 human CSEs and their orthologous mouse exonic sequences were extracted (Fig. 1).
Because ASEs that have been conserved between human and mouse were not available from ASAP, we BlastN aligned the human ASE plus two flanking exons against the whole mouse UniGene EST database to identify the orthologous mouse counterparts of human ASEs (Fig. 1). Mouse exons that had a ≥70% sequence identity to the full lengths of human exon queries were extracted. A total of 788 human ASEs, including 512 major-form exons, 21 minor-form exons, and 255 undetermined-form exons, were paired with their mouse orthologs. The classification of major-form (included in at least two-thirds of the EST counts), minor-form (skipped in at least two-thirds of the EST counts), and undetermined-form (in the intermediate case or ≤5 ESTs in total) exons was retrieved from ASAP. The detailed definitions of major-, minor-, and undetermined-form exons were described in Modrek and Lee (2003). Because only a small number of minor-form exons were available, we grouped minor-form exons with undetermined-form exons to form nonmajor-form (NM) exons. In addition, orthologous human-rat and mouse-rat exon pairs, including CSEs and ASEs, were identified based on BlastN alignments between human/mouse exons and the rat UniGene database. The sequences of exons analyzed in this study are available at http://www.sinica.edu.tw/∼trees/ASE_CSE/ASE_CSE.htm.
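The major-, minor-, and undetermined-form classification described above can be sketched in a few lines of Python. This is only an illustration: the function names, the representation of an exon as a pair of EST counts (ESTs that include the exon and ESTs that skip it), and the handling of boundary cases are assumptions; the authoritative definitions are those of ASAP and Modrek and Lee (2003).

```python
from fractions import Fraction


def classify_exon(included_ests: int, skipped_ests: int) -> str:
    """Classify an ASE by inclusion level (hypothetical helper).

    major:        included in at least two-thirds of the ESTs
    minor:        skipped in at least two-thirds of the ESTs
    undetermined: intermediate inclusion, or <= 5 ESTs in total
    """
    total = included_ests + skipped_ests
    if total <= 5:
        return "undetermined"
    two_thirds = Fraction(2, 3)
    # Fractions avoid floating-point trouble exactly at the 2/3 boundary.
    if Fraction(included_ests, total) >= two_thirds:
        return "major"
    if Fraction(skipped_ests, total) >= two_thirds:
        return "minor"
    return "undetermined"


def is_nonmajor(form: str) -> bool:
    """Minor- and undetermined-form exons are pooled as nonmajor-form (NM)."""
    return form in ("minor", "undetermined")
```

For example, under these assumptions an exon included in 8 of 10 ESTs would be called major-form, whereas an exon supported by only 4 ESTs would be undetermined and therefore grouped with the NM exons.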
Retrieval of Pure (Constitutive) Introns
Human-mouse orthologous introns that did not overlap with any known human transcripts (the “pure” or “constitutive” introns) were extracted from the University of California, Santa Cruz (UCSC) Genome Browser at http://hgdownload.cse.ucsc.edu/goldenPath/hg17/multiz8way/. Sequences with less than 70% identity were excluded. The human intronic sequences in the remaining human-mouse sequence alignments were then matched to the Human Gene Index (HGI) Release 15 from the TIGR database (The Institute for Genomic Research; http://www.tigr.org/tdb/tgi) using the CRASA program (Chuang et al. 2003). Introns that were matched to HGI entries were excluded because they might, in fact, be ASEs. The flanking exons of these pure introns were further examined to reconfirm that they were conserved between human and mouse. By doing so, we could be confident that the introns retrieved were true orthologous introns between human and mouse.
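As an illustration of the 70% identity cutoff, the following Python sketch computes percent identity for a pairwise alignment and applies the threshold. How identity was actually defined in the pipeline (for example, whether gap columns count toward the denominator) is not stated here, so the definition below, in which every column with at least one base is compared and gaps count as mismatches, is an assumption, as are the function names.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns; '-' marks a gap.

    Assumption: columns gapped in both sequences are ignored, and a
    column gapped in one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def keep_alignment(aligned_a: str, aligned_b: str,
                   min_identity: float = 70.0) -> bool:
    """Keep an alignment only if its identity reaches the cutoff."""
    return percent_identity(aligned_a, aligned_b) >= min_identity
```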
Computation of Divergence at Fourfold Degenerate Sites
We extracted fourfold degenerate sites from human CSEs, major-form exons, NM exons, and their mouse counterparts. To exclude the effect of CpG dinucleotides, only sites that were neither preceded by a “C” nor followed by a “G” (“non-CpG–prone sites”) were considered. The extracted sites of the three exon types from human and mouse were submitted to the BASEML program of the PAML package (Yang 1997; Yang and Nielsen 2000) for calculation of genetic distance.
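The site selection described above can be sketched as follows. This Python fragment is a minimal illustration under several assumptions that go beyond the text: the two sequences are in frame, codon-aligned, and gap free; a site is used only when both species share the same first two codon positions; and the CpG filter is applied to both species. It only extracts the sites; the genetic distance itself was estimated with BASEML and is not reimplemented here.

```python
# First two bases of the eight fourfold degenerate codon families
# in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def non_cpg_fourfold_sites(human: str, mouse: str) -> tuple[str, str]:
    """Return the non-CpG-prone fourfold degenerate sites of two
    codon-aligned sequences as a pair of strings (hypothetical helper)."""
    human, mouse = human.upper(), mouse.upper()
    if len(human) != len(mouse):
        raise ValueError("sequences must be aligned to equal length")
    h_sites, m_sites = [], []
    for i in range(0, len(human) - 2, 3):
        h_codon, m_codon = human[i:i + 3], mouse[i:i + 3]
        # Both codons must belong to the same fourfold degenerate family.
        if h_codon[:2] != m_codon[:2] or h_codon[:2] not in FOURFOLD_PREFIXES:
            continue
        # Skip codons containing ambiguous bases or gaps.
        if set(h_codon + m_codon) - set("ACGT"):
            continue
        third = i + 2
        preceded_by_c = human[third - 1] == "C" or mouse[third - 1] == "C"
        followed_by_g = third + 1 < len(human) and (
            human[third + 1] == "G" or mouse[third + 1] == "G"
        )
        if preceded_by_c or followed_by_g:
            continue
        h_sites.append(h_codon[2])
        m_sites.append(m_codon[2])
    return "".join(h_sites), "".join(m_sites)
```

Note that the "not preceded by C" rule removes the Ser, Pro, Thr, and Ala families entirely, because their second codon position is C, so only the Leu (CTN), Val (GTN), Arg (CGN), and Gly (GGN) families can contribute sites.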
Computation of KA, KS, KA/KS, Ke, and Ki Values
For the KA/KS ratio analysis of orthologous exon pairs, two procedures were performed: (1) detecting reading frames of human protein-coding exons by BlastX aligning these exons against the corresponding RefSeq protein sequences (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/protein/) and (2) calculating KA, KS, and KA/KS values using the YN00 program of the PAML package (also see Fig. 1). The substitution rates of human-mouse orthologous exons (Ke values of ASEs and CSEs) and pure introns (Ki value) were measured using the TN93 model implemented in the BASEML program of the PAML package.
Prediction of Protein Domains and Function
We detected protein domains using the InterProScan package and the INTERPRO resource (Mulder et al. 2005; Quevillon et al. 2005) (downloaded from http://www.ebi.ac.uk/InterProScan/index.html and ftp://ftp.ebi.ac.uk/pub/databases/interpro/iprscan/, respectively). The globular domains were annotated by the GlobPlot package (Linding et al. 2003) downloaded from http://globplot.embl.de/. Low-complexity protein domains were identified using the BlastP program downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.13/. Exonic sequences were BlastN aligned against the RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/), and the matched RefSeq entries, together with the included gene ontology annotations, were retrieved.
Results
Basic Features of Human-Mouse Orthologous Exons
Table 1 shows the basic features of the exons retrieved in this study. It is apparent that lengths, average sequence identities, and GC contents differ only slightly from each other among the three exon types. The same comment applies to the percentages of CpG dinucleotides and their derivatives (TpG and CpA). However, for GC contents at fourfold degenerate sites and substitution rates at non-CpG–prone fourfold degenerate sites, CSEs have the highest values, while NM exons have the lowest, but the reverse trend is observed for effective number of codons. Meanwhile, the percentage of exons with length divisible by three is the highest in NM exons and the lowest in CSEs. Similarly, NM exons have the highest average transversion to transition ratio, followed by major-form exons and then by CSEs. Because transversions are mostly nonsynonymous changes (while transitions tend to be synonymous changes), different transversion/transition ratios in ASEs and CSEs may be associated with different KA and KS values in these exons. Indeed, our results show that the largest average and median KA values occur in NM exons, followed by major-form exons, and then by CSEs, whereas the reverse is true for the KS values (Fig. 2A and B and Table 1). Interestingly, despite the fact that the median (average) KS value is lower in ASEs than in CSEs, the median (average) KA is higher in ASEs. Note that the differences of Ke, KS, and KA/KS between CSEs and ASEs are all significant, while the differences between major-form and NM exons are not (Table 2).
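To make the transversion/transition ratio concrete, the sketch below counts the two kinds of differences between a pair of aligned sequences. It uses raw observed differences without any correction for multiple hits; whether the averages reported here were computed from raw counts or from model-based estimates is not stated, so this is an assumption, as are the function names.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (transitions, transversions) between two aligned sequences.

    Columns with gaps or ambiguous bases are skipped.
    """
    transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    return transitions, transversions


def tv_ts_ratio(seq_a: str, seq_b: str) -> float:
    """Transversion/transition ratio; undefined when no transitions exist."""
    ts, tv = count_ts_tv(seq_a, seq_b)
    if ts == 0:
        raise ValueError("no transitions observed; ratio is undefined")
    return tv / ts
```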
Table 1
Feature | CSEs | Major-form ASEs | Nonmajor-form ASEs |
---|---|---|---|
Number of exons analyzed | 4,630 | 512 | 276 |
Median length (mean) (bp) | 115 (124) | 111 (117) | 110 (119) |
Average sequence identity (%) | 88.35 | 89.07 | 89.22 |
GC content (%) | 51.09 | 51.09 | 50.41 |
Average CpG, TpG, and CpA dinucleotide content (%) | 19.18 | 18.97 | 18.64 |
GC content at fourfold degenerate sites (%) | 57.54 | 55.56 | 54.70 |
Substitution rate at non-CpG–prone fourfold degenerate sites | 0.394 | 0.344 | 0.314 |
Effective number of codons | 53.7 | 54.6 | 54.8 |
Percentage of exons with length divisible by 3 (%) | 37.75 | 42.19 | 48.91 |
Average transversion/transition ratio | 0.475 | 0.518 | 0.564 |
Median Ke value (mean) | 0.130 (0.132) | 0.122 (0.123) | 0.118 (0.120) |
Median KA value (mean) | 0.026 (0.040) | 0.033 (0.046) | 0.043 (0.062) |
Median KS value (mean^a) | 0.588 (0.873) | 0.468 (0.632) | 0.447 (0.605) |
Median KA/KS ratio (mean^a) | 0.037 (0.091) | 0.069 (0.144) | 0.098 (0.181) |
^a Because large KS values and KA/KS ratios may be methodologically suspect, KS values >10 and KA/KS ratios >10 were not considered (Hillier et al. 2004) when we calculated their average values.
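The exclusion rule in the table footnote can be illustrated with a short Python sketch. It assumes that the cutoff of 10 is applied only when averaging (the medians use all exons), and that exons with KS = 0 are dropped from the ratio because KA/KS is undefined for them; both choices, and the function name, are assumptions rather than a description of the original pipeline.

```python
from statistics import mean, median


def summarize_rates(ka_values, ks_values, cap=10.0):
    """Median and capped mean of KS and KA/KS for paired per-exon estimates."""
    # KA/KS is undefined when KS is zero, so those exons are skipped here.
    ratios = [ka / ks for ka, ks in zip(ka_values, ks_values) if ks > 0]
    kept_ks = [ks for ks in ks_values if ks <= cap]
    kept_ratios = [r for r in ratios if r <= cap]
    return {
        "median_ks": median(ks_values),
        "mean_ks": mean(kept_ks),          # KS values > cap excluded
        "median_ratio": median(ratios),
        "mean_ratio": mean(kept_ratios),   # KA/KS ratios > cap excluded
    }
```

Because a handful of saturated estimates can dominate an arithmetic mean, the capped means in the table are expected to sit well above the medians, which are much less sensitive to such outliers.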
Table 2
Measure | CSEs versus Major-Form ASEs | CSEs versus NM ASEs | Major-Form versus NM ASEs |
---|---|---|---|
Ke | P < 0.002 | P < 0.002 | P > 0.1 |
KA | P < 0.006 | P < 0.002 | P > 0.1 |
KS | P < 1 × 10⁻⁵ | P < 1 × 10⁻⁶ | P > 0.05 |
KA/KS | P < 1 × 10⁻⁶ | P < 1 × 10⁻⁶ | P > 0.05 |
It is possible that selection of data sets may affect our results. For example, if the ASEs retrieved happen to be biased toward fast-evolving exons, it is then not surprising to observe higher KA in ASEs than in CSEs. To address this possibility, we retrieved human-rat orthologous ASEs and CSEs for the same evolutionary analysis. As shown in Table 3, substitution rates derived from human-rat orthologous exons have similar trends as observed in the human-mouse comparison. That is, ASEs have higher median KA values and higher KA/KS ratios but lower median KS values than CSEs. The probability of observing the same trends from two biased data sets appears to be small. Therefore, our results are very likely unbiased. Furthermore, because rodents have a faster molecular clock than primates (Li 1997; Nekrutenko, Chung, and Li 2003), comparison of human-mouse (or human-rat) orthologous exons might yield tendencies that would not hold in comparisons of species with similar molecular clocks. However, an analysis using orthologous exons from mouse and rat, which have similar molecular clocks, shows the same tendencies as above. Therefore, the trends of higher KA and lower KS values in ASEs than in CSEs are not affected by species selection or different molecular clocks. Our estimated median KA/KS ratio between mouse and rat is lower than that reported in the rat genome analysis (Gibbs et al. 2004). Note that mouse-rat homologous genes were used to derive KA/KS ratios in the rat genome analysis, whereas only well-annotated exons are used in this study. That is, only exons that are well defined to be alternatively or constitutively spliced are included in this study. Inclusion of less well-annotated exons, predicted genes, or fast-evolving genes may result in elevated estimates of KA/KS ratios. 
Notwithstanding the difference in KA/KS ratio estimates between different studies, the overall trend of higher KA/KS ratios in ASEs than in CSEs holds well in human-rodent and mouse-rat comparisons, in agreement with previous studies (Xing and Lee 2005a).
Table 3
Comparison | Measure | CSEs | Major-form ASEs | Nonmajor-form ASEs |
---|---|---|---|---|
Human-rat | Number of exons analyzed | 622 | 117 | 68 |
Human-rat | Median KA value | 0.021 | 0.034 | 0.039 |
Human-rat | Median KS value | 0.623 | 0.486 | 0.437 |
Human-rat | Median KA/KS ratio | 0.030 | 0.061 | 0.082 |
Mouse-rat | Number of exons analyzed | 3,774 | 382 | 176 |
Mouse-rat | Median KA value | 0.007 | 0.014 | 0.019 |
Mouse-rat | Median KS value | 0.192 | 0.181 | 0.179 |
Mouse-rat | Median KA/KS ratio | 0.016 | 0.070 | 0.092 |
Our estimates of KA values are smaller than those observed in previous studies (Waterston et al. 2002). The reason is that a large portion (>60%) of exons (including CSEs and ASEs) analyzed in this study encode protein domains (data not shown). Regions with domains tend to have lower KA values than regions not containing domains (Waterston et al. 2002). We emphasize that the exons analyzed in this study are well-annotated ASEs and CSEs. We recognize that a large number of exons are not included in this study because they are not well annotated and cannot be reliably classified into ASEs and CSEs.
Substitutions at Synonymous Sites
There are two possible explanations for a lower synonymous rate in ASEs than in CSEs: CSEs and ASEs have different mutation rates or they are under different selection pressures. The first explanation has the intriguing implication that the mutation rate varies among regions of a gene. However, this scenario requires a mechanism to distinguish CSEs from ASEs for different mutation rates to occur. Because the GC content, length, and Homo-Mus sequence identity of the two exon types are highly similar (Table 1) and because ASEs and CSEs are located in the same genomic region, it is likely that ASEs and CSEs mutate at similar rates. Therefore, we suggest that different selection pressures have operated on ASEs and CSEs, leading to different KS values in the two exon types. Under this scenario, the synonymous rate is either reduced in ASEs or elevated in CSEs.
To determine which of the two possibilities is more probable, we retrieved ∼110,000 human-mouse orthologous introns that do not include any ASE or EST match (i.e., pure or constitutive introns) from the UCSC Genome Browser and calculated the substitution rates (see Materials and Methods). The median rate (0.441) of these pure introns is 33.6% lower than the median KS of CSEs (0.588) but very close to that of ASEs (0.468 for major-form and 0.447 for NM exons; Table 1). Thus, either both ASE synonymous sites and introns are under stronger negative selection than CSE synonymous sites or the latter evolve faster than the neutral rate. The former scenario is less probable because ASEs are less frequently translated and so are likely to be subject to weaker functional constraints than CSEs and because most parts of pure introns are likely subject to little functional constraints. Therefore, it seems that substitutions at ASE synonymous sites are approximately neutral, while CSE synonymous sites have an elevated substitution rate. This observation also implies that ASEs may have originated from introns, as suggested in previous studies (Gilbert 1978; Kondrashov and Koonin 2003). Because KA is lower in CSEs than in ASEs, the higher KS in CSEs cannot be due to its positive correlation with KA, an observation that has been made by previous authors (Graur 1985; Li, Wu, and Luo 1985).
KA and KA/KS Values Are Negatively Related to Inclusion Level of Exons
To probe possible selective forces on ASEs and CSEs, we analyzed the KA values and KA/KS ratios in these exons. Because ASEs have higher KA/KS ratios than CSEs (Fig. 2C and Table 1), they are either subject to positive selection more frequently than CSEs or more relaxed from negative selection. Because most ASEs were exons whose functions were still under development (Modrek and Lee 2003), ASEs may change fast at the amino acid level to gain new protein functions. Therefore, nonsynonymous changes may be selected for in ASEs, leading to higher KA values and KA/KS ratios. Such differences in nonsynonymous substitution rates between ASEs and CSEs tend to increase as the inclusion level of ASEs decreases. This observation implies that the inclusion level of ASEs is associated with the strength of selection pressure. Our results are consistent with those previously reported (Xing and Lee 2005a).
Many ASEs Are Under Strong Selection Pressure
Despite the tendency toward fast evolution in ASEs, as many as ∼30% of ASEs have a small KA (<0.02), and the distribution curves of the three exon types are barely distinguishable for these 30% of exons (Fig. 2A). A similar trend is also observed in the cumulative distributions of KA/KS ratios (Fig. 2C), with the three curves diverging from each other at ∼35% cumulative proportion (or KA/KS = 0.05). These observations indicate that a significant fraction of ASEs have very low rates of evolutionary change and are under strong negative selection. It is likely that these ASEs are alternative conserved exons (i.e., ASEs that are conserved between the compared species [Yeo et al. 2005]) and have important biological functions so that nonsynonymous base substitutions are repressed in these exons. Note that the exons analyzed in this study are conserved between humans and mice. Therefore, these exons may be more conserved than newly gained (or species specific) ASEs (Cusack and Wolfe 2005). Newly added exons may be under positive selection (or relaxed negative selection) and develop new functions. Overall, our results reveal that more than 60% of the three exon types have different KA values and KA/KS ratios (Fig. 2A and C), indicating that the majority of CSEs, major-form exons, and NM exons are under different selection pressures, which may have been caused by changes of protein-level selection pressure, translational selections, and selections at silent sites (Iida and Akashi 2000; Hurst and Pal 2001).
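The cumulative proportions quoted above (for example, the share of exons with KA < 0.02) are points on an empirical cumulative distribution. A minimal Python sketch of how such a curve could be computed from a list of per-exon values is given below; the function names are hypothetical, and the code does not reproduce the data behind Fig. 2.

```python
def cumulative_proportion(values, threshold):
    """Fraction of values strictly below the threshold."""
    if not values:
        raise ValueError("at least one value is required")
    return sum(v < threshold for v in values) / len(values)


def empirical_cdf(values):
    """Sorted (value, cumulative proportion) pairs for plotting a curve."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (rank + 1) / n) for rank, v in enumerate(ordered)]
```

Computing one such curve per exon type (CSEs, major-form, and NM exons) and overlaying them would show where the three distributions coincide and where they begin to diverge.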
Discussion
Our results indicate that ASEs have lower synonymous rates but higher nonsynonymous rates than CSEs. The elevated KA/KS ratio in ASEs implies that ASEs may have contributed more to protein diversity than CSEs during mammalian evolution. Our results also show that the synonymous substitution rates in ASEs are close to those of pure introns. We therefore suggest that the synonymous rates in ASEs are close to neutral rates. Although introns are known to contain sequences that are subject to functional constraint (Nobrega et al. 2003, 2004; Rastegar et al. 2004; Ovcharenko et al. 2005), it has been estimated that intronic sequences that are under selection pressure occupy only a small fraction of introns (Keightley and Gaffney 2003). Moreover, previous studies indicated that introns were, in general, subject to less selection pressure than exons (Li 1997). In other words, evolutionary rates of introns are closer to neutral than those of exons. Therefore, it should be reasonable to infer that synonymous rates in ASEs are close to neutral rates, while those in CSEs are accelerated. Furthermore, with a large number of pure introns (∼110,000) analyzed, our estimates of substitution rates of human-mouse orthologous pure introns, which are similar to the results from previous studies (Castresana 2002), should be representative. Our conclusion is in agreement with the previous observation of a higher synonymous substitution rate in coding exons than the substitution rate in introns (Moriyama and Powell 1996; Cargill et al. 1999; Halushka et al. 1999; Zwick, Cutler, and Chakravarti 2000; Subramanian and Kumar 2003). Because ASEs have a dual role of exon and intron, it is expected that the synonymous substitution rate of ASEs falls in between the substitution rates of CSEs and introns. Moreover, it has been suggested that introns flanking ASEs are more conserved than those flanking CSEs (Sorek and Ast 2003; Sorek et al. 2004; Sugnet et al. 2004). 
Although we suggest that the synonymous substitution rate in CSEs is accelerated, there has not been direct evidence to relate the KS values in CSEs and the nucleotide substitution rates in introns flanking CSEs. Because exons and introns are under different selection pressures and have different mechanisms of evolution, it is likely that CSEs and their flanking introns evolve at different paces.
On the other hand, previous studies indicated that synonymous changes were not neutral (Hurst and Pal 2001; Pagani, Raponi, and Baralle 2005) and that regulatory signals might reside in silent sites (Fairbrother et al. 2004a, 2004b; Wang et al. 2004). These observations may not hold for ASEs because it is possible that exonic splicing enhancers are more abundant in CSEs than in ASEs. Although a previous study suggested that ASEs had more potential regulatory sequences than CSEs (Itoh, Washio, and Tomita 2004), these sequences have not been experimentally validated. In addition, the exact contents of regulatory sequences in exons remain unclear, making it difficult to infer the influences of these sequences on overall evolutionary rates. At any rate, given the approximate neutrality of substitutions in pure introns, we may tentatively conclude that the synonymous substitution rates in ASEs are close to neutral rates.
It has been suggested that synonymous substitution rates in ASEs are lower than those in CSEs due to selection for conserved AS regulatory signals (Baek and Green 2005; Xing and Lee 2005a). However, the higher nonsynonymous substitution rates in ASEs appear to be inconsistent with the hypothesis of regulatory signal conservation in ASEs. Because the numbers of nonsynonymous sites are, in general, larger than those of synonymous sites, it seems more probable that regulatory signals fall in nonsynonymous sites than in synonymous sites. Therefore, we suggest that the lower KS values in ASEs than in CSEs result from accelerated synonymous substitution rates in CSEs.
It is not clear why the synonymous rate is elevated in CSEs. A previous study indicated that GC-ending codons are more abundant in CSEs than in ASEs (Iida and Akashi 2000), implying different codon usage patterns between these two exon types. We used the DnaSP program (J. Rozas and R. Rozas 1999) to estimate codon usage bias in human-mouse orthologous exons. Our results showed that CSEs have a slightly smaller effective number of codons and higher GC contents at fourfold degenerate sites than ASEs. Because the differences are not very large, codon usage bias may account for just part of the difference in KS values between ASEs and CSEs. Another possible cause of the substitution rate difference is different contents of highly mutable CpG dinucleotides. However, our results show that the contents of CpG dinucleotides are similar in CSEs and ASEs (Table 1). Moreover, our analysis on non-CpG–prone fourfold degenerate sites indicates that CSEs indeed have a higher substitution rate at these sites than ASEs (Table 1), supporting the view that the elevation in the CSE synonymous rate is not related to CpG dinucleotides.
Because the ASEs analyzed in this study are conserved in the genomes of humans and mice, it is very likely that they were derived from the common ancestor of humans and rodents. In other words, these ASEs either were already alternatively spliced in the common ancestor or they changed from CSEs to ASEs after the human-rodent divergence. It is likely that these ASEs are subject to weaker functional constraint than CSEs because they are usually not involved in protein domains, as revealed in previous studies (Kriventseva et al. 2003; Xing, Xu, and Lee 2003; Cline et al. 2004; Yeo et al. 2005). Therefore, ASEs may be allowed to evolve and develop new functions without disrupting the original protein structures. From this viewpoint, positive selection may have played a significant role in the evolution of these ASEs.
In summary, our study suggests that ASEs and CSEs are subject to different evolutionary forces. The elevated nonsynonymous substitution rate in ASEs may have contributed to functional divergence in mammalian evolution.
Jennifer Wernegreen, Associate Editor
We thank the reviewers for valuable suggestions. This work is supported by the Genomics Research Center, Academia Sinica, Taiwan; the National Health Research Institutes (NHRI), Taiwan, under contract NHRI-EX94-9408PC; and the National Science Council (NSC), Taiwan, under contract NSC 93-2213-E-001-023.
References
Baek, D., and P. Green.
Boue, S., I. Letunic, and P. Bork.
Cargill, M., D. Altshuler, J. Ireland et al. (18 co-authors).
Castresana, J.
Chuang, T. J., W. C. Lin, H. C. Lee, C. W. Wang, K. L. Hsiao, Z. H. Wang, D. Shieh, S. C. Lin, and L. Y. Ch'ang.
Cline, M. S., R. Shigeta, R. L. Wheeler, M. A. Siani-Rose, D. Kulp, and A. E. Loraine.
Cusack, B. P., and K. H. Wolfe.
Fairbrother, W. G., D. Holste, C. B. Burge, and P. A. Sharp.
Fairbrother, W. G., G. W. Yeo, R. Yeh, P. Goldstein, M. Mawson, P. A. Sharp, and C. B. Burge.
Filip, L. C., and N. I. Mundy.
Garcia-Blanco, M. A., A. P. Baraniak, and E. L. Lasda.
Gibbs, R. A., G. M. Weinstock, M. L. Metzker et al. (229 co-authors).
Graur, D.
Halushka, M. K., J. B. Fan, K. Bentley, L. Hsie, N. Shen, A. Weder, R. Cooper, R. Lipshutz, and A. Chakravarti.
Hanke, J., D. Brett, I. Zastrow, A. Aydin, S. Delbruck, G. Lehmann, F. Luft, J. Reich, and P. Bork.
Hillier, L. W., W. Miller, E. Birney et al. (175 co-authors).
Hurst, L. D., and C. Pal.
Iida, K., and H. Akashi.
Itoh, H., T. Washio, and M. Tomita.
Kan, Z., E. C. Rouchka, W. R. Gish, and D. J. States.
Kan, Z., D. States, and W. Gish.
Keightley, P. D., and D. J. Gaffney.
Kondrashov, F. A., and E. V. Koonin.
Kriventseva, E. V., I. Koch, R. Apweiler, M. Vingron, P. Bork, M. S. Gelfand, and S. Sunyaev.
Lee, C., L. Atanelov, B. Modrek, and Y. Xing.
Li, W. H., C. I. Wu, and C. C. Luo.
Linding, R., R. B. Russell, V. Neduva, and T. J. Gibson.
Mironov, A. A., J. W. Fickett, and M. S. Gelfand.
Modrek, B., and C. J. Lee.
Modrek, B., A. Resch, C. Grasso, and C. Lee.
Moriyama, E. N., and J. R. Powell.
Mulder, N. J., R. Apweiler, T. K. Attwood et al. (40 co-authors).
Nekrutenko, A., W. Y. Chung, and W. H. Li.
Nobrega, M. A., I. Ovcharenko, V. Afzal, and E. M. Rubin.
Nobrega, M. A., Y. Zhu, I. Plajzer-Frick, V. Afzal, and E. M. Rubin.
Nurtdinov, R. N., I. I. Artamonova, A. A. Mironov, and M. S. Gelfand.
Orban, T. I., and E. Olah.
Ovcharenko, I., G. G. Loots, M. A. Nobrega, R. C. Hardison, W. Miller, and L. Stubbs.
Pagani, F., M. Raponi, and F. E. Baralle.
Philipps, D. L., J. W. Park, and B. R. Graveley.
Quevillon, E., V. Silventoinen, S. Pillai, N. Harte, N. Mulder, R. Apweiler, and R. Lopez.
Rastegar, M., L. Kobrossy, E. N. Kovacs, I. Rambaldi, and M. Featherstone.
Resch, A., Y. Xing, A. Alekseyenko, B. Modrek, and C. Lee.
Rossi, E. L.
Rozas, J., and R. Rozas.
Sazani, P., and R. Kole.
Sorek, R., and G. Ast.
Sorek, R., R. Shemesh, Y. Cohen, O. Basechess, G. Ast, and R. Shamir.
Stamm, S., S. Ben-Ari, I. Rafalska, Y. Tang, Z. Zhang, D. Toiber, T. A. Thanaraj, and H. Soreq.
Subramanian, S., and S. Kumar.
Sugnet, C. W., W. J. Kent, M. Ares Jr., and D. Haussler.
Wachtel, C., B. Li, J. Sperling, and R. Sperling.
Wang, Z., M. E. Rolish, G. Yeo, V. Tung, M. Mawson, and C. B. Burge.
Waterston, R. H., K. Lindblad-Toh, E. Birney et al. (222 co-authors).
Wu, J. Y., H. Tang, and N. Havlioglu.
Xing, Y., and C. Lee.
———.
Xing, Y., and C. J. Lee.
Xing, Y., Q. Xu, and C. Lee.
Yang, Z.
Yang, Z., and R. Nielsen.
Yeo, G. W., E. Van Nostrand, D. Holste, T. Poggio, and C. B. Burge.
Author notes
*Genomics Research Center and †Institute of Information Science, Academia Sinica, Taipei, Taiwan; and ‡Department of Ecology and Evolution, University of Chicago