Gene expression goes global

Regulation of gene expression is fundamental for the coordinated synthesis, assembly and localization of the macromolecular structures of cells. This is achieved by a multi-step program that is highly interconnected and regulated at diverse levels (Fig. 1) [1, 2]. It starts in the nucleus, where transcription factors bind to specific DNA sequences proximal to the genes they regulate and recruit RNA polymerases for RNA synthesis. As soon as RNA precursors are formed, they become coated by a host of proteins, forming ribonucleoprotein complexes [3, 4]. Messenger RNA-binding proteins (mRBPs) associate with nascent mRNA precursors and mediate diverse RNA processing reactions including 5′-end capping, splicing, editing, 3′-end cleavage and polyadenylation [2–4]. The transcripts are subsequently exported through nuclear pores to the cytoplasm, where they may be localized to subcellular regions by complexes consisting of motor proteins and RBPs or by the signal recognition particle [5]. The transcripts assemble with translation factors and ribosomes for protein synthesis, which is controlled by global or transcript-specific mechanisms [6]. Finally, mRNAs undergo exonuclease-mediated degradation by diverse decay pathways [7]. The fate and location of proteins can be further controlled through modification of specific amino acids, cleavage by site-specific proteases and degradation through the proteasome [8].

Figure 1

Gene expression is controlled at multiple steps. See text for details.

Besides the many classical biochemical and genetic studies that revealed factors involved in, and regulating, these diverse steps of the gene expression program, the recent development of genome-wide analysis tools such as DNA microarrays has allowed fundamental new insights into the systems architecture of gene regulatory programs. For instance, DNA microarrays have been used extensively to study transcriptional programs by comparing steady-state RNA levels between diverse cell types and stages, and to map binding sites for DNA-associated proteins through chromatin immunoprecipitation (so-called ChIP-chip assays [9, 10]). Integration of these data allowed the description of complex transcriptional regulatory networks, involving large sets of genes that control coherent global responses in physiological and developmental programs [11–13].

In contrast, less is known about the systems architecture that underlies the post-transcriptional steps in the gene expression program (although many RNA regulatory processes also occur co-transcriptionally, we classify them as post-transcriptional for simplicity). Considering the large number of mRNA molecules in the cell (roughly 15 000 in yeast and 150 000 in mammalian cells), it is reasonable to assume that the location, activity and fate of these RNAs are not left to chance but are highly coordinated and regulated by an elaborate system. Such a post-transcriptional regulatory system may be controlled by the hundreds of RBPs and non-coding RNAs (e.g. microRNAs) that are encoded in eukaryotic genomes, possibly defining the specific fate of each RNA through the combinatorial binding of distinct groups of RBPs [14–16].

Here, we summarize recent work that applied genomic tools to decipher the principles and logic of post-transcriptional regulatory systems. We focus on studies considering the localization, translation and decay of mRNAs in eukaryotes. On one hand, this includes investigations to globally map post-transcriptional regulatory ‘programs’ to understand their extent, the underlying principles and conservation during evolution. On the other hand, it concerns investigations on the mediators or ‘nodes’ of these programs, which involves the characterization of RBPs and the systematic identification of their RNA targets (Fig. 2).

Figure 2

Global approaches to study post-transcriptional gene regulation. (a) Determining the translation status of each mRNA for the mapping of translational programs. Cell-extracts are fractionated through a sucrose-density gradient and the absorbance at 254 nm is monitored. RNA is isolated from fractions containing ‘free’ RNA and ribosomal subunits, monosomes (80S) and polysomes, and analyzed with DNA microarrays. The relative position of a message in this profile is an indicator for its translational activity. (b) Systematic identification of RNAs associated with specific RNA-binding proteins. In this so-called ‘ribonomics’ approach, RBPs are immunoprecipitated or affinity-purified via a tag from cellular extracts. RNAs associated with RBPs are isolated, cDNA copies are fluorescently labeled and hybridized to DNA microarrays. The Cy5/Cy3 fluorescence ratio for each locus reflects its enrichment by affinity for the cognate protein.
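The enrichment readout described for panel (b) can be illustrated with a short sketch. It assumes, as the caption implies, that the Cy5 channel carries the RBP-associated RNA and the Cy3 channel the reference RNA; the pseudocount, the twofold cutoff and the gene names are hypothetical choices made only for this illustration and are not taken from any of the cited studies.

```python
import math

def log2_enrichment(cy5, cy3, pseudocount=1.0):
    """Return log2(Cy5/Cy3) for one array feature.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for features with no signal in one channel.
    """
    return math.log2((cy5 + pseudocount) / (cy3 + pseudocount))

def enriched_targets(intensities, min_log2=1.0):
    """Rank candidate RBP targets by log2 enrichment.

    intensities: dict mapping gene name -> (cy5, cy3) fluorescence values.
    min_log2: hypothetical cutoff; 1.0 corresponds to a twofold enrichment.
    """
    scores = {gene: log2_enrichment(c5, c3) for gene, (c5, c3) in intensities.items()}
    hits = [gene for gene, score in scores.items() if score >= min_log2]
    return sorted(hits, key=lambda gene: scores[gene], reverse=True)

# Toy intensities for three hypothetical features.
toy = {"geneA": (4000, 500), "geneB": (800, 790), "geneC": (300, 1200)}
print(enriched_targets(toy))
```

In practice, published analyses add array normalization and replicate-based statistics before calling targets; the sketch only shows how a fluorescence ratio becomes an enrichment score.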

RNA localization

RNA localization generally refers to the transport or enrichment of subsets of mRNAs to specific subcellular regions. RNA localization can be achieved ‘passively’ by local protection from degradation or through the trapping/anchoring at specific cellular locations. Moreover, asymmetric distribution of RNA can also be established by the ‘active’ transport of RNAs via RBP-motor protein complexes [5, 17]. Here, we discuss studies that systematically mapped RNA distribution to subcellular structures or organelles, and then refer to investigations aimed at globally identifying localized mRNAs mediated through active mRNA transport by RBPs.

In a pioneering study by Pat Brown and colleagues [18], mRNA species bound to ‘membrane-associated’ ribosomes were separated from those bound to free ‘cytosolic’ ribosomes by equilibrium density centrifugation in a sucrose gradient, and the distribution of transcripts in the fractions was quantified by comparative DNA microarray analysis. As expected, transcripts known to encode secreted or membrane-associated proteins were enriched in the membrane-bound fraction, whereas those known to encode cytoplasmic or nuclear proteins were preferentially enriched in the fractions containing mRNAs associated with cytoplasmic ribosomes. However, transcripts for more than 300 genes in the yeast Saccharomyces cerevisiae that code for previously unrecognized membrane or secreted proteins were also found in the ‘membrane fraction’. Rather unexpectedly, among these was also the message for ASH1, which codes for a well-known transcriptional repressor, suggesting alternative signals for membrane association [18]. Similarly, application of this method to map mRNA distributions in the plant Arabidopsis thaliana allowed the classification of 300 previously unknown transcripts as encoding secreted or membrane-associated proteins [19]. A recent extension of this approach to eleven different human cell lines provided a detailed catalog containing more than 5000 previously uncharacterized membrane-associated and 6000 cytoplasmic/nuclear proteins at high confidence levels [20]. Strikingly, this analysis predicts that 44% of all human genes encode membrane-associated or secreted proteins, exceeding previous estimates ranging from 15% to 30%. In addition, the comparison of this catalog with data obtained from hundreds of DNA microarray profiles of tumors and normal tissues allowed the identification of genes that are highly overexpressed in tumors and, hence, could be particularly good candidates for diagnostic tests or molecular therapies [20].

Claude Jacq’s lab applied a subcellular fractionation approach to determine transcripts associated with free and mitochondrion-associated ribosomes in the yeast S. cerevisiae. Besides the mRNA for ATP14, which was previously known to localize in the vicinity of mitochondria [21], nuclear transcripts for diverse mitochondrial proteins were enriched in the mitochondrial fraction. Interestingly, two characteristics correlated with this mRNA localization: the phylogenetic origin and the length of the genes. mRNAs enriched in the mitochondrial fraction were preferentially longer (as deduced from average length of the encoded proteins) and originate from genes with bacterial homologues, whereas mRNAs in free cytosolic polysomes were shorter and of eukaryotic origin [22, 23]. Possibly, such coordinate localization of groups of mRNAs could allow oriented access for controlling their fates. This may also apply to other cellular compartments. For instance, a low-density array study revealed that 22 out of 649 analyzed transcripts were enriched in the cytoskeleton fraction relative to the cytosolic fraction — most of these encoding ribosomal proteins or structural proteins that interact with the cytoskeleton [24, 25].

In polarized cells like neurons, mRNA localization has major physiological implications. In dendrites, RNA transport and subsequent local protein synthesis is thought to influence experience-based synaptic plasticity and long-term memory formation; in axons, local translation modifies axon guidance and synapse formation [26, 27]. However, to date there are only a handful of well-characterized examples of localized neuronal mRNAs, among them the messages coding for microtubule-associated protein 2 (MAP2), the α-subunit of a calmodulin-dependent protein kinase (αCaMKII), brain-derived neurotrophic factor (BDNF), and activity-regulated cytoskeletal-related (Arc) [5, 26]. Therefore, several genomics-based approaches have been undertaken to identify novel localized transcripts. For example, Matsumoto et al. [28] fractionated brain tissue and isolated RNA from the heavy portion of polysomes and synaptosomes to provide a list of potentially dendritic mRNAs that undergo localized translation. Interestingly, the induction of neural activity by an electroconvulsive shock triggered a redistribution of the population of dendritic transcriptome, which may trigger changes in the translatability of this transcriptome, suggesting complex mechanisms of local translation in response to synaptic inputs [28] (Fig. 2A). The hundreds of potentially localized mRNA in neurons now await confirmation by in situ hybridization and exploration as to whether and how these may be regulated through activating stimuli, such as neurotransmitter release.

To date, more than 100 mRNAs are known to undergo active mRNA transport in diverse organisms [17, 29]. In neurons, mRNAs are transported over long distances in a microtubule-dependent manner in the form of large granules consisting of RNA-binding proteins, ribosomes and translation factors [27, 30]. Several RBPs associated with neuronal RNA transport have been identified, such as the zipcode-binding proteins (ZBP1 and ZBP2; named after their ability to bind to a conserved 54-nucleotide element in the 3′-UTR of the β-actin mRNA known as the ‘zipcode’), Staufen, hnRNPA2, cytoplasmic polyadenylation element-binding protein (CPEB) and members of the fragile X mental retardation protein (FMRP) family. For at least one of these RBPs, FMRP, a systematic gene array-based screen was undertaken to identify the mRNAs that are transported and possibly regulated by this protein [31]. Using a ‘ribonomics’ approach, which involves the immunoprecipitation or affinity isolation of RBPs followed by the identification of bound RNAs with DNA microarrays [32] (Fig. 2B), the authors immunopurified the protein from mouse brain tissue and found ∼4% of all mRNAs (435 messages) associated with FMRP. In addition, they compared the mRNA profiles of polyribosomes between normal human cells and cells derived from fragile X syndrome patients, identifying over 200 messages with altered association that are hence potentially subject to translational regulation (Fig. 2A). Notably, nearly 70% of the homologous messages found in both studies contained a G-quartet structure, which had been demonstrated to be an FMRP target in vitro [31]. These data provided a good starting point for further investigations of the most critical targets involved in fragile X pathophysiology and, possibly, in other related cognitive diseases.

Probably the best-studied example for actin-dependent RNA transport concerns ASH1 mRNA localization to the bud tip of yeast cells during cell division. ASH1 codes for a transcriptional repressor repressing mating type switching in daughter cells [29]. ASH1 mRNA is bound by She2p, an RBP tethered to the myosin motor protein Myo4p via the adaptor protein She3p. This RNA-protein complex travels along actin cables to the emerging bud for local protein synthesis. To identify other localized mRNAs, affinity purification of components of the She complex followed by the analysis of bound mRNA with microarrays was combined with a robust reporter system for in vivo visualization as a secondary screen [33, 34]. This analysis revealed 23 additional transcripts that are localized to the bud-tip and encode a wide variety of proteins, several involved in stress responses and cell wall maintenance [33]. These results reveal an unanticipated widespread use of RNA transport in budding yeast — possibly providing the daughter cells with a favorable ‘start-up package’.

In conclusion, the few studies that investigated the spatial distribution of mRNAs in the cell on a global level challenge the long-standing assumption of a rather ‘unorganized’ pool of mRNAs that randomly diffuse in the cytoplasm to be eventually translated. Many mRNAs may be spatially organized even in non-polarized cells, for local translation or for decay in processing (P) bodies. Further applications of both subcellular fractionation techniques and ribonomics approaches will certainly reveal a more comprehensive picture of the spatial arrangement of RNAs in cells.

Regulation of translation

Translational regulation has essential roles in development, oncogenesis and synaptic plasticity [35–37]. It concerns the differential recruitment of mRNA species to the ribosome for protein synthesis, which can result in a lack of correlation between the relative amount of an mRNA and the amount of the encoded protein. In an innovative study, the relative contributions of transcriptional and translational regulation in yeast were measured using a large-scale absolute protein expression measurement called APEX (Absolute Protein Expression Index), which relies upon observed peptide counts from mass spectrometry [38]. Most (73%) of the variance in protein abundance can be explained by mRNA abundance, with the number of proteins per mRNA molecule log-normally distributed around an average of 5600. This indicates that the abundance of most proteins is largely set by the abundance of their mRNA; however, one third of the mRNAs must be regulated at additional levels including translation and/or protein turnover. In mammalian cells, the fraction of differentially expressed messages may be considerably higher, ranging from 60% to 80%, indicating that the expression of most messages is heavily controlled at diverse levels [39].
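What it means for mRNA abundance to ‘explain’ a fraction of the variance in protein abundance can be sketched as follows. This is a toy calculation, not the APEX analysis itself: it assumes that the statistic of interest is the squared Pearson correlation between log-transformed abundances, and all numbers are invented for illustration.

```python
import math

def variance_explained(mrna, protein):
    """Squared Pearson correlation (R^2) between log10 mRNA and log10 protein abundance."""
    x = [math.log10(v) for v in mrna]
    y = [math.log10(v) for v in protein]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical mRNA copy numbers per cell.
mrna_copies = [1, 2, 5, 10, 20]

# Case 1: every mRNA yields exactly 5600 proteins, so mRNA explains all variance.
proportional = [5600 * m for m in mrna_copies]

# Case 2: gene-specific factors (hypothetical) mimic extra regulation,
# for example at the level of translation or protein turnover.
factors = [0.3, 2.5, 0.8, 4.0, 0.5]
regulated = [5600 * m * f for m, f in zip(mrna_copies, factors)]

r2_proportional = variance_explained(mrna_copies, proportional)
r2_regulated = variance_explained(mrna_copies, regulated)
print(round(r2_proportional, 3), round(r2_regulated, 3))
```

The second case yields a lower value than the first because the gene-specific factors add variance that mRNA levels cannot account for, which is the sense in which the unexplained variance points to regulation beyond transcript abundance.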

Translation can be divided into three steps: initiation, elongation and termination [6]. During translation initiation, the primary target for translational control, translation initiation factors (eIFs) recruit the mRNA to the small ribosomal subunit (40S subunit). In this process, eIF4E binds to the cap structure at the 5′-end of the mRNA and interacts with eIF4G, which in turn binds to the poly(A)-binding protein (PABP). The initiation complex then scans the mRNA in the 5′ to 3′ direction until the initiation codon is reached, where the large ribosomal subunit (60S) joins the complex, leading to the formation of active ribosomes. Notably, ribosomes can also be recruited cap-independently to some viral and cellular mRNAs by direct binding of the small ribosomal subunit to internal RNA structures, termed internal ribosome entry sites (IRES) [40]. The assembled ribosomes traverse the coding region with the help of elongation factors (eEFs) and synthesize the encoded polypeptide, with multiple ribosomes covering the mRNA to form polysomes. At the termination codon, peptide chain-releasing factors (eRFs) are required to release the polypeptide from the ribosome.

Two basic modes of translational regulation have been described. During global regulation, translation of most mRNAs is controlled through translation factors. For instance, phosphorylation of eIF2α reduces the amount of active initiation complexes and hence leads to a rapid reduction of translation of most messages. The availability of eIF4E is controlled by 4E-binding proteins (4E-BPs) that displace eIF4G from eIF4E, and thus inhibit association of the small ribosomal subunit with the mRNA [6, 41]. The second mode of translational regulation concerns mRNA-specific control, where translation of defined groups of mRNAs is modulated without affecting general protein biosynthesis. This can be carried out by specific RNA-binding proteins, which often bind to sequence or structural elements in untranslated regions (UTRs) of protein-coding transcripts and, hence, repress translation via interactions with eIFs [6, 42]. A prime example of such regulation is cytoplasmic aconitase, an enzyme that regulates iron-dependent translation initiation through binding to a stem-loop structure in the 5′-UTR of messages involved in iron metabolism (e.g., ferritin mRNA, which codes for an iron storage protein) [43]. Specific control can also be exerted by microRNAs (miRNAs), small RNAs of about 22 nucleotides in length that have recently been shown to repress translation via base pairing to sequences located in 3′-UTRs of target mRNAs [44]. Interestingly, it has recently become apparent that miRNA- and RBP-mediated translational regulation may collaborate or compete on specific mRNA substrates, suggesting interconnections between these different modes of translational regulation [45].

Genome-wide analysis of translational regulation. A reliable measure for translation of cellular mRNA is the degree of its association with ribosomes. Since the rate of initiation usually limits translation, most translational responses will alter the ribosome density on a given mRNA. Actively translated mRNAs are typically bound by several ribosomes (polysomes) and can be separated from the small (40S) and the large (60S) ribosomal subunits and the 80S monosomes by sucrose gradient centrifugation (Fig. 2A). In classical experiments, total RNA was isolated from fractions of the polysomal gradient and assayed for the mRNA of interest by Northern blot analysis. Several laboratories have further extended this technique using DNA microarray technology to perform genome-wide analysis of mRNAs in polysomes in yeast, Drosophila and mammals [46].

The laboratories of Pat Brown and Daniel Herschlag performed a high-resolution translation state analysis in rapidly growing yeast cells, providing profiles of mRNA-ribosome association for thousands of genes [47]. Based on these data, they calculated the ribosome occupancy (the fraction of a specific mRNA associated with ribosomes), the ribosome density and the translation rate for each expressed mRNA. The average occupancy was calculated at 71%, indicating that most mRNAs are likely engaged in active translation. However, about 100 mRNAs showed only weak association with ribosomes and may therefore be considered potential candidates for ‘translation on demand’. The average ribosome density was found to be one ribosome per 156 nucleotides, which is about one fifth of the maximal packing density, supporting the premise that translation initiation is the rate-limiting step for protein synthesis. Surprisingly, ORF length appears to be a major factor determining ribosome density, with an inverse correlation between ORF length and ribosome density observed in diverse species [47–49].
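The two quantities defined above reduce to simple ratios, sketched below. The function names and example values are hypothetical, and the footprint of roughly 30 nucleotides per ribosome used to estimate maximal packing is an assumption of this sketch rather than a value given in the text.

```python
def ribosome_occupancy(ribosome_bound_copies, total_copies):
    """Fraction of the copies of one mRNA species that are associated with ribosomes."""
    return ribosome_bound_copies / total_copies

def nucleotides_per_ribosome(orf_length_nt, ribosomes_on_mrna):
    """Average spacing of ribosomes along the ORF (inverse of ribosome density)."""
    return orf_length_nt / ribosomes_on_mrna

def fraction_of_max_packing(nt_per_ribosome, footprint_nt=30.0):
    """Observed density relative to ribosomes packed end to end.

    footprint_nt is the assumed length of mRNA covered by one ribosome.
    """
    return footprint_nt / nt_per_ribosome

# Hypothetical mRNA: 71 of 100 copies sediment with ribosomes,
# and a 1560-nt ORF carries 10 ribosomes on average.
occupancy = ribosome_occupancy(71, 100)
spacing = nucleotides_per_ribosome(1560, 10)
packing = fraction_of_max_packing(spacing)
print(occupancy, spacing, round(packing, 2))
```

With one ribosome per 156 nucleotides and the assumed 30-nucleotide footprint, the packing fraction comes out at roughly 0.19, in line with the ‘about one fifth’ quoted above.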

Since poly(A) tail length affects translational efficiency and mRNA stability, two recent studies systematically addressed its length in yeast [48, 50]. In a procedure called polyadenylation state array analysis (PASTA), mRNAs were captured with poly(U) Sepharose columns and differentially eluted by increasing the temperature. The mRNAs with short tails elute first and those with long tails last. RNA fractions were analyzed with DNA microarrays to identify groups of mRNAs with similar poly(A) tail lengths. In the yeast S. cerevisiae, mRNAs coding for functionally or cytotopically related proteins could be attributed to groups with similar tail length. Long poly(A) tails were found among mRNAs coding for cytoplasmic ribosomal proteins, whereas short tails were enriched for DNA/Ty elements and among mRNAs coding for nucleolar proteins involved in ribosome synthesis and proteins with cell cycle-related functions [50]. The comparison of these data with other genome-wide analyses revealed that poly(A) tail length positively correlates with ribosome density and to some extent with mRNA abundance, and negatively correlates with ORF and UTR length [51]. The poly(A) tail length, and hence the ribosome occupancy, of messages correlates with the degree of association with poly(A)-binding protein (Pab1p). This provides ‘global’ support for the concept that long poly(A) tails stimulate translation via Pab1p and eIF4G. Interestingly, poly(A) tail length does not correlate with mRNA decay rates. Therefore, it appears that translation rates are not directly coupled to mRNA decay control, although poly(A) shortening is a prerequisite for mRNA decay [52]. Possibly, processes acting on oligo(A)-tailed intermediates may limit the decay rates of a large number of yeast mRNAs. A congruent study performed in the fission yeast Schizosaccharomyces pombe monitored translational status, poly(A) tail length, mRNA abundance, mRNA decay rates and RNA polymerase II association under identical conditions [48]. Functional groupings of mRNAs with respect to their translational efficiency, length and abundance were identified, with shorter and more abundant mRNAs having longer poly(A) tails. Notably, ORF length correlated best with ribosome density, and mRNA abundance with ribosome occupancy. In conclusion, both studies revealed similar principles that may organize translation, and these may therefore be evolutionarily conserved. Further studies in other organisms will reveal whether these principles are universally conserved and possibly affected in disease.

Several studies were aimed at the systematic identification of translationally regulated messages after subjecting cells to stress and other environmental stimuli. They applied a ‘low-resolution’ profile analysis, where the mRNA contents of high sucrose gradient fractions (polysomes) were compared with fractions from the low sucrose gradient (the pool of non-translated mRNAs) (Fig. 2A). In parallel, changes in the levels of total RNA were measured to study the relation between transcription/decay and translationally regulated messages. In yeast, global effects on translation were first studied for the rapid transfer of cells from a fermentable to a non-fermentable carbon source mimicking glucose starvation [53], followed by heat-shock response and rapamycin treatment [54], amino acid starvation, butanol addition (an end product of amino acid breakdown) [55], and application of hydrogen peroxide to induce oxidative stress [56]. First, it should be noted that these relatively ‘harsh’ treatments induced global translation inhibition that goes along with a decrease in cell growth. Although this global translation inhibition is triggered by similar signaling pathways like phosphorylation of eIF2α, the various forms of stress affected quite different sets of mRNAs. Amino acid and glucose starvation differentially regulated the translation of up to 20% of all mRNAs, whereas more ‘mild’ treatments, such as butanol and peroxides, affected less than 4% of transcripts. Treatment of cells with heat/rapamycin or nutrient removal (amino acid and glucose starvation) co-activated similar translational and transcriptional programs. Here, regulation at the translational level often reflects a magnification of the transcriptional activity — an effect that has been termed ‘potentiation’ [54]. In contrast, the addition of butanol or peroxide provoked no potentiation, but instead changed the abundance and translation rates of different sets of mRNAs. 
This could, at least in part, be explained by the recruitment of stored mRNA for translation ‘on demand’. Nevertheless, in all cases, the specific sets of mRNAs that undergo treatment-specific regulation appear to share functional themes that can be interpreted as a logical response to the cell’s altered physiological circumstances. For instance, mRNAs coding for proteins related to sugar metabolism and transport, such as hexose transporters, remain associated with polysomes during glucose starvation [53]; rapamycin treatment, which blocks the target of rapamycin (TOR) pathway controlling cell growth, led to a decrease of nearly all yeast mRNAs coding for cytoplasmic ribosomal proteins, whereas mRNAs for proteins acting in the nitrogen discrimination pathway were increased [54]; and amino acid starvation strongly coregulated or potentiated transcripts encoding permeases, proteases and proteins involved in degradation pathways, which may reflect an early amino acid scavenging response to starvation [55]. Another interesting aspect of these studies is that the concentration of an applied compound may matter significantly for the outcome. Shenton et al. [56] showed that low concentrations of peroxide (0.2 mM H2O2) induced the translation of mRNAs coding for antioxidants, cellular transporters and proteins involved in intermediary metabolism, which may reflect the need for metabolic reconfiguration. A tenfold higher concentration of peroxide (2 mM) resulted in the up-regulation of genes involved in ribosome biogenesis and ribosomal RNA processing, possibly reflecting the need to repair factors required for efficient protein synthesis. On the other hand, many translationally repressed mRNAs showed increased steady-state (total RNA) levels. Again, it was postulated that this group of messages may represent an mRNA store that could become rapidly activated following relief of the stress condition.
It will certainly be interesting to further study whether other ‘mild’ treatments with pharmacological compounds activate dose-dependent non-linear effects via distinct regulatory programs. If so, this may become of great medical relevance as diverse drugs are known to act differentially depending on their dose.

Finally, there are recurring and intriguing observations that mRNAs coding for cytoplasmic ribosomal proteins generally undergo strikingly strong coregulation. Amino acid and glucose starvation coordinately and very rapidly repress the abundance and ribosome association of these transcripts, whereas the addition of butanol or oxidative stress leads to the opposite effect, translational activation, which possibly reflects the requirement of cells to replace ribosomal proteins and rRNA that became damaged by free radicals or other toxic products. Therefore, besides being under tight transcriptional control, these messages also undergo considerable post-transcriptional control at diverse levels and hence may represent the most tightly controlled genes in eukaryotic cells [57].

First studies to investigate global aspects of translational regulation in mammalian cells focused on IRES-dependent translation in poliovirus-infected cells [58], and the reaction of mitogenically activated fibroblasts [59], providing early proof-of-principle for the methodology introduced above that involves polysomal fractionation followed by DNA microarray analysis of RNA contents (Fig. 2A). A further landmark study by Holland and colleagues [60] analyzed polysomal profiles of murine cell lines after blocking oncogenic Ras and Akt signaling. Apparently, these pathways regulate the recruitment of specific mRNAs to ribosomes to a far greater extent than de novo synthesis of mRNAs by transcription and thus, Ras and Akt signaling pathways seem to have a more pronounced effect on translational versus transcriptional regulation. The authors postulated that the immediate and direct inductive oncogenic effect of these signaling pathways could be largely achieved through translational activation. The differences seen in RNA abundances during chronic signaling alterations may be secondary to translational effects caused by mRNAs encoding transcription factors [60]. Similar studies on different cancer types may lead to the identification of potential markers and possibly reveal novel drug targets [37, 61]. Moreover, a recent study identified specific subsets of mRNAs regulated by eIF4E overexpression, which is known to lead to tumor transformation. The authors postulated that down-regulation of eIF4E and its downstream targets may represent a potential therapeutic option for the development of novel anti-cancer drug [62].

As seen for Ras/Akt activation, it is intriguing that changes at translation can even outperform changes at the steady-state mRNA level. This has also been noticed in a study analyzing radiation-induced changes in gene expression of human brain tumor cells or normal astrocytes. Ten times more genes (∼15%) were altered at the level of translation compared to the number of genes regulated at the level of transcription (∼1.5% of 7800 analyzed human genes) [63]. Only a few transcripts were commonly affected at both the transcriptional and translational levels, suggesting that the radiation-induced changes in transcription and translation are not coordinated. Those transcripts that were affected at translation fell into functional groups such as cell cycle, DNA replication and anti-apoptotic functions. This indicates that DNA damage affects post-transcriptional gene regulation of previously synthesized mRNAs, possibly enabling cells to repair DNA instead of being transcriptionally active [63]. Functional relations among messages were also recognized in a recent study performing translational profiling of mouse pancreatic β-cells in response to an acute increase in glucose concentration [64]. More than 300 transcripts (2% of the analyzed genes) changed their association with polysomes more than 1.5-fold; most of them encoding proteins acting in metabolism or transcription. Notably, this set of messages is related to the group of genes translationally altered during glucose starvation in the yeast S. cerevisiae [53]. Therefore, in mammals and yeast, it appears that mRNAs for functionally related messages may be coordinately regulated at the translational level. It is possible that a comparative analysis in different species may allow evolutionarily conserved translational regulatory programs to be deciphered, which are at the moment still rather speculative.

Whereas concomitant changes in RNA abundance and translational rates were rarely detected during the radiation response [63], a recent study identified mRNAs that remain associated with polysomes during hypoxia (a condition that tumors prevent through the induction of angiogenesis) in transformed prostate cancer cells and found both homodirectionally regulated or potentiated mRNAs and distinctively regulated messages [65]. After prolonged exposure of PC-3 cells to low oxygen levels, global translation was reduced by half; however, 104 mRNAs, representing about 0.5% of all analyzed features, became more strongly associated with hypoxic polysomes than with normoxic ones. Among these, 71 mRNAs were similarly increased in hypoxic polysomes and in total RNA, representing homodirectional changes; 33 mRNAs were translationally enriched, some of them ‘potentiated’ (11 of those coding for ribosomal proteins) [65].
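The distinction between homodirectional, potentiated and purely translational changes can be made concrete with a small sketch that compares the fold change of a message in total RNA with its fold change in polysomal RNA. The 1.5-fold cutoff, the decision rules and the category labels are assumptions chosen for illustration; the cited studies each applied their own criteria.

```python
import math

def classify_change(total_fold, polysome_fold, cutoff_fold=1.5):
    """Classify one mRNA from its treated/control fold changes.

    total_fold: fold change in total (steady-state) RNA.
    polysome_fold: fold change in polysome-associated RNA.
    cutoff_fold: hypothetical threshold for calling a change.
    """
    t = math.log2(total_fold)
    p = math.log2(polysome_fold)
    cut = math.log2(cutoff_fold)

    if abs(p) < cut:
        return "unchanged in polysomes"
    if abs(p - t) < cut:
        # Polysomal change simply tracks the change in total RNA.
        return "homodirectional"
    same_direction = abs(t) >= cut and (t > 0) == (p > 0)
    if same_direction and abs(p) > abs(t):
        # Translational change magnifies the change in abundance.
        return "potentiated"
    return "translationally regulated"

examples = {
    "tracks abundance": (2.0, 2.2),
    "magnified": (2.0, 8.0),
    "polysome only": (1.0, 3.0),
    "no change": (1.0, 1.1),
}
labels = {name: classify_change(*folds) for name, folds in examples.items()}
print(labels)
```

A message whose total RNA level rises twofold while its polysomal level rises eightfold is called ‘potentiated’ here, whereas one that rises only in polysomes is called ‘translationally regulated’.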

In summary, the common principles of translational regulation that emerge from genome-scale studies in diverse eukaryotes suggest a complex but coordinate system of regulation. It must be triggered by a variety of factors that go well beyond the described pathways that influence global translation. In future, there will certainly be an increasing number of studies to decipher translational regulatory programs in cancer, neurogenesis and development. Intriguingly, despite the impact of translational regulation during development, only one recent study systematically investigated translational programs during early embryogenesis in the fruit fly Drosophila melanogaster [49]. The mapping of translational programs in diverse species will likely reveal key regulatory networks and how these are affected in disease.

Regulation of mRNA decay

Steady-state mRNA levels are a result of both RNA synthesis and degradation that are dynamically controlled and can vary up to 100-fold during the cell cycle or cellular differentiation. In eubacteria like Escherichia coli, mRNAs are generally degraded by endonucleolytic cleavage, followed by 3′-to-5′ exonucleolytic RNA decay through the so-called RNA degradosome consisting of ribonuclease E (RNase E), 3′-exoribonuclease polynucleotide phosphorylase (PNPase), RNA helicase (RhlB) and enolase [66]. In eukaryotes, most cytoplasmic mRNA degradation begins with shortening of the poly(A) tail by deadenylases followed by removal of the 5′ cap structure by the decapping enzymes, Dcp1 and Dcp2 [7]. The decapped intermediates are then degraded either by an exonuclease (Xrn1p) in the 5′ to 3′ direction, or by the cytoplasmic exosome in the 3′ to 5′ direction. In addition, eukaryotes possess specialized pathways that target mRNAs that contain premature termination codons (nonsense-mediated decay pathway, NMD), that lack translational termination codons (non-stop decay pathway, NSD) or that bear stalled ribosomes (no-go decay). Degradation of specific mRNAs can also be initiated by endonucleolytic cleavage through sequence-specific endonucleases, or in response to miRNAs or siRNAs [7]. Numerous cis-acting elements located in the 5′-UTR, the coding sequence (CDS) or in the 3′-UTR of mRNAs can function as binding sites for RNA-binding proteins that regulate decay [67]. For instance, AU-rich elements (AREs), conserved sequences found in the 3′-UTR of nearly 5% of all human genes, interact with specific ARE-binding proteins that either stabilize the RNA or promote mRNA degradation by recruiting the RNA decay machinery.

Genome-wide measurements of mRNA decay. Global analysis of mRNA decay rates following arrest of transcription has been performed in all three domains of life: bacteria [68–70], archaea [71] and eukaryotes including yeast [52, 72], plants [73], and human cells [74].

In eubacteria and archaea, mRNA decay proceeds rapidly, with a median half-life of ∼5 min. Two main characteristics seem to be evolutionarily conserved: adjusted decay rates for functionally related groups of messages, and the inverse correlation between the half-lives and the relative abundances of transcripts. As seen in the archaeon Sulfolobus, transcripts encoding proteins involved in growth-related processes, such as transcription, tRNA synthesis, translation and energy production, generally decay rapidly (t1/2≤4 min), whereas those encoding products necessary for maintaining cellular homeostasis are relatively stable (t1/2>9 min). Short half-lives of highly abundant mRNAs imply high turnover rates and thus enable cells to rapidly reprogram gene expression upon changes in environmental conditions [68, 71]. Interestingly, the half-life and abundance of distinct classes of transcripts appear to depend on particular RNA degradosome components. This finding suggests the existence of structural features or biochemical factors that distinguish different classes of mRNA targeted for degradation [75]. This may also apply to specific growth phases, as seen in Streptococcus where certain mRNAs become sensitive to stationary-phase-induced PNPase [76].

Evidence for the existence of coordinated RNA decay regulons in eukaryotes was obtained from global investigation of mRNA decay profiles in yeast and human cells. Here, transcription was shut off using cells that bear a temperature-sensitive allele of RNA polymerase II or through chemical inactivation, and the decay of thousands of transcripts was monitored with DNA microarrays over a time course [52, 72, 74]. Strikingly, mRNA half-lives among components of macromolecular complexes in yeast were significantly correlated [52]. For instance, the four histone mRNAs were among the least stable transcripts, with closely matched, rapid decay rates (t1/2=7±2 min); the 131 mRNAs coding for ribosomal proteins had average decay rates (t1/2=22±6 min), and the four components of the trehalose phosphate synthetase complex were amongst the longest lived messages (t1/2=105±15 min) [52]. The examination of decay rates in human cells revealed similar mRNA-turnover patterns among orthologous genes, indicating the presence of evolutionarily conserved programs of RNA stability control [74]. Transcripts encoding metabolic proteins have a tendency for longer half-lives, whereas transcripts encoding transcription factors or ribosome biogenesis factors are relatively short lived [52, 72]. Interestingly, it appears to be a universal feature that average transcript half-lives are roughly proportional to the length of the cell cycle: cell-cycle lengths of 20, 90, and 3000 min correspond to median mRNA half-lives of 5, 21 and 600 min for E. coli, S. cerevisiae and human cells, respectively [74].
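
Half-lives such as those quoted above are derived from the loss of signal over a time course after transcriptional shut-off. The following is a minimal sketch of one way to do this, assuming simple first-order decay and a log-linear least-squares fit; the fitting procedures actually used in the cited studies may differ. The function name and the sample time course are hypothetical and serve only as illustration.

```python
import math


def estimate_half_life(times_min, abundances):
    """Estimate an mRNA half-life (in min), assuming first-order decay.

    Fits ln(abundance) = ln(A0) - k * t by ordinary least squares and
    returns t1/2 = ln(2) / k. Abundances must be positive.
    """
    if len(times_min) != len(abundances) or len(times_min) < 2:
        raise ValueError("need at least two paired measurements")
    logs = [math.log(a) for a in abundances]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    k = -sxy / sxx  # decay constant is the negative slope of the fit
    if k <= 0:
        raise ValueError("abundance does not decrease; no finite half-life")
    return math.log(2) / k


# Hypothetical, noise-free time course generated with a 22 min half-life
times = [0, 10, 20, 40, 60]
signal = [100 * 0.5 ** (t / 22) for t in times]
half_life = estimate_half_life(times, signal)
```

With real microarray data the fit would be noisier, and transcripts whose signal does not fall over the time course cannot be assigned a finite half-life by this approach.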

DNA microarrays have also been applied to investigate specialized decay pathways, such as NMD and nuclear exosome-mediated decay. Mutants for NMD factors Upf1, Nmd2 and Upf3 alter the mRNA levels of an overlapping set of ∼600 messages (10% of the transcriptome) in yeast [77, 78]. However, altered mRNA levels in nmd strains may also result from indirect effects, because transcription factors are themselves targeted by NMD. Guan et al. [78] therefore distinguished direct from indirect targets of NMD by profiling global RNA decay rates in nmd strains. About half (300 transcripts) are likely to be direct NMD targets decayed through 5′ to 3′ degradation by Xrn1p. NMD-sensitive transcripts tend to be both non-abundant and short-lived, with one third of them coding for proteins that are connected to two central themes: first, replication and maintenance of telomeres, chromatin-mediated silencing and post-replication events related to the transmission of chromosomes during the cell division cycle; and, second, synthesis and breakdown of plasma membrane components, including transport of macromolecules and nutrients, and cell wall proteins [78]. Genome-wide analyses have also identified potential RNA substrates for the nuclear exosome [79]. More than 300 mRNAs showed altered expression levels in different exosome mutants. Several genes, located downstream of independently transcribed snoRNA genes, were overexpressed in exosome mutants. Further analyses suggested that transcription of many snoRNA and snRNA genes is inefficiently terminated. The resulting read-through transcripts into downstream ORFs are normally rapidly degraded by the exosome, which could explain their enrichment in exosome mutants.

A couple of studies investigated the implications of specific RBPs on RNA turnover. Global mRNA turnover in mutant cells was monitored through gene expression analysis, with the expectation that subsets of messages would be affected. Grigull et al. [72] examined the effects of deletions of genes encoding deadenylase components Ccr4p and Pan2p and putative RNA-binding proteins Pub1p and Puf4p after inhibition of transcription by chemicals and/or heat stress. This examination showed that Ccr4p, the major yeast mRNA deadenylase, contributes to the degradation of transcripts encoding both ribosomal proteins and rRNA synthesis and ribosome assembly factors, largely mediating the transcriptional response to heat stress. Pan2p and Puf4p also participate in degradation of these mRNAs, while Pub1p preferentially stabilized transcripts encoding ribosomal proteins. Notably, the Puf4-affected genes correlate with biochemically identified targets of Puf4p [80]. A second study focused on Pub1p, a yeast RNA-binding protein thought to stabilize mRNAs through binding to AU-rich sequences in 3′-UTRs [81]. Global decay profiles in pub1 mutants revealed a significant destabilization of transcripts encoding proteins involved in ribosomal biogenesis and cellular metabolism, whereas transcripts of genes involved in transporter activity were associated with the protein but displayed no measurable changes in stability [81]. Therefore, in this case, the direct targets were only partially related to the functional outcome under specific physiological conditions. This could be mediated through additional RNA-protein interactions forming a network through combinatorial binding. Finally, Foat et al. [82] combined a computational and experimental approach to identify transcripts that are destabilized under specific environmental conditions (sugar sources) by yeast mRNA stability regulators. For Puf3p, which was known to primarily associate with mRNAs coding for mitochondrial proteins [80], they computationally inferred and experimentally verified target destabilization in the presence of glucose, as some of these mRNAs were up-regulated in puf3 mutants grown in a non-repressing carbon source, but down-regulated in a repressing carbon source [82].

Mammalian cells have evolved a variety of specific mRNA decay programs that play important roles in medically relevant processes such as inflammation, hypoxia and cancer pathogenesis. For example, the expression of diverse cytokines is differentially regulated after T cell activation, and glucocorticoids inhibit inflammation through destabilization of proinflammatory transcripts like cyclooxygenase-2 [67]. Global mRNA decay profiles revealed mRNAs that appear to be specifically regulated by these programs. For instance, in resting T lymphocytes, the majority of transcripts are stable with half-lives of more than 6 h, but a small proportion (∼3%) of expressed transcripts exhibits rapid decay with half-lives of less than 45 min [83]. These short-lived transcripts encode a variety of regulatory proteins such as cell surface receptors, transcription factors and regulators of cell growth and apoptosis. Su et al. [84] focused on the massive degradation of transcripts occurring during meiotic arrest at the germinal-vesicle (GV) stage, and found that degradation is apparently not promiscuous but preferentially affects specific groups of messages. In particular, transcripts involved in processes associated with meiotic arrest at the GV stage and the progression of oocyte maturation, such as oxidative phosphorylation, energy production, and protein synthesis, were rapidly degraded, whereas those encoding participants in signaling pathways maintaining the oocyte in the MII-arrested state were among the most stable. In conclusion, these studies exemplify that stimulus-dependent transcript destabilization is an important mechanism for controlling gene expression in a coordinated manner.

Many activation-induced transcripts contain AREs in the 3′-UTR [85]. The presence of these motifs in mRNAs often correlates with shifts in the distribution of decay rates; however, their presence alone cannot reliably predict turnover behavior. ARE-binding proteins may therefore differentially determine the fate of mRNA depending on the cellular and environmental context [85]. Tristetraprolin (TTP), a well-known ARE-binding protein, has several characterized physiological target mRNAs including tumor necrosis factor (TNF)-α, granulocyte-macrophage colony-stimulating factor, and interleukin-2β. Microarray analysis of RNA obtained from wild-type and TTP-deficient fibroblast cell lines identified 250 transcripts with altered decay rates, some of them containing conserved TTP binding sites [86]. The RNA-binding protein T-cell intracellular antigen 1 (TIA-1) functions as a post-transcriptional regulator of gene expression and aggregates to form stress granules following cellular damage. TIA-1 regulates mRNAs for proteins involved in inflammatory responses such as TNF-α and cyclooxygenase 2. Immunoprecipitation (IP) of TIA-1-RNA complexes, followed by microarray-based identification and computational analysis of bound transcripts, revealed at least 300 potential targets, many of them bearing a U-rich motif [87].

In conclusion, global analysis of mRNA turnover underlines the importance of RNA decay in the control of mRNA levels and strongly suggests the presence of specific RNA turnover programs. mRNA decay certainly involves combinatorial interactions of RBPs, enabling stimulus-dependent decay programs through the integration of diverse signals. Besides temporal control, RNA decay may also be spatially restricted, as seen with Drosophila IRE1, a protein activated during the unfolded protein response in the endoplasmic reticulum that directs the decay of a specific subset of mRNAs, many of which encode plasma-membrane proteins [88]. Moreover, the role of P-bodies and stress granules as storage places for untranslated mRNA and as sites of mRNA degradation is still rather unexplored. Perhaps different subtypes of P-bodies exist for subgroups of RNAs? At least the recent observation that ARE-containing mRNAs are localized to specific cytoplasmic granular structures, which contain exosome subunits and are distinct from P-bodies or stress granules, supports the idea of specialized structures for storage or degradation of distinct groups of mRNAs [89].

Identification of specific RNA-protein interactions

Putative RNA-binding proteins comprise 3–11% of the proteomes in bacteria, archaea and eukaryotes [90]. The large number of RBPs in all domains of life may merely reflect the ancient origin of RNA regulation, which is possibly the most evolutionarily conserved part of cell physiology. RBPs often contain distinct RNA-binding domains that specifically interact with sequences or structural elements in the RNA. Approximately one hundred protein domains associated with RNA metabolism have been described to date, half of them believed to have originated at early stages in evolution, whereas others, such as the RNA recognition motif (RRM), are exclusively present in eukaryotes and therefore may have been acquired later in evolution [90].

A successful approach to globally identify the in vivo RNA targets of RBPs involves immunoprecipitation or affinity purification of epitope-tagged proteins followed by the analysis of associated RNAs with DNA microarrays or by sequencing (Fig. 2B). In a pioneering study, Keene and colleagues [91] used this ‘ribonomics’ approach to study RNAs associated with three RBPs in a cancer cell line. Although low-density arrays were used to identify the bound mRNAs, each RBP was associated with a distinct subset of the mRNAs present in total cell lysate. Moreover, these subsets appeared to change after cells were induced to differentiate. These results led to the proposal that groups of mRNAs encoding functionally related proteins are organized as so-called ‘post-transcriptional operons’ [92]. In analogy to prokaryotic operons, this model predicts that specific RBPs may coordinate groups of mRNAs coding for functionally related proteins in eukaryotes. Cis-acting elements in the mRNA may provide the means to mimic the coordinated regulatory advantages of clustering genes into polycistronic operons [16, 92].

A prime example for the coordination of functionally related transcripts by specific RBPs is represented by the Pumilio-Fem-3 binding (PUF) proteins [80, 93]. PUF proteins comprise a conserved family of structurally related RBPs that negatively regulate gene expression of specific mRNAs [94]. Applying DNA microarrays to identify their RNA targets revealed that each of the five yeast PUF proteins associated with distinct groups of 40 to 220 different mRNAs with striking common themes in the functions and subcellular localization of the proteins they encode: Puf3p binds nearly exclusively to cytoplasmic mRNAs that encode mitochondrial proteins; Puf1p and Puf2p interact preferentially with mRNAs encoding membrane-associated proteins; Puf4p preferentially binds mRNAs encoding nucleolar ribosomal RNA-processing factors; and Puf5p is associated with mRNAs encoding chromatin modifiers and components of the spindle pole body [80]. The results were further corroborated by the identification of distinct sequence motifs in the 3′-untranslated regions of the mRNAs bound by Puf3, Puf4, and Puf5 proteins. A physiological relation between Puf3p and its mRNA targets has also been observed: as suggested by its association with mRNAs encoding mitochondrial proteins, puf3 mutant cells showed a slow-growth phenotype on non-fermentable carbon sources, indicative of a functional connection to mitochondrial physiology [80].
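
Statements that the targets of an RBP share a "common theme" are usually backed by an over-representation test. The sketch below shows one common choice, a one-sided hypergeometric test, under the assumption that targets are drawn without replacement from the set of all analyzed genes; it is not necessarily the statistic used in the cited studies. The function name and all numbers in the example are invented for illustration.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, category_size, target_count, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    genome_size   -- number of genes analyzed
    category_size -- genes annotated to the functional category
    target_count  -- mRNAs identified as targets of the RBP
    overlap       -- targets that fall into the category
    """
    upper = min(category_size, target_count)
    total = comb(genome_size, target_count)
    # Sum the probability mass for every overlap at least as large as observed
    tail = sum(
        comb(category_size, k) * comb(genome_size - category_size, target_count - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Invented example: 6000 genes, 500 in the category, 200 targets, 150 shared
p_value = hypergeom_enrichment_p(6000, 500, 200, 150)
```

In this invented example about 17 shared genes would be expected by chance, so an overlap of 150 yields a vanishingly small p-value. When many categories are tested, a multiple-testing correction would also be needed.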

Genome-wide identification of RNAs associated with the orthologous PUF protein from Drosophila melanogaster, called PUMILIO, revealed distinct clusters of mRNAs in embryos and in ovaries of adult flies [93]. More than 1000 messages were significantly associated with the protein. Subgroups of these Pum-associated mRNAs had commonalities, such as function in the anterior-posterior patterning system, and the subunits of the vacuolar H-ATPase. Moreover, a characteristic sequence motif was present in 3′-UTRs of PUMILIO-bound mRNAs resembling the one previously identified for the yeast Puf3 protein [93]. Hence, the data obtained from the yeast and Drosophila studies provided an additional source for considering the evolution of PUF proteins and their targets. For instance, conservation of amino acid residues in the RNA-binding domain (the PUM-homology domain) between homologous PUF proteins correlated with identified core motifs in 3′-UTRs of mRNA targets. However, the proteins encoded by the mRNA targets appeared not to be particularly conserved. This discordance suggested that acquisition or loss of RBP binding motifs in UTRs of genes may provide a surprisingly fluid evolutionary mechanism to modify post-transcriptional regulatory connections [93].

Ribonomic studies have now been conducted for more than 30 specific RBPs (Table 1). The results from these studies generally support and extend the proposed ‘post-transcriptional operon’ model. Each of the analyzed RBPs has a unique RNA binding spectrum comprising 20–1000 distinct transcripts that often share functionally related themes. The spectra of targets overlap with those of other RBPs, suggesting combinatorial binding of RBPs. Occasionally, sequence or structural elements could be identified among mRNA targets using bioinformatics tools, and novel physiological consequences were discovered (e.g., [95]). The ribonomics approach has recently been applied to the argonaute (Ago) protein family to discover novel mRNAs that potentially undergo miRNA-dependent regulation [96, 97]. Although the number of detectable Ago-associated mRNAs was low (∼90 messages) compared to the thousands of genes expected to undergo miRNA-dependent regulation, the comparison of Ago-associated mRNAs in wild-type and miRNA mutants may provide a tool to decipher miRNA-specific targets [97]. Besides specific RBPs, ribonomic approaches have also been applied to ‘general’ RNA-binding proteins for the identification of messages expressed in particular tissues or cell types. Affinity-tagged poly(A) binding protein (PABP) was expressed with tissue-specific promoters to identify muscle- or ciliated sensory neuron-specific transcripts in the worm Caenorhabditis elegans [98, 99], and mRNAs in photoreceptor cells of flies [100]. The method was also used to measure gene expression of endothelial cells that were co-cultured with breast tumor cells [101]. A similar approach with tagged ribosomal proteins may become another tool to determine gene expression in specific cells [102, 103].

Table 1 Global identification of RNA targets for specific RNA-binding proteins (RBPs).

Final conclusions

The application of genomic tools to study post-transcriptional gene regulation suggests additional levels of coordination and regulation that are beyond the traditional view of ‘equally treated’ cellular mRNAs that are similarly processed, exported, and eventually translated in the cytoplasm [14–16]. The decay, localization and translation of mRNA seem to undergo coordinate control by regulatory programs, which may be embedded in a multifaceted post-transcriptional regulatory system. The properties of this system are controlled by RNA-binding proteins or non-coding RNAs (e.g., microRNAs [104]) that coordinate functionally related sets of mRNAs through binding to sequence elements in the RNA. Considering the hundreds of RBPs encoded in eukaryotic genomes, post-transcriptional control may be comparable in its richness and complexity to transcriptional regulatory systems. This provides a means to link RNA regulation to other cellular regulons such as signal transduction pathways, allowing rapid and efficient reprogramming of gene expression in response to changing physiological conditions.

Further analysis of RBPs and their target RNAs may finally lead to a map of the proposed post-transcriptional regulatory system. However, besides the architecture, it will also be important to study the plasticity and dynamics of this regulation by measuring how it reacts in response to environmental or developmental changes, and how it is perturbed in certain diseases [35, 105]. Finally, a major challenge will be to connect the different levels of gene expression systems through large-scale data integration [39].