Introduction

Generally, we consider humans and most animals to be diploid, i.e., two copies of nearly all nuclear genes and chromosomal regions are present in all of their cells except gametes. However, many plants and some animals are polyploid, i.e., their cells contain one or more additional copies of their entire genome. In humans, polyploid embryos are sometimes formed when an oocyte is fertilized by more than one sperm or by faulty meiosis that results in diploid gametes. These embryos usually die early, and few survive until birth. Mature humans are never true polyploids, but some of our tissues contain extra sets of chromosomes. For example, the liver contains cells with up to eight copies of all chromosomes (octoploidy). Variations in the number of individual chromosomes (aneuploidy) are more common than polyploidy in humans. Aneuploidies arise when chromosomes fail to segregate properly during cell division (non-disjunction), and severe syndromes result when embryos are formed from aneuploid gametes. Classic examples include Down's syndrome (caused by trisomy of chromosome 21) as well as Turner's and Klinefelter's syndromes (caused by aberrant numbers of X chromosomes).

Following the recent revolution in whole-genome analysis techniques (e.g., microarrays and next-generation sequencing), it has become clear that the copy number of parts of chromosomes (Fig. 1) varies quite extensively in human populations. In fact, if two individuals are randomly picked from any natural population of any species, they are likely to differ in the copy number of dozens to hundreds of functional genes (Schrider and Hahn 2010). Owing to the resolution limits of microarray techniques, most studies have focused on copy number variations (CNVs; insertions and deletions) of sequences ranging in size from ca. 1 kb to several Mb. Forthcoming population studies using sequencing approaches will likely give a more complete picture of the extent of CNV in nature. However, based on currently available evidence, it has been estimated that about 13% of the human genome varies in copy number, and the per-locus mutation rate seems to be higher for CNVs than for SNPs (Stankiewicz and Lupski 2010).

Fig. 1

Schematic illustration of CNV (copy number variation). a Chromosomal CNV; chromosomes 1, 2, and 3 are in diploid, trisomic, and monosomic states, respectively. b Segmental CNV; chromosome 1 represents the wild-type condition, while chromosomes 2 and 3 illustrate segmental aneuploidies. Chromosome 2 shows a deletion leading to a segmental haploid condition, and chromosome 3 shows a duplication leading to a segmental triploid condition. Note that the duplication (shaded box) may be located on any chromosome.

It is clear that eukaryotic (and hence human) evolution has been significantly shaped by CNVs, so CNVs need not have strong negative effects on fitness. On the other hand, changes in the number of gene copies should generally be proportionally reflected in RNA levels. Since higher organisms invest substantial effort in maintaining appropriate expression levels to balance their complex physiological processes, changes in copy number should in theory be selected against. However, given the importance of CNVs for evolution, we would also expect systems to have evolved that compensate for the differences in gene dose accompanying CNVs, and these conflicting requirements must be balanced.

Relationship between gene dose and transcription responses

Until recently, we had very little insight into how changes in the number of chromosomes or parts of chromosomes affect gene expression levels, largely because accurate genome-wide measurements of the expressional effects of CNVs are difficult to obtain. Elucidating these effects therefore requires good model systems. Fortunately, we now have techniques to measure global gene expression accurately, and Drosophila melanogaster is a good, well-annotated model system. This review therefore focuses mainly on recent discoveries in Drosophila.

In D. melanogaster, a vast number of well-defined deletions and duplications are available, and new ones can easily be induced in any region of interest. These chromosomal rearrangements can then be studied in isolation or in combination with other rearrangements. Bridges (1935) divided the D. melanogaster genome into 102 regions based on chromosomal banding patterns. These regions have a median size of just over 1 Mb, and deletions spanning more than one such region (about 1% of the genome) are usually lethal (Lindsley et al. 1972), whereas deleting or mutating one copy of a single gene rarely has any detectable phenotypic effect. The reduced viability or lethality seen when many genes are present in only one copy may be due to several factors, including the additive effects of many slightly dosage-sensitive genes, the breakdown of gene networks, or the uncovering of recessive lethal alleles. In general, organisms seem to tolerate extra copies of genomic regions much better than a reduced number of copies (Lindsley et al. 1972).
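The first of these explanations, the additive effects of many slightly dosage-sensitive genes, can be illustrated with a toy calculation. The sketch below is not drawn from the cited studies: it assumes, hypothetically, that every gene present in one copy lowers viability by the same small amount and that these costs simply add up. The function name and the per-gene cost are illustrative assumptions.

```python
def relative_viability(n_hemizygous_genes: int, cost_per_gene: float) -> float:
    """Toy additive model of segmental haploidy.

    Each gene present in only one copy lowers viability by the same fixed,
    hypothetical cost; the costs add up, and viability cannot fall below zero.
    """
    return max(0.0, 1.0 - n_hemizygous_genes * cost_per_gene)


# Hypothetical cost: losing one copy of a single gene lowers viability by 0.5%,
# an effect far too small to be scored as a phenotype on its own.
COST_PER_GENE = 0.005

for n in (1, 10, 50, 100, 200):
    viability = relative_viability(n, COST_PER_GENE)
    print(f"{n:>3} genes in one copy -> relative viability {viability:.3f}")
```

Under these assumptions, losing one copy of a single gene is essentially invisible, while a deletion uncovering a couple of hundred genes is lethal. Real per-gene costs certainly vary, and the model ignores network breakdown and recessive lethal alleles.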

Few studies have examined the effects of copy number variation on transcription levels in any detail, although some have addressed human aneuploidies (see, for instance, FitzPatrick et al. 2002). Such studies generally examine changes in the expression of all genes with altered copy numbers. However, genes that are inactive under the test conditions, and genes whose expression is below the detection limit, will appear to be unaffected by copy number, which complicates attempts to assess the variations and their biological significance. For instance, Gupta et al. (2006) showed that a threefold change in the copy number of a genomic region in flies resulted in only a ~1.4-fold change in average mRNA levels, but non-expressed genes were included in this analysis, so the results are difficult to interpret. Later, Stenberg et al. (2009) showed that both single-copy and three-copy genomic regions are expressed closer to wild-type levels than would be expected if mRNA levels scaled perfectly with gene dose. The effect was detectable even after inactive genes were taken into account, and haploid regions seem to be more strongly buffered (expressed at 64% of wild-type, diploid levels) than regions present in three copies (expressed at 146% of wild-type levels). Interestingly, genes that are not ubiquitously expressed in all tissues are more strongly buffered (~70% of wild-type levels) when present in one copy, whereas ubiquitously expressed genes are more strongly buffered (~138% of wild-type levels) when present in three copies. Similar effects were recently observed by McAnally and Yampolsky (2010). In addition, RNA-Seq and DNA-Seq analyses have shown that >43 Mbp of the genome of the Drosophila S2 cell line is aneuploid (Zhang et al. 2010); Zhang et al. found copy numbers ranging from one to five and detected a general buffering of transcript levels. These findings imply that aneuploid regions are buffered at the RNA level. The observations by Stenberg et al. (2009) implied the involvement of a general buffering system acting on aneuploid regions rather than feedback regulation of individual genes on a gene-by-gene basis. Two possible explanations of these findings are that activators or repressors recruited by feedback-regulated genes could spread locally to neighboring genes, thus causing a general buffering effect for aneuploid regions, or that currently unknown buffering mechanisms recognize and target aneuploid regions of the genome.
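The degree of buffering can be summarized by comparing the observed expression ratio with the ratio expected if mRNA levels scaled perfectly with gene dose (copy number divided by two for a normally diploid region). The sketch below works in log2 space and reports the fraction of the expected dose effect that is cancelled; this "compensated fraction" is a summary chosen for illustration, not necessarily the metric used by the cited authors, and the function names are hypothetical. A second helper shows, with hypothetical numbers, how including undetected genes (apparent ratio 1.0) dilutes the average response, which is the interpretive problem noted above.

```python
import math


def compensated_fraction(copies: int, observed_ratio: float, normal_copies: int = 2) -> float:
    """Fraction of the expected log2 dose effect that is cancelled.

    0.0 means expression tracks gene dose exactly; 1.0 means full compensation.
    `copies` must differ from `normal_copies`, otherwise there is no dose effect.
    """
    expected_log2 = math.log2(copies / normal_copies)
    observed_log2 = math.log2(observed_ratio)
    return 1.0 - observed_log2 / expected_log2


def diluted_mean_ratio(true_ratio: float, fraction_inactive: float) -> float:
    """Arithmetic mean ratio when undetected genes (apparent ratio 1.0) are included."""
    return fraction_inactive * 1.0 + (1.0 - fraction_inactive) * true_ratio


# Expression relative to wild type as reported by Stenberg et al. (2009).
for copies, observed in ((1, 0.64), (3, 1.46)):
    label = "copy" if copies == 1 else "copies"
    print(
        f"{copies} {label}: dose-proportional {copies / 2:.2f}, observed {observed:.2f}, "
        f"compensated fraction {compensated_fraction(copies, observed):.2f}"
    )

# Hypothetical: if half of the genes in a three-copy region were silent, a region
# whose active genes respond fully to dose (1.5-fold) would average only 1.25-fold.
print(f"apparent mean ratio: {diluted_mean_ratio(1.5, 0.5):.2f}")
```

With the reported values, roughly a third of the expected dose effect is cancelled in one-copy regions but less than a tenth in three-copy regions, which is the asymmetry described above.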

The findings summarized above show that buffering of aneuploid regions does not completely restore wild-type expression levels. This could be due to the evolutionarily conflicting demands on such systems. On one hand, chromosomal rearrangements are clearly an important driving force of evolution, so cells and organisms need to be able to cope with them. On the other hand, a perfect buffering system could destabilize genomic integrity, since selection against aneuploidy would be weakened. Moreover, the organization of genes along chromosomes is far from random: expressional clustering is common (for a review, see Hurst et al. 2004), and there is also evidence of functional clustering (see, for instance, Petkov et al. 2005), so a segmental aneuploidy is likely to alter the doses of co-regulated or functionally related genes together. Many mechanisms, acting at several levels, could be involved in the buffering of aneuploid regions. Buffering clearly occurs at the RNA level, and a recent yeast selection study showed that mutations enhancing ubiquitin-proteasomal degradation reduce the deleterious effects of extra copies of entire chromosomes, suggesting that buffering also acts at the level of protein turnover (Torres et al. 2010). Thus, a number of mechanisms working at different levels can probably reduce the deleterious effects of chromosomal rearrangements, and we are just beginning to elucidate these mechanisms and their effects.

There are also two well-studied examples of buffering of entire chromosomes: the dosage compensation of the single male X chromosome in Drosophila and the compensation of the heterochromatic fourth chromosome (Fig. 2). Both are reviewed below. We use the term compensation (rather than buffering) for these systems because they have evolved to solve specific dosage and expression problems associated with distinct parts of the genome, whereas the buffering system reviewed above is general. Nevertheless, buffering and the two compensation systems may share many mechanisms.

Fig. 2

Immunostaining of polytene chromosomes in Drosophila melanogaster reveals components of the two chromosome-wide targeting and gene regulatory systems. The MSL-C, which targets the entire male X chromosome, is visualized with an antibody recognizing MSL3 (green). The fourth chromosome is targeted by POF (red), and DNA is counterstained with DAPI (blue).

Dosage compensation

In many species, sex determination is associated with heteromorphic sex chromosomes, for example, the X and Y chromosome pair in mammals, which arose from a pair of autosomes. A Y chromosome can evolve when a gene on one chromosome acquires the ability to determine male sex. This newly formed chromosome, the proto-Y, is then transmitted only through one sex. Mutations that benefit males are predicted to accumulate on the proto-Y chromosome, which will eventually favor suppression of recombination, leading to further accumulation of mutations and mobile elements and, ultimately, to gene loss and degeneration (Charlesworth 1996). Heteromorphic sex chromosomes have formed independently several times in evolution, and their formation creates an expression problem that must be solved by the co-evolution of mechanisms that compensate for this imbalance. Since most genes on the X chromosome should be expressed at the same levels in males and females, the gradual degeneration of the Y chromosome will select for systems that compensate for the different doses of X-linked genes, as reviewed by Larsson and Meller (2006), Mank (2009), and Vicoso and Bachtrog (2009). Such a dosage-compensating system should restore the balance between expression from the X chromosome and the autosomes and also compensate for the difference in X chromosome dose between males and females. Here, we focus on Drosophila, in which chromosome evolution and dosage compensation have been studied in detail.

The gradual degeneration of the Y chromosome is compatible with an emerging need for dosage compensation at a gene-by-gene level. However, it is not known whether the Y chromosome degenerated through a gradual accumulation of slightly deleterious mutations scattered along the chromosome or largely through strongly deleterious mutations in individual genes. These possibilities have different implications for the selective forces promoting dosage compensation: the former should theoretically favor compensation mechanisms acting on blocks of genes, while the latter should favor mechanisms acting on individual genes (Vicoso and Bachtrog 2009). The required up-regulation of genes in the heterogametic sex, together with the tolerance of overexpression in females, will determine which levels of expression regulation are most effective targets for evolutionary change. Dosage compensation mechanisms could potentially evolve at diverse levels of expression regulation, but in principle we would expect them to be as sex-specific as possible and to act only on active genes. An ideal compensatory system should not cause general de-repression of all X-linked genes. Below, we outline some of the levels that could be targets for the evolution of dosage compensation mechanisms and discuss the evidence for each.

A progressive degeneration of the proto-Y chromosome will inevitably create evolutionary pressure at all levels of expression to compensate for the loss of functional gene copies, and intuitively this compensation should initially develop, as the aneuploidy emerges, largely on a gene-by-gene basis. An early view in line with this hypothesis was that cis-acting elements close to individual genes are important for this response (Baker et al. 1994). For individual genes, e.g., white, such cis-acting elements have indeed been reported (Qian and Pirrotta 1995). However, genome-wide identification of the dosage-compensated genes on the X chromosome, combined with extensive sequence analysis, has not yielded unambiguous evidence for gene-specific cis-acting elements on this chromosome (Alekseyenko et al. 2006; Dahlsveen et al. 2006; Gilfillan et al. 2006; Alekseyenko et al. 2008). Although no such elements have been identified as a general feature of dosage-compensated genes, significant overrepresentation of a sequence motif targeted by the DREF (DNA replication-related element factor) protein has been reported in the 5′ regions of X chromosomal genes targeted by the male-specific lethal complex (MSL-C) (Legube et al. 2006). The MSL-C is a ribonucleoprotein complex that targets X chromosome genes in males, and its targeting correlates with, and can explain most of, the observed dosage-compensating effects on the male X chromosome, as reviewed by Straub and Becker (2007), Gelbart and Kuroda (2009), and Hallacli and Akhtar (2009). The complex consists of five proteins, each required for male viability (MSL1, MSL2, MSL3, MLE, and MOF), together with two functionally redundant non-coding RNAs, roX1 and roX2. Owing to the sex-restricted expression of MSL2, the MSL-C forms only in males, where it binds within the gene bodies of active X chromosome genes and its acetyltransferase subunit MOF acetylates H4K16.
This acetylation becomes strongly enriched on X chromosomal genes in males, and the prevailing models assume that H4K16ac enrichment leads to decompaction of the chromatin fiber, which in turn increases transcription. Recent data suggest that the MSL-C also has intrinsic properties that constrain the activating potential of MOF, resulting in the required twofold activation of target genes (Prestel et al. 2010b). It should be noted, however, that in msl2 mutants, or after RNAi-mediated knock-down of msl2 or mof, male X chromosome expression decreases only to ~75% of wild-type levels, suggesting that the MSL-C does not mediate the entire twofold activation (Hamada et al. 2005; Deng et al. 2009; Zhang et al. 2010).

The prevailing model of targeting is that the MSL-C is recruited in a sequence-dependent manner to 150–300 high-affinity MSL recognition elements (MREs) (Alekseyenko et al. 2008). This is followed by spreading to neighboring genes, which depends on the local MSL complex concentration (Dahlsveen et al. 2006), X chromosome location (Gorchakov et al. 2009), and active transcription (Sass et al. 2003; Larschan et al. 2007). The mechanisms of the MSL-C have been reviewed in detail (Gelbart and Kuroda 2009; Hallacli and Akhtar 2009; Prestel et al. 2010a) and will not be discussed further here. However, an important question in this context is the level of expression regulation at which the MSL-C acts. Lucchesi (1998) proposed that the MSL-C elevates expression output by facilitating transcription elongation. Genome-wide mapping has shown that the MSL-C and MSL-C-dependent H4K16 acetylation are enriched within gene bodies, with a bias toward their 3′ ends (Alekseyenko et al. 2006; Gilfillan et al. 2006; Alekseyenko et al. 2008; Kind et al. 2008; Prestel et al. 2010b). These observations support the view that the MSL-C acts mainly at the level of transcription elongation, a view that has recently also gained experimental support (Larschan et al. 2011). In principle, however, the MSL-C may well act at more than one level.
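The two-step targeting model, recruitment to high-affinity entry sites followed by spreading, can be summarized as a small algorithm. The sketch below is a deliberately simplified toy, not a published model: genes are ordered positions along one X chromosome, and the `Gene` type, its `is_mre` flag, the `msl_targets` function, and the integer `reach` (a stand-in for local MSL-C concentration) are all hypothetical. Spreading is restricted to active genes to mirror the transcription dependence described above.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    is_mre: bool  # hypothetical flag: gene carries a high-affinity MSL recognition element
    active: bool  # gene is transcribed in the modelled cell type


def msl_targets(genes: list[Gene], reach: int) -> set[str]:
    """Toy two-step targeting: recruitment to MREs, then spreading.

    Step 1: genes carrying an MRE are bound directly.
    Step 2: the complex spreads to *active* genes within `reach` positions of
    an entry site. Inactive genes are not bound but do not block spreading.
    """
    entry_sites = [i for i, g in enumerate(genes) if g.is_mre]
    bound = {genes[i].name for i in entry_sites}
    for i in entry_sites:
        lo, hi = max(0, i - reach), min(len(genes), i + reach + 1)
        for j in range(lo, hi):
            if genes[j].active:
                bound.add(genes[j].name)
    return bound


chromosome = [
    Gene("g1", is_mre=False, active=True),
    Gene("g2", is_mre=True, active=True),
    Gene("g3", is_mre=False, active=False),
    Gene("g4", is_mre=False, active=True),
    Gene("g5", is_mre=False, active=True),
]
print(sorted(msl_targets(chromosome, reach=1)))  # ['g1', 'g2']
print(sorted(msl_targets(chromosome, reach=2)))  # ['g1', 'g2', 'g4']
```

Raising `reach` in this toy plays the role that a higher local complex concentration is thought to play in vivo: more of the active neighbors of an entry site become targets, while inactive genes stay unbound.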

Although the MSL-C can explain much of the observed dosage compensation in Drosophila, and it most likely acts mainly on transcription elongation, we still need to consider the likelihood that evolutionary pressures have also acted at other levels of expression regulation. We may, for example, consider male X chromosome-specific processes acting on splicing, mRNA release, and post-processing, even though no specific evidence for these possibilities has been presented as yet. Furthermore, eukaryotic genomes are spatially organized, and sub-nuclear positioning may be important for accessibility to the transcription machinery and for positioning in relation to nuclear pores, the primary channels in the nuclear envelope through which, for instance, hnRNP complexes pass. Certain nucleoporins (NUPs), e.g., NUP98, have been shown to bind active genes (Capelson et al. 2010; Kalverda et al. 2010). If NUP binding is important for expression output and is sequence-dependent, this would also be a level subject to evolutionary pressure. Although current data are conflicting (see Grimaud and Becker 2009), a biochemical association between NUP proteins and the MSL-C has been suggested, and the nucleoporins Nup153 and Megator have been proposed to play roles in dosage compensation (Mendjan et al. 2006; Vaquerizas et al. 2010).

In principle, dosage compensation could also evolve at the level of mRNA stability and decay. Although this possibility has not yet been investigated in Drosophila, it has been suggested in mammals that autosomal genes are more likely than X chromosomal genes to be subject to nonsense-mediated mRNA decay (NMD), and this difference has been proposed to contribute to the increased X/A expression ratio observed in mammals (Yin et al. 2009). Finally, equalizing the expression output of the single male X chromosome with that of the two female X chromosomes may also involve mechanisms that influence the translation efficiency of their transcripts and the activities of the encoded proteins.

Clearly, most of the observed dosage compensation can be explained by the binding of the MSL-C to the gene bodies of X chromosome genes in males and the associated acetylation of H4K16, which most likely facilitates elongation. Nevertheless, as aneuploidies emerge, e.g., through proto-Y chromosome degeneration, evolutionary pressure will act at many levels. Importantly, the MSL-C targets gene bodies, and new blocks of genes that become X-linked are relatively quickly incorporated into MSL-C targeting (Marin et al. 1996). We speculate that MSL-C targeting may be a secondary adaptation to the monosomic state. In contrast, as aneuploidies first emerge through Y chromosome degeneration, adaptation to the dose changes seems likely to rely more on the regulation of individual genes. There may therefore be two major stages in the evolution of dosage compensation mechanisms. The first may be a primary adaptation, in which selection acts at all of the levels listed above and solves the problems associated with the emerging monosomic condition on a gene-by-gene basis. Different genes will have very different requirements for compensation, so the selection pressure for their compensation will vary. In the next stage, a uniform system, e.g., the MSL-C, may evolve that compensates for the changed doses of this emerging set of genes. A likely scenario is that the MSL-C originated from a general buffering system that subsequently evolved X chromosome specificity. This may also explain the observation that when, for instance, a “new” X chromosomal arm is added, the MSL-C is recruited to it only once its homolog has degenerated (Bone and Kuroda 1996; Marin et al. 1996). The involvement of more than one system in dosage compensation is also evident from the fact that the MSL-C supports only a ~1.35-fold activation rather than the full twofold activation required.
Interestingly, combining this MSL-C-dependent effect with the ~1.4-fold increase that seems to be general for all aneuploid regions yields very nearly the required twofold effect (Stenberg et al. 2009; Prestel et al. 2010a; Zhang et al. 2010). The best evidence for a link between general buffering and the evolution of chromosome-wide gene regulation comes from studies of POF and the fourth chromosome.
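The arithmetic behind this argument is simple if the two effects are assumed to combine multiplicatively (equivalently, to add on a log scale). That assumption and the constant names below are choices made for this illustration; the values are the approximate ones quoted in the text. The same numbers also connect to the knock-down data: a drop to ~75% of wild-type expression implies an MSL-C contribution of about 1/0.75 ≈ 1.33-fold, close to the ~1.35-fold figure.

```python
# Fold-changes quoted in the text. Treating them as independent, multiplicative
# effects is an assumption of this sketch, not an established mechanism.
MSL_C_DEPENDENT = 1.35    # activation attributed to the MSL-C
GENERAL_BUFFERING = 1.40  # buffering that seems general for aneuploid regions
KNOCKDOWN_LEVEL = 0.75    # male X expression after loss of msl2 or mof

combined = MSL_C_DEPENDENT * GENERAL_BUFFERING
implied_msl_c = 1.0 / KNOCKDOWN_LEVEL

print(f"combined activation: {combined:.2f}-fold (full compensation: 2.00-fold)")
print(f"MSL-C share implied by the ~75% knock-down level: {implied_msl_c:.2f}-fold")
```

The product, about 1.89-fold, falls just short of twofold, consistent with the statement that the two effects together come very close to, but do not exactly reach, full compensation.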

Autosome-specific gene regulation

For a long time, chromosome-specific targeting was recognized only for sex chromosomes, and chromosome-specific gene regulation was accordingly thought to function only in dosage compensation. However, in D. melanogaster, a second chromosome-wide targeting system was described following the discovery of the protein Painting of fourth (POF) (Larsson et al. 2001). POF specifically targets, or “paints”, the fourth chromosome of D. melanogaster in both males and females. Since the fourth chromosome is an autosome, the discovery of POF provides a starting point for extending the exploration of chromosome-wide regulatory functions to autosomes.

To compare chromosomes between Drosophila species and to consider the evolution of chromosome-wide gene regulation mechanisms, it is convenient to introduce Muller's nomenclature for chromosomes. In Muller's scheme, the chromosome arms are denoted elements A, B, C, D, E, and F (Muller 1940). By definition, the X chromosome of D. melanogaster is element A, chromosome arms 2L, 2R, 3L, and 3R are elements B, C, D, and E, respectively, and the fourth chromosome of D. melanogaster is element F. The major chromosome arms in D. melanogaster, i.e., elements A–E, are of similar length, while the fourth chromosome is much shorter and is therefore sometimes called the dot chromosome. The presence of the F element as a separate small dot chromosome is surprisingly well conserved within the genus Drosophila (Ashburner et al. 2005). In a few described species, the F element is fused to one of the major chromosome arms. In Drosophila willistoni, the F element is fused to element E (Papaceit and Juan 1998), and in Drosophila busckii and Scaptodrosophila lebanonensis, it is fused to element A and is thus part of the X chromosome in these two species (Krivshenko 1955; Papaceit and Juan 1998). This may represent the ancestral state, i.e., the F element may originally have been part of the X chromosome (Fig. 3).
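Muller's nomenclature is essentially a lookup table, which makes it easy to encode when comparing karyotypes across species. The sketch below records the D. melanogaster assignments and the F-element fusions described above; the dictionary names and the helper `f_is_x_linked` are hypothetical, and species not mentioned in the text are deliberately left out.

```python
# Muller element assignments for D. melanogaster chromosome arms (Muller 1940).
MULLER_DMEL = {
    "X": "A",
    "2L": "B",
    "2R": "C",
    "3L": "D",
    "3R": "E",
    "4": "F",
}

# F-element configurations described in the text: the value is the element the
# F element is fused to, or None when it is a free dot chromosome.
F_ELEMENT_FUSION = {
    "D. melanogaster": None,
    "D. willistoni": "E",
    "D. busckii": "A",
    "S. lebanonensis": "A",
}


def f_is_x_linked(species: str) -> bool:
    """True if the F element is part of the X chromosome (fused to element A)."""
    return F_ELEMENT_FUSION[species] == "A"


for species, partner in F_ELEMENT_FUSION.items():
    state = "free dot chromosome" if partner is None else f"fused to element {partner}"
    print(f"{species}: F element is a {state}; X-linked: {f_is_x_linked(species)}")
```

Keeping element letters rather than species-specific chromosome names is what makes cross-species comparisons, such as asking in which species the F element is X-linked, a simple lookup.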

Fig. 3

Phylogenetic relationships of studied species with simplified karyotypes of males and females. The X chromosome is indicated as a rectangle and the F element as an ellipse. MSL-C targets are indicated in green and POF targets in red (Bone and Kuroda 1996; Marin et al. 1996; Larsson et al. 2001; Larsson et al. 2004). In D. willistoni, the F element is fused to an autosome (gray rectangle). The conservation of POF binding to the F element in species such as D. melanogaster, D. pseudoobscura, and D. virilis strongly indicates that this autosome-specific regulatory system has an important function. The integration of the F element into the X chromosome of D. busckii suggests that it originated as part of the X chromosome and that POF was part of an ancient dosage compensation system. The interrupted green on the D. busckii male X chromosome indicates that this chromosome is enriched in H4K16ac, although MSL-C has not been detected on it. In D. ananassae and D. malerkotliana, both POF and MSL-C are present on the male X chromosome (red/green), whereas only POF binds the F element in both sexes.

Evolution of the fourth chromosome

In most Drosophila species examined to date, the male X chromosome is targeted by the MSL-C (Bone and Kuroda 1996; Marin et al. 1996) and the F element is targeted by POF in both males and females (Larsson et al. 2004). These are the two known chromosome-specific targeting and gene regulatory systems in Drosophila. Like the MSL-C, POF decorates an entire chromosome by binding to interbands (Fig. 2). Several lines of evidence suggest a relationship between the X chromosome and the F element. Probably the most striking is that in D. busckii, the F element is located at the base of the X and Y chromosomes, and there is no free F element (dot chromosome) in this species. Instead, the X chromosome consists of the F element and the A element separated by the nucleolus organizer (NO). In D. busckii, a euchromatic banded region in polytene chromosomes is also present on the Y chromosome and has been suggested to be an F element complement (Krivshenko 1952; Krivshenko 1955; Krivshenko 1959). Further support for a relationship between the F element and the X chromosome comes from the minor role in sex determination that has been described for the fourth chromosome. Sexual fate in Drosophila is predicted by the ratio of X chromosomes to autosomes or, more specifically, by the number of X chromosomes relative to the number of precellular nuclear divisions (Cline and Meyer 1996; Marin and Baker 1998; Erickson and Quintero 2007). In D. melanogaster, unlike increased dosage of the other autosomes, increased dosage of the fourth chromosome shifts 2X:3A intersex individuals toward female development, while decreased dosage shifts intersex individuals toward male development (Bridges 1925; Fung and Gowen 1960). Hence, in terms of sex determination, the fourth chromosome behaves more like an X chromosome than an autosome.
In flies with three copies of the fourth chromosome, an increased frequency of X chromosome non-disjunction has been observed, suggesting that the fourth chromosome has an intrinsic propensity to pair with the X in meiosis (Sandler and Novitski 1956). Finally, flies with only one copy of the fourth chromosome (haplo-4 flies) are viable and fertile. This contrasts markedly with the effects of haploidy for the other autosomes but resembles the situation for the X chromosome. However, a few cases are known in which autosomal deletions are haplosufficient even though they uncover a region comparable in size and gene number to the euchromatic part of chromosome 4. In summary, several lines of evidence support a relationship between the F element and the X chromosome, and although the most parsimonious explanation is that the F element originated from the X chromosome, this remains to be established.

POF binding in different Drosophila species

Like the binding of the MSL-C, the binding properties of POF have been studied in an evolutionary context; the findings are summarized in Fig. 3. POF was initially found to bind specifically to the fourth chromosome in D. melanogaster and was thus identified as an autosome-specific targeting protein (Larsson et al. 2001). This autosome-specific binding is conserved in evolution, as demonstrated by its F element specificity in other species, for instance Drosophila pseudoobscura and Drosophila virilis. The conservation of the autosome-specific properties of POF indicates that it confers a selective advantage and thus that this autosome-specific targeting is functional. In D. busckii, in which the F element is part of the X chromosome, but also in Drosophila ananassae and Drosophila malerkotliana, in which the F element is a separate chromosome, POF shows male-specific targeting of the X chromosome. In these three species, POF also colocalizes with H4K16ac, which indicates a relationship to X chromosome dosage compensation (Larsson et al. 2004). In D. busckii, none of the MSL proteins have so far been detected on the male X chromosome, but the enrichment of H4K16ac on this chromosome suggests that a similar mechanism is involved (Marin et al. 1996; Larsson et al. 2004). In D. ananassae, POF colocalizes perfectly with the MSL-C on the male X chromosome and decorates the F element in both males and females. Notably, the only examined species of the genus Drosophila in which POF has not been detected is D. willistoni, in which the F element is part of one of the major autosome arms because it is fused to element E. It is not known how long ago this F + E fusion occurred, but the F element is still retained as a distinct block in D. willistoni. Taken together, these findings show that the binding of POF to the autosomal F element is conserved and thus represents an autosome-specific targeting mechanism with an autosome-specific function.
Furthermore, the binding of POF to the male X, together with its colocalization with the MSL-C and/or the enrichment of H4K16ac on the male X chromosome, suggests a connection to dosage compensation. Thus, it seems likely that POF originated as part of an ancient dosage compensation mechanism.

The fourth chromosome of D. melanogaster

The fourth chromosome of D. melanogaster is unusual in many respects. It replicates late (Barigozzi et al. 1966; Zhimulev et al. 2003) and, under normal conditions, exhibits very low levels of meiotic recombination (Hochman 1976; Sandler and Szauter 1978; Wang et al. 2002; Ashburner et al. 2005; Arguello et al. 2010). The banded region seen in polytene chromosomes, corresponding to cytogenetic bands 101E–102F, appears as a mosaic of unique sequences interspersed with repetitive DNA rich in transposable elements (Miklos et al. 1988; Pimpinelli et al. 1995; Locke et al. 1999a; Locke et al. 1999b; Kaminker et al. 2002; Stenberg et al. 2005; Slawson et al. 2006; Leung et al. 2010). The fourth chromosome is also enriched in histone modifications and chromatin-associated proteins that are characteristic of heterochromatin. For example, the heterochromatin protein HP1 is enriched on the fourth chromosome, as are specific histone modifications that mark heterochromatin, e.g., methylated H3K9 (Eissenberg et al. 1992; Czermin et al. 2002; Schotta et al. 2002). In principle, the entire fourth chromosome can be regarded as “GREEN” chromatin according to the definition by van Steensel and coworkers (Filion et al. 2010). In line with these heterochromatic, and thus gene-silencing, properties, reporter genes inserted on the fourth chromosome often show partially silenced, variegated expression (Wallrath and Elgin 1995; Wallrath et al. 1996; Sun et al. 2000). The fourth chromosome consists of a 3–4-Mb proximal part (Locke and McDermid 1993) and a banded, sequenced part that is approximately 1.3 Mb long. Interestingly, despite its heterochromatic properties, the sequenced part of the chromosome includes 92 genes, so its gene density is similar to that of the major chromosome arms. Moreover, the average expression level of genes on this chromosome is comparable to, or even higher than, that of genes on the other chromosomes (Haddrill et al. 2008; Johansson, Stenberg and Larsson, unpublished). The sequence composition of the fourth chromosome is also unusual compared with the other chromosomes of D. melanogaster; in fact, its structure, with scattered repetitive elements, is reminiscent of the organization of mammalian chromosomes. Thus, the genes located on the fourth chromosome appear to be adapted to function in this hostile, repressive environment.

Chromatin environment of the fourth chromosome

Consistent with its enrichment in repeated sequences, the fourth chromosome is targeted by typical heterochromatic histone modifications, e.g., methylated H3K9. At least three H3K9-specific methyltransferases mediate the methylation of H3K9 in Drosophila: Su(var)3–9, Setdb1, and G9a. The most thoroughly studied of these is Su(var)3–9, which was initially identified in genetic screens, together with Su(var)2–5 (encoding HP1), as a suppressor of position-effect variegation (Schotta et al. 2003). Methylated H3K9 stabilizes the binding of HP1 to chromatin, and this modification is generally mediated by Su(var)3–9. Accordingly, in Su(var)3–9 mutants, levels of pericentric H3K9me and pericentric HP1 are reduced; the fourth chromosome is an exception, however, since its levels of H3K9me appear to be unaltered (Czermin et al. 2002; Schotta et al. 2002; Ebert et al. 2004). This apparent discrepancy has recently been explained by findings that Drosophila Setdb1 controls H3K9 methylation of the fourth chromosome (Seum et al. 2007; Tzeng et al. 2007). Hence, chromosome 4 resembles pericentric heterochromatin in its enrichment of methylated H3K9, but this enrichment is mediated by a different enzyme. These K9-methylated H3 tails are bound by the N-terminal chromo-domain of HP1 (Bannister et al. 2001; Lachner et al. 2001; Nakayama et al. 2001; Jacobs and Khorasanizadeh 2002). The proposed model for HP1-mediated chromatin condensation postulates that HP1 forms a dimer through its C-terminal chromo shadow domains, and that the two chromo-domains in the dimer link two adjacent nucleosomes. Spreading of this structure along the chromatin is proposed to be mediated by interaction between HP1 and the histone methyltransferase Su(var)3–9 (or Setdb1), which methylates nearby H3K9, thereby creating new HP1 binding sites and propagating both the methylation and HP1 binding.
Although HP1's interaction with chromatin is thought to occur mainly through methylated H3K9, it should be stressed that HP1 has also been shown to bind with high affinity to the histone fold of H3 in oligonucleosomal arrays (Nielsen et al. 2001; Dialynas et al. 2006; Billur et al. 2010). This interaction may account for a small but stable fraction of chromatin-associated HP1.
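
The spreading model described above can be illustrated with a deliberately simplified simulation. The sketch below is a toy, cellular-automaton-style model and not a model from the cited studies: it assumes that nucleosomes lie on a line, that HP1 binds every H3K9-methylated nucleosome, and that the HP1-recruited methyltransferase (Su(var)3–9 or Setdb1) methylates an adjacent unmethylated nucleosome with probability p in each round. The function names spread and render, the parameter p, and the barrier positions that block spreading are all hypothetical.

```python
"""Toy 1D model of HP1-mediated heterochromatin spreading (illustrative only).

Hypothetical assumptions: nucleosomes sit on a line; each is either
H3K9-methylated or not; HP1 binds every methylated nucleosome; the
HP1-recruited methyltransferase methylates an adjacent unmethylated
nucleosome with probability p per round; barrier positions cannot be
methylated and therefore stop spreading.
"""
import random


def spread(n, seeds, rounds, p=1.0, barriers=(), rng=None):
    """Return the methylation state (list of bools) after `rounds` rounds."""
    rng = rng or random.Random(0)
    methylated = [False] * n
    for s in seeds:
        methylated[s] = True
    blocked = set(barriers)
    for _ in range(rounds):
        # HP1 is assumed to be bound wherever H3K9 is methylated.
        hp1_bound = [i for i in range(n) if methylated[i]]
        new = list(methylated)
        for i in hp1_bound:
            for j in (i - 1, i + 1):  # the two neighboring nucleosomes
                if 0 <= j < n and not methylated[j] and j not in blocked:
                    if rng.random() < p:  # recruited HMTase methylates j
                        new[j] = True
        methylated = new  # synchronous update of the whole array
    return methylated


def render(state):
    """'#' marks a methylated (HP1-bound) nucleosome, '.' an unmodified one."""
    return "".join("#" if m else "." for m in state)


if __name__ == "__main__":
    # One seed in a 21-nucleosome array, hypothetical barrier at position 14.
    print(render(spread(21, seeds=[10], rounds=6, barriers=[14])))
    # prints "....##########......."
```

With p = 1.0 the update is deterministic, so a single seed grows by one nucleosome per side per round until it reaches the hypothetical barrier; values of p below 1 give stochastic, uneven domains.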

Chromosome-specific regulation of the fourth chromosome

Despite the repressive nature of the fourth chromosome, the genes located on it are properly expressed, possibly in part because of the activity of the chromosome 4-specific protein POF. In contrast to HP1, which binds the fourth chromosome as well as pericentric heterochromatin, POF is highly specific for the banded part of the fourth chromosome. In this region, POF and HP1 colocalize almost perfectly at the resolution afforded by polytene chromosome staining (Johansson et al. 2007a). The binding of POF and HP1 is interdependent: POF binding to the fourth chromosome is strongly reduced in HP1 mutants, and HP1 binding is reduced in Pof mutants (Johansson et al. 2007a). Although POF binds to the banded, gene-rich region of the fourth chromosome, these results suggest that POF requires heterochromatic pressure for its targeting. This hypothesis has been corroborated by studies showing that a translocated fourth chromosome is not targeted by POF unless the proximal heterochromatic region is present and conditions favor heterochromatin formation (e.g., low temperature and/or absence of a Y chromosome) (Johansson et al. 2007a).

Both POF and HP1 have been mapped on the fourth chromosome at high resolution using the ChIP-chip technique (Johansson et al. 2007b). At this resolution, POF and HP1 colocalization within the transcribed regions of chromosome 4 genes is near perfect. The enrichment levels of the two factors are linearly correlated, and HP1 and POF binding levels also correlate positively with transcription levels (Johansson et al. 2007b). In addition to binding within genes, HP1 shows a distinct peak at the promoters of almost all expressed genes on the fourth chromosome (Johansson et al. 2007a). HP1 targets on the fourth chromosome can thus be divided into two distinct sets: one within transcribed regions and one at promoters. POF, on the other hand, is only seen within transcribed regions. Reduction of HP1 levels causes increased expression of chromosome 4 genes, whereas loss of Pof causes decreased expression (Johansson et al. 2007b). In D. melanogaster, haplo-4 flies are viable and fertile. However, the expression of non-ubiquitously expressed genes on their single fourth chromosome has been shown to be compensated by POF, and the lack of this compensation causes lethality (Stenberg et al. 2009). In contrast, there is very little or no compensation for ubiquitously expressed genes. The POF-mediated compensation of chromosome 4 genes is more effective than the observed buffering of segmental aneuploidies. However, the two systems show many striking similarities; e.g., both affect non-ubiquitously expressed and ubiquitously expressed genes differently (Stenberg et al. 2009). A balancing model has been proposed in which POF stimulates and HP1 represses gene expression from the fourth chromosome, so that together they provide a buffer that prevents excessive fluctuations in transcription.
However, if HP1 and POF are interdependent, why do mutations in them have opposite effects on transcriptional output? It is tempting to speculate that the removal of POF reduces HP1 levels, but that some HP1, for example in the promoter peak, remains, so that its repressive function dominates. In the opposite situation, when HP1 is reduced, some POF may still be present, and its stimulating function may dominate. An intriguing observation is that silenced transgenes on the fourth chromosome are preferentially located in HP1-rich regions, whereas non-silenced transgenes are preferentially located in regions with low amounts of HP1 (Stenberg et al. 2009). On chromosome 4, high levels of HP1 are observed in transcribed (and thus active) regions, so why are transgenes silenced? One possibility is that HP1 spreads into these transgenes but, because their sequences do not originate from chromosome 4, POF does not target them. If so, the silencing properties of HP1 would dominate. This possibility has not yet been tested.
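
The speculative explanation above for the opposite effects of Pof and HP1 mutations can be made concrete with a toy steady-state model. The sketch below is not taken from the cited work, and every quantity in it is hypothetical: it assumes that the gene-body occupancies of POF and HP1 each scale with the supply of the other factor (interdependence), that the HP1 promoter peak is independent of POF, and that expression equals a stimulation term divided by a repression term, with arbitrary strengths s and r. The function names binding, expression, and relative_to_wild_type are illustrative only.

```python
"""Toy steady-state model of the proposed POF/HP1 balance (illustrative only).

All quantities and parameters are hypothetical. hp1 and pof are relative
protein supplies (1.0 = wild type). Gene-body binding of each factor is
assumed to require the other (interdependence), whereas the HP1 promoter
peak is assumed to be POF-independent. POF stimulates with strength s and
HP1 represses with strength r.
"""


def binding(hp1, pof):
    """Return (gene-body POF, gene-body HP1, promoter HP1) occupancies."""
    body_pof = pof * hp1  # POF targeting assumed to depend on HP1
    body_hp1 = hp1 * pof  # gene-body HP1 assumed to depend on POF
    prom_hp1 = hp1        # promoter peak assumed independent of POF
    return body_pof, body_hp1, prom_hp1


def expression(hp1, pof, s=1.0, r=1.0):
    """Relative expression of a chromosome 4 gene: stimulation over repression."""
    body_pof, body_hp1, prom_hp1 = binding(hp1, pof)
    return (1 + s * body_pof) / (1 + r * (body_hp1 + prom_hp1))


def relative_to_wild_type(hp1, pof, s=1.0, r=1.0):
    """Expression of a mutant condition divided by wild-type expression."""
    return expression(hp1, pof, s, r) / expression(1.0, 1.0, s, r)


if __name__ == "__main__":
    print(f"Pof null:   {relative_to_wild_type(hp1=1.0, pof=0.0):.3f}")  # 0.750
    print(f"HP1 halved: {relative_to_wild_type(hp1=0.5, pof=1.0):.3f}")  # 1.125
```

Under these assumptions, removing POF lowers expression because promoter-bound HP1 remains, whereas halving HP1 raises it because partial POF binding persists. The outcome depends on the parameters: for this toy model, the decrease in the Pof null requires r < s(1 + r), and the increase with halved HP1 requires r > s/2. The sketch therefore shows only that the proposed explanation is internally consistent, not that it is correct.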

The molecular mechanism of POF-mediated stimulation of gene transcription remains elusive. However, since POF has an RNA-binding domain, it is likely that POF is recruited through nascent RNA from actively transcribed chromosome 4 genes. Whether the stimulation of expression occurs through promotion of transcription, more efficient splicing, or enhanced RNA export remains to be determined. The prevailing model for the repressive action of HP1 is that it targets methylated H3K9, forms a dimer, and links two adjacent nucleosomes. However, it should be stressed that HP1 is not solely linked to gene repression. It is known to be required for transcription of some genes located within heterochromatic regions (Hearn et al. 1991; Clegg et al. 1998; Lu et al. 2000); it has been shown to associate with active genes in euchromatic regions; and gene expression analyses suggest that, in some cases, HP1 stimulates transcription (Piacentini et al. 2003; Cryderman et al. 2005). It was also recently shown that HP1 associates with transcripts of hundreds of euchromatic Drosophila genes and positively regulates these genes (Piacentini et al. 2009). One proposed model holds that HP1's main function is nucleic acid compaction, which can act on both DNA and RNA: compaction of DNA through binding to methylated H3K9 would cause repression, whereas compaction of RNA via interaction with hnRNP proteins may stabilize the transcript and support gene expression (Piacentini et al. 2009).

Why is the binding of POF and HP1 interdependent? So far, no physical association between POF and HP1 has been reported, but POF and Setdb1 have been shown to interact in vitro (Tzeng et al. 2007). Thus, POF may act as an adaptor linking histone marks, via HP1 and Setdb1, to pre-mRNA, in a fashion similar to the MRG15–PTB adaptor system, which links H3K36 methylation to alternative splicing (Luco et al. 2010; Luco et al. 2011).

POF in testes

POF is expressed at low levels in most, if not all, cells in both males and females. However, it is expressed at very high levels in testes. Several appealing hypotheses could explain this strong germline expression. First, the X chromosome is also dosage-compensated in the male germline, but no mechanism for this compensation has yet been characterized (Bachiller and Sánchez 1986; Rastelli and Kuroda 1998; Gupta et al. 2006). The strong relationship between POF and dosage compensation suggests that POF may have an X chromosome dosage-compensation function in the germline. However, expression microarray analysis has shown that expression of the X chromosome is not affected in Pof mutant testes (Stenberg et al. 2009). Second, POF's activities allow proper expression of genes in the heterochromatic environment of the fourth chromosome. In the male germline, the genes on the Y chromosome must be expressed, and these genes are also located in a heterochromatic environment. However, expression analysis has shown that in the testes, POF mainly influences expression of chromosome 4 genes, and Pof mutants are not male sterile (Stenberg et al. 2009). Third, the fourth chromosome essentially lacks meiotic recombination, which is also a hallmark of the Drosophila male germline, so POF might act as a suppressor of recombination. However, although this possibility has not been fully excluded, recombination remains absent both on the fourth chromosome and in the male germline of Pof mutants. Thus, stimulating the expression of genes located on chromosome 4 appears to be the main function of POF in both germline and somatic cells.

Origin of MSL- and POF-mediated dosage compensation mechanisms

An obvious question is why there is a chromosome-specific gene regulatory mechanism for the fourth chromosome. Possible clues are that POF is required for haplo-4 survival and that somatic elimination of the fourth chromosome is relatively frequent in D. melanogaster (Mohr 1932; Ashburner et al. 2005), although the fourth chromosome is normally present in two copies in both males and females. Thus, POF may be important for sufficient expression of chromosome 4 genes in haplo-4 cells. However, it seems unlikely that aiding the survival of haplo-4 cells or individuals is the sole selective advantage of the system. There is, however, a strong relationship between the X and the fourth chromosomes, and POF may indeed have originated from a buffering/dosage-compensating system.

As the proto-Y gradually degenerated, initial compensation of the monosomic X regions in males was probably provided by general buffering mechanisms that had evolved to cope with dosage differences in general (Fig. 4). These mechanisms would have acted at different levels of expression regulation, with varying degrees of compensation precision. During the course of Y degeneration, some of these systems would have had to become specific to the X chromosome to avoid the genomic instability associated with over-efficient general buffering. The resulting X specificity would be advantageous to males but would be accompanied by risks of overexpression in females. The X chromosome-specific systems that evolved include MSL and, most likely, POF, since POF is observed on the male X chromosome in some species (Fig. 3). The F element being part of the X chromosome is most likely the ancestral state; after the D. busckii lineage split off (Fig. 3), the F element detached from the X.

Fig. 4

Putative origins of the X chromosome, the F element, and the compensatory systems POF (red) and MSL-C (green). MDA indicates the male-determining allele. Upon degeneration of the proto-Y chromosome, the corresponding regions of the proto-X chromosome become haploid in males and require buffering (brown boxes). Parts of these general buffering systems later evolved into dosage compensation systems (POF and MSL-C). The interrupted green line on the D. busckii male X chromosome indicates that this chromosome is enriched in H4K16ac, but MSL-C has not been detected on it. The F element is indicated by an ellipse.

In D. busckii, in which the F element has been retained as part of the X chromosome, both POF and MSL compensate for the difference between the sexes in the copy number of X-linked genes. In contrast, in other species (e.g., D. melanogaster, in which the F element is a separate autosome), the two systems appear to have diverged and now target different chromosomes and meet distinct compensatory needs. However, an intrinsic propensity of the MSL complex (MSL-C) to target the F element is still observable in roX1 roX2 double mutants (Meller and Rattner 2002; Deng and Meller 2006). The retention of POF on the F element allows correct expression of its genes despite the heterochromatic nature of this chromosome. This may in turn have imposed a functional constraint on the genes located on the fourth chromosome, which may explain the conservation of the F element as a distinct dot chromosome in many Drosophila species.

Shaping the genome

Compensatory and buffering systems provide molecular mechanisms that can counter problems associated with changes in gene copy number arising during evolution, and thereby allow potentially beneficial variations in genome structure. CNV is clearly an important evolutionary driving force, and a delicate balance is needed between compensating for variations in gene dosage (thus permitting variability) and selecting against excessively destabilizing variation. General buffering systems can also be modified to meet specific compensatory requirements and may provide templates for the evolution of a wide range of specific regulatory mechanisms. Considering the abundance of functional constraints on gene organization, clustering, and order, and the conservation of many syntenic regions, it seems likely that several autosome-specific targeting systems remain to be discovered in other organisms.