Elsevier

Gene

Volume 333, 26 May 2004, Pages 135-141

Representing GC variation along eukaryotic chromosomes

https://doi.org/10.1016/j.gene.2004.02.041

Abstract

Genome sequencing now permits direct visual representation, at any scale, of GC heterogeneity along the chromosomes of several higher eukaryotes. Plots can be easily obtained from the chromosomal sequences, yet sequence releases of mammalian or plant chromosomes still tend to use small scales or window sizes that obscure important large-scale compositional features. To faithfully reveal, at a glance, the compositional variation at a given scale, we have devised a simple scheme that combines line plots with color-coded shading of the regions underneath the plots. The scheme can be applied to different eukaryotic genomes to facilitate their comparison, as illustrated here for a sample of chromosomes chosen from seven selected species. As a complement to a previously published compact view of isochores in the human genome sequence [FEBS Lett. 511 (2002a) 165], we include here an analogous map for the recently sequenced mouse genome, and discuss the contribution of repetitive DNA to the GC variation along the plots. Supplementary information, including a database of color-coded GC profiles for all recently sequenced eukaryotes and the program draw_chromosomes_gc.pl used to obtain them, is available at http://genomat.img.cas.cz.

Introduction

Entire genome sequences permit, for the first time, graphic portrayal of compositional heterogeneity at the DNA sequence level, at any desired scale. A portrait or profile of GC level allows one to monitor variation along chromosomes. It can be a powerful tool when comparing different regions of the same genome, different genomes, or different draft assemblies of individual chromosomes. Statistical descriptions of the GC variation along chromosomes can be derived directly from CsCl gradient ultracentrifugation of DNA. This principle was used to demonstrate the presence of isochores in mammals well before sequences were available (Macaya et al., 1976). Furthermore, in the absence of sequence information, the large-scale variation of GC along individual chromosomes of vertebrates can be plotted by in situ hybridization of GC fractions obtained by preparative ultracentrifugation of the DNA (Saccone et al., 1993, Saccone et al., 1996, Saccone et al., 2002). The availability of entire chromosome sequences now allows such variation to be displayed almost effortlessly at high resolution, via fixed-length moving-window plots.
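A fixed-length moving-window GC profile of the kind described above can be sketched in a few lines. The Python below is an illustration under stated assumptions, not the authors' Perl code: the function name `gc_profile` is hypothetical, GC is computed over A/C/G/T only (N gaps are excluded from the denominator), and a trailing partial window is dropped.

```python
from typing import List, Optional, Tuple


def gc_profile(seq: str, window: int,
               step: Optional[int] = None) -> List[Tuple[int, Optional[float]]]:
    """Return (start, GC fraction) for fixed-length windows along `seq`.

    GC fraction = (G + C) / (A + C + G + T). Ambiguous bases such as N
    (assembly gaps) are left out of the denominator, and a window with no
    called bases yields None, so gaps are not plotted as GC = 0.
    A trailing partial window shorter than `window` is dropped.
    """
    if window <= 0:
        raise ValueError("window must be a positive length in bp")
    step = window if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        called = gc + chunk.count("A") + chunk.count("T")
        profile.append((start, gc / called if called else None))
    return profile


# Non-overlapping 4-bp windows (step defaults to the window length).
print(gc_profile("GGCCAATTNNNNGCAT", 4))
# [(0, 1.0), (4, 0.0), (8, None), (12, 0.5)]
```

Setting `step` smaller than `window` gives overlapping (moving) windows. For chromosome-scale plots one would read the sequence from a FASTA file and pass a window of, for example, 100,000 bp; that value is illustrative, not a recommendation taken from this article.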

Published plots, accompanying releases of mammalian or plant chromosome sequences or follow-up analyses, still often use excessively small scales and/or window sizes, e.g. in attempts to economize space in journals or online supplements. Vertical scales representing GC level in line plots are sometimes only a few millimeters high with no guidelines, and color-coded GC/isochore tracks are typically treated as a replacement for line plots rather than as their complement. Inappropriately small scales or window sizes have, in one or two cases, led authors to question the significance of clear intrachromosomal (inter-isochore) contrasts in relative GC or CpG frequencies (International Human Genome Sequencing Consortium (IHGSC), 2001, Gentles and Karlin, 2001), contrasts that become obvious (and can be confirmed statistically) as soon as one changes scale. We therefore felt that this technical point needed to be explicitly addressed. With future comparisons in mind, we devised a simple scheme for portraying large-scale variation that can be applied to most, if not all, eukaryotic chromosome sequences. We also used this approach, together with available annotation databases, to graphically summarize the genome-wide compositional variation of repetitive and nonrepetitive DNA in human and mouse.
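Separating repetitive from nonrepetitive DNA within each window is straightforward when the assembly is soft-masked. The sketch below assumes UCSC-style soft-masking, in which repeat-annotated bases are written in lowercase; the helper names `gc_by_masking` and `masked_gc_profile` are hypothetical, and this is only one possible way to obtain the two profiles, not necessarily how the annotation databases were used here.

```python
from typing import Dict, List, Optional, Tuple


def gc_by_masking(chunk: str) -> Dict[str, Optional[float]]:
    """GC fraction of repeat (lowercase) and non-repeat (uppercase) bases.

    Assumes soft-masking: repeat-annotated bases are lowercase.
    Non-ACGT symbols (N, n, ...) are ignored.
    """
    counts = {"repeat": [0, 0], "unique": [0, 0]}  # [GC count, called bases]
    for base in chunk:
        upper = base.upper()
        if upper not in ("A", "C", "G", "T"):
            continue
        tally = counts["repeat" if base.islower() else "unique"]
        tally[1] += 1
        if upper in ("G", "C"):
            tally[0] += 1
    # None marks a window with no called bases of that kind.
    return {kind: (gc / n if n else None) for kind, (gc, n) in counts.items()}


def masked_gc_profile(seq: str,
                      window: int) -> List[Tuple[int, Dict[str, Optional[float]]]]:
    """Apply gc_by_masking to consecutive non-overlapping windows."""
    return [(start, gc_by_masking(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, window)]


print(masked_gc_profile("GGCCatatACGTnnnn", 8))
# [(0, {'repeat': 0.0, 'unique': 1.0}), (8, {'repeat': None, 'unique': 0.5})]
```

Plotting the "repeat" and "unique" series on the same axes gives a direct view of how much repeats contribute to, or deviate from, the overall GC level in each window.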


Materials and methods

The program draw_chromosome_gc.pl was written in Perl. It requires Perl version 5 (freely available at http://www.perl.org) and the Perl GD module (freely available from http://www.cpan.org). The current version of the program works with GD module versions 1.19 and higher and produces files in PNG (Portable Network Graphics) format. The program is freely distributed as source code under the GNU General Public License (GPL) and can be downloaded from http://genomat.img.cas.cz.
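draw_chromosome_gc.pl itself uses Perl and GD to write PNG files, and its interface is not reproduced here. As a rough, dependency-free illustration of the same idea (a line plot with color-coded shading underneath and horizontal guidelines), the Python sketch below writes SVG instead. The color classes and their 37/41/46/53% GC cut-offs, the plotted GC range (30 to 60%) and the figure dimensions are all assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence

# Hypothetical colour classes: (exclusive upper GC bound, fill colour).
# The 37/41/46/53 % cut-offs follow commonly quoted human isochore-family
# limits (L1, L2, H1, H2, H3); they are an assumption for illustration.
CLASSES = [(0.37, "#3b4cc0"), (0.41, "#7ea6f0"), (0.46, "#dddddd"),
           (0.53, "#f4a582"), (1.01, "#b40426")]


def gc_colour(gc: float) -> str:
    """Return the fill colour of the first class whose bound exceeds gc."""
    for upper, colour in CLASSES:
        if gc < upper:
            return colour
    return CLASSES[-1][1]


def render_svg(gc_values: Sequence[Optional[float]], width: int = 800,
               height: int = 200, gc_min: float = 0.30,
               gc_max: float = 0.60) -> str:
    """Line plot of window GC with colour-coded shading underneath (SVG)."""
    if not gc_values:
        raise ValueError("no windows to draw")
    bar_w = width / len(gc_values)

    def y_of(gc: float) -> float:
        # Clamp to the plotted range; SVG y coordinates grow downwards.
        gc = min(max(gc, gc_min), gc_max)
        return height * (gc_max - gc) / (gc_max - gc_min)

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    segments: List[List[str]] = [[]]
    for i, gc in enumerate(gc_values):
        if gc is None:  # assembly gap: no shading, and break the line
            segments.append([])
            continue
        x, y = i * bar_w, y_of(gc)
        # Shade from the baseline up to the GC value, coloured by class.
        parts.append(f'<rect x="{x:.1f}" y="{y:.1f}" width="{bar_w:.1f}" '
                     f'height="{height - y:.1f}" fill="{gc_colour(gc)}"/>')
        segments[-1].append(f"{x + bar_w / 2:.1f},{y:.1f}")
    # Horizontal guidelines every 5 % GC so levels can be read off directly.
    for pct in range(round(gc_min * 100), round(gc_max * 100) + 1, 5):
        y = y_of(pct / 100)
        parts.append(f'<line x1="0" y1="{y:.1f}" x2="{width}" y2="{y:.1f}" '
                     f'stroke="grey" stroke-dasharray="4 4"/>')
    # The line plot itself, drawn on top; one polyline per gap-free run.
    for seg in segments:
        if len(seg) > 1:
            pts = " ".join(seg)
            parts.append(f'<polyline points="{pts}" fill="none" stroke="black"/>')
    parts.append("</svg>")
    return "\n".join(parts)


svg = render_svg([0.35, 0.50, None, 0.55, 0.45])
print(svg.count("<rect"), svg.count("<polyline"))  # 4 2
```

Writing the returned string to a file with an `.svg` extension gives an image that can be opened in a web browser. Keeping the guidelines and a generous vertical extent reflects the point made above that color tracks should complement, not replace, a readable line plot.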

GC mosaicism, its visual display, and the uses of compositional maps

Abrupt changes in GC level represent landmarks that naturally partition or calibrate chromosome sequences. They also correlate with key biological properties in many eukaryotes, such as changes in gene density (Mouchiroud et al., 1991, Zoubak et al., 1996, International Human Genome Sequencing Consortium (IHGSC), 2001, Venter et al., 2001), switches in replication timing (Tenzen et al., 1997, Stephens et al., 1999, The MHC sequencing consortium (MHCSC), 1999), and differences between the locations …
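Abrupt GC changes of the kind mentioned above can be located automatically once a window profile is available. The following is a deliberately simple stand-in, recursive binary segmentation on window GC means, rather than the authors' method or the entropic segmentation procedures used elsewhere in the isochore literature. The function names and the `min_diff` (0.03, i.e. 3% GC) and `min_len` (3 windows) thresholds are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import List, Optional, Sequence


def best_split(values: Sequence[float], min_len: int) -> Optional[int]:
    """Index i maximising |mean(values[:i]) - mean(values[i:])| * sqrt(n_l*n_r/n),
    with both sides at least min_len windows long; None if no such i exists."""
    n = len(values)
    total = sum(values)
    prefix = 0.0
    best_i, best_score = None, -1.0
    for i in range(1, n):
        prefix += values[i - 1]  # running sum of values[:i]
        if i < min_len or n - i < min_len:
            continue
        score = abs(prefix / i - (total - prefix) / (n - i)) * sqrt(i * (n - i) / n)
        if score > best_score:
            best_i, best_score = i, score
    return best_i


def segment(values: Sequence[float], min_diff: float = 0.03,
            min_len: int = 3, offset: int = 0) -> List[int]:
    """Recursive binary segmentation of window GC values.

    Returns the window indices at which a new compositional domain starts;
    a split is kept only if the two sides differ in mean GC by >= min_diff.
    """
    i = best_split(values, min_len)
    if i is None:
        return []
    left, right = values[:i], values[i:]
    if abs(mean(left) - mean(right)) < min_diff:
        return []
    return (segment(left, min_diff, min_len, offset) + [offset + i]
            + segment(right, min_diff, min_len, offset + i))


profile = [0.38] * 6 + [0.50] * 6 + [0.40] * 6
print(segment(profile))  # [6, 12]
```

The returned indices refer to windows; multiplying them by the window step converts them to base-pair coordinates along the chromosome.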

Conclusion

In summary, we have presented a scheme for mapping and presenting GC levels and their large-scale variation that should be applicable to most eukaryotic chromosome sequences, and in some cases the choices it implements appear nearly optimal. Applications range from the recognition of gene-dense regions (which are GC-rich, likely to replicate early, and preferentially extend away from the matrix in interphase) to comparisons between genomes or between draft assemblies of the same genome. The …

Acknowledgements

This work was supported by the Center for Integrated Genomics of the Czech Republic.

References (34)
