Elsevier

Gene

Volume 300, Issues 1–2, 30 October 2002, Pages 155-160
Expression patterns and gene distribution in the human genome

https://doi.org/10.1016/S0378-1119(02)01048-X

Abstract

Genes are non-uniformly distributed in the human genome, reaching their highest concentration in GC-rich isochores. This is one of the fundamental aspects of human genome organization (for a review, see Gene 241/259 (2000a,b) 3/31). In the present paper, gene distribution was analyzed in relation to gene expression patterns and levels. Evidence is presented showing: (i) that the bias in gene distribution towards GC-rich isochores applies to both tissue-specific and housekeeping genes; and (ii) that genes localized in GC-rich isochores have high transcriptional levels. Since gene density and transcriptional levels are correlated with each other, and both are correlated with the GC content of the isochores, the biased gene distribution in the human genome is presumably the result of selection at the level of gene expression.

Introduction

The localization of 40 human genes in isochore families first showed that genes were not uniformly distributed in the human genome, being more concentrated in GC-rich isochores (Bernardi et al., 1985). Thereafter, in silico localization of ∼1400 human genes (D'Onofrio et al., 1991) led to the same conclusions (Mouchiroud et al., 1991), further confirmed by larger sets of genes (Zoubak et al., 1996, Saccone et al., 2001).

The biased gene distribution in the human genome raised a question about the correlation between gene distribution and gene expression pattern or, in other words, about the distribution of tissue-specific and widely expressed genes according to the GC level of the isochores.

The first attempt to answer this question, which summarized independent experimental results on chromatin structure and gene composition, as well as gene distribution, led to the hypothesis that ‘the H3 isochore family presumably has the highest level of transcription because of its very high concentration of genes – especially housekeeping genes’ (Bernardi, 1993; and references therein). The high expression levels of the genes localized in H3 were further supported by an in silico investigation of the sequence context of the AUG start codon, which showed that genes located in GC-rich isochores required highly efficient translation (Pesole et al., 1999).

Subsequent analyses of the correlation between gene distribution and gene expression levels showed that the majority of the widely expressed genes were localized mainly in GC-poor isochores, whereas tissue-specific genes were localized in the GC-rich ones (Gonçalves et al., 2000). The authors drew these conclusions by analyzing the base composition and the distribution of genes with or without retropseudogenes, the former being more widely expressed than the latter. However, an analysis of the human genome using a different algorithm found the propensity for retrotransposition to be unaffected by the GC content of the genes (Venter et al., 2000).

In a study of the origin of CpG islands, tissue-specific genes were confirmed to be mainly localized in GC-rich isochores (64% in H3), whereas the distribution of widely expressed genes was independent of the isochore context (Ponger et al., 2001). This finding led Galtier et al. (2001) to argue, ‘… selection, if any, must be unrelated to gene expression level or pattern’.

However, in both papers dealing with the correlation between gene distribution and expression patterns (Gonçalves et al., 2000, Ponger et al., 2001), the gene partition was performed according to the criteria defined in Mouchiroud et al. (1991). Over the last decade, several results based on theoretical and experimental approaches have led to an improvement of the gene partition criteria (Saccone et al., 1993, Saccone et al., 1999, Zoubak et al., 1996, Federico et al., 2000).

In the present paper the genome distribution of widely expressed genes was revisited by analyzing a dataset of orthologous genes from human and Xenopus, calf and murids, as well as the human CpG islands database (Larsen et al., 1992; database 4.0, 1996). The results from the two independent datasets led to the conclusion that widely expressed genes: (i) are mainly localized in GC-rich isochores; (ii) are not the majority of the genes in the GC-rich isochores; and (iii) are not poorer in GC3 than tissue-specific genes.
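GC3 here denotes the GC level at third codon positions of a coding sequence. The following minimal sketch shows how such a value could be computed. The function name `gc3` is hypothetical, and counting the stop codon together with the other codons is an assumption of this sketch, not a detail taken from the paper.

```python
def gc3(cds: str) -> float:
    """Return the fraction of G or C at third codon positions of a CDS.

    Assumption of this sketch: every codon of the input, including a
    terminal stop codon if present, contributes to the count.
    """
    seq = cds.upper()
    if not seq:
        raise ValueError("empty sequence")
    if len(seq) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    # Every third base, starting from index 2, is a third codon position.
    third_positions = seq[2::3]
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


# Toy sequence: codons ATG GCC AAA TAA have third positions G, C, A, A.
print(gc3("ATGGCCAAATAA"))
```

Comparing two classes of genes, such as widely expressed and tissue-specific ones, would then amount to comparing the distributions of this value across the coding sequences of each class.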

Materials and methods

The human CpG-islands database (Larsen et al., 1992; release 4.0, 1996; retrieved from http://bioinformatics.weizmann.ac.il/databases/cpgisle/) contains 1711 entries. After removing those with no information on the expression level (15%), grouping those belonging to a single gene (22%), and removing redundancy (14%), 882 complete coding sequences (CDS) were recovered, hereafter referred to as dataset A.
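The three curation steps described above can be sketched roughly as follows. This is only an illustration: the `Entry` record, its field names, and the specific rules used here (keeping the first entry per gene, treating identical CDS strings as redundant) are assumptions of this sketch, since the text does not specify how grouping and redundancy removal were carried out.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical database record; field names are illustrative only."""
    entry_id: str
    gene: str
    expression: Optional[str]  # None when no expression information exists
    cds: str


def build_dataset(entries: List[Entry]) -> List[Entry]:
    # Step 1: drop entries that carry no expression information.
    annotated = [e for e in entries if e.expression is not None]

    # Step 2: group entries belonging to the same gene
    # (assumption: the first entry seen for each gene is kept).
    by_gene = {}
    for e in annotated:
        by_gene.setdefault(e.gene, e)

    # Step 3: remove redundancy
    # (assumption: two entries are redundant when their CDS is identical).
    seen_cds = set()
    dataset = []
    for e in by_gene.values():
        if e.cds not in seen_cds:
            seen_cds.add(e.cds)
            dataset.append(e)
    return dataset


# Toy records, not real data.
toy = [
    Entry("e1", "geneA", "widely expressed", "ATGAAATAA"),
    Entry("e2", "geneA", "widely expressed", "ATGAAATAA"),
    Entry("e3", "geneB", None, "ATGCCCTAA"),
    Entry("e4", "geneC", "tissue-specific", "ATGGGGTAA"),
    Entry("e5", "geneD", "tissue-specific", "ATGGGGTAA"),
]
print([e.entry_id for e in build_dataset(toy)])
```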

A second CDS dataset was obtained by pooling available sequences of human genes

Results and discussion

The degree of overlap between datasets A and B was checked. The number of shared genes was 109, 28 of which were widely expressed. Therefore, the widely expressed genes from the two datasets, accounting for 22.0% and 24.3% of datasets A and B, respectively, can be considered as completely independent sets of genes.
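An overlap check of this kind reduces to set intersections. The sketch below is a minimal illustration with made-up gene identifiers; the function name `overlap_summary` and its inputs are hypothetical, and the toy values are not the data of the paper.

```python
def overlap_summary(a_genes, b_genes, widely_expressed):
    """Summarize the overlap between two gene sets.

    All arguments are iterables of gene identifiers. `widely_expressed`
    lists the genes classified as widely expressed in either dataset.
    """
    a, b, wide = set(a_genes), set(b_genes), set(widely_expressed)
    shared = a & b
    return {
        "shared": len(shared),
        "shared_widely_expressed": len(shared & wide),
        # Fraction of each dataset made up of widely expressed genes.
        "fraction_wide_in_a": len(a & wide) / len(a),
        "fraction_wide_in_b": len(b & wide) / len(b),
    }


# Toy identifiers, not real data.
summary = overlap_summary(
    a_genes=["g1", "g2", "g3", "g4"],
    b_genes=["g3", "g4", "g5"],
    widely_expressed=["g1", "g4", "g5"],
)
print(summary)
```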

In order to compare the results from the present datasets with those previously reported, the gene distribution of widely expressed genes was analyzed according to the criteria

Conclusions

Re-examination of experimental and theoretical data from publications spanning almost 10 years (Saccone et al., 1993, Saccone et al., 1999, Zoubak et al., 1996, Federico et al., 2000) allowed us to refine the gene partition criteria first used by Mouchiroud et al. (1991). Two independent datasets were analyzed: one was the updated Larsen database (Larsen et al., 1992; database 4.0, 1996), the other a set of available human genes orthologous to genes from Xenopus, calf and mouse. The analysis

Acknowledgements

Thanks are due to Giorgio Bernardi, for critical reading and discussions, to Oliver Clay, for the Larsen database, to Stephan Cruveiller, for sets of orthologous vertebrate genes, and to Luigi De Martino, for retrieving expression information from the TIGER database.
