
Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores

Journal of Molecular Evolution

Abstract

We compared the exon/intron organization of vertebrate genes belonging to different isochore classes, as predicted by their GC content at the third codon position. Two main features emerged from the analysis of sequences published in GenBank: (1) genes coding for long proteins (i.e., ≥500 aa) are almost twice as frequent in GC-poor as in GC-rich isochores; (2) intervening sequences (i.e., the sum of all introns in a gene) are on average three times longer in GC-poor than in GC-rich isochores. These patterns are observed among human, mouse, rat, cow, and even chicken genes and are therefore likely to be common to all warm-blooded vertebrates. Analysis of Xenopus sequences suggests that the same patterns exist in cold-blooded vertebrates. It could be argued that these results do not reflect reality because sequence databases are not representative of entire genomes. However, an analysis of biases in GenBank revealed that the observed differences between GC-rich and GC-poor isochores are not artifactual and are probably substantially underestimated. We also investigated the distribution of microsatellites and interspersed repeats in introns of human and mouse genes from different isochores. This analysis confirmed previous studies showing that L1 repeats are almost absent from GC-rich isochores. Microsatellites and SINEs (Alu, B1, B2) are found at roughly equal frequencies in introns from all isochore classes. Overall, the presence of repeated sequences does not account for the increased intron length in GC-poor isochores. The relationships between gene structure and global genome organization and evolution are discussed.
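The two measurements described in the abstract can be illustrated with a minimal Python sketch. It computes GC content at third codon positions for each coding sequence, assigns each gene to an isochore class, and then reports, per class, the fraction of genes encoding proteins of at least 500 aa and the mean intervening-sequence length (the sum of a gene's introns). This is not the authors' pipeline. The two-class split, the 0.6 cutoff, the handling of the terminal stop codon, and the `Gene` record type are all hypothetical choices made for illustration, and the gene records below are toy inputs rather than data from the study.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean

# Standard-code stop codons; a terminal stop is excluded from GC3 and protein length.
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Gene:
    """Hypothetical minimal gene record: coding sequence plus intron lengths (bp)."""
    name: str
    cds: str
    intron_lengths: list[int]


def codons(cds: str) -> list[str]:
    """Split a CDS into complete codons, dropping a trailing partial codon and stop codon."""
    seq = cds.upper()
    cs = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if cs and cs[-1] in STOP_CODONS:
        cs = cs[:-1]
    return cs


def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each sense codon."""
    cs = codons(cds)
    if not cs:
        raise ValueError("coding sequence has no complete sense codons")
    return sum(c[2] in "GC" for c in cs) / len(cs)


def protein_length(cds: str) -> int:
    """Number of amino acids encoded (sense codons only)."""
    return len(codons(cds))


def isochore_class(gene: Gene, threshold: float = 0.6) -> str:
    """ASSUMPTION: a single illustrative GC3 cutoff, not the study's class boundaries."""
    return "GC-rich" if gc3(gene.cds) >= threshold else "GC-poor"


def summarize(genes: list[Gene], threshold: float = 0.6, long_cutoff: int = 500) -> dict:
    """Per class: gene count, fraction of long proteins, mean total intron length."""
    groups: dict[str, list[Gene]] = {}
    for g in genes:
        groups.setdefault(isochore_class(g, threshold), []).append(g)
    summary = {}
    for cls, members in groups.items():
        summary[cls] = {
            "n": len(members),
            "frac_long": sum(protein_length(g.cds) >= long_cutoff for g in members) / len(members),
            "mean_intervening": mean(sum(g.intron_lengths) for g in members),
        }
    return summary


# Toy records: GCC codons give GC3 = 1.0, AAA codons give GC3 close to 0.
toy = [
    Gene("rich_short", "ATG" + "GCC" * 99 + "TAA", [100, 200]),
    Gene("rich_long", "ATG" + "GCC" * 599 + "TAG", [500]),
    Gene("poor_long1", "ATG" + "AAA" * 599 + "TGA", [3000, 2000]),
    Gene("poor_long2", "ATG" + "AAA" * 699 + "TAA", [4000]),
]
print(summarize(toy))
```

Grouping first and then aggregating per class keeps the two statistics (long-protein frequency and intervening-sequence length) independent of each other. An intronless gene simply contributes a total intron length of 0.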




About this article

Cite this article

Duret, L., Mouchiroud, D. & Gautier, C. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J Mol Evol 40, 308–317 (1995). https://doi.org/10.1007/BF00163235
