Abstract
We compared the exon/intron organization of vertebrate genes belonging to different isochore classes, as predicted by their GC content at the third codon position. Two main features emerged from the analysis of sequences published in GenBank: (1) genes coding for long proteins (i.e., ≥500 aa) are almost twice as frequent in GC-poor as in GC-rich isochores; (2) intervening sequences (i.e., the sum of all introns in a gene) are on average three times longer in GC-poor than in GC-rich isochores. These patterns are observed among human, mouse, rat, cow, and even chicken genes, and are therefore likely to be common to all warm-blooded vertebrates. Analysis of Xenopus sequences suggests that the same patterns exist in cold-blooded vertebrates. It could be argued that such results do not reflect reality because sequence databases are not representative of entire genomes. However, an analysis of biases in GenBank revealed that the observed differences between GC-rich and GC-poor isochores are not artifactual and are probably largely underestimated. We also investigated the distribution of microsatellites and interspersed repeats in introns of human and mouse genes from different isochores. This analysis confirmed previous studies showing that L1 repeats are almost absent from GC-rich isochores. Microsatellites and SINEs (Alu, B1, B2) are found at roughly equal frequencies in introns from all isochore classes. Overall, the presence of repeated sequences does not account for the increased intron length in GC-poor isochores. The relationships between gene structure and global genome organization and evolution are discussed.
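The per-gene quantities compared in this analysis can be illustrated with a short sketch. It assumes a hypothetical `Gene` record holding a coding sequence and 1-based, inclusive exon coordinates; it estimates GC3 as the fraction of G or C at third codon positions, takes the intervening-sequence length as the sum of the gaps between consecutive exons, and counts a protein as long at ≥500 aa. Splitting genes into equal-sized GC3 terciles is only a placeholder for isochore classes, not the classification used in the study, and all helper names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Gene:
    """Hypothetical minimal gene record (not a GenBank/ACNUC structure).

    Exon coordinates are assumed to be genomic, 1-based and inclusive.
    """
    name: str
    cds: str
    exons: list[tuple[int, int]]


def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions (trailing partial codon ignored)."""
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    if not thirds:
        raise ValueError("coding sequence shorter than one codon")
    return sum(base in "GCgc" for base in thirds) / len(thirds)


def protein_length(cds: str) -> int:
    """Number of codons, minus one if the last complete codon is a stop codon."""
    n = len(cds) // 3
    if cds[3 * (n - 1):3 * n].upper() in STOP_CODONS:
        n -= 1
    return n


def intervening_length(exons: list[tuple[int, int]]) -> int:
    """Sum of intron lengths, i.e. the gaps between consecutive exons."""
    ordered = sorted(exons)
    return sum(nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:]))


def split_by_gc3(genes: list[Gene], n_classes: int = 3) -> list[list[Gene]]:
    """Placeholder classification: rank genes by GC3 and cut into equal-sized groups."""
    ranked = sorted(genes, key=lambda g: gc3(g.cds))
    size = len(ranked)
    return [ranked[k * size // n_classes:(k + 1) * size // n_classes]
            for k in range(n_classes)]


def summarize(genes: list[Gene], long_cutoff: int = 500):
    """Per class: gene count, fraction of long proteins, mean intervening length."""
    rows = []
    labels = ("GC-poor", "intermediate", "GC-rich")
    for label, members in zip(labels, split_by_gc3(genes)):
        if not members:
            continue
        n_long = sum(protein_length(g.cds) >= long_cutoff for g in members)
        mean_intron = mean([intervening_length(g.exons) for g in members])
        rows.append((label, len(members), n_long / len(members), mean_intron))
    return rows


if __name__ == "__main__":
    # Toy genes for illustration only; these are not data from the study.
    toy = [
        Gene("a", "ATGAAATAA", [(1, 3), (10, 15)]),
        Gene("b", "ATGGCCTAG", [(1, 9)]),
        Gene("c", "ATGGCGTGA", [(1, 4), (6, 10)]),
    ]
    for row in summarize(toy):
        print(row)
```

On a real gene set, comparing the fraction of long proteins and the mean intervening length across classes is what would reveal the patterns described above; with a toy set of three genes the numbers carry no meaning.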
Duret, L., Mouchiroud, D. & Gautier, C. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J Mol Evol 40, 308–317 (1995). https://doi.org/10.1007/BF00163235