Abstract
Papaya (Carica papaya L.) is an important fruit crop cultivated in tropical and subtropical regions worldwide. A first draft of its genome sequence was recently released. Together with the genomes of Arabidopsis, rice, poplar and grapevine, and others in the pipeline, it offers a good opportunity to gain insight into the organization of plant genomes. Here we report a detailed analysis of repetitive elements in the papaya genome, including transposable elements (TEs), tandemly arrayed sequences, and high-copy-number genes. These repetitive sequences account for ∼56% of the papaya genome: TEs are the most abundant at 52%, with tandem repeats at 1.3% and high-copy-number genes at 3%. Most common types of TEs are represented in the papaya genome, with retrotransposons the dominant class at 40% of the genome. The most prevalent retrotransposons are Ty3-gypsy (27.8%) and Ty1-copia (5.5%). Among the tandem repeats, microsatellites are the most abundant by number but represent only 0.19% of the genome. Minisatellites and satellites are less numerous but, owing to their greater repeat length, represent 0.68% and 0.43% of the genome, respectively. Despite a smaller overall gene repertoire in papaya than in many other angiosperms, a significant fraction of genes (>2%) belong to large gene families with more than 20 copies. This repeat database clarifies a major part of the papaya genome organization and partly explains why papaya has a smaller gene repertoire than Arabidopsis.
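As a quick consistency check, the percentages reported in the abstract can be recombined. The Python sketch below assumes the categories are non-overlapping and derives two remainders that the abstract does not report directly: retrotransposons outside Ty3-gypsy and Ty1-copia, and TEs outside the retrotransposon class. These remainders are simple differences of rounded figures, not classes identified in the study, and the function name `composition_summary` is hypothetical.

```python
# Genome fractions (% of the papaya genome) as reported in the abstract.
TANDEM = {"microsatellites": 0.19, "minisatellites": 0.68, "satellites": 0.43}
TE = 52.0
RETROTRANSPOSONS = 40.0
NAMED_RETRO_FAMILIES = {"Ty3-gypsy": 27.8, "Ty1-copia": 5.5}
HIGH_COPY_GENES = 3.0


def composition_summary():
    """Recombine the reported percentages (assumed disjoint) into derived totals.

    Results are rounded to 0.01% to absorb floating-point noise.
    """
    tandem_total = round(sum(TANDEM.values()), 2)
    repeat_total = round(TE + tandem_total + HIGH_COPY_GENES, 2)
    # Retrotransposon share not covered by the two named superfamilies
    # (an arithmetic remainder, not a class reported in the abstract).
    other_retro = round(RETROTRANSPOSONS - sum(NAMED_RETRO_FAMILIES.values()), 2)
    # TE share not attributed to retrotransposons (also a remainder).
    non_retro_te = round(TE - RETROTRANSPOSONS, 2)
    return {
        "tandem_total": tandem_total,
        "repeat_total": repeat_total,
        "other_retrotransposons": other_retro,
        "non_retrotransposon_TEs": non_retro_te,
    }


if __name__ == "__main__":
    for name, pct in composition_summary().items():
        print(f"{name}: {pct:.2f}%")
```

Run as a script, it prints a tandem-repeat total of 1.30% and an overall repeat total of 56.30%, in line with the ∼56% stated above, plus remainders of 6.70% (other retrotransposons) and 12.00% (non-retrotransposon TEs).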
Acknowledgements
We thank Ning Jiang for suggestions and discussion on the construction and characterization of the papaya repeat database. We appreciate financial support from the U.S. National Institutes of Health (R01-GM083873 to S.S.), the U.S. National Science Foundation (DBI-0553417 to R.M. and A.H.P.), and the University of Hawaii and U.S. Department of Defense (W81XWH0520013 to M.A.).
Additional information
Nagarajan and Navajas-Pérez contributed equally to this work.
About this article
Cite this article
Nagarajan, N., Navajas-Pérez, R., Pop, M. et al. Genome-Wide Analysis of Repetitive Elements in Papaya. Tropical Plant Biol. 1, 191–201 (2008). https://doi.org/10.1007/s12042-008-9015-0
DOI: https://doi.org/10.1007/s12042-008-9015-0