Abstract
Papaya (Carica papaya L.) is a major tree fruit crop of tropical and subtropical regions with an estimated genome size of 372 Mbp. We present the analysis of 4.7% of the papaya genome based on BAC end sequences (BESs) representing 17 million high-quality bases. Microsatellites discovered in 5,452 BESs and flanking primer sequences are available to papaya breeding programs at http://www.genomics.hawaii.edu/papaya/BES. Sixteen percent of BESs contain plant repeat elements, the vast majority (83.3%) of which are class I retrotransposons. Several novel papaya-specific repeats were identified. Approximately 19.1% of the BESs have homology to Arabidopsis cDNA. Increasing numbers of completely sequenced plant genomes and BES projects enable novel approaches to comparative plant genomics. Paired BESs of Carica, Arabidopsis, Populus, Brassica and Lycopersicon were mapped onto the completed genomes of Arabidopsis and Populus. In general, the level of microsynteny was highest between closely related organisms. However, papaya revealed a higher degree of apparent synteny with the more distantly related poplar than with the more closely related Arabidopsis. This result, together with the significant colinearity observed between peach and poplar genome sequences, supports recent observations of frequent genome rearrangements in the Arabidopsis lineage and suggests that the poplar genome sequence may be more useful than that of Arabidopsis for elucidating the papaya and other rosid genomes. These insights will play a critical role in selecting species and sequencing strategies that will optimally represent crop genomes in sequence databases.
Abbreviations
- BAC: Bacterial artificial chromosome
- BES: BAC end sequence
- kb: Kilobase
- Mbp: Megabase pairs
- MYA: Million years ago
- nt: Nucleotide
- SSR: Simple sequence repeat
Acknowledgements
The Center for Genomics, Proteomics and Bioinformatics Research Initiative at the University of Hawai’i contributed 11,013 of the BAC end sequence chromatograms that are described here, using funds provided by the University of Hawai’i. The remainder of the BES data was generated with funds from the United States Department of Agriculture (USDA) Tropical and Subtropical Agriculture Research Program (grant HAW00557G) and a USDA-Agricultural Research Service Cooperative Agreement (CA 58-3020-8-134) with the Hawai’i Agriculture Research Center.
Additional information
Communicated by R. Hagemann
Chun Wan J. Lai and Qingyi Yu have contributed equally to this work.
About this article
Cite this article
Lai, C.W.J., Yu, Q., Hou, S. et al. Analysis of papaya BAC end sequences reveals first insights into the organization of a fruit tree genome. Mol Genet Genomics 276, 1–12 (2006). https://doi.org/10.1007/s00438-006-0122-z
DOI: https://doi.org/10.1007/s00438-006-0122-z