Analysis of papaya BAC end sequences reveals first insights into the organization of a fruit tree genome

  • Original Paper
  • Molecular Genetics and Genomics

Abstract

Papaya (Carica papaya L.) is a major tree fruit crop of tropical and subtropical regions with an estimated genome size of 372 Mbp. We present the analysis of 4.7% of the papaya genome based on BAC end sequences (BESs) representing 17 million high-quality bases. Microsatellites discovered in 5,452 BESs and flanking primer sequences are available to papaya breeding programs at http://www.genomics.hawaii.edu/papaya/BES. Sixteen percent of BESs contain plant repeat elements, the vast majority (83.3%) of which are class I retrotransposons. Several novel papaya-specific repeats were identified. Approximately 19.1% of the BESs have homology to Arabidopsis cDNA. Increasing numbers of completely sequenced plant genomes and BES projects enable novel approaches to comparative plant genomics. Paired BESs of Carica, Arabidopsis, Populus, Brassica and Lycopersicon were mapped onto the completed genomes of Arabidopsis and Populus. In general, the level of microsynteny was highest between closely related organisms. However, papaya revealed a higher degree of apparent synteny with the more distantly related poplar than with the more closely related Arabidopsis. This finding, together with the significant colinearity observed between peach and poplar genome sequences, supports recent observations of frequent genome rearrangements in the Arabidopsis lineage and suggests that the poplar genome sequence may be more useful for elucidating the papaya and other rosid genomes. These insights will play a critical role in selecting species and sequencing strategies that will optimally represent crop genomes in sequence databases.
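The abstract does not state the criteria used to call paired BESs syntenic. As a rough illustration only, one simple way to score apparent microsynteny is to ask whether both ends of a BAC clone hit the same reference chromosome within some distance window. The sketch below assumes this rule; the `Hit` record, the function names, and the 500 kb cutoff are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """Best alignment of one BAC end on a reference genome (hypothetical record)."""
    chrom: str  # reference chromosome or scaffold name
    pos: int    # alignment start on the reference, in bp


def is_colinear_pair(fwd: Optional[Hit], rev: Optional[Hit],
                     max_span: int = 500_000) -> bool:
    """Return True if both ends hit the same chromosome within max_span bp.

    max_span is an arbitrary illustrative cutoff, not a value from the paper.
    """
    if fwd is None or rev is None:
        return False  # one or both ends have no hit on the reference
    if fwd.chrom != rev.chrom:
        return False  # ends land on different chromosomes
    return abs(fwd.pos - rev.pos) <= max_span


def fraction_colinear(pairs: Iterable[Tuple[Optional[Hit], Optional[Hit]]],
                      max_span: int = 500_000) -> float:
    """Share of BAC clones whose two end hits look colinear on the reference."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    n_ok = sum(is_colinear_pair(f, r, max_span) for f, r in pairs)
    return n_ok / len(pairs)


# Made-up toy data: four clones with invented coordinates.
example_pairs = [
    (Hit("chr1", 1_000), Hit("chr1", 101_000)),    # same chromosome, 100 kb apart
    (Hit("chr1", 1_000), Hit("chr2", 5_000)),      # different chromosomes
    (Hit("chr1", 1_000), None),                    # second end has no hit
    (Hit("chr3", 0), Hit("chr3", 2_000_000)),      # same chromosome, too far apart
]
print(fraction_colinear(example_pairs))
```

Comparing this fraction for the same set of clones against two reference genomes (for example Arabidopsis and Populus) gives a crude measure of which reference shows more apparent synteny. A real analysis would also need to consider hit significance, orientation, and multiple hits per end.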

Abbreviations

BAC: Bacterial artificial chromosome
BES: BAC end sequence
kb: Kilobase
Mbp: Megabase pairs
MYA: Million years ago
nt: Nucleotide
SSR: Simple sequence repeat

Acknowledgements

The Center for Genomics, Proteomics and Bioinformatics Research Initiative at the University of Hawai’i contributed 11,013 of the BAC end sequence chromatograms that are described here, using funds provided by the University of Hawai’i. The remainder of the BES data was generated with funds from the United States Department of Agriculture (USDA) Tropical and Subtropical Agriculture Research Program (grant HAW00557G) and a USDA-Agricultural Research Service Cooperative Agreement (CA 58-3020-8-134) with the Hawai’i Agriculture Research Center.

Author information

Corresponding author

Correspondence to Gernot G. Presting.

Additional information

Communicated by R. Hagemann

Chun Wan J. Lai and Qingyi Yu have contributed equally to this work.

About this article

Cite this article

Lai, C.W.J., Yu, Q., Hou, S. et al. Analysis of papaya BAC end sequences reveals first insights into the organization of a fruit tree genome. Mol Genet Genomics 276, 1–12 (2006). https://doi.org/10.1007/s00438-006-0122-z

  • DOI: https://doi.org/10.1007/s00438-006-0122-z
