Abstract
Background Ensete glaucum (2n = 2x = 18) is a giant herbaceous monocotyledonous plant in the small Musaceae family along with banana (Musa). A high-quality reference genome sequence of E. glaucum offers a vital genomic resource for functional and evolutionary studies of Ensete, the Musaceae, and more widely in the Zingiberales.
Findings Using a combination of Illumina and Oxford Nanopore Technologies (ONT) sequencing, genome-wide chromosome conformation capture (Hi-C), and RNA survey sequencing, we report a high-quality assembly of the 481.5 Mb genome with 9 pseudochromosomes and 36,836 genes (BUSCO 94.7%). A total of 55% of the genome is composed of repetitive sequences, with LTR-retroelements (37%) and DNA transposons (7%) predominant. The 5S and 45S rDNA were each present at one locus, and the 5S rDNA had an exceptionally long monomer length of c. 1,056 bp, contrasting with the c. 450 bp monomer at multiple loci in Musa. A tandemly repeated c. 134 bp satellite, 1.1% of the genome (with no similar sequence in Musa), was present around all nine centromeres, with a LINE retroelement also found at Musa centromeres. The assembly, including centromeric positions, enabled us to characterize in detail the chromosomal rearrangements occurring between the x = 9 species and x = 11 species of Musa. Only one chromosome has the same gene content as a chromosome of M. acuminata (ma). Three ma chromosomes each represent part of only one E. glaucum (eg) chromosome, while the remaining seven ma chromosomes are fusions of parts of two, three, or four eg chromosomes, demonstrating complex and multiple evolutionary rearrangements in the change between x = 9 and x = 11.
Conclusions The advance towards a Musaceae pangenome including E. glaucum, tolerant of extreme environments, makes a complete set of gene alleles available for crop breeding and understanding environmental responses. The chromosome-scale genome assembly shows how chromosome number evolves, and reveals features of the rapid evolution of repetitive sequences.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
The authors' institutions and addresses were edited. Both the manuscript content and supplementary files were kept unchanged.
List of abbreviations
- BLAST: Basic Local Alignment Search Tool
- bp: base pairs
- BUSCO: Benchmarking Universal Single-Copy Orthologs
- BWA: Burrows-Wheeler Aligner
- FISH: fluorescence in situ hybridization
- GeMoMa: Gene Model Mapper
- Gb: gigabase pairs
- GC: guanine-cytosine
- GO: Gene Ontology
- CTAB: cetyl trimethylammonium bromide
- GWAS: genome-wide association studies
- Hi-C: High-throughput chromosome conformation capture
- ITS: internal transcribed spacer of rDNA
- kb: kilobase pairs
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- LACHESIS: Ligating Adjacent Chromatin Enables Scaffolding In Situ
- LINE: long interspersed nuclear element
- LTR: long terminal repeat
- Mb: megabase pairs
- ML: maximum likelihood
- miRNA: microRNA
- Mya: million years ago
- NCBI: National Center for Biotechnology Information
- NR: RefSeq non-redundant proteins
- NOR: nucleolar organizing region
- NTS: non-transcribed spacer of rDNA
- ONT: Oxford Nanopore Technologies
- PAML: Phylogenetic Analysis by Maximum Likelihood
- PacBio: Pacific Biosciences
- PASA: Program to Assemble Spliced Alignments
- RAxML: Randomized Accelerated Maximum Likelihood
- RNA-seq: RNA sequencing
- rDNA: ribosomal DNA
- SRA: Sequence Read Archive
- SSR: simple sequence repeat
- TE: transposable element
- TF: transcription factor
- tRNA: transfer RNA
- WGD: whole genome duplication