DATA REPORT article

Front. Genet., 11 December 2023
Sec. Computational Genomics

Whole genome sequence of a long-legged fly Condylostylus longicornis from Hawaiʻi

  • 1New York University, New York, NY, United States
  • 2NYU Grossman School of Medicine, New York, NY, United States
  • 3University of Hawaiʻi at Mānoa, Honolulu, HI, United States
  • 4Independent Researcher, Loves Park, IL, United States
  • 5University of California San Diego, San Diego, CA, United States

Introduction

Long-legged flies of the family Dolichopodidae represent one of the most species-rich families of the insect order Diptera (Bickel, 2009). The largest number of species is found in the New World tropics (Brown et al., 2018; Pollet et al., 2018; Runyon, 2020; Riccardi et al., 2022), but many species are also present in other regions of the world, and some have been described as tramp species thriving in their non-native ranges (Bickel, 1991; Naglis and Bickel, 2017; Riccardi et al., 2022). Most long-legged flies are predators (Figure 1), and some species are able to hunt common agricultural pests and disease vectors, including mosquitoes (Laing and Welch, 1963), bark beetles (Beaver, 1966) and other insects (Kautz and Gardiner, 2019), making them potential agents of biological pest control. Dolichopodidae were also suggested to be useful as indicators of environmental quality (Pollet, 2010; Gelbič and Olejníček, 2011).

FIGURE 1. On the left, the head of C. longicornis showing alternating columns of two differently colored types of corneal lenses. In the middle, the head of Chrysosoma sp. showing a stochastic arrangement of corneal lenses. On the right, a long-legged fly consuming its prey.

In addition to their ecological significance, long-legged flies have attracted the interest of researchers studying the function and development of sensory organs and the nervous system (Buschbeck and Strausfeld, 1996; Johnston, 2013; Heinloth et al., 2018). Constituent units (ommatidia) of the compound eyes of some dolichopodid species have a highly unusual arrangement (Figure 1): orange-red and green-yellow corneal lenses form alternating vertical rows, and photoreceptor cells that underlie the lenses of different color display different ultrastructure and divergent spectral properties (Trujillo-Cenóz and Bernard, 1972; Stavenga et al., 2017; Ebadi et al., 2018). This ordering has prompted questions about both its functional role, e.g., whether the different rows filter out differently polarized light in high-glare environments, and the underlying developmental mechanisms, considering that most other insects have stochastically arranged eye units in the retina (Johnston, 2013; Wernet et al., 2015; Perry et al., 2016; Heinloth et al., 2018). Finally, Dolichopodidae provide a convenient framework for studying the evolution of sexual behavior as some species display elaborate courtship behavior not seen in related lineages (Zimmer et al., 2003). Thus, long-legged flies are a widespread group of insects whose study has the potential to provide important insight into ecology, pest control, and neurobiology.

Access to high-quality genomic and transcriptomic resources is essential for studying all aspects of biology, from molecular and developmental mechanisms to adaptation to global climate change (McCulloch and Waters, 2023). Despite the abundance and the potential importance of Dolichopodidae, only two genome assemblies are currently available in Genbank for this family: a highly fragmented assembly of Condylostylus patibulatus from the Midwestern US (Genbank accession GCA_001014875.1) and a recently released chromosome-level assembly of Poecilobothrus nobilitatus from the UK (Genbank accession GCA_947095535.1) created as part of the Darwin Tree of Life Initiative. Here, we report the first high-quality genome assembly and an accompanying transcriptome data set for a species from the tropics, Condylostylus longicornis, collected in Hawaiʻi. Importantly, the eyes of C. longicornis exhibit the unusual arrangement of orange-red and green-yellow corneal lenses, which is only found in a small number of genera (Figure 1). Thus, this species holds particular promise for studies of eye development and evolution.

Material collection

Flies were collected from plant leaves near the Hale Koa Hotel in Honolulu, Hawaiʻi, on 24 November 2020 (Figure 2). Condylostylus longicornis was distinguished from other Dolichopodidae that are abundant in the same location, such as Chrysosoma globiferum, based on the following characteristics: dark-colored legs, narrow wings canted downward when the fly is in a resting position, rounded abdominal segments, and the dark tip of the abdomen (Naglis and Bickel, 2017). Flies were frozen at −80°C on the day of collection and shipped on dry ice to sequencing facilities.

FIGURE 2. On the left, C. longicornis male used for genome sequencing. On the right, C. longicornis female used for transcriptome sequencing.

DNA sequencing and genome assembly

DNA extraction and genome sequencing were performed at the Center for Advanced Genomics Technology at the Icahn School of Medicine at Mount Sinai. Considering that most species of the suborder Brachycera (to which Dolichopodidae belong) have the XY sex determination system, in which males carry both the X and the Y chromosome (Vicoso and Bachtrog, 2015), a male sample was used for genome sequencing. The whole body of a single individual was pulverized using cryoPREP CP02 (Covaris), and high molecular weight genomic DNA was extracted using the MagAttract HMW DNA Kit (Qiagen). The size distribution of the extracted DNA was verified using a Femto Pulse system (Agilent), which showed a single band migrating at 23.6 kb (average size). The DNA was sheared using a g-TUBE (Covaris) at 1,500 g for 2 min, three times back and forth for a total of six passes. The average size of the sheared DNA was about 10–11 kb. A HiFi SMRTbell library was prepared using Template Prep Kit 2.0 (PacBio) following the Low DNA Input protocol and sequenced on a Sequel II SMRTcell (PacBio).

Circular consensus sequences (CCS) were generated using PacBio CCS on SMRTLink v10.1 with default parameters and assembled as follows. First, mitochondrial reads were identified and assembled using MitoHiFi v3.0.0 (Laslett and Canbäck, 2008; Allio et al., 2020; Uliano-Silva et al., 2023) with the complete mitochondrial genome of C. luteicoxa (Genbank accession NC_067856.1) as the reference. Next, the nuclear genome was assembled from the remaining reads using hifiasm v0.16.0 (Cheng et al., 2022) with the --primary flag. Contaminant contigs were identified using FCS-GX v0.3.0 (Strope et al., 2023) and removed from the primary assembly. Genome completeness of the decontaminated primary assembly was analyzed using BUSCO v5.3.0 (Manni et al., 2021) with the diptera_odb10 database. The mitochondrial and the nuclear genomes were merged, and the genome was deposited at NCBI. Adapter sequences identified during submission were either trimmed off the ends of contigs or removed from the middle of contigs, in which case the contigs containing them were split.
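The adapter handling described above (trimming terminal adapters, splitting contigs at internal ones) can be illustrated with a minimal Python sketch. This is not the procedure NCBI applies during submission; the function name split_on_adapters, the 0-based half-open coordinate convention, and the toy contig are assumptions made purely for illustration.

```python
from typing import List, Tuple


def split_on_adapters(seq: str, hits: List[Tuple[int, int]]) -> List[str]:
    """Remove adapter intervals from a contig and return the remaining pieces.

    `hits` holds hypothetical 0-based, half-open (start, end) adapter
    coordinates. A hit touching a contig end is simply trimmed off;
    an internal hit splits the contig into two pieces.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(hits):
        if start > cursor:
            pieces.append(seq[cursor:start])  # sequence before this adapter
        cursor = max(cursor, end)             # skip over the adapter itself
    if cursor < len(seq):
        pieces.append(seq[cursor:])           # sequence after the last adapter
    return pieces


# Toy contig: one internal adapter (split) and one terminal adapter (trim).
contig = "AAAA" + "NNNN" + "CCCC" + "NNNN"
print(split_on_adapters(contig, [(4, 8), (12, 16)]))  # ['AAAA', 'CCCC']
```

Because internal hits split a contig while terminal hits only shorten it, contig counts and N50 values can shift slightly between the pre-submission and the deposited assembly.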

RNA sequencing and transcriptome assembly

RNA extraction and transcriptome sequencing were performed at New York University. To extract RNA, the head of a single female was separated from the body, and the two body parts were placed in separate tubes containing 0.1 and 2.0 mm BashingBeads (Zymo Research). One milliliter of TRIzol (Invitrogen) was added to each tube, and the tissues were homogenized for 2 min at 50 Hz on a TissueLyser LT (Qiagen). The tubes were centrifuged, and the supernatant was mixed with chloroform. Upon phase separation, the aqueous phase was used for RNA cleanup using the RNA Clean and Concentrator-5 kit (Zymo Research). The RNA integrity was verified on a TapeStation (Agilent), and four libraries (two from the head RNA and two from the headless body RNA) were prepared using NEBNext Ultra II RNA library prep (New England Biolabs). The libraries were sequenced on a NextSeq 500 (Illumina) in MidOutput mode using a 2 × 150 bp run configuration.

The RNA-seq reads were aligned to the RefSeq version of the genome assembly (see below) using STAR v2.7.6a (Dobin et al., 2013). The reads were assembled in the genome-guided mode using Trinity v2.15.1 (Grabherr et al., 2011). Completeness of the transcriptome assembly was analyzed using BUSCO v5.3.0 (Manni et al., 2021) with the diptera_odb10 database. The assembly and the raw data were deposited at NCBI. Assembled transcripts that were either too short or contained adapter or primer contamination were removed during submission.

Assembly description

The mitochondrial genome assembly had a size of 16,606 bp (greater than 600x coverage) and contained 37 genes without any frameshifts in the coding sequences (Table 1). Assembling the nuclear genome yielded a 547 Mb (36.3x coverage) primary assembly with 874 contigs, a contig N50 of 7.1 Mb, and a GC content of 26.2%, as well as a 461 Mb alternate assembly with 1,720 contigs, a contig N50 of 1.6 Mb, and a GC content of 26.7% (Table 1). The size of both the primary and the alternate assemblies is comparable to the size of the publicly available assembly of C. patibulatus (452 Mb), although it is considerably smaller than the size of the more distantly related P. nobilitatus (944 Mb primary assembly). The GC content is comparable to both C. patibulatus and P. nobilitatus, which places Dolichopodidae on the lower end of the GC content spectrum among insects (Dennis et al., 2020). Completeness analysis of the primary assembly identified 3,050 (92.8%) BUSCOs, out of which 3,017 (91.8%) were complete and single-copy and 33 (1.0%) were complete and duplicated (Table 1). FCS-GX search identified 17 contaminant contigs in the primary assembly with similarity to the sequences of Gammaproteobacteria. The contaminant contigs were removed from the primary nuclear assembly, which was merged with the mitochondrial assembly and submitted to NCBI. The final assembly (Genbank accession GCA_029603195.2, RefSeq accession GCF_029603195.1) has a size of 544 Mb and contains 847 contigs with a contig N50 of 7.2 Mb and an average GC content of 26.5% (Table 1).
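For readers less familiar with the contiguity and composition metrics quoted above, the following Python sketch shows one common way to compute a contig N50 and a GC percentage. It is an illustration only, not the tool used to produce Table 1; the function names n50 and gc_percent and the toy inputs are assumptions.

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together make up at least half of the total assembly size."""
    ordered: List[int] = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def gc_percent(seqs: Iterable[str]) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    gc = total = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        total += sum(s.count(base) for base in "ACGT")
    return 100.0 * gc / total if total else 0.0


# Toy example: five contigs totalling 30 bases; the two longest cover 18 >= 15.
print(n50([10, 8, 6, 4, 2]))         # 8
print(gc_percent(["ATGC", "AATT"]))  # 25.0
```

In this sketch ambiguous bases such as N are excluded from the GC denominator; other tools may count them, which can change the reported percentage slightly.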

TABLE 1. Genome and transcriptome assembly statistics.

RNA sequencing yielded 134 million reads. Completeness analysis of the transcriptome assembly identified 2,425 (73.8%) BUSCOs (Table 1). The RNA-seq data deposited to NCBI facilitated the creation of a RefSeq gene annotation set (GCF_029603195.1-RS_2023_04) produced using the Eukaryotic Genome Annotation Pipeline (Thibaud-Nissen et al., 2013). The annotation set contains 14,253 genes, including 12,227 protein-coding and 1,617 non-coding genes (Table 1).
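The BUSCO percentages reported for the genome and the transcriptome can be cross-checked with simple arithmetic. The sketch below assumes that the diptera_odb10 dataset contains 3,285 BUSCO groups; that total is not stated in the text and is used here as an assumption that happens to be consistent with the reported counts and percentages.

```python
# Assumed total number of BUSCO groups in diptera_odb10 (not stated in the text).
DIPTERA_ODB10_TOTAL = 3285


def busco_pct(count: int, total: int = DIPTERA_ODB10_TOTAL) -> float:
    """Return the share of BUSCO groups as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


# Counts reported in the text.
genome_complete = 3050
genome_single = 3017
genome_duplicated = 33
transcriptome_found = 2425

# Single-copy and duplicated BUSCOs should add up to the complete set.
assert genome_single + genome_duplicated == genome_complete

print(busco_pct(genome_complete))      # 92.8
print(busco_pct(genome_single))        # 91.8
print(busco_pct(genome_duplicated))    # 1.0
print(busco_pct(transcriptome_found))  # 73.8
```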

Conclusion

In summary, we have created a highly contiguous genome assembly of C. longicornis. It is the third genome assembly for the family Dolichopodidae, and the only long-legged fly genome assembly that used biological material collected in the tropics, which harbor the greatest species diversity, yet remain understudied. The genome assembly of C. longicornis will serve as a valuable resource for those who study eye development and behavior of long-legged flies. In addition to the nuclear genome, we report the complete mitochondrial genome assembly, which is useful for phylogenetic inferences. Finally, we provide RNA-seq data which facilitate gene structure annotations and are often used in studies on gene evolution.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repositories and accession numbers can be found below: https://www.ncbi.nlm.nih.gov/, PRJNA932838; https://www.ncbi.nlm.nih.gov/, PRJNA932843; https://www.ncbi.nlm.nih.gov/genbank/, JAQSLO000000000; https://www.ncbi.nlm.nih.gov/, GCF_029603195.1.

Ethics statement

Ethical approval was not required for the study involving animals in accordance with the local legislation and institutional requirements because collection of non-endangered insects from the wild does not require ethical approval.

Author contributions

BS: Conceptualization, Data curation, Formal Analysis, Funding acquisition, Investigation, Methodology, Writing–original draft, Writing–review and editing. MP: Methodology, Resources, Writing–review and editing. JM: Methodology, Writing–review and editing. KS: Conceptualization, Methodology, Resources, Writing–review and editing. FL: Conceptualization, Methodology, Resources, Writing–review and editing. IH: Funding acquisition, Methodology, Resources, Writing–review and editing. CD: Conceptualization, Funding acquisition, Project administration, Writing–review and editing. MP: Conceptualization, Funding acquisition, Methodology, Resources, Writing–review and editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. The project was funded by the NIH grant R01 EY13010 from the National Eye Institute. BS was supported by a Long-Term Fellowship LT000010/2020-L from the Human Frontier Science Program. IH was supported by a Long-Term Fellowship LT000757/2017-L from the Human Frontier Science Program. MP was supported by the NIH grant R00 EY027016 from the National Eye Institute.

Acknowledgments

We would like to thank the US Army and the staff at Hale Koa Hotel for kindly allowing us to collect flies on their property. We are also grateful to Daniel J. Bickel of the Australian Museum Research Institute and Neal L. Evenhuis of the Bishop Museum for their help with species identification. Finally, we greatly appreciate the assistance of the NCBI staff, including Françoise Thibaud-Nissen and Terence Murphy.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Allio, R., Schomaker-Bastos, A., Romiguier, J., Prosdocimi, F., Nabholz, B., and Delsuc, F. (2020). MitoFinder: efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics. Mol. Ecol. Resour. 20, 892–905. doi:10.1111/1755-0998.13160

Beaver, R. A. (1966). The biology and immature stages of two species of Medetera (Diptera: Dolichopodidae) associated with the bark beetle Scolytus scolytus (F.). Proc. R. Entomol. Soc. Lond. 41, 145–154. doi:10.1111/j.1365-3032.1966.tb00334.x

Bickel, D. J. (1991). Sciapodinae, Medeterinae (Insecta: Diptera) with a generic review of the Dolichopodidae. Auckland, New Zealand: DSIR Plant Protection/Te Wāhanga Manaaki Tupu.

Bickel, D. J. (2009). “Family Dolichopodidae,” in Manual of Central American Diptera. Editors B. V. Brown, A. Borkent, J. M. Cumming, D. M. Wood, N. E. Woodley, and M. Zumbado (Ottawa: NRC Press), 671–694.

Brown, B. V., Borkent, A., Adler, P. H., Amorim, D. de S., Barber, K., Bickel, D., et al. (2018). Comprehensive inventory of true flies (Diptera) at a tropical site. Commun. Biol. 1, 21. doi:10.1038/s42003-018-0022-x

Buschbeck, E. K., and Strausfeld, N. J. (1996). Visual motion-detection circuits in flies: small-field retinotopic elements responding to motion are evolutionarily conserved across taxa. J. Neurosci. 16, 4563–4578. doi:10.1523/JNEUROSCI.16-15-04563.1996

Cheng, H., Jarvis, E. D., Fedrigo, O., Koepfli, K.-P., Urban, L., Gemmell, N. J., et al. (2022). Haplotype-resolved assembly of diploid genomes without parental data. Nat. Biotechnol. 40, 1332–1335. doi:10.1038/s41587-022-01261-x

Dennis, A. B., Ballesteros, G. I., Robin, S., Schrader, L., Bast, J., Berghöfer, J., et al. (2020). Functional insights from the GC-poor genomes of two aphid parasitoids, Aphidius ervi and Lysiphlebus fabarum. BMC Genomics 21, 376. doi:10.1186/s12864-020-6764-0

Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., et al. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29 (1), 15–21. doi:10.1093/bioinformatics/bts635

Ebadi, H., Perry, M., Short, K., Klemm, K., Desplan, C., Stadler, P. F., et al. (2018). Patterning the insect eye: from stochastic to deterministic mechanisms. PLoS Comput. Biol. 14, e1006363. doi:10.1371/journal.pcbi.1006363

Gelbič, I., and Olejníček, J. (2011). Ecology of Dolichopodidae (Diptera) in a wetland habitat and their potential role as bioindicators. Cent. Eur. J. Biol. 6, 118–129. doi:10.2478/s11535-010-0098-x

Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., et al. (2011). Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 29 (7), 644–652. doi:10.1038/nbt.1883

Heinloth, T., Uhlhorn, J., and Wernet, M. F. (2018). Insect responses to linearly polarized reflections: orphan behaviors without neural circuits. Front. Cell. Neurosci. 12, 50. doi:10.3389/fncel.2018.00050

Johnston, R. J. (2013). Lessons about terminal differentiation from the specification of color-detecting photoreceptors in the Drosophila retina. Ann. N.Y. Acad. Sci. 1293, 33–44. doi:10.1111/nyas.12178

Kautz, A. R., and Gardiner, M. M. (2019). Agricultural intensification may create an attractive sink for Dolichopodidae, a ubiquitous but understudied predatory fly family. J. Insect Conserv. 23, 453–465. doi:10.1007/s10841-018-0116-2

Laing, J. E., and Welch, H. E. (1963). A dolichopodid predacious on larvae of Culex restuans Theob. Proc. Entomol. Soc. Ont. 93, 89–90.

Laslett, D., and Canbäck, B. (2008). ARWEN: a program to detect tRNA genes in metazoan mitochondrial nucleotide sequences. Bioinformatics 24, 172–175. doi:10.1093/bioinformatics/btm573

McCulloch, G. A., and Waters, J. M. (2023). Rapid adaptation in a fast-changing world: emerging insights from insect genomics. Glob. Chang. Biol. 29, 943–954. doi:10.1111/gcb.16512

Naglis, S., and Bickel, D. J. (2017). “Order Diptera, family Dolichopodidae. Subfamily Sciapodinae,” in Arthropod fauna of the UAE. Editor A. van Harten (Abu Dhabi: Dar Al Ummah), 565–571.

Perry, M., Kinoshita, M., Saldi, G., Huo, L., Arikawa, K., and Desplan, C. (2016). Molecular logic behind the three-way stochastic choices that expand butterfly colour vision. Nature 535, 280–284. doi:10.1038/nature18616

Pollet, M. (2010). “Diptera as ecological indicators of habitat and habitat change,” in Diptera diversity: status, challenges and tools (Leiden: Brill), 302–322.

Pollet, M., Leponce, M., Pascal, O., Touroult, J., and Van Calster, H. (2018). Dipterological survey in Mitaraka Massif (French Guiana) reveals megadiverse dolichopodid fauna with an unprecedented species richness in Paraclius Loew, 1864 (Diptera: Dolichopodidae). Zoosystema 40, 471–491. doi:10.5252/zoosystema2018v40a21

Riccardi, P. R., Fachin, D. A., Ale-Rocha, R., Amaral, E. M., Amorim, D. de S., Gil-Azevedo, L. H., et al. (2022). Checklist of the dipterofauna (Insecta) from Roraima, Brazil, with special reference to the Brazilian Ecological Station of Maracá. Pap. Avulsos Zool. 62, e202262014. doi:10.11606/1807-0205/2022.62.014

Runyon, J. B. (2020). The Dolichopodidae (Diptera) of Montserrat, West Indies. Zookeys 966, 57–151. doi:10.3897/zookeys.966.55192

Stavenga, D. G., Meglič, A., Pirih, P., Koshitaka, H., Arikawa, K., Wehling, M. F., et al. (2017). Photoreceptor spectral tuning by colorful, multilayered facet lenses in long-legged fly eyes (Dolichopodidae). J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 203, 23–33. doi:10.1007/s00359-016-1131-y

Strope, P., Sweeney, D., and Holmes, B. (2023). The NCBI foreign contamination screen. Available at: https://github.com/ncbi/fcs.

Thibaud-Nissen, F., Souvorov, A., Murphy, T., DiCuccio, M., and Kitts, P. (2013). “Eukaryotic genome annotation pipeline,” in The NCBI handbook. 2nd edition (Bethesda, MD: National Center for Biotechnology Information).

Trujillo-Cenóz, O., and Bernard, G. D. (1972). Some aspects of the retinal organization of Sympycnus lineatus Loew (Diptera, Dolichopodidae). J. Ultrastruct. Res. 38, 149–160. doi:10.1016/s0022-5320(72)90089-5

Uliano-Silva, M., Ferreira, J. G. R., Krasheninnikova, K., Torrance, J., Formenti, G., Abueg, L., et al. (2023). MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio High Fidelity reads. bioRxiv, 2022.12.23.521667. doi:10.1101/2022.12.23.521667

Vicoso, B., and Bachtrog, D. (2015). Numerous transitions of sex chromosomes in Diptera. PLoS Biol. 13, e1002078. doi:10.1371/journal.pbio.1002078

Wernet, M. F., Perry, M. W., and Desplan, C. (2015). The evolutionary diversity of insect retinal mosaics: common design principles and emerging molecular logic. Trends Genet. 31, 316–328. doi:10.1016/j.tig.2015.04.006

Zimmer, M., Diestelhorst, O., and Lunau, K. (2003). Courtship in long-legged flies (Diptera: Dolichopodidae): function and evolution of signals. Behav. Ecol. 14, 526–530. doi:10.1093/beheco/arg028

Keywords: Condylostylus longicornis, Dolichopodidae, long-legged flies, genome, transcriptome, Hawaiʻi

Citation: Sieriebriennikov B, Porter ML, Mlejnek J, Short K, Lebhardt F, Holguera I, Desplan C and Perry MW (2023) Whole genome sequence of a long-legged fly Condylostylus longicornis from Hawaiʻi. Front. Genet. 14:1325213. doi: 10.3389/fgene.2023.1325213

Received: 20 October 2023; Accepted: 30 November 2023;
Published: 11 December 2023.

Edited by:

Luca Ermini, Luxembourg Institute of Health, Luxembourg

Reviewed by:

Taro Nakamura, Graduate University for Advanced Studies (Sokendai), Japan
Pedro Andrade, Centro de Investigacao em Biodiversidade e Recursos Geneticos (CIBIO-InBIO), Portugal

Copyright © 2023 Sieriebriennikov, Porter, Mlejnek, Short, Lebhardt, Holguera, Desplan and Perry. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Bogdan Sieriebriennikov, bs167@nyu.edu; Claude Desplan, cd38@nyu.edu; Michael W. Perry, mwperry@ucsd.edu
