Single nucleotide polymorphisms identification in expressed genes of Schistosoma mansoni

https://doi.org/10.1016/j.molbiopara.2007.04.003

Abstract

Single nucleotide polymorphism (SNP) markers have been shown to be useful in genetic investigations of medically important parasites and their hosts. In this paper, we describe the prediction and validation of SNPs in expressed sequence tags (ESTs) of Schistosoma mansoni. We used 107,417 public sequences of S. mansoni and identified 15,614 high-quality candidate SNPs in 12,184 contigs. Predicted SNPs were observed in genes coding for well-characterized antigens and vaccine candidates such as myosin, Sm14, Sm23, cathepsin B and triosephosphate isomerase (TPI). Additionally, SNPs were experimentally validated for cathepsin B. A comparative model of the S. mansoni cathepsin B was built to predict the possible consequences of amino acid substitutions on the protein structure. An analysis of the substitutions indicated that the substituted amino acids were mostly located on the surface of the molecule, and we found no evidence for a significant conformational change of the enzyme. However, at least one of the substitutions could result in a structural modification of an epitope.

Introduction

The availability of genome sequences and a large number of transcriptome sequencing initiatives open new doors for the discovery of a class of polymorphic molecular markers called single nucleotide polymorphisms (SNPs). SNPs are the most abundant type of genetic variation between individuals and can provide information about phenotypic differences. Owing to their high density, the exploitation of SNPs for marker assays has the potential to provide answers to a large number of important biological, genetic, pharmacological and medical questions [1]. Identifying polymorphisms related to disease predisposition and drug response is a major aim of the post-genomic era.

Many of the recent efforts to describe the genomes of organisms focus on the generation of expressed sequence tags (ESTs) by partial sequencing of cDNAs. ESTs have been extensively used for gene discovery, expression analysis and transcript mapping of genes from a wide variety of organisms, including Schistosoma mansoni [2]. The transcriptome, however, lacks information on regulatory sequences, intergenic regions and introns. Currently, in-depth information on genetic variation in schistosomes is obtained with polymorphic microsatellite markers, generally located in non-coding regions [3]. In contrast, SNPs have been identified directly in coding regions (cSNPs) with a software-based approach using large, redundant EST data sets [4], [5], [6]. Nevertheless, up to the present investigation, such molecular markers had not been studied on a large scale in S. mansoni. Therefore, in this study we focused on SNPs in gene coding regions of S. mansoni.

Schistosomiasis remains a major public health problem in Africa, Asia and parts of South America, despite strenuous efforts to control its impact on human populations. The disease is caused by digenetic blood trematodes, with S. mansoni being the only human-infecting species in South America and one of the two most relevant species in Africa. Disease control efforts are mainly based on mass chemotherapy, as there is no available vaccine [7]. The study of genetic variation in S. mansoni parasites has practical significance for developing additional strategies to control the disease. This information could be used for the study of transmission dynamics (as genetic markers) or for observing the variability of antigens and drug targets [8], [9]. In this study, we developed an automated pipeline to detect SNPs in silico in ESTs of S. mansoni using high-quality sequences and alignment parameters. Furthermore, we observed the predicted SNPs in vaccine target candidates, validated putative SNPs in the cathepsin B gene and analyzed model variant proteins for possible conformational modifications. Detailed experimental outcomes, including SNP information and EST assemblies, are available at http://bioinfo.cpqrr.fiocruz.br/snp.

Sequence data sets and polymorphism identification

We used public expressed sequence tags (ESTs) generated by Verjovski-Almeida et al. [2], together with the base quality information obtained with Phred, downloaded from the web site mentioned in that manuscript [10], [11]. The sequences were assembled into contigs using CAP3 [12].
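For readers unfamiliar with Phred base qualities, the short sketch below shows the standard relationship between a quality score Q and the estimated base-calling error probability, P = 10^(-Q/10). The function names are illustrative and are not taken from the authors' pipeline.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the error probability for a few typical quality values.
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_prob(q))
```

A quality of 20 therefore corresponds to roughly one expected error per 100 base calls, which is why quality thresholds matter when separating true polymorphisms from sequencing errors.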

To automate the process of SNP prediction, we developed cSNPer, a new program to detect SNPs. cSNPer reads the ACE file generated by CAP3 to identify candidate SNPs. To calculate a Neighborhood Quality Standard (NQS), the
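The general idea of a neighborhood quality filter can be sketched as follows. This is a minimal illustration, not the cSNPer implementation: the thresholds (`MIN_SITE_Q`, `MIN_FLANK_Q`, `FLANK`, `MIN_READS_PER_ALLELE`), the function names and the input layout are all assumptions, since this excerpt does not state the parameters the authors used. A read supports an allele only if the base at the candidate position and its flanking bases all have sufficient Phred quality, and a position is reported only if at least two alleles are each supported by enough such reads.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical thresholds chosen for illustration only.
MIN_SITE_Q = 20           # minimum Phred quality at the candidate position
MIN_FLANK_Q = 15          # minimum Phred quality at each flanking base
FLANK = 5                 # flanking bases checked on each side
MIN_READS_PER_ALLELE = 2  # reads required to support each allele

# One entry per read covering the contig position:
# (base called in the read, read quality values, index of the base in the read)
ReadBase = Tuple[str, List[int], int]


def passes_nqs(quals: List[int], pos: int) -> bool:
    """True if the base at pos and its neighbors meet the assumed thresholds."""
    if pos - FLANK < 0 or pos + FLANK >= len(quals):
        return False  # not enough flanking sequence to judge the neighborhood
    if quals[pos] < MIN_SITE_Q:
        return False
    neighbors = quals[pos - FLANK:pos] + quals[pos + 1:pos + FLANK + 1]
    return all(q >= MIN_FLANK_Q for q in neighbors)


def call_candidate_snp(column: List[ReadBase]) -> Optional[List[str]]:
    """Return the sorted alleles of a candidate SNP, or None if not supported."""
    counts = Counter(base for base, quals, pos in column if passes_nqs(quals, pos))
    alleles = sorted(a for a, n in counts.items() if n >= MIN_READS_PER_ALLELE)
    return alleles if len(alleles) >= 2 else None


if __name__ == "__main__":
    good = [30] * 11                   # uniformly high-quality read
    weak = [30] * 5 + [10] + [30] * 5  # low quality at the candidate base
    print(call_candidate_snp([("A", good, 5), ("A", good, 5),
                              ("G", good, 5), ("G", good, 5)]))
    print(call_candidate_snp([("A", good, 5), ("A", good, 5),
                              ("G", good, 5), ("G", weak, 5)]))
```

In a real pipeline the column of reads would be extracted from the ACE assembly; here it is passed in directly to keep the sketch self-contained. Requiring more than one high-quality read per allele is a design choice that trades sensitivity for fewer false positives caused by isolated sequencing errors.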

SNP identification

A large number of studies have focused on investigating genetic polymorphisms in individual genes in order to estimate the genetic contribution to disease outcome. ESTs have been used to mine SNPs in several model organisms and, to a limited extent, in parasites [20], [21]. In this paper, we describe the use of 107,417 ESTs from cDNA libraries of different stages of S. mansoni, representing the near-complete transcriptome of the parasite, to identify SNPs. A summary of the results is in

Acknowledgements

This work was partially funded by NIH-Fogarty training grant 5D43TW007012-03 and FAPEMIG grants 17001/01 and 407/02 to G.O. M.S. received financial support from an NIH-Fogarty training grant (5D43TW006580-05) and from CNPq.

References (36)

  • N. Rodrigues et al.

    Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries

    Mem Inst Oswaldo Cruz

    (2002)
  • D. Somers et al.

    Mining single-nucleotide polymorphisms from hexaploid wheat ESTs

    Genome

    (2003)
  • M. Nelson et al.

    Large-scale validation of single nucleotide polymorphisms in gene regions

    Genome Res

    (2004)
  • V. Guryev et al.

    Single nucleotide polymorphisms associated with rat expressed sequences

    Genome Res

    (2004)
  • B. Vennervald et al.

    Morbidity in schistosomiasis: an update

    Curr Opin Infect Dis

    (2004)
  • D.C. Criscione et al.

    Molecular ecology of parasites: elucidating ecological and microevolutionary processes

    Mol Ecol

    (2005)
  • B. Ewing et al.

    Basecalling of automated sequencer traces using phred. I. Accuracy assessment

    Genome Res

    (1998)
  • B. Ewing et al.

    Base calling of automated sequencer traces using phred. II. Error probabilities

    Genome Res

    (1998)