Single nucleotide polymorphisms identification in expressed genes of Schistosoma mansoni
Introduction
The availability of genome sequences and a large number of transcriptome sequencing initiatives open new doors for the discovery of a class of polymorphic molecular markers called single nucleotide polymorphisms (SNPs). SNPs are the most abundant type of genetic variation between individuals and can provide information about phenotypic differences. Owing to their high density, the exploitation of SNPs for marker assays has the potential to provide answers to a large number of important biological, genetic, pharmacological and medical questions [1]. Identifying polymorphisms related to disease predisposition and drug response is a major aim of the post-genomic era.
Many of the recent efforts to describe the genomes of organisms focus on the generation of expressed sequence tags (ESTs) by partial sequencing of cDNAs. ESTs have been extensively used for gene discovery, expression analysis and transcript mapping of genes from a wide variety of organisms, including Schistosoma mansoni [2]. The transcriptome, however, lacks information on regulatory sequences, intergenic regions and introns. Currently, in-depth information on genetic variation in schistosomes is obtained with polymorphic microsatellite markers, generally located in non-coding regions [3]. In contrast, SNPs have been identified directly in coding regions (cSNPs) with a software-based approach using large, redundant EST data sets [4], [5], [6]. Nevertheless, up to the present investigation, such molecular markers had not been studied on a large scale in S. mansoni. Therefore, in this study we focused on SNPs in gene coding regions of S. mansoni.
Schistosomiasis remains a major public health problem in Africa, Asia and parts of South America, despite strenuous efforts to control its impact on human populations. The disease is caused by digenetic blood trematodes, with S. mansoni being the only human-infecting species in South America and one of the two most relevant species in Africa. Disease control efforts are mainly based on mass chemotherapy, as there is no available vaccine [7]. The study of genetic variation in S. mansoni parasites has practical significance for developing additional strategies to control the disease. This information could be used for the study of transmission dynamics (as genetic markers) or for observing the variability of antigens and drug targets [8], [9]. In this study, we developed an automated pipeline to detect SNPs in silico in ESTs of S. mansoni using high-quality sequences and alignment parameters. Furthermore, we observed the predicted SNPs in vaccine target candidates, validated putative SNPs in the cathepsin B gene and analyzed model variant proteins for possible conformational modifications. Detailed experimental results, including SNP information and EST assemblies, are available at http://bioinfo.cpqrr.fiocruz.br/snp.
Sequence data sets and polymorphism identification
We used public expressed sequence tags (ESTs) generated by Verjovski-Almeida et al. [2], together with the base quality information obtained with Phred [10], [11], downloaded from the web site mentioned in that manuscript. The sequences were assembled into contigs using CAP3 [12].
To automate the process of SNP prediction, we developed cSNPer, a new program to detect SNPs. cSNPer reads the ACE file generated by CAP3 to identify candidate SNPs. To calculate a Neighborhood Quality Standard (NQS), the
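The idea behind a neighborhood quality check can be illustrated with a minimal Python sketch. It is not cSNPer's implementation: the function names are hypothetical, and the thresholds (candidate base quality of at least 20, five flanking bases on each side with quality of at least 15) are assumptions chosen for illustration, since the parameters used in the study are not given in this excerpt. The only fixed relationship used is the standard Phred definition, where a quality score Q corresponds to an error probability of 10^(-Q/10).

```python
from typing import Sequence

# Illustrative thresholds only; assumptions, not values taken from the paper.
MIN_SNP_QUALITY = 20    # minimum Phred score at the candidate position
MIN_FLANK_QUALITY = 15  # minimum Phred score for each flanking base
FLANK_SIZE = 5          # number of flanking bases checked on each side


def phred_error_probability(q: int) -> float:
    """Convert a Phred quality score to a base-calling error probability."""
    return 10 ** (-q / 10)


def passes_nqs(
    qualities: Sequence[int],
    pos: int,
    min_snp_q: int = MIN_SNP_QUALITY,
    min_flank_q: int = MIN_FLANK_QUALITY,
    flank: int = FLANK_SIZE,
) -> bool:
    """Return True if the base at `pos` and its neighbors meet the thresholds.

    `qualities` holds the Phred scores of one read; `pos` is a 0-based index.
    """
    # Reject positions too close to either end to have a full neighborhood.
    if pos - flank < 0 or pos + flank >= len(qualities):
        return False
    # The candidate base itself must be of high quality.
    if qualities[pos] < min_snp_q:
        return False
    # Every flanking base on both sides must reach the lower threshold.
    left = qualities[pos - flank:pos]
    right = qualities[pos + 1:pos + flank + 1]
    return all(q >= min_flank_q for q in list(left) + list(right))
```

In a pipeline of this kind, such a check would be applied to every read covering a mismatching column of a contig alignment, so that only mismatches supported by high-quality base calls are reported as candidate SNPs.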
SNP identification
A large number of studies have focused on investigating genetic polymorphisms in individual genes in order to estimate the genetic contribution to disease outcome. ESTs have been used to mine SNPs in several model organisms and, to a limited extent, in parasites [20], [21]. In this paper, we describe the use of 107,417 ESTs from cDNA libraries of different stages of S. mansoni, representing the near-complete transcriptome of the parasite, to identify SNPs. A summary of the results is in
Acknowledgements
This work was partially funded by NIH-Fogarty training grant 5D43TW007012-03 and FAPEMIG grants 17001/01 and 407/02 to G.O. M.S. received financial support from an NIH-Fogarty training grant (5D43TW006580-05) and CNPq.
References (36)
- et al. Extreme geographical fixation of variation in the Plasmodium falciparum gamete surface protein gene Pfs48/45 compared with microsatellite loci. Mol Biochem Parasitol (2001)
- et al. Comparative protein modeling by satisfaction of spatial restraints. J Mol Biol (1993)
- et al. Crystal structure of the wild-type human procathepsin B at 2.5 Å resolution reveals the native active site of a papain-like cysteine protease zymogen. J Mol Biol (1997)
- et al. Mining single nucleotide polymorphisms from EST data of silkworm, Bombyx mori, inbred strain Dazao. Insect Biochem Mol Biol (2004)
- et al. Gene polymorphisms of Plasmodium falciparum merozoite surface protein 4 and 5. Mol Biochem Parasitol (2005)
- et al. Functional expression and characterization of Schistosoma mansoni cathepsin B and its trans-activation by an endogenous asparaginyl endopeptidase. Mol Biochem Parasitol (2003)
- et al. Long-term suppression of cathepsin B levels by RNA interference retards schistosome growth. Mol Biochem Parasitol (2005)
- et al. Populational structure of Schistosoma mansoni assessed by DNA microsatellites. Int J Parasitol (2002)
- et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature (2001)
- et al. Transcriptome analysis of the acoelomate human parasite Schistosoma mansoni. Nat Genet (2003)
- Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries. Mem Inst Oswaldo Cruz
- Mining single-nucleotide polymorphisms from hexaploid wheat ESTs. Genome
- Large-scale validation of single nucleotide polymorphisms in gene regions. Genome Res
- Single nucleotide polymorphisms associated with rat expressed sequences. Genome Res
- Morbidity in schistosomiasis: an update. Curr Opin Infect Dis
- Molecular ecology of parasites: elucidating ecological and microevolutionary processes. Mol Ecol
- Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res
- Base calling of automated sequencer traces using phred. II. Error probabilities. Genome Res