Trends in Genetics
Review
Bioinformatics challenges of new sequencing technology
New technologies: more data and new types of data
The ongoing revolution in sequencing technology has led to the production of sequencing machines with dramatically lower costs and higher throughput than the technology of just 2 years ago. Sequencers from 454 Life Sciences/Roche, Solexa/Illumina and Applied Biosystems (SOLiD technology) are already in production, and a competing technology from Helicos should appear soon. However, the increase in the volume of raw sequence that can be produced from these sequencers is threatening to swamp our
Sequence assembly using SRS technology
The development of automated sequencing technologies has revolutionized biological research by allowing scientists to decode the genomes of many organisms. Short read sequencing (SRS) technologies can accelerate the pace at which we explore the natural world, yet they pose new challenges to the software tools used to reconstruct genetic information from the raw data produced by sequencing machines.
Genome resequencing
The near completion of a reference human genome has greatly accelerated research on genetic diversity within our species. Resequencing efforts have thus far targeted individual genes or other genomic regions of interest [3], but advances in SRS technologies have opened up the possibility of whole genome resequencing. The resequencing of multiple strains of several model organisms (e.g. Drosophila melanogaster and Caenorhabditis elegans) and the large-scale resequencing of human cancers are
Assembly of closely related species – mind the gap
Genome scientists have sequenced and assembled the human genome, most model organisms, and almost all major human pathogens to high degrees of accuracy. Many of these genomes – particularly the bacterial and viral species – have been finished, meaning that all chromosomes are sequenced end-to-end with no gaps. Almost as soon as the first genome from each species was published, scientists started to make plans to sequence additional strains and isolates. The dramatically lower cost of sequencing
De novo assembly
Despite a dramatic increase in the number of complete genome sequences available in public databases, the vast majority of the biological diversity in our world remains unexplored. SRS technologies have the potential to significantly accelerate the sequencing of new organisms. De novo assembly of SRS data, however, will require the development of new software tools that can overcome the technical limitations of these technologies. An overview of genome assembly is provided in Box 1.
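To make the assembly problem concrete, the following is a minimal sketch of greedy overlap-based assembly on error-free toy reads. It is an illustration only, not the method of any tool discussed in this review: the function names `overlap` and `greedy_assemble` and the `min_len` threshold are assumptions introduced here, and real short-read assemblers must also handle sequencing errors, repeats and both DNA strands.

```python
def overlap(a, b, min_len):
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_len` long; otherwise return 0."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Toy assumptions (hypothetical, for illustration): reads are error-free,
    come from one strand, and the genome has no repeats longer than min_len.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    # Four 6-base reads tiling a 12-base toy sequence with 4-base overlaps.
    print(greedy_assemble(["ACGTTG", "GTTGCA", "TGCAAG", "CAAGCT"]))
    # Shorter reads overlap by only 2 bases, below min_len, so they stay apart.
    print(greedy_assemble(["ACGT", "GTTG"]))
```

The second call hints at why read length matters: when overlaps fall below the threshold needed to trust them, the assembly stays fragmented.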
Studies by
Annotation of genomes sequenced with SRS technology
The highly fragmented assemblies resulting from SRS projects present several problems for genome annotation. The use of SRS technology is so new that few methods have been published describing how current annotation methods can be adapted to account for the various types of sequencing errors that might be present in a genome sequenced with the newer technology.
We can expect that the annotation of genomes sequenced by the new technologies will be reasonably accurate for genes that are found in
Sequencing of transcripts and regulatory elements
The sequencing of transcribed gene products [expressed sequence tags (ESTs)] has long been a vital tool for the characterization of genes in the human genome and other species. EST sequencing also has an important role in the characterization of splice variants and the identification of regulatory signals in a genome—tasks that are not effectively performed through computational means alone. Transcriptome and regulome sequencing projects have been, perhaps, the most successful application of
Annotation of metagenomics projects
One of the most promising applications of SRS technologies is sequencing of environmental samples, also known as metagenomics. In these projects, DNA is purified from an environment such as soil, water or part of the human body, and the mixture of species is sequenced using a random shotgun technique. The resulting reads might originate from hundreds or even thousands of different species, presenting a much greater assembly challenge than a single genome sequencing project.
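The random shotgun sampling described above can be sketched as a small simulation. This is a toy model under stated assumptions, not a description of any real sequencing pipeline: the function `shotgun_sample`, the species names and the abundance weights are all hypothetical, and reads are error-free substrings of fixed length.

```python
import random


def shotgun_sample(genomes, abundances, n_reads, read_len, seed=0):
    """Draw fixed-length reads at random from a mixture of genomes.

    genomes:    dict mapping a (hypothetical) species name to its sequence
    abundances: dict mapping the same names to relative abundance weights
    Returns a list of (species_name, read) pairs. In a real metagenomics
    project the species label is unknown; it is kept here only for checking.
    """
    for name, seq in genomes.items():
        if len(seq) < read_len:
            raise ValueError(f"genome {name!r} is shorter than read_len")
    rng = random.Random(seed)  # seeded for reproducibility
    names = list(genomes)
    weights = [abundances[name] for name in names]
    reads = []
    for _ in range(n_reads):
        # Pick a source species in proportion to its abundance,
        # then pick a uniformly random start position in its genome.
        name = rng.choices(names, weights=weights, k=1)[0]
        genome = genomes[name]
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((name, genome[start:start + read_len]))
    return reads


if __name__ == "__main__":
    toy_genomes = {
        "species_A": "ACGTTGCAAGCTTAGGCTA",
        "species_B": "TTGACCGATAGGCATCGAT",
    }
    toy_abundances = {"species_A": 9, "species_B": 1}
    for name, read in shotgun_sample(toy_genomes, toy_abundances, 5, 6):
        print(name, read)
```

Because rare species contribute few reads, their genomes receive low coverage, which is one reason a mixed sample is harder to assemble than a single genome.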
Currently,
Concluding remarks
Fifteen years of research have shown that, for DNA sequencing technology, longer is better, especially where genome assembly is involved. Someday, perhaps, we will be able to isolate a single chromosome and read it end to end, eliminating the assembly step entirely. At present, however, new short read sequencing (SRS) technologies can sequence so rapidly and so cheaply that SRS is clearly here to stay. Despite their limitations, these still-evolving technologies can replace Sanger
References (52)
The impact of next generation sequencing technology on genetics. Trends Genet. (2008)
Basic local alignment search tool. J. Mol. Biol. (1990)
High-resolution profiling of histone methylations in the human genome. Cell (2007)
A tool for analyzing and annotating genomic sequences. Genomics (1997)
Celebrity genomes alarm researchers. Nature (2007)
Efficient high-throughput resequencing of genomic DNA. Genome Res. (2003)
BLAT – the BLAST-like alignment tool. Genome Res. (2002)
Multigene amplification and massively parallel sequencing for cancer mutation discovery. Proc. Natl. Acad. Sci. U. S. A. (2007)
Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol. (2007)
PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data. BMC Bioinform. (2006)
Paired-end mapping reveals extensive structural variation in the human genome. Science
Microbial diversity in the deep sea and the underexplored ‘rare biosphere’. Proc. Natl. Acad. Sci. U. S. A.
Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol.
Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biol.
PolyPhred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Res.
PolyScan: an automatic indel and SNP detection approach to the analysis of human resequencing data. Genome Res.
Tuberculosis. Few mutations divide some drug-resistant TB strains. Science
An analysis of the feasibility of short read sequencing. Nucleic Acids Res.
Comparative genome assembly. Brief. Bioinform.
Fragment assembly with short reads. Bioinformatics
Whole-genome sequencing and assembly with high-throughput, short-read technologies. PLoS ONE
Genome sequence of a clinical isolate of Campylobacter jejuni from Thailand. Infect. Immun.
Finding novel genes in bacterial communities isolated from the environment. Bioinformatics
Assembling millions of short DNA sequences using SSAKE. Bioinformatics
Extending assembly of short DNA sequences to handle error. Bioinformatics
SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res.