Trends in Plant Science
Expressed sequence tags: alternative or complement to whole genome sequences?
Contemporary uses of ESTs
Plant genome sizes extend over at least four orders of magnitude. Arabidopsis and Oryza sativa (rice), our model plants with fully sequenced genomes, have among the smallest known genomes: 125 Mbp and 430 Mbp, respectively. Tomato and maize have genomes of ∼950 Mbp [6] and ∼2670 Mbp, respectively, and cycad and wheat of ∼14 000 Mbp and ∼17 000 Mbp. The largest known genomes are currently those of Fritillaria assyriaca (125 000 Mbp) and Psilotum nudum
EST sequence availability and biodiversity
Counting the latest release of the EMBL sequence database [18] and the weekly updates to the EST database (ftp://ftp.ebi.ac.uk/pub/databases/embl/new/), ∼16.1 million ESTs were publicly available as of 14 April 2003. Of these, over 3.1 million are from plant species; together they account for 1550 Mbp of sequence and represent almost 200 species. Table 1 lists the plant species with the most available ESTs, ranked by EST count.
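The totals quoted above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the text: dividing 1550 Mbp by 3.1 million reads gives a mean plant EST length of about 500 bp, and plant ESTs make up roughly 19% of all public ESTs. Both are averages derived from rounded totals, not measured length distributions.

```python
# Figures quoted in the text (public ESTs as of 14 April 2003).
total_ests = 16.1e6      # all public ESTs
plant_ests = 3.1e6       # ESTs from plant species
plant_bases_mbp = 1550   # total plant EST sequence, in Mbp

# Share of all public ESTs that come from plants.
plant_fraction = plant_ests / total_ests
# 1 Mbp = 1e6 bp, so mean read length (bp) = total bp / number of reads.
mean_length_bp = plant_bases_mbp * 1e6 / plant_ests

print(f"plant share of public ESTs: {plant_fraction:.1%}")
print(f"mean plant EST length: {mean_length_bp:.0f} bp")
```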
When we consider the overall biodiversity represented
ESTs and their limitations
There are two main problems associated with EST sequences: (1) the overall representation of host genes within a library and (2) the overall quality of any individual sequence within a collection.
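For the second problem, per-sequence quality, base callers such as phred (see the error-probability paper in the reference list) attach a quality Q to each base, where Q = -10 log10 P and P is the probability that the call is wrong. The sketch below assumes such per-base scores are available and keeps the highest-scoring contiguous window of a read, the maximal-scoring-segment idea behind Mott-style trimming. The 0.05 cutoff and the function names are illustrative assumptions, not values taken from the source.

```python
from typing import Sequence, Tuple


def phred_to_error(q: int) -> float:
    """A Phred quality Q corresponds to an error probability P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def trim_low_quality(quals: Sequence[int], cutoff: float = 0.05) -> Tuple[int, int]:
    """Return the half-open [start, end) window of the best-scoring segment.

    Each base scores (cutoff - error probability): positive for reliable
    bases, negative for unreliable ones. The highest-scoring contiguous
    segment (a Kadane-style maximum-subarray scan) is kept, so low-quality
    stretches at either end of the read are dropped. The cutoff is an
    illustrative assumption.
    """
    best_score, best_start, best_end = 0.0, 0, 0
    running, run_start = 0.0, 0
    for i, q in enumerate(quals):
        if running <= 0:
            # A non-positive running total cannot help; restart the segment here.
            running, run_start = 0.0, i
        running += cutoff - phred_to_error(q)
        if running > best_score:
            best_score, best_start, best_end = running, run_start, i + 1
    return best_start, best_end
```

A read whose every base is worse than the cutoff returns an empty window, `(0, 0)`, which a pipeline could treat as a rejected sequence.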
Bioinformatics of plant EST collections
Bioinformatics-based sequence resources have been developed that address the quality, redundancy and partial nature of EST sequences. Sequence resources such as the dbEST database [4] and the EMBL database [18] archive all the available ESTs and provide methods to search for individual sequences on the basis of species, clone or homology attributes. However, these searches are limited to the sequence features that are supplied when the sequence is submitted.
A range of plant-specific EST
ESTs as a current alternative to complete genomes
Within the field of ‘reconstructomics’ [30], ESTs have been widely used as the foundation sequences for a number of genome-scale analyses. Such reconstructomic analyses treat EST cluster assemblies and singletons as equivalent to the gene collection of a whole genome. EST-derived cluster sequences have been widely annotated with tentative functions. Sources of functional annotation have included non-redundant protein databases [31], the Arabidopsis genome annotation [6] and catalogues of functionally
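The text does not say how cluster assemblies and singletons are derived. As a rough illustration only, the sketch below groups ESTs by single-linkage clustering on shared k-mers: any two ESTs that share a k-mer end up in the same cluster, and ESTs linked to nothing remain singletons. K-mers are compared on both strands because ESTs can be read from either end of a cDNA clone. The k-mer length and all names are assumptions; real clustering pipelines use stricter overlap criteria and also assemble each cluster into a consensus sequence.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Strand-independent k-mers: each k-mer or its reverse complement, whichever sorts first."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, rc))
    return kmers


def cluster_ests(ests: Dict[str, str], k: int = 12) -> Tuple[List[List[str]], List[str]]:
    """Single-linkage clustering: ESTs sharing any canonical k-mer are merged (union-find)."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    owner: Dict[str, str] = {}  # first EST seen carrying each k-mer
    for name, seq in ests.items():
        for kmer in canonical_kmers(seq, k):
            if kmer in owner:
                parent[find(name)] = find(owner[kmer])  # merge the two groups
            else:
                owner[kmer] = name

    groups: Dict[str, List[str]] = defaultdict(list)
    for name in ests:
        groups[find(name)].append(name)
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]
    singletons = [g[0] for g in groups.values() if len(g) == 1]
    return sorted(clusters), sorted(singletons)
```

Single linkage is chosen here for brevity; it can chain unrelated genes together through shared repeats, which is one reason production tools apply more stringent criteria.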
ESTs as a complement to complete genomes
Complete genome sequences have been produced for Arabidopsis [33] and rice [34,38]. Complete genome scaffolds for Zea mays, Medicago truncatula, Brassica napus and Populus are either being sequenced or in preparation, and other plant genomes will follow. ESTs really come into their own when we are presented with a new complete genome sequence and want to start annotating genes on its chromosomes. Although the underlying methods and science required for the detection and
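At its core, placing an EST on a genome sequence is spliced alignment: sequence that is contiguous in the EST is split across exons separated by introns in the genomic DNA. The sketch below is a deliberately naive, hypothetical version that greedily chains exact matches, so gaps between consecutive genomic blocks mark candidate introns. It allows no mismatches and does not score splice sites, both of which real annotation pipelines must handle.

```python
from typing import List, Tuple

Block = Tuple[int, int, int]  # (est_start, genome_start, length)


def map_est_blocks(est: str, genome: str, min_block: int = 8) -> List[Block]:
    """Greedy toy spliced alignment of an EST to genomic DNA using exact matches.

    Starting at the EST's 5' end, take the longest EST prefix (at least
    min_block bases) that occurs in the genome downstream of the previous
    block. Gaps between consecutive genomic blocks are candidate introns.
    """
    blocks: List[Block] = []
    e, g_from = 0, 0
    while e < len(est):
        length, hit = 0, -1
        # Extend the query while it still occurs downstream in the genome.
        while e + length < len(est):
            pos = genome.find(est[e:e + length + 1], g_from)
            if pos < 0:
                break
            length, hit = length + 1, pos
        if length < min_block:
            break  # no convincing exact match left; stop mapping
        blocks.append((e, hit, length))
        e += length
        g_from = hit + length
    return blocks
```

Because it always takes the longest match, this greedy rule can shift an exon boundary when the first bases of an intron happen to match the start of the next exon; proper spliced aligners resolve such ties using splice-site signals such as the GT...AG dinucleotides.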
New tricks with old sequences
Only recently have plant biologists taken these vast EST datasets in hand and begun a concerted effort to mine the data for novel attributes, annotate the sequences de novo, use them within proteomics-based analysis pipelines and exploit them for molecular marker development. There has recently been much interest in the field of expression profiling. By clustering and relating genes on the basis of their expression patterns, genes can be identified
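When ESTs come from libraries of different tissues or conditions, the number of ESTs per gene cluster in each library gives a crude "digital" expression profile. The sketch below assumes such counts have already been tallied per cluster and per library; the library sizes and gene names are hypothetical. Counts are normalised to library size, and genes are related by the Pearson correlation of their profiles. This is only one simple way to relate genes by expression pattern; the text does not specify a method, and counts from small libraries are noisy.

```python
import math
from typing import Dict, List


def normalise(counts: List[int], library_sizes: List[int]) -> List[float]:
    """Convert raw EST counts per library into frequency per 10,000 ESTs sequenced."""
    return [10_000 * c / n for c, n in zip(counts, library_sizes)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def co_expressed(profiles: Dict[str, List[float]], gene: str, threshold: float = 0.9) -> List[str]:
    """Genes whose normalised profile correlates with `gene` at or above `threshold`."""
    target = profiles[gene]
    return sorted(
        other for other, prof in profiles.items()
        if other != gene and pearson(target, prof) >= threshold
    )
```

Normalising first matters: without it, a gene would appear "up-regulated" in whichever library simply had the most ESTs sequenced.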
Final comment
As long as ESTs continue to be actively sequenced to fill knowledge gaps in the gene complement of the large plant genomes, our potential knowledge base will continue to grow. EST sequencing certainly avoids the biggest problems associated with genome size and the accompanying retrotransposon repetitiveness. EST resources have been shown to have a wide range of applications, and novel uses have been found for them. There are, however, some fundamental limitations to
Acknowledgements
Thanks to Heiko Schoof and Wojciech Karlowski for critical appraisal of the manuscript. I am funded within the GABI project by the BMBF (0312270/4).
References (56)
How can we deliver the large plant genomes? Strategies and perspectives. Curr. Opin. Plant Biol. (2002)
Deciding among green plants for whole genome studies. Trends Plant Sci. (2002)
Moloney murine leukemia reverse transcriptase suspect in the production of multiple misincorporations during hprt cDNA synthesis. Mutat. Res. (1997)
A new troponin T and cDNA clones for 13 different muscle proteins, found by shotgun sequencing. Nature (1983)
The human genome: the nature of the enterprise. CIBA Found. Symp. (1990)
Complementary DNA sequencing: expressed sequence tags and human genome project. Science (1991)
dbEST – database for ‘expressed sequence tags’. Nat. Genet. (1993)
Hybridization fingerprinting of high-density cDNA-library arrays with cDNA pools derived from whole tissues. Mamm. Genome (1992)
Deductions about the number, organization, and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing. Plant Cell (2002)
Nuclear DNA C-values in 30 species double the familial representation in pteridophytes. Ann. Bot. (2002)
Maize as a model for the evolution of plant nuclear genomes. Proc. Natl. Acad. Sci. U. S. A.
Comparative genome organization in plants: from sequence and markers to chromatin and chromosomes. Plant Cell
Mechanisms and rates of genome expansion and contraction in flowering plants. Genetica
The gene distribution of the maize genome. Proc. Natl. Acad. Sci. U. S. A.
The distribution of genes in the genomes of Gramineae. Proc. Natl. Acad. Sci. U. S. A.
Assembly of the working draft of the human genome with GigAssembler. Genome Res.
Somatic and germinal mobility of the RescueMu transposon in transgenic maize. Plant Cell
Differential methylation of genes and retrotransposons facilitates shotgun sequencing of the maize genome. Nat. Genet.
The EMBL nucleotide sequence database: major new developments. Nucleic Acids Res.
Plant systematics in the age of genomics. Plant Physiol.
Construction of a ‘unigene’ cDNA clone set by oligonucleotide fingerprinting allows access to 25 000 potential sugar beet genes. Plant J.
Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res.
Generation and analysis of 280,000 human expressed sequence tags. Genome Res.
Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res.
High-efficiency cloning of Arabidopsis full-length cDNA by biotinylated CAP trapper. Plant J.
The hashed position tree (HPT): a suffix tree variant for large data sets stored on slow mass storage devices
Consed: a graphical tool for sequence finishing. Genome Res.