ECgene: Genome-based EST clustering and gene modeling for alternative splicing

Namshin Kim (1,2), Seokmin Shin (2), and Sanghyuk Lee (1,3)

(1) Division of Molecular Life Sciences, Ewha Womans University, Seoul 120-750, Korea
(2) School of Chemistry, Seoul National University, Seoul 151-747, Korea

Abstract

With the availability of the human genome map and fast algorithms for sequence alignment, genome-based clustering of expressed sequence tags (ESTs) has become a viable method for gene modeling. We developed a novel gene-modeling method, ECgene (Gene modeling by EST Clustering), which combines genome-based EST clustering and transcript assembly in a coherent and consistent fashion. Specifically, ECgene takes alternative splicing events into consideration. The positions of splice sites (i.e., exon–intron boundaries) in the genome map are used as the critical information throughout the procedure. Sequences that share any splice site are grouped together to define an EST cluster, in a manner similar to the genome-based version of the UniGene algorithm. Transcript assembly uses a graph-theoretic approach in which the exon connectivity in each cluster is represented as a directed acyclic graph (DAG). Distinct paths along exons correspond to possible gene models encompassing all alternative splicing events. EST sequences in each cluster are subclustered further according to their compatibility with the gene structure of each splice variant, and they can be regarded as clone evidence for the corresponding isoform. The reliability of each isoform is assessed from the nature of the cluster members and from the minimum number of clones required to reconstruct all exons in the transcript.
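The two core steps described in the abstract, clustering by shared splice sites and enumerating exon paths through a DAG, can be illustrated with a minimal Python sketch. It is not ECgene's implementation. It assumes each aligned sequence has already been reduced to an ordered list of (start, end) exon coordinates on a single strand, and that exons shared between sequences have identical coordinates. The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def splice_sites(exons):
    """Return the splice sites (intron boundaries) implied by an ordered exon list."""
    sites = set()
    for (_, donor), (acceptor, _) in zip(exons, exons[1:]):
        sites.add(("donor", donor))
        sites.add(("acceptor", acceptor))
    return sites


def cluster_by_shared_sites(alignments):
    """Group sequence names that share at least one splice site (union-find)."""
    parent = {name: name for name in alignments}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # splice site -> first sequence observed with that site
    for name, exons in alignments.items():
        for site in splice_sites(exons):
            if site in first_seen:
                parent[find(name)] = find(first_seen[site])
            else:
                first_seen[site] = name

    clusters = defaultdict(set)
    for name in alignments:
        clusters[find(name)].add(name)
    return sorted(sorted(members) for members in clusters.values())


def enumerate_paths(alignments, members):
    """Build the exon graph of one cluster and list every source-to-sink path.

    Edges always run from an upstream exon to a downstream exon, so the
    graph is acyclic as long as each exon list is sorted and non-overlapping.
    """
    successors = defaultdict(set)
    nodes, has_incoming = set(), set()
    for name in members:
        exons = alignments[name]
        nodes.update(exons)
        for upstream, downstream in zip(exons, exons[1:]):
            successors[upstream].add(downstream)
            has_incoming.add(downstream)

    paths = []

    def walk(node, prefix):
        path = prefix + [node]
        following = sorted(successors.get(node, ()))
        if not following:  # sink exon: one candidate transcript is complete
            paths.append(path)
        for nxt in following:
            walk(nxt, path)

    for source in sorted(nodes - has_incoming):
        walk(source, [])
    return paths


# Hypothetical alignments: est2 skips the middle exon of est1.
alignments = {
    "est1": [(100, 200), (300, 400), (500, 600)],
    "est2": [(100, 200), (500, 600)],
    "est3": [(300, 400), (500, 600)],
    "est4": [(1000, 1100), (1200, 1300)],
    "est5": [(2000, 2100)],  # unspliced, so it shares no splice site
}
clusters = cluster_by_shared_sites(alignments)
models = enumerate_paths(alignments, clusters[0])
```

In this toy input, est1, est2, and est3 fall into one cluster, and the two paths through its exon graph correspond to an exon-inclusion isoform and an exon-skipping isoform. The subclustering of ESTs by compatibility with each isoform and the reliability assessment mentioned in the abstract are not covered by the sketch.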

Footnotes

  • [Supplemental material is available online at www.genome.org. Gene models from genome-wide analyses for the human, mouse, and rat genomes are available at the ECgene Web site (http://genome.ewha.ac.kr/ECgene) or may be viewed through the UCSC genome browser.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3030405.

  • 3 Corresponding author. E-mail sanghyuk{at}ewha.ac.kr; fax 82-2-3277-2384.

    • Received July 20, 2004.
    • Accepted January 11, 2005.