Linear time algorithms for finding and representing all the tandem repeats in a string

https://doi.org/10.1016/j.jcss.2004.03.004Get rights and content
Under an Elsevier user license
open archive

Abstract

A tandem repeat (or square) is a string αα, where α is a non-empty string. We present an O(|S|)-time algorithm that operates on the suffix tree T(S) for a string S, finding and marking the endpoint in T(S) of every tandem repeat that occurs in S. This decorated suffix tree implicitly represents all occurrences of tandem repeats in S, and can be used to efficiently solve many questions concerning tandem repeats and tandem arrays in S. This improves and generalizes several prior efforts to efficiently capture large subsets of tandem repeats.

Cited by (0)

1

Research partially supported by Grant DBI-9723346 from the National Science Foundation, and by Grant DE-FG03-90ER60999 from the Department of Energy.

2

Research supported by the German Academic Exchange Service (DAAD).

3

Present address: Universität Bielefeld, Technische Fakultät, 33594 Bielefeld, Germany.