Sequence-based estimation of minisatellite and microsatellite repeat variability

Matthieu Legendre (1,4), Nathalie Pochet (1,2,4), Theodore Pak (1), and Kevin J. Verstrepen (1,3,5)

  1. FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts 02138, USA
  2. Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
  3. Centre of Microbial and Plant Genetics, Department of Molecular and Microbial Systems, Katholieke Universiteit Leuven, Faculty of Applied Bioscience and Engineering, B-3001 Leuven (Heverlee), Belgium
  4. These authors contributed equally to this work.

Abstract

Variable tandem repeats are frequently used for genetic mapping, genotyping, and forensics studies. Moreover, variation in some repeats underlies rapidly evolving traits or certain diseases. However, mutation rates vary greatly from repeat to repeat, and as a consequence, not all tandem repeats are suitable genetic markers or interesting unstable genetic modules. We developed a model, “SERV,” that predicts the variability of a broad range of tandem repeats in a wide range of organisms. The nonlinear model uses three basic characteristics of the repeat (number of repeated units, unit length, and purity) to produce a numeric “VARscore” that correlates with repeat variability. SERV was experimentally validated using a large set of different artificial repeats located in the Saccharomyces cerevisiae URA3 gene. Further in silico analysis shows that SERV outperforms existing models and accurately predicts repeat variability in bacteria and eukaryotes, including plants and humans. Using SERV, we demonstrate significant enrichment of variable repeats within human genes involved in transcriptional regulation, chromatin remodeling, morphogenesis, and neurogenesis. Moreover, SERV allows identification of known and candidate genes involved in repeat-based diseases. In addition, we demonstrate the use of SERV for the selection and comparison of suitable variable repeats for genotyping and forensic purposes. Our analysis indicates that tandem repeats used for genotyping should have a VARscore between 1 and 3. SERV is publicly available from http://hulsweb1.cgr.harvard.edu/SERV/.
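
As a rough illustration of the three repeat characteristics named in the abstract (number of repeated units, unit length, and purity), the following Python sketch derives them from a repeat tract and its consensus unit. It does not reproduce the SERV model or compute a VARscore; the nonlinear model itself is not given here. The names `RepeatFeatures`, `repeat_features`, and `suitable_for_genotyping` are hypothetical, and the purity definition (fraction of bases matching a perfect, indel-free tiling of the unit) is an assumption made only for this example. The final helper simply applies the VARscore window of 1 to 3 suggested for genotyping, treating both bounds as inclusive, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class RepeatFeatures:
    unit_length: int  # length of the repeated unit, in base pairs
    n_units: float    # number of repeated units (fractional if the tract ends mid-unit)
    purity: float     # ASSUMED definition: fraction of bases matching a perfect tiling


def repeat_features(sequence: str, unit: str) -> RepeatFeatures:
    """Compute the three input characteristics for one repeat tract.

    Hypothetical helper: purity is measured against an indel-free tiling of
    `unit` along `sequence`, which may differ from the definition SERV uses.
    """
    if not sequence or not unit:
        raise ValueError("sequence and unit must be non-empty")
    seq = sequence.upper()
    u = unit.upper()
    # Count positions where the tract agrees with the ideal repeat.
    matches = sum(1 for i, base in enumerate(seq) if base == u[i % len(u)])
    return RepeatFeatures(
        unit_length=len(u),
        n_units=len(seq) / len(u),
        purity=matches / len(seq),
    )


def suitable_for_genotyping(varscore: float, low: float = 1.0, high: float = 3.0) -> bool:
    """Apply the suggested VARscore window (inclusive bounds are an assumption)."""
    return low <= varscore <= high


if __name__ == "__main__":
    print(repeat_features("CAGCAGCAGCAG", "CAG"))  # perfect trinucleotide repeat
    print(repeat_features("CAGCATCAGCAG", "CAG"))  # one interrupting substitution
    print(suitable_for_genotyping(2.0))
```

A VARscore passed to `suitable_for_genotyping` would have to come from SERV itself; the feature values above are only the kind of inputs the abstract says the model uses.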

Footnotes

  • 5 Corresponding author. E-mail kverstrepen{at}cgr.harvard.edu; fax (617) 495-2196.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6554007

    • Received March 28, 2007.
    • Accepted August 29, 2007.
  • Freely available online through the Genome Research Open Access option.
