Multiple sequence alignment with Clustal X

https://doi.org/10.1016/S0968-0004(98)01285-7

Section snippets

A history of the Clustal programs

Clustal programs have undergone continual development for over ten years, so the versions available do not all give the same results. This can be confusing to new users; we therefore felt that a short history of Clustal development would help to clarify matters.

The first Clustal program[2,3], written by Des Higgins in 1988, was designed to perform efficient alignment on PCs, which then had feeble computing power by today's standards. It harnessed a memory-efficient, recursive alignment algorithm

Getting started with Clustal X

The Clustal W and Clustal X programs have self-explanatory layouts, and on-line help is available, so that using the programs should not be difficult. For inexperienced users, the chief hurdle seems to be getting the program to read their sequences. Sequences must be collected into a single file in a format that Clustal can read. The simplest format is FASTA format, but the EMBL and SWISS-PROT database formats can be read directly. Usually, the set of sequences will be exported from some other
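
As an illustration of collecting sequences into a single FASTA file, the following is a minimal sketch in Python. It is not part of Clustal; the function name `to_fasta`, the sequence names and the sequences themselves are hypothetical. It assumes only the basic FASTA convention of a `>` header line followed by the sequence on one or more lines.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text.

    Each record becomes a '>' header line followed by the sequence,
    split into lines of at most `width` residues.
    """
    lines = []
    for name, seq in records:
        # Drop spaces and line breaks left over from copy-and-paste.
        seq = "".join(seq.split())
        lines.append(f">{name}")
        # Slice the sequence into fixed-width chunks.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical example sequences, for illustration only.
records = [
    ("seq1", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("seq2", "MKTAYIAKQR QISFVKSHFS"),
]
fasta_text = to_fasta(records, width=10)
```

The resulting text can be written to a single file (for example with `open("sequences.fasta", "w")`) and then loaded into the alignment program.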

When and how to use Clustal X—and when not to!

The wide usage of Clustal W and X might seem to imply that they always align sequences well. In fact, this is not always so. The alignment algorithm has been optimized to align sets of sequences that are entirely collinear—that is, the sequences have the same protein domains, and these domains are in the same order. If this condition is not met (and it often is not), Clustal X can produce serious misalignments. The user must give a little thought to the nature of the sequence set.

When to use Clustal X

The Clustal X program can be used to align any group of protein or nucleic acid sequences that are related to each other over their entire lengths. However, some problems can still be encountered.

The sequences do not share common ancestry

This is attempted surprisingly often—mostly, but not always, by accident. (This most often happens to us when we extract a set of sequences by using a keyword search.)

The sequences have large, variable, N- and C-terminal overhangs (e.g. kinesins)

The unconserved termini must be removed, or the 'Use Negative Matrix' option must be invoked; otherwise a completely false alignment can result.
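
Removing the termini before alignment can be sketched as follows. This is a rough illustration, not a Clustal feature: the function `trim_termini`, the sequence and the coordinates are hypothetical, and the boundaries of the conserved region must come from the user's own knowledge of the protein family.

```python
def trim_termini(seq, start, end):
    """Return residues start..end of seq (1-based, inclusive).

    The coordinates are supplied by the user; nothing here detects
    the conserved region automatically.
    """
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end]


# Hypothetical sequence: a 10-residue core flanked by overhangs.
full_length = "GGGGMKTAYIAKQRSS"
core = trim_termini(full_length, 5, 14)
```

The trimmed sequences, rather than the full-length ones, are then collected into the input file.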

The sequences are partially related

Multidomain proteins that have complex evolutionary histories often share some, but not all, of the domain set. The alignments produced in these cases can be unpredictable.

The sequences include short non-overlapping fragments

Sometimes, people

Conclusion

In this article, we provide some guidance that we hope will prove useful to Clustal users. In the not-too-distant future, progressive alignment—the dominant strategy for the last ten years—will probably be rendered obsolete. Iterative alignment strategies, such as PRRP[19] and SAGA[20], are reported to perform as well as, or better than, Clustal X for small numbers of sequences but are currently still too slow to handle large datasets. More-efficient iterative strategies, harnessed to

Acknowledgements

Clustal development has been supported by Trinity College, Dublin, the EMBL and ICGEB, Strasbourg. Current Clustal development is supported by funds from INSERM, CNRS and Ministère de l'Éducation nationale, de la Recherche et de la Technologie and the EMBL. We thank former developers for their input and the Clustal users for their informative feedback.

References (20)

  • D.G. Higgins et al., Gene (1988)
  • G. Perriere et al., Biochimie (1996)
  • O. Gotoh, J. Mol. Biol. (1996)
  • J.D. Thompson, Nucleic Acids Res. (1997)
  • D.G. Higgins et al., Comput. Appl. Biosci. (1989)
  • E.W. Myers et al., Comput. Appl. Biosci. (1988)
  • D.F. Feng et al., J. Mol. Evol. (1987)
  • W.R. Taylor, J. Mol. Evol. (1988)
  • W.J. Wilbur et al., Proc. Natl. Acad. Sci. U. S. A. (1983)
  • Sneath, P. H. A. and Sokal, R. R. (1973) in Numerical Taxonomy (pp. 230–234), W. H....
There are more references available in the full text version of this article.
