  • Protocol

Comprehensive cluster analysis with Transitivity Clustering

Abstract

Transitivity Clustering is a method for partitioning biological data into groups of similar objects, such as genes. It provides integrated access to functions that address each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes 1 h to complete.

Figure 1: Overview of the Transitivity Clustering functionalities and user interfaces.
Figure 2: Cytoscape plug-ins.
Figure 3: Intra- versus inter-cluster similarity distributions.
Figure 4: Stand-alone software user interface.

References

  1. Enright, A.J., Kunin, V. & Ouzounis, C.A. Protein families and TRIBES in genome sequence space. Nucleic Acids Res. 31, 4632–4638 (2003).

  2. Enright, A.J., Van Dongen, S. & Ouzounis, C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).

  3. Krause, A., Stoye, J. & Vingron, M. Large scale hierarchical clustering of protein sequences. BMC Bioinformatics 6, 15 (2005).

  4. Enright, A.J. & Ouzounis, C.A. GeneRAGE: a robust algorithm for sequence clustering and domain detection. Bioinformatics 16, 451–457 (2000).

  5. Paccanaro, A., Casbon, J.A. & Saqi, M.A. Spectral clustering of protein sequences. Nucleic Acids Res. 34, 1571–1580 (2006).

  6. Frey, B.J. & Dueck, D. Clustering by passing messages between data points. Science 315, 972–976 (2007).

  7. Wittkop, T. et al. Partitioning biological data with transitivity clustering. Nat. Methods 7, 419–420 (2010).

  8. Cline, M.S. et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382 (2007).

  9. Wittkop, T. Transitivity Clustering: Clustering Biological Data by Unraveling Hidden Transitive Substructures, 148 (Suedwestdeutscher Verlag fuer Hochschulschriften, 2010).

  10. Rahmann, S. et al. Exact and heuristic algorithms for weighted cluster editing. Comput. Syst. Bioinformatics Conf. 6, 391–401 (2007).

  11. Wittkop, T., Baumbach, J., Lobo, F.P. & Rahmann, S. Large scale clustering of protein sequences with FORCE—a layout based heuristic for weighted cluster editing. BMC Bioinformatics 8, 396 (2007).

  12. Böcker, S., Briesemeister, S. & Klau, G.W. Exact algorithms for cluster editing: evaluation and experiments. Algorithmica (in press) (2009).

  13. Brown, S.D., Gerlt, J.A., Seffernick, J.L. & Babbitt, P.C. A gold standard set of mechanistically diverse enzyme superfamilies. Genome Biol. 7, R8 (2006).

  14. Golub, T.R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999).

  15. Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning 52, 91–118 (2003).

  16. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

Acknowledgements

J.B. thanks the German Academic Exchange Service (DAAD) for funding his work at ICSI, Berkeley. J.B. and M.A. are grateful for support from the German Research Foundation (DFG)-funded Cluster of Excellence for Multimodal Computing and Interaction. D.E. and M.A. received funding from the German National Genome Research Network. T.W. received financial support through NIH grant R01 LM009722 and the Buck Trust.

Author information

Contributions

T.W. and J.B. collected data, and tested and wrote Step 2A. D.E. and M.A. prepared Step 2B. A.T. and S.B. were responsible for Step 2C. All authors contributed to the preparation and proofreading of all other parts of the manuscript.

Corresponding author

Correspondence to Jan Baumbach.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1: Cytoscape Plug-in Blast2SimilarityGraph

The screenshot shows the user interface of the Blast2SimilarityGraph Cytoscape plug-in together with the similarity network after importing the gold standard proteins as described in Option 2A, step v. (TIFF 1991 kb)

Supplementary Fig. 2: Cytoscape Plug-in ClusterExplorer

The screenshot shows the user interface of the ClusterExplorer Cytoscape plug-in together with the inter/intra similarity distribution for the gold standard family assignments obtained after step viii, Option 2A. (TIFF 704 kb)

Supplementary Fig. 3: Cytoscape Plug-in TransClust

The screenshot shows the user interface of the TransClust Cytoscape plug-in including the results window from the density threshold determination and the gold standard network clustered with a threshold of 57, as described in step xi, Option 2A. (TIFF 595 kb)
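
To make the role of the threshold more concrete, the following Python sketch scores a candidate partition under a weighted cluster editing model, the problem named in the titles of refs. 10 and 11. It is an illustration under stated assumptions, not the TransClust implementation: the function name `editing_cost`, the dictionary-based data layout and the default score of 0 for missing pairs are all hypothetical choices made here.

```python
def editing_cost(similarity, clusters, threshold):
    """Cost of a partition under a weighted cluster editing model.

    similarity: dict mapping frozenset({a, b}) -> similarity score.
                Pairs missing from the dict are scored 0 (an assumption).
    clusters:   dict mapping object id -> cluster label.
    threshold:  pairs scoring above it count as edges of the similarity graph.
    """
    ids = sorted(clusters)
    cost = 0.0
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = similarity.get(frozenset((a, b)), 0.0)
            together = clusters[a] == clusters[b]
            if together and score < threshold:
                # The pair is clustered together although it is not an edge:
                # the edge would have to be added.
                cost += threshold - score
            elif not together and score > threshold:
                # The pair is separated although it is an edge:
                # the edge would have to be removed.
                cost += score - threshold
    return cost


# Toy example with three objects and a threshold of 57.
toy_similarity = {
    frozenset(("a", "b")): 100.0,
    frozenset(("b", "c")): 60.0,
    frozenset(("a", "c")): 10.0,
}
one_cluster = {"a": 0, "b": 0, "c": 0}
two_clusters = {"a": 0, "b": 0, "c": 1}
```

Comparing `editing_cost(toy_similarity, one_cluster, 57)` with `editing_cost(toy_similarity, two_clusters, 57)` shows how a partition with a lower cost deviates less from the thresholded similarity graph.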

Supplementary Data 1: Brown et al. all-vs-all VOC subset BLAST

The result file of an all-vs-all BLAST of the 133 protein sequences of the vicinal oxygen chelate (VOC) superfamily from the Brown et al. gold standard. We used an E-value cutoff of 100 and the “-m 8” option of BLAST for table-based output. (TXT 769 kb)
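
The tabular layout written by the legacy BLAST option "-m 8" has 12 tab-separated columns: query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value and bit score. A minimal Python sketch for reading such a file into pairwise scores might look as follows. The function names, the choice to keep the smallest E-value per unordered protein pair and the -log10 transformation are illustrative assumptions and are not taken from the Blast2SimilarityGraph plug-in.

```python
import io
import math


def read_blast_tabular(handle, max_evalue=100.0):
    """Return {frozenset({a, b}): smallest E-value} from BLAST "-m 8" rows."""
    best = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular row
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])  # column 11 holds the E-value
        if query == subject or evalue > max_evalue:
            continue  # ignore self hits and hits above the cutoff
        pair = frozenset((query, subject))
        if pair not in best or evalue < best[pair]:
            best[pair] = evalue
    return best


def evalue_to_similarity(evalue, floor=1e-200):
    """Hypothetical score: -log10(E-value), clamped so that 0.0 is usable."""
    return -math.log10(max(evalue, floor))


# Made-up rows for illustration; real input would be the Supplementary Data file.
SAMPLE = (
    "P1\tP2\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "P2\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t180\n"
    "P1\tP1\t100.0\t100\t0\t0\t1\t100\t1\t100\t0.0\t250\n"
)
pairs = read_blast_tabular(io.StringIO(SAMPLE))
scores = {pair: evalue_to_similarity(e) for pair, e in pairs.items()}
```

For a file on disk, the same function can be called with an open file object, for example `read_blast_tabular(open(path))`, where `path` is whatever name the downloaded file was saved under.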

Supplementary Data 2: Brown et al. VOC subset FASTA

The 133 protein sequences of the VOC superfamily from the Brown et al. gold standard in FASTA format. (TXT 41 kb)

Supplementary Data 3: Brown et al. VOC subset family assignment

Tab-delimited flat file containing the family assignments for the 133 proteins of the VOC superfamily from the Brown et al. gold standard. (TXT 5 kb)

Supplementary Data 4: Brown et al. all-vs-all BLAST (TXT 8408 kb)

Supplementary Data 5: Brown et al. FASTA (TXT 365 kb)

Supplementary Data 6

Tab-delimited flat file containing the family assignments for the 866 proteins of the Brown et al. gold standard. (TXT 24 kb)

Supplementary Data 7: Brown et al. subset family pre-assignment (TXT 84 kb)

Supplementary Data 8: Gene expression data file

This file contains an expression matrix of 38 bone marrow samples from acute leukemia patients with 999 monitored genes (ref. 14) processed with a Human Genome HU6800 Affymetrix microarray. (TXT 448 kb)

Supplementary Data 9: Gene expression gold standard file

This file contains the gold standard for the "Leukemia" dataset of Supplementary Data 8. The 38 samples are classified as 11 cases of acute myeloid leukemia (AML), 8 of T-lineage acute lymphoblastic leukemia (T-ALL) and 19 of B-lineage ALL (B-ALL). (TXT 0 kb)
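
One common way to compare a clustering result against such a gold standard is a pair-counting F-measure. The Python sketch below is an illustration under that assumption: it may differ from the quality measure that the Transitivity Clustering software itself reports, and the function name and the sample identifiers are hypothetical. Only the class sizes (11, 8 and 19) come from the description above.

```python
from itertools import combinations


def pairwise_f_measure(predicted, gold):
    """Pair-counting F-measure between two labelings.

    predicted, gold: dicts mapping object id -> cluster or class label.
    Only ids present in both dicts are compared.
    """
    ids = sorted(set(predicted) & set(gold))
    tp = fp = fn = 0
    for a, b in combinations(ids, 2):
        same_pred = predicted[a] == predicted[b]
        same_gold = gold[a] == gold[b]
        if same_pred and same_gold:
            tp += 1  # pair correctly placed together
        elif same_pred:
            fp += 1  # pair together in the prediction only
        elif same_gold:
            fn += 1  # pair together in the gold standard only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Gold standard with the class sizes stated above; sample ids are made up.
gold = {}
for label, size in (("AML", 11), ("T-ALL", 8), ("B-ALL", 19)):
    for i in range(size):
        gold[f"{label}_{i}"] = label

# A hypothetical clustering that fails to separate the two ALL subtypes.
merged_all = {s: ("AML" if c == "AML" else "ALL") for s, c in gold.items()}
```

Calling `pairwise_f_measure(merged_all, gold)` quantifies the penalty for merging T-ALL and B-ALL into one cluster, whereas `pairwise_f_measure(gold, gold)` gives the score of a perfect result.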

Supplementary information (ZIP 2709 kb)

About this article

Cite this article

Wittkop, T., Emig, D., Truss, A. et al. Comprehensive cluster analysis with Transitivity Clustering. Nat Protoc 6, 285–295 (2011). https://doi.org/10.1038/nprot.2010.197

  • DOI: https://doi.org/10.1038/nprot.2010.197
