Arrangements in the modular evolution of proteins

doi:10.1016/j.tibs.2008.05.008

Trends in Biochemical Sciences

Volume 33, Issue 9, September 2008, Pages 444-451

https://doi.org/10.1016/j.tibs.2008.05.008

It has been known for the last couple of decades that proteins evolve partly through rearrangements of larger fragments, typically domains. These units are considered the basic modules of protein structure, evolution and function. In the last few years, the analysis of protein-domain rearrangements has provided functional and evolutionary insights and has improved functional predictions and domain assignments for previously uncharacterised genes and proteins. Although some mechanisms that govern modular rearrangements of protein domains have been uncovered, such as the addition or deletion of a single N- or C-terminal domain, much is still unknown about the genetics behind these arrangements.

Domains are the modules of proteins

In engineering, systems are built by combining smaller, independent parts or modules in such a way that they can be reused in various systems and contexts. Adding modules to, or excluding them from, different combinations enables new complex tasks to be fulfilled by previously proven modules. Similarly, nature tends to reuse instead of reinvent while being more opportunistic; it is this modularity that provides a set of reusable parts that increase the speed with which biological

Domain combinations and rearrangements

The concept that multidomain proteins are created through rearrangements between domains was described over 30 years ago [7,29,30]. The development of domain databases has facilitated the analysis of genomic data and has led to the definition of domain families delimited either by structure or evolutionary heritage [8,10]. Typically, a domain family consists of small proteins or fragments of larger proteins, and most proteins contain more than one domain [31,32,33].
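As a rough illustration of how such rearrangements can be handled computationally, the sketch below treats a protein as an ordered tuple of domain-family labels and checks whether two arrangements differ by a single N- or C-terminal domain, the kind of event mentioned in the abstract. The function names and the labels "A", "B" and "C" are hypothetical and chosen only for this example; they are not taken from the article or from any domain database.

```python
from typing import Iterable, Optional, Sequence


def terminal_change(a: Sequence[str], b: Sequence[str]) -> Optional[str]:
    """Describe how arrangement b differs from a by one terminal domain.

    Arrangements are listed from N-terminus to C-terminus.
    Returns None when the difference is not a single terminal event.
    For repeats (e.g. A-A versus A-A-A) both termini fit; this sketch
    then reports the C-terminal interpretation first.
    """
    a, b = tuple(a), tuple(b)
    if len(b) == len(a) + 1:
        if b[:-1] == a:
            return "C-terminal addition"
        if b[1:] == a:
            return "N-terminal addition"
    if len(b) == len(a) - 1:
        if a[:-1] == b:
            return "C-terminal deletion"
        if a[1:] == b:
            return "N-terminal deletion"
    return None


def multidomain_fraction(arrangements: Iterable[Sequence[str]]) -> float:
    """Fraction of proteins whose arrangement has more than one domain."""
    items = [tuple(x) for x in arrangements]
    if not items:
        return 0.0
    return sum(1 for x in items if len(x) > 1) / len(items)


if __name__ == "__main__":
    # Toy arrangements with hypothetical domain labels.
    print(terminal_change(("A", "B"), ("A", "B", "C")))
    print(terminal_change(("A", "B"), ("B", "A")))
    print(multidomain_fraction([("A",), ("A", "B"), ("B", "C", "A")]))
```

Comparing ordered tuples keeps the N-to-C order of domains explicit, so a reordering such as A-B versus B-A is not mistaken for a terminal gain or loss.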

The organization of domains

The origin of domain rearrangements

The genomic events that govern domain rearrangements can act at various levels, ranging from simple point mutations to large-scale chromosomal mutations. For example, short intragenic duplications can be formed during replication through slippage of the DNA polymerase [44], whereas crossing-over events might facilitate larger, intergenic repeats [45]. Other fundamental mechanisms might involve DNA-strand breakage and repair or transposition [45]. In plants, for example, exons have been

Exons and the evolution of domain architectures

Beyond the fusion and fission of genes following duplication, exon shuffling has the potential to create new domain combinations. In fact, as early as 1978, Gilbert [61] proposed that new proteins could arise by the shuffling of domain-coding regions. One process in which exon shuffling has been particularly important is the extracellular communication in metazoa [62]. One would expect a correlation between exon and domain boundaries if exon shuffling has been a major factor in domain

Orphan domains and unassigned regions

For many years, analysis of domain assignments has been used to extract information about functional aspects of protein sequences. Often, the basis for domain detection is sensitive position-specific scoring matrices or hidden Markov models [65,66]. Today, such methods can assign approximately half of the residues in a proteome to discrete domains (Box 1). One possible method to increase the coverage of domain assignments is to enable assignments of less characterized domains, such as

Concluding remarks and future perspectives

It is interesting to contemplate the evolution of proteins in terms of rearrangements of modular units. Although domain rearrangement events often seem to occur at the protein termini, much of what can be described at the level of domain-wise rearrangements has been difficult to explain in terms of the genetic mechanisms that are involved. For example, the intriguing uniformity with which some repeats appear could hint at some not yet fully understood mechanism at the DNA level. In

Acknowledgements

The authors would like to acknowledge Sabine Ivison for helpful comments on the manuscript. This work was supported by grants to A.E. from the Swedish Natural Sciences Research Council and SSF (the Foundation for Strategic Research). The EU 6th Framework Program is gratefully acknowledged for support to the GeneFun project, contract No: LSHG-CT-2004–503567. E.B.B. and A.D.M. acknowledge support by the DFG (Deutsche Forschungsgemeinschaft) through grant BO 2544/2–1.

Glossary

CATH: a database with semi-automatic classification of protein-domain structures. It clusters proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H).
Clade: a taxonomic group with species that have descended from a common ancestor.
Disordered or unstructured region: a part of a protein that does not fold into α-helices or β-sheets. These regions often contain a high proportion of charged and polar amino acids.
Domain arrangement (DA), domain combination

References (76)

A.K. Björklund. Domain rearrangements in protein evolution. J. Mol. Biol. (2005)
L.W. Tjoelker. Structural and functional definition of the human chitinase chitin-binding domain. J. Biol. Chem. (2000)
D. Ekman. Quantification of the elevated rate of domain rearrangements in metazoa. J. Mol. Biol. (2007)
J.I. Lucas. Comparative genomics and protein domain graph analyses link ubiquitination and RNA metabolism. J. Mol. Biol. (2006)
K.C. Chou et al. Using functional domain composition and support vector machines for prediction of protein subcellular location. J. Biol. Chem. (2002)
Z. Qian. Automatic transcription factor classifier based on functional domain composition. Biochem. Biophys. Res. Commun. (2006)
L. Aravind. Comparative genomics and structural biology of the molecular innovations of eukaryotes. Curr. Opin. Struct. Biol. (2006)
C. Vogel. Structure, function and evolution of multidomain proteins. Curr. Opin. Struct. Biol. (2004)
D. Ekman. Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions. J. Mol. Biol. (2005)
L. Patthy. Evolution of the proteases of blood coagulation and fibrinolysis by assembly from modules. Cell (1985)
An integrated approach to the prediction of domain–domain interactions. BMC Bioinformatics (2006)

^*: Authors contributed equally to this article.

From molecular to modular cell biology. Nature
Recombinatoric exploration of novel folded structures: a heteropolymer-based model of protein evolutionary landscapes. Proc. Natl. Acad. Sci. U. S. A.
More than the sum of their parts: on the evolution of proteins from peptides. Bioessays
Robustness and Evolvability in Living Systems
The evolution of domain arrangements in proteins and interaction networks. Cell. Mol. Life Sci.
Chemical and biological evolution of nucleotide-binding protein. Nature
SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res.
A unifold, mesofold, and superfold model of protein fold use. Proteins
Pfam: clans, web tools and services. Nucleic Acids Res.
A tree of life based on protein domain organizations. Mol. Biol. Evol.
Prokaryotic phylogenies inferred from protein structural domains. Genome Res.
Global phylogeny determined by the combination of protein domains in proteomes. Mol. Biol. Evol.
Phylogeny determined by protein domain content. Proc. Natl. Acad. Sci. U. S. A.
The Crohn's disease susceptibility gene DLG5 as a member of the CARD interaction network. J. Mol. Med.
Identification of genomic features using microsyntenies of domains: domain teams. Genome Res.
InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes. Nucleic Acids Res.
