Trends in Biochemical Sciences
Review
Arrangements in the modular evolution of proteins
Section snippets
Domains are the modules of proteins
In engineering, systems are built by combining smaller, independent parts or modules in such a way that the modules can be reused in various systems and in varying contexts. Adding modules to, or removing them from, different combinations allows previously proven modules to fulfil new, complex tasks. Similarly, nature tends to reuse rather than reinvent, although more opportunistically; it is this modularity that provides a set of reusable parts that increase the speed with which biological
Domain combinations and rearrangements
The concept that multidomain proteins are created through rearrangements of domains was described over 30 years ago [7, 29, 30]. The development of domain databases has facilitated the analysis of genomic data and has led to the definition of domain families delimited either by structure or by evolutionary heritage [8, 10]. Typically, a domain family consists of small proteins or fragments of larger proteins, and most proteins contain more than one domain [31, 32, 33].
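To make the notion of a domain combination concrete, the sketch below represents each protein as an ordered, N- to C-terminal list of domain-family labels and counts the adjacent, ordered domain pairs that occur across a small set of proteins. The protein names and domain labels are hypothetical placeholders, not real database assignments, and counting only neighbouring pairs is an assumption of this sketch; other analyses may also count non-adjacent co-occurrence.

```python
from collections import Counter

# Hypothetical domain architectures, each listed from N- to C-terminus.
# Labels are illustrative placeholders, not real domain-database identifiers.
architectures = {
    "protA": ["DomX", "DomY"],
    "protB": ["DomX", "DomY", "DomZ"],
    "protC": ["DomZ"],
    "protD": ["DomY", "DomY", "DomZ"],
}


def adjacent_pairs(arch):
    """Return ordered (N-terminal, C-terminal) pairs of neighbouring domains."""
    return list(zip(arch, arch[1:]))


def count_combinations(archs):
    """Count how often each ordered adjacent domain pair occurs across proteins."""
    counts = Counter()
    for arch in archs.values():
        counts.update(adjacent_pairs(arch))
    return counts


combos = count_combinations(architectures)
multidomain = sum(1 for arch in architectures.values() if len(arch) > 1)
print(combos)
print(f"{multidomain} of {len(architectures)} proteins contain more than one domain")
```

Single-domain proteins such as the hypothetical `protC` contribute no pairs, so they affect only the multidomain tally.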
The organization of domains
The origin of domain rearrangements
The genomic events that underlie domain rearrangements act at various levels, ranging from simple point mutations to large-scale chromosomal mutations. For example, short intragenic duplications can form during replication through slippage of the DNA polymerase [44], whereas crossing-over events might give rise to larger, intergenic repeats [45]. Other fundamental mechanisms might involve DNA-strand breakage and repair, or transposition [45]. In plants, for example, exons have been
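At the level of domain architectures, an intragenic duplication of a domain-coding region would be expected to appear as the same domain repeated in tandem. The sketch below finds runs of identical consecutive domains in an N- to C-terminal architecture. The labels and the `min_copies` threshold are illustrative assumptions, and the function only describes the outcome; it cannot tell which genomic mechanism produced a given repeat.

```python
from itertools import groupby


def tandem_runs(arch, min_copies=2):
    """Return (domain, start_index, copies) for each run of identical adjacent domains.

    `min_copies` is an assumed threshold: shorter runs are ignored.
    Indices are 0-based positions in the N- to C-terminal domain list.
    """
    runs = []
    pos = 0
    for domain, group in groupby(arch):
        copies = len(list(group))
        if copies >= min_copies:
            runs.append((domain, pos, copies))
        pos += copies
    return runs


# Hypothetical architecture containing two tandem runs.
arch = ["DomA", "Rep", "Rep", "Rep", "DomB", "DomC", "DomC"]
print(tandem_runs(arch))  # [('Rep', 1, 3), ('DomC', 5, 2)]
```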
Exons and the evolution of domain architectures
Beyond the fusion and fission of genes following duplication, exon shuffling can also create new domain combinations. In fact, as early as 1978, Gilbert [61] proposed that new proteins could arise through the shuffling of domain-coding regions. One process in which exon shuffling has been particularly important is extracellular communication in metazoa [62]. One would expect a correlation between exon and domain boundaries if exon shuffling has been a major factor in domain
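One simple way to look for the expected correspondence is to ask what fraction of domain boundaries lie within a few residues of an exon boundary mapped onto the protein sequence. The sketch below does this for a single protein. The coordinates and the 5-residue tolerance are invented for illustration, and a real analysis would also need to compare the observed fraction with a random expectation before drawing any conclusion about exon shuffling.

```python
from bisect import bisect_left


def nearest_distance(position, sorted_sites):
    """Distance from `position` to the closest entry of a sorted, non-empty list."""
    i = bisect_left(sorted_sites, position)
    candidates = []
    if i < len(sorted_sites):
        candidates.append(sorted_sites[i] - position)
    if i > 0:
        candidates.append(position - sorted_sites[i - 1])
    return min(candidates)


def fraction_matched(domain_boundaries, exon_boundaries, tolerance=5):
    """Fraction of domain boundaries within `tolerance` residues of an exon boundary."""
    sites = sorted(exon_boundaries)
    hits = sum(
        1 for b in domain_boundaries if nearest_distance(b, sites) <= tolerance
    )
    return hits / len(domain_boundaries)


# Hypothetical residue coordinates for one protein (not real data).
domain_boundaries = [42, 130, 215, 300]
exon_boundaries = [40, 95, 133, 260]
print(fraction_matched(domain_boundaries, exon_boundaries))  # 0.5
```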
Orphan domains and unassigned regions
For many years, domain assignments have been analysed to extract information about the functional aspects of protein sequences. Domain detection is often based on sensitive position-specific scoring matrices or hidden Markov models [65, 66]. Today, such methods can assign approximately half of the residues in a proteome to discrete domains (Box 1). One way to increase the coverage of domain assignments is to include less well-characterized domains, such as
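Residue-level coverage of this kind can be computed by merging the (possibly overlapping) domain intervals assigned to a protein and dividing the number of covered residues by the sequence length. The sketch below also lists unassigned stretches between domains above a minimum length. The intervals, the 400-residue length and the 50-residue cutoff are illustrative assumptions, not values taken from this article.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_and_gaps(length, domains, min_gap=50):
    """Return (fraction of residues in domains, unassigned regions of >= min_gap residues)."""
    merged = merge_intervals(domains)
    covered = sum(end - start + 1 for start, end in merged)
    gaps = []
    prev_end = 0
    # A sentinel just past the C-terminus lets the final stretch count as a gap.
    for start, end in merged + [(length + 1, length + 1)]:
        if start - prev_end - 1 >= min_gap:
            gaps.append((prev_end + 1, start - 1))
        prev_end = max(prev_end, end)
    return covered / length, gaps


# Hypothetical 400-residue protein: two overlapping hits plus one separate domain.
frac, gaps = coverage_and_gaps(400, [(10, 120), (100, 150), (260, 330)])
print(frac, gaps)  # 0.53 [(151, 259), (331, 400)]
```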
Concluding remarks and future perspectives
It is interesting to consider the evolution of proteins in terms of rearrangements of modular units. Although domain rearrangement events often seem to occur at the protein termini, much of what can be described at the level of domain rearrangements has been difficult to explain in terms of the underlying genetic mechanisms. For example, the intriguing uniformity with which some repeats appear could hint at a mechanism at the DNA level that is not yet fully understood. In
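Whether a change between two architectures affects a terminus can be checked, in the simplest case, by asking whether the shorter architecture matches one end of the longer one exactly. The sketch below classifies such single-sided gains and losses and labels everything else as internal or complex. It is a deliberately simplified pairwise comparison, not a reconstruction of ancestral architectures, and the domain labels are hypothetical.

```python
def classify_change(ancestor, descendant):
    """Classify how `descendant` differs from `ancestor`.

    Both arguments are domain architectures listed from N- to C-terminus.
    Only the simplest case counts as terminal: the shorter architecture
    must match one end of the longer one exactly.
    """
    if ancestor == descendant:
        return "unchanged"
    gained = len(descendant) > len(ancestor)
    shorter, longer = (ancestor, descendant) if gained else (descendant, ancestor)
    verb = "gain" if gained else "loss"
    k = len(shorter)
    if len(longer) > k:
        # Shared block at the C-terminal end: the extra domains are N-terminal.
        # If both ends match (e.g. in a tandem repeat), this label is reported.
        if longer[len(longer) - k:] == shorter:
            return f"N-terminal {verb}"
        # Shared block at the N-terminal end: the extra domains are C-terminal.
        if longer[:k] == shorter:
            return f"C-terminal {verb}"
    return "internal or complex"


print(classify_change(["DomB"], ["DomA", "DomB"]))                  # N-terminal gain
print(classify_change(["DomA", "DomB", "DomC"], ["DomA", "DomB"]))  # C-terminal loss
print(classify_change(["DomA", "DomC"], ["DomA", "DomB", "DomC"]))  # internal or complex
```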
Acknowledgements
The authors would like to acknowledge Sabine Ivison for helpful comments on the manuscript. This work was supported by grants to A.E. from the Swedish Natural Sciences Research Council and SSF (the Foundation for Strategic Research). The EU 6th Framework Program is gratefully acknowledged for support of the GeneFun project, contract No. LSHG-CT-2004–503567. E.B.B. and A.D.M. acknowledge support from the DFG (Deutsche Forschungsgemeinschaft) through grant BO 2544/2–1.
Glossary
- CATH: a database with semi-automatic classification of protein-domain structures. It clusters domains at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H).
- Clade: a taxonomic group with species that have descended from a common ancestor.
- Disordered or unstructured region: a part of a protein that does not fold into α-helices or β-sheets. These regions often contain a high proportion of charged and polar amino acids.
- Domain arrangement (DA), domain combination
References (76)
Domain rearrangements in protein evolution
J. Mol. Biol. (2005)
Structural and functional definition of the human chitinase chitin-binding domain
J. Biol. Chem. (2000)
Quantification of the elevated rate of domain rearrangements in metazoa
J. Mol. Biol. (2007)
Comparative genomics and protein domain graph analyses link ubiquitination and RNA metabolism
J. Mol. Biol. (2006)
Using functional domain composition and support vector machines for prediction of protein subcellular location
J. Biol. Chem. (2002)
Automatic transcription factor classifier based on functional domain composition
Biochem. Biophys. Res. Commun. (2006)
Comparative genomics and structural biology of the molecular innovations of eukaryotes
Curr. Opin. Struct. Biol. (2006)
Structure, function and evolution of multidomain proteins
Curr. Opin. Struct. Biol. (2004)
Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions
J. Mol. Biol. (2005)
Evolution of the proteases of blood coagulation and fibrinolysis by assembly from modules
Cell (1985)
Domain combinations in archaeal, eubacterial and eukaryotic proteomes
J. Mol. Biol.
Supra-domains: evolutionary units larger than single protein domains
J. Mol. Biol.
The relationship between domain duplication and recombination
J. Mol. Biol.
Transposable elements, gene creation and genome rearrangement in flowering plants
Curr. Opin. Genet. Dev.
Evolution by gene duplication: an update
Trends Ecol. Evol.
Gene complexity and gene duplicability
Curr. Biol.
Relative rates of gene fusion and fission in multi-domain proteins
Trends Genet.
Modeling the evolution of protein domain architectures using maximum parsimony
J. Mol. Biol.
Protein domains correlate strongly with exons in multiple eukaryotic genomes–evidence of exon shuffling?
Trends Genet.
Are non-functional, unfolded proteins (‘junk proteins’) common in the genome?
FEBS Lett.
Structural diversity of domain superfamilies in the CATH database
J. Mol. Biol.
From molecular to modular cell biology
Nature
Recombinatoric exploration of novel folded structures: a heteropolymer-based model of protein evolutionary landscapes
Proc. Natl. Acad. Sci. U. S. A.
More than the sum of their parts: on the evolution of proteins from peptides
Bioessays
Robustness and Evolvability in Living Systems
The evolution of domain arrangements in proteins and interaction networks
Cell. Mol. Life Sci.
Chemical and biological evolution of nucleotide-binding protein
Nature
SCOP database in 2004: refinements integrate structure and sequence family data
Nucleic Acids Res.
A unifold, mesofold, and superfold model of protein fold use
Proteins
Pfam: clans, web tools and services
Nucleic Acids Res.
A tree of life based on protein domain organizations
Mol. Biol. Evol.
Prokaryotic phylogenies inferred from protein structural domains
Genome Res.
Global phylogeny determined by the combination of protein domains in proteomes
Mol. Biol. Evol.
Phylogeny determined by protein domain content
Proc. Natl. Acad. Sci. U. S. A.
The Crohn's disease susceptibility gene DLG5 as a member of the CARD interaction network
J. Mol. Med.
Identification of genomic features using microsyntenies of domains: domain teams
Genome Res.
InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes
Nucleic Acids Res.
An integrated approach to the prediction of domain–domain interactions
BMC Bioinformatics
* Authors contributed equally to this article.