Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks

https://doi.org/10.1016/j.jmb.2006.02.019

The structure of complex transcriptional regulatory networks has been studied extensively in certain model organisms. However, the evolutionary dynamics of these networks across organisms, which would reveal important principles of adaptive regulatory changes, are poorly understood. We use the known transcriptional regulatory network of Escherichia coli to analyse the conservation patterns of this network across 175 prokaryotic genomes, and predict components of the regulatory networks for these organisms. We observe that transcription factors are typically less conserved than their target genes and evolve independently of them, with different organisms evolving distinct repertoires of transcription factors responding to specific signals. We show that prokaryotic transcriptional regulatory networks have evolved principally through widespread tinkering of transcriptional interactions at the local level by embedding orthologous genes in different types of regulatory motifs. Different transcription factors have emerged independently as dominant regulatory hubs in various organisms, suggesting that they have convergently acquired similar network structures approximating a scale-free topology. We note that organisms with similar lifestyles across a wide phylogenetic range tend to conserve equivalent interactions and network motifs. Thus, organism-specific optimal network designs appear to have evolved due to selection for specific transcription factors and transcriptional interactions, allowing responses to prevalent environmental stimuli. The methods for biological network analysis introduced here can be applied generally to study other networks, and these predictions can be used to guide specific experiments.

Introduction

Of the several steps at which the flow of information from a gene to its protein product is controlled, regulation at the transcriptional level is a fundamental mechanism observed in all organisms. This form of regulation is typically mediated by a DNA-binding protein (transcription factor) that binds to target sites in the genome and, either singly or in combination with other factors, regulates the expression of one or more target genes. The sum total of such transcriptional interactions in an organism can be conceptualised as a network, and is termed the transcriptional regulatory network.1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 In such a network, nodes represent genes and edges represent regulatory interactions. Studies of transcriptional regulatory networks at an abstract level have shown that they have architectures resembling scale-free networks, with striking structural and topological similarity to other networks from biological and non-biological systems. These networks are characterized by the recurrence of small patterns of interconnections, called network motifs,3, 4, 5, 12, 13, 14, 15, 16 which were first defined in Escherichia coli,3 and were subsequently found in yeast and other organisms.2, 4, 12
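The network view described above (genes as nodes, regulatory interactions as directed edges) can be illustrated with a small sketch. The gene names and edges below are invented for illustration and are not taken from the E. coli network; the feed-forward loop is used only as one familiar example of a network motif.

```python
from collections import defaultdict

# Toy (regulator, target) edges; all names are hypothetical.
edges = [
    ("tfA", "tfB"),
    ("tfA", "geneX"),
    ("tfB", "geneX"),
    ("tfA", "geneY"),
    ("tfC", "geneZ"),
]

def build_targets(edges):
    """Map each regulator to the set of genes it regulates."""
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)
    return dict(targets)

def out_degrees(targets):
    """Number of target genes per regulator; large values suggest hubs."""
    return {tf: len(genes) for tf, genes in targets.items()}

def feed_forward_loops(targets):
    """Return sorted (x, y, z) triples where x -> y, y -> z and x -> z."""
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # skip autoregulation
            for z in targets.get(y, ()):
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)

targets = build_targets(edges)
degrees = out_degrees(targets)
ffls = feed_forward_loops(targets)
print(degrees)
print(ffls)
```

In this toy network tfA has the most targets, and the only feed-forward loop is tfA regulating tfB, with both regulating geneX.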

Even though the general structural properties of transcriptional networks are well understood, there are several fundamental questions regarding the provenance and evolution of transcriptional regulatory networks that remain unanswered: What are the trends of conservation of transcription factors, target genes and regulatory interactions in the network? How do interactions that specify topologically equivalent motifs within the network evolve? How does the global structure of the network evolve? We addressed these questions by using the experimentally determined transcriptional regulatory network of E. coli as a reference network,3, 17 and performing a comparative genomic analysis to predict components of the regulatory network for 175 prokaryotes with completely sequenced genomes, from diverse lineages of the bacterial and archaeal kingdoms (a list of the genomes is provided as Supplementary Data).

While there has been considerable progress in unravelling the regulatory networks of various model organisms such as E. coli, the extrapolation of this information to poorly studied organisms, whose complete genome sequences are now available, remains a major challenge. Target genes for a transcription factor can be identified by using sequence profiles of known binding sites across different organisms and by arriving at a set of genes with conserved regulatory sequences. However, this method requires prior knowledge about binding sites, and is applicable only to closely related genomes, since orthologous transcription factors may regulate orthologous target genes through very divergent binding sites in distantly related organisms.18, 19, 20, 21

Alternatively, using an experimentally characterized transcriptional network as a template, one can infer transcriptional targets of a regulator in a genome of interest by identifying orthologues of transcription factors and their target genes. It is now generally accepted that, in the majority of cases, orthologous transcription factors regulate orthologous target genes. This procedure of transferring information about transcriptional regulation from a genome with known regulatory interactions to another genome by identifying orthologous proteins was assessed recently by Yu et al.,22 and was found to be a fairly robust method for predicting such interactions in eukaryotes. In fact, similar approaches, based on orthologue detection using a bi-directional best-hit procedure,23, 24, 25, 26, 27, 28 have been developed successfully to transfer information on interactions to other organisms and have proved to be useful in predicting new interactions.
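The transfer of interactions through orthologues can be sketched as follows. This is a minimal illustration under simple assumptions, not the published implementation: `reference_edges` and `orthologue_of` are hypothetical inputs, and an interaction is predicted only when both the transcription factor and the target gene have an orthologue in the genome of interest.

```python
def transfer_interactions(reference_edges, orthologue_of):
    """Project reference interactions onto another genome.

    reference_edges: iterable of (regulator, target) pairs in the reference.
    orthologue_of: dict mapping reference genes to their orthologues in the
        genome of interest; genes without an orthologue are simply absent.
    """
    predicted = set()
    for regulator, target in reference_edges:
        if regulator in orthologue_of and target in orthologue_of:
            predicted.add((orthologue_of[regulator], orthologue_of[target]))
    return predicted

# Hypothetical reference network and orthologue table.
reference_edges = [("tfA", "geneX"), ("tfA", "geneY"), ("tfB", "geneX")]
orthologue_of = {"tfA": "Q_0012", "geneX": "Q_0450"}  # tfB, geneY not found

predicted = transfer_interactions(reference_edges, orthologue_of)
print(predicted)
```

Only the tfA to geneX interaction is carried over here, because tfB and geneY have no orthologue in the hypothetical target genome. Interactions gained in the other organism cannot be recovered this way.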

Detecting orthologues is a non-trivial process. After testing various orthologue detection procedures (e.g. bi-directional best-hit and best hits with defined e-value cut-offs), we arrived at a hybrid procedure, which was used to identify orthologous proteins in a genome. Using this approach, we predicted components of the transcriptional networks. With the best-characterized transcriptional regulatory network currently available, that of E. coli with 755 genes (112 transcription factors) and 1295 transcriptional interactions, as a template, we used our orthologue detection procedure and predicted transcriptional interaction networks, for the first time, for 175 prokaryotic genomes (Figure 1; Supplementary Data M1, M2 and S3). Our method is based on identifying the orthologues of E. coli transcription factors and the orthologues of E. coli target genes in each of the prokaryote genomes using protein sequence comparisons, without considering conservation of the transcription factor-binding sites in the DNA. We chose the orthology-based method rather than using binding site information because using binding sites (i) would place an additional constraint and would drastically reduce the size of the E. coli network that we considered (only a few transcription factors in the network have enough experimentally characterized binding site information to build a reliable profile) and (ii) would limit the number of genomes that can be compared, as reliable detection of these short binding sites in distantly related organisms is not possible, and may hence bias our analysis.20 It should be noted that the method we develop can, in principle, be applied to any reference network, and we chose the E. coli network because it is the most comprehensive when compared to networks available for other prokaryotes.
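One of the procedures mentioned above, the bi-directional best hit with an e-value cut-off, can be sketched as below. This shows only that single ingredient, not the hybrid procedure, whose details are in the Supplementary Data. The hit tuples, the use of bit score to rank hits and the 1e-5 cut-off are assumptions made for illustration.

```python
def best_hits(hits, max_evalue=1e-5):
    """Best subject per query by bit score, ignoring hits above the cut-off.

    hits: iterable of (query, subject, evalue, bitscore) tuples.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def bidirectional_best_hits(a_vs_b, b_vs_a, max_evalue=1e-5):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}

# Hypothetical search results between genome A and genome B.
a_vs_b = [
    ("a1", "b1", 1e-50, 200.0),
    ("a1", "b2", 1e-20, 90.0),
    ("a2", "b2", 1e-30, 120.0),
    ("a3", "b3", 0.5, 25.0),    # too weak, filtered out
]
b_vs_a = [
    ("b1", "a1", 1e-48, 195.0),
    ("b2", "a1", 1e-32, 130.0),  # b2 prefers a1, so a2/b2 is not reciprocal
    ("b2", "a2", 1e-29, 118.0),
    ("b3", "a3", 0.4, 26.0),
]

orthologues = bidirectional_best_hits(a_vs_b, b_vs_a)
print(orthologues)
```

The a2/b2 pair is rejected because the relationship is not reciprocal, which is the kind of case that arises when genes have duplicated or been lost in one lineage.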

The transcriptional regulatory network for E. coli is shown in Figure 1, along with the networks for a Gram-positive mammalian pathogen, Bacillus anthracis, and a free-living organism, Streptomyces coelicolor, which were reconstructed using the E. coli network as the reference network. We wish to emphasize that we do not predict new regulatory interactions that have been gained in the other organisms; no current computational method allows this in the absence of external information such as gene expression data. However, we can still predict potentially conserved interactions, apart from conserved regulatory components and genes, which can shed light on network evolution (see Supplementary Data S17).

Orthologue detection, which is the foundation for our network reconstruction procedure, can be confounded by rapid duplication, divergence and loss of genes. Hence, in this study, we assessed our procedure using the expression data available for Vibrio cholerae and the known regulatory network of Bacillus subtilis. We studied the extent to which target genes with the same set of known and predicted transcription factors have similar expression profiles.29 We found that co-regulated genes in E. coli, for which the transcriptional network is known, and in V. cholerae, for which predictions were based on the reconstructed network, tend to be strongly co-expressed (Supplementary Data S2). Ideally, one would want to carry out such an assessment for as many genomes as possible; however, the limited availability of meaningful gene expression data for other organisms restricts this analysis to V. cholerae. This result supports the validity of reconstructing transcriptional networks by inferring regulatory interactions between orthologous transcription factors and orthologous target genes in prokaryotes. Additionally, the experimentally determined transcriptional regulatory network of B. subtilis (a bacterium very distantly related to our reference organism E. coli) shows a good degree of congruence with the interactions predicted by our analysis, thereby lending support to the validity of our reconstruction procedure (Supplementary Data S11; note that the experimentally determined B. subtilis network is far from complete and is much smaller than the E. coli network and, hence, this comparison should not be treated as a gold standard for estimating false-positive and false-negative rates).
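The co-expression check described above can be illustrated with a rough sketch. It assumes that similarity of expression profiles is measured by the Pearson correlation and that gene pairs sharing exactly the same set of regulators are compared with all other pairs; the actual measure used is described in the Supplementary Data. All gene names and expression values are invented.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def mean_coexpression(regulators, expression):
    """Mean correlation for pairs with identical regulator sets vs. the rest."""
    same, other = [], []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        shared = regulators.get(g1)
        if shared and shared == regulators.get(g2):
            same.append(r)
        else:
            other.append(r)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(same), mean(other)

# Hypothetical regulator sets and expression profiles over four conditions.
regulators = {
    "g1": frozenset({"tfA"}),
    "g2": frozenset({"tfA"}),
    "g3": frozenset({"tfB"}),
    "g4": frozenset({"tfB"}),
}
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 2.0],
}

same_mean, other_mean = mean_coexpression(regulators, expression)
print(same_mean, other_mean)
```

A higher mean correlation among co-regulated pairs than among the remaining pairs is the pattern one would expect if the predicted regulator assignments are meaningful.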

Transcription factors evolve rapidly and independently of their target genes

To assess the evolutionary trends in the network conservation, we asked if transcriptional regulators and their targets are conserved differentially in evolution. When we quantified the extent of conservation of the 755 genes in the 175 genomes, we found that transcription factors are less conserved across genomes during evolution than their target genes (Figure 2(a)). We assessed the significance of this observed bias in the conservation patterns by simulating network evolution (Supplementary

Conclusions

We present the first comprehensive analysis of the evolution of transcriptional regulatory networks at three distinct levels of organization by comparing the conservation of an experimentally established reference network of 1295 interactions across 175 microbial genomes (computationally analyzing ∼500,000 protein sequences). At the level of individual genes, we show that target genes are more conserved across genomes than transcription factors, and the conservation of a target gene and its

Materials and Methods

Detailed descriptions of the methods are given in the Supplementary Data.

Acknowledgements

M.M.B. and L.A. gratefully acknowledge the Intramural research program of National Institutes of Health, USA for funding their research. M.M.B. acknowledges the MRC Laboratory of Molecular Biology, Trinity College, Cambridge, Cambridge Commonwealth Trust and the National Institute of Health Visitor Program for financial support. We thank Dr Nakai and Yuko Makita for sending us information on the B. subtilis network. We thank Drs N. Luscombe, C. Chothia, P. Ten Wolde, L. LoConte, L. M. Iyer, V.

References (60)

  • H. Yu et al. Genomic analysis of essentiality within protein networks. Trends Genet. (2004)
  • N.M. Luscombe et al. Protein–DNA interactions: amino acid conservation and the effects of mutations on binding specificity. J. Mol. Biol. (2002)
  • A.L. Barabasi et al. Network biology: understanding the cell's functional organization. Nature Rev. Genet. (2004)
  • R. Milo et al. Network motifs: simple building blocks of complex networks. Science (2002)
  • S.S. Shen-Orr et al. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genet. (2002)
  • T.I. Lee et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science (2002)
  • N. Guelzim et al. Topological and causal structure of the yeast transcriptional regulatory network. Nature Genet. (2002)
  • H.H. McAdams et al. The evolution of genetic regulatory systems in bacteria. Nature Rev. Genet. (2004)
  • M.E. Wall et al. Design of gene circuits: lessons from bacteria. Nature Rev. Genet. (2004)
  • S. Carroll. Endless Forms Most Beautiful: The New Science of Evo Devo and the Making of the Animal Kingdom (2005)
  • C.T. Harbison et al. Transcriptional regulatory code of a eukaryotic genome. Nature (2004)
  • M. Madan Babu et al. Structure and evolution of transcriptional regulatory networks. Curr. Opin. Struct. Biol. (2004)
  • R. Milo et al. Superfamilies of evolved and designed networks. Science (2004)
  • R. Albert et al. Error and attack tolerance of complex networks. Nature (2000)
  • A.L. Barabasi et al. Emergence of scaling in random networks. Science (1999)
  • Z.N. Oltvai et al. Systems biology. Life's complexity pyramid. Science (2002)
  • E. Ravasz et al. Hierarchical organization of modularity in metabolic networks. Science (2002)
  • H. Salgado et al. RegulonDB (version 4.0): transcriptional regulation, operon organization and growth conditions in Escherichia coli K-12. Nucl. Acids Res. (2004)
  • W.B. Alkema et al. Regulog analysis: detection of conserved regulatory networks across bacteria: application to Staphylococcus aureus. Genome Res. (2004)
  • N. Rajewsky et al. The evolution of DNA regulatory regions for proteo-gamma bacteria by interspecies comparisons. Genome Res. (2002)