Journal of Molecular Biology
Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks
Introduction
Of the several steps at which the flow of information from a gene to its protein product is controlled, regulation at the transcriptional level is a fundamental mechanism observed in all organisms. This form of regulation is typically mediated by a DNA-binding protein (transcription factor) that binds to target sites in the genome and, either singly or in combination with other factors, regulates the expression of one or more target genes. The sum total of such transcriptional interactions in an organism can be conceptualised as a network, and is termed the transcriptional regulatory network.1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 In such a network, nodes represent genes and edges represent regulatory interactions. Studies of transcriptional regulatory networks at an abstract level have shown that their architectures resemble scale-free networks, with striking structural and topological similarity to other networks from biological and non-biological systems. These networks are characterized by the recurrence of small patterns of interconnections, called network motifs,3, 4, 5, 12, 13, 14, 15, 16 which were first defined in Escherichia coli,3 and were subsequently found in yeast and other organisms.2, 4, 12
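The network view described above can be made concrete with a small sketch. The following Python fragment is an illustration only, not the analysis used in this study: it stores a regulatory network as directed edges from regulator to target, reports the number of targets per regulator, and enumerates one well-known motif type, the feed-forward loop (X regulates Y, and both X and Y regulate Z). The gene names (`tfA`, `geneX`, and so on) and the function names are hypothetical.

```python
from collections import defaultdict


def build_network(edges):
    """Map each regulator to the set of genes it regulates (directed edges)."""
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)
    return targets


def out_degrees(network):
    """Number of target genes per regulator."""
    return {tf: len(genes) for tf, genes in network.items()}


def feed_forward_loops(network):
    """List (x, y, z) triples where x -> y, y -> z and x -> z."""
    loops = []
    for x in sorted(network):
        for y in sorted(network[x]):
            if y == x:
                continue  # skip auto-regulation
            # .get avoids creating empty entries for genes that regulate nothing
            for z in sorted(network.get(y, ())):
                if z != x and z != y and z in network[x]:
                    loops.append((x, y, z))
    return loops


# Hypothetical toy network, not taken from the E. coli data.
toy_edges = [
    ("tfA", "tfB"),
    ("tfA", "geneX"),
    ("tfB", "geneX"),
    ("tfB", "geneY"),
    ("tfC", "geneY"),
]
toy_network = build_network(toy_edges)

print(out_degrees(toy_network))
print(feed_forward_loops(toy_network))
```

In this toy example the only feed-forward loop is the triple `tfA`, `tfB`, `geneX`. A real motif analysis would also compare motif counts against randomized networks, which this sketch does not attempt.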
Even though the general structural properties of transcriptional networks are well understood, there are several fundamental questions regarding the provenance and evolution of transcriptional regulatory networks that remain unanswered: What are the trends of conservation of transcription factors, target genes and regulatory interactions in the network? How do interactions that specify topologically equivalent motifs within the network evolve? How does the global structure of the network evolve? We addressed these questions by using the experimentally determined transcriptional regulatory network of E. coli as a reference network,3, 17 and performing a comparative genomic analysis to predict components of the regulatory network for 175 prokaryotes with completely sequenced genomes, from diverse lineages of the bacterial and archaeal kingdoms (a list of the genomes is provided as Supplementary Data).
While there has been considerable progress in unravelling the regulatory networks of various model organisms such as E. coli, the extrapolation of this information to poorly studied organisms, whose complete genome sequences are now available, remains a major challenge. Target genes for a transcription factor can be identified by using sequence profiles of known binding sites across different organisms and by arriving at a set of genes with conserved regulatory sequences. However, this method requires prior knowledge about binding sites, and is applicable only to closely related genomes, since orthologous transcription factors may regulate orthologous target genes through very divergent binding sites in distantly related organisms.18, 19, 20, 21
Alternatively, using an experimentally characterized transcriptional network as a template, one can infer transcriptional targets of a regulator in a genome of interest by identifying orthologues of transcription factors and their target genes. It is now generally accepted that in the majority of cases, orthologous transcription factors regulate orthologous target genes. This procedure of transferring information about transcriptional regulation from a genome with known regulatory interactions to another genome by identifying orthologous proteins was assessed recently by Yu et al.,22 and was found to be a fairly robust method for predicting such interactions in eukaryotes. In fact, similar approaches, based on orthologue detection using a bi-directional best-hit procedure,23, 24, 25, 26, 27, 28 have been developed successfully to transfer information on interactions to other organisms and have proved useful in predicting new interactions.
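The bi-directional best-hit idea mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: similarity scores are assumed to be available as (query, subject, score) triples from some sequence comparison tool, higher scores are assumed to be better, and all protein identifiers are hypothetical. It shows the plain bi-directional best-hit criterion only.

```python
def best_hits(hits):
    """Return {query: subject} keeping the highest-scoring subject per query.

    hits: iterable of (query, subject, score) triples; on ties the first
    hit encountered is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(hits_a_to_b, hits_b_to_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(hits_a_to_b)
    backward = best_hits(hits_b_to_a)
    return {a: b for a, b in forward.items() if backward.get(b) == a}


# Hypothetical scores between proteins of a reference genome (eco*)
# and a genome of interest (gen*).
a_to_b = [
    ("ecoA", "genB1", 250.0),
    ("ecoA", "genB2", 90.0),
    ("ecoC", "genB2", 120.0),
]
b_to_a = [
    ("genB1", "ecoA", 245.0),
    ("genB2", "ecoA", 95.0),
    ("genB2", "ecoC", 80.0),
]

print(bidirectional_best_hits(a_to_b, b_to_a))
```

Here `ecoC` is not assigned an orthologue, because its best hit `genB2` in turn matches `ecoA` more strongly. This is the kind of case in which a strict reciprocal criterion drops a candidate pair.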
Detecting orthologues is a non-trivial process. After testing various orthologue detection procedures (e.g. bi-directional best-hit and best hits with defined e-value cut-offs), we arrived at a hybrid procedure, which was used to identify orthologous proteins in a genome. Using this approach, we predicted components of the transcriptional networks. With the best-characterized transcriptional regulatory network currently available, that of E. coli with 755 genes (112 transcription factors) and 1295 transcriptional interactions, as a template, we used our orthologue detection procedure and predicted transcriptional interaction networks, for the first time, for 175 prokaryotic genomes (Figure 1; Supplementary Data M1, M2 and S3). Our method is based on identifying the orthologues of E. coli transcription factors and the orthologues of E. coli target genes in each of the prokaryote genomes using protein sequence comparisons, without considering conservation of the transcription factor-binding sites in the DNA. We chose the orthology-based method rather than using binding site information because using binding sites (i) would place an additional constraint and would drastically reduce the size of the E. coli network that we considered (only a few transcription factors in the network have enough experimentally characterized binding site information to build a reliable profile) and (ii) would limit the number of genomes that can be compared, as reliable detection of these short binding sites in distantly related organisms is not possible, and may hence bias our analysis.20 It should be noted that the method we develop can, in principle, be applied to any reference network, and we chose the E. coli network because it is the most comprehensive when compared to networks available for other prokaryotes.
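The transfer step described in this paragraph can be illustrated with a short sketch. It assumes that an orthologue mapping from reference genes to genes in the genome of interest has already been computed by some procedure (the hybrid procedure itself is described in the Supplementary Data and is not reproduced here). An interaction is carried over only when both the transcription factor and the target gene have an orthologue. All identifiers and function names are hypothetical.

```python
def transfer_network(reference_edges, orthologues):
    """Predict interactions in a genome of interest from a reference network.

    reference_edges: iterable of (transcription_factor, target_gene) pairs.
    orthologues: dict mapping reference genes to their orthologues in the
        genome of interest; genes without an orthologue are simply absent.
    """
    predicted = set()
    for tf, target in reference_edges:
        if tf in orthologues and target in orthologues:
            predicted.add((orthologues[tf], orthologues[target]))
    return predicted


def fraction_conserved(reference_edges, orthologues):
    """Fraction of reference interactions with both partners conserved."""
    edges = list(reference_edges)
    kept = sum(1 for tf, tg in edges if tf in orthologues and tg in orthologues)
    return kept / len(edges) if edges else 0.0


# Hypothetical reference network and orthologue assignments.
reference_edges = [("tfA", "geneX"), ("tfA", "geneY"), ("tfB", "geneY")]
orthologues = {"tfA": "g1", "geneX": "g2", "geneY": "g3"}  # tfB has no orthologue

print(sorted(transfer_network(reference_edges, orthologues)))
print(fraction_conserved(reference_edges, orthologues))
```

Because the sketch only copies existing reference edges, it cannot produce interactions that are absent from the reference network, which matches the limitation of orthology-based transfer.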
The transcriptional regulatory network for E. coli is shown in Figure 1, along with the networks for a Gram-positive mammalian pathogen, Bacillus anthracis, and a free-living organism, Streptomyces coelicolor, which were reconstructed using the E. coli network as the reference network. We wish to emphasize that we do not predict new regulatory interactions that have been gained in the other organisms; no current computational method allows this in the absence of external information such as gene expression data. However, we can still predict potentially conserved interactions, in addition to conserved regulatory components and genes, which can shed light on network evolution (see Supplementary Data S17).
Orthologue detection, which is the foundation of our network reconstruction procedure, can be confounded by rapid duplication, divergence and loss of genes. Hence, in this study, we assessed our procedure using the expression data available for Vibrio cholerae and the known regulatory network of Bacillus subtilis. We studied the extent to which target genes with the same set of known and predicted transcription factors have similar expression profiles.29 We found that co-regulated genes in E. coli, for which the transcriptional network is known, and in V. cholerae, for which predictions were based on the reconstructed network, tend to be strongly co-expressed (Supplementary Data S2). Ideally, one would want to carry out such an assessment for as many genomes as possible; however, the limited availability of meaningful gene expression data for other organisms restricts this analysis to V. cholerae. This result supports the validity of reconstructing transcriptional networks by inferring regulatory interactions between orthologous transcription factors and orthologous target genes in prokaryotes. Additionally, the experimentally determined transcriptional regulatory network of B. subtilis (a bacterium very distantly related to our reference organism E. coli) shows a good degree of congruence with the interactions predicted by our analysis, thereby lending support to the validity of our reconstruction procedure (Supplementary Data S11; note that the experimentally determined B. subtilis network is far from complete and is much smaller than the E. coli network and, hence, this comparison should not be treated as a gold standard for estimating false positive and false negative rates).
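The co-expression check described above can be sketched as follows. This is an illustrative reading of the idea, not the published protocol: target genes are grouped by their exact set of regulators, and the Pearson correlation of expression profiles is averaged over all gene pairs within each group. The choice of Pearson correlation, the function names and the toy data are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coregulated_groups(edges):
    """Group target genes that share exactly the same set of regulators."""
    regulators = defaultdict(set)
    for tf, target in edges:
        regulators[target].add(tf)
    groups = defaultdict(list)
    for target, tfs in regulators.items():
        groups[frozenset(tfs)].append(target)
    # Only groups with at least two genes yield gene pairs to compare.
    return {tfs: sorted(genes) for tfs, genes in groups.items() if len(genes) > 1}


def mean_coexpression(groups, expression):
    """Average pairwise correlation within groups; None if there are no pairs."""
    values = [
        pearson(expression[a], expression[b])
        for genes in groups.values()
        for a, b in combinations(genes, 2)
    ]
    return sum(values) / len(values) if values else None


# Hypothetical network and expression profiles (four conditions per gene).
edges = [("tfA", "g1"), ("tfA", "g2"), ("tfA", "g3"), ("tfB", "g3")]
expression = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}

groups = coregulated_groups(edges)
print(groups)
print(mean_coexpression(groups, expression))
```

In the toy data, `g1` and `g2` share the single regulator `tfA` and form one group, while `g3` has a different regulator set and is left out. A fuller assessment would compare the within-group average against a background of randomly chosen gene pairs.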
Section snippets
Transcription factors evolve rapidly and independently of their target genes
To assess the evolutionary trends in network conservation, we asked whether transcriptional regulators and their targets are conserved differentially in evolution. When we quantified the extent of conservation of the 755 genes in the 175 genomes, we found that transcription factors are less conserved across genomes during evolution than their target genes (Figure 2(a)). We assessed the significance of this observed bias in the conservation patterns by simulating network evolution (Supplementary
Conclusions
We present the first comprehensive analysis of the evolution of transcriptional regulatory networks at three distinct levels of organization by comparing the conservation of an experimentally established reference network of 1295 interactions across 175 microbial genomes (computationally analyzing ∼500,000 protein sequences). At the level of individual genes, we show that target genes are more conserved across genomes than transcription factors, and the conservation of a target gene and its
Materials and Methods
Detailed descriptions of the methods are given in the Supplementary Data.
Acknowledgements
M.M.B. and L.A. gratefully acknowledge the Intramural Research Program of the National Institutes of Health, USA for funding their research. M.M.B. acknowledges the MRC Laboratory of Molecular Biology, Trinity College, Cambridge, Cambridge Commonwealth Trust and the National Institutes of Health Visitor Program for financial support. We thank Dr Nakai and Yuko Makita for sending us information on the B. subtilis network. We thank Drs N. Luscombe, C. Chothia, P. Ten Wolde, L. LoConte, L. M. Iyer, V.
References (60)
- et al. Using a quantitative blueprint to reprogram the dynamics of the flagella gene network. Cell (2004)
- et al. Comparative genomics of thiamin biosynthesis in procaryotes. New genes and regulatory mechanisms. J. Biol. Chem. (2002)
- et al. Comparative studies of transcriptional regulation mechanisms in a group of eight gamma-proteobacterial genomes. J. Mol. Biol. (2005)
- et al. Reconstruction of microbial transcriptional regulatory networks. Curr. Opin. Biotechnol. (2004)
- et al. Conservation of protein-protein interactions—lessons from ascomycota. Trends Genet. (2004)
- Scaling laws in the functional content of genomes. Trends Genet. (2003)
- et al. Evolution of protein superfamilies and bacterial genome size. J. Mol. Biol. (2004)
- et al. The many faces of the helix-turn-helix domain: transcription regulation and beyond. FEMS Microbiol. Rev. (2005)
- et al. Insights into the evolutionary process of genome degradation. Curr. Opin. Genet. Dev. (1999)
- et al. Genome reduction in the alpha-Proteobacteria. Curr. Opin. Microbiol. (2005)