Abstract
In many multivariate domains, we are interested in analyzing the dependency structure of the underlying distribution, e.g., whether two variables are in direct interaction. We can represent dependency structures using Bayesian network models. To analyze a given data set, Bayesian model selection attempts to find the most likely (MAP) model and uses its structure to answer these questions. However, when the amount of available data is modest, there might be many models with non-negligible posterior probability. Thus, we want to compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed order over network variables. This allows us to compute, for a given order, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov chain Monte Carlo (MCMC) method, but over orders rather than over network structures. The space of orders is smaller and more regular than the space of structures, and has a much smoother posterior “landscape”. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.
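The sampler described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it runs Metropolis-Hastings over variable orderings with a symmetric swap proposal, and assumes a hypothetical `log_score(order)` callback that returns the log marginal likelihood of the data given the order (in the paper, that quantity is obtained by summing over all network structures consistent with the order).

```python
import math
import random

def mcmc_over_orders(variables, log_score, n_steps=1000, seed=0):
    """Metropolis-Hastings over orderings of `variables`.

    `log_score(order)` is assumed to return the log marginal
    likelihood of the data given the order (a placeholder for the
    order-conditional sum over structures described in the paper).
    """
    rng = random.Random(seed)
    order = list(variables)
    current = log_score(order)
    samples = []
    for _ in range(n_steps):
        # Propose a new order by swapping two random positions;
        # the proposal is symmetric, so no Hastings correction is needed.
        i, j = rng.sample(range(len(order)), 2)
        proposal = list(order)
        proposal[i], proposal[j] = proposal[j], proposal[i]
        cand = log_score(proposal)
        # Accept with probability min(1, exp(cand - current)).
        if math.log(rng.random()) < cand - current:
            order, current = proposal, cand
        samples.append(list(order))
    return samples
```

Given such samples, the posterior of a feature (e.g., a specific edge) would be estimated by averaging its order-conditional posterior over the sampled orders.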
Cite this article
Friedman, N., Koller, D. Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning 50, 95–125 (2003). https://doi.org/10.1023/A:1020249912095