Abstract
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. First, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Second, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Third, because the model has both causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Fourth, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding overfitting. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and the structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network learning methods to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.
Re-printed with kind permission of MIT Press and Kluwer books.
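As a rough illustration of two ideas named in the abstract, combining prior knowledge with data to learn parameters and predicting when an entry is missing, the following is a minimal sketch and not the chapter's own method or notation. It assumes a hypothetical two-node network X -> Y with binary variables, a made-up data set, and a uniform Dirichlet(1, 1) prior on every distribution; the function name `posterior_mean` is likewise invented for this example.

```python
from collections import Counter


def posterior_mean(prior_counts, data_counts):
    """Posterior mean of a multinomial parameter under a Dirichlet prior.

    prior_counts holds the Dirichlet pseudo-counts; data_counts holds the
    observed counts. The posterior is Dirichlet(prior + data), whose mean
    is the normalized sum of the two.
    """
    total = sum(prior_counts[k] + data_counts.get(k, 0) for k in prior_counts)
    return {
        k: (prior_counts[k] + data_counts.get(k, 0)) / total
        for k in prior_counts
    }


# Hypothetical complete data for the network X -> Y, as (x, y) pairs.
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (1, 1), (0, 0)]

# Assumed prior knowledge: one pseudo-count per state (uniform prior).
prior = {0: 1.0, 1: 1.0}

# Learn P(X) from the marginal counts of X.
p_x = posterior_mean(prior, Counter(x for x, _ in data))

# Learn P(Y | X = x) separately for each parent state.
p_y_given_x = {
    x_val: posterior_mean(prior, Counter(y for x, y in data if x == x_val))
    for x_val in (0, 1)
}

# If X is missing in a new case, predict Y by summing X out of the model.
p_y1 = sum(p_x[x] * p_y_given_x[x][1] for x in (0, 1))

print(p_x, p_y_given_x, p_y1)
```

The pseudo-counts play the role of prior knowledge: larger values pull the estimates toward the prior, while more data pulls them toward the observed frequencies.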
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Heckerman, D. (2008). A Tutorial on Learning with Bayesian Networks. In: Holmes, D.E., Jain, L.C. (eds) Innovations in Bayesian Networks. Studies in Computational Intelligence, vol 156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85066-3_3
DOI: https://doi.org/10.1007/978-3-540-85066-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85065-6
Online ISBN: 978-3-540-85066-3
eBook Packages: Engineering, Engineering (R0)