A Tutorial on Learning with Bayesian Networks

Heckerman, David

doi:10.1007/978-3-540-85066-3_3

Part of the book series: Studies in Computational Intelligence (SCI, volume 156)

Abstract

A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.

Re-printed with kind permission of MIT Press and Kluwer books.
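The abstract's points about learning both the parameters and the structure of a Bayesian network, and about Bayesian methods guarding against overfitting, can be made concrete with a small sketch. It is not taken from the chapter: it assumes two binary variables A and B, a hypothetical eight-record data set, and a uniform Beta(1, 1) prior on every conditional distribution. It scores two candidate structures (no arc versus A -> B) by their marginal likelihood, in which the parameters are integrated out, then computes posterior-mean parameters and the factorized joint p(a, b) = p(a) p(b | a) for A -> B.

```python
from math import exp, lgamma

# Hypothetical toy data set: each record is (a, b) for two binary variables A and B.
DATA = [(1, 1), (1, 1), (0, 0), (1, 1), (0, 0), (0, 1), (1, 1), (0, 0)]
ALPHA = 1.0  # assumption: Beta(1, 1) (uniform) prior on every binary distribution


def counts(values):
    """Return [n0, n1]: how many 0s and 1s appear in `values`."""
    values = list(values)
    return [values.count(0), values.count(1)]


def log_marginal(n, alpha=ALPHA):
    """log p(observed sequence) for one binary distribution with a Beta(alpha, alpha)
    prior, with the parameter integrated out analytically."""
    n0, n1 = n
    return (lgamma(2 * alpha) - lgamma(2 * alpha + n0 + n1)
            + lgamma(alpha + n0) + lgamma(alpha + n1) - 2 * lgamma(alpha))


def posterior_mean(n, alpha=ALPHA):
    """Posterior mean of P(X = 1) after observing counts n = [n0, n1]."""
    return (alpha + n[1]) / (2 * alpha + sum(n))


a_vals = [a for a, _ in DATA]
b_vals = [b for _, b in DATA]

# Structure S1: no arc, so A and B each have a single (unconditional) distribution.
score_s1 = log_marginal(counts(a_vals)) + log_marginal(counts(b_vals))

# Structure S2: A -> B, so B has one distribution per value of its parent A.
b_given_a = {a: counts(b for x, b in DATA if x == a) for a in (0, 1)}
score_s2 = log_marginal(counts(a_vals)) + sum(log_marginal(n) for n in b_given_a.values())

# Posterior-mean parameters for S2.
p_a1 = posterior_mean(counts(a_vals))
p_b1_given_a = {a: posterior_mean(n) for a, n in b_given_a.items()}


def joint(a, b):
    """Factorized joint under S2: p(a, b) = p(a) * p(b | a)."""
    pa = p_a1 if a == 1 else 1 - p_a1
    pb = p_b1_given_a[a] if b == 1 else 1 - p_b1_given_a[a]
    return pa * pb


print(f"Bayes factor S2 vs S1: {exp(score_s2 - score_s1):.2f}")  # prints 5.04
print(f"p(A=1, B=1) under S2: {joint(1, 1):.4f}")  # prints 0.4167
```

With equal prior probability on the two structures, the Bayes factor is also the posterior odds. The extra parameters of A -> B are not free: integrating over them spreads prior mass across more parameter settings, so the richer structure wins only when the data support the dependency, which is the sense in which the Bayesian score resists overfitting. Incomplete data, also mentioned in the abstract, is not handled in this sketch.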

Author information

Authors and Affiliations

Microsoft Research Advanced Technology Division, Microsoft Corporation, One Microsoft Way, Redmond, WA, 98052
David Heckerman

Editor information

Editors and Affiliations

Department of Statistics and Applied Probability, University of California, Santa Barbara, CA, 93106, USA
Dawn E. Holmes
University of South Australia Adelaide, Mawson Lakes, SA, 5095, Australia
Lakhmi C. Jain (Professor of Knowledge-Based Engineering)

About this chapter

Cite this chapter

Heckerman, D. (2008). A Tutorial on Learning with Bayesian Networks. In: Holmes, D.E., Jain, L.C. (eds) Innovations in Bayesian Networks. Studies in Computational Intelligence, vol 156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85066-3_3

DOI: https://doi.org/10.1007/978-3-540-85066-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85065-6
Online ISBN: 978-3-540-85066-3
eBook Packages: Engineering (R0)
