On the shape of posterior densities and credible sets in instrumental variable regression models with reduced rank: An application of flexible sampling methods using neural networks

https://doi.org/10.1016/j.jeconom.2006.06.009

Abstract

Likelihoods and posteriors of instrumental variable (IV) regression models with strong endogeneity and/or weak instruments may exhibit rather non-elliptical contours in the parameter space. This may seriously affect inference based on Bayesian credible sets. When approximating posterior probabilities and marginal densities using Monte Carlo integration methods like importance sampling or Markov chain Monte Carlo procedures, the speed of the algorithm and the quality of the results greatly depend on the choice of the importance or candidate density. Such a density has to be ‘close’ to the target density in order to yield accurate results with numerically efficient sampling. For this purpose we introduce neural networks, which seem to be natural importance or candidate densities, as they have a universal approximation property and are easy to sample from. A key step in the proposed class of methods is the construction of a neural network that approximates the target density. The methods are tested on a set of illustrative IV regression models. The results indicate the possible usefulness of the neural network approach.

Introduction

There exist classes of statistical and econometric models where the conditional distribution of any parameter of interest, given the other parameters, has known analytical properties and elliptically shaped Bayesian HPD credible sets, see e.g. Berger (1985). However, the joint and marginal distributions of the parameters may have unknown analytical properties and non-elliptical HPD credible sets. Then it is not trivial to perform inference on the joint distribution. This may have strong effects on the measurement of uncertainty of forecasts and of certain policy measures. For instance, in labor market models it is important to know whether a certain credible set of the policy effects of training programs has a strongly asymmetric shape. In models of international financial markets, used for hedging currency risk, knowledge of a strongly non-elliptical credible set is important for the specification of an optimal hedging decision under risk. For details on econometric models we refer to e.g. Imbens and Angrist (1994) and Bos et al. (2000) and the references cited there. A canonical statistical model is given by Gelman and Meng (1991). A second issue is that one may have great difficulties when trying to simulate (pseudo-) random drawings from such a class of non-elliptical joint distributions; random drawings are required for inference on nonlinear functions of parameters of interest such as impulse responses, see Strachan and Van Dijk (2004). Even if it is relatively easy to simulate random drawings from the conditional distributions, multi-modality and/or high correlations may cause the Gibbs sampler to converge extremely slowly or even yield erroneous results.
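To make the contrast between well-behaved conditionals and an ill-behaved joint concrete, the following is a minimal sketch of a Gibbs sampler for a bivariate density of the conditionally normal type discussed by Gelman and Meng (1991). The kernel form and the parameter values (`A=1`, `B=0`, `C1=C2=3`, chosen to give a bimodal joint) are assumptions for illustration and are not taken from the text above; the function names are hypothetical.

```python
import numpy as np

def gelman_meng_log_kernel(x1, x2, A=1.0, B=0.0, C1=3.0, C2=3.0):
    """Log kernel of a bivariate density whose two conditionals are normal.

    Assumed form (illustrative):
    log p = -0.5 * (A x1^2 x2^2 + x1^2 + x2^2 - 2 B x1 x2 - 2 C1 x1 - 2 C2 x2)
    """
    return -0.5 * (A * x1**2 * x2**2 + x1**2 + x2**2
                   - 2.0 * B * x1 * x2 - 2.0 * C1 * x1 - 2.0 * C2 * x2)

def gibbs(n_draws, A=1.0, B=0.0, C1=3.0, C2=3.0, start=(0.0, 0.0), seed=0):
    """Gibbs sampler that alternates between the two normal conditionals."""
    rng = np.random.default_rng(seed)
    x1, x2 = start
    draws = np.empty((n_draws, 2))
    for i in range(n_draws):
        # x1 | x2 is normal with precision A*x2^2 + 1 and mean (B*x2 + C1) / precision
        prec = A * x2**2 + 1.0
        x1 = rng.normal((B * x2 + C1) / prec, 1.0 / np.sqrt(prec))
        # x2 | x1 is normal with precision A*x1^2 + 1 and mean (B*x1 + C2) / precision
        prec = A * x1**2 + 1.0
        x2 = rng.normal((B * x1 + C2) / prec, 1.0 / np.sqrt(prec))
        draws[i] = (x1, x2)
    return draws

draws = gibbs(5000)
```

Each conditional draw is trivial, yet the chain has to travel along a curved, low-density ridge to move between the two modes, which is the kind of behavior that can slow the Gibbs sampler down.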

A first contribution of this paper is to show that well-behaved conditional distributions of parameters of interest may occur together with ill-behaved marginals for the case of linear models with reduced rank. We focus on the class of instrumental variable (IV) regression models with possibly endogenous regressors. This class of models may exhibit reduced rank of the parameter matrix due to varying degrees of instrument quality and endogeneity. Under certain weak priors the conditional posterior distributions in this model are Student's t, provided that they are proper. In the presence of weak instruments the joint and marginal posteriors may, however, display highly non-elliptical contours.

A second contribution of this paper is that we introduce a class of neural network sampling methods that allow for sampling from a target (posterior) distribution that may be multi-modal or skewed, or exhibit strong correlation among the parameters; that is, a class of methods to sample from non-elliptical distributions. Neural network sampling algorithms consist of two main steps. In the first step a neural network is constructed that approximates the target density. In the second step this neural network is embedded in a Metropolis–Hastings (MH) or importance sampling (IS) algorithm, as a candidate density in MH or as an importance function in IS. With respect to the first step we emphasize that an important advantage of neural network functions is their ‘universal approximation property’: they can approximate any square integrable function to any desired accuracy. In a ‘standard’ case of Monte Carlo integration, the MH candidate density function or the importance function is unimodal. If the target (posterior) distribution is multi-modal, then a second mode may be completely missed in the MH approach and some drawings may have huge weights in the IS approach. As a consequence the convergence behavior of these Monte Carlo integration methods is rather uncertain. Thus, an important problem is the choice of the candidate or importance density, especially when little is known a priori about the shape of the target density.
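The effect of the importance density on the weights can be sketched with a toy example. The code below is not the neural network construction itself: it assumes a hypothetical one-dimensional bimodal target kernel (modes at -3 and 3, true mean 0) and compares a single Student-t importance density with a hand-specified two-component mixture of Student-t densities. All names and parameter values are illustrative assumptions.

```python
import math
import numpy as np

def t_pdf(x, loc, scale, df):
    """Density of a location-scale Student-t distribution."""
    z = (x - loc) / scale
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return np.exp(log_c - 0.5 * (df + 1) * np.log1p(z**2 / df)) / scale

def target_kernel(x):
    """Unnormalized bimodal target (illustrative assumption); its mean is 0."""
    return np.exp(-0.5 * (x + 3.0)**2) + np.exp(-0.5 * (x - 3.0)**2)

def importance_sampling_mean(rng, n, locs, scales, probs, df=5):
    """Self-normalized IS estimate of the target mean using a t mixture."""
    locs, scales, probs = map(np.asarray, (locs, scales, probs))
    # Draw a mixture component for each draw, then a t variate from it.
    comp = rng.choice(len(probs), size=n, p=probs)
    x = locs[comp] + scales[comp] * rng.standard_t(df, size=n)
    # Importance weights: target kernel divided by the mixture density.
    q = sum(p * t_pdf(x, m, s, df) for p, m, s in zip(probs, locs, scales))
    w = target_kernel(x) / q
    w_norm = w / w.sum()
    ess = 1.0 / np.sum(w_norm**2)  # effective sample size
    return np.sum(w_norm * x), ess

rng = np.random.default_rng(1)
# Unimodal importance density centered on one mode only.
mean_uni, ess_uni = importance_sampling_mean(rng, 20000, [3.0], [1.0], [1.0])
# Bimodal mixture covering both modes.
mean_mix, ess_mix = importance_sampling_mean(
    rng, 20000, [-3.0, 3.0], [1.0, 1.0], [0.5, 0.5])
```

With the unimodal importance density, the few draws that land near the uncovered mode carry most of the total weight, so the effective sample size is far smaller than with the mixture.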

The proposed methods are applied to a set of illustrative examples of posterior distributions in IV regression models. Our results indicate that the neural network approach is feasible in cases where a ‘standard’ MH, IS or Gibbs approach would fail or be rather slow.

The outline of the paper is as follows. In Section 2 we consider the shape of posterior densities in a simple IV regression model for simulated data; it is shown that the shapes of HPD credible sets depend on the quality of instruments and the level of endogeneity. In Section 3 we discuss how to construct a neural network approximation to a density, how to sample from a neural network density, and how to use these drawings within the IS or MH algorithm. Section 4 illustrates the neural network approach in examples of IV regressions with simulated data. Conclusions are given in Section 5.


On the shape of posterior densities and Bayesian credible sets in IV regression models with several degrees of endogeneity and instrument quality

In this section we analyze a class of models, IV regression models with possibly endogenous regressors, where the conditional posterior distributions of parameters of interest have known properties but the joint does not. Consider the following possibly overidentified IV model, also known as the incomplete simultaneous equations model (INSEM). Following Zellner et al. (1988), let

$$y_1 = y_2\beta + \varepsilon, \qquad y_2 = X\pi + v,$$

where $y_1$ is a $(T \times 1)$ vector of observations on the endogenous variable that is to be explained, $y_2$
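As a rough illustration of the two-equation model above, the sketch below simulates data with a chosen degree of endogeneity (the correlation `rho` between the two error terms) and instrument strength (the size of `pi`), and evaluates a log posterior kernel on a grid. The kernel used here, $|U'U|^{-T/2}$ with $U = (y_1 - y_2\beta,\; y_2 - X\pi)$, follows from a diffuse prior $p(\beta, \pi, \Sigma) \propto |\Sigma|^{-3/2}$ with bivariate normal errors; both the prior and all parameter values are assumptions made for this sketch and are not given in the excerpt.

```python
import numpy as np

def simulate_iv(T=100, beta=0.0, pi=(0.1,), rho=0.9, seed=0):
    """Simulate y1 = y2*beta + eps, y2 = X @ pi + v with corr(eps, v) = rho."""
    rng = np.random.default_rng(seed)
    pi = np.asarray(pi, dtype=float)
    X = rng.standard_normal((T, pi.size))          # instruments
    cov = np.array([[1.0, rho], [rho, 1.0]])       # error covariance (assumed)
    errors = rng.multivariate_normal(np.zeros(2), cov, size=T)
    eps, v = errors[:, 0], errors[:, 1]
    y2 = X @ pi + v
    y1 = y2 * beta + eps
    return y1, y2, X

def log_posterior_kernel(beta, pi, y1, y2, X):
    """Log of |U'U|^(-T/2) under the assumed diffuse prior."""
    U = np.column_stack([y1 - y2 * beta, y2 - X @ np.asarray(pi, dtype=float)])
    _, logdet = np.linalg.slogdet(U.T @ U)
    return -0.5 * len(y1) * logdet

# Weak instrument (small pi) combined with strong endogeneity (large rho).
y1, y2, X = simulate_iv()
betas = np.linspace(-2.0, 2.0, 81)
pis = np.linspace(-0.5, 0.5, 81)
grid = np.array([[log_posterior_kernel(b, [p], y1, y2, X) for p in pis]
                 for b in betas])
```

Plotting the contours of `grid` for several combinations of `pi` and `rho` is one way to inspect how the shape of the posterior changes with instrument quality and endogeneity.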

Approximating with and sampling from neural networks

Consider a certain distribution, for example a posterior distribution, with density kernel $p(\theta)$ with $\theta \in \mathbb{R}^n$. In the case of the IV regression model in the previous section we considered $\theta = (\beta, \pi)$. Suppose the aim is to investigate some of the characteristics of $p(\theta)$, for example the mean and/or covariance matrix of a random vector $\theta \sim p(\theta)$. The approach followed in this paper consists of the following steps:

  1. Find a neural network approximation $nn: \mathbb{R}^n \to \mathbb{R}$ to the target density kernel $p(\theta)$.

  2. Obtain a
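Once an approximation to the target kernel is available, one way to use it is as the candidate density of an independence-chain MH sampler. The sketch below assumes a hypothetical one-dimensional bimodal target and a hand-specified two-component Student-t mixture standing in for the fitted approximation; it does not implement the construction of the approximation itself, and all names and values are illustrative.

```python
import math
import numpy as np

LOCS, SCALES, PROBS, DF = (-3.0, 3.0), (1.2, 1.2), (0.5, 0.5), 5  # assumed

def log_t_pdf(x, loc, scale, df):
    """Log density of a location-scale Student-t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - 0.5 * (df + 1) * np.log1p(z**2 / df))

def log_target(x):
    """Log of an unnormalized bimodal target kernel (illustrative)."""
    return np.logaddexp(-0.5 * (x + 3.0)**2, -0.5 * (x - 3.0)**2)

def log_candidate(x):
    """Log density of the t mixture used as the candidate."""
    terms = [math.log(p) + log_t_pdf(x, m, s, DF)
             for p, m, s in zip(PROBS, LOCS, SCALES)]
    return np.logaddexp.reduce(terms)

def draw_candidate(rng):
    """Draw one value from the t mixture."""
    k = rng.choice(len(PROBS), p=PROBS)
    return LOCS[k] + SCALES[k] * rng.standard_t(DF)

def independence_mh(n_draws, seed=0):
    """Independence-chain MH: candidates do not depend on the current state."""
    rng = np.random.default_rng(seed)
    x = draw_candidate(rng)
    log_w = log_target(x) - log_candidate(x)
    chain = np.empty(n_draws)
    accepted = 0
    for i in range(n_draws):
        y = draw_candidate(rng)
        log_w_new = log_target(y) - log_candidate(y)
        # Accept with probability min(1, w(y) / w(x)), where w = target / candidate.
        if rng.random() < math.exp(min(0.0, log_w_new - log_w)):
            x, log_w = y, log_w_new
            accepted += 1
        chain[i] = x
    return chain, accepted / n_draws

chain, acceptance_rate = independence_mh(5000)
```

The closer the candidate is to the target, the flatter the ratio of target to candidate becomes and the higher the acceptance rate, which is why the quality of the approximation in step 1 matters.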

Illustrative examples

In this section we consider the posterior distributions in IV regression models in order to compare the performance of the Type 3 (mixture of t densities) neural network sampling method (AdMit) with some other sampling methods.

Conclusion

We have shown that the shape of Bayesian HPD credible sets is often non-elliptical in IV regression models with weak instruments and/or strong endogeneity. Structural inference is possible in the overidentified model but the credible sets may indicate large uncertainty. Unless one uses a truncated region of integration, reduced form inference is not possible due to an improper posterior. This has important implications for forecasting and policy analysis.

In order to accurately approximate

Acknowledgements

We thank Andrew Chesher, two anonymous referees, and participants of seminars at Econometric Institute Rotterdam, Tokyo Metropolitan University, CORE, ESEM, JSM, CEF, and EC2 meetings for comments on an earlier version of this paper. We are, in particular, indebted to Geert Dhaene for very helpful comments which led to substantial improvements. All remaining errors are the authors’ responsibility.

References (50)

  • J.D. Angrist et al. Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics (1991)
  • L. Bauwens et al. Bayesian limited information analysis revisited
  • J.O. Berger. Statistical Decision Theory and Bayesian Analysis (1985)
  • C.S. Bos et al. Daily exchange rate behaviour and hedging of currency risk. Journal of Applied Econometrics (2000)
  • M.K. Cowles et al. Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association (1996)
  • P. Damien et al. Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables. Journal of the Royal Statistical Society B (1999)
  • J.H. Drèze. Bayesian limited information analysis of the simultaneous equations model. Econometrica (1976)
  • R.G. Edwards et al. Generalization of the Fortuin–Kasteleyn–Swendsen–Wang representation and Monte Carlo algorithm. Physical Review D (1988)
  • S. Frühwirth-Schnatter. Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. Econometrics Journal (2004)
  • A.R. Gallant et al. A nonparametric approach to nonlinear time series analysis: estimation and simulation
  • A.R. Gallant et al. There exists a neural network that does not make avoidable mistakes
  • A. Gelman et al. A note on bivariate distributions that are conditionally normal. The American Statistician (1991)
  • S. Geman et al. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence (1984)
  • J. Geweke. Bayesian inference in econometric models using Monte Carlo integration. Econometrica (1989)
  • J.D. Hamilton. A new approach to the econometric analysis of nonstationary time series and business cycles. Econometrica (1989)
    1 Part of this paper was written when the third author was a visiting scholar at CORE, Université catholique de Louvain, Belgium.
