On the shape of posterior densities and credible sets in instrumental variable regression models with reduced rank: An application of flexible sampling methods using neural networks
Introduction
There exist classes of statistical and econometric models where the conditional distribution of any parameter of interest, given the other parameters, has known analytical properties and elliptically shaped Bayesian HPD credible sets, see e.g. Berger (1985). However, the joint and marginal distributions of the parameters may have unknown analytical properties and non-elliptical HPD credible sets. Then it is not trivial to perform inference on the joint distribution. This may have strong effects on the measurement of uncertainty of forecasts and of certain policy measures. For instance, in labor market models it is important to know whether a certain credible set of the policy effects of training programs has a strongly asymmetric shape. In models of international financial markets, used for hedging currency risk, knowledge of a strongly non-elliptical credible set is important for the specification of an optimal hedging decision under risk. For details on econometric models we refer to e.g. Imbens and Angrist (1994) and Bos et al. (2000) and the references cited there. A canonical statistical model is given by Gelman and Meng (1991). A second issue is that one may have great difficulties when trying to simulate (pseudo-) random drawings from such a class of non-elliptical joint distributions; random drawings are required for inference on nonlinear functions of parameters of interest such as impulse responses, see Strachan and Van Dijk (2004). Even if it is relatively easy to simulate random drawings from the conditional distributions, multi-modality and/or high correlations may cause the Gibbs sampler to converge extremely slowly or even yield erroneous results.
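The combination of well-behaved conditionals and an ill-behaved joint distribution can be illustrated with a small sketch. The bivariate kernel below, exp(-0.5[A x1² x2² + x1² + x2² - 2B x1 x2 - 2C x1 - 2C x2]), is an assumed example of a conditionally normal density; the functional form, the parameter values A = 1, B = 0, C = 3, and all function names are illustrative assumptions, not taken from this paper. Completing the square in x1 for fixed x2 shows that each conditional is normal, so Gibbs sampling is easy to code even though the joint contours are far from elliptical.

```python
import numpy as np


def conditional_params(other, A=1.0, B=0.0, C=3.0):
    """Mean and variance of x_i given the other coordinate.

    For the assumed kernel, the terms in x_i are
    (A*other**2 + 1) * x_i**2 - 2 * (B*other + C) * x_i,
    which is the exponent of a normal density.
    """
    precision = A * other ** 2 + 1.0
    return (B * other + C) / precision, 1.0 / precision


def gibbs(n_draws, seed=0, start=(0.0, 0.0)):
    """Plain two-block Gibbs sampler using the normal conditionals."""
    rng = np.random.default_rng(seed)
    x1, x2 = start
    draws = np.empty((n_draws, 2))
    for i in range(n_draws):
        mean, var = conditional_params(x2)
        x1 = rng.normal(mean, np.sqrt(var))
        mean, var = conditional_params(x1)
        x2 = rng.normal(mean, np.sqrt(var))
        draws[i] = (x1, x2)
    return draws


draws = gibbs(5000)
print("sample means:", draws.mean(axis=0))
print("sample correlation:", np.corrcoef(draws[:, 0], draws[:, 1])[0, 1])
```

In this mild example the sampler still moves between the two high-density regions; the concern raised in the text is that with better separated modes or stronger correlation, the same scheme may converge very slowly.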
A first contribution of this paper is to show that well-behaved conditional distributions of parameters of interest may occur together with ill-behaved marginals for the case of linear models with reduced rank. We focus on the class of instrumental variable (IV) regression models with possibly endogenous regressors. This class of models may exhibit reduced rank of the parameter matrix due to varying degrees of instrument quality and endogeneity. Under certain weak priors the conditional posterior distributions in this model are Student's t, at least if they are proper. In the presence of weak instruments the joint and marginal posteriors may, however, display highly non-elliptical contours.
A second contribution of this paper is that we introduce a class of neural network sampling methods which allow for sampling from a target (posterior) distribution that may be multi-modal or skewed, or exhibit strong correlation among the parameters. In other words, it is a class of methods for sampling from non-elliptical distributions. Neural network sampling algorithms consist of two main steps. In the first step a neural network is constructed that approximates the target density. In the second step this neural network is embedded in a Metropolis–Hastings (MH) or importance sampling (IS) algorithm. With respect to the first step we emphasize that an important advantage of neural network functions is their ‘universal approximation property’. That is, neural network functions can provide approximations of any square integrable function to any desired accuracy. In the second step this neural network is used as an importance function in IS or as a candidate density in MH. In a ‘standard’ case of Monte Carlo integration, the MH candidate density function or the importance function is unimodal. If the target (posterior) distribution is multi-modal then a second mode may be completely missed in the MH approach and some drawings may have huge weights in the IS approach. As a consequence the convergence behavior of these Monte Carlo integration methods is rather uncertain. Thus, an important problem is the choice of the candidate or importance density, especially when little is known a priori about the shape of the target density.
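The role of the candidate density in the second step can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction used in the paper: the bimodal target kernel, the hand-picked two-component Student-t mixture that stands in for a fitted approximation, and all function names are hypothetical. The sketch compares a unimodal importance function with a mixture that covers both modes, and then reuses the mixture as the candidate in an independence-chain MH sampler.

```python
import math
import numpy as np


def log_target_kernel(x):
    # Assumed target: unnormalized equal mixture of N(-3, 1) and N(3, 1).
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)


def log_t_pdf(x, loc, scale, df):
    # Log density of a location-scale Student-t distribution.
    z = (x - loc) / scale
    const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi) - math.log(scale))
    return const - 0.5 * (df + 1) * np.log1p(z ** 2 / df)


def log_t_mixture_pdf(x, locs, scales, weights, df):
    parts = np.stack([math.log(w) + log_t_pdf(x, m, s, df)
                      for m, s, w in zip(locs, scales, weights)])
    return np.logaddexp.reduce(parts, axis=0)


def sample_t_mixture(rng, n, locs, scales, weights, df):
    comp = rng.choice(len(locs), size=n, p=weights)
    return locs[comp] + scales[comp] * rng.standard_t(df, size=n)


def importance_sampling(rng, n, locs, scales, weights, df=5):
    """Self-normalized IS estimate of the mean, plus effective sample size."""
    x = sample_t_mixture(rng, n, locs, scales, weights, df)
    log_w = log_target_kernel(x) - log_t_mixture_pdf(x, locs, scales, weights, df)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return np.sum(w * x), 1.0 / np.sum(w ** 2)


def independence_mh(rng, n, locs, scales, weights, df=5):
    """Independence-chain MH: candidates do not depend on the current state."""
    cand = sample_t_mixture(rng, n, locs, scales, weights, df)
    log_w = log_target_kernel(cand) - log_t_mixture_pdf(cand, locs, scales, weights, df)
    u = rng.uniform(size=n)
    chain = np.empty(n)
    cur, accepted = 0, 0
    chain[0] = cand[0]
    for i in range(1, n):
        # Accept with probability min(1, w_candidate / w_current).
        if u[i] < math.exp(min(0.0, log_w[i] - log_w[cur])):
            cur, accepted = i, accepted + 1
        chain[i] = cand[cur]
    return chain, accepted / (n - 1)


rng = np.random.default_rng(0)
n = 20000
uni = dict(locs=np.array([3.0]), scales=np.array([1.2]), weights=np.array([1.0]))
mix = dict(locs=np.array([-3.0, 3.0]), scales=np.array([1.2, 1.2]),
           weights=np.array([0.5, 0.5]))
mean_uni, ess_uni = importance_sampling(rng, n, **uni)
mean_mix, ess_mix = importance_sampling(rng, n, **mix)
chain, acc_rate = independence_mh(rng, n, **mix)
print("IS, unimodal candidate:", mean_uni, ess_uni)
print("IS, mixture candidate: ", mean_mix, ess_mix)
print("MH, mixture candidate: ", chain.mean(), acc_rate)
```

With the unimodal candidate, the few draws that land near the second mode carry very large weights, which shows up as a small effective sample size; the mixture candidate spreads the weights far more evenly.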
The proposed methods are applied to a set of illustrative examples of posterior distributions in IV regression models. Our results indicate that the neural network approach is feasible in cases where a ‘standard’ MH, IS or Gibbs approach would fail or be rather slow.
The outline of the paper is as follows. In Section 2 we consider the shape of posterior densities in a simple IV regression model for simulated data; it is shown that the shapes of HPD credible sets depend on the quality of instruments and the level of endogeneity. In Section 3 we discuss how to construct a neural network approximation to a density, how to sample from a neural network density, and how to use these drawings within the IS or MH algorithm. Section 4 illustrates the neural network approach in examples of IV regressions with simulated data. Conclusions are given in Section 5.
Section snippets
On the shape of posterior densities and Bayesian credible sets in IV regression models with several degrees of endogeneity and instrument quality
In this section we analyze a class of models, IV regression models with possibly endogenous regressors, where the conditional posterior distributions of parameters of interest have known properties but the joint does not. Consider the following possibly overidentified IV model, also known as the incomplete simultaneous equations model (INSEM). Following Zellner et al. (1988), let:where is a vector of observations on the endogenous variable that is to be explained,
Approximating with and sampling from neural networks
Consider a certain distribution, for example a posterior distribution, that is known through its density kernel. In the case of the IV regression model in the previous section, this is the posterior distribution of the model parameters. Suppose the aim is to investigate some of the characteristics of this distribution, for example the mean and/or covariance matrix of a random vector that follows it. The approach followed in this paper consists of the following steps:
1. Find a neural network approximation to the target density kernel.
2. Obtain a
Illustrative examples
In this section we consider the posterior distributions in IV regression models in order to compare the performance of the Type 3 (mixture of densities) neural network sampling method (AdMit) with some other sampling methods.
Conclusion
We have shown that the shape of Bayesian HPD credible sets is often non-elliptical in IV regression models with weak instruments and/or strong endogeneity. Structural inference is possible in the overidentified model but the credible sets may indicate large uncertainty. Unless one uses a truncated region of integration, reduced form inference is not possible due to an improper posterior. This has important implications for forecasting and policy analysis.
In order to accurately approximate
Acknowledgements
We thank Andrew Chesher, two anonymous referees, and participants of seminars at Econometric Institute Rotterdam, Tokyo Metropolitan University, CORE, ESEM, JSM, CEF, and meetings for comments on an earlier version of this paper. We are, in particular, indebted to Geert Dhaene for very helpful comments which led to substantial improvements. All remaining errors are the authors’ responsibility.
References (50)
- et al. (2004). Adaptive radial-based direction sampling: some flexible and robust Monte Carlo integration methods. Journal of Econometrics.
- et al. (1998). Bayesian posterior distributions in limited information analysis of the simultaneous equation model using Jeffreys’ prior. Journal of Econometrics.
- et al. (1997). An artificial neural network-GARCH model for international stock return volatility. Journal of Empirical Finance.
- (1977). Bayesian regression analysis using poly-t densities. Journal of Econometrics.
- et al. (1989). Multilayer feedforward networks are universal approximators. Neural Networks.
- et al. (2003). Bayesian and classical approaches to instrumental variable regression. Journal of Econometrics.
- et al. (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks.
- et al. (1995). Estimation of long-run relationships from dynamic heterogeneous panels. Journal of Econometrics.
- et al. (1980). Further experience in Bayesian analysis using Monte Carlo integration. Journal of Econometrics.
- et al. (1988). Bayesian specification analysis and estimation of simultaneous equation models using Monte Carlo methods. Journal of Econometrics.
- Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics.
- Bayesian limited information analysis revisited.
- Statistical Decision Theory and Bayesian Analysis.
- Daily exchange rate behaviour and hedging of currency risk. Journal of Applied Econometrics.
- Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association.
- Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables. Journal of the Royal Statistical Society B.
- Bayesian limited information analysis of the simultaneous equations model. Econometrica.
- Generalization of the Fortuin–Kasteleyn–Swendsen–Wang representation and Monte Carlo algorithm. Physical Review D.
- Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. Econometrics Journal.
- A nonparametric approach to nonlinear time series analysis: estimation and simulation.
- There exists a neural network that does not make avoidable mistakes.
- A note on bivariate distributions that are conditionally normal. The American Statistician.
- Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Bayesian inference in econometric models using Monte Carlo integration. Econometrica.
- A new approach to the econometric analysis of nonstationary time series and business cycles. Econometrica.
1. Part of this paper was written when the third author was a visiting scholar at CORE, Université catholique de Louvain, Belgium.