(5) Multi-parameter models - Summarizing the posterior
Models with more than one parameter

Thus far we have studied single-parameter models, but most analyses have several parameters. For example, consider the normal model $Y_i \sim \text{Normal}(\mu, \sigma^2)$ with priors $\mu \sim \text{Normal}(\mu_0, \sigma_0^2)$ and $\sigma^2 \sim \text{InvGamma}(a, b)$. We want to study the joint posterior distribution $p(\mu, \sigma^2 \mid \mathbf{Y})$.

As another example, consider the simple linear regression model $Y_i \sim \text{Normal}(\beta_0 + X_{1i}\beta_1, \sigma^2)$. We want to study the joint posterior $f(\beta_0, \beta_1, \sigma^2 \mid \mathbf{Y})$.
Models with more than one parameter

How do we compute high-dimensional (many-parameter) posterior distributions? How do we visualize the posterior? How do we summarize it concisely?
Bayesian one-sample t-test

In this section we will study the one-sample t-test in depth.

Likelihood: $Y_i \mid \mu, \sigma \sim \text{Normal}(\mu, \sigma^2)$, independent over $i = 1, \ldots, n$

Priors: $\mu \sim \text{Normal}(\mu_0, \sigma_0^2)$, independent of $\sigma^2 \sim \text{InvGamma}(a, b)$

The joint (bivariate) posterior PDF of $(\mu, \sigma^2)$ is proportional to
$$\left\{\sigma^{-n}\exp\left[-\frac{\sum_{i=1}^{n}(Y_i - \mu)^2}{2\sigma^2}\right]\right\}\exp\left[-\frac{(\mu - \mu_0)^2}{2\sigma_0^2}\right](\sigma^2)^{-a-1}\exp\left(-\frac{b}{\sigma^2}\right)$$

How do we summarize this complicated function?
Plotting the posterior on a grid

For models with only a few parameters we could simply plot the posterior on a grid. That is, we compute $p(\mu, \sigma^2 \mid Y_1, \ldots, Y_n)$ for all combinations of $m$ values of $\mu$ and $m$ values of $\sigma^2$. The number of grid points is $m^p$, where $p$ is the number of parameters in the model.

See http://www4.stat.ncsu.edu/~reich/aba/code/nn
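A minimal R sketch of this grid evaluation (the simulated data, grid ranges, and hyperparameter values are illustrative assumptions, not from the slides):

```r
# Grid evaluation of the joint posterior for the normal model
set.seed(1)
Y   <- rnorm(25, mean = 10, sd = 2)   # illustrative data
n   <- length(Y)
mu0 <- 0; s20 <- 100                  # prior mean/variance for mu
a   <- 0.1; b <- 0.1                  # InvGamma hyperparameters

# Log of the (unnormalized) joint posterior p(mu, sigma2 | Y)
log_post <- function(mu, sigma2) {
  sum(dnorm(Y, mu, sqrt(sigma2), log = TRUE)) +
    dnorm(mu, mu0, sqrt(s20), log = TRUE) +
    (-a - 1) * log(sigma2) - b / sigma2
}

# Evaluate on an m x m grid and plot
m    <- 100
mu_g <- seq(8, 12, length = m)
s2_g <- seq(1, 10, length = m)
lp   <- outer(mu_g, s2_g, Vectorize(log_post))
post <- exp(lp - max(lp))             # rescale to avoid underflow

contour(mu_g, s2_g, post,
        xlab = expression(mu), ylab = expression(sigma^2))
```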
Summarizing the results in a table

Typically we are interested in the marginal posterior
$$f(\mu \mid \mathbf{Y}) = \int_0^\infty p(\mu, \sigma^2 \mid \mathbf{Y})\, d\sigma^2,$$
where $\mathbf{Y} = (Y_1, \ldots, Y_n)$. This accounts for our uncertainty about $\sigma^2$. We could also report the marginal posterior of $\sigma^2$.

Results are usually given in a table with the marginal mean, SD, and 95% interval for all parameters of interest. The marginal posteriors can be computed using numerical integration.

See http://www4.stat.ncsu.edu/~reich/aba/code/nn
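Continuing the grid sketch above, the marginal summaries can be approximated by summing over the grid (a sketch, not the course's exact code):

```r
# Marginal posterior of mu: integrate (sum) the grid over sigma2
post <- post / sum(post)             # normalize the grid weights
p_mu <- rowSums(post)                # marginal over sigma2
p_mu <- p_mu / sum(p_mu)

# Posterior summaries from the discrete approximation
mu_mean <- sum(mu_g * p_mu)
mu_sd   <- sqrt(sum((mu_g - mu_mean)^2 * p_mu))
cdf     <- cumsum(p_mu)
ci      <- c(mu_g[which.min(abs(cdf - 0.025))],
             mu_g[which.min(abs(cdf - 0.975))])

round(c(mean = mu_mean, sd = mu_sd, lower = ci[1], upper = ci[2]), 2)
```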
Frequentist analysis of a normal mean

In frequentist statistics the estimate of the mean is $\bar{Y}$.

If $\sigma$ is known, the 95% interval is $\bar{Y} \pm z_{0.975}\,\sigma/\sqrt{n}$, where $z$ is the quantile of a normal distribution.

If $\sigma$ is unknown, the 95% interval is $\bar{Y} \pm t_{0.975,n-1}\,s/\sqrt{n}$, where $t$ is the quantile of a t-distribution.
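For reference, both intervals computed in R on illustrative data (t.test gives the unknown-$\sigma$ interval directly):

```r
set.seed(1)
Y <- rnorm(25, mean = 10, sd = 2)    # illustrative data
n <- length(Y)

# Known sigma = 2: z-interval
mean(Y) + c(-1, 1) * qnorm(0.975) * 2 / sqrt(n)

# Unknown sigma: t-interval
t.test(Y)$conf.int
```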
Bayesian analysis of a normal mean

The Bayesian estimate of $\mu$ is its marginal posterior mean; the interval estimate is the 95% posterior interval.

If $\sigma$ is known, the posterior of $\mu \mid \mathbf{Y}$ is Gaussian and the 95% interval is $E(\mu \mid \mathbf{Y}) \pm z_{0.975}\,SD(\mu \mid \mathbf{Y})$.

If $\sigma$ is unknown, the marginal (over $\sigma^2$) posterior of $\mu$ is a t-distribution with $\nu = n + 2a$ degrees of freedom. Therefore the 95% interval is $E(\mu \mid \mathbf{Y}) \pm t_{0.975,\nu}\,SD(\mu \mid \mathbf{Y})$.

See "Marginal posterior of µ" at http://www4.stat.ncsu.edu/~reich/aba/derivations5.pdf
Bayesian analysis of a normal mean

The following two slides give the posterior of $\mu$ for a data set with sample mean 10 and sample variance 4. The Gaussian analysis assumes $\sigma^2 = 4$ is known; the t analysis integrates over uncertainty in $\sigma^2$. As expected, the latter interval is a bit wider. A sketch reproducing this comparison follows.
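A minimal R sketch of this comparison, assuming a vague prior so that $E(\mu \mid \mathbf{Y}) \approx 10$ and the posterior scale is roughly $s/\sqrt{n}$ (a rough illustration, not the course's exact computation):

```r
# Gaussian (sigma2 known) vs t (sigma2 unknown) posterior of mu
ybar <- 10; s2 <- 4; a <- 0.1        # sample mean/variance, prior shape
for (n in c(5, 25)) {
  se <- sqrt(s2 / n)                 # approximate posterior scale
  nu <- n + 2 * a                    # degrees of freedom
  mu <- seq(ybar - 4 * se, ybar + 4 * se, length = 200)
  plot(mu, dnorm(mu, ybar, se), type = "l",
       xlab = expression(mu), ylab = "Density", main = paste("n =", n))
  lines(mu, dt((mu - ybar) / se, df = nu) / se, lty = 2)
  legend("topright", c("Gaussian", "Student's t"), lty = 1:2, bty = "n")
}
```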
Bayesian analysis of a normal mean

[Figure: posterior density of µ for n = 5; Gaussian (σ² known) vs Student's t (σ² unknown)]

Bayesian analysis of a normal mean

[Figure: posterior density of µ for n = 25; Gaussian (σ² known) vs Student's t (σ² unknown)]
Bayesian one-sample t-test

The one-sided test of $H_1: \mu \le 0$ versus $H_2: \mu > 0$ is conducted by computing the posterior probability of each hypothesis. This is done with the pt function in R; a sketch follows.

The two-sided test of $H_1: \mu = 0$ versus $H_2: \mu \ne 0$ is conducted by either determining whether 0 is in the 95% posterior interval, or by a Bayes factor (later).
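A sketch of the one-sided calculation with pt, assuming the t marginal posterior described above; the numeric values are illustrative placeholders:

```r
# Posterior probability of H2: mu > 0 under the t marginal posterior
post_mean <- 1.2     # E(mu | Y), illustrative value
post_se   <- 0.5     # posterior scale, illustrative value
nu        <- 25.2    # n + 2a degrees of freedom, illustrative value

# P(mu > 0 | Y) = P(T > (0 - post_mean) / post_se) for T ~ t_nu
1 - pt((0 - post_mean) / post_se, df = nu)
```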
Methods for dealing with multiple parameters

In this case, we were able to compute the marginal posterior in closed form (a t-distribution). We were also able to compute the posterior on a grid. For most analyses the marginal posteriors will not be nice distributions, and a grid is impossible if there are many parameters. We need new tools!
Methods for dealing with multiple parameters

Some approaches to dealing with complicated joint posteriors:
- Just use a point estimate, ignore uncertainty
- Approximate the posterior as normal
- Numerical integration
- Monte Carlo sampling
MAP estimation

Summarizing an entire joint distribution is challenging. Sometimes you don't need an entire posterior distribution and a single point estimate will do. Example: prediction in machine learning.

The Maximum a Posteriori (MAP) estimate is the posterior mode
$$\hat{\theta}_{MAP} = \arg\max_\theta\, p(\theta \mid \mathbf{Y})$$

This is similar to maximum likelihood estimation but includes the prior.
Univariate example

Say $Y \mid \theta \sim \text{Binomial}(n, \theta)$ and $\theta \sim \text{Beta}(0.5, 0.5)$; find $\hat{\theta}_{MAP}$. A worked sketch follows.
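A worked sketch (standard beta-binomial algebra; assumes $0.5 < Y < n - 0.5$ so the mode is interior): the posterior is $\theta \mid Y \sim \text{Beta}(Y + 0.5,\, n - Y + 0.5)$, so
$$\log p(\theta \mid Y) = (Y - 0.5)\log\theta + (n - Y - 0.5)\log(1 - \theta) + c.$$
Setting the derivative to zero, $\frac{Y - 0.5}{\theta} = \frac{n - Y - 0.5}{1 - \theta}$, and solving for $\theta$ gives
$$\hat{\theta}_{MAP} = \frac{Y - 0.5}{n - 1}.$$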
Bayesian central limit theorem

Another simplification is to approximate the posterior as Gaussian.

Bernstein-von Mises theorem: as the sample size grows, the posterior doesn't depend on the prior.

Frequentist result: as the sample size grows, the likelihood function is approximately normal.

Bayesian CLT: for large $n$ and some other conditions, $\theta \mid \mathbf{Y}$ is approximately normal.
Bayesian central limit theorem

Bayesian CLT: for large $n$ and some other conditions,
$$\theta \mid \mathbf{Y} \approx \text{Normal}\left[\hat{\theta}_{MAP},\, I(\hat{\theta}_{MAP})^{-1}\right]$$
where $I$ is the Fisher information matrix. The $(j,k)$ element of $I$ is
$$-\frac{\partial^2 \log[p(\theta \mid \mathbf{Y})]}{\partial\theta_j\, \partial\theta_k}$$
evaluated at $\hat{\theta}_{MAP}$. From the normal distribution we get marginal and conditional means, standard deviations, and intervals.
Univariate example

Say $Y \mid \theta \sim \text{Binomial}(n, \theta)$ and $\theta \sim \text{Beta}(0.5, 0.5)$; find the Gaussian approximation to $p(\theta \mid Y)$. A sketch follows.

See http://www4.stat.ncsu.edu/~reich/aba/code/bayes_clt
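A minimal R sketch of this approximation (the values of n and Y are illustrative; the course's own code is at the link above):

```r
# Gaussian approximation to the Beta(Y + 0.5, n - Y + 0.5) posterior
n <- 20; Y <- 6                      # illustrative data

# MAP estimate and observed information (negative second derivative
# of the log posterior, evaluated at the MAP)
map  <- (Y - 0.5) / (n - 1)
info <- (Y - 0.5) / map^2 + (n - Y - 0.5) / (1 - map)^2

theta <- seq(0.01, 0.99, length = 200)
plot(theta, dbeta(theta, Y + 0.5, n - Y + 0.5), type = "l",
     xlab = expression(theta), ylab = "Posterior density")
lines(theta, dnorm(theta, map, sqrt(1 / info)), lty = 2)
legend("topright", c("Exact Beta", "Gaussian approx"),
       lty = 1:2, bty = "n")
```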
Numerical integration

Many posterior summaries of interest are integrals over the posterior.

Ex: $E(\theta_j \mid \mathbf{Y}) = \int \theta_j\, p(\theta \mid \mathbf{Y})\, d\theta$

Ex: $V(\theta_j \mid \mathbf{Y}) = \int [\theta_j - E(\theta_j \mid \mathbf{Y})]^2\, p(\theta \mid \mathbf{Y})\, d\theta$

These are $p$-dimensional integrals that we usually can't solve analytically. A grid approximation is a crude approach; Gaussian quadrature is better. The Integrated Nested Laplace Approximation (INLA) is an even more sophisticated method.
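As a one-dimensional illustration, using the beta-binomial example above (integrate is R's adaptive quadrature routine):

```r
# Posterior mean and variance of theta by numerical integration,
# using the Beta(Y + 0.5, n - Y + 0.5) posterior from the example above
n <- 20; Y <- 6
post <- function(theta) dbeta(theta, Y + 0.5, n - Y + 0.5)

E_theta <- integrate(function(t) t * post(t), 0, 1)$value
V_theta <- integrate(function(t) (t - E_theta)^2 * post(t), 0, 1)$value
c(mean = E_theta, var = V_theta)
```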
Monte Carlo sampling

Markov chain Monte Carlo (MCMC) is by far the most common method of Bayesian computing. MCMC approximates the posterior using samples drawn from it. This requires drawing samples from non-standard distributions. It also requires careful analysis to be sure the approximation is sufficiently accurate.
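The basic idea, illustrated with direct Monte Carlo on the beta-binomial posterior (where we can sample exactly; MCMC extends this to posteriors we cannot sample directly):

```r
# Approximate posterior summaries from S draws
n <- 20; Y <- 6
S <- 10000
theta_s <- rbeta(S, Y + 0.5, n - Y + 0.5)   # exact posterior draws

mean(theta_s)                      # approximates E(theta | Y)
sd(theta_s)                        # approximates SD(theta | Y)
quantile(theta_s, c(0.025, 0.975)) # approximates the 95% interval
```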
MCMC for the Bayesian t-test

In the one-parameter section we saw that if we knew either $\mu$ or $\sigma^2$, we could sample the other parameter from its full conditional:
$$\mu \mid \sigma^2, \mathbf{Y} \sim \text{Normal}\left[\frac{n\bar{Y}/\sigma^2 + \mu_0/\sigma_0^2}{n/\sigma^2 + 1/\sigma_0^2},\ \left(\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}\right)^{-1}\right]$$
$$\sigma^2 \mid \mu, \mathbf{Y} \sim \text{InvGamma}\left[\frac{n}{2} + a,\ \frac{1}{2}\sum_{i=1}^{n}(Y_i - \mu)^2 + b\right]$$
But how do we draw from the joint distribution?
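One answer, previewed here, is Gibbs sampling: alternate draws from the two full conditionals above. A minimal R sketch with illustrative data and hyperparameters (not the course's code; the inverse-gamma draw is taken as one over a gamma draw):

```r
# Gibbs sampler sketch: alternate draws from the two full conditionals
set.seed(1)
Y <- rnorm(25, mean = 10, sd = 2)    # illustrative data
n <- length(Y)
mu0 <- 0; s20 <- 100; a <- 0.1; b <- 0.1

S   <- 5000
mu  <- mean(Y); s2 <- var(Y)         # initial values
out <- matrix(NA, S, 2, dimnames = list(NULL, c("mu", "sigma2")))

for (s in 1:S) {
  # mu | sigma2, Y
  prec <- n / s2 + 1 / s20
  mu   <- rnorm(1, (n * mean(Y) / s2 + mu0 / s20) / prec, sqrt(1 / prec))
  # sigma2 | mu, Y  (InvGamma draw via 1 / Gamma)
  s2   <- 1 / rgamma(1, n / 2 + a, rate = sum((Y - mu)^2) / 2 + b)
  out[s, ] <- c(mu, s2)
}

apply(out, 2, quantile, c(0.025, 0.5, 0.975))
```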