Part II: Computation for Bayesian Analyses

BIO 233, HSPH, Spring 2015

Conjugacy

In both birth weight examples the posterior distribution is from the same family as the prior:

    Prior           Likelihood                 Posterior
    Beta(a, b)      Y_i ~ Bernoulli(θ)         Beta(y+ + a, n − y+ + b)
    Normal(m, V)    Y_i ~ Normal(µ, σ²)        Normal(m*, V*)

- This is referred to as conjugacy
  - arises because of the specific choice of prior/likelihood combination
- Having the posterior be of a known distribution is very convenient
  - we have the means to compute moments and quantiles for both the Beta and the Normal distribution
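As a quick illustration of these closed-form updates, here is a minimal R sketch of both conjugate pairs in the table; the prior values and data below are hypothetical and chosen only for illustration.

## Beta-Bernoulli: prior Beta(a, b), data y_1, ..., y_n ~ Bernoulli(theta)
a <- 1; b <- 1                                # hypothetical prior
y <- c(0, 1, 0, 0, 1)                         # hypothetical 0/1 data
n <- length(y); y.plus <- sum(y)
post.a <- y.plus + a                          # posterior is Beta(y+ + a, n - y+ + b)
post.b <- n - y.plus + b
qbeta(c(0.025, 0.5, 0.975), post.a, post.b)   # closed-form posterior quantiles

## Normal-Normal (sigma^2 known): prior Normal(m, V), data y_i ~ Normal(mu, sigma^2)
m <- 0; V <- 100; sigma.sq <- 1               # hypothetical prior and known variance
y.norm <- rnorm(20, mean=3, sd=1)             # hypothetical data
V.star <- 1/(1/V + length(y.norm)/sigma.sq)   # posterior variance
m.star <- V.star*(m/V + sum(y.norm)/sigma.sq) # posterior mean
c(m.star, V.star)                             # posterior is Normal(m.star, V.star)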

- Conjugacy was originally pursued and advocated because of this
  - if you can fix it so that the posterior distribution is from a known family, then life becomes much easier!
  - particularly important in the past
- More generally, when the posterior is of a known form we say that the posterior is analytically tractable
  - summary measures are available analytically
- For many problems, however, the posterior is not a known distribution

Mean birth weight: µ and σ² unknown

- In Homework #1 you assumed that σ² was known
- More realistically, let's take σ² as unknown
  - multiparameter setting, θ = (µ, σ²)
- Likelihood remains the same: Y_i ~ Normal(µ, σ²), i = 1, ..., n

    L(y | µ, σ²) = (1/√(2πσ²))^n exp{ −(1/(2σ²)) Σ_{i=1}^n (y_i − µ)² }

- How do we specify a bivariate prior distribution for (µ, σ²)?

- One option is the following noninformative prior

    π(µ, σ²) ∝ 1/σ²

  - uniform for (µ, log σ) on (−∞, ∞) × (−∞, ∞)
  - a priori independence
- Using Bayes' Theorem, the joint posterior distribution is proportional to

    π(µ, σ² | y) ∝ L(y | µ, σ²) π(µ, σ²)

- Unfortunately, this doesn't correspond to any commonly known joint distribution
  - even if we performed the integration to get the normalizing constant, we'd be stuck
- How do we proceed? How do we summarize a distribution if it isn't available analytically?
  - visualize the joint posterior distribution for (µ, σ²)
  - compute summary statistics
- We use simulation-based or Monte Carlo methods

Monte Carlo methods

- Consider the Beta(85, 915) distribution
  - posterior for the low birth weight example
- To summarize this distribution, note that the mean and variance have closed form expressions but the median does not
- Suppose we could generate random deviates from a Beta distribution
  - e.g., using rbeta() in R
- We could empirically estimate the median:

> sampposterior <- rbeta(1000, 85, 915)
> median(sampposterior)
[1] 0.08486423

- More generally, one can estimate any summary measure by exploiting the duality between a distribution and samples generated from that distribution

## Mean
> round(mean(sampposterior), digits=3)
[1] 0.085

## Standard deviation
> round(sd(sampposterior), digits=3)
[1] 0.009

## 95% credible interval
> round(quantile(sampposterior, c(0.025, 0.975)), digits=3)
 2.5% 97.5%
0.069 0.104

## P(theta > 0.10)
> round(mean(sampposterior > 0.10), digits=3)
[1] 0.061

- One can also visualize the distribution

> hist(sampposterior, breaks=seq(from=0.05, to=0.12, by=0.005), main="",
+      xlab=expression(theta * " = P(LBW)"), col="blue", freq=FALSE)

[Figure: histogram of the posterior samples; x-axis θ = P(LBW), y-axis density]

- Monte Carlo refers to the use of samples or simulation as a means to learn about a distribution
  - the term was coined by physicists working on the Manhattan Project in the 1940s
- These ideas can be applied to any distribution
  - the trick is being able to generate samples from π(θ | y)
- Consider again the posterior for (µ, σ²)

    π(µ, σ² | y) ∝ (1/σ^(n+2)) exp{ −(1/(2σ²)) [ (n−1)s² + n(ȳ − µ)² ] }

- At the outset, it isn't clear how to generate samples from this distribution

- We can, however, decompose the joint posterior distribution as

    π(µ, σ² | y) = π(µ | y, σ²) π(σ² | y)

  where the conditional posterior of µ | y, σ² is a Normal(ȳ, σ²/n) and the marginal posterior of σ² | y is given by

    π(σ² | y) ∝ (σ²)^(−(n+1)/2) exp{ −(n−1)s² / (2σ²) }

  which is the kernel of an inverse-gamma distribution: Inv-gamma((n−1)/2, (n−1)s²/2)

- The decomposition suggests generating samples using the algorithm:

    (1) generate a random σ²(r) ~ Inv-gamma((n−1)/2, (n−1)s²/2)
    (2) generate a random µ(r) | σ²(r) ~ Normal(ȳ, σ²(r)/n)

- Each cycle generates an independent random deviate from the joint posterior, π(µ, σ² | y):

    (µ(1), σ²(1))
    (µ(2), σ²(2))
    ...

- One can also easily augment this algorithm to generate samples from the posterior predictive distribution

    f(ỹ | y) = ∫ f(ỹ | θ) π(θ | y) dθ

- This representation suggests embedding a step where we generate a random

    ỹ(r) ~ Normal(µ(r), σ²(r))

  at the end of the r-th cycle
- Marginally, the {ỹ(1), ..., ỹ(R)} are a random sample from the target posterior predictive distribution

Run the algorithm...

> ##
> load("northcarolina_data.dat")
> n <- 100
> sampy <- sample(infants$weight, n)
> ##
> library(MCMCpack)
> ?rinvgamma
> ##
> R <- 1000
> sampposterior <- matrix(NA, nrow=R, ncol=3)
> for(r in 1:R)
+ {
+   ##
+   sigmasq <- rinvgamma(1, (n-1)/2, ((n-1)*var(sampy))/2)
+   mu      <- rnorm(1, mean(sampy), sqrt(sigmasq/n))
+   ytilde  <- rnorm(1, mu, sqrt(sigmasq))
+   ##
+   sampposterior[r,] <- c(mu, sigmasq, ytilde)
+ }

Visualize the results...

> ##
> library(ks)
> fhat <- kde(x=sampposterior[,1:2], H=Hpi(x=sampposterior[,1:2]))
> ##
> plot(fhat, xlab=expression(mu), ylab=expression(sigma^2), xlim=c(3,3.5),
+      drawpoints=TRUE, ptcol="red", pch="", lwd=3, axes=FALSE)
> axis(1, seq(from=3, to=3.5, by=0.1))
> axis(2, seq(from=0.5, to=1.2, by=0.1))
> ##
> fhaty <- kde(x=sampposterior[,3], h=hpi(sampposterior[,3]))
> plot(fhaty, xlim=c(0, 6), xlab=expression("prediction, " * tilde(y)), ylab="",
+      axes=FALSE, col="red", lwd=3)
> axis(1, seq(from=0, to=6, by=1))
> axis(2, seq(from=0, to=0.5, by=0.1))

Joint posterior distribution, π(µ, σ² | y)

[Figure: contour plot of the bivariate kernel density estimate; x-axis µ, y-axis σ², contours at the 25%, 50% and 75% levels]

Posterior predictive distribution, f(ỹ | y)

- relative weight assigned to possible birth weight values, averaging over the uncertainty in our knowledge about µ and σ²

[Figure: kernel density estimate of the posterior predictive distribution; x-axis prediction ỹ, y-axis density f(ỹ | y)]

Compute numerical summary measures...

> ##
> apply(sampposterior, 2, mean)
[1] 3.2400460 0.5169768 3.1995524
> t(apply(sampposterior, 2, quantile, probs=c(0.5, 0.025, 0.975)))
           50%      2.5%     97.5%
[1,] 3.2402101 3.1011615 3.3802954
[2,] 0.5086291 0.3872956 0.6837703
[3,] 3.1964943 1.8347178 4.5187996
> cor(sampposterior[,1], sampposterior[,2])
[1] 0.001173572

Sequential decomposition

- This strategy can be applied to a generic θ = (θ_1, ..., θ_p)
  - decompose the p-dimensional posterior distribution as

      π(θ | y) = π(θ_p | y, θ_1, ..., θ_{p−1}) π(θ_{p−1} | y, θ_1, ..., θ_{p−2}) ... π(θ_1 | y)

  - cycle through and sample from the p distributions sequentially
  - each cycle generates an independent random deviate from the multivariate posterior, π(θ | y)
- Intuitive in that one breaks up the problem into a series of manageable pieces
  - can then empirically summarize the joint distribution

The Gibbs sampling algorithm

- For many problems, sequential decomposition doesn't yield a set of p distributions where each is of a known form
- Consider the set of p full conditionals

    π(θ_1 | y, θ_{−1})
    π(θ_2 | y, θ_{−2})
    ...
    π(θ_p | y, θ_{−p})

  where θ_{−j} denotes all components of θ other than θ_j
  - it is often the case that these are each of a convenient form
- The Gibbs sampling algorithm generates samples by sequentially sampling from each of these p full conditionals

- For example, for p = 2 the algorithm proceeds as follows:

    generate θ_1(1) from π(θ_1 | y, θ_2(0))        θ_2(0) is the starting value
    generate θ_2(1) from π(θ_2 | y, θ_1(1))        →  θ(1) = (θ_1(1), θ_2(1))

    generate θ_1(2) from π(θ_1 | y, θ_2(1))
    generate θ_2(2) from π(θ_2 | y, θ_1(2))        →  θ(2) = (θ_1(2), θ_2(2))

    ...

- Helpful to visualize the way the algorithm generates samples:

Mean birth weight: Normal likelihood

- Suppose Y_i ~ i.i.d. Normal(µ, σ²) for i = 1, ..., n
  - task is to learn about the unknown θ = (µ, σ²)
- Further, suppose we adopt a flat prior for µ and an independent Inv-Gamma(a, b) prior for σ²
  - the prior density is

      π(µ, σ²) = (b^a / Γ(a)) (σ²)^(−a−1) exp{ −b/σ² }

  - setting a = b = 0 corresponds to the noninformative prior we've been using

- Applying Bayes' Theorem, the joint posterior distribution is

    π(µ, σ² | y) ∝ L(µ, σ² | y) π(µ, σ²)
                 = (1/√(2πσ²))^n exp{ −(1/(2σ²)) Σ_{i=1}^n (y_i − µ)² } × (b^a / Γ(a)) (σ²)^(−a−1) exp{ −b/σ² }
                 ∝ (1/σ²)^(n/2+a+1) exp{ −(1/σ²) [ (1/2) Σ_{i=1}^n (y_i − µ)² + b ] }

- It isn't immediately clear how to directly generate samples from this joint distribution

- Rather than directly sampling from π(µ, σ² | y), we can use the Gibbs sampling algorithm and iterate between

    (i)  µ | σ², y  ~ Normal(ȳ, σ²/n)
    (ii) σ² | µ, y  ~ Inv-gamma(n/2 + a, D/2 + b)

  where ȳ is the sample mean and D = Σ_{i=1}^n (y_i − µ)²
- Result is a sequence of samples from the joint posterior:

    (µ(1), σ²(1))
    (µ(2), σ²(2))
    ...
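A minimal R sketch of this two-step Gibbs sampler, reusing sampy and n from the earlier Monte Carlo example; the hyperparameter values a and b and the object name gibbsSamples are hypothetical choices for illustration.

library(MCMCpack)                  # for rinvgamma(); already loaded above

a <- 0.01                          # hypothetical Inv-Gamma(a, b) hyperparameters
b <- 0.01
R <- 1000
gibbsSamples <- matrix(NA, nrow=R, ncol=2, dimnames=list(NULL, c("mu", "sigmasq")))

mu <- mean(sampy)                  # starting value for mu
for(r in 1:R)
{
  ## (ii) full conditional for sigma^2: Inv-gamma(n/2 + a, D/2 + b)
  D       <- sum((sampy - mu)^2)
  sigmasq <- rinvgamma(1, n/2 + a, D/2 + b)
  ## (i) full conditional for mu: Normal(ybar, sigma^2/n)
  mu      <- rnorm(1, mean(sampy), sqrt(sigmasq/n))
  ##
  gibbsSamples[r,] <- c(mu, sigmasq)
}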

Markov Chain Monte Carlo

- The Gibbs algorithm is helpful in that a seemingly difficult problem is broken down into a series of manageable pieces
- But there are two problems!
  (1) The set of full conditionals does not jointly fully specify the target posterior distribution
      - in contrast to the components obtained via a sequential decomposition:

          π(θ_1, θ_2 | y) = π(θ_1 | y) π(θ_2 | y, θ_1)

  (2) The resulting samples are dependent
      - dependency between θ(1) and θ(2) is generated by the value of θ_2(1)
      - we say that the samples exhibit autocorrelation

- The result is that we don't have an independent sample from the joint posterior distribution
  - empirical summary measures won't pertain to π(θ | y)
- However, by construction, the sequence of samples constitutes a Markov chain
- Further, it's possible to show that the stationary distribution for the Markov chain is the sought-after posterior distribution
  - we say that once the Markov chain converges, one is generating samples from the posterior distribution
- In practice, we need to:
  - run the chain long enough to ensure that it has converged to its stationary distribution
  - make adjustments to remove autocorrelation in the samples

- The Gibbs algorithm is one member of a broader class of algorithms that generate samples from (arbitrary) posterior distributions
- The general technique is referred to as Markov Chain Monte Carlo (MCMC)
  - each algorithm requires consideration of convergence and autocorrelation
- Other algorithms include, among many others:
  - importance sampling
  - adaptive rejection sampling
  - the Metropolis algorithm
  - the Metropolis-Hastings algorithm
- We are going to focus on the Metropolis-Hastings algorithm
  - versatile and, as we'll see, generally a good algorithm in the context of GLMs

The Metropolis-Hastings Algorithm

Mean birth weight: t-distribution likelihood

- So far we've taken the continuous response, Y, to be Normally distributed
- An alternative is to use the t-distribution
  - heavier tails provide an alternative that may be robust to unusual/outlier observations
- Specifically, suppose Y ~ t_ν(µ, σ²)
  - non-central, scaled t-distribution with ν degrees of freedom
  - for y ∈ (−∞, ∞), the density is

      f(y | µ, σ², ν) = [Γ((ν+1)/2) / Γ(ν/2)] (1/(πνσ²))^(1/2) [1 + (y − µ)²/(νσ²)]^(−(ν+1)/2)

- Given an i.i.d. sample of size n, the likelihood is

    L(µ, σ² | y) = Π_{i=1}^n [Γ((ν+1)/2) / Γ(ν/2)] (1/(πνσ²))^(1/2) [1 + (y_i − µ)²/(νσ²)]^(−(ν+1)/2)

  - here we take ν to be fixed and known
- Again adopt a flat prior for µ and an independent Inv-Gamma(a, b) prior for σ²:

    π(µ, σ²) = (b^a / Γ(a)) (σ²)^(−a−1) exp{ −b/σ² }

- Using Bayes' Theorem, the posterior distribution is

    π(µ, σ² | y) ∝ L(µ, σ² | y) π(µ, σ²)
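Since the Metropolis-Hastings updates that follow only require the posterior kernel up to a normalizing constant, it can help to code the log-likelihood and log-prior kernel directly from these expressions. The following is a minimal sketch; the function names logLikT() and logPrior() are my own and working on the log scale is simply to avoid numerical underflow.

logLikT <- function(mu, sigmasq, y, nu)
{
  ## log-likelihood under the scaled, non-central t with nu fixed
  n <- length(y)
  n*(lgamma((nu + 1)/2) - lgamma(nu/2) - 0.5*log(pi*nu*sigmasq)) -
    ((nu + 1)/2)*sum(log(1 + (y - mu)^2/(nu*sigmasq)))
}

logPrior <- function(mu, sigmasq, a, b)
{
  ## log prior kernel: flat for mu, Inv-Gamma(a, b) for sigma^2
  (-a - 1)*log(sigmasq) - b/sigmasq      # up to an additive constant
}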

- As with the posterior based on the Normal likelihood, this joint posterior doesn't belong to any known family
- We could again use the Gibbs sampling algorithm and iterate between the two full conditionals:

    π(µ | σ², y)
    π(σ² | µ, y)

- Unfortunately, neither of these is of a convenient form either!
- Q: How can we generate samples from a distribution that does not belong to any known family?

Metropolis-Hastings

- Consider the general task of sampling from π(θ | y)
  - goal is to generate a sequence: θ(1), θ(2), ...
  - use these samples to evaluate summaries of the posterior
- Suppose the distribution corresponding to π(θ | y) is unknown
  - the functional form of the density doesn't correspond to a known distribution
  - we only know the kernel of the density

      π(θ | y) ∝ L(θ | y) π(θ)

  - integral in the denominator doesn't have a closed form expression
  - software isn't readily available

- The Metropolis-Hastings algorithm proceeds as follows: let θ(r) be the current state in the sequence

  (i)  generate a proposal for the next value of θ, denoted by θ*
       - denote the density of the proposal distribution by q(θ* | θ(r))
  (ii) either reject the proposal, θ(r+1) = θ(r), or accept the proposal, θ(r+1) = θ*
       - the decision to reject/accept the proposal is based on the flip of a coin with probability

           a_r = min( 1, [π(θ* | y) / π(θ(r) | y)] × [q(θ(r) | θ*) / q(θ* | θ(r))] )

       - a_r is referred to as the acceptance ratio
       - accept automatically if a_r ≥ 1

- The algorithm boils down to being able to perform three tasks:
  (1) choose a proposal distribution
  (2) sample from the proposal distribution
  (3) compute the acceptance ratio
- Ideally the proposal distribution is as close to the target posterior distribution as possible
  - intuitively, why does this make sense?
  - mathematically, why does this make sense?
- But we also have to choose a proposal distribution that we can actually sample from!
- Q: Interpretation of the acceptance ratio?
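A minimal sketch of a single generic Metropolis-Hastings update on the log scale; logPost(), rProp() and dPropLog() are hypothetical placeholders for the log posterior kernel, a sampler for the proposal distribution and its log-density.

mhUpdate <- function(theta.r, logPost, rProp, dPropLog)
{
  ## (i) draw a proposal theta* from q( . | theta(r))
  theta.star <- rProp(theta.r)
  ## (ii) log acceptance ratio:
  ##      log pi(theta*|y) - log pi(theta(r)|y) + log q(theta(r)|theta*) - log q(theta*|theta(r))
  logA <- logPost(theta.star) - logPost(theta.r) +
          dPropLog(theta.r, theta.star) - dPropLog(theta.star, theta.r)
  ## accept the move with probability min(1, a_r)
  if(log(runif(1)) < logA) theta.star else theta.r
}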

Mean birth weight: t-distribution likelihood

- For the birth weight example, we couldn't sample directly from the two full conditionals:

    π(µ | σ², y) ∝ Π_{i=1}^n [1 + (y_i − µ)²/(νσ²)]^(−(ν+1)/2)

    π(σ² | µ, y) ∝ (1/σ²)^(n/2+a+1) exp{ −b/σ² } Π_{i=1}^n [1 + (y_i − µ)²/(νσ²)]^(−(ν+1)/2)

- Instead, we sample from each using the Metropolis-Hastings algorithm
  - say: implement a Gibbs sampling algorithm with two Metropolis-Hastings steps or updates
- Need to choose proposal distributions for both updates

- Recall, for the Normal likelihood the two full conditionals were

    µ | σ², y  ~ Normal(ȳ, σ²/n)
    σ² | µ, y  ~ Inv-gamma(a*, b*)

  where a* = n/2 + a and b* = D/2 + b
- Their densities are, respectively,

    q_1(µ | σ², y) = (1/√(2πσ²/n)) exp{ −n(µ − ȳ)²/(2σ²) }

    q_2(σ² | µ, y) = (b*^a* / Γ(a*)) (σ²)^(−a*−1) exp{ −b*/σ² }

- Q: Why might these be good proposal distributions?

- Suppose the current state in the Markov chain is (µ(r), σ²(r))
- To sample µ(r+1):
  - generate a proposal, µ*, from a Normal(ȳ, σ²(r)/n)
  - calculate

      a_r = min( 1, [π(µ* | σ²(r), y) / π(µ(r) | σ²(r), y)] × [q_1(µ(r) | σ²(r), y) / q_1(µ* | σ²(r), y)] )

  - generate a random U ~ Uniform(0, 1)
  - if U < a_r, accept the move
    - set µ(r+1) equal to µ*
  - if U > a_r, reject the move
    - set µ(r+1) equal to µ(r)

- Whatever the decision, we have a value for µ(r+1)
- To sample σ²(r+1):
  - generate a proposal, σ²*, from an Inv-gamma(a*, b*)
  - calculate

      a_r = min( 1, [π(σ²* | µ(r+1), y) / π(σ²(r) | µ(r+1), y)] × [q_2(σ²(r) | µ(r+1), y) / q_2(σ²* | µ(r+1), y)] )

  - generate a random U ~ Uniform(0, 1)
  - if U < a_r, accept the move
    - set σ²(r+1) equal to σ²*
  - if U > a_r, reject the move
    - set σ²(r+1) equal to σ²(r)
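Putting the two updates together, here is a minimal sketch of one cycle of this Gibbs sampler with Metropolis-Hastings updates, reusing logLikT() and logPrior() from the earlier sketch; the degrees of freedom nu and the hyperparameters a and b are assumed to be chosen by the analyst, and rinvgamma()/dinvgamma() come from MCMCpack.

library(MCMCpack)                  # for rinvgamma() and dinvgamma()

mhCycle <- function(mu.r, sigmasq.r, y, nu, a, b)
{
  n    <- length(y)
  ybar <- mean(y)

  ## update mu via M-H, proposing from the Normal-likelihood full conditional q_1
  mu.star <- rnorm(1, ybar, sqrt(sigmasq.r/n))
  logA <- (logLikT(mu.star, sigmasq.r, y, nu) - logLikT(mu.r, sigmasq.r, y, nu)) +
          (dnorm(mu.r,    ybar, sqrt(sigmasq.r/n), log=TRUE) -
           dnorm(mu.star, ybar, sqrt(sigmasq.r/n), log=TRUE))
  mu.new <- if(log(runif(1)) < logA) mu.star else mu.r

  ## update sigma^2 via M-H, proposing from the Inv-gamma(a*, b*) full conditional q_2
  a.star <- n/2 + a
  b.star <- sum((y - mu.new)^2)/2 + b
  sigmasq.star <- rinvgamma(1, a.star, b.star)
  logA <- (logLikT(mu.new, sigmasq.star, y, nu) + logPrior(mu.new, sigmasq.star, a, b) -
           logLikT(mu.new, sigmasq.r,    y, nu) - logPrior(mu.new, sigmasq.r,    a, b)) +
          (log(dinvgamma(sigmasq.r,    a.star, b.star)) -
           log(dinvgamma(sigmasq.star, a.star, b.star)))
  sigmasq.new <- if(log(runif(1)) < logA) sigmasq.star else sigmasq.r

  c(mu=mu.new, sigmasq=sigmasq.new)
}

## e.g., one chain of 1,000 cycles with hypothetical nu = 4 and a = b = 0.01:
## out <- matrix(NA, 1000, 2); state <- c(mean(sampy), var(sampy))
## for(r in 1:1000) out[r,] <- state <- mhCycle(state[1], state[2], sampy, 4, 0.01, 0.01)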

The Metropolis sampling algorithm

- The Metropolis algorithm is a special case of the Metropolis-Hastings algorithm where the proposal distribution is symmetric
- As such, q(θ(r) | θ*) = q(θ* | θ(r)) and the acceptance ratio reduces to

    a_r = min( 1, π(θ* | y) / π(θ(r) | y) )

- However, if the target distribution is not symmetric then we might expect symmetric proposal distributions to not perform as well
  - fewer proposals will be accepted
- Trade-off in terms of computational ease and efficiency of the sampling algorithm

Practical issues for MCMC

- Following this algorithm yields a sequence of samples that form a Markov chain
  - one whose stationary distribution is the desired posterior distribution
- Practical issues include:
  (1) specification of starting values, (µ(0), σ²(0))
  (2) monitoring convergence of the chain
  (3) deciding how many samples to generate
  (4) accounting for correlation in the samples

- In practice, one often runs M chains simultaneously
  - each with differing starting values
  - pool samples across the chains when summarizing the posterior distribution
- Monitoring convergence is often done via visual inspection of the chains
  - referred to as trace plots
  - goal is to have good coverage of the parameter space
  - examine mixing of the chains if M > 1
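A minimal sketch of a trace plot for a single chain, assuming the gibbsSamples object from the earlier Gibbs sampler sketch; with M > 1 chains one would overlay them, e.g. with lines(), to examine mixing.

plot(gibbsSamples[, "mu"], type="l", xlab="Scan", ylab=expression(mu))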

Posterior for µ based on a Normal likelihood (Gibbs sampling algorithm)

[Figure: trace plot of µ over 1,000 scans]

Posterior for σ² based on a Normal likelihood (Gibbs sampling algorithm)

[Figure: trace plot of σ² over 1,000 scans]

Posterior for ψ = νσ²/(ν−2) based on a t-distribution likelihood (Gibbs sampling algorithm with M-H updates)

[Figure: trace plot of ψ over 1,000 scans]

- If M > 1, one can calculate the potential scale reduction (PSR) factor
  - suppose each chain is run for R iterations
  - for a given parameter, θ, the m-th chain is denoted θ_m(1), θ_m(2), ..., θ_m(R)
  - let θ̄_m and s²_m denote the sample mean and variance of the m-th chain
  - calculate the PSR for θ as

      PSR = [ B/R + W(R−1)/R ] / W

    where W is the mean of the within-chain variances of θ

      W = (1/M) Σ_{m=1}^M s²_m

  - and B/R is the between-chain variance of the chain means

      B/R = (1/(M−1)) Σ_{m=1}^M (θ̄_m − θ̄)²

  - typically set R such that the PSR is less than 1.05
    - should be ensured for each parameter
- The number of samples should ideally be based on the Monte Carlo error
  - see Homework #1
  - may not be clear cut if the algorithm is computationally expensive
- Autocorrelation arises because of the dependent nature of the sampling as one cycles through the full conditionals
  - will also be a problem if the proposal distribution was poorly chosen
  - typically handled by thinning
  - several graphical and numerical diagnostics are also available
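A minimal sketch of the PSR calculation for a single parameter, written directly from the formulas above; psr() is a hypothetical helper that takes a list of M chains, each a numeric vector of length R (some references report the square root of this ratio).

psr <- function(chains)
{
  M <- length(chains)
  R <- length(chains[[1]])
  W        <- mean(sapply(chains, var))    # mean within-chain variance
  B.over.R <- var(sapply(chains, mean))    # between-chain variance of the chain means
  (B.over.R + W*(R - 1)/R) / W
}

## e.g., psr(list(chain1, chain2, chain3)) for three chains of a given parameter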