ST440/550: Applied Bayesian Analysis. (5) Multi-parameter models - Summarizing the posterior

(5) Multi-parameter models - Summarizing the posterior

Models with more than one parameter
Thus far we have studied single-parameter models, but most analyses have several parameters. For example, consider the normal model: $Y_i \sim N(\mu, \sigma^2)$ with priors $\mu \sim N(\mu_0, \sigma_0^2)$ and $\sigma^2 \sim \text{InvGamma}(a, b)$. We want to study the joint posterior distribution $p(\mu, \sigma^2 \mid Y)$. As another example, consider the simple linear regression model $Y_i \sim N(\beta_0 + X_{1i}\beta_1, \sigma^2)$. We want to study the joint posterior $p(\beta_0, \beta_1, \sigma^2 \mid Y)$.

Models with more than one parameter
How do we compute high-dimensional (many-parameter) posterior distributions? How do we visualize them? How do we summarize them concisely?

Bayesian one-sample t-test
In this section we will study the one-sample t-test in depth.
Likelihood: $Y_i \mid \mu, \sigma \sim N(\mu, \sigma^2)$, independent over $i = 1, \ldots, n$
Priors: $\mu \sim N(\mu_0, \sigma_0^2)$, independent of $\sigma^2 \sim \text{InvGamma}(a, b)$
The joint (bivariate) PDF of $(\mu, \sigma^2)$ is proportional to
$$\sigma^{-n} \exp\left[-\frac{\sum_{i=1}^n (Y_i - \mu)^2}{2\sigma^2}\right] \exp\left[-\frac{(\mu - \mu_0)^2}{2\sigma_0^2}\right] (\sigma^2)^{-a-1} \exp\left(-\frac{b}{\sigma^2}\right)$$
How to summarize this complicated function?

Plotting the posterior on a grid
For models with only a few parameters we could simply plot the posterior on a grid. That is, we compute $p(\mu, \sigma^2 \mid Y_1, \ldots, Y_n)$ for all combinations of $m$ values of $\mu$ and $m$ values of $\sigma^2$. The number of grid points is $m^p$, where $p$ is the number of parameters in the model. See http://www4.stat.ncsu.edu/~reich/aba/code/nn
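As a concrete illustration, here is a minimal R sketch of the grid computation for the normal model above. The data and hyperparameter values are illustrative assumptions, not taken from the course code at the link.

set.seed(1)
Y <- rnorm(25, mean = 10, sd = 2)              # illustrative data
mu0 <- 0; s20 <- 100; a <- 0.1; b <- 0.1       # assumed hyperparameters

# Log joint posterior of (mu, sigma^2), up to an additive constant
log_post <- function(mu, s2) {
  sum(dnorm(Y, mu, sqrt(s2), log = TRUE)) +    # normal likelihood
    dnorm(mu, mu0, sqrt(s20), log = TRUE) +    # normal prior on mu
    (-a - 1) * log(s2) - b / s2                # InvGamma(a, b) prior on sigma^2
}

mu_grid <- seq(8, 12, length = 100)
s2_grid <- seq(0.5, 15, length = 100)
lp   <- outer(mu_grid, s2_grid, Vectorize(log_post))
post <- exp(lp - max(lp))                      # rescale to avoid numerical underflow
image(mu_grid, s2_grid, post, xlab = expression(mu), ylab = expression(sigma^2))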

Summarizing the results in a table
Typically we are interested in the marginal posterior
$$p(\mu \mid Y) = \int_0^\infty p(\mu, \sigma^2 \mid Y)\, d\sigma^2,$$
where $Y = (Y_1, \ldots, Y_n)$. This accounts for our uncertainty about $\sigma^2$. We could also report the marginal posterior of $\sigma^2$. Results are usually given in a table with the marginal mean, SD, and 95% interval for all parameters of interest. The marginal posteriors can be computed using numerical integration. See http://www4.stat.ncsu.edu/~reich/aba/code/nn
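Continuing the grid sketch above, the marginal posterior of $\mu$ can be approximated by summing the gridded joint posterior over the $\sigma^2$ values, then reduced to the usual table entries:

# Marginalize over sigma^2 by summing across the grid columns
marg_mu <- rowSums(post)
marg_mu <- marg_mu / sum(marg_mu)              # normalize to a discrete distribution
post_mean <- sum(mu_grid * marg_mu)
post_sd   <- sqrt(sum((mu_grid - post_mean)^2 * marg_mu))
cdf <- cumsum(marg_mu)
ci  <- c(mu_grid[which.min(abs(cdf - 0.025))], # approximate 95% posterior interval
         mu_grid[which.min(abs(cdf - 0.975))])
round(c(mean = post_mean, sd = post_sd, lower = ci[1], upper = ci[2]), 2)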

Frequentist analysis of a normal mean
In frequentist statistics the estimate of the mean is $\bar{Y}$. If $\sigma$ is known, the 95% interval is
$$\bar{Y} \pm z_{0.975} \frac{\sigma}{\sqrt{n}},$$
where $z_{0.975}$ is the 0.975 quantile of the standard normal distribution. If $\sigma$ is unknown, the 95% interval is
$$\bar{Y} \pm t_{0.975,\,n-1} \frac{s}{\sqrt{n}},$$
where $t_{0.975,\,n-1}$ is the 0.975 quantile of the t-distribution with $n - 1$ degrees of freedom.
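Both intervals are one-liners in R; a small sketch with illustrative data (the value sigma = 2 in the known-variance case is an assumption):

Y <- c(9.1, 10.4, 11.2, 9.8, 10.9)                       # illustrative data
n <- length(Y)
mean(Y) + c(-1, 1) * qnorm(0.975) * 2 / sqrt(n)          # sigma = 2 assumed known
mean(Y) + c(-1, 1) * qt(0.975, n - 1) * sd(Y) / sqrt(n)  # sigma unknown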

Bayesian analysis of a normal mean
The Bayesian estimate of $\mu$ is its marginal posterior mean, and the interval estimate is the 95% posterior interval. If $\sigma$ is known, the posterior of $\mu \mid Y$ is Gaussian and the 95% interval is
$$E(\mu \mid Y) \pm z_{0.975}\, SD(\mu \mid Y)$$
If $\sigma$ is unknown, the marginal (over $\sigma^2$) posterior of $\mu$ is t with $\nu = n + 2a$ degrees of freedom. Therefore the 95% interval is
$$E(\mu \mid Y) \pm t_{0.975,\,\nu}\, SD(\mu \mid Y)$$
See "Marginal posterior of µ" at http://www4.stat.ncsu.edu/~reich/aba/derivations5.pdf

Bayesian analysis of a normal mean
The following two slides give the posterior of $\mu$ for a data set with sample mean 10 and sample variance 4. The Gaussian analysis assumes $\sigma^2 = 4$ is known; the t analysis integrates over uncertainty in $\sigma^2$. As expected, the latter interval is a bit wider.

Bayesian analysis of a normal mean
[Figure: marginal posterior density of µ for n = 5, comparing the Gaussian (σ² known) and Student's t (σ² unknown) posteriors.]

Bayesian analysis of a normal mean
[Figure: the same comparison for n = 25.]

Bayesian one-sample t-test
The one-sided test of $H_1: \mu \le 0$ versus $H_2: \mu > 0$ is conducted by computing the posterior probability of each hypothesis. This is done with the pt function in R, as sketched below. The two-sided test of $H_1: \mu = 0$ versus $H_2: \mu \ne 0$ is conducted by either determining whether 0 is in the 95% posterior interval, or by a Bayes factor (later).
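A minimal sketch of the one-sided computation, assuming the t marginal posterior from the previous slides; the location m, scale s, and degrees of freedom nu below are illustrative placeholder values:

m  <- 1.2                               # posterior location of mu (assumed)
s  <- 0.5                               # posterior scale of mu (assumed)
nu <- 27                                # degrees of freedom, nu = n + 2a (assumed)
p_H1 <- pt((0 - m) / s, df = nu)        # P(mu <= 0 | Y)
p_H2 <- 1 - p_H1                        # P(mu > 0 | Y)
c(H1 = p_H1, H2 = p_H2)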

Methods for dealing with multiple parameters
In this case we were able to compute the marginal posterior in closed form (a t distribution). We were also able to compute the posterior on a grid. For most analyses the marginal posteriors will not be nice distributions, and a grid is impossible if there are many parameters. We need new tools!

Methods for dealing with multiple parameters
Some approaches to dealing with complicated joint posteriors:
- Just use a point estimate and ignore uncertainty
- Approximate the posterior as normal
- Numerical integration
- Monte Carlo sampling

MAP estimation
Summarizing an entire joint distribution is challenging. Sometimes you don't need an entire posterior distribution and a single point estimate will do. Example: prediction in machine learning. The maximum a posteriori (MAP) estimate is the posterior mode
$$\hat{\theta}_{MAP} = \arg\max_{\theta}\, p(\theta \mid Y)$$
This is similar to maximum likelihood estimation but includes the prior.

Univariate example
Say $Y \mid \theta \sim \text{Binomial}(n, \theta)$ and $\theta \sim \text{Beta}(0.5, 0.5)$; find $\hat{\theta}_{MAP}$.
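A sketch of the solution, using the Beta-binomial conjugacy from the one-parameter section:
$$p(\theta \mid Y) \propto \theta^{Y}(1-\theta)^{n-Y}\,\theta^{-0.5}(1-\theta)^{-0.5} = \theta^{Y-0.5}(1-\theta)^{n-Y-0.5},$$
so $\theta \mid Y \sim \text{Beta}(Y + 0.5,\, n - Y + 0.5)$. Setting the derivative of the log posterior to zero gives the mode
$$\hat{\theta}_{MAP} = \frac{Y - 0.5}{n - 1},$$
valid when both Beta shape parameters exceed one (i.e., $0.5 < Y < n - 0.5$); otherwise the mode lies on the boundary.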

Bayesian central limit theorem
Another simplification is to approximate the posterior as Gaussian.
Bernstein-von Mises theorem: as the sample size grows, the posterior doesn't depend on the prior.
Frequentist result: as the sample size grows, the likelihood function is approximately normal.
Bayesian CLT: for large $n$ and some other conditions, $\theta \mid Y$ is approximately normal.

Bayesian central limit theorem
Bayesian CLT: for large $n$ and some other conditions,
$$\theta \mid Y \approx \text{Normal}\left[\hat{\theta}_{MAP},\ I(\hat{\theta}_{MAP})^{-1}\right]$$
$I$ is Fisher's information matrix; the $(j, k)$ element of $I$ is
$$-\frac{\partial^2 \log p(\theta \mid Y)}{\partial\theta_j\, \partial\theta_k}$$
evaluated at $\hat{\theta}_{MAP}$. With a normal approximation we have closed-form marginal and conditional means, standard deviations, and intervals.

Univariate example
Say $Y \mid \theta \sim \text{Binomial}(n, \theta)$ and $\theta \sim \text{Beta}(0.5, 0.5)$; find the Gaussian approximation to $p(\theta \mid Y)$. See http://www4.stat.ncsu.edu/~reich/aba/code/bayes_clt
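A minimal R sketch of this approximation; the data values (n = 20, Y = 14) are assumptions for illustration, not the values used in the course code:

n <- 20; Y <- 14                                 # illustrative data
a <- Y + 0.5; b <- n - Y + 0.5                   # exact posterior is Beta(a, b)
map  <- (a - 1) / (a + b - 2)                    # posterior mode (valid when a, b > 1)
info <- (a - 1) / map^2 + (b - 1) / (1 - map)^2  # -(log posterior)'' at the MAP
theta <- seq(0.01, 0.99, length = 400)
plot(theta, dbeta(theta, a, b), type = "l", ylab = "Posterior density")
lines(theta, dnorm(theta, map, sqrt(1 / info)), lty = 2)  # Gaussian approximation
legend("topleft", c("Exact Beta", "Gaussian approx"), lty = 1:2)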

Numerical integration
Many posterior summaries of interest are integrals over the posterior.
Ex: $E(\theta_j \mid Y) = \int \theta_j\, p(\theta \mid Y)\, d\theta$
Ex: $V(\theta_j \mid Y) = \int [\theta_j - E(\theta_j \mid Y)]^2\, p(\theta \mid Y)\, d\theta$
These are $p$-dimensional integrals that we usually can't solve analytically. A grid approximation is a crude approach; Gaussian quadrature is better. The Integrated Nested Laplace Approximation (INLA) is an even more sophisticated method.
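For a single parameter, R's integrate function illustrates the idea; a sketch for the Beta-binomial example above (the data are again assumed, and the result can be checked against the closed-form answer):

n <- 20; Y <- 14                                          # illustrative data
kernel <- function(theta) theta^(Y - 0.5) * (1 - theta)^(n - Y - 0.5)
Z <- integrate(kernel, 0, 1)$value                        # normalizing constant
E_theta <- integrate(function(t) t * kernel(t) / Z, 0, 1)$value
E_theta                                                   # equals (Y + 0.5) / (n + 1)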

Monte Carlo sampling
MCMC is by far the most common method of Bayesian computing. MCMC approximates the posterior using samples drawn from it. This requires drawing samples from non-standard distributions. It also requires careful analysis to be sure the approximation is sufficiently accurate.

MCMC for the Bayesian t-test
In the one-parameter section we saw that if we knew either $\mu$ or $\sigma^2$, we could sample the other parameter from its full conditional distribution:
$$\mu \mid \sigma^2, Y \sim \text{Normal}\left(\frac{n\bar{Y}\sigma_0^2 + \mu_0\sigma^2}{n\sigma_0^2 + \sigma^2},\ \frac{\sigma^2\sigma_0^2}{n\sigma_0^2 + \sigma^2}\right)$$
$$\sigma^2 \mid \mu, Y \sim \text{InvGamma}\left(\frac{n}{2} + a,\ \frac{1}{2}\sum_{i=1}^n (Y_i - \mu)^2 + b\right)$$
But how do we draw from the joint distribution?
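One answer, previewed here as a minimal sketch, is to alternate between the two full conditional draws above (the Gibbs sampler). The data and hyperparameters are illustrative assumptions:

set.seed(1)
Y <- rnorm(25, 10, 2); n <- length(Y)          # illustrative data
mu0 <- 0; s20 <- 100; a <- 0.1; b <- 0.1       # assumed hyperparameters
S <- 5000                                      # number of MCMC iterations
mu <- s2 <- numeric(S)
mu[1] <- mean(Y); s2[1] <- var(Y)              # starting values
for (s in 2:S) {
  # Draw mu from its full conditional given the current sigma^2
  prec  <- n / s2[s - 1] + 1 / s20
  m     <- (n * mean(Y) / s2[s - 1] + mu0 / s20) / prec
  mu[s] <- rnorm(1, m, sqrt(1 / prec))
  # Draw sigma^2 from its full conditional given the current mu
  s2[s] <- 1 / rgamma(1, n / 2 + a, rate = sum((Y - mu[s])^2) / 2 + b)
}
quantile(mu, c(0.025, 0.5, 0.975))             # posterior summary for mu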