Conjugate Models
Patrick Lam


Outline

Conjugate Models
    What is Conjugacy?
    The Beta-Binomial Model
    The Normal Model
        Normal Model with Unknown Mean, Known Variance
        Normal Model with Known Mean, Unknown Variance

Conjugacy

Suppose we have a Bayesian model with a likelihood p(y | θ) and a prior p(θ). If we multiply our likelihood and prior, we get our posterior p(θ | y) up to a constant of proportionality. If our posterior is a distribution of the same family as our prior, then we have conjugacy. We say that the prior is conjugate to the likelihood.

Conjugate models are great because we know the exact distribution of the posterior, so we can easily simulate from it or derive quantities of interest analytically. In practice, we rarely have conjugacy.

Brief List of Conjugate Models

Likelihood                            Prior           Posterior
Binomial                              Beta            Beta
Negative Binomial                     Beta            Beta
Poisson                               Gamma           Gamma
Geometric                             Beta            Beta
Exponential                           Gamma           Gamma
Normal (mean unknown)                 Normal          Normal
Normal (variance unknown)             Inverse Gamma   Inverse Gamma
Normal (mean and variance unknown)    Normal/Gamma    Normal/Gamma
Multinomial                           Dirichlet       Dirichlet

A Binomial Example

Suppose we have a vector of data on voter turnout for a random sample of n voters in the 2004 US Presidential election. We can model the voter turnout with a binomial model.

Y ∼ Binomial(n, π)

Quantity of interest: π (voter turnout)

Assumptions:
Each voter's decision to vote follows the Bernoulli distribution.
Each voter has the same probability of voting. (unrealistic)
Each voter's decision to vote is independent. (unrealistic)

The Conjugate Beta Prior

We can use the beta distribution as a prior for π, since the beta distribution is conjugate to the binomial distribution.

\[
\begin{aligned}
p(\pi \mid y) &\propto p(y \mid \pi)\, p(\pi) \\
&= \text{Binomial}(n, \pi) \times \text{Beta}(\alpha, \beta) \\
&= \binom{n}{y} \pi^{y} (1-\pi)^{(n-y)} \, \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \, \pi^{(\alpha-1)} (1-\pi)^{(\beta-1)} \\
&\propto \pi^{y} (1-\pi)^{(n-y)} \, \pi^{(\alpha-1)} (1-\pi)^{(\beta-1)} \\
p(\pi \mid y) &\propto \pi^{y+\alpha-1} (1-\pi)^{n-y+\beta-1}
\end{aligned}
\]

The posterior distribution is simply a Beta(y + α, n − y + β) distribution. Effectively, our prior is just adding α − 1 successes and β − 1 failures to the dataset.
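As a quick sanity check (not part of the original slides), we can verify this conjugacy result numerically in R: evaluate the unnormalized product of the likelihood and prior kernels on a grid and compare it to the analytic Beta density. The Beta(2, 2) prior here is an arbitrary illustration; y = 285 and n = 500 match the turnout example below.

> # hypothetical check values: arbitrary Beta(2, 2) prior, turnout data from below
> a <- 2; b <- 2; n <- 500; y <- 285
> grid <- seq(0.001, 0.999, by = 0.001)
> # likelihood kernel times prior kernel, computed on the log scale to avoid underflow
> log.post <- (y + a - 1) * log(grid) + (n - y + b - 1) * log(1 - grid)
> post <- exp(log.post - max(log.post))
> post <- post/sum(post * 0.001)  # normalize on the grid
> max(abs(post - dbeta(grid, y + a, n - y + b)))  # should be close to 0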

The Uninformative (Flat) Uniform Prior

Suppose we have no strong prior beliefs about the parameters. We can choose a prior that gives equal weight to all possible values of the parameters, essentially an uninformative or flat prior:

p(π) = constant for all values of π.

For the binomial model, one example of a flat prior is the Beta(1, 1) prior:

\[
p(\pi) = \frac{\Gamma(2)}{\Gamma(1)\Gamma(1)} \, \pi^{(1-1)} (1-\pi)^{(1-1)} = 1,
\]

which is the Uniform distribution over the [0, 1] interval.
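A one-line check of this identity (ours, not the slides'): the Beta(1, 1) density evaluates to the constant 1, matching the Uniform(0, 1) density everywhere on [0, 1].

> x <- seq(0, 1, by = 0.1)
> all.equal(dbeta(x, 1, 1), dunif(x, 0, 1))
[1] TRUE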

Since we know that a Binomial likelihood and a Beta(1, 1) prior produce a Beta(y + 1, n − y + 1) posterior, we can simulate from the posterior in R. Suppose our turnout data had 500 voters, of which 285 voted.

> table(turnout)
turnout
  0   1
215 285

Setting our prior parameters at α = 1 and β = 1,

> a <- 1
> b <- 1

we get the posterior

> posterior.unif.prior <- rbeta(10000, shape1 = 285 + a,
+     shape2 = 500 - 285 + b)
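Once we have draws from the posterior, summaries are one-liners. This sketch (our addition) computes the posterior mean and a 95% credible interval from the simulations, and compares the interval to the exact Beta quantiles:

> mean(posterior.unif.prior)  # simulated posterior mean; analytic value is 286/502
> quantile(posterior.unif.prior, c(0.025, 0.975))  # simulated 95% credible interval
> qbeta(c(0.025, 0.975), 285 + a, 500 - 285 + b)  # exact quantiles for comparison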

Normal Model with Unknown Mean, Known Variance

Suppose we wish to estimate a model where the likelihood of the data is normal with an unknown mean µ and a known variance σ². Our parameter of interest is µ. We can use a conjugate Normal prior on µ, with prior mean µ₀ and prior variance τ₀²:

\[
\begin{aligned}
p(\mu \mid y, \sigma^2) &\propto p(y \mid \mu, \sigma^2)\, p(\mu) \\
\text{Normal}(\mu_1, \tau_1^2) &= \text{Normal}(\mu, \sigma^2) \times \text{Normal}(\mu_0, \tau_0^2)
\end{aligned}
\]

Let θ represent our parameter of interest, in this case µ.

\[
\begin{aligned}
p(\theta \mid y) &\propto \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \theta)^2}{2\sigma^2}\right) \times \frac{1}{\sqrt{2\pi\tau_0^2}} \exp\left(-\frac{(\theta - \mu_0)^2}{2\tau_0^2}\right) \\
&\propto \exp\left(-\frac{\sum_{i=1}^{n}(y_i - \theta)^2}{2\sigma^2} - \frac{(\theta - \mu_0)^2}{2\tau_0^2}\right) \\
&= \exp\left[-\frac{1}{2}\left(\frac{\sum_{i=1}^{n}(y_i - \theta)^2}{\sigma^2} + \frac{(\theta - \mu_0)^2}{\tau_0^2}\right)\right] \\
&= \exp\left[-\frac{1}{2}\left(\frac{\tau_0^2 \sum_{i=1}^{n}(y_i - \theta)^2 + \sigma^2 (\theta - \mu_0)^2}{\sigma^2 \tau_0^2}\right)\right] \\
&= \exp\left[-\frac{1}{2}\left(\frac{\tau_0^2 \sum_{i=1}^{n}(y_i^2 - 2\theta y_i + \theta^2) + \sigma^2(\theta^2 - 2\theta\mu_0 + \mu_0^2)}{\sigma^2 \tau_0^2}\right)\right]
\end{aligned}
\]

We can multiply the 2θyᵢ term in the summation by n/n in order to get the equation in terms of the sufficient statistic ȳ.

\[
\begin{aligned}
p(\theta \mid y) &\propto \exp\left[-\frac{1}{2}\left(\frac{\tau_0^2 \sum_{i=1}^{n}\left(y_i^2 - 2\theta \frac{n}{n} y_i + \theta^2\right) + \sigma^2(\theta^2 - 2\theta\mu_0 + \mu_0^2)}{\sigma^2 \tau_0^2}\right)\right] \\
&= \exp\left[-\frac{1}{2}\left(\frac{\tau_0^2 \sum_{i=1}^{n} y_i^2 - 2\tau_0^2 \theta n\bar{y} + \tau_0^2 n\theta^2 + \sigma^2\theta^2 - 2\sigma^2\theta\mu_0 + \sigma^2\mu_0^2}{\sigma^2 \tau_0^2}\right)\right]
\end{aligned}
\]

We can then factor the terms into several parts. Since σ²µ₀² and τ₀² Σᵢ yᵢ² do not contain θ, we can represent them with some constant k, which we will drop into the normalizing constant.

\[
\begin{aligned}
p(\theta \mid y) &\propto \exp\left[-\frac{1}{2}\left(\frac{\theta^2(\sigma^2 + \tau_0^2 n) - 2\theta(\mu_0\sigma^2 + \tau_0^2 n\bar{y}) + k}{\sigma^2 \tau_0^2}\right)\right] \\
&= \exp\left[-\frac{1}{2}\left(\theta^2\left(\frac{\sigma^2 + \tau_0^2 n}{\sigma^2 \tau_0^2}\right) - 2\theta\left(\frac{\mu_0\sigma^2 + \tau_0^2 n\bar{y}}{\sigma^2 \tau_0^2}\right) + k\right)\right] \\
&= \exp\left[-\frac{1}{2}\left(\theta^2\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right) - 2\theta\left(\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}\right) + k\right)\right]
\end{aligned}
\]

Let's multiply by \(\frac{1/\tau_0^2 + n/\sigma^2}{1/\tau_0^2 + n/\sigma^2}\) in order to simplify the θ² term.

\[
\begin{aligned}
p(\theta \mid y) &\propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta^2 - 2\theta\, \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\right) + k\right] \\
&= \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta - \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\right)^2\right]
\end{aligned}
\]

where the last step completes the square in θ and drops the leftover constants into the normalizing constant. Finally, we have something that looks like the density function of a Normal distribution!

\[
p(\theta \mid y) \propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta - \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\right)^2\right]
\]

Posterior Mean: \(\mu_1 = \dfrac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\)

Posterior Variance: \(\tau_1^2 = \left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)^{-1}\)

Posterior Precision: \(\frac{1}{\tau_1^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\)

The posterior precision is just the sum of the prior precision and the data precision.

We can also look more closely at how the prior mean µ₀ and the posterior mean µ₁ relate to each other.

\[
\mu_1 = \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}
= \frac{\mu_0\sigma^2 + \tau_0^2 n\bar{y}}{\sigma^2 + n\tau_0^2}
= \frac{\mu_0\sigma^2}{\sigma^2 + n\tau_0^2} + \frac{\tau_0^2 n\bar{y}}{\sigma^2 + n\tau_0^2}
\]

As n increases, the data mean dominates the prior mean. As τ₀² decreases (less prior variance, greater prior precision), our prior mean becomes more important.
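The weighted-average form invites a quick experiment (our addition, with illustrative numbers): fix ȳ = 68, σ² = 16, µ₀ = 72, and τ₀² = 36, and watch the posterior mean move from the prior mean toward the data mean as n grows.

> mu0 <- 72; tau.sq0 <- 36; sigma.sq <- 16; ybar <- 68
> n <- c(0, 1, 10, 100, 1000)
> post.mean <- (mu0 * sigma.sq + tau.sq0 * n * ybar)/(sigma.sq + n * tau.sq0)
> round(post.mean, 3)
[1] 72.000 69.231 68.170 68.018 68.002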

A Simple Example

Suppose we have some (fake) data on the heights (in inches) of a random sample of 100 individuals in the U.S. population.

> known.sigma.sq <- 16
> unknown.mean <- 68
> n <- 100
> heights <- rnorm(n, mean = unknown.mean, sd = sqrt(known.sigma.sq))

We believe that the heights are normally distributed with some unknown mean µ and a known variance σ² = 16. Suppose before we see the data, we have a prior belief about the distribution of µ. Let our prior mean µ₀ = 72 and our prior variance τ₀² = 36.

> mu0 <- 72
> tau.sq0 <- 36

Our posterior is a Normal distribution with mean

\[
\mu_1 = \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}
\]

and variance

\[
\tau_1^2 = \left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)^{-1}
\]

> post.mean <- (mu0/tau.sq0 + (n * mean(heights)/known.sigma.sq)) /
+     (1/tau.sq0 + n/known.sigma.sq)
> post.mean
[1] 68.03969
> post.var <- 1/(1/tau.sq0 + n/known.sigma.sq)
> post.var
[1] 0.159292
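Because the posterior is itself a known Normal, we can also simulate from it directly; a small check (our addition) confirms that the draws reproduce the analytic moments.

> post.draws <- rnorm(10000, mean = post.mean, sd = sqrt(post.var))
> mean(post.draws)  # should be close to post.mean
> var(post.draws)  # should be close to post.var
> quantile(post.draws, c(0.025, 0.975))  # 95% credible interval for mu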

Normal Model with Known Mean, Unknown Variance

Now suppose we wish to estimate a model where the likelihood of the data is normal with a known mean µ and an unknown variance σ². Now our parameter of interest is σ². We can use a conjugate inverse gamma prior on σ², with shape parameter α₀ and scale parameter β₀:

\[
\begin{aligned}
p(\sigma^2 \mid y, \mu) &\propto p(y \mid \mu, \sigma^2)\, p(\sigma^2) \\
\text{Invgamma}(\alpha_1, \beta_1) &= \text{Normal}(\mu, \sigma^2) \times \text{Invgamma}(\alpha_0, \beta_0)
\end{aligned}
\]

Let θ represent our parameter of interest, in this case σ².

\[
\begin{aligned}
p(\theta \mid y, \mu) &\propto \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\theta}} \exp\left(-\frac{(y_i - \mu)^2}{2\theta}\right) \times \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\, \theta^{-(\alpha_0 + 1)} \exp\left(-\frac{\beta_0}{\theta}\right) \\
&\propto \theta^{-\frac{n}{2}} \exp\left(-\frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2\theta}\right) \theta^{-(\alpha_0 + 1)} \exp\left(-\frac{\beta_0}{\theta}\right) \\
&= \theta^{-(\alpha_0 + \frac{n}{2} + 1)} \exp\left[-\frac{1}{\theta}\left(\beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}\right)\right]
\end{aligned}
\]

This looks like the density of an inverse gamma distribution!

\[
p(\theta \mid y, \mu) \propto \theta^{-(\alpha_0 + \frac{n}{2} + 1)} \exp\left[-\frac{1}{\theta}\left(\beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}\right)\right]
\]

\[
\alpha_1 = \alpha_0 + \frac{n}{2} \qquad \beta_1 = \beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}
\]

Our posterior is an Invgamma\(\left(\alpha_0 + \frac{n}{2},\; \beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}\right)\) distribution.
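A side note (ours, not the slides'): if the MCMCpack package used below is unavailable, inverse gamma draws can be generated with base R, since θ ∼ Invgamma(α, β) exactly when 1/θ ∼ Gamma(α, rate = β).

> rinvgamma.manual <- function(n, shape, scale) {
+     # reciprocal of a Gamma(shape, rate = scale) draw is Invgamma(shape, scale)
+     1/rgamma(n, shape = shape, rate = scale)
+ }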

A Simple Example

Again suppose we have some (fake) data on the heights (in inches) of a random sample of 100 individuals in the U.S. population.

> known.mean <- 68
> unknown.sigma.sq <- 16
> n <- 100
> heights <- rnorm(n, mean = known.mean, sd = sqrt(unknown.sigma.sq))

We believe that the heights are normally distributed with a known mean µ = 68 and some unknown variance σ². Suppose before we see the data, we have a prior belief about the distribution of σ². Let our prior shape α₀ = 5 and our prior scale β₀ = 10.

> alpha0 <- 5
> beta0 <- 10

Our posterior is an inverse gamma distribution with shape α₀ + n/2 and scale β₀ + Σᵢ(yᵢ − µ)²/2.

> alpha <- alpha0 + n/2
> beta <- beta0 + sum((heights - known.mean)^2)/2
> library(MCMCpack)
> posterior <- rinvgamma(10000, alpha, beta)
> post.mean <- mean(posterior)
> post.mean
[1] 12.8839
> post.var <- var(posterior)
> post.var
[1] 3.136047

Hmm... what if we increased our sample size?

> n <- 1000
> heights <- rnorm(n, mean = known.mean, sd = sqrt(unknown.sigma.sq))
> alpha <- alpha0 + n/2
> beta <- beta0 + sum((heights - known.mean)^2)/2
> posterior <- rinvgamma(10000, alpha, beta)
> post.mean <- mean(posterior)
> post.mean
[1] 15.928
> post.var <- var(posterior)
> post.var
[1] 0.505895
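As a final check (our addition), the simulated moments can be compared to the analytic inverse gamma formulas, mean β₁/(α₁ − 1) and variance β₁²/((α₁ − 1)²(α₁ − 2)); both should match the simulations up to Monte Carlo error, with the posterior mean approaching the true σ² = 16 and the posterior variance shrinking as n grows.

> beta/(alpha - 1)  # analytic posterior mean
> beta^2/((alpha - 1)^2 * (alpha - 2))  # analytic posterior variance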