STAT 425: Introduction to Bayesian Analysis

Marina Vannucci, Rice University, USA. Fall 2018.

Lectures 9-11: Multi-parameter models. The Normal model.

Parameterizations of the Normal Distribution

Mean and variance: $f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $x \in \mathbb{R}$, $\sigma^2 > 0$.

Mean and precision: $f(x \mid \mu, \tau) = \sqrt{\frac{\tau}{2\pi}}\, e^{-\frac{\tau(x-\mu)^2}{2}}$, $x \in \mathbb{R}$, $\tau = 1/\sigma^2 > 0$.

The latter has advantages in numerical computations when $\sigma^2 \to 0$ and simplifies formulas.
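
A quick numerical check of the equivalence, as a sketch in R (the values of x, mu, and sigma2 are illustrative):

```r
# Check that the mean/variance and mean/precision parameterizations agree.
x <- 1.3; mu <- 0.5; sigma2 <- 2
tau <- 1 / sigma2
d_sd   <- dnorm(x, mean = mu, sd = sqrt(sigma2))
d_prec <- sqrt(tau / (2 * pi)) * exp(-tau * (x - mu)^2 / 2)
all.equal(d_sd, d_prec)  # TRUE
```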

Summary of common distributions

| Distribution | pdf/pmf | Domain | Mean | Variance |
|---|---|---|---|---|
| Bern | $P(x) = p^x (1-p)^{1-x}$ | $\{0, 1\}$ | $p$ | $p(1-p)$ |
| Bin | $P(x) = \binom{N}{x} p^x (1-p)^{N-x}$ | $\{0, \dots, N\}$ | $Np$ | $Np(1-p)$ |
| Poi | $P(x) = e^{-\lambda} \frac{\lambda^x}{x!}$ | $\mathbb{N}$ | $\lambda$ | $\lambda$ |
| NB | $P(x) = \binom{r+x-1}{r-1} p^r (1-p)^x$ | $\mathbb{N}$ | $r\frac{1-p}{p}$ | $r\frac{1-p}{p^2}$ |
| M | $P(x_1, \dots, x_k) = \frac{N!}{\prod_k x_k!} \prod_k p_k^{x_k}$ | $\{0, \dots, N\}^K$ | $Np_k$ | $Np_k(1-p_k)$ |
| U | $f(x) = \frac{1}{b-a}$ | $[a, b]$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
| Be | $f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1-x)^{b-1}$ | $[0, 1]$ | $\frac{a}{a+b}$ | $\frac{ab}{(a+b)^2(a+b+1)}$ |
| Ga | $f(x) = \frac{b^a}{\Gamma(a)} x^{a-1} e^{-bx}$ | $\mathbb{R}^+$ | $\frac{a}{b}$ | $\frac{a}{b^2}$ |
| N | $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\mathbb{R}$ | $\mu$ | $\sigma^2$ |
| MN | $f(x) = (2\pi)^{-p/2} \lvert\Sigma\rvert^{-1/2} e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$ | $\mathbb{R}^p$ | $\mu$ | $\Sigma$ |

Summary of common estimators

| Model | Parameter | MOM | MLE | UMVUE |
|---|---|---|---|---|
| Bern | $p$ | $\bar X$ | $\bar X$ | $\bar X$ |
| Bin (known $N$) | $p$ | $\bar X / N$ | $\bar X / N$ | $\bar X / N$ |
| Poi | $\lambda$ | $\bar X$ | $\bar X$ | $\bar X$ |
| NB (known $r$) | $p$ | $\frac{r}{r + \bar X}$ | $\frac{r}{r + \bar X}$ | |
| U$(0, a)$ | $a$ | $2\bar X$ | $X_{(n)}$ | $\frac{n+1}{n} X_{(n)}$ |
| Ga (known $a$) | $b$ | $\frac{a}{\bar X}$ | $\frac{a}{\bar X}$ | |
| N | $\mu$ | $\bar X$ | $\bar X$ | $\bar X$ |
| N | $\sigma^2$ | $\frac{n-1}{n} S^2$ | $\frac{n-1}{n} S^2$ | $S^2$ |

Related Distributions

Normal distribution $X \sim N(\mu, \sigma^2)$:

Truncated normal distribution: $f(x \mid \mu, \sigma^2, a, b) = \dfrac{f(x \mid \mu, \sigma^2)}{\Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)}$, $a \le x \le b$;

Standardized t-distribution: $\dfrac{\bar X - \mu}{s/\sqrt{n}} \sim t_{n-1}(0, 1)$, with $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$, $s^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$.

Standard normal distribution $X \sim N(0, 1)$:

Log-normal distribution: $e^{\mu + \sigma X} \sim LN(\mu, \sigma^2)$;

Cauchy distribution: $X_1/X_2 \sim \text{Cauchy}(0, 1)$ for independent $X_1, X_2 \sim N(0, 1)$.

Bell-shaped Distributions

Laplace distribution (double exponential distribution): $f(x \mid \mu, b) = \frac{1}{2b}\, e^{-\frac{|x-\mu|}{b}}$, $x \in \mathbb{R}$, $b > 0$.

Cauchy distribution: $f(x \mid \mu, \gamma) = \dfrac{1}{\pi\gamma \left[1 + \left(\frac{x-\mu}{\gamma}\right)^2\right]}$, $x \in \mathbb{R}$, $\gamma > 0$.

t-distribution: $f(x \mid \nu, \mu, \sigma) = \dfrac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\sigma\,\Gamma\!\left(\frac{\nu}{2}\right)} \left[1 + \frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right]^{-\frac{\nu+1}{2}}$, $x \in \mathbb{R}$, $\nu > 0$, $\sigma > 0$.

Logistic distribution: $f(x \mid \mu, s) = \dfrac{e^{-\frac{x-\mu}{s}}}{s\left(1 + e^{-\frac{x-\mu}{s}}\right)^2}$, $x \in \mathbb{R}$, $s > 0$.

[Figure: Laplace, Cauchy, standardized t, and logistic densities]
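
A short R sketch that reproduces the figure from the densities above (base R only; dlaplace is written out since base R has no Laplace density):

```r
# Laplace, Cauchy, standardized t (3 df) and logistic densities, unit scale.
x <- seq(-6, 6, length.out = 400)
dlaplace <- function(x, mu = 0, b = 1) exp(-abs(x - mu) / b) / (2 * b)
plot(x, dlaplace(x), type = "l", ylab = "density")
lines(x, dcauchy(x), lty = 2)
lines(x, dt(x, df = 3), lty = 3)
lines(x, dlogis(x), lty = 4)
legend("topright", c("Laplace", "Cauchy", "t (3 df)", "logistic"), lty = 1:4)
```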

The Gamma distribution: a refresher

The Gamma distribution is often used to model parameters that can only take positive values. This is motivated in part by the fact that the Gamma distribution acts as a conjugate prior in many models.

$\theta \sim \text{Gamma}(\alpha, \beta)$: $\quad p(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \theta^{\alpha-1} e^{-\beta\theta}$, $\alpha, \beta > 0$.

Special cases: $\text{Gamma}(1, \beta) \equiv \text{Exp}(\beta)$ (exponential density); $\text{Gamma}\!\left(\frac{\nu}{2}, \frac{1}{2}\right) \equiv \chi^2_\nu$ (chi-square density).

[Figure: Gamma(5, 1) density]

The Gamma distribution

For $\theta \sim \text{Gamma}(\alpha, \beta)$ with $p(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \theta^{\alpha-1} e^{-\beta\theta}$, $\alpha, \beta > 0$:

$E(\theta) = \frac{\alpha}{\beta}$, $\qquad \text{Mode}(\theta) = \frac{\alpha-1}{\beta}$ for $\alpha > 1$, $\qquad V(\theta) = \frac{\alpha}{\beta^2}$.

[Figure: Gamma(5, 2) density]
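
A small R check of these moments for the Gamma(5, 2) density shown in the figure (Monte Carlo, so the sample values are only approximate):

```r
# Compare Monte Carlo mean/variance of Gamma(5, 2) draws to alpha/beta and alpha/beta^2.
alpha <- 5; beta <- 2
theta <- rgamma(1e5, shape = alpha, rate = beta)
c(mc_mean = mean(theta), exact = alpha / beta)       # ~ 2.5
c(mc_var  = var(theta),  exact = alpha / beta^2)     # ~ 1.25
curve(dgamma(x, shape = alpha, rate = beta), 0, 8)   # density
abline(v = (alpha - 1) / beta, lty = 2)              # mode = 2
```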

Possible models

Data likelihood:
$f(x_1, \dots, x_n \mid \mu, \sigma^2) = \prod_{i=1}^n f(x_i \mid \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-n/2}\, e^{-\frac{\sum_{i=1}^n (x_i-\mu)^2}{2\sigma^2}}.$

Models:
- $\mu$ is unknown, $\sigma^2$ is known;
- $\mu$ is known, $\sigma^2$ is unknown;
- both $\mu$ and $\sigma^2$ are unknown: $\mu$ dependent on $\sigma^2$, or $\mu$ and $\sigma^2$ independent.

Useful facts for derivations

Normal component: if $\pi(\theta) \propto e^{-\frac{1}{2}(a\theta^2 - 2b\theta)}$, then $\theta \sim N\!\left(\frac{b}{a}, \frac{1}{a}\right)$ and
$\sqrt{\frac{a}{2\pi}}\, e^{-\frac{b^2}{2a}} \int e^{-\frac{1}{2}(a\theta^2 - 2b\theta)}\, d\theta = 1.$

Gamma component: if $\pi(\theta) \propto \theta^{a-1} e^{-b\theta}$, then $\theta \sim \text{Ga}(a, b)$ and
$\frac{b^a}{\Gamma(a)} \int \theta^{a-1} e^{-b\theta}\, d\theta = 1.$

Student component: if $\pi(\theta) \propto \left(\delta + \frac{(\theta - l)^2}{S}\right)^{-\frac{\delta+1}{2}}$, then $\theta \sim t_\delta(l, S)$ and
$\frac{\Gamma\!\left(\frac{\delta+1}{2}\right)}{\sqrt{\pi S}\, \Gamma\!\left(\frac{\delta}{2}\right)}\, \delta^{\delta/2} \int \left(\delta + \frac{(\theta - l)^2}{S}\right)^{-\frac{\delta+1}{2}} d\theta = 1.$
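
These normalizing constants can be verified numerically; a sketch in R with illustrative values for the constants:

```r
# Each integral below should return ~ 1.
a <- 3; b <- 1.5
c_norm <- sqrt(a / (2 * pi)) * exp(-b^2 / (2 * a))
integrate(function(t) c_norm * exp(-(a * t^2 - 2 * b * t) / 2), -Inf, Inf)$value

c_gamma <- b^a / gamma(a)
integrate(function(t) c_gamma * t^(a - 1) * exp(-b * t), 0, Inf)$value

delta <- 5; l <- 0; S <- 2
c_t <- gamma((delta + 1) / 2) / (sqrt(pi * S) * gamma(delta / 2)) * delta^(delta / 2)
integrate(function(t) c_t * (delta + (t - l)^2 / S)^(-(delta + 1) / 2), -Inf, Inf)$value
```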

The Normal Model

$x = (x_1, \dots, x_n) \sim N(\mu, \sigma^2)$ i.i.d., with both $\mu$ and $\sigma^2$ unknown. The likelihood is:
$L(\mu, \sigma^2) \propto \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right) \propto \left(\frac{1}{\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2\right)$

For inference, focus is on $p(\mu, \sigma^2 \mid x) = p(\mu \mid \sigma^2, x)\, p(\sigma^2 \mid x)$.

From a Bayesian perspective, it is easier to work with the precision, $\tau = \frac{1}{\sigma^2}$. The likelihood becomes:
$L(\mu, \tau) \propto \prod_{i=1}^n \tau^{1/2} \exp\left(-\frac{1}{2}\tau(x_i - \mu)^2\right) \propto \tau^{n/2} \exp\left(-\frac{1}{2}\tau \sum_i (x_i - \mu)^2\right)$

Likelihood factorization:
$L(\mu, \tau) \propto \tau^{n/2} \exp\left(-\frac{1}{2}\tau \sum_i (x_i - \mu)^2\right) = \tau^{n/2} \exp\left(-\frac{1}{2}\tau \sum_i [(x_i - \bar x) - (\mu - \bar x)]^2\right)$
$= \tau^{n/2} \exp\left(-\frac{1}{2}\tau \left[\sum_i (x_i - \bar x)^2 + n(\mu - \bar x)^2\right]\right)$
$= \tau^{n/2} \exp\left(-\frac{1}{2}\tau s^2 (n-1)\right) \exp\left(-\frac{1}{2}\tau n(\mu - \bar x)^2\right)$
$= \tau^{n/2} \exp\left(-\frac{1}{2}\tau\, SS\right) \exp\left(-\frac{1}{2}\tau n(\mu - \bar x)^2\right)$
with $s^2 = \sum_i (x_i - \bar x)^2/(n-1)$ the sample variance and $SS = \sum_i (x_i - \bar x)^2$ the sum of squares [$SS$ and $\bar x$ are sufficient statistics].
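
The decomposition used above is easy to sanity-check in R:

```r
# sum((x - mu)^2) = SS + n * (mu - xbar)^2 for any mu.
set.seed(1)
x <- rnorm(10); mu <- 0.7
n <- length(x); xbar <- mean(x); SS <- sum((x - xbar)^2)
all.equal(sum((x - mu)^2), SS + n * (mu - xbar)^2)  # TRUE
```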

Non-informative Prior

Non-informative prior: $\pi(\mu, \sigma^2) \propto \frac{1}{\sigma^2}$. This arises by considering $\mu$ and $\sigma^2$ a priori independent and taking the product of the standard non-informative priors. This is not a conjugate setting (the posterior does not factor into a product of two independent distributions). The prior is improper but the posterior is proper. This is also the Jeffreys prior.

The joint posterior distribution of $\mu$ and $\sigma^2$ is
$p(\mu, \sigma^2 \mid x) \propto (\sigma^2)^{-(n/2+1)} \exp\left\{-\frac{1}{2\sigma^2}\left[(n-1)s^2 + n(\bar x - \mu)^2\right]\right\}$
where $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2$.

The conditional posterior distribution, $p(\mu \mid \sigma^2, x)$, is equivalent to the posterior for $\mu$ when $\sigma^2$ is known:
$\mu \mid \sigma^2, x \sim N\!\left(\bar x, \frac{\sigma^2}{n}\right)$

The marginal posterior $p(\sigma^2 \mid x)$ is obtained by integrating $p(\mu, \sigma^2 \mid x)$ over $\mu$ [Hint: integral of a Gaussian function, $\sqrt{\frac{2\pi}{c}} = \int \exp\left(-\frac{c}{2}(\mu + b)^2\right) d\mu$]:
$p(\sigma^2 \mid x) \propto \int (\sigma^2)^{-(n/2+1)} \exp\left\{-\frac{1}{2\sigma^2}\left[(n-1)s^2 + n(\bar x - \mu)^2\right]\right\} d\mu \propto (\sigma^2)^{-[(n-1)/2+1]} \exp\left\{-\frac{(n-1)s^2}{2\sigma^2}\right\}$
which is an inverse-gamma density, i.e.
$\sigma^2 \mid x \sim \text{Inv-Gamma}\!\left(\frac{n-1}{2}, \frac{(n-1)s^2}{2}\right) \equiv \text{Inv-}\chi^2(n-1, s^2)$
or, equivalently, $\tau \mid x \sim \text{Ga}\!\left(\frac{n-1}{2}, \frac{(n-1)s^2}{2}\right)$.

Sampling from the joint posterior distribution

One can simulate a value of $(\mu, \sigma^2)$ from the joint posterior density by:
1. simulating $\sigma^2$ from an inverse-gamma$\left(\frac{n-1}{2}, \frac{(n-1)s^2}{2}\right)$ distribution [take the inverse of random samples from a Gamma$\left(\frac{n-1}{2}, \frac{(n-1)s^2}{2}\right)$];
2. then simulating $\mu$ from a $N\!\left(\bar x, \frac{\sigma^2}{n}\right)$ distribution.
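
A minimal R sketch of this two-step sampler (the function name and the simulated data are illustrative):

```r
# Draw m samples of (mu, sigma2) from the joint posterior
# under the non-informative prior proportional to 1/sigma2.
rpost_noninf <- function(m, x) {
  n <- length(x); xbar <- mean(x); s2 <- var(x)
  sigma2 <- 1 / rgamma(m, (n - 1) / 2, rate = (n - 1) * s2 / 2)  # inverse-gamma draws
  mu     <- rnorm(m, mean = xbar, sd = sqrt(sigma2 / n))         # mu | sigma2, x
  cbind(mu = mu, sigma2 = sigma2)
}
draws <- rpost_noninf(10000, rnorm(20, mean = 5, sd = 2))
colMeans(draws)
```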

Marginal posterior distribution $p(\mu \mid x)$ of $\mu$

As $\mu$ is typically the parameter of interest ($\sigma^2$ a nuisance parameter), it is useful to calculate its marginal posterior distribution [Hint: integral of a Gamma function, $\frac{\Gamma(a)}{b^a} = \int_0^\infty z^{a-1} \exp(-zb)\, dz$]:
$p(\mu \mid x) = \int_0^\infty p(\mu, \sigma^2 \mid x)\, d\sigma^2 \propto \int_0^\infty (\sigma^2)^{-(n/2+1)} \exp\left\{-\frac{1}{2\sigma^2}\left[(n-1)s^2 + n(\bar x - \mu)^2\right]\right\} d\sigma^2$
$= A^{-n/2} \int_0^\infty z^{(n-2)/2} \exp(-z)\, dz \propto A^{-n/2}$, with $A = (n-1)s^2 + n(\bar x - \mu)^2$ and $z = \frac{A}{2\sigma^2}$
$A^{-n/2} \propto \left[1 + \frac{1}{n-1}\left(\frac{\mu - \bar x}{s/\sqrt{n}}\right)^2\right]^{-[(n-1)+1]/2}$
that is, $\mu \mid x \sim t(n-1, \bar x, s^2/n)$, or $\frac{\mu - \bar x}{s/\sqrt{n}} \,\Big|\, x \sim t_{n-1}$, with $t_{n-1}$ the standard t-distribution with $n-1$ degrees of freedom.

Conjugate Prior Model

A conjugate prior must be of the form $\pi(\mu, \sigma^2) = \pi(\mu \mid \sigma^2)\, \pi(\sigma^2)$, e.g.,
$\mu \mid \sigma^2 \sim N(\mu_0, \sigma^2/\tau_0), \qquad \sigma^2 \sim IG\!\left(\frac{\nu_0}{2}, \frac{SS_0}{2}\right) \left[\text{or } \tau \sim \text{Ga}\!\left(\frac{\nu_0}{2}, \frac{SS_0}{2}\right)\right],$
which corresponds to the joint prior density
$p(\mu, \sigma^2) \propto \left(\frac{\sigma^2}{\tau_0}\right)^{-1/2} \exp\left\{-\frac{\tau_0}{2\sigma^2}(\mu - \mu_0)^2\right\} (\sigma^2)^{-(\nu_0/2+1)} \exp\left\{-\frac{SS_0}{2\sigma^2}\right\}$
$= (\sigma^2)^{-\left(\frac{\nu_0+1}{2}+1\right)} \exp\left\{-\frac{1}{2\sigma^2}\left(SS_0 + \tau_0(\mu - \mu_0)^2\right)\right\}$
We call this a Normal-Inverse-Gamma prior, $(\mu, \sigma^2) \sim NIG(\mu_0, \tau_0, \nu_0/2, SS_0/2)$.

Joint Posterior $p(\mu, \sigma^2 \mid x)$

$p(\mu, \sigma^2 \mid x) \propto (\sigma^2)^{-\left(\frac{\nu_0+1}{2}+1\right)} \exp\left\{-\frac{1}{2\sigma^2}\left(SS_0 + \tau_0(\mu - \mu_0)^2\right)\right\} (\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right\}$
$\propto (\sigma^2)^{-\left(\frac{\nu_n+1}{2}+1\right)} \exp\left\{-\frac{1}{2\sigma^2}\left(SS_n + \tau_n(\mu - \mu_n)^2\right)\right\}$
with
$\mu \mid \sigma^2, x \sim N(\mu_n, \sigma^2/\tau_n), \qquad \mu_n = \frac{\frac{\tau_0}{\sigma^2}\mu_0 + \frac{n}{\sigma^2}\bar x}{\frac{\tau_0}{\sigma^2} + \frac{n}{\sigma^2}} = \frac{\tau_0\mu_0 + n\bar x}{\tau_n}, \qquad \tau_n = \tau_0 + n$
$\sigma^2 \mid x \sim IG\!\left(\frac{\nu_n}{2}, \frac{SS_n}{2}\right), \qquad \nu_n = \nu_0 + n, \qquad SS_n = SS_0 + SS + \frac{\tau_0 n (\bar x - \mu_0)^2}{\tau_n}$
Thus, $\mu, \sigma^2 \mid x \sim \text{Normal-Inverse-Gamma}(\mu_n, \tau_n; \nu_n/2, SS_n/2)$.
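
The hyperparameter updates can be wrapped in a small R helper (a sketch; the function name is illustrative):

```r
# Map NIG prior hyperparameters (mu0, tau0, nu0, SS0) and data x to the posterior ones.
nig_update <- function(x, mu0, tau0, nu0, SS0) {
  n <- length(x); xbar <- mean(x); SS <- sum((x - xbar)^2)
  tau_n <- tau0 + n
  mu_n  <- (tau0 * mu0 + n * xbar) / tau_n
  nu_n  <- nu0 + n
  SS_n  <- SS0 + SS + tau0 * n * (xbar - mu0)^2 / tau_n
  list(mu_n = mu_n, tau_n = tau_n, nu_n = nu_n, SS_n = SS_n)
}
```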

Also $\mu \mid x \sim t_{\nu_n}(\mu_n, \sigma_n^2/\tau_n)$, with $\sigma_n^2 = SS_n/\nu_n$ [Note: again $\int N(m, \sigma^2/\tau)\, IG(\nu/2, SS/2)\, d\sigma^2 = t_\nu(m, SS/(\nu\tau))$].

Comments:
- $\mu_n$ is the expected value for $\mu$ after seeing the data: $\mu_n = \frac{n}{\tau_n}\bar x + \frac{\tau_0}{\tau_n}\mu_0$, a weighted average.
- $\tau_n$ is the precision for estimating $\mu$ after $n$ observations.
- $\nu_n$ degrees of freedom [$\tau \sim \text{Ga}(\alpha/2, \beta/2) \Leftrightarrow \beta\tau \sim \chi^2_\alpha$, with $\alpha$ degrees of freedom].
- $SS_n$ is the posterior variation: prior variation + observed variation + variation between prior mean and sample mean.
- Limiting case: $\tau_0 \to 0$, $\nu_0 \to -1$ (and $SS_0 \to 0$) gives $\mu \mid x \sim t_{n-1}(\bar x, s^2/n)$ (same as the improper prior!).

Example on SPF (from Merlise Clyde)

A Sun Protection Factor (SPF) of 5 means an individual who can tolerate $X$ minutes of sunlight without any sunscreen can tolerate $5X$ minutes with sunscreen. Data on 13 individuals (tolerance, in minutes, with and without sunscreen).

The analysis should take into account the pairing, which induces dependence between observations (take differences and use ratios, or log(ratios) = difference in logs). Ratios make more sense given the goals: how much longer can a person be exposed to the sun relative to their baseline?

Model: $Y = \log(TRT) - \log(CONTROL) \sim N(\mu, \sigma^2)$. Then $E(\log(TRT/CONTROL)) = \mu = \log(SPF)$. Interested in $\exp(\mu) = SPF$.

Summary statistics: $\bar y = 1.998$, $SS = 0.55$, $n = 13$ [make boxplots and Q-Q normal plots to check normality].

Model formulation: $Y = \log(TRT) - \log(CONTROL) \sim N(\mu, \sigma^2)$, $n = 13$, $\bar y = 1.998$, $SS = 0.55$.

Question: $\pi(\mu \mid y_1, \dots, y_n) = ?$

Bayesian model:
- Data likelihood: $f(y_1, \dots, y_n \mid \mu, \sigma^2) = \prod_{i=1}^n N(y_i; \mu, \sigma^2)$;
- Non-informative prior: $\pi(\mu, \sigma^2) \propto 1/\sigma^2$;
- Posterior: $(\mu, \sigma^2 \mid y_1, \dots, y_n) \sim N(\bar y, \sigma^2/n) \cdot IG\!\left(\frac{n-1}{2}, \frac{(n-1)s^2}{2}\right)$;
- Posterior: $\mu \mid y_1, \dots, y_n \sim t_{n-1}\!\left(\bar y, \frac{s^2}{n}\right)$;
- Prediction: $y_f \mid y_1, \dots, y_n \sim t_{n-1}\!\left(\bar y, s^2 \frac{n+1}{n}\right)$.

Coding in R: rgamma(), rnorm() and rt().

With the non-informative prior:
- Posterior: $(\mu, \sigma^2 \mid y_1, \dots, y_n) \sim N(\bar y, \sigma^2/n) \cdot IG\!\left(\frac{n-1}{2}, \frac{(n-1)s^2}{2}\right)$
- Posterior: $\mu \mid y_1, \dots, y_n \sim t_{n-1}\!\left(\bar y, \frac{s^2}{n}\right)$

Define: vn = n − 1 = 12, SSn = s²(n − 1) = 0.55, mn = 1.998.

Sampling from the posterior:
- Draw τ | Y: tau = rgamma(10000, vn/2, rate=SSn/2)
- Draw µ | τ, Y: mu = rnorm(10000, mn, 1/sqrt(tau*n))
- or draw µ | Y directly: mu = rt(10000, vn)*sqrt(SSn/(n*vn)) + mn
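
Assembling the fragments into one runnable R script (a sketch using the summary statistics above; the seed is arbitrary):

```r
n <- 13; mn <- 1.998; vn <- n - 1; SSn <- 0.55
set.seed(425)
tau <- rgamma(10000, vn / 2, rate = SSn / 2)        # tau | Y
mu  <- rnorm(10000, mn, 1 / sqrt(tau * n))          # mu | tau, Y
mu2 <- rt(10000, vn) * sqrt(SSn / (n * vn)) + mn    # mu | Y, drawn directly
rbind(two_step = quantile(mu,  c(.025, .5, .975)),
      direct   = quantile(mu2, c(.025, .5, .975)))  # the two should agree
```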

Model formulation: $Y = \log(TRT) - \log(CONTROL) \sim N(\mu, \sigma^2)$, $n = 13$, $\bar y = 1.998$, $SS = 0.55$.

Question: $\pi(\mu \mid y_1, \dots, y_n) = ?$

Bayesian model:
- Data likelihood: $f(y_1, \dots, y_n \mid \mu, \sigma^2) = \prod_{i=1}^n N(y_i; \mu, \sigma^2)$;
- Conjugate prior: $\mu \mid \sigma^2 \sim N\!\left(\mu_0, \frac{\sigma^2}{\tau_0}\right)$, $\sigma^2 \sim IG\!\left(\frac{\nu_0}{2}, \frac{SS_0}{2}\right)$;
- Posterior: $(\mu, \sigma^2 \mid y_1, \dots, y_n) \sim NIG(\mu_n, \tau_n; \nu_n/2, SS_n/2)$;
- Posterior: $\mu \mid y_1, \dots, y_n \sim t_{\nu_n}\!\left(\mu_n, \frac{SS_n}{\tau_n \nu_n}\right)$;
- Prediction: $y_f \mid y_1, \dots, y_n \sim t_{\nu_n}\!\left(\mu_n, \frac{SS_n}{\nu_n}\frac{\tau_n+1}{\tau_n}\right)$.

Coding in R: rgamma(), rnorm() and rt().

Expert opinions on µ:
- best guess for the median SPF is 16;
- $P(\mu > \log(64)) = 0.01$;
- information in the prior is worth 25 observations.

Possible subjective prior: $\mu_0 = \log(16)$, $\tau_0 = 25$, $\nu_0 = \tau_0 - 1 = 24$; $P(\mu < \log(64)) = 0.99$ implies $SS_0 = 185.7$.

Posterior hyperparameters: $\tau_n = 38$, $\mu_n = 2.508$, $\nu_n = 37$, $SS_n = 197.134$.

Sampling from the posterior:
- Draw τ | Y: tau = rgamma(10000, vn/2, rate=SSn/2)
- Draw µ | τ, Y: mu = rnorm(10000, mn, 1/sqrt(tau*tn))
- or draw µ | Y directly: mu = rt(10000, vn)*sqrt(SSn/(tn*vn)) + mn
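
The value of $SS_0$ follows from the marginal prior $\mu \sim t_{\nu_0}(\mu_0, SS_0/(\nu_0\tau_0))$ (the prior analogue of the marginal posterior above); a short R sketch of the arithmetic:

```r
# Back out SS0 from P(mu < log(64)) = 0.99 under mu ~ t_{nu0}(mu0, SS0/(nu0*tau0)).
mu0 <- log(16); tau0 <- 25; nu0 <- tau0 - 1
scale <- (log(64) - mu0) / qt(0.99, df = nu0)  # prior scale sqrt(SS0 / (nu0 * tau0))
SS0 <- nu0 * tau0 * scale^2                    # ~ 185.7
SS0
```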

Transform to $\exp(\mu)$. Find a 95% C.I. of 4.54 to 23.758.

Predictive Distribution of a future z

Posterior predictive distribution (given $x = (x_1, \dots, x_n)$):
$p(z \mid x) = \int p(z \mid \mu, \sigma^2, x)\, p(\mu, \sigma^2 \mid x)\, d\mu\, d\sigma^2$
[Use the assumption that $z$ is independent of $x$ given $\mu$ and $\sigma^2$, then integrate $\mu$ using the normal integral, then integrate $\sigma^2$ using the Gamma integral.]

Reference prior: $z \mid x \sim t_{n-1}\!\left(\bar x, s^2 \frac{n+1}{n}\right)$

Conjugate prior: $z \mid x \sim t_{\nu_n}\!\left(\mu_n, \sigma_n^2 \frac{\tau_n+1}{\tau_n}\right)$, $\sigma_n^2 = SS_n/\nu_n$

[Can use the normal trick to integrate $\mu$: if $z \sim N(\mu, \sigma^2)$ and $\mu \sim N(\mu_0, \sigma^2/\tau_0)$, then $y = \frac{z-\mu}{\sigma} \sim N(0, 1)$, that is $z \stackrel{d}{=} \sigma y + \mu$, and therefore $z \mid \sigma^2 \sim N\!\left(\mu_0, \sigma^2\left(1 + \frac{1}{\tau_0}\right)\right)$, since a linear combination of (independent) normals is normal, with means and variances adding.]
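
In practice one can also sample the posterior predictive by composition, drawing $z \sim N(\mu, \sigma^2)$ for each posterior draw of $(\mu, \sigma^2)$; a self-contained R sketch with simulated data:

```r
set.seed(1)
x <- rnorm(20, mean = 5, sd = 2)
n <- length(x); xbar <- mean(x); s2 <- var(x)
sigma2 <- 1 / rgamma(10000, (n - 1) / 2, rate = (n - 1) * s2 / 2)
mu <- rnorm(10000, xbar, sqrt(sigma2 / n))
z  <- rnorm(10000, mu, sqrt(sigma2))             # posterior predictive draws
# Compare with the closed form t_{n-1}(xbar, s2 * (n + 1) / n):
quantile(z, c(.25, .75))
qt(c(.25, .75), n - 1) * sqrt(s2 * (n + 1) / n) + xbar
```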

Prior predictive distribution: what we expect the distribution to be before we observe the data,
$p(z) = \int p(z \mid \mu, \sigma^2)\, \pi(\mu, \sigma^2)\, d\mu\, d\sigma^2, \qquad z \sim t_{\nu_0}\!\left(\mu_0, \frac{SS_0}{\nu_0}\left(1 + \frac{1}{\tau_0}\right)\right)$ [as above]
$\left[\int N(\mu, \sigma^2)\, N(\mu_0, \sigma^2/\tau_0)\, IG(\nu/2, SS/2)\, d\mu\, d\sigma^2 = t_\nu\!\left(\mu_0, \frac{SS}{\nu}\left(1 + \frac{1}{\tau_0}\right)\right)\right]$

Note: this is what we used in the example to specify our subjective prior.

Back to the example

Prior predictive distribution: $z \sim t_{24}\!\left(\log(16), \frac{185.7}{24}\left(1 + \frac{1}{25}\right)\right)$

Posterior predictive distribution: $z \sim t_{37}\!\left(2.5, 5.3\left(1 + \frac{1}{38}\right)\right)$

Y = rt(10000, 24)*sqrt((1+1/25)*185.7/24) + log(16)
quantile(exp(Y))
  0%        25%    50%     75%      100%
  4.57e-06  2.3    16.78   114.98   370966.2

Sampling from the posterior predictive leads to a 50% C.I. of (0.0003, 21.4): with sunscreen, there is a 50% chance that the next individual can be exposed from 0 to 21 times longer than without sunscreen.

Semi-conjugate prior

A semi-conjugate setting is obtained with independent priors:
$\pi(\mu, \sigma^2) = \pi(\mu)\, \pi(\sigma^2), \qquad \mu \sim N(\mu_0, \sigma_0^2), \qquad \sigma^2 \sim IG\!\left(\frac{\nu_0}{2}, \frac{SS_0}{2}\right)$
Then
$\mu \mid \sigma^2, x \sim N(\mu_n, \tau_n^2), \qquad \mu_n = \frac{\frac{\mu_0}{\sigma_0^2} + \frac{n\bar x}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}, \qquad \tau_n^2 = \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}$
but $\sigma^2 \mid x$ is not available in closed form. We will solve this with MCMC methods!
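
A minimal Gibbs sampler sketch in R for this semi-conjugate model (the hyperparameter defaults and the function name are illustrative):

```r
gibbs_semiconj <- function(x, mu0 = 0, s02 = 1e6, nu0 = 0.002, SS0 = 0.002, m = 5000) {
  n <- length(x); xbar <- mean(x)
  mu <- xbar; sigma2 <- var(x)                       # initial values
  out <- matrix(NA, m, 2, dimnames = list(NULL, c("mu", "sigma2")))
  for (i in 1:m) {
    v  <- 1 / (1 / s02 + n / sigma2)                 # conditional variance of mu
    mn <- v * (mu0 / s02 + n * xbar / sigma2)        # conditional mean of mu
    mu <- rnorm(1, mn, sqrt(v))
    sigma2 <- 1 / rgamma(1, (nu0 + n) / 2,           # inverse-gamma full conditional
                         rate = (SS0 + sum((x - mu)^2)) / 2)
    out[i, ] <- c(mu, sigma2)
  }
  out
}
```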

Summary of Conjugate Priors for the Normal Model

Conjugate priors for normal data with unknown precision are
$\tau \sim \text{Gamma}\!\left(\frac{a}{2}, \frac{b}{2}\right), \qquad \mu \mid \tau \sim N\!\left(\mu_0, \frac{1}{\tau_0 \tau}\right)$
Here $a$, $b$, $\mu_0$, and $\tau_0$ are known hyperparameters chosen to characterize the prior information.

The problem with using this prior in practical data analysis is the difficulty of specifying a distribution for $\mu$ that is conditional on $\tau$ (which is also unknown).

Summary of the Independence Prior

Here we assume that information about $\mu$ can be elicited independently of information on $\tau$ (or $\sigma^2$), so
$p(\mu, \tau) = p(\mu)\, p(\tau)$
This makes elicitation relatively easy. Although the primary goal is to get a prior that reasonably captures the expert's information, independence priors generally work well.

Usually one considers Gamma priors for $\tau$, since they are conjugate. But there is really no need, as long as the prior is defined on the positive real line.

Proper (Semi-conjugate) Reference Priors

More recently, priors such as
$\mu \sim N(0, b), \qquad \tau \sim \text{Gamma}(c, c)$
have been used as proper reference priors. Here $b$ and $c$ are chosen so that the prior precision for $\mu$, $1/b$, and both hyperparameters $c$ of the Gamma distribution are near zero. Such priors are seen as approximations of the improper default prior $p(\mu, \tau) \propto 1/\tau$. Common choices are $b = 10^6$ and $c = 0.001$.
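
A quick look at how diffuse these common choices are (R sketch):

```r
# Draws from mu ~ N(0, 1e6) and tau ~ Gamma(0.001, 0.001).
mu  <- rnorm(10000, 0, sqrt(1e6))          # prior variance b = 1e6
tau <- rgamma(10000, 0.001, rate = 0.001)
summary(mu); summary(tau)                  # both extremely spread out
```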

Back to the example

We need to specify a prior distribution that carries the available information (or lack of it) about the unknown parameters $\mu$ and $\tau = 1/\sigma^2$:
- $\mu \sim N(0, 10^6)$ as a proper non-informative prior;
- expert opinion that $\mu$ should be centered at $\log(16)$: then $\mu \sim N(\log(16), 10^6)$ as a diffuse prior;
- expert 95% certain that the mean SPF should be between 10 and 75, that is, $P(10 < SPF < 75) = 0.95$: then $\mu \sim N(3.31, 0.264)$.
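
A sketch of the elicitation arithmetic behind the last prior, matching the 2.5% and 97.5% quantiles of $\mu = \log(SPF)$ to $\log 10$ and $\log 75$ (values computed here, under the assumption of a symmetric normal on the log scale):

```r
# Translate P(10 < SPF < 75) = 0.95 into a normal prior for mu = log(SPF).
lo <- log(10); hi <- log(75)
mu0 <- (lo + hi) / 2                      # ~ 3.31
sd0 <- (hi - lo) / (2 * qnorm(0.975))     # ~ 0.514
c(mu0 = mu0, var0 = sd0^2)                # variance ~ 0.264
```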

Back to the example

We have no good information on $\sigma^2$, the variance of an observation, so we can specify a reference (vague) prior on $\tau$, independent of $\mu$:
$\tau \sim \text{Gamma}(0.001, 0.001)$