Analysis for Heteroscedastic Regression
Esther Salazar
Universidade Federal do Rio de Janeiro
Colóquio Inter-institucional: Modelos Estocásticos e Aplicações, 2009
Collaborators: Marco Ferreira and Thais Fonseca
Summary
1. Location-scale models
2. Application: School spending
3. Student-t regression
4. Exponential power regression
Example 1: Per capita income and per capita spending in public schools by state in the United States in 1979.
[Figure: per capita spending against per capita income; linear and quadratic fits under Gaussian errors (dashed) and Student-t errors (solid); LM fits with and without the outlier Alaska.]
Example 1. Model: Student-t linear regression. Large values of λ_i^{-1} are associated with possible outliers.
[Figure: boxplots of λ_i^{-1}, i = 1, ..., 50.]
Example 2. Darwin's data: differences in heights of 15 pairs of self- and cross-fertilized plants.
[Figure: dot plot of Darwin's data and the sampling distributions of the data.]
Example 2. Model: Student-t distribution. Large values of λ_i^{-1} are associated with possible outliers.
[Figure: boxplots of λ_i^{-1}, i = 1, ..., 15.]
Location-scale
Location-scale models
Models based on the normal distribution are not robust to outliers. Alternative: location-scale models with heavy-tailed error distributions, such as the Student t-distribution and the exponential power distribution, among others.
Influence function: Ad-Hoc
[Figure: influence functions S(x) of the normal model and two ad-hoc robust alternatives (Ad Hoc 1, Ad Hoc 2).]
Influence function
For a density p, the influence function of the residual x = y − μ is
s(x) = −∂ log p(x) / ∂x
Examples:
Normal(0, 1): s(x) = x
Exponential power(0, 1, p): s(x) = −(−x)^{p−1} 1(x ≤ 0) + x^{p−1} 1(x > 0)
Student-t(0, 1, ν): s(x) = (ν + 1)x / (ν + x²), ν: degrees of freedom
Mixture of normals πN(0, 1) + (1 − π)N(0, σ²):
s(x) = [πx exp(−x²/2) + (1 − π)xσ^{−3} exp(−x²/(2σ²))] / [π exp(−x²/2) + (1 − π)σ^{−1} exp(−x²/(2σ²))]
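The influence functions above can be evaluated directly; a minimal sketch (function names are my own) that confirms the qualitative behavior in the plots: the normal influence grows without bound, while the Student-t influence is bounded and redescends toward zero, and the normal mixture flattens out at x/σ² for large residuals.

```python
import math

def s_normal(x):
    # Normal(0,1): influence grows without bound
    return x

def s_ep(x, p):
    # Exponential power(0,1,p): sign(x) * |x|^(p-1)
    return math.copysign(abs(x) ** (p - 1.0), x)

def s_t(x, nu):
    # Student-t(0,1,nu): bounded and redescending
    return (nu + 1.0) * x / (nu + x * x)

def s_mix(x, pi, sigma):
    # Mixture pi*N(0,1) + (1-pi)*N(0,sigma^2): s(x) = -p'(x)/p(x)
    a = pi * math.exp(-0.5 * x * x)
    b = (1.0 - pi) / sigma * math.exp(-0.5 * (x / sigma) ** 2)
    return (a * x + b * x / sigma ** 2) / (a + b)

for x in (1.0, 5.0, 10.0):
    print(x, s_normal(x), s_t(x, 4.0), s_mix(x, 0.9, 3.0))
```

Large observations barely move the Student-t fit (s_t(10, 4) ≈ 0.48) while they dominate the Gaussian fit (s_normal(10) = 10), which is exactly why Alaska distorts the Gaussian school-spending fit but not the robust one.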
Influence function
[Figure: influence functions S(x) of the normal, exponential power, Student-t and normal mixture models.]
Normal scale mixture distributions
Let X be a continuous random variable with location μ and scale σ. The pdf of X has a scale-mixture-of-normals representation if
f_X(x | μ, σ) = ∫₀^∞ N(x | μ, κ(λ)σ²) π(λ) dλ
where κ(·) is a positive function and π(·) is a density function on R⁺.
Example: the Student t-distribution,
t_ν(x | μ, σ) = ∫₀^∞ N(x | μ, σ²/λ) Ga(λ | ν/2, ν/2) dλ
that is, X ~ t_ν(μ, σ) follows the hierarchical form
X | μ, σ, ν, λ ~ N(μ, σ²/λ) and λ | ν ~ Ga(ν/2, ν/2)
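The hierarchical form above is easy to check by simulation: drawing λ ~ Ga(ν/2, ν/2) and then X | λ ~ N(μ, σ²/λ) should reproduce the Student-t moments, in particular Var(X) = σ²ν/(ν − 2) for ν > 2. A small sketch (assuming nothing beyond the representation itself):

```python
import math
import random

random.seed(0)

def rt_scale_mixture(mu, sigma, nu):
    """One Student-t(mu, sigma, nu) draw via the normal scale mixture:
    lambda ~ Ga(nu/2, rate nu/2), then x | lambda ~ N(mu, sigma^2/lambda)."""
    lam = random.gammavariate(nu / 2.0, 2.0 / nu)  # gammavariate takes (shape, scale)
    return random.gauss(mu, sigma / math.sqrt(lam))

nu, mu, sigma, n = 6.0, 1.0, 2.0, 200_000
xs = [rt_scale_mixture(mu, sigma, nu) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
# theory: E[X] = mu = 1, Var[X] = sigma^2 * nu/(nu-2) = 4 * 1.5 = 6
print(mean, var)
```

Small λ draws inflate the conditional variance σ²/λ; those are exactly the draws that generate the heavy tails, and in the regression setting a small posterior λ_i flags observation i as an outlier.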
Uniform scale mixture distributions
Let X be a continuous random variable with location μ and scale σ. The pdf of X has a scale-mixture-of-uniforms representation if
f_X(x | μ, σ) = ∫₀^∞ U(x | μ − κ(λ)σ, μ + κ(λ)σ) π(λ) dλ
where κ(·) is a positive function and π(·) is a density function on R⁺.
Example: the exponential power (EP) distribution,
f_X(x | μ, σ, β) = (c₁/σ) exp{ −c₀ |(x − μ)/σ|^{2/β} }
where β ∈ (0, 2] controls the kurtosis and c₀, c₁ are constants depending on β. The EP distribution follows the hierarchical form
X | μ, σ, β, λ ~ U( μ − σλ^{β/2}/√(2c₀), μ + σλ^{β/2}/√(2c₀) )
λ | β ~ Ga(1 + β/2, 2^{−1/β})
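Since the constants c₀, c₁ of the EP case depend on β and are not given explicitly here, a simulation check of the uniform-mixture idea is easiest with its best-known special case (a standard result, not specific to this talk): N(μ, σ²) arises from κ(λ) = √λ with λ ~ Ga(3/2, rate 1/2), i.e. λ ~ Ga(3/2, 1/2) and then X | λ ~ U(μ − σ√λ, μ + σ√λ).

```python
import random

random.seed(0)

def rnorm_uniform_mixture(mu, sigma):
    """N(mu, sigma^2) via a scale mixture of uniforms:
    lambda ~ Ga(3/2, rate 1/2), x | lambda ~ U(mu - sigma*sqrt(lambda), mu + sigma*sqrt(lambda))."""
    lam = random.gammavariate(1.5, 2.0)        # shape 3/2, scale 2 (i.e. rate 1/2)
    half = sigma * lam ** 0.5
    return random.uniform(mu - half, mu + half)

n = 200_000
xs = [rnorm_uniform_mixture(0.0, 1.0) for _ in range(n)]
m = sum(xs) / n
v = sum((x - m) ** 2 for x in xs) / n
k = sum((x - m) ** 4 for x in xs) / n / v ** 2  # kurtosis; equals 3 for a normal
print(m, v, k)
```

The same mechanism drives the EP representation above: the Gamma mixing density smooths the flat uniform kernels into the desired smooth, possibly heavy-tailed density.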
Application: School spending
Application: School spending
Dependent variable (y_i): per capita spending. Regressor variables (x_i): per capita income and its square.
Student-t model:
y_i | θ, σ, ν, λ_i ~ N(x_i'θ, σ²/λ_i)
λ_i | ν ~ Ga(ν/2, ν/2)
Exponential power model:
y_i | θ, σ, β, λ_i ~ U( x_i'θ − σλ_i^{β/2}/√(2c₀), x_i'θ + σλ_i^{β/2}/√(2c₀) )
λ_i | β ~ Ga(1 + β/2, 2^{−1/β})
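The Student-t model above admits a simple data-augmentation Gibbs sampler: conditional on the λ_i it is a weighted Gaussian regression. The sketch below is a minimal illustration on hypothetical simulated data (not the school-spending data), with a flat prior on θ, π(σ) ∝ 1/σ, and ν held fixed for simplicity; the analysis in the talk also samples ν, via a Metropolis-Hastings step. A planted outlier is flagged by a large posterior λ_i^{-1}.

```python
import math, random

random.seed(1)

# hypothetical simulated data with one gross outlier
n = 60
xs = [i / n for i in range(n)]
ys = [1.0 + 2.0 * x + random.gauss(0.0, 0.2) for x in xs]
ys[0] += 5.0                      # plant an outlier at the first observation

nu = 4.0                          # degrees of freedom, held fixed in this sketch
theta, sig2, lam = [0.0, 0.0], 1.0, [1.0] * n
draws = []
for it in range(3000):
    # lambda_i | rest ~ Ga((nu+1)/2, rate = (nu + e_i^2/sigma^2)/2)
    for i in range(n):
        e = ys[i] - theta[0] - theta[1] * xs[i]
        lam[i] = random.gammavariate(0.5 * (nu + 1.0),
                                     1.0 / (0.5 * (nu + e * e / sig2)))
    # theta | rest: bivariate normal around the weighted least-squares solution
    sw   = sum(lam)
    swx  = sum(l * x for l, x in zip(lam, xs))
    swxx = sum(l * x * x for l, x in zip(lam, xs))
    swy  = sum(l * y for l, y in zip(lam, ys))
    swxy = sum(l * x * y for l, x, y in zip(lam, xs, ys))
    det = sw * swxx - swx * swx
    m0 = (swxx * swy - swx * swxy) / det
    m1 = (sw * swxy - swx * swy) / det
    # Cholesky factor of the 2x2 covariance sigma^2 (X' Lambda X)^{-1}
    c00, c01, c11 = sig2 * swxx / det, -sig2 * swx / det, sig2 * sw / det
    l00 = math.sqrt(c00)
    l10 = c01 / l00
    l11 = math.sqrt(c11 - l10 * l10)
    z0, z1 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    theta = [m0 + l00 * z0, m1 + l10 * z0 + l11 * z1]
    # sigma^2 | rest ~ Inverse-Gamma(n/2, sum_i lambda_i e_i^2 / 2)
    ssr = sum(l * (y - theta[0] - theta[1] * x) ** 2
              for l, x, y in zip(lam, xs, ys))
    sig2 = 0.5 * ssr / random.gammavariate(0.5 * n, 1.0)
    if it >= 1000:
        draws.append((theta[0], theta[1], 1.0 / lam[0]))

b0 = sum(d[0] for d in draws) / len(draws)
b1 = sum(d[1] for d in draws) / len(draws)
inv_lam0 = sum(d[2] for d in draws) / len(draws)
print(b0, b1, inv_lam0)   # inv_lam0 is large: observation 0 is the outlier
```

The regression coefficients recover the true line despite the contaminated observation; this automatic downweighting is what the boxplots of λ_i^{-1} on the next slides visualize for the real data.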
Application: School spending (Student-t model, linear fit)
[Figure: linear fit of per capita spending against per capita income (Student-t, LM with Alaska, LM without Alaska; Alaska marked as outlier), and boxplots of λ_i^{-1}, i = 1, ..., 50.]
Application: School spending (Student-t model, linear fit)
Posterior summaries based on the independence Jeffreys prior (acceptance rate of ν: 0.42) and coefficients of the linear model fitted with and without Alaska.

Parameter   Median    95% C.I.            LM with Alaska   LM without Alaska
θ₁          -74.26    (-205.89, 54.57)    -151.27          -26.80
θ₂          578.25    (407.44, 752.98)    689.39           518.31
σ           46.68     (33.44, 62.88)      61.41            49.90
ν           4.36      (1.81, 16.67)       -                -

ν: degrees of freedom
Application: School spending (exponential power model, linear fit)
[Figure: boxplots of λ_i, i = 1, ..., 50.]
Posterior summaries:

Parameter   Median   95% C.I.
θ₁          -103.0   (-258.0, 13.5)
θ₂          618.0    (464.0, 830.0)
σ           51.9     (38.9, 71.4)
β           1.32     (1.03, 2.19)
Application: School spending (quadratic model): Student-t vs EP
Student-t model:

Parameter   Median     95% C.I.
θ₁          891.58     (-80.69, 1591.13)
θ₂          -2051.82   (-3842.39, 612.12)
θ₃          1771.90    (-38.06, 2915.73)
σ           46.90      (32.12, 63.54)
ν           5.25       (1.83, 43.14)

ν: degrees of freedom

Exponential power model:

Parameter   Median     95% C.I.
θ₁          1188.83    (923.71, 1561.89)
θ₂          -2796.18   (-3760.45, -2140.38)
θ₃          2225.43    (1801.55, 2863.88)
σ           49.87      (36.88, 73.33)
β           1.35       (1.03, 3.46)
Student-t regression and exponential power regression
Student-t regression: objective
Fonseca, Ferreira and Migon (2008). Consider the linear regression
y = Xβ + ε
where ε = (ε₁, ..., ε_n)' is the error vector; the ε_i are i.i.d. according to the Student-t distribution (0, σ, ν); X = (x₁, ..., x_n)' is the n × p matrix of explanatory variables, of full rank p. Model parameters: θ = (β, σ, ν) ∈ R^p × (0, ∞)².
Jeffreys priors: Student-t regression
Class of improper prior distributions:
π(θ) ∝ π(ν) σ^{−a}
The independence Jeffreys prior and the Jeffreys-rule prior for θ = (β, σ, ν), denoted π^I(θ) and π^R(θ), are given by
π^I(θ): a = 1,
π^I(ν) ∝ ( ν/(ν+3) )^{1/2} { Ψ'(ν/2) − Ψ'((ν+1)/2) − 2(ν+3)/(ν(ν+1)²) }^{1/2}
π^R(θ): a = p + 1,
π^R(ν) ∝ π^I(ν) ( (ν+1)/(ν+3) )^{p/2}
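The marginal prior π^I(ν) above is straightforward to evaluate numerically. A sketch using a small stdlib trigamma helper (the recurrence-plus-asymptotic-series implementation is my own, not from the paper), showing the heavy right tail of the prior:

```python
import math

def trigamma(x):
    """psi'(x) via the recurrence psi'(x) = psi'(x+1) + 1/x^2
    plus a standard asymptotic series for large x."""
    acc = 0.0
    while x < 8.0:
        acc += 1.0 / (x * x)
        x += 1.0
    y = 1.0 / (x * x)
    return acc + 1.0 / x + 0.5 * y + y / x * (1.0 / 6 - y * (1.0 / 30 - y / 42))

def prior_I_nu(nu):
    """Independence Jeffreys marginal prior for nu, up to a constant."""
    bracket = (trigamma(nu / 2.0) - trigamma((nu + 1.0) / 2.0)
               - 2.0 * (nu + 3.0) / (nu * (nu + 1.0) ** 2))
    return math.sqrt(nu / (nu + 3.0)) * math.sqrt(bracket)

for nu in (1.0, 2.0, 5.0, 20.0, 50.0):
    print(nu, prior_I_nu(nu))
```

The slow polynomial decay in ν (rather than, say, an exponential cutoff) is what lets the data keep large degrees of freedom, i.e. near-Gaussian errors, in play while still shrinking toward heavy tails when outliers are present.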
Jeffreys priors: Student-t regression
Corollary. The marginal independence Jeffreys prior for ν, π^I(ν), is a continuous function on (0, ∞) and is such that π^I(ν) = O(ν^{−1/2}) as ν → 0 and π^I(ν) = O(ν^{−2}) as ν → ∞.
Corollary. Provided that n ≥ p + 1, (i) the independence Jeffreys prior π^I(θ) and the Jeffreys-rule prior π^R(θ) yield proper posterior densities, and (ii) the marginal posteriors π^I(ν | y, X) and π^R(ν | y, X) do not have any positive integer moments.
Exponential power model: objective
Density EP(μ, σ_p, p):
p(y | μ, σ_p, p) = [ 2 p^{1/p} σ_p Γ(1 + 1/p) ]^{−1} exp{ −(p σ_p^p)^{−1} |y − μ|^p }
with −∞ < y < ∞, −∞ < μ < ∞, σ_p > 0 and p > 0.
μ = E(y), the location parameter
σ_p = [E(|y − μ|^p)]^{1/p}, the scale parameter
p, the shape parameter
Reparametrization, similar to Zhu & Zinde-Walsh (2009):
p(y | μ, σ, p) = (1/(2σ)) exp{ −( Γ(1 + 1/p) |y − μ| / σ )^p }
where σ = p^{1/p} σ_p Γ(1 + 1/p)
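The reparametrized density is easy to sanity-check numerically: it should integrate to one for any shape p, and for p = 2 it reduces to a normal density (with variance 2σ²/π under this parametrization). A sketch with stdlib numerical integration:

```python
import math

def ep_pdf(y, mu, sigma, p):
    """Reparametrized EP density: (1/(2*sigma)) * exp(-(G*|y-mu|/sigma)^p),
    with G = Gamma(1 + 1/p)."""
    g = math.gamma(1.0 + 1.0 / p)
    return math.exp(-((g * abs(y - mu) / sigma) ** p)) / (2.0 * sigma)

def trapz(f, lo, hi, n):
    """Composite trapezoid rule on [lo, hi] with n panels."""
    h = (hi - lo) / n
    s = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return s * h

# total mass should be 1 for any shape p (Laplace p=1, intermediate, normal p=2)
mass = {p: trapz(lambda y: ep_pdf(y, 0.0, 1.0, p), -20.0, 20.0, 40_000)
        for p in (1.0, 1.5, 2.0)}
print(mass)
```

Keeping σ (rather than σ_p) as the scale makes the normalizing constant 1/(2σ) free of p, which is what makes the prior and posterior computations in the following slides tractable.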
Characteristics
p = 1: Laplace distribution
p = 2: normal distribution
p → ∞: uniform distribution
0 < p < 2: leptokurtic distributions
p > 2: platykurtic distributions
[Figure: EP densities for p = 1 (Laplace), p = 1.5, p = 2 (normal) and p = 3.5.]
Kurtosis
[Figure: kurtosis function for values of p between 1 and 8. Dashed lines mark the special cases Laplace (p = 1, κ = 6) and normal (p = 2, κ = 3); the horizontal line marks the limiting kurtosis of the uniform distribution (p → ∞, κ = 1.8).]
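The kurtosis curve in the figure follows from the EP moments: κ(p) = E(y − μ)⁴ / [E(y − μ)²]² = Γ(5/p)Γ(1/p)/Γ(3/p)², which is a standard identity for this family. A quick check of the three marked cases:

```python
import math

def ep_kurtosis(p):
    """Kurtosis of the EP distribution: Gamma(5/p)*Gamma(1/p)/Gamma(3/p)^2."""
    return math.gamma(5.0 / p) * math.gamma(1.0 / p) / math.gamma(3.0 / p) ** 2

for p in (1.0, 2.0, 8.0, 50.0):
    print(p, ep_kurtosis(p))   # 6 at p=1, 3 at p=2, -> 1.8 as p grows
```

The function decreases monotonically in p, so the shape parameter interpolates smoothly from heavy-tailed (Laplace) through normal to the bounded-support uniform limit.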
EP regression
The data consist of n observations y = (y₁, ..., y_n)' satisfying
y = xβ + ε
where ε = (ε₁, ..., ε_n)' is the error vector; the ε_i are i.i.d. according to EP(0, σ, p); x is the known n × k matrix of explanatory variables, assumed to have full rank; and β = (β₁, ..., β_k)' ∈ R^k are the unknown regression parameters.
Jeffreys prior: EP model
The Fisher information matrix for Θ = (β, σ, p) is block diagonal:
I(Θ) = diag( [Γ(1/p)Γ(2 − 1/p)/σ²] Σ_{i=1}^n x_i x_i' ,  I_{σ,p} )
with
I_{σ,p} = [ np/σ²        n/(σp)
            n/(σp)   (n/p³)(1 + 1/p)Ψ'(1 + 1/p) ]
Ψ(·) and Ψ'(·) are the digamma and trigamma functions, respectively.
Note: the Jeffreys-rule prior is the reference prior associated with the single group {(β, σ, p)}, and the independence Jeffreys prior is that associated with the grouping {β, (σ, p)}.
Jeffreys prior: EP model
The reference prior for {β, (σ, p)} (independence Jeffreys prior) and for {(β, σ, p)} (Jeffreys-rule prior), denoted π^I(Θ) and π^R(Θ), belong to the class of improper prior distributions
π(Θ) ∝ π(p) σ^{−a}
where a ∈ R and π(p) is the marginal prior of the shape parameter p:
π^I(Θ): a = 1,  π^I(p) ∝ (1/p) [ (1 + 1/p) Ψ'(1 + 1/p) − 1 ]^{1/2}
π^R(Θ): a = k + 1,  π^R(p) ∝ [ Γ(1/p) Γ(2 − 1/p) ]^{k/2} π^I(p)
Another independence prior:
π^{I2}(Θ): a = 1,  π^{I2}(p) ∝ (1 + 1/p)^{1/2} p^{−3/2} [ Ψ'(1 + 1/p) ]^{1/2}
Jeffreys prior: EP model
For p → ∞, L(p; y) = O(1). Moreover:
π^I(p) = O(p^{−1}) as p → ∞
π^R(p) = O(p^{k/2−1}) as p → ∞
Due to this tail behavior, both priors lead to improper posterior distributions.
π^{I2}(p) = O(p^{−3/2}) as p → ∞: this prior leads to a proper posterior distribution.
[Figure: the proposed independence prior, the independence Jeffreys prior and the Jeffreys-rule prior as functions of p.]
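The stated tail orders can be checked numerically: p · π^I(p) and p^{3/2} · π^{I2}(p) should flatten to constants as p → ∞, so π^I has a non-integrable 1/p tail while π^{I2} is integrable. A sketch (the trigamma helper is my own stdlib implementation):

```python
import math

def trigamma(x):
    """psi'(x) via recurrence to x >= 8 plus an asymptotic series."""
    acc = 0.0
    while x < 8.0:
        acc += 1.0 / (x * x)
        x += 1.0
    y = 1.0 / (x * x)
    return acc + 1.0 / x + 0.5 * y + y / x * (1.0 / 6 - y * (1.0 / 30 - y / 42))

def prior_I(p):
    """Independence Jeffreys marginal prior for p, up to a constant."""
    return math.sqrt((1 + 1 / p) * trigamma(1 + 1 / p) - 1.0) / p

def prior_I2(p):
    """Proposed independence prior for p, up to a constant."""
    return math.sqrt((1 + 1 / p) * trigamma(1 + 1 / p)) / p ** 1.5

# tail orders: p * prior_I(p) and p^{3/2} * prior_I2(p) both flatten out
for p in (10.0, 100.0, 1000.0):
    print(p, p * prior_I(p), p ** 1.5 * prior_I2(p))
```

Since the likelihood is O(1) in p, the prior tail alone decides posterior propriety, which is why the extra p^{−1/2} factor in π^{I2} matters.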
Proposed proper prior as a calibration tool
[Figure: (a) contour plot of the likelihood function for (σ, p) for a data set of size n = 50 simulated with β = 0 (fixed), σ = 1 and p = 1.8; (b) contour plot of the joint posterior distribution of (σ, p) based on the proposed prior. The symbol * marks the true values.]
Frequentist properties of estimators
[Figure: frequentist coverage probability of 95% HPD credible intervals (solid line) and 95% confidence intervals (dashed line) for p (panel a) and σ (panel b), based on the proposed prior, for n = 50 (circles) and n = 100 (triangles); the horizontal line marks the 95% nominal level.]
Frequentist properties of estimators
[Figure: square root of the relative mean square error of estimators of p (left panel) and σ (right panel) based on the independence prior π^{I2} (solid line) and maximum likelihood estimation (dashed line), for n = 50 (circles) and n = 100 (triangles).]
Application 1: Excess returns for the Martin Marietta company
60 monthly observations (January 1982 to December 1986). Variables: the excess rate of return for the Martin Marietta company (y) and the index of excess rates of return for the New York Stock Exchange, CRSP (x).
[Figure: (a) scatterplot of the data and fitted EP regression; (b) histogram of the residuals from the fitted EP regression and fitted density (solid line).]
Application 1: Excess returns for the Martin Marietta company
[Figure: marginal posterior densities of (a) p and (b) σ; vertical dashed lines mark the posterior 95% HPD credible intervals.]
Application 1: Excess returns for the Martin Marietta company
[Figure: marginal posterior densities of (a) β₁ and (b) β₂; vertical dashed lines mark the posterior 95% HPD credible intervals.]
Application 1: Excess returns for the Martin Marietta company
Table: posterior summaries (median, mode and 95% HPD credible interval) based on the independence prior π^{I2}.

Parameter   Median   Mode     95% C.I.
β₁          -0.006   -0.006   (-0.027, 0.014)
β₂          1.327    1.295    (0.891, 1.844)
σ           0.064    0.062    (0.047, 0.085)
p           1.092    1.000    (1.000, 1.314)
Thank you migon@im.ufrj.br