Objective Bayesian Analysis for Heteroscedastic Regression


Esther Salazar, Universidade Federal do Rio de Janeiro
Colóquio Inter-institucional: Modelos Estocásticos e Aplicações, 2009
Collaborators: Marco Ferreira and Thais Fonseca

Summary
1. Motivating examples
2. Location-scale models
3. Application: school spending
4. Student-t regression
5. Exponential power regression

Example 1: Per capita income and per capita spending in public schools, by state, in the United States in 1979. [Figure: per capita spending (300–800) against per capita income (6000–11000); linear and quadratic fits under Gaussian errors (dashed) and Student-t errors (solid); Alaska is an outlier.]

Example 1. Model: Student-t linear regression. Large values of λ_i⁻¹ are associated with possible outliers. [Figure: boxplots of λ_i⁻¹, i = 1, …, 50; values range up to about 30.]

Example 2. Darwin's data: differences in heights of 15 pairs of self- and cross-fertilized plants. [Figure: dot plot of Darwin's data and the sampling distributions of the data.]

Example 2. Model: Student-t distribution. Large values of λ_i⁻¹ are associated with possible outliers. [Figure: boxplots of λ_i⁻¹, i = 1, …, 15; values range up to about 40.]

Location-scale

Location-scale models
Models based on the normal distribution are not robust to outliers. The alternative: location-scale models with heavy-tailed distributions, such as the Student t-distribution and the exponential power distribution, among others.

Influence function: ad hoc proposals. [Figure: influence functions S(x) of the normal distribution and two ad hoc robust alternatives, plotted for −10 ≤ x ≤ 10.]

Influence function
For x = (y − µ)/σ, the influence function is s(x) = −∂ log p(x)/∂x. Examples:
Normal(0, 1): s(x) = x
Exponential power(0, 1, p): s(x) = −(−x)^(p−1) 1(x ≤ 0) + x^(p−1) 1(x > 0)
Student-t(0, 1, ν): s(x) = (ν + 1)x / (ν + x²), ν: degrees of freedom
Mixture of normals π N(0, 1) + (1 − π) N(0, σ²):
s(x) = [π x exp(−x²/2) + (1 − π) x σ⁻³ exp(−x²/(2σ²))] / [π exp(−x²/2) + (1 − π) σ⁻¹ exp(−x²/(2σ²))]
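The closed forms above are easy to sanity-check numerically. A minimal sketch (the function name is ours, for illustration) compares s(x) = −d log p(x)/dx computed by central differences against the stated expressions for the normal and Student-t cases:

```python
import numpy as np
from scipy import stats

def influence_numeric(logpdf, x, h=1e-5):
    """s(x) = -d/dx log p(x), approximated by central differences."""
    return -(logpdf(x + h) - logpdf(x - h)) / (2 * h)

x = np.linspace(-5.0, 5.0, 101)

# Normal(0, 1): the influence function is unbounded, s(x) = x
s_norm = influence_numeric(stats.norm.logpdf, x)

# Student-t(0, 1, nu): s(x) = (nu + 1) x / (nu + x^2), which redescends
nu = 4.0
s_t = influence_numeric(lambda z: stats.t.logpdf(z, df=nu), x)
```

Plotting s_norm and s_t against x reproduces the qualitative picture in the figures: the normal influence grows without bound, while the Student-t influence returns to zero for large |x|, which is why extreme observations are downweighted.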

Influence function. [Figure: influence functions S(x) for the normal, exponential power, Student-t, and normal-mixture distributions, plotted for −10 ≤ x ≤ 10.]

Normal scale mixture distributions
Let X be a continuous random variable with location µ and scale σ. The pdf of X has a scale-mixture-of-normals representation if
f_X(x | µ, σ) = ∫₀^∞ N(x | µ, κ(λ)σ²) π(λ) dλ,
where κ(·) is a positive function and π(·) is a density function on ℝ⁺.

Example: the Student t-distribution,
t_ν(x | µ, σ) = ∫₀^∞ N(x | µ, σ²/λ) Ga(λ | ν/2, ν/2) dλ,
that is, X ~ t_ν(µ, σ) has the hierarchical form
X | µ, σ, ν, λ ~ N(µ, σ²/λ) and λ | ν ~ Ga(ν/2, ν/2).
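The hierarchical form gives a direct way to simulate Student-t draws using only normal and gamma generators; a minimal sketch (names are ours, not from the talk):

```python
import numpy as np

def rt_scale_mixture(n, mu, sigma, nu, rng):
    """Student-t(mu, sigma, nu) via the normal scale mixture:
    lambda ~ Ga(nu/2, rate nu/2), then X | lambda ~ N(mu, sigma^2 / lambda)."""
    lam = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)  # numpy uses scale = 1/rate
    return rng.normal(mu, sigma / np.sqrt(lam))

rng = np.random.default_rng(0)
x = rt_scale_mixture(200_000, mu=0.0, sigma=1.0, nu=5.0, rng=rng)
# For nu > 2, Var(X) = nu * sigma^2 / (nu - 2), here 5/3
print(round(x.mean(), 3), round(x.var(), 3))
```

The same two-stage draw is what makes Gibbs sampling in the Student-t regression convenient: conditional on the λ_i, the model is an ordinary weighted normal regression.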

Uniform scale mixture distributions
Let X be a continuous random variable with location µ and scale σ. The pdf of X has a scale-mixture-of-uniforms representation if
f_X(x | µ, σ) = ∫₀^∞ U(x | µ − κ(λ)σ², µ + κ(λ)σ²) π(λ) dλ,
where κ(·) is a positive function and π(·) is a density function on ℝ⁺.

Example: the exponential power (EP) distribution,
f_X(x | µ, σ, β) = (c₁/σ) exp{ −c₀ |(x − µ)/σ|^(2/β) },
where β ∈ (0, 2] controls the kurtosis. The EP distribution has the hierarchical form
X | µ, σ, β, λ ~ U(µ − σλ^(β/2)/(2c₀), µ + σλ^(β/2)/(2c₀)) and λ | β ~ Ga(1 + β/2, 2^(−1/β)).

Application: School spending

Application: school spending
Dependent variable (y_i): per capita spending. Regressor variables (x_i): per capita income and its square.
Student-t model:
y_i | θ, σ, ν, λ_i ~ N(x_i'θ, σ²/λ_i), λ_i | ν ~ Ga(ν/2, ν/2)
Exponential power model:
y_i | θ, σ, β, λ_i ~ U(x_i'θ − σλ_i^(β/2)/(2c₀), x_i'θ + σλ_i^(β/2)/(2c₀)), λ_i | β ~ Ga(1 + β/2, 2^(−1/β))
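Because the λ_i enter the Student-t model as conjugate gamma mixing weights, their full conditional is available in closed form: λ_i | rest ~ Ga((ν + 1)/2, rate (ν + e_i²/σ²)/2), with e_i = y_i − x_i'θ. A sketch of that single Gibbs step (our illustration, not the authors' code):

```python
import numpy as np

def sample_lambda(resid, sigma, nu, rng):
    """One Gibbs update for the Student-t mixing weights:
    lambda_i | rest ~ Ga((nu + 1)/2, rate (nu + e_i^2 / sigma^2)/2)."""
    shape = (nu + 1.0) / 2.0
    rate = (nu + (resid / sigma) ** 2) / 2.0
    return rng.gamma(shape, 1.0 / rate)  # numpy's gamma takes scale = 1/rate

rng = np.random.default_rng(1)
resid = np.array([0.1, -0.2, 8.0])  # third observation is a gross outlier
draws = np.array([sample_lambda(resid, 1.0, 5.0, rng) for _ in range(5000)])
print(draws.mean(axis=0))  # the outlier receives a small lambda_i
```

A small λ_i inflates that observation's variance σ²/λ_i, so a large λ_i⁻¹ flags a possible outlier; this is exactly the diagnostic behind the boxplots of λ_i⁻¹ in the examples.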

Application: school spending (Student-t, linear fit). [Figure: left, linear fits — Student-t vs. the linear model with and without Alaska (per capita spending 300–800 against per capita income 6000–11000); right, boxplots of λ_i⁻¹, i = 1, …, 50.]

Application: school spending (Student-t, linear fit)
Posterior summaries based on the independence Jeffreys prior (acceptance rate for ν: 0.42), together with the coefficients of the linear model fitted with and without Alaska.

Parameter   Median    95% C.I.            LM with Alaska   LM without Alaska
θ₁          -74.26    (-205.89, 54.57)    -151.27          -26.80
θ₂          578.25    (407.44, 752.98)    689.39           518.31
σ           46.68     (33.44, 62.88)      61.41            49.90
ν           4.36      (1.81, 16.67)       -                -

ν: degrees of freedom.

Application: school spending (exponential power, linear fit)
[Figure: boxplots of λ_i, i = 1, …, 50.]
Posterior summaries:

Parameter   Median   95% C.I.
θ₁          -103.0   (-258.0, 13.5)
θ₂          618.0    (464.0, 830.0)
σ           51.9     (38.9, 71.4)
β           1.32     (1.03, 2.19)

Application: school spending (quadratic fit): Student-t vs. EP
Student-t model:

Parameter   Median     95% C.I.
θ₁          891.58     (-80.69, 1591.13)
θ₂          -2051.82   (-3842.39, 612.12)
θ₃          1771.90    (-38.06, 2915.73)
σ           46.90      (32.12, 63.54)
ν           5.25       (1.83, 43.14)

ν: degrees of freedom.

Exponential power model:

Parameter   Median     95% C.I.
θ₁          1188.83    (923.71, 1561.89)
θ₂          -2796.18   (-3760.45, -2140.38)
θ₃          2225.43    (1801.55, 2863.88)
σ           49.87      (36.88, 73.33)
β           1.35       (1.03, 3.46)

Student-t regression Exponential power

Student-t regression: objective Bayesian analysis (Fonseca, Ferreira and Migon, 2008)
Consider the linear regression y = Xβ + ɛ, where
ɛ = (ɛ₁, …, ɛ_n)' is the error vector, with the ɛ_i i.i.d. Student-t(0, σ, ν);
X = (x₁, …, x_n)' is the n × p matrix of explanatory variables, of full rank p.
Model parameters: θ = (β, σ, ν) ∈ ℝ^p × (0, ∞)².

Jeffreys priors: Student-t regression
Class of improper prior distributions: π(θ) ∝ π(ν) σ⁻ᵃ.
The independence Jeffreys prior and the Jeffreys-rule prior for Θ = (β, σ, ν), denoted π^I(Θ) and π^R(Θ), are given by
π^I(Θ): a = 1,  π^I(ν) ∝ (ν/(ν + 3))^(1/2) { Ψ'(ν/2) − Ψ'((ν + 1)/2) − 2(ν + 3)/(ν(ν + 1)²) }^(1/2)
π^R(Θ): a = p + 1,  π^R(ν) ∝ π^I(ν) ((ν + 1)/(ν + 3))^(p/2)
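The tail behavior of the independence Jeffreys prior for ν can be checked numerically; in the sketch below, scipy's polygamma(1, ·) is the trigamma function Ψ', and the two printed quantities should stabilize near constants, reflecting an O(ν^(−1/2)) spike at the origin and an O(ν⁻²) tail:

```python
import numpy as np
from scipy.special import polygamma

def prior_nu_indep_jeffreys(nu):
    """Independence Jeffreys prior for nu, up to a normalizing constant."""
    bracket = (polygamma(1, nu / 2.0) - polygamma(1, (nu + 1.0) / 2.0)
               - 2.0 * (nu + 3.0) / (nu * (nu + 1.0) ** 2))
    return np.sqrt(nu / (nu + 3.0)) * np.sqrt(bracket)

# Near the origin: nu^{1/2} * pi(nu) approaches a constant (2/sqrt(3))
left = np.sqrt(1e-6) * prior_nu_indep_jeffreys(1e-6)
# In the tail: nu^2 * pi(nu) approaches a constant (about sqrt(6))
right = 200.0 ** 2 * prior_nu_indep_jeffreys(200.0)
print(left, right)
```

The O(ν⁻²) tail is the reason the posterior for ν has no positive integer moments, while still being light enough for the posterior to be proper.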

Corollary. The marginal independence Jeffreys prior for ν implied by π^I(Θ) is continuous on (0, ∞), with π^I(ν) = O(ν^(−1/2)) as ν → 0 and π^I(ν) = O(ν⁻²) as ν → ∞.
Corollary. Provided that n ≥ p + 1, (i) the independence Jeffreys prior π^I(θ) and the Jeffreys-rule prior π^R(θ) yield proper posterior densities, and (ii) the marginal posteriors π^I(ν | y, X) and π^R(ν | y, X) do not have any positive integer moments.

Exponential power model: objective Bayesian analysis
Density: EP(µ, σ_p, p),
p(y | µ, σ_p, p) = [2 p^(1/p) σ_p Γ(1 + 1/p)]⁻¹ exp{ −(p σ_p^p)⁻¹ |y − µ|^p },
with −∞ < y < ∞, −∞ < µ < ∞, σ_p > 0 and p > 0;
µ = E(y), the location parameter;
σ_p = [E(|y − µ|^p)]^(1/p), the scale parameter;
p, the shape parameter.

Reparametrization, similar to Zhu & Zinde-Walsh (2009):
p(y | µ, σ, p) = (2σ)⁻¹ exp{ −[Γ(1 + 1/p) |y − µ| / σ]^p },
where σ = p^(1/p) σ_p Γ(1 + 1/p).
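The two parametrizations are easy to cross-check numerically; a sketch verifying that the EP(µ, σ_p, p) density integrates to one and that the reparametrized form agrees with it pointwise:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def ep_pdf_orig(y, mu, sigma_p, p):
    """EP(mu, sigma_p, p) density in the original parametrization."""
    const = 1.0 / (2.0 * p ** (1.0 / p) * sigma_p * gamma(1.0 + 1.0 / p))
    return const * np.exp(-abs(y - mu) ** p / (p * sigma_p ** p))

def ep_pdf_repar(y, mu, sigma, p):
    """Reparametrized density with sigma = p^{1/p} sigma_p Gamma(1 + 1/p)."""
    return np.exp(-(gamma(1.0 + 1.0 / p) * abs(y - mu) / sigma) ** p) / (2.0 * sigma)

mu, sigma_p, p = 0.0, 1.3, 1.8
sigma = p ** (1.0 / p) * sigma_p * gamma(1.0 + 1.0 / p)

total, _ = quad(ep_pdf_orig, -np.inf, np.inf, args=(mu, sigma_p, p))
print(total)  # should be 1
print(ep_pdf_orig(0.7, mu, sigma_p, p) - ep_pdf_repar(0.7, mu, sigma, p))
```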

Characteristics
p = 1: Laplace distribution; p = 2: normal distribution; p → ∞: uniform distribution; 0 < p < 2: leptokurtic distributions; p > 2: platykurtic distributions.
[Figure: EP densities for p = 1 (Laplace), p = 1.5, p = 2 (normal) and p = 3.5.]

Kurtosis
Figure: kurtosis as a function of p for 1 ≤ p ≤ 8. Dashed lines mark the special cases: Laplace distribution (p = 1, κ = 6) and normal distribution (p = 2, κ = 3). The horizontal line marks the kurtosis of the uniform limit (p → ∞, κ = 1.8).
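The kurtosis curve in the figure has a closed form in terms of gamma functions — a standard EP moment identity not spelled out on the slide: κ(p) = Γ(5/p) Γ(1/p) / Γ(3/p)². A quick check that it reproduces the special cases marked in the plot:

```python
from math import gamma

def ep_kurtosis(p):
    """Kurtosis of the exponential power family with shape p;
    the scale parameter cancels in the ratio E z^4 / (E z^2)^2."""
    return gamma(5.0 / p) * gamma(1.0 / p) / gamma(3.0 / p) ** 2

print(ep_kurtosis(1.0))   # Laplace
print(ep_kurtosis(2.0))   # normal
print(ep_kurtosis(50.0))  # close to the uniform limit of 1.8
```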

EP regression
The data consist of n observations y = (y₁, …, y_n)' satisfying y = xβ + ɛ, where
ɛ = (ɛ₁, …, ɛ_n)' is the error vector, with the ɛ_i i.i.d. EP(0, σ, p);
x is the known n × k matrix of explanatory variables, assumed to have full rank;
β = (β₁, …, β_k)' ∈ ℝ^k are the unknown regression parameters.

Jeffreys prior: EP model
The Fisher information matrix for Θ = (β, σ, p) is given by

I(Θ) = | (Γ(1/p) Γ(2 − 1/p) / σ²) Σ_{i=1}^n x_i x_i'    0          0
       |  0                                             np/σ²      −n/(σp)
       |  0                                             −n/(σp)    (n/p³)(1 + 1/p) Ψ'(1 + 1/p)

Ψ(·) and Ψ'(·) are the digamma and trigamma functions, respectively. Note that the Jeffreys-rule prior is the reference prior associated with the single group {(β, σ, p)}, and the independence Jeffreys prior is the one associated with the grouping {β, (σ, p)}.
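The (σ, σ) entry can be checked by Monte Carlo. Under the reparametrized EP density, W = (Γ(1 + 1/p)|y − µ|/σ)^p is Gamma(1/p, 1)-distributed and the score with respect to σ is (pW − 1)/σ, so the variance of the score should be p/σ² per observation (our derivation, sketched for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
p, sigma = 1.8, 1.3

# W = (Gamma(1+1/p) |y - mu| / sigma)^p is Ga(1/p, 1) under the EP model,
# and d/d(sigma) log f(y) = (p W - 1) / sigma at the true parameters.
W = rng.gamma(shape=1.0 / p, scale=1.0, size=500_000)
score_sigma = (p * W - 1.0) / sigma

print(score_sigma.mean())                # scores have mean zero
print(score_sigma.var(), p / sigma**2)   # ~ Fisher information per observation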

Jeffreys prior: EP model
The reference prior for the grouping {β, (σ, p)} (independence Jeffreys prior) and for the single group {(β, σ, p)} (Jeffreys-rule prior), denoted π^I(Θ) and π^R(Θ), belong to the class of improper prior distributions
π(Θ) ∝ π(p) σ⁻ᵃ,
where a ∈ ℝ and π(p) is the marginal prior of the shape parameter p:
π^I(Θ): a = 1,  π^I(p) ∝ (1/p) [(1 + 1/p) Ψ'(1 + 1/p) − 1]^(1/2)
π^R(Θ): a = k + 1,  π^R(p) ∝ [Γ(1/p) Γ(2 − 1/p)]^(k/2) π^I(p)
Another independence prior:
π^(I2)(Θ): a = 1,  π^(I2)(p) ∝ p^(−3/2) (1 + 1/p)^(1/2) [Ψ'(1 + 1/p)]^(1/2)

Jeffreys prior: EP model
For p → ∞, L(p; y) = O(1). Moreover:
π^I(p) = O(p⁻¹) as p → ∞;
π^R(p) = O(p^(k/2 − 1)) as p → ∞.
Due to this tail behavior, both priors lead to improper posterior distributions.
π^(I2)(p) = O(p^(−3/2)) as p → ∞; this prior leads to a proper posterior distribution.
[Figure: the three marginal priors π(p) — proposed independence prior, independence Jeffreys, and Jeffreys-rule — plotted for 2 ≤ p ≤ 14.]

Proposed proper prior as a calibration tool
Figure: (a) contour plot of the likelihood function for (σ, p) for a simulated data set of size n = 50 with parameters β = 0 (fixed), σ = 1 and p = 1.8; (b) contour plot of the joint posterior distribution of (σ, p) based on the proposed prior. The symbol * indicates the position of the true values.

Frequentist properties of the estimators
Figure: frequentist coverage probability of 95% HPD credible intervals (solid line) and 95% confidence intervals (dashed line) for p (panel a) and σ (panel b), based on the proposed prior, for n = 50 (circles) and n = 100 (triangles). The horizontal line indicates the 95% nominal level.

Frequentist properties of the estimators
Figure: square root of the relative mean squared error of the estimators of p (left panel) and σ (right panel), based on the independence Jeffreys prior π^(I2) (solid line) and on maximum likelihood estimation (dashed line), for n = 50 (circles) and n = 100 (triangles).

Application 1: Excess returns for the Martin Marietta company
60 monthly observations (January 1982 to December 1986). Variables: the excess rate of return for the Martin Marietta company (y) and the index of excess rates of return for the New York Stock Exchange, CRSP (x).
Figure: (a) scatterplot of the data and fitted EP regression; (b) histogram of the residuals from the fitted EP regression with the fitted density (solid line).

Application 1: Excess returns for the Martin Marietta company
Figure: marginal posterior densities for (a) p and (b) σ. Vertical dashed lines are the posterior 95% HPD credible intervals.

Application 1: Excess returns for the Martin Marietta company
Figure: marginal posterior densities for (a) β₁ and (b) β₂. Vertical dashed lines are the posterior 95% HPD credible intervals.

Application 1: Excess returns for the Martin Marietta company
Table: posterior summaries — median, mode and 95% HPD credible interval — based on the independence Jeffreys prior π^(I2).

Parameter   Median   Mode     95% C.I.
β₁          -0.006   -0.006   (-0.027, 0.014)
β₂          1.327    1.295    (0.891, 1.844)
σ           0.064    0.062    (0.047, 0.085)
p           1.092    1.000    (1.000, 1.314)

Thank you migon@im.ufrj.br