MODEL SELECTION CRITERIA IN R


1. R^2 statistics

We may use
\[
R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}
\qquad \text{or} \qquad
R^2_{Adj} = 1 - \frac{SS_{Res}/(n-p)}{SS_T/(n-1)} = 1 - \left(\frac{n-1}{n-p}\right)(1 - R^2),
\]
where p is the total number of parameters. R^2 does not take into account model complexity (that is, the number of parameters fitted), whereas R^2_{Adj} does.

2. Mean Square Residual

We consider
\[
MS_{Res} = \frac{SS_{Res}}{n-p}
\]
and note that
\[
R^2_{Adj} = 1 - \left(\frac{n-1}{n-p}\right)\left(\frac{SS_{Res}}{SS_T}\right) = 1 - \frac{MS_{Res}}{SS_T/(n-1)},
\]
so that maximizing R^2_{Adj} corresponds exactly to minimizing MS_{Res}.

3. Mallows's C_p statistic

Let \mu_i = E_{Y_i|X_i}[Y_i \mid x_i] and \tilde{\mu}_i = E_{Y|X}[\hat{Y}_i \mid x_i] be the modelled and fitted expected values of response Y_i at predictor values x_i, respectively. The expected (or mean) squared error (MSE) of the fit for datum i is E_{Y|X}[(\hat{Y}_i - \mu_i)^2 \mid x_i], which can be decomposed as
\[
E_{Y|X}[(\hat{Y}_i - \mu_i)^2 \mid x_i]
= E_{Y|X}[(\hat{Y}_i - \tilde{\mu}_i)^2 \mid x_i] + (\tilde{\mu}_i - \mu_i)^2
= Var_{Y|X}[\hat{Y}_i \mid x_i] + (\tilde{\mu}_i - \mu_i)^2,
\]
that is, variance for datum i + (bias for datum i)^2.

Let
\[
SS_B = \sum_{i=1}^{n} (\tilde{\mu}_i - \mu_i)^2 = (\mu - \tilde{\mu})^\top (\mu - \tilde{\mu}) = \mu^\top (I_n - H)\mu,
\]
say, denote the total squared bias, aggregated across all data points, and let
\[
FMSE = \frac{1}{\sigma^2} \sum_{i=1}^{n} \left[ Var_{Y|X}[\hat{Y}_i \mid x_i] + (\tilde{\mu}_i - \mu_i)^2 \right]
= \frac{1}{\sigma^2} \sum_{i=1}^{n} Var_{Y|X}[\hat{Y}_i \mid x_i] + \frac{SS_B}{\sigma^2}.
\]
Recall that if H is the hat matrix, H = X(X^\top X)^{-1} X^\top, then
\[
Var_{Y|X}[\hat{Y} \mid x] = Var_{Y|X}[HY \mid x] = \sigma^2 H H^\top = \sigma^2 H,
\]
and so
\[
\sum_{i=1}^{n} Var_{Y|X}[\hat{Y}_i \mid x_i] = \text{Trace}(\sigma^2 H) = \sigma^2 \text{Trace}(H) = p\sigma^2.
\]
Also, by previous results for quadratic forms,
\[
E_{Y|X}[SS_{Res} \mid X] = E_{Y|X}[Y^\top (I_n - H) Y \mid X]
= \mu^\top (I_n - H)\mu + \text{Trace}(\sigma^2 (I_n - H))
= SS_B + (n-p)\sigma^2.
\]
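As a quick numerical illustration (simulated toy data, not from the notes above), the identity linking R^2_{Adj} and MS_{Res} can be checked directly in R:

## Hypothetical example: verify R^2_Adj = 1 - MS_Res/(SS_T/(n-1))
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2*x + rnorm(n)                      # assumed toy model
fit <- lm(y ~ x)
SST <- sum((y - mean(y))^2)                  # total sum of squares
SSRes <- sum(residuals(fit)^2)               # residual sum of squares
p <- length(coef(fit))                       # total number of parameters in the mean
MSRes <- SSRes/(n - p)
c(byhand = 1 - MSRes/(SST/(n - 1)), R = summary(fit)$adj.r.squared)

The two values agree, confirming that ranking models by R^2_{Adj} is the same as ranking them by MS_{Res}.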

Therefore we may rewrite
\[
FMSE = \frac{1}{\sigma^2}\left[ p\sigma^2 + E_{Y|X}[SS_{Res} \mid X] - (n-p)\sigma^2 \right]
= \frac{E_{Y|X}[SS_{Res} \mid X]}{\sigma^2} - n + 2p.
\]
An estimator of this quantity is
\[
C_p = \frac{SS_{Res}}{\hat{\sigma}^2} - n + 2p,
\]
where \hat{\sigma}^2 is some estimator of \sigma^2 derived, say, from the largest model that is being considered. C_p is Mallows's statistic. We choose the model that minimizes C_p. For an unbiased (correctly specified) model we have that E_{Y|X}[C_p \mid X] = p.

4. Akaike's Information Criterion (AIC)

We define, for a probability model with parameters \theta,
\[
AIC = -2 l(\hat{\theta}) + 2 \dim(\theta),
\]
where l(\theta) is the log-likelihood function, \hat{\theta} is the maximum likelihood estimate of the parameter \theta, and \dim(\theta) is the dimension of \theta. For linear regression models under a normality assumption, we have \theta = (\beta, \sigma^2) with
\[
l(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i \beta)^2.
\]
Plugging in \hat{\beta} and \hat{\sigma}^2_{ML} = SS_{Res}/n, we obtain
\[
l(\hat{\beta}, \hat{\sigma}^2_{ML}) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\left(\frac{SS_{Res}}{n}\right) - \frac{n}{2},
\]
so, writing c(n) = n\log(2\pi) + n for the constant function of n, we have
\[
AIC = c(n) + n\log\left(\frac{SS_{Res}}{n}\right) + 2(p+1).
\]
This is Akaike's Information Criterion; we choose the model with the lowest value of AIC. The constant c(n) need not be included in the calculation, as it is constant across all models considered.

5. Bayesian Information Criterion (BIC)

The Bayesian Information Criterion (BIC) is a modification of AIC in which the penalty 2(p+1) is replaced by (p+1)\log(n). We define (again omitting the constant c(n))
\[
BIC = n\log\left(\frac{SS_{Res}}{n}\right) + (p+1)\log(n),
\]
and again choose the model with the smallest BIC.
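As a further check (again on simulated toy data, not part of the original derivation), R's AIC() and BIC() for a linear model can be reproduced from the closed forms above once the constant c(n) is included:

## Hypothetical example: recover AIC(fit) and BIC(fit) from the formulas
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2*x + rnorm(n)
fit <- lm(y ~ x)
SSRes <- sum(residuals(fit)^2)
p <- length(coef(fit))                       # the +1 below accounts for sigma^2
cn <- n*log(2*pi) + n                        # the constant c(n)
c(byhand = cn + n*log(SSRes/n) + 2*(p + 1), R = AIC(fit))
c(byhand = cn + n*log(SSRes/n) + log(n)*(p + 1), R = BIC(fit))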

SIMULATION STUDY

We have the model, for three continuous predictors X_1, X_2, X_3,
\[
Y_i = 2 + 2x_{i1} + 2x_{i2} - 2x_{i1}x_{i2} + \epsilon_i
\]
with \sigma^2 = 1; the coefficient of X_3 is zero. We have n = 200. Here is the simulation code:

set.seed(798)
n<-200; p<-3
Sig<-rWishart(1,p+2,diag(1,p)/(p+2))[,,1]    # random covariance matrix for the predictors
library(MASS)
x<-mvrnorm(n,mu=rep(0,p),Sigma=Sig)
be<-c(2,2,2,0,-2)                            # true coefficients; X3 has coefficient 0
xm<-cbind(rep(1,n),x,x[,1]*x[,2])            # columns: intercept, x1, x2, x3, x1*x2
Y<-drop(xm %*% be) + rnorm(n)                # drop() gives a vector response
x1<-x[,1]
x2<-x[,2]
x3<-x[,3]
fit0<-lm(Y~1)
fit1<-lm(Y~x1)
fit2<-lm(Y~x2)
fit3<-lm(Y~x3)
fit12<-lm(Y~x1+x2)
fit13<-lm(Y~x1+x3)
fit23<-lm(Y~x2+x3)
fit123<-lm(Y~x1+x2+x3)
fit12i<-lm(Y~x1*x2)
fit13i<-lm(Y~x1*x3)
fit23i<-lm(Y~x2*x3)
fit123i<-lm(Y~x1*x2*x3)

criteria.eval<-function(fit.obj,nv,bigsig.hat){
    cvec<-rep(0,5)
    SSRes<-sum(residuals(fit.obj)^2)
    p<-length(coef(fit.obj))
    cvec[1]<-summary(fit.obj)$r.squared
    cvec[2]<-summary(fit.obj)$adj.r.squared
    cvec[3]<-SSRes/bigsig.hat^2-nv+2*p       # Mallows's Cp
    #AIC in R computes
    # n*log(sum(residuals(fit.obj)^2)/n)+2*(length(coef(fit.obj))+1)+n*log(2*pi)+n
    cvec[4]<-AIC(fit.obj)
    #BIC in R computes
    # n*log(sum(residuals(fit.obj)^2)/n)+log(n)*(length(coef(fit.obj))+1)+n*log(2*pi)+n
    cvec[5]<-BIC(fit.obj)
    return(cvec)
}

bigs.hat<-summary(fit123i)$sigma             # sigma.hat from the largest model
cvals<-matrix(0,nrow=12,ncol=5)
cvals[1,]<-criteria.eval(fit0,n,bigs.hat)
cvals[2,]<-criteria.eval(fit1,n,bigs.hat)
cvals[3,]<-criteria.eval(fit2,n,bigs.hat)
cvals[4,]<-criteria.eval(fit3,n,bigs.hat)
cvals[5,]<-criteria.eval(fit12,n,bigs.hat)
cvals[6,]<-criteria.eval(fit13,n,bigs.hat)
cvals[7,]<-criteria.eval(fit23,n,bigs.hat)

cvals[8,]<-criteria.eval(fit123,n,bigs.hat)
cvals[9,]<-criteria.eval(fit12i,n,bigs.hat)
cvals[10,]<-criteria.eval(fit13i,n,bigs.hat)
cvals[11,]<-criteria.eval(fit23i,n,bigs.hat)
cvals[12,]<-criteria.eval(fit123i,n,bigs.hat)

Criteria<-data.frame(cvals)
names(Criteria)<-c('Rsq','Adj.Rsq','Cp','AIC','BIC')
rownames(Criteria)<-c('1','x1','x2','x3','x1+x2','x1+x3','x2+x3','x1+x2+x3',
                      'x1*x2','x1*x3','x2*x3','x1*x2*x3')
round(Criteria,4)

            Rsq Adj.Rsq       Cp      AIC      BIC
1        0.0000  0.0000 799.1174 875.3679 881.9646
x1       0.2505  0.2467 551.3719 819.7068 829.6018
x2       0.5189  0.5164 283.7367 731.0417 740.9366
x3       0.1196  0.1151 681.8659 851.8930 861.7880
x1+x2    0.7055  0.7026  99.6020 634.8392 648.0325
x1+x3    0.3890  0.3828 415.2121 780.8275 794.0208
x2+x3    0.5239  0.5190 280.7558 730.9543 744.1476
x1+x2+x3 0.7058  0.7013 101.3825 636.6897 653.1813
x1*x2    0.8032  0.8001   4.2736 556.2961 572.7877
x1*x3    0.4074  0.3983 398.9377 776.7363 793.2279
x2*x3    0.5240  0.5167 282.6702 732.9183 749.4098
x1*x2*x3 0.8074  0.8004   8.0000 559.8933 589.5782

C_p, AIC, and BIC all reveal the model x1*x2 = x1 + x2 + x1:x2 as the most appropriate model (R^2 and, marginally, Adjusted R^2 favour the larger model x1*x2*x3).

summary(fit12i)

Call:
lm(formula = Y ~ x1 * x2)

Residuals:
     Min       1Q   Median       3Q      Max
-2.43675 -0.68819 -0.01849  0.68452  2.18404

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.02079    0.06895  29.310   <2e-16 ***
x1           1.91766    0.12823  14.954   <2e-16 ***
x2           2.05010    0.10398  19.717   <2e-16 ***
x1:x2       -1.91633    0.19438  -9.859   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9578 on 196 degrees of freedom
Multiple R-squared:  0.8032, Adjusted R-squared:  0.8001
F-statistic: 266.6 on 3 and 196 DF,  p-value: < 2.2e-16

The parameter estimates are therefore
\[
\hat{\beta}_0 = 2.0208, \quad \hat{\beta}_1 = 1.9177, \quad \hat{\beta}_2 = 2.0501, \quad \hat{\beta}_{12} = -1.9163,
\]
which are close to the data generating values.
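As an aside not in the original notes, the AIC comparison across nested models can be automated with R's step() function; a minimal sketch, assuming the objects fit0 and fit123i defined above:

## Forward selection by AIC, from the intercept-only model up to x1*x2*x3
step(fit0, scope = formula(fit123i), direction = "forward")

On this simulated data set the forward search should likewise stop at a model containing x1, x2, and x1:x2, since AIC rises when the x3 terms are added.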

For an ANOVA test equivalent to the t-test on the interaction in the summary output (note that F = t^2 here):

anova(fit12,fit12i)

Analysis of Variance Table

Model 1: Y ~ x1 + x2
Model 2: Y ~ x1 * x2
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    197 268.98
2    196 179.81  1    89.166 97.193 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

par(mfrow=c(2,2),mar=c(4,2,1,2))
plot(x1,residuals(fit12i),pch=19,cex=0.75)
plot(x2,residuals(fit12i),pch=19,cex=0.75)
plot(x1*x2,residuals(fit12i),pch=19,cex=0.75)

[Figure: residuals of fit12i plotted against x1, x2, and x1*x2.]

Finally, for an incorrect model we obtain misleading results: here x3 and the x1:x3 interaction appear significant even though neither enters the data generating model, because the omitted x2 terms are correlated with the included predictors.

summary(fit13i)

Call:
lm(formula = Y ~ x1 * x3)

Residuals:
    Min      1Q  Median      3Q     Max
-5.3750 -1.0790  0.0121  0.9794  4.5081

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.0229     0.1186  17.057  < 2e-16 ***
x1            2.0842     0.2193   9.503  < 2e-16 ***
x3            0.9138     0.1337   6.834 1.02e-10 ***
x1:x3        -0.5377     0.2184  -2.462   0.0147 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.662 on 196 degrees of freedom
Multiple R-squared:  0.4074, Adjusted R-squared:  0.3983
F-statistic: 44.91 on 3 and 196 DF,  p-value: < 2.2e-16

par(mfrow=c(2,2),mar=c(4,2,1,2))
plot(x1,residuals(fit13i),pch=19,cex=0.75)
plot(x3,residuals(fit13i),pch=19,cex=0.75)
plot(x1*x3,residuals(fit13i),pch=19,cex=0.75)

[Figure: residuals of fit13i plotted against x1, x3, and x1*x3.]