Regression Review and Robust Regression. Slides prepared by Elizabeth Newton (MIT)

S-Plus Oil City Data Frame

Monthly Excess Returns of Oil City Petroleum, Inc. Stocks and the Market

SUMMARY: The oilcity data frame has 129 rows and 2 columns. The sample runs from April 1979 to December 1989. This data frame contains the following columns:

VALUE:
Oil: monthly excess returns of Oil City Petroleum, Inc. stocks.
Market: monthly excess returns of the market.

This output was created using S-PLUS(R) Software. S-PLUS(R) is a registered trademark of Insightful Corporation.

Oil City Data (continued)

Returns = relative change in the stock price over a one-month interval.

Excess returns are computed relative to the monthly return of a 90-day US Treasury bill at the risk-free rate.

Financial economists use least squares to fit a straight line predicting a particular stock return from the market return.

Beta = estimated coefficient of the market return. It measures the riskiness of the stock in terms of standard deviation and expected returns.

Large beta -> the stock is risky compared to the market, but the expected returns from the stock are also large.
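As a rough illustration of the straight-line fit described above, the following Python sketch estimates beta as the least squares slope of stock excess returns on market excess returns. The function name `fit_line` and the return values are made up for illustration; they are not the oilcity data.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = alpha + beta * x (one predictor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered sums of squares and cross-products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical monthly excess returns (illustrative values only)
market = [-0.10, -0.05, 0.00, 0.05, 0.10]
stock = [-0.16, -0.07, 0.01, 0.08, 0.14]

alpha, beta = fit_line(market, stock)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

A slope above 1 in this toy example corresponds to a stock that moves more than the market does.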

[Figure: Plot of Market returns vs. month (x-axis: Month; y-axis: oilcity$market)]

[Figure: Plot of Oil City Petroleum return vs. month (x-axis: month; y-axis: Oil)]

[Figure: Histogram of Market Returns (x-axis: Market)]

[Figure: Histogram of Oil City Returns (x-axis: Oil)]

[Figure: Plot of Oil City vs. Market Returns (x-axis: Market; y-axis: Oil City; points labeled by observation number)]

[Figure: Plot of Oil City vs. Market Returns without observation 94 (x-axis: Market; y-axis: Oil City; points labeled by observation number)]

> summary(oilcity)
        Oil                    Market
 Min.   :-0.55667260    Min.   :-0.27857020
 1st Qu.:-0.23968330    1st Qu.:-0.0557534
 Median :-0.0049000     Median :-0.07277544
 Mean   :-0.072225      Mean   :-0.07689209
 3rd Qu.:-0.0582000     3rd Qu.:-0.03973828
 Max.   : 5.9292000     Max.   : 0.073940

Summary of oil.lm

Call: lm(formula = Oil ~ Market, data = oilcity)

Residuals:
     Min       1Q    Median       3Q    Max
 -0.6952  -0.1732  -0.05444  0.08407  4.842

Coefficients:
              Value  Std. Error  t value  Pr(>|t|)
(Intercept)  0.1474      0.0707   2.0849     0.039
Market       2.8567      0.7318   3.9040    0.0002

Residual standard error: 0.4867 on 127 degrees of freedom
Multiple R-Squared: 0.1071
F-statistic: 15.24 on 1 and 127 degrees of freedom, the p-value is 0.0001528

Correlation of Coefficients:
        (Intercept)
Market       0.7956

[Figure: Plot of residual vs. fit for oil.lm (x-axis: Fitted : Market; y-axis: Residuals; labeled points: 79, 94, 65)]

[Figure: Plot of Cook's Distance vs. Index (y-axis: Cook's Distance; labeled points: 94, 43, 65)]

[Figure: Plot of hat matrix diagonals for oil.lm (x-axis: month; y-axis: hat(model.matrix(oil.lm)); points labeled by observation number)]

Summary of model without observation 94

Call: lm(formula = Oil ~ Market, data = oilcity94)

Residuals:
    Min     1Q   Median       3Q    Max
 -0.569  -0.74  -0.0959  0.06864  0.859

Coefficients:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)  -0.0247      0.0304  -0.8139    0.4173
Market        1.1355      0.3137   3.6202    0.0004

Residual standard error: 0.2033 on 126 degrees of freedom
Multiple R-Squared: 0.09422
F-statistic: 13.11 on 1 and 126 degrees of freedom, the p-value is 0.0004249

Correlation of Coefficients:
        (Intercept)
Market        0.806

[Figure: Plot of residual vs. fit for model without observation 94 (x-axis: Fitted : Market; y-axis: Residuals)]

Weighted Least Squares

Used when the observations $y_i$ have unequal variances.

$y = X\beta + \varepsilon, \quad E(\varepsilon) = 0, \quad \mathrm{Var}(\varepsilon) = \sigma^2 V$

V is non-singular and positive definite.
V is diagonal if the errors are uncorrelated.
V is always symmetric.
There exists an $n \times n$ non-singular symmetric matrix $R$ such that $R'R = RR = V$.
R is sometimes called the square root of V.

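Weighted least squares (continued)

Define new variables:

$\tilde{y} = R^{-1}y, \quad \tilde{X} = R^{-1}X, \quad \tilde{\varepsilon} = R^{-1}\varepsilon$

Then $y = X\beta + \varepsilon$ becomes $R^{-1}y = R^{-1}X\beta + R^{-1}\varepsilon$, or

$\tilde{y} = \tilde{X}\beta + \tilde{\varepsilon}$

$E(\tilde{\varepsilon}) = R^{-1}E(\varepsilon) = 0$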

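Weighted least squares (continued)

With $\tilde{\varepsilon} = R^{-1}\varepsilon$, where $R$ is symmetric and $RR = V$:

$\mathrm{Var}(\tilde{\varepsilon}) = E\{[\tilde{\varepsilon} - E(\tilde{\varepsilon})][\tilde{\varepsilon} - E(\tilde{\varepsilon})]'\} = E(\tilde{\varepsilon}\tilde{\varepsilon}') = E(R^{-1}\varepsilon\varepsilon'R^{-1}) = R^{-1}E(\varepsilon\varepsilon')R^{-1} = \sigma^2 R^{-1}VR^{-1} = \sigma^2 R^{-1}RRR^{-1} = \sigma^2 I$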

Weighted Least Squares (continued)

$Q(\beta) = \varepsilon'V^{-1}\varepsilon = (y - X\beta)'W(y - X\beta)$, where $W = V^{-1}$ is the matrix of weights.

Least squares normal equations are: $(X'WX)\hat{\beta} = X'Wy$

The solution is: $\hat{\beta} = (X'WX)^{-1}X'Wy$

$\mathrm{Var}(\hat{\beta}) = (X'WX)^{-1}X'W\,\mathrm{var}(y)\,WX(X'WX)^{-1} = \sigma^2 (X'WX)^{-1}X'WW^{-1}WX(X'WX)^{-1} = \sigma^2 (X'WX)^{-1}$
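The normal equations above can be sketched in Python for the special case of one predictor plus an intercept and a diagonal W. In that case the solution reduces to weighted means and weighted sums of squares. The function name `wls_line`, the data, and the assumed variances are hypothetical.

```python
def wls_line(x, y, w):
    """Weighted least squares for y = b0 + b1 * x with W = diag(w).

    Solves (X'WX) b = X'Wy for the two-column design matrix X = [1, x].
    """
    sw = sum(w)
    # Weighted means of x and y
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted centered sums of squares and cross-products
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Illustrative data: the last observation is assumed to have 16 times the variance
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 12.0]
variances = [1.0, 1.0, 1.0, 16.0]
weights = [1.0 / v for v in variances]  # W = V^{-1} for diagonal V

print("equal weights:", wls_line(x, y, [1.0] * len(x)))
print("W = V^{-1}:   ", wls_line(x, y, weights))
```

Setting all weights to 1 recovers ordinary least squares, which gives a quick check on the function.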

Robust Regression

Used to reduce the influence of outliers.

LAR regression: minimize $L_1 = \sum_{i=1}^{n} |y_i - x_i'\beta| = \sum_{i=1}^{n} |e_i|$

LMS regression: minimize $\mathrm{median}\{[y_i - x_i'\beta]^2\} = \mathrm{median}\{e_i^2\}$

M estimators: minimize $\sum_{i=1}^{n} g(y_i - x_i'\beta) = \sum_{i=1}^{n} g(e_i)$, with $g$ a function of the residuals.
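To make the three criteria concrete, the Python sketch below evaluates each of them on the residuals of a candidate line. The slide leaves g unspecified; the Huber function used here (with tuning constant k) is one common choice and is an assumption, as are the function names and the data.

```python
from statistics import median


def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - (b0 + b1 * x_i) for a candidate line."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def lar_criterion(e):
    """Sum of absolute residuals (LAR / L1)."""
    return sum(abs(ei) for ei in e)


def lms_criterion(e):
    """Median of squared residuals (LMS)."""
    return median([ei ** 2 for ei in e])


def huber_g(e, k=1.345):
    """Assumed choice of g: quadratic near zero, linear in the tails."""
    return 0.5 * e ** 2 if abs(e) <= k else k * abs(e) - 0.5 * k ** 2


def m_criterion(e, g=huber_g):
    """Sum of g(e_i) for a chosen function g."""
    return sum(g(ei) for ei in e)


# Illustrative data: points near y = 1 + 2x, with one gross outlier at x = 3
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.0, 30.0, 9.1]

e = residuals(x, y, 1.0, 2.0)
print("sum of squares:", sum(ei ** 2 for ei in e))
print("LAR:", lar_criterion(e))
print("LMS:", lms_criterion(e))
print("M (Huber g):", m_criterion(e))
```

The outlier dominates the sum of squares, contributes only linearly to the LAR and Huber criteria, and does not affect the LMS criterion at all.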

Robust Regression (continued)

IRLS, iteratively reweighted least squares

Minimize $e'We$.

W is a diagonal matrix of weights, inversely proportional to the magnitude of the scaled residuals $u_i$:

$u_i = e_i / s, \quad s = \mathrm{MAD} = \mathrm{median}\{|e_i - \mathrm{median}(e_i)|\}$

Procedure:
1. Obtain initial coefficient estimates from OLS.
2. Obtain weights from scaled residuals.
3. Obtain coefficient estimates from WLS.
4. Return to 2.

Convergence is usually rapid.
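The four-step procedure above can be sketched in Python for a single predictor. This is a minimal illustration, not the S-PLUS implementation: the Huber weight function with c = 1.345 is an assumed choice, the scale s follows the formula on the slide without a consistency factor (a factor such as 1.4826 is often applied), and all function names and data are hypothetical.

```python
from statistics import median


def wls_line(x, y, w):
    """Weighted least squares for y = b0 + b1 * x with diagonal weights w."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def mad_scale(e):
    """s = median|e_i - median(e)|, as written on the slide."""
    m = median(e)
    return median([abs(ei - m) for ei in e])


def huber_weight(u, c=1.345):
    """Assumed weight function: 1 for small |u|, c/|u| beyond the cutoff."""
    return 1.0 if abs(u) <= c else c / abs(u)


def irls_line(x, y, max_iter=50, tol=1e-8):
    w = [1.0] * len(x)
    b0, b1 = wls_line(x, y, w)               # step 1: OLS start
    for _ in range(max_iter):
        e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        s = mad_scale(e)
        if s == 0.0:                          # exact fit: nothing to rescale
            break
        w = [huber_weight(ei / s) for ei in e]    # step 2: weights
        new_b0, new_b1 = wls_line(x, y, w)        # step 3: WLS
        change = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if change < tol:                      # step 4: repeat until stable
            break
    return b0, b1, w


# Illustrative data near y = 1 + 2x, with a gross outlier at index 5
x = [float(i) for i in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 20.0, 0.1, -0.1, 0.05, -0.05]
y = [1.0 + 2.0 * xi + ni for xi, ni in zip(x, noise)]

ols_b0, ols_b1 = wls_line(x, y, [1.0] * len(x))
rob_b0, rob_b1, rob_w = irls_line(x, y)
print("OLS: ", ols_b0, ols_b1)
print("IRLS:", rob_b0, rob_b1)
print("weight of outlier:", rob_w[5])
```

The returned weights play the same role as the weights plotted for oil.rreg: observations with large scaled residuals end up with weights close to zero.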

(See Figure 10.4, and Equations 10.44 and 10.45 in Neter et al., Applied Linear Statistical Models.)

[Figure: Plot of residuals in oil.rreg (y-axis: oil.rreg$resid)]

[Figure: Plot of weights in robust regression for oil city data set (x-axis: Month; y-axis: Weights; points labeled by observation number)]

[Figure: Plot of sqrt(weights)*resid/s in oil.rreg]

Coefficient table for oil.rreg

> x <- cbind(1, market)
> beta <- solve(t(x) %*% diag(w) %*% x) %*% t(x) %*% diag(w) %*% oil
> r <- oil - x %*% beta
> s <- median(abs(r - median(r))) * 1.4826
> covm <- solve(t(x) %*% diag(w) %*% x) * s^2
> se <- sqrt(diag(covm))
> tvalue <- beta/se
> prob <- 2 * (1 - pt(abs(tvalue), 127))
> cbind(beta, se, tvalue, prob)
                    beta          se     tvalue          prob
(Intercept)  -0.06779903  0.02451469  -2.765649  0.0065285939
x             0.8989551   0.24902845   3.609849  0.0004394276

Covariance matrix is approximate.

[Figure: Plots of fitted regression lines for oil city data (x-axis: Market; y-axis: Oil; fitted lines: oil.lm, oil.lm94, oil.rreg; points labeled by observation number)]

Least Trimmed Squares Regression

Minimizes $\sum_{i=1}^{q} e_{(i)}^2$, where the $e_{(i)}^2$ are the squared residuals ordered from smallest to largest and q is chosen to be between n/2 and n.

Based on a genetic algorithm for finding a subset of data with minimum SSE.

High breakdown point: fits the bulk of the data well, even if the bulk is only a little more than half the data.

Resulting weights are 1 or 0.
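The idea of searching for the subset of q observations with minimum SSE can be sketched in Python. Note the substitution: instead of the genetic algorithm mentioned above, this sketch uses an exhaustive search over all subsets of size q, which is only feasible for very small n. The function names and data are hypothetical.

```python
from itertools import combinations


def ols_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def lts_line(x, y, q):
    """Exhaustive least trimmed squares: best OLS fit over all q-subsets."""
    n = len(x)
    best = None
    for idx in combinations(range(n), q):
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        b0, b1 = ols_line(xs, ys)
        sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, b0, b1, set(idx))
    sse, b0, b1, kept = best
    # Weight 1 for observations in the chosen subset, 0 for the trimmed ones
    weights = [1 if i in kept else 0 for i in range(n)]
    return b0, b1, weights


# Illustrative data on y = 1 + 2x, with outliers at indices 2 and 6
x = [float(i) for i in range(8)]
y = [1.0 + 2.0 * xi for xi in x]
y[2] = 30.0
y[6] = -20.0

b0, b1, weights = lts_line(x, y, q=6)
print("LTS fit:", b0, b1)
print("weights:", weights)
```

With 8 observations and q = 6 there are only 28 subsets to try; for a data set the size of oilcity the number of subsets is far too large, which is why a heuristic search is used in practice.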

> summary(oil.lts)

Method: [1] "Least Trimmed Squares Robust Regression."

Call: ltsreg(formula = Oil ~ Market)

Coefficients:
 Intercept   Market
   -0.0864   0.7907

Scale estimate of residuals: 0.468
Robust Multiple R-Squared: 0.09863
Total number of observations: 129
Number of observations that determine the LTS estimate: 116

Residuals:
   Min.  1st Qu.  Median  3rd Qu.   Max.
 -0.454   -0.088   0.032    0.097  5.223

Weights:
  0    1
 10  119