Regression Review and Robust Regression Slides prepared by Elizabeth Newton (MIT)
S-Plus Oil City Data Frame
Monthly Excess Returns of Oil City Petroleum, Inc. Stocks and the Market
SUMMARY: The oilcity data frame has 129 rows and 2 columns. The sample runs from April 1979 to December 1989.
This data frame contains the following columns:
Oil: monthly excess returns of Oil City Petroleum, Inc. stocks.
Market: monthly excess returns of the market.
This output was created using S-PLUS(R) Software. S-PLUS(R) is a registered trademark of Insightful Corporation.
Oil City Data (continued)
Returns = relative change in the stock price over a one-month interval.
Excess returns are computed relative to the monthly return of a 90-day US Treasury bill at the risk-free rate.
Financial economists use least squares to fit a straight line predicting a particular stock return from the market return.
Beta = estimated coefficient of the market return. It measures the riskiness of the stock in terms of standard deviation and expected returns.
Large beta -> the stock is risky compared to the market, but expected returns from the stock are also large.
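As a rough illustration of how beta is obtained, the sketch below fits a straight line by least squares in plain Python. The function name `ols_line` and the return values are made up for illustration; they are not the oilcity data and not part of the original slides.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical monthly excess returns (illustrative numbers only).
market = [-0.10, -0.05, 0.00, 0.02, 0.05]
stock = [-0.25, -0.10, 0.05, 0.04, 0.16]

alpha, beta = ols_line(market, stock)
# beta is the estimated coefficient of the market return.
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

A slope above 1 in such a fit would correspond to the "large beta" case described above.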
[Figure: Plot of Market returns vs. month. y-axis: oilcity$market; x-axis: Month.]
[Figure: Plot of Oil City Petroleum return vs. month. y-axis: Oil; x-axis: month.]
[Figure: Histogram of Market Returns. x-axis: Market.]
[Figure: Histogram of Oil City Returns. x-axis: Oil.]
[Figure: Plot of Oil City vs. Market Returns, points labeled by observation number; observation 94 lies far above the rest. y-axis: Oil City; x-axis: Market.]
[Figure: Plot of Oil City vs. Market Returns without observation 94, points labeled by observation number. y-axis: Oil City; x-axis: Market.]
> summary(oilcity)
        Oil                    Market
 Min.   :-0.55667260    Min.   :-0.27857020
 1st Qu.:-0.23968330    1st Qu.:-0.0557534
 Median :-0.0049000     Median :-0.07277544
 Mean   :-0.072225      Mean   :-0.07689209
 3rd Qu.:-0.0582000     3rd Qu.:-0.03973828
 Max.   : 5.9292000     Max.   : 0.073940
Summary of oil.lm
Call: lm(formula = Oil ~ Market, data = oilcity)
Residuals:
     Min      1Q   Median      3Q   Max
 -0.6952 -0.1732 -0.05444 0.08407 4.842
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)  0.1474     0.0707  2.0849   0.0391
Market       2.8567     0.7318  3.9040   0.0002
Residual standard error: 0.4867 on 127 degrees of freedom
Multiple R-Squared: 0.1071
F-statistic: 15.24 on 1 and 127 degrees of freedom, the p-value is 0.0001528
Correlation of Coefficients:
       (Intercept)
Market      0.7956
[Figure: Plot of residual vs. fit for oil.lm; observations 79, 94 and 65 are labeled. y-axis: Residuals; x-axis: Fitted : Market.]
[Figure: Plot of Cook's Distance vs. Index; observations 94, 43 and 65 are labeled. y-axis: Cook's Distance; x-axis: Index.]
[Figure: Plot of hat matrix diagonals for oil.lm, points labeled by observation number. y-axis: hat(model.matrix(oil.lm)); x-axis: month.]
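The two diagnostics plotted on the preceding slides, hat matrix diagonals (leverage) and Cook's distance, can be computed by hand for a straight-line fit. The sketch below is a minimal plain-Python version using the standard textbook formulas; the function name `influence_measures` and the data are hypothetical and are not taken from the slides.

```python
def influence_measures(x, y):
    """Return (hat, cook) for the least-squares line y = a + b*x.

    hat[i]  = 1/n + (x_i - mean_x)^2 / Sxx         (diagonal of the hat matrix)
    cook[i] = e_i^2 * h_i / (p * s^2 * (1 - h_i)^2) with p = 2 coefficients
    """
    n = len(x)
    p = 2  # intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    hat = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    cook = [e * e * h / (p * s2 * (1.0 - h) ** 2) for e, h in zip(resid, hat)]
    return hat, cook


# Hypothetical data with one unusual response in the last position.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
hat, cook = influence_measures(x, y)
worst = max(range(len(x)), key=lambda i: cook[i])
print("index with largest Cook's distance:", worst)
```

The hat values sum to the number of coefficients, which is a quick sanity check on any implementation.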
Summary of model without observation 94
Call: lm(formula = Oil ~ Market, data = oilcity94)
Residuals:
    Min    1Q  Median      3Q   Max
 -0.569 -0.74 -0.0959 0.06864 0.859
Coefficients:
               Value Std. Error t value Pr(>|t|)
(Intercept)  -0.0247     0.0304 -0.8139   0.4173
Market        1.1355     0.3137  3.6202   0.0004
Residual standard error: 0.2033 on 126 degrees of freedom
Multiple R-Squared: 0.09422
F-statistic: 13.11 on 1 and 126 degrees of freedom, the p-value is 0.0004249
Correlation of Coefficients:
       (Intercept)
Market       0.806
[Figure: Plot of residual vs. fit for model without observation 94. y-axis: Residuals; x-axis: Fitted : Market.]
Weighted Least Squares
Used when observations, $y_i$, have unequal variances.
$y = X\beta + \varepsilon$, $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2 V$
V is non-singular and positive definite.
V is diagonal if errors are uncorrelated.
V is always symmetric.
There exists an $n \times n$ non-singular symmetric matrix, R, such that $R'R = RR = V$.
R is sometimes called the square root of V.
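Weighted least squares (continued)
Define new variables: $\tilde{y} = R^{-1}y$, $\tilde{X} = R^{-1}X$, $\tilde{\varepsilon} = R^{-1}\varepsilon$.
$y = X\beta + \varepsilon$ becomes $R^{-1}y = R^{-1}X\beta + R^{-1}\varepsilon$, or $\tilde{y} = \tilde{X}\beta + \tilde{\varepsilon}$.
$E(\tilde{\varepsilon}) = R^{-1}E(\varepsilon) = 0$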
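Weighted least squares (continued)
With $\tilde{\varepsilon} = R^{-1}\varepsilon$:
$\mathrm{Var}(\tilde{\varepsilon}) = E\{[\tilde{\varepsilon} - E(\tilde{\varepsilon})][\tilde{\varepsilon} - E(\tilde{\varepsilon})]'\} = E(\tilde{\varepsilon}\tilde{\varepsilon}') = E(R^{-1}\varepsilon\varepsilon'R^{-1}) = R^{-1}E(\varepsilon\varepsilon')R^{-1} = \sigma^2 R^{-1}VR^{-1} = \sigma^2 R^{-1}RRR^{-1} = \sigma^2 I$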
Weighted Least Squares (continued)
$Q(\beta) = \varepsilon'V^{-1}\varepsilon = (y - X\beta)'W(y - X\beta)$, where $W = V^{-1}$ (weights)
Least squares normal equations are $(X'WX)\hat{\beta} = X'Wy$
The solution is: $\hat{\beta} = (X'WX)^{-1}X'Wy$
$\mathrm{Var}(\hat{\beta}) = (X'WX)^{-1}X'W\,\mathrm{var}(y)\,WX(X'WX)^{-1} = \sigma^2 (X'WX)^{-1}X'WW^{-1}WX(X'WX)^{-1} = \sigma^2 (X'WX)^{-1}$
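The weighted least squares solution and the square-root transformation from the preceding slides should give the same coefficients. The sketch below checks this numerically for a straight-line model with diagonal V, so that R is simply the diagonal matrix of square roots. All function names and data are hypothetical, chosen only for illustration.

```python
import math


def solve_two_columns(c0, c1, z):
    """Least squares of z on the two columns c0 and c1 (2x2 normal equations)."""
    a = sum(u * u for u in c0)
    b = sum(u * v for u, v in zip(c0, c1))
    d = sum(v * v for v in c1)
    g0 = sum(u * t for u, t in zip(c0, z))
    g1 = sum(v * t for v, t in zip(c1, z))
    det = a * d - b * b
    return (d * g0 - b * g1) / det, (a * g1 - b * g0) / det


def wls_direct(x, y, v):
    """beta_hat = (X'WX)^{-1} X'Wy with W = V^{-1}, V = diag(v), X = [1, x]."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    return (swxx * swy - swx * swxy) / det, (sw * swxy - swx * swy) / det


def wls_transformed(x, y, v):
    """Ordinary least squares on R^{-1}y and R^{-1}X, with R = diag(sqrt(v))."""
    r_inv = [1.0 / math.sqrt(vi) for vi in v]
    ones_t = r_inv                                 # R^{-1} times the column of ones
    x_t = [ri * xi for ri, xi in zip(r_inv, x)]    # R^{-1} times x
    y_t = [ri * yi for ri, yi in zip(r_inv, y)]    # R^{-1} times y
    return solve_two_columns(ones_t, x_t, y_t)


# Hypothetical data: later observations are assumed to have larger variance.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5]
v = [1.0, 1.0, 4.0, 4.0, 9.0]
print(wls_direct(x, y, v))
print(wls_transformed(x, y, v))
```

Both calls should print the same pair of coefficients up to floating-point rounding, which is the point of the transformation argument.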
Robust Regression
Used to reduce influence of outliers.
LAR regression: minimize $L_1 = \sum_{i=1}^{n} |y_i - x_i\beta| = \sum_{i=1}^{n} |e_i|$
LMS regression: minimize $\mathrm{median}\{[y_i - x_i\beta]^2\} = \mathrm{median}\{e_i^2\}$
M estimators: minimize $\sum_{i=1}^{n} g(y_i - x_i\beta) = \sum_{i=1}^{n} g(e_i)$, where g is a function of the residuals
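To make the three criteria concrete, the sketch below evaluates each objective on a vector of residuals. The Huber function is used as one common choice of g for the M estimator; that choice, the tuning constant 1.345, and all names here are assumptions for illustration rather than something specified on the slide.

```python
import statistics


def lar_objective(e):
    """Least absolute residuals: sum of |e_i|."""
    return sum(abs(ei) for ei in e)


def lms_objective(e):
    """Least median of squares: median of e_i^2."""
    return statistics.median(ei * ei for ei in e)


def huber(e, k=1.345):
    """Huber's function: quadratic near zero, linear in the tails (assumed choice of g)."""
    if abs(e) <= k:
        return 0.5 * e * e
    return k * abs(e) - 0.5 * k * k


def m_objective(e, g=huber):
    """M-estimator objective: sum of g(e_i)."""
    return sum(g(ei) for ei in e)


# Hypothetical residuals with one large outlier.
resid = [0.1, -0.2, 0.05, -0.1, 5.0]
print("sum of squares:", sum(ei * ei for ei in resid))
print("LAR objective :", lar_objective(resid))
print("LMS objective :", lms_objective(resid))
print("Huber M       :", m_objective(resid))
```

The outlier dominates the sum of squares but contributes only linearly to the LAR and Huber objectives and not at all to the median.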
Robust Regression (continued)
IRLS, iteratively reweighted least squares
Minimize $e'We$
W is a diagonal matrix of weights, inversely proportional to the magnitude of the scaled residuals $u_i$:
$u_i = e_i/s$, $s = \mathrm{MAD} = \mathrm{median}\{|e_i - \mathrm{median}(e_i)|\}$
Procedure:
1. Obtain initial coefficient estimates from OLS
2. Obtain weights from scaled residuals
3. Obtain coefficient estimates from WLS
4. Return to 2.
Convergence is usually rapid.
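The four-step procedure can be sketched in a few lines for a straight-line model. This version assumes Huber weights (1 for small scaled residuals, k/|u| otherwise, with k = 1.345) and scales the MAD by 1.4826, the usual normal-consistency constant; the slide does not fix either choice, and the function names and data are hypothetical.

```python
import statistics


def wls_line(x, y, w):
    """Weighted least-squares line; returns (intercept, slope)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    return (swxx * swy - swx * swxy) / det, (sw * swxy - swx * swy) / det


def irls_huber(x, y, k=1.345, tol=1e-10, max_iter=100):
    """IRLS with Huber weights (assumed weight function); returns (a, b, weights)."""
    w = [1.0] * len(x)
    a, b = wls_line(x, y, w)  # step 1: OLS is WLS with unit weights
    for _ in range(max_iter):
        e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = statistics.median(e)
        s = 1.4826 * statistics.median([abs(ei - med) for ei in e])
        if s == 0.0:
            break  # residual scale collapsed; keep current fit
        u = [ei / s for ei in e]
        w = [1.0 if abs(ui) <= k else k / abs(ui) for ui in u]  # step 2
        a_new, b_new = wls_line(x, y, w)  # step 3
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break  # otherwise step 4: return to step 2
    return a, b, w


# Hypothetical data near y = 1 + 2x with one gross outlier in the last point.
x = [float(i) for i in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
y = [1.0 + 2.0 * xi + ni for xi, ni in zip(x, noise)]
y[9] += 30.0

a_ols, b_ols = wls_line(x, y, [1.0] * len(x))
a_rob, b_rob, w = irls_huber(x, y)
print("OLS slope   :", b_ols)
print("robust slope:", b_rob)
print("weight of last point:", w[9])
```

The weights returned by the final iteration play the same role as the weights plotted for the robust fit: observations with large scaled residuals receive small weights.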
(See Figure 10.4, and Equations 10.44 and 10.45 in Neter et al. Applied Linear Statistical Models.)
[Figure: Plot of residuals in oil.rreg. y-axis: oil.rreg$resid; x-axis: Index.]
[Figure: Plot of weights in robust regression for oil city data set, points labeled by observation number. y-axis: Weights (0.0 to 1.0); x-axis: Month.]
[Figure: Plot of sqrt(weights)*resid/s in oil.rreg. x-axis: Index.]
Coefficient table for oil.rreg
> x <- cbind(1, market)
> beta <- solve(t(x) %*% diag(w) %*% x) %*% t(x) %*% diag(w) %*% oil
> r <- oil - x %*% beta
> s <- median(abs(r - median(r))) * 1.4826
> covm <- solve(t(x) %*% diag(w) %*% x) * s^2
> se <- sqrt(diag(covm))
> tvalue <- beta/se
> prob <- 2 * (1 - pt(abs(tvalue), 127))
> cbind(beta, se, tvalue, prob)
                   beta         se    tvalue         prob
(Intercept) -0.06779903 0.02451469 -2.765649 0.0065285939
x            0.89895511 0.24902845  3.609849 0.0004394276
Covariance matrix is approximate.
[Figure: Plots of fitted regression lines for oil city data (oil.lm, oil.lm94, oil.rreg), points labeled by observation number. y-axis: Oil; x-axis: Market.]
Least Trimmed Squares Regression
Minimizes $\sum_{i=1}^{q} e_{(i)}^2$, the sum of the q smallest squared residuals, where q is chosen to be between n/2 and n.
Based on a genetic algorithm for finding a subset of data with minimum SSE.
High breakdown point: fits the bulk of the data well, even if the bulk is only a little more than half the data.
Resulting weights are 1 or 0.
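The sketch below illustrates the trimmed objective for a straight-line model. It does not use a genetic algorithm; instead it swaps in a simpler search (random two-point starts followed by repeated refitting on the q best-fitting points), which is only one of several ways to look for a low-SSE subset. Function names, the seed, and the data are hypothetical.

```python
import random


def ols_line(x, y):
    """Least-squares line; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        return mean_y, 0.0  # degenerate subset: fall back to a flat line
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lts_line(x, y, q, n_starts=200, seed=0):
    """Approximate LTS fit: returns (intercept, slope, weights) with 0/1 weights."""
    rng = random.Random(seed)
    n = len(x)
    best = None  # (objective, intercept, slope, retained indices)
    for _ in range(n_starts):
        subset = rng.sample(range(n), 2)  # random two-point start
        if x[subset[0]] == x[subset[1]]:
            continue
        prev_obj = float("inf")
        while True:
            a, b = ols_line([x[k] for k in subset], [y[k] for k in subset])
            sq = sorted(((y[k] - a - b * x[k]) ** 2, k) for k in range(n))
            obj = sum(val for val, _ in sq[:q])  # sum of q smallest squared residuals
            if best is None or obj < best[0]:
                best = (obj, a, b, [k for _, k in sq[:q]])
            if obj >= prev_obj - 1e-12:
                break  # no further improvement from refitting
            prev_obj = obj
            subset = [k for _, k in sq[:q]]  # refit on the q best-fitting points
    _, a, b, kept = best
    weights = [1 if k in kept else 0 for k in range(n)]
    return a, b, weights


# Hypothetical data on y = 1 + 2x with two gross outliers.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[3] += 25.0
y[8] -= 40.0

a, b, weights = lts_line(x, y, q=8)
print("intercept, slope:", a, b)
print("weights:", weights)
```

The 0/1 weights mark which observations are retained in the final subset, matching the statement that the resulting weights are 1 or 0.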
> summary(oil.lts)
Method:
[1] "Least Trimmed Squares Robust Regression."
Call: ltsreg(formula = Oil ~ Market)
Coefficients:
 Intercept  Market
   -0.0864  0.7907
Scale estimate of residuals: 0.468
Robust Multiple R-Squared: 0.09863
Total number of observations: 129
Number of observations that determine the LTS estimate: 116
Residuals:
   Min. 1st Qu. Median 3rd Qu.  Max.
 -0.454  -0.088  0.032   0.097 5.223
Weights:
   0   1
  10 119