Multiple linear regression


1 Multiple linear regression Business Statistics Spring

2 Topics

1. Including multiple predictors
2. Controlling for confounders
3. Transformations, interactions, dummy variables

OpenIntro 8.1, Super Crunchers excerpt.

3 Simple linear regression recap

We saw last week how to use the least squares criterion to define the best linear predictor. We also saw how to use R or Excel to compute the best linear predictor on a given data set.

The best in-sample linear predictor is probably not the true linear predictor, but with enough data it should be similar. We can use the idea of a confidence interval to help us gauge how much we trust our fit.

4 Square feet versus sale price

The least squares line of best fit for the housing data has intercept a and slope b.

[Figure: price in dollars versus square feet]

The residual standard error (the noise level) is $\hat{\sigma}$ = $22,480 in this case. Why is it so big? What can we do to make it smaller?

5 Bedrooms versus sale price

Perhaps we could find a better predictor. If we use the number of bedrooms instead of square feet, we get this fit.

[Figure: scatterplot with the fitted line; axes labeled "Price in dollars" and "Square feet" in the original]

Now $\hat{\sigma}$ = $22,940, which is not an improvement. Couldn't we use both square feet and number of bedrooms to predict?

6 The best linear multivariate predictor

We still want to find a prediction $\hat{y}$ to minimize our squared error $E\{(\hat{y} - Y)^2\}$, but now we have

$$\hat{y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$$

for a whole list of predictor variables. Applied to a data set, this becomes the optimization problem: find coefficients $b_0, \dots, b_p$ that minimize

$$\sum_{i=1}^{n} \Big( b_0 + \sum_{j=1}^{p} b_j x_{ij} - y_i \Big)^2.$$

Why are there two subscripts on $x_{ij}$?
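The optimization problem on this slide can be sketched in R. This is a minimal illustration on simulated data, not the housing data: the variable names (x1, x2, y), the sample size and the "true" coefficients are all assumptions made for the example. It checks that lm() and the normal equations give the same coefficients, and that moving away from them increases the squared error.

```r
# Simulated data: every name and number here is an illustrative assumption.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.5)

# Sum of squared errors for a candidate coefficient vector b = (b0, b1, b2).
sse <- function(b) sum((b[1] + b[2] * x1 + b[3] * x2 - y)^2)

# Least squares via lm().
fit   <- lm(y ~ x1 + x2)
b_hat <- coef(fit)

# The same coefficients from the normal equations (X'X) b = X'y.
X        <- cbind(1, x1, x2)
b_normal <- solve(t(X) %*% X, t(X) %*% y)

print(cbind(lm = b_hat, normal_eq = as.vector(b_normal)))

# Perturbing the fitted intercept increases the squared error.
print(sse(b_hat) < sse(b_hat + c(0.1, 0, 0)))
```

In the matrix X, row i holds the predictor values for observation i and column j holds predictor j, which is one way to read the two subscripts on $x_{ij}$.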

7 Including more predictors can improve prediction

If we include both square feet and the number of bedrooms in our prediction of price, the residual standard error drops to $\hat{\sigma}$ = $21,100.

[Figure: actual price in dollars versus predicted price]

Plotting in this case is trickier, but to get a sense of our prediction accuracy we can look at a predicted-versus-actual plot.

8 Including more predictors can improve prediction

If we include SqFt, Bedrooms, Bathrooms and Brick in our prediction of price, the residual standard error drops to $\hat{\sigma}$ = $17,630.

[Figure: actual price in dollars versus predicted price]

What would this plot look like if we had all the relevant determinants of price?

9 Multiple linear regression in R

    > summary(housefit)

    Call:
    lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + Brick)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)
    SqFt                                             ***
    Bedrooms                                e-05     ***
    Bathrooms                                        **
    BrickYes                                e-09     ***

    Residual standard error:  on 123 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

$R^2$ is a standard measure of goodness-of-fit, but I like $\hat{\sigma}$ better.

10 Plug-in predictions

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)
    SqFt                                             ***
    Bedrooms                                e-05     ***
    Bathrooms                                        **
    BrickYes                                e-09     ***

Just as in the single-predictor case, we can calculate predictions by plugging in particular values for the predictor variables:

$$\hat{y} = b_0 + b_{\text{SqFt}}(2000) + b_{\text{Bedrooms}}(3) + b_{\text{Bathrooms}}(2) + b_{\text{BrickYes}}(1)$$

where each $b$ is the corresponding estimate from the table. This would be our prediction for a 2,000 sq ft, three bed, two bath, brick home.
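A plug-in prediction of this kind can be sketched as follows. The data are simulated stand-ins for the housing data (the data frame `houses`, its sample size and the coefficients used to generate prices are assumptions), so the numbers will not match the slides. The point is only that plugging values into the fitted equation by hand agrees with predict().

```r
# Simulated stand-in for the housing data; all values are assumptions.
set.seed(2)
n <- 100
houses <- data.frame(
  SqFt      = round(runif(n, 1500, 2600)),
  Bedrooms  = sample(2:5, n, replace = TRUE),
  Bathrooms = sample(2:4, n, replace = TRUE),
  Brick     = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
houses$Price <- 20000 + 50 * houses$SqFt + 5000 * houses$Bedrooms +
  8000 * houses$Bathrooms + 15000 * (houses$Brick == "Yes") +
  rnorm(n, sd = 10000)

fit <- lm(Price ~ SqFt + Bedrooms + Bathrooms + Brick, data = houses)
b   <- coef(fit)

# Plug in by hand: 2,000 sq ft, three bedrooms, two bathrooms, brick (dummy = 1).
by_hand <- b["(Intercept)"] + b["SqFt"] * 2000 + b["Bedrooms"] * 3 +
  b["Bathrooms"] * 2 + b["BrickYes"] * 1

# The same prediction through predict().
new_home <- data.frame(SqFt = 2000, Bedrooms = 3, Bathrooms = 2,
                       Brick = factor("Yes", levels = c("No", "Yes")))
by_predict <- predict(fit, newdata = new_home)

print(c(by_hand = unname(by_hand), by_predict = unname(by_predict)))
```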

11 Categorical predictors

Can we use information in a linear regression even if it isn't numerical?

In the housing data we have three neighborhoods, denoted 1, 2 and 3. Why would we potentially not want to include the Nbhd variable in the regression as-is?

We can use such information via the creation of dummy variables. If we have k categories, we create k extra columns; in each row, exactly one of these columns is a one and the rest are zeros.

What happens if we include an intercept in this model?
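The dummy-variable construction can be sketched with model.matrix(). The neighborhood labels below are a made-up toy vector, not the housing data. The example also hints at the intercept question: with all k dummy columns plus an intercept, the columns are linearly dependent, which is why R drops one level by default.

```r
# Toy neighborhood labels (an assumption for illustration).
nbhd <- factor(c(1, 2, 3, 1, 2, 3, 3, 1))

# One indicator column per category, no intercept.
D <- model.matrix(~ nbhd - 1)
print(D)
print(rowSums(D))          # every row sums to 1

# Adding an intercept column makes the design rank deficient:
# the three dummy columns add up to the intercept column.
X_all <- cbind(Intercept = 1, D)
print(qr(X_all)$rank)      # rank is 3, not 4

# R's default coding with an intercept drops the first level.
X_default <- model.matrix(~ nbhd)
print(colnames(X_default))
```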

12 Dummy variable with no intercept

    Call:
    lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + as.factor(nbhd) + Brick - 1)

    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
    SqFt                                         e-07     ***
    Bedrooms
    Bathrooms                                             **
    as.factor(nbhd)1
    as.factor(nbhd)2                                      *
    as.factor(nbhd)3                             e-05     ***
    BrickYes                                     e-12     ***

    Residual standard error:  on 121 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

13 Dummy variable with an intercept

    Call:
    lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + Brick + as.factor(nbhd))

    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
    (Intercept)
    SqFt                                         e-07     ***
    Bedrooms
    Bathrooms                                             **
    BrickYes                                     e-12     ***
    as.factor(nbhd)2
    as.factor(nbhd)3                             < 2e-16  ***

    Residual standard error:  on 121 degrees of freedom
    Multiple R-squared: 0.805, Adjusted R-squared:

14 In-sample prediction accuracy

[Figure: actual price in dollars versus predicted price]

Our predictions are getting progressively better. Or are they?

15 Over-fitting

As you continue to add more and more predictors, you will notice $R^2$ gets closer and closer to 1. As a crazy thought experiment, would this happen even if we kept including garbage variables?

In addition to the variables above, let's include 100 junk variables (drawn from a normal distribution) and see what happens.

    garbage <- matrix(rnorm(100*128),128,100)

16 Predicting with garbage

    Call:
    lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + as.factor(nbhd) + Brick + garbage)

    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
    SqFt
    Bedrooms
    Bathrooms
    as.factor(nbhd)1
    as.factor(nbhd)2
    as.factor(nbhd)3
    BrickYes                                              ***
    .
    garbage                                               *
    .
    garbage                                               *

    Residual standard error:  on 21 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

17 Over-fitting

[Figure: actual price in dollars versus predicted price]

One simple way to check for over-fitting is to use a hold-out set of data and try to predict them without peeking.
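A hold-out check of this kind can be sketched on simulated data. Everything here is an assumption made for illustration: one real predictor `x`, 100 junk predictors, and a fresh test set drawn from the same process instead of a split of the housing data. The in-sample fit always improves when junk is added, while the hold-out error gets worse.

```r
# Simulated data: one real predictor plus junk columns (all assumptions).
set.seed(3)
make_data <- function(n, p_junk = 100) {
  x    <- rnorm(n)
  junk <- matrix(rnorm(n * p_junk), n, p_junk)   # columns become X1, X2, ...
  data.frame(y = 2 + 3 * x + rnorm(n), x = x, junk)
}
train <- make_data(128)
test  <- make_data(128)    # hold-out set, never used for fitting

small <- lm(y ~ x, data = train)   # only the real predictor
big   <- lm(y ~ ., data = train)   # real predictor plus 100 junk variables

# Root mean squared prediction error of a fit on a data set.
rmse <- function(fit, d) sqrt(mean((d$y - predict(fit, newdata = d))^2))

print(c(r2_small = summary(small)$r.squared, r2_big = summary(big)$r.squared))
print(c(train_small = rmse(small, train), train_big = rmse(big, train)))
print(c(test_small  = rmse(small, test),  test_big  = rmse(big, test)))
```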

18 Interactions

What if we think that the price premium associated with brick might be different between different neighborhoods?

    Call:
    lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + as.factor(nbhd):brick - 1)

    Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
    SqFt                                                  e-07     ***
    Bedrooms
    Bathrooms                                                      *
    as.factor(nbhd)1:brickno
    as.factor(nbhd)2:brickno                                       *
    as.factor(nbhd)3:brickno                              e-05     ***
    as.factor(nbhd)1:brickyes                                      **
    as.factor(nbhd)2:brickyes                                      ***
    as.factor(nbhd)3:brickyes                             e-09     ***

    Residual standard error:  on 119 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

Adding an interaction term to our regression model explicitly accounts for this possibility.

19 Interactions

Here is an equivalent way to run this regression.

    Call:
    lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + as.factor(nbhd) * Brick - 1)

    Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
    SqFt                                                  e-07     ***
    Bedrooms
    Bathrooms                                                      *
    as.factor(nbhd)1
    as.factor(nbhd)2                                               *
    as.factor(nbhd)3                                      e-05     ***
    BrickYes                                                       *
    as.factor(nbhd)2:brickyes
    as.factor(nbhd)3:brickyes

    Residual standard error:  on 119 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

How can we see that these are equivalent? Which one do you prefer in terms of interpretation?
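One way to see the equivalence is to fit both parameterizations and compare fitted values. The sketch below uses simulated data (the data frame `d`, its columns and the generating coefficients are assumptions), so it shows the mechanics, not the housing results.

```r
# Simulated stand-in data; names and numbers are assumptions.
set.seed(4)
n <- 120
d <- data.frame(
  SqFt  = runif(n, 1500, 2600),
  Nbhd  = factor(sample(1:3, n, replace = TRUE)),
  Brick = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
d$Price <- 50 * d$SqFt + 10000 * as.integer(d$Nbhd) +
  ifelse(d$Brick == "Yes", 5000 * as.integer(d$Nbhd), 0) +
  rnorm(n, sd = 8000)

# One coefficient per neighborhood-by-brick cell.
cell_means <- lm(Price ~ SqFt + Nbhd:Brick - 1, data = d)
# Neighborhood levels, a baseline brick premium, and adjustments to it.
main_plus  <- lm(Price ~ SqFt + Nbhd * Brick - 1, data = d)

# Same number of coefficients and identical fitted values.
print(c(length(coef(cell_means)), length(coef(main_plus))))
print(all.equal(fitted(cell_means), fitted(main_plus)))

# Brick premium in neighborhood 2, read off each parameterization.
premium_cells <- coef(cell_means)["Nbhd2:BrickYes"] -
  coef(cell_means)["Nbhd2:BrickNo"]
premium_main  <- coef(main_plus)["BrickYes"] +
  coef(main_plus)["Nbhd2:BrickYes"]
print(c(unname(premium_cells), unname(premium_main)))
```

The two models span the same set of predictions; they differ only in whether a coefficient reports a cell level or a difference between cells.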

20 Time series

Consider predicting the temperature based on day of the year. These are Chicago daily highs, in Fahrenheit.

[Figure: daily high temp versus time]

We can often turn nonlinear problems into linear problems by transforming our predictor variables in various ways and using many of them to predict.

21 Transformations

Since we suspect a seasonal trend, let us create the following two predictor variables: $x_1 = \sin(2\pi t/365)$ and $x_2 = \cos(2\pi t/365)$. We then use least squares, via lm(), to find a linear prediction rule.

    Call:
    lm(formula = y ~ x1 + x2)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                             <2e-16   ***
    x1                                      <2e-16   ***
    x2                                      <2e-16   ***

    Residual standard error:  on 1092 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

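22 Nonlinear prediction via linear regression

[Figure: daily high temp versus time]

    yhat <- b0 + b1*sin(2*pi*t/365) + b2*cos(2*pi*t/365)

Here b0, b1 and b2 stand for the estimated intercept and the coefficients on x1 and x2 from the lm() fit.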
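The seasonal fit can be sketched end to end on simulated temperatures. The series below is an assumption (three years of made-up daily highs with a yearly cycle), not the Chicago data; only the sine and cosine transformations and the lm() call follow the slides.

```r
# Simulated daily highs over three years (an assumption, not the Chicago data).
set.seed(5)
day  <- 1:1095
temp <- 55 - 25 * cos(2 * pi * (day - 20) / 365) + rnorm(length(day), sd = 8)

# Transformed predictors: a curve in `day`, but linear in x1 and x2.
x1 <- sin(2 * pi * day / 365)
x2 <- cos(2 * pi * day / 365)
fit <- lm(temp ~ x1 + x2)
b   <- coef(fit)

# Prediction rule written out by hand, as on the slide.
yhat <- b[1] + b[2] * x1 + b[3] * x2

# A straight line in `day` for comparison.
trend <- lm(temp ~ day)
print(c(r2_seasonal = summary(fit)$r.squared,
        r2_straight = summary(trend)$r.squared))
print(df.residual(fit))   # 1095 observations minus 3 coefficients
```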

23 Nonlinear prediction via linear regression

If we consider a string of 50 days, the daily high temps are sticky: the temp today looks like the temp in the preceding days.

[Figure: daily high temp versus time]

This suggests we can use previous days' weather to predict today's weather.

24 Auto-regression

Plotting today's weather versus tomorrow's weather gives a nice clean correlation.

[Figure: temp tomorrow versus temp today]

Running a linear regression will produce a prediction rule. What do you suppose the slope coefficient will be close to?

25 Auto-regression

Here's how we set this up in R.

    > today <- y[1:149]
    > tomorrow <- y[2:150]
    > temp_auto_reg <- lm(tomorrow ~ today)
    > summary(temp_auto_reg)

    Call:
    lm(formula = tomorrow ~ today)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                                      **
    today                                   <2e-16   ***

    Residual standard error:  on 147 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

We still have nearly 9 degree swings from day to day.

26 Auto-regression

On a two-day lag the predictability decreases.

    > today <- y[1:148]
    > tomorrow <- y[2:149]
    > dayaftertomorrow <- y[3:150]

    Call:
    lm(formula = dayaftertomorrow ~ today)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                             e-05     ***
    today                                   < 2e-16  ***

    Residual standard error:  on 146 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

The two-day variability is nearly 12 degrees.

27 Auto-regression

What happens if we include both today and yesterday to predict tomorrow?

    > yesterday <- y[1:148]
    > today <- y[2:149]
    > tomorrow <- y[3:150]

    Call:
    lm(formula = tomorrow ~ today + yesterday)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                                      **
    today                                   < 2e-16  ***
    yesterday

    Residual standard error:  on 145 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

Yesterday's weather is old news!
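The lag construction used on these slides can be sketched on a simulated series. The series `y` below is an assumption: 150 made-up daily highs generated so that each day depends only on the day before. The indexing mirrors the slides, so the degrees of freedom match (148 rows minus 3 coefficients).

```r
# Simulated sticky temperature series (an assumption, not the Chicago data).
set.seed(6)
n <- 150
y <- numeric(n)
y[1] <- 60
for (i in 2:n) y[i] <- 60 + 0.8 * (y[i - 1] - 60) + rnorm(1, sd = 6)

# Shifted copies of the same series, aligned row by row.
yesterday <- y[1:(n - 2)]
today     <- y[2:(n - 1)]
tomorrow  <- y[3:n]

one_lag  <- lm(tomorrow ~ today)
two_lags <- lm(tomorrow ~ today + yesterday)

print(coef(one_lag))
print(coef(two_lags))
print(df.residual(two_lags))
# Residual standard errors: adding yesterday changes little in this simulation,
# because the series was built to depend only on the previous day.
print(c(one_lag = sigma(one_lag), two_lags = sigma(two_lags)))
```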

28 MBA beer survey

How many beers can you drink before becoming drunk?

[Figure: height versus number of beers]

29 MBA beer survey

Height seems to be a valuable predictor of beer tolerance.

    Call:
    lm(formula = nbeer ~ height)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                                      ***
    height                                  e-06     ***

    Residual standard error:  on 48 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

30 MBA beer survey

But weight also seems to be relevant.

[Figure: weight versus number of beers]

So weight and height both seem predictive, but is one more important than the other?

31 MBA beer survey

It appears that weight is the relevant variable.

    Call:
    lm(formula = nbeer ~ height + weight)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)
    height
    weight                                           ***

    Residual standard error:  on 47 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

On what basis is this determination made?

32 Prediction versus intervention

We are always safe interpreting our regression models as prediction engines: steps the computer follows for turning data into forecasts. We are on much shakier ground when we try to interpret our regression coefficients as knobs to be adjusted.

As we reminded ourselves last week, correlation does not imply causation. Straight teeth do not cause nice cars, remember?

Essentially we have two alternate explanations: either causation in the other direction (umbrellas do not lead to rain), or common cause (rich folks have nice cars and nice teeth).

For the first, we have to use common sense. For the second problem, lurking confounders, we can possibly adjust or control for them.

33 Controlling = matching

When we include a variable in a regression, we sometimes say that we are controlling for that variable. The intuition is that if we compare like with like, then our regression parameters make good mechanistic sense.

So, presumably, if I looked only at groups of individuals in the same socio-economic status, there would be no remaining relationship between the quality of one's smile and the price of one's car.

What we are aiming for is a rich enough set of predictors that the variation within each slice of the population (observations) is random: there is no hidden structure to trick us.

34 Sales versus price

Suppose you own a taco truck. The past three years of weekly sales and price data look like this:

[Figure: sales versus price]

Apparently we should raise prices, right? Bigger price is better, clearly. Or is it?

35 Sales versus price

The result is statistically significant.

    Call:
    lm(formula = sales ~ p1)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                             <2e-16   ***
    p1                                      <2e-16   ***

    Residual standard error:  on 154 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:

How should we interpret this result?

36 Price versus sales

What if we account for our competitor's price?

[Figure: competition price versus our price]

What do you suppose this tells us? What is this a proxy for?

37 Sales versus price

The coefficient on our price is no longer statistically significant, but its least squares estimate changes sign!

    Call:
    lm(formula = sales ~ p1 + p2)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                             < 2e-16  ***
    p1
    p2                                      e-05     ***

    Residual standard error:  on 153 degrees of freedom
    Multiple R-squared: 0.525, Adjusted R-squared:
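A sign flip of this kind can be reproduced in a small simulation. The setup is an assumption made for illustration: an unobserved `demand` variable pushes up both prices and sales, the competitor's price `p2` tracks demand closely, and our own price `p1` truly lowers sales. Unlike on the slide, the adjusted coefficient here comes out clearly negative, because the simulated effect is strong.

```r
# Simulated taco truck data; every number is an assumption.
set.seed(7)
n      <- 156                                   # three years of weeks
demand <- rnorm(n)                              # unobserved demand conditions
p2     <- 5 + demand + rnorm(n, sd = 0.2)       # competitor's price
p1     <- 5 + demand + rnorm(n, sd = 0.5)       # our price
sales  <- 100 + 20 * demand - 8 * p1 + rnorm(n, sd = 3)

naive    <- lm(sales ~ p1)        # ignores the confounder
adjusted <- lm(sales ~ p1 + p2)   # competitor's price as a proxy for demand

print(coef(naive)["p1"])      # positive: high prices coincide with high demand
print(coef(adjusted)["p1"])   # negative once demand conditions are held fixed
```

Here the competitor's price acts as a stand-in for market demand, which is why controlling for it moves the coefficient on our own price toward its true negative effect.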

38 Simpson's paradox, revisited

Within each color, what is the sign of the slope?

[Figure: sales versus our price, with points colored by group]

39 The kitchen sink regression

In an effort to clear out all unwanted confounding, so that we can interpret our regression coefficients cleanly, we often reach for any and all available predictor variables.

But this has its downsides. Specifically, there are both statistical and interpretational reasons not to do this.

We have already seen the statistical argument: we will tend to over-fit, and we become less certain about our estimates because our effective sample size decreases as we add more predictor variables.

But there is another reason not to just throw everything into our regression models willy-nilly.

40 Intermediate outcomes

Suppose we want to learn about how smoking relates to cancer rates by zip code. That is, Y = cancer rate is our response/outcome variable and X = smoking rate is our predictor variable.

To avoid confounding, we control for many other attributes, such as average income, racial make-up, average age, crime rates, etc.

Suppose we also included a measure of lung tar in our regression. What do you suppose would happen to the estimated impact of smoking?
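The effect of controlling for an intermediate outcome can be sketched in a simulation. The causal chain below is an assumption built for illustration: smoking raises tar, and tar alone raises the cancer rate. All names, units and coefficients are made up.

```r
# Simulated chain smoking -> tar -> cancer; every number is an assumption.
set.seed(8)
n       <- 500
smoking <- runif(n, 0.05, 0.40)                 # smoking rate by zip code
tar     <- 10 * smoking + rnorm(n, sd = 0.3)    # intermediate outcome
cancer  <- 2 * tar + rnorm(n, sd = 1)           # response

total    <- lm(cancer ~ smoking)         # total effect of smoking
adjusted <- lm(cancer ~ smoking + tar)   # also "controls" for tar

print(coef(total)["smoking"])      # near the total effect built in (10 * 2)
print(coef(adjusted)["smoking"])   # much smaller: tar absorbs the effect
```

Holding tar fixed asks how cancer varies with smoking among places with the same tar level, which removes the very pathway through which smoking acts in this simulation.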


More information

9. Logit and Probit Models For Dichotomous Data

9. Logit and Probit Models For Dichotomous Data Sociology 740 John Fox Lecture Notes 9. Logit and Probit Models For Dichotomous Data Copyright 2014 by John Fox Logit and Probit Models for Dichotomous Responses 1 1. Goals: I To show how models similar

More information

Section 0: Introduction and Review of Basic Concepts

Section 0: Introduction and Review of Basic Concepts Section 0: Introduction and Review of Basic Concepts Carlos M. Carvalho The University of Texas McCombs School of Business mccombs.utexas.edu/faculty/carlos.carvalho/teaching 1 Getting Started Syllabus

More information
