Multiple linear regression
- Rosalyn Sharleen Kelley
1 Multiple linear regression Business Statistics Spring
2 Topics
1. Including multiple predictors
2. Controlling for confounders
3. Transformations, interactions, dummy variables
OpenIntro 8.1, Super Crunchers excerpt.
3 Simple linear regression recap
We saw last week how to use the least squares criterion to define the best linear predictor. We also saw how to use R or Excel to compute the best linear predictor on a given data set.
The best in-sample linear predictor is probably not the true linear predictor, but with enough data it should be similar. We can use the idea of a confidence interval to help us gauge how much we trust our fit.
4 Square feet versus sale price
The least squares line of best fit for the housing data is a = [value missing] and b = [value missing].
[Figure: price in dollars versus square feet]
The residual standard error (the noise level) is σ̂ = $22,480 in this case. Why is it so big? What can we do to make it smaller?
5 Bedrooms versus sale price
Perhaps we could find a better predictor. If we use the number of bedrooms instead of square feet, we get this fit.
[Figure: scatter plot for the bedrooms fit; the axes are labeled "Price in dollars" and "Square feet" in the source]
Now σ̂ = $22,940, which is not an improvement. Couldn't we use both square feet and number of bedrooms to predict?
6 The best linear multivariate predictor
We still want to find a prediction ŷ that minimizes our squared error E{(ŷ − Y)²}, but now we have
$$\hat{y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$$
for a whole list of predictor variables. Applied to a data set, this becomes the optimization problem: find coefficients b_0, ..., b_p that minimize
$$\sum_{i=1}^{n} \Big( b_0 + \sum_{j=1}^{p} b_j x_{ij} - y_i \Big)^2.$$
Why are there two subscripts on x_ij?
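As a rough illustration of this optimization problem, here is a minimal R sketch on simulated data. The variable names, the true coefficients (3, 2, -1) and the helper sse() are made up for the example and do not come from the housing data. It checks that the coefficients returned by lm() give a smaller sum of squared errors than a nearby alternative.

```r
# Simulated data: every name and number here is an assumption for illustration.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + 2 * x1 - 1 * x2 + rnorm(n)

# Sum of squared errors for a candidate coefficient vector b = (b0, b1, b2).
sse <- function(b) sum((b[1] + b[2] * x1 + b[3] * x2 - y)^2)

# Least squares fit via lm().
fit   <- lm(y ~ x1 + x2)
b_hat <- unname(coef(fit))

# Compare the SSE at the lm() solution with the SSE after nudging b1 by 0.1.
sse_at_fit <- sse(b_hat)
sse_nudged <- sse(b_hat + c(0, 0.1, 0))
print(c(at_fit = sse_at_fit, nudged = sse_nudged))
```

The two subscripts show up in the code as well: each predictor is a whole column (index j), and each column has one entry per observation (index i).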
7 Including more predictors can improve prediction
If we include both square feet and the number of bedrooms in our prediction of price, the residual standard error drops to σ̂ = $21,100.
[Figure: price in dollars versus predicted price]
Plotting in this case is trickier, but to get a sense of our prediction accuracy we can look at a predicted-versus-actual plot.
8 Including more predictors can improve prediction
If we include SqFt, Bedrooms, Bathrooms and Brick in our prediction of price, the residual standard error drops to σ̂ = $17,630.
[Figure: price in dollars versus predicted price]
What would this plot look like if we had all the relevant determinants of price?
9 Multiple linear regression in R
> summary(housefit)
Call: lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + Brick)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
(Intercept)
SqFt ***
Bedrooms e-05 ***
Bathrooms **
BrickYes e-09 ***
Residual standard error: [value missing] on 123 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]
R² is a standard measure of goodness-of-fit, but I like σ̂ better.
10 Plug-in predictions
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
(Intercept)
SqFt ***
Bedrooms e-05 ***
Bathrooms **
BrickYes e-09 ***
Just as in the single-predictor case, we can calculate predictions by plugging in particular values for the predictor variables:
ŷ = (Intercept) + (SqFt estimate)(2000) + (Bedrooms estimate)(3) + (Bathrooms estimate)(2) + (BrickYes estimate)(1)
would be our prediction for a 2K sq ft, three bed, two bath, brick home.
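Since the fitted numbers are not available here, the plug-in calculation can be sketched on a simulated stand-in for the housing data. The data frame houses, its column ranges and the coefficients used to generate Price are all assumptions for illustration; only the formula and the plugged-in values (2000, 3, 2, brick) follow the slide. The sketch compares the by-hand plug-in with R's predict().

```r
# Simulated stand-in for the housing data (all numbers are made up).
set.seed(2)
n <- 128
houses <- data.frame(
  SqFt      = round(runif(n, 1400, 2600)),
  Bedrooms  = sample(2:5, n, replace = TRUE),
  Bathrooms = sample(2:4, n, replace = TRUE),
  Brick     = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
houses$Price <- 20000 + 50 * houses$SqFt + 5000 * houses$Bedrooms +
  8000 * houses$Bathrooms + 15000 * (houses$Brick == "Yes") +
  rnorm(n, sd = 15000)

housefit <- lm(Price ~ SqFt + Bedrooms + Bathrooms + Brick, data = houses)
b <- coef(housefit)

# Plug-in prediction by hand: 2000 sq ft, 3 bed, 2 bath, brick (BrickYes = 1).
by_hand <- b["(Intercept)"] + b["SqFt"] * 2000 + b["Bedrooms"] * 3 +
  b["Bathrooms"] * 2 + b["BrickYes"] * 1

# The same prediction using predict() on a one-row data frame.
new_home <- data.frame(
  SqFt = 2000, Bedrooms = 3, Bathrooms = 2,
  Brick = factor("Yes", levels = levels(houses$Brick))
)
by_predict <- predict(housefit, newdata = new_home)

print(c(by_hand = unname(by_hand), by_predict = unname(by_predict)))
```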
11 Categorical predictors
Can we use information in a linear regression even if it isn't numerical?
In the housing data we have three neighborhoods, denoted 1, 2 and 3. Why would we potentially not want to include the Nbhd variable in the regression as-is?
We can use it via the creation of dummy variables. If we have k categories, we create k extra columns; in each row, exactly one of these columns is a one.
What happens if we include an intercept in this model?
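A small sketch of what the dummy columns look like, using base R's model.matrix() on a toy factor (the six neighborhood labels below are made up). Without an intercept we get k = 3 indicator columns and each row has exactly one 1. With an intercept, the k indicators would add up to the intercept column, so R keeps only k - 1 of them and treats the first level as the baseline.

```r
# Toy neighborhood labels (hypothetical values, not the housing data).
nbhd <- factor(c(1, 2, 3, 2, 1, 3))

# No intercept: one indicator column per category.
X_no_intercept <- model.matrix(~ nbhd - 1)

# With intercept: the first category becomes the baseline and is dropped.
X_intercept <- model.matrix(~ nbhd)

print(X_no_intercept)
print(X_intercept)
```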
12 Dummy variable with no intercept
Call: lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + as.factor(nbhd) + Brick - 1)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
SqFt e-07 ***
Bedrooms
Bathrooms **
as.factor(nbhd)1
as.factor(nbhd)2 *
as.factor(nbhd)3 e-05 ***
BrickYes e-12 ***
Residual standard error: [value missing] on 121 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]
13 Dummy variable with an intercept
Call: lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + Brick + as.factor(nbhd))
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
(Intercept)
SqFt e-07 ***
Bedrooms
Bathrooms **
BrickYes e-12 ***
as.factor(nbhd)2
as.factor(nbhd)3 < 2e-16 ***
Residual standard error: [value missing] on 121 degrees of freedom
Multiple R-squared: 0.805, Adjusted R-squared: [value missing]
14 In-sample prediction accuracy
[Figure: price in dollars versus predicted price]
Our predictions are getting progressively better. Or are they?
15 Over-fitting
As you continue to add more and more predictors, you will notice R² gets closer and closer to 1. As a crazy thought experiment, would this happen even if we kept including garbage variables?
In addition to the variables above, let's include 100 junk variables (drawn from a normal distribution) and see what happens.
garbage <- matrix(rnorm(100*128),128,100)
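16 Predicting with garbage
Call: lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + as.factor(nbhd) + Brick - 1 + garbage)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries and the indices of the garbage columns are missing from the transcription, surviving fragments shown):
SqFt
Bedrooms
Bathrooms
as.factor(nbhd)1
as.factor(nbhd)2
as.factor(nbhd)3
BrickYes ***
...
garbage[index missing] *
...
garbage[index missing] *
Residual standard error: [value missing] on 21 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]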
17 Over-fitting
[Figure: price in dollars versus predicted price]
One simple way to check for over-fitting is to use a hold-out set of data and try to predict them without peeking.
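The garbage experiment and the hold-out check can be sketched together on simulated data. Everything below is an assumption for illustration: a single real predictor x, 60 junk columns (fewer than the 100 on the slide, so that the 96-row training set still has more rows than coefficients), and a 96/32 train/hold-out split. In-sample R² can only go up when columns are added; the hold-out error is the honest comparison.

```r
# Simulated data: one real predictor plus pure-noise columns (all made up).
set.seed(3)
n <- 128
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
junk <- as.data.frame(matrix(rnorm(n * 60), n, 60))  # columns V1..V60
dat  <- cbind(data.frame(y = y, x = x), junk)

# Hold out the last 32 rows; fit only on the first 96.
train <- dat[1:96, ]
test  <- dat[97:128, ]

fit_clean <- lm(y ~ x, data = train)   # real predictor only
fit_junk  <- lm(y ~ ., data = train)   # real predictor plus all junk columns

# In-sample fit: R-squared never decreases when predictors are added.
r2_clean <- summary(fit_clean)$r.squared
r2_junk  <- summary(fit_junk)$r.squared

# Hold-out accuracy: root mean squared error on rows the fit never saw.
rmse <- function(fit, newdata) {
  sqrt(mean((newdata$y - predict(fit, newdata = newdata))^2))
}
rmse_clean <- rmse(fit_clean, test)
rmse_junk  <- rmse(fit_junk, test)

print(c(r2_clean = r2_clean, r2_junk = r2_junk))
print(c(rmse_clean = rmse_clean, rmse_junk = rmse_junk))
```

If the junk model looks better in-sample but worse on the hold-out rows, that gap is the over-fitting.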
18 Interactions
What if we think that the price premium associated with brick might be different between different neighborhoods?
Call: lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + as.factor(nbhd):Brick - 1)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
SqFt e-07 ***
Bedrooms
Bathrooms *
as.factor(nbhd)1:BrickNo
as.factor(nbhd)2:BrickNo *
as.factor(nbhd)3:BrickNo e-05 ***
as.factor(nbhd)1:BrickYes **
as.factor(nbhd)2:BrickYes ***
as.factor(nbhd)3:BrickYes e-09 ***
Residual standard error: [value missing] on 119 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]
Adding an interaction term to our regression model explicitly accounts for this possibility.
19 Interactions
Here is an equivalent way to run this regression.
Call: lm(formula = Price ~ SqFt + Bedrooms + Bathrooms + as.factor(nbhd) * Brick - 1)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
SqFt e-07 ***
Bedrooms
Bathrooms *
as.factor(nbhd)1
as.factor(nbhd)2 *
as.factor(nbhd)3 e-05 ***
BrickYes *
as.factor(nbhd)2:BrickYes
as.factor(nbhd)3:BrickYes
Residual standard error: [value missing] on 119 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]
How can we see that these are equivalent? Which one do you prefer in terms of interpretation?
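One way to see the equivalence is to fit both formulas and compare fitted values. The sketch below does this on simulated data; the data frame d, its column values and the price-generating numbers are assumptions for illustration, and only SqFt is kept as a numeric predictor to keep it short. Both models have one coefficient per neighborhood-by-brick cell, just arranged differently, so the fitted values agree and the brick premium in neighborhood 1 can be read off either fit.

```r
# Simulated stand-in with a neighborhood-specific brick premium (made up).
set.seed(4)
n <- 120
d <- data.frame(
  SqFt  = runif(n, 1400, 2600),
  nbhd  = factor(sample(1:3, n, replace = TRUE)),
  Brick = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
d$Price <- 50 * d$SqFt + 10000 * as.integer(d$nbhd) +
  ifelse(d$Brick == "Yes", 5000 * as.integer(d$nbhd), 0) +
  rnorm(n, sd = 5000)

# Parameterization 1: one coefficient for each nbhd-by-Brick cell.
fit_cells <- lm(Price ~ SqFt + nbhd:Brick - 1, data = d)

# Parameterization 2: neighborhood levels, a brick effect, and adjustments.
fit_main <- lm(Price ~ SqFt + nbhd * Brick - 1, data = d)

# Brick premium in neighborhood 1, read off each fit.
premium_cells <- unname(coef(fit_cells)["nbhd1:BrickYes"] -
                        coef(fit_cells)["nbhd1:BrickNo"])
premium_main  <- unname(coef(fit_main)["BrickYes"])

print(all.equal(fitted(fit_cells), fitted(fit_main)))
print(c(premium_cells = premium_cells, premium_main = premium_main))
```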
20 Time series
Consider predicting the temperature based on day of the year. These are Chicago daily highs, in Fahrenheit.
[Figure: daily high temperature versus time]
We can often turn nonlinear problems into linear problems by transforming our predictor variables in various ways and using many of them to predict.
21 Transformations
Since we suspect a seasonal trend, let us create the following two predictor variables: x1 = sin(2πt/365) and x2 = cos(2πt/365). We then use least squares, via lm(), to find a linear prediction rule.
Call: lm(formula = y ~ x1 + x2)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
(Intercept) <2e-16 ***
x1 <2e-16 ***
x2 <2e-16 ***
Residual standard error: [value missing] on 1092 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]
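22 Nonlinear prediction via linear regression
[Figure: daily high temperature versus time]
yhat <- b0 + b1*sin(2*pi*t/365) + b2*cos(2*pi*t/365)
Here b0, b1 and b2 stand for the fitted intercept and the coefficients on x1 and x2; their numeric values are missing from the transcription.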
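A minimal sketch of the sine/cosine trick on a simulated temperature series. The series y, its mean of 59 degrees, the seasonal amplitudes and the noise level are all made up for illustration; the Chicago data are not reproduced here. Three years of daily values are used, which matches the 1092 residual degrees of freedom in the output above (1095 days minus 3 coefficients).

```r
# Simulated daily highs over three years (all numbers are assumptions).
set.seed(5)
t <- 1:1095
y <- 59 - 10 * sin(2 * pi * t / 365) - 22 * cos(2 * pi * t / 365) +
  rnorm(length(t), sd = 8)

# Transformed predictors: the model is linear in x1 and x2, not in t.
x1 <- sin(2 * pi * t / 365)
x2 <- cos(2 * pi * t / 365)
fit <- lm(y ~ x1 + x2)

# Rebuild the prediction rule by hand from the fitted coefficients.
b <- coef(fit)
yhat <- b[1] + b[2] * sin(2 * pi * t / 365) + b[3] * cos(2 * pi * t / 365)

print(round(b, 2))
print(df.residual(fit))
```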
23 Nonlinear prediction via linear regression
If we consider a string of 50 days, the daily high temps are sticky: the temp today looks like the temp in the preceding days.
[Figure: daily high temperature versus time]
This suggests we can use the previous days' weather to predict today's weather.
24 Auto-regression
Plotting today's weather versus tomorrow's weather gives a nice clean correlation.
[Figure: temperature tomorrow versus temperature today]
Running a linear regression will produce a prediction rule. What do you suppose the slope coefficient will be close to?
25 Auto-regression
Here's how we set this up in R.
> today <- y[1:149]
> tomorrow <- y[2:150]
> temp_auto_reg <- lm(tomorrow~today)
> summary(temp_auto_reg)
Call: lm(formula = tomorrow ~ today)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
(Intercept) **
today <2e-16 ***
Residual standard error: [value missing] on 147 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]
We still have nearly 9 degree swings from day to day.
26 Auto-regression
On a two-day lag the predictability decreases.
> today <- y[1:148]
> tomorrow <- y[2:149]
> dayaftertomorrow <- y[3:150]
Call: lm(formula = dayaftertomorrow ~ today)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
(Intercept) e-05 ***
today < 2e-16 ***
Residual standard error: [value missing] on 146 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]
The two-day variability is nearly 12 degrees.
27 Auto-regression
What happens if we include both today and yesterday to predict tomorrow?
> yesterday <- y[1:148]
> today <- y[2:149]
> tomorrow <- y[3:150]
Call: lm(formula = tomorrow ~ today + yesterday)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
(Intercept) **
today < 2e-16 ***
yesterday
Residual standard error: [value missing] on 145 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]
Yesterday's weather is old news!
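The lag construction above can be tried end to end on a simulated "sticky" series. The series below is an assumption for illustration (150 days, pulled toward 60 degrees with persistence 0.8 and noise of 8 degrees); it is not the Chicago data. The indexing matches the slides, so the residual degrees of freedom come out the same as in the output above.

```r
# Simulated sticky temperature series (all parameters are made up).
set.seed(6)
n <- 150
y <- numeric(n)
y[1] <- 60
for (i in 2:n) {
  y[i] <- 60 + 0.8 * (y[i - 1] - 60) + rnorm(1, sd = 8)
}

# Align the three lags so that row k holds days k, k+1 and k+2.
yesterday <- y[1:148]
today     <- y[2:149]
tomorrow  <- y[3:150]

# One-lag and two-lag auto-regressions on the same 148 rows.
fit_one <- lm(tomorrow ~ today)
fit_two <- lm(tomorrow ~ today + yesterday)

print(summary(fit_two)$coefficients)
print(c(r2_one = summary(fit_one)$r.squared,
        r2_two = summary(fit_two)$r.squared))
```

Comparing the two R² values, and the t value on yesterday, is one way to judge how much yesterday adds once today is in the model.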
28 MBA beer survey
How many beers can you drink before becoming drunk?
[Figure: height versus number of beers]
29 MBA beer survey
Height seems to be a valuable predictor of beer tolerance.
Call: lm(formula = nbeer ~ height)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
(Intercept) ***
height e-06 ***
Residual standard error: [value missing] on 48 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]
30 MBA beer survey
But weight also seems to be relevant.
[Figure: weight versus number of beers]
So weight and height both seem predictive, but is one more important than the other?
31 MBA beer survey
It appears that weight is the relevant variable.
Call: lm(formula = nbeer ~ height + weight)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
(Intercept)
height
weight ***
Residual standard error: [value missing] on 47 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]
On what basis is this determination made?
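To get a feel for how one predictor can look important on its own and then fade once a correlated predictor is added, here is a sketch on simulated survey data. The numbers are assumptions for illustration: 50 respondents, weight that rises with height, and a beer count that depends on weight only. The survey itself is not reproduced.

```r
# Simulated survey (all numbers are made up): nbeer depends on weight only.
set.seed(8)
n <- 50
height <- rnorm(n, mean = 68, sd = 3)
weight <- 150 + 5 * (height - 68) + rnorm(n, sd = 15)
nbeer  <- 0.05 * weight + rnorm(n, sd = 0.4)

# Height alone picks up the effect of weight, because the two move together.
fit_h <- lm(nbeer ~ height)

# With weight in the model, height has little left to explain.
fit_hw <- lm(nbeer ~ height + weight)

coef_height_alone <- unname(coef(fit_h)["height"])
coef_height_joint <- unname(coef(fit_hw)["height"])

print(summary(fit_h)$coefficients)
print(summary(fit_hw)$coefficients)
```

In the joint summary, the estimate for each variable is judged against its own standard error (the t value and p-value columns), holding the other variable fixed.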
32 Prediction versus intervention
We are always safe interpreting our regression models as prediction engines: steps the computer follows for turning data into forecasts. We are on much shakier ground when we try to interpret our regression coefficients as knobs to be adjusted.
As we reminded ourselves last week, correlation does not imply causation. Straight teeth do not cause nice cars, remember?
Essentially we have two alternate explanations: either causation in the other direction (umbrellas do not lead to rain), or common cause (rich folks have nice cars and nice teeth).
For the first, we have to use common sense. For the second problem (lurking confounders), we can possibly adjust or control for them.
33 Controlling = matching
When we include a variable in a regression, we sometimes say that we are controlling for that variable. The intuition is that if we compare like-with-like, then our regression parameters make good mechanistic sense.
So, presumably, if I looked only at groups of individuals in the same socio-economic status, there would be no remaining relationship between the quality of one's smile and the price of one's car.
What we are aiming for is a rich enough set of predictors that the variation within each slice of the population (observations) is random: there is no hidden structure to trick us.
34 Sales versus price
Suppose you own a taco truck. The past three years of weekly sales and price data look like this:
[Figure: sales versus price]
Apparently we should raise prices, right? Bigger price is better, clearly. Or is it?
35 Sales versus price
The result is statistically significant.
Call: lm(formula = sales ~ p1)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
(Intercept) <2e-16 ***
p1 <2e-16 ***
Residual standard error: [value missing] on 154 degrees of freedom
Multiple R-squared: [value missing], Adjusted R-squared: [value missing]
How should we interpret this result?
36 Price versus sales
What if we account for our competitor's price?
[Figure: competition price versus our price]
What do you suppose this tells us? What is this a proxy for?
37 Sales versus price
Our price is no longer statistically significant, but its least squares coefficient changes sign!
Call: lm(formula = sales ~ p1 + p2)
Coefficients (columns Estimate, Std. Error, t value, Pr(>|t|); numeric entries missing from the transcription, surviving fragments shown):
(Intercept) < 2e-16 ***
p1
p2 e-05 ***
Residual standard error: [value missing] on 153 degrees of freedom
Multiple R-squared: 0.525, Adjusted R-squared: [value missing]
38 Simpson's paradox, revisited
Within each color, what is the sign of the slope?
[Figure: sales versus our price, with colored points]
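The sign flip can be reproduced on simulated data. In this sketch every number is an assumption for illustration: 156 weeks, a competitor price p2, our price p1 that closely tracks p2, and sales that fall with p1 and rise with p2. Regressing sales on p1 alone gives a positive slope, because p1 is standing in for p2; adding p2 recovers the negative own-price effect.

```r
# Simulated taco-truck data (all parameters are made up).
set.seed(9)
n  <- 156                          # three years of weekly observations
p2 <- rnorm(n, mean = 5, sd = 1)   # competitor's price
p1 <- p2 + rnorm(n, sd = 0.3)      # our price tracks the competitor's
sales <- 100 - 3 * p1 + 5 * p2 + rnorm(n)

# Naive regression: our price only.
fit_naive <- lm(sales ~ p1)

# Controlled regression: our price and the competitor's price.
fit_control <- lm(sales ~ p1 + p2)

slope_naive   <- unname(coef(fit_naive)["p1"])
slope_control <- unname(coef(fit_control)["p1"])

print(c(slope_naive = slope_naive, slope_control = slope_control))
```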
39 The kitchen sink regression
In an effort to clear out all unwanted confounding, so that we can interpret our regression coefficients cleanly, we often reach for any and all available predictor variables.
But this has its downsides. Specifically, there are both statistical and interpretational reasons not to do this.
We have already seen the statistical argument: we will tend to over-fit, and we become less certain about our estimates because our effective sample size decreases as we add more predictor variables.
But there is another reason not to just throw everything into our regression models willy-nilly.
40 Intermediate outcomes
Suppose we want to learn about how smoking relates to cancer rates by zip code. That is, Y = cancer rate is our response/outcome variable and X = smoking rate is our predictor variable.
To avoid confounding, we control for many other attributes, such as average income, racial make-up, average age, crime rates, etc.
Suppose we also included a measure of lung tar in our regression. What do you suppose would happen to the estimated impact of smoking?
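One way to explore this question is a simulation in which smoking affects cancer only through tar. The setup below is entirely hypothetical (500 zip codes, standardized variables, made-up effect sizes); it is a sketch of the mechanism, not an analysis of real data. When tar is added to the regression, the coefficient on smoking shrinks toward zero even though smoking is the root cause in the simulation.

```r
# Hypothetical chain: smoking -> tar -> cancer (all numbers are made up).
set.seed(10)
n <- 500
smoking <- rnorm(n)
tar     <- smoking + rnorm(n, sd = 0.5)   # tar is driven by smoking
cancer  <- 2 * tar + rnorm(n)             # cancer depends on tar only

# Total effect of smoking: tar left out of the regression.
fit_total <- lm(cancer ~ smoking)

# Controlling for the intermediate outcome absorbs the effect of smoking.
fit_mediator <- lm(cancer ~ smoking + tar)

coef_total    <- unname(coef(fit_total)["smoking"])
coef_mediator <- unname(coef(fit_mediator)["smoking"])

print(c(coef_total = coef_total, coef_mediator = coef_mediator))
```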
STATISTICS 110/201, FALL 2017 Homework #5 Solutions Assigned Mon, November 6, Due Wed, November 15 For this assignment use the Diamonds dataset in the Stat2Data library. The dataset is used in examples
More informationChapter 8 Statistical Intervals for a Single Sample
Chapter 8 Statistical Intervals for a Single Sample Part 1: Confidence intervals (CI) for population mean µ Section 8-1: CI for µ when σ 2 known & drawing from normal distribution Section 8-1.2: Sample
More informationRisk Neutral Agent. Class 4
Risk Neutral Agent Class 4 How to Pay Tree Planters? Consequences of Hidden Action q=e+u u (0, ) c(e)=0.5e 2 Agent is risk averse Principal is risk neutral w = a + bq No Hidden Action Hidden Action b*
More informationLogit Models for Binary Data
Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis These models are appropriate when the response
More informationWEB APPENDIX 8A 7.1 ( 8.9)
WEB APPENDIX 8A CALCULATING BETA COEFFICIENTS The CAPM is an ex ante model, which means that all of the variables represent before-the-fact expected values. In particular, the beta coefficient used in
More informationStochastic Manufacturing & Service Systems. Discrete-time Markov Chain
ISYE 33 B, Fall Week #7, September 9-October 3, Introduction Stochastic Manufacturing & Service Systems Xinchang Wang H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of
More informationMODEL SELECTION CRITERIA IN R:
1. R 2 statistics We may use MODEL SELECTION CRITERIA IN R R 2 = SS R SS T = 1 SS Res SS T or R 2 Adj = 1 SS Res/(n p) SS T /(n 1) = 1 ( ) n 1 (1 R 2 ). n p where p is the total number of parameters. R
More informationTest #1 (Solution Key)
STAT 47/67 Test #1 (Solution Key) 1. (To be done by hand) Exploring his own drink-and-drive habits, a student recalls the last 7 parties that he attended. He records the number of cans of beer he drank,
More informationCharacterization of the Optimum
ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 5. Portfolio Allocation with One Riskless, One Risky Asset Characterization of the Optimum Consider a risk-averse, expected-utility-maximizing
More information36106 Managerial Decision Modeling Sensitivity Analysis
1 36106 Managerial Decision Modeling Sensitivity Analysis Kipp Martin University of Chicago Booth School of Business September 26, 2017 Reading and Excel Files 2 Reading (Powell and Baker): Section 9.5
More informationThe basic goal of regression analysis is to use data to analyze relationships.
01-Kahane-45364.qxd 11/9/2007 4:39 PM Page 1 1 An Introduction to the Linear Regression Model The basic goal of regression analysis is to use data to analyze relationships. Thus, the starting point for
More informationHedging and Regression. Hedging and Regression
Returns The discrete return on a stock is the percentage change: S i S i 1 S i 1. The index i can represent days, weeks, hours etc. What happens if we compute returns at infinitesimally short intervals
More informationChapter 5. Sampling Distributions
Lecture notes, Lang Wu, UBC 1 Chapter 5. Sampling Distributions 5.1. Introduction In statistical inference, we attempt to estimate an unknown population characteristic, such as the population mean, µ,
More informationLecture 3: Factor models in modern portfolio choice
Lecture 3: Factor models in modern portfolio choice Prof. Massimo Guidolin Portfolio Management Spring 2016 Overview The inputs of portfolio problems Using the single index model Multi-index models Portfolio
More informationBusiness Statistics 41000: Probability 3
Business Statistics 41000: Probability 3 Drew D. Creal University of Chicago, Booth School of Business February 7 and 8, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office: 404
More informationNCC5010: Data Analytics and Modeling Spring 2015 Exemption Exam
NCC5010: Data Analytics and Modeling Spring 2015 Exemption Exam Do not look at other pages until instructed to do so. The time limit is two hours. This exam consists of 6 problems. Do all of your work
More informationConfidence Intervals. σ unknown, small samples The t-statistic /22
Confidence Intervals σ unknown, small samples The t-statistic 1 /22 Homework Read Sec 7-3. Discussion Question pg 365 Do Ex 7-3 1-4, 6, 9, 12, 14, 15, 17 2/22 Objective find the confidence interval for
More informationJacob: What data do we use? Do we compile paid loss triangles for a line of business?
PROJECT TEMPLATES FOR REGRESSION ANALYSIS APPLIED TO LOSS RESERVING BACKGROUND ON PAID LOSS TRIANGLES (The attached PDF file has better formatting.) {The paid loss triangle helps you! distinguish between
More informationHomework Assignments for BusAdm 713: Business Forecasting Methods. Assignment 1: Introduction to forecasting, Review of regression
Homework Assignments for BusAdm 713: Business Forecasting Methods Note: Problem points are in parentheses. Assignment 1: Introduction to forecasting, Review of regression 1. (3) Complete the exercises
More informationLinear functions Increasing Linear Functions. Decreasing Linear Functions
3.5 Increasing, Decreasing, Max, and Min So far we have been describing graphs using quantitative information. That s just a fancy way to say that we ve been using numbers. Specifically, we have described
More informationMULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
CHAPTER FORM A Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Determine whether the given ordered pair is a solution of the given equation.
More informationGraduate School of Business, University of Chicago Business 41202, Spring Quarter 2007, Mr. Ruey S. Tsay. Midterm
Graduate School of Business, University of Chicago Business 41202, Spring Quarter 2007, Mr. Ruey S. Tsay Midterm GSB Honor Code: I pledge my honor that I have not violated the Honor Code during this examination.
More informationModule 4: Point Estimation Statistics (OA3102)
Module 4: Point Estimation Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 8.1-8.4 Revision: 1-12 1 Goals for this Module Define
More informationThe Simple Regression Model
Chapter 2 Wooldridge: Introductory Econometrics: A Modern Approach, 5e Definition of the simple linear regression model Explains variable in terms of variable Intercept Slope parameter Dependent variable,
More informationSTA 371G Outline Fall 2018
STA 371G Outline Fall 2018 Instruct: Mingyuan Zhou, Ph.D., Assistant Profess of Statistics Office: CBA 6.458 Phone: 512-232-6763 Email: mingyuan.zhou@mccombs.utexas.edu Website: http://mingyuanzhou.github.io/
More informationThe Norwegian State Equity Ownership
The Norwegian State Equity Ownership B A Ødegaard 15 November 2018 Contents 1 Introduction 1 2 Doing a performance analysis 1 2.1 Using R....................................................................
More informationb) According to the statistics above the graph, the slope is What are the units and meaning of this value?
! Name: Date: Hr: LINEAR MODELS Writing Motion Equations 1) Answer the following questions using the position vs. time graph of a runner in a race shown below. Be sure to show all work (formula, substitution,
More informationCase 2: Motomart INTRODUCTION OBJECTIVES
Case 2: Motomart INTRODUCTION The Motomart case is designed to supplement your Managerial/ Cost Accounting textbook coverage of cost behavior and variable costing using real-world cost data and an auto-industryaccepted
More information