The Simple Regression Model

Chapter 2 Wooldridge: Introductory Econometrics: A Modern Approach, 5e

Definition of the simple linear regression model
"Explains variable $y$ in terms of variable $x$":
$$y = \beta_0 + \beta_1 x + u$$
$\beta_0$: intercept; $\beta_1$: slope parameter.
$y$: dependent variable, explained variable, response variable, predicted variable, regressand.
$x$: independent variable, explanatory variable, control variable, predictor variable, regressor.
$u$: error term, disturbance, unobservables.

Interpretation of the simple linear regression model
The model "studies how $y$ varies with changes in $x$":
$$\Delta y = \beta_1 \Delta x \quad \text{as long as} \quad \Delta u = 0$$
By how much does the dependent variable change if the independent variable is increased by one unit? The interpretation is only correct if all other things remain equal when the independent variable is increased by one unit.
The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons.

Example: Soybean yield and fertilizer
$$\text{yield} = \beta_0 + \beta_1\,\text{fertilizer} + u$$
$\beta_1$ measures the effect of fertilizer on yield, holding all other factors fixed. $u$ contains rainfall, land quality, presence of parasites, and so on.

Example: A simple wage equation
$$\text{wage} = \beta_0 + \beta_1\,\text{educ} + u$$
$\beta_1$ measures the change in hourly wage given another year of education, holding all other factors fixed. $u$ contains labor force experience, tenure with current employer, work ethic, intelligence, and so on.

When is there a causal interpretation?
Conditional mean independence assumption:
$$E(u \mid x) = E(u) = 0$$
The explanatory variable must not contain information about the mean of the unobserved factors.
Example: in the wage equation, intelligence (ability) is one of the unobserved factors, so the assumption requires
$$E(\text{abil} \mid \text{educ}) = E(\text{abil})$$
The conditional mean independence assumption is unlikely to hold here because individuals with more education will also be more intelligent on average.

Population regression function (PRF)
The conditional mean independence assumption implies that
$$E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x) = \beta_0 + \beta_1 x + E(u \mid x) = \beta_0 + \beta_1 x$$
This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable.
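To see why a failure of conditional mean independence matters, the sketch below simulates a hypothetical wage equation in which ability sits inside $u$. All coefficients and distributions are made-up assumptions for illustration, not estimates from the slides. When education is generated partly from ability, $E(u \mid x)$ changes with $x$, and the OLS slope no longer recovers $\beta_1$.

```python
import random


def ols_slope(x, y):
    """OLS slope: sum of cross-deviations over sum of squared x-deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulated_slope(educ_depends_on_ability, n=20_000, beta1=0.5, seed=1):
    """Hypothetical model: wage = 1 + beta1*educ + u, with u = ability + noise.

    If educ_depends_on_ability is True, educ is built from ability, so
    E(u | educ) varies with educ and E(u | x) = E(u) fails.
    """
    rng = random.Random(seed)
    educ, wage = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        # Education is driven either by ability itself or by an independent draw.
        driver = ability if educ_depends_on_ability else rng.gauss(0, 1)
        educ_i = 12 + 2 * driver + rng.gauss(0, 1)
        u_i = ability + rng.gauss(0, 1)
        educ.append(educ_i)
        wage.append(1 + beta1 * educ_i + u_i)
    return ols_slope(educ, wage)


print(round(simulated_slope(False), 2))  # near the true beta1 = 0.5
print(round(simulated_slope(True), 2))   # near 0.5 + 2/5 = 0.9
```

In the second case the slope also picks up $\operatorname{Cov}(\text{educ}, \text{abil}) / \operatorname{Var}(\text{educ}) = 2 / (2^2 + 1) = 0.4$, the part of the ability effect that moves together with education in this made-up design.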

Population regression function
For individuals with a given value of $x$, the average value of $y$ is $\beta_0 + \beta_1 x$.

In order to estimate the regression model, one needs data: a random sample of $n$ observations
$$\{(x_i, y_i) : i = 1, \ldots, n\}$$
First observation $(x_1, y_1)$, second observation $(x_2, y_2)$, third observation $(x_3, y_3)$, ..., $n$-th observation $(x_n, y_n)$.
$x_i$: value of the explanatory variable of the $i$-th observation. $y_i$: value of the dependent variable of the $i$-th observation.

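Fit a regression line through the data points as well as possible, for example through the $i$-th data point $(x_i, y_i)$.
Fitted regression line:
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
What does "as well as possible" mean?
Regression residuals:
$$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$$
Minimize the sum of squared regression residuals:
$$\min_{\hat{\beta}_0,\, \hat{\beta}_1} \sum_{i=1}^{n} \hat{u}_i^2$$
The solution gives the Ordinary Least Squares (OLS) estimates:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$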
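Minimizing the sum of squared residuals has the standard closed-form solution: the slope is $\sum (x_i - \bar{x})(y_i - \bar{y})$ divided by $\sum (x_i - \bar{x})^2$, and the intercept is $\bar{y} - \hat{\beta}_1 \bar{x}$. A minimal pure-Python sketch follows; the function name `ols_simple` is our own choice, not something from the slides.

```python
def ols_simple(x, y):
    """OLS estimates (b0_hat, b1_hat) for the simple regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the closed-form slope estimate
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are equal: the slope is not identified")
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar   # the fitted line passes through (x_bar, y_bar)
    return b0_hat, b1_hat


print(ols_simple([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0): the points lie on y = 1 + 2x
print(ols_simple([1, 2, 3], [2, 2, 5]))        # (0.0, 1.5)
```

The explicit check on `sxx` mirrors the requirement that the $x$ values are not all the same; without it the slope formula would divide by zero.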

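CEO salary and return on equity
$$\text{salary} = \beta_0 + \beta_1\,\text{roe} + u$$
salary: salary in thousands of dollars; roe: return on equity of the CEO's firm, in percent.
Fitted regression:
$$\widehat{\text{salary}} = 963.191 + 18.501\,\text{roe}$$
The intercept is 963.191. Causal interpretation?
If the return on equity increases by one percentage point, then salary is predicted to change by 18.501 thousand dollars, i.e., 18,501 dollars.
(Figure: the fitted regression line, which depends on the sample, shown together with the unknown population regression line.)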

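Wage and education
$$\text{wage} = \beta_0 + \beta_1\,\text{educ} + u$$
wage: hourly wage in dollars; educ: years of education.
Fitted regression:
$$\widehat{\text{wage}} = -0.90 + 0.54\,\text{educ}$$
The intercept is -0.90. Causal interpretation?
In the sample, one more year of education was associated with an increase in hourly wage of 0.54 dollars.

Voting outcomes and campaign expenditures (two parties)
$$\text{voteA} = \beta_0 + \beta_1\,\text{shareA} + u$$
voteA: percentage of the vote for candidate A; shareA: percentage of total campaign expenditures accounted for by candidate A.
Fitted regression:
$$\widehat{\text{voteA}} = 26.81 + 0.464\,\text{shareA}$$
The intercept is 26.81. Causal interpretation?
If candidate A's share of spending increases by one percentage point, he or she is predicted to receive 0.464 percentage points more of the total vote.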
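Because the intercept cancels when two predictions are compared, a fitted slope alone gives the predicted change in $y$ for any change in $x$. The sketch below applies this to the three slopes reported above (18.501, 0.54, 0.464); the constant names and the chosen changes in $x$ are illustrative assumptions.

```python
# Slopes reported on the slides, with the units of y per unit of x.
CEO_SALARY_PER_ROE_POINT = 18.501  # thousands of dollars per percentage point of roe
WAGE_PER_YEAR_EDUC = 0.54          # dollars per hour per year of education
VOTE_PER_SHARE_POINT = 0.464       # percentage points of vote per point of spending share


def predicted_change(slope, delta_x):
    """For y_hat = b0 + b1*x, changing x by delta_x changes y_hat by b1*delta_x."""
    return slope * delta_x


# One more point of roe: 18.501 thousand dollars, i.e. 18,501 dollars.
print(round(predicted_change(CEO_SALARY_PER_ROE_POINT, 1) * 1000))
# Four more years of education: predicted hourly wage about 2.16 dollars higher.
print(round(predicted_change(WAGE_PER_YEAR_EDUC, 4), 2))
# Ten more points of spending share: about 4.64 more points of the vote.
print(round(predicted_change(VOTE_PER_SHARE_POINT, 10), 2))
```

The multiplication by 1000 in the CEO case only converts the slide's "thousands of dollars" unit into dollars.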

Properties of OLS on any sample of data
Fitted or predicted values:
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$
Deviations from the regression line (= residuals):
$$\hat{u}_i = y_i - \hat{y}_i$$
Algebraic properties of the OLS regression:
Deviations from the regression line sum to zero: $\sum_{i=1}^{n} \hat{u}_i = 0$
The sample correlation between the deviations and the regressor is zero: $\sum_{i=1}^{n} x_i \hat{u}_i = 0$
The sample averages of $y$ and $x$ lie on the regression line: $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$
For example, CEO number 12's salary was 526,023 dollars lower than predicted using the information on his firm's return on equity.
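The three algebraic properties hold for any data set, which can be checked numerically. The sketch below uses a small hypothetical data set (the numbers carry no meaning) and verifies each property up to floating-point rounding.

```python
def ols_simple(x, y):
    """Closed-form OLS estimates (b0_hat, b1_hat)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data, used only to check the algebraic properties.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.1, 4.0, 7.9, 8.2, 15.0]

b0, b1 = ols_simple(x, y)
fitted = [b0 + b1 * xi for xi in x]             # y_hat_i
resid = [yi - fi for yi, fi in zip(y, fitted)]  # u_hat_i = y_i - y_hat_i
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)

sum_resid = sum(resid)                                 # property 1: equals 0
sum_x_resid = sum(xi * ui for xi, ui in zip(x, resid)) # property 2: equals 0
gap_at_means = y_bar - (b0 + b1 * x_bar)               # property 3: equals 0

print(abs(sum_resid) < 1e-9, abs(sum_x_resid) < 1e-9, abs(gap_at_means) < 1e-9)
```

Properties 1 and 2 are exactly the two first-order conditions of the least-squares minimization, which is why they hold in every sample rather than only on average.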

Goodness-of-fit
"How well does the explanatory variable explain the dependent variable?"
Measures of variation:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad SSR = \sum_{i=1}^{n} \hat{u}_i^2$$
Total sum of squares (SST): represents the total variation in the dependent variable.
Explained sum of squares (SSE): represents the variation explained by the regression.
Residual sum of squares (SSR): represents the variation not explained by the regression.
Decomposition of total variation (total variation = explained part + unexplained part):
$$SST = SSE + SSR$$
Goodness-of-fit measure (R-squared):
$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$$
R-squared measures the fraction of the total variation that is explained by the regression.
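A minimal sketch of the variance decomposition, assuming a small hypothetical data set: it computes SST, SSE and SSR from an OLS fit and shows that both R-squared formulas agree. The function name `goodness_of_fit` is our own.

```python
def goodness_of_fit(x, y):
    """Return SST, SSE, SSR and R-squared for the simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    sse = sum((fi - y_bar) ** 2 for fi in fitted)           # explained part
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained part
    return sst, sse, ssr, sse / sst


sst, sse, ssr, r2 = goodness_of_fit([1, 2, 3, 4], [2, 4, 5, 4])
print(round(sst, 6), round(sse + ssr, 6))     # SST equals SSE + SSR
print(round(r2, 4), round(1 - ssr / sst, 4))  # the two R-squared formulas agree
```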

CEO salary and return on equity: the regression explains only 1.3% of the total variation in salaries.
Voting outcomes and campaign expenditures: the regression explains 85.6% of the total variation in election outcomes.
Caution: A high R-squared does not necessarily mean that the regression has a causal interpretation!

Incorporating nonlinearities: semi-logarithmic form
Regression of log wages on years of education, where $\log$ denotes the natural logarithm:
$$\log(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + u$$
This changes the interpretation of the regression coefficient:
$$\beta_1 = \frac{\Delta \log(\text{wage})}{\Delta \text{educ}} \approx \frac{\Delta \text{wage} / \text{wage}}{\Delta \text{educ}}$$
so $100 \cdot \beta_1$ is approximately the percentage change of the wage if years of education are increased by one year.

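Fitted regression:
$$\widehat{\log(\text{wage})} = 0.584 + 0.083\,\text{educ}$$
The wage increases by about 8.3% for every additional year of education (= return to education). In other words, the growth rate of the wage is about 8.3% per year of education.

Incorporating nonlinearities: log-logarithmic form
CEO salary and firm sales, with the natural logarithm of CEO salary as the dependent variable and the natural logarithm of his or her firm's sales as the regressor:
$$\log(\text{salary}) = \beta_0 + \beta_1 \log(\text{sales}) + u$$
This changes the interpretation of the regression coefficient:
$$\beta_1 = \frac{\Delta \log(\text{salary})}{\Delta \log(\text{sales})} \approx \frac{\Delta \text{salary} / \text{salary}}{\Delta \text{sales} / \text{sales}}$$
so $\beta_1$ is approximately the percentage change of salary if sales increase by 1%.
Logarithmic changes are approximately percentage changes; the approximation is good for small changes.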
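Since log changes only approximate percentage changes, it can help to compare the approximation with the exact effect implied by each functional form. The sketch below uses the slopes 0.083 (semi-log wage equation) and 0.257 (log-log CEO salary equation) reported on these slides; the function names and the 50% example are our own assumptions.

```python
import math


def semi_log_change(b1, delta_x=1.0):
    """log(y) = b0 + b1*x: approximate and exact % change in y when x rises by delta_x."""
    approx = 100 * b1 * delta_x
    exact = 100 * (math.exp(b1 * delta_x) - 1)
    return approx, exact


def log_log_change(b1, pct_x=1.0):
    """log(y) = b0 + b1*log(x): approximate and exact % change in y when x rises by pct_x %."""
    approx = b1 * pct_x
    exact = 100 * ((1 + pct_x / 100) ** b1 - 1)
    return approx, exact


print([round(v, 2) for v in semi_log_change(0.083)])     # one more year of education
print([round(v, 4) for v in log_log_change(0.257)])      # 1% more sales
print([round(v, 2) for v in log_log_change(0.257, 50)])  # 50% more sales: the gap widens
```

For one year of education the exact effect is about 8.65% rather than 8.3%; for a 1% increase in sales the two numbers are almost identical, consistent with the approximation being reliable only for small changes.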

CEO salary and firm sales: fitted regression
$$\widehat{\log(\text{salary})} = 4.822 + 0.257 \log(\text{sales})$$
For example: +1% sales → +0.257% salary.
The log-log form postulates a constant elasticity model, whereas the semi-log form assumes a semi-elasticity model.

Expected values and variances of the OLS estimators
The estimated regression coefficients are random variables because they are calculated from a random sample:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
The data are random and depend on the particular sample that has been drawn.
The question is what the estimators will estimate on average and how large their variability in repeated samples is.

Standard assumptions for the linear regression model
Assumption SLR.1 (Linear in parameters): in the population, the relationship between $y$ and $x$ is linear:
$$y = \beta_0 + \beta_1 x + u$$
Assumption SLR.2 (Random sampling): the data are a random sample drawn from the population,
$$\{(x_i, y_i) : i = 1, \ldots, n\}$$
so each data point follows the population equation:
$$y_i = \beta_0 + \beta_1 x_i + u_i$$

Discussion of random sampling: wage and education
The population consists, for example, of all workers of country A.
In the population, a linear relationship between wages (or log wages) and years of education holds.
Draw a worker completely at random from the population.
The wage and the years of education of the worker drawn are random because one does not know beforehand which worker is drawn.
Return the worker to the population and repeat the random draw $n$ times.
The wages and years of education of the sampled workers are used to estimate the linear relationship between wages and education.
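The sampling scheme described above (draw a worker at random, return the worker, repeat $n$ times) is sampling with replacement. A minimal sketch, assuming a made-up finite population of (educ, wage) pairs for "country A" with a linear relationship of slope 0.5; `random.Random.choices` draws with replacement and equal probabilities.

```python
import random


def make_population(size=5_000, beta0=1.0, beta1=0.5, seed=0):
    """Hypothetical population of (educ, wage) pairs for 'country A'."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        educ = rng.randint(8, 20)                      # years of education
        wage = beta0 + beta1 * educ + rng.gauss(0, 2)  # linear relation plus noise
        population.append((educ, wage))
    return population


def draw_random_sample(population, n, rng):
    """Draw n workers independently with equal probability, with replacement."""
    return rng.choices(population, k=n)


def ols_slope(pairs):
    """OLS slope from a list of (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    return sxy / sxx


population = make_population()
sample = draw_random_sample(population, n=2_000, rng=random.Random(42))
print(len(sample), round(ols_slope(sample), 2))  # slope should be close to 0.5
```

Because every draw comes from the same population, each sampled pair satisfies the population equation, which is exactly what SLR.2 states.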

The values drawn for the $i$-th worker: $(\text{wage}_i, \text{educ}_i)$.
The implied deviation from the population relationship for the $i$-th worker:
$$u_i = \text{wage}_i - \beta_0 - \beta_1\,\text{educ}_i$$

Assumptions for the linear regression model (cont.)
Assumption SLR.3 (Sample variation in the explanatory variable):
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$$
The values of the explanatory variable are not all the same (otherwise it would be impossible to study how different values of the explanatory variable lead to different values of the dependent variable).
Assumption SLR.4 (Zero conditional mean):
$$E(u_i \mid x_i) = 0$$
The value of the explanatory variable must contain no information about the mean of the unobserved factors.

Theorem 2.1 (Unbiasedness of OLS)
Under assumptions SLR.1 to SLR.4:
$$E(\hat{\beta}_0) = \beta_0, \qquad E(\hat{\beta}_1) = \beta_1$$
Interpretation of unbiasedness:
The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw.
However, on average, they will be equal to the values that characterize the true relationship between $y$ and $x$ in the population.
"On average" means if sampling were repeated, i.e., if drawing the random sample and doing the estimation were repeated many times.
In a given sample, estimates may differ considerably from the true values.

Variances of the OLS estimators
Depending on the sample, the estimates will be nearer to or farther away from the true population values.
How far can we expect our estimates to be from the true population values on average (= sampling variability)?
Sampling variability is measured by the estimators' variances:
$$\operatorname{Var}(\hat{\beta}_0), \qquad \operatorname{Var}(\hat{\beta}_1)$$
Assumption SLR.5 (Homoskedasticity):
$$\operatorname{Var}(u_i \mid x_i) = \sigma^2$$
The value of the explanatory variable must contain no information about the variability of the unobserved factors.
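"On average" can be made concrete with a small Monte Carlo experiment: draw many random samples from a population that satisfies SLR.1 to SLR.4, estimate by OLS each time, and average the estimates. The true parameters, sample size, and distributions below are arbitrary assumptions chosen for illustration.

```python
import random


def ols(x, y):
    """Closed-form OLS estimates (b0_hat, b1_hat)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


def monte_carlo(reps=2_000, n=20, beta0=1.0, beta1=2.0, sigma=1.0, seed=7):
    """Repeat 'draw a random sample, estimate by OLS' many times."""
    rng = random.Random(seed)
    b0s, b1s = [], []
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]
        u = [rng.gauss(0, sigma) for _ in range(n)]  # drawn independently of x: E(u | x) = 0
        y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        b0, b1 = ols(x, y)
        b0s.append(b0)
        b1s.append(b1)
    return b0s, b1s


b0s, b1s = monte_carlo()
print(round(min(b1s), 2), round(max(b1s), 2))  # individual estimates vary across samples
print(round(sum(b1s) / len(b1s), 2))           # their average is close to beta1 = 2.0
print(round(sum(b0s) / len(b0s), 2))           # and close to beta0 = 1.0
```

The spread between the smallest and largest slope estimates illustrates the warning that a single sample can be far from the truth even though the estimator is unbiased.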

Graphical illustration of homoskedasticity: the variability of the unobserved influences does not depend on the value of the explanatory variable.
An example of heteroskedasticity: wage and education. The variance of the unobserved determinants of wages increases with the level of education.
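The difference between the two cases can be seen in the residuals. The sketch below simulates hypothetical wage data at two education levels (10 and 16 years) and reports the residual variance in each group. The heteroskedastic form, a standard deviation of $0.2 \cdot \text{educ}$, is an assumption made up for this illustration.

```python
import random
import statistics


def residual_variance_by_educ(heteroskedastic, n_per_group=5_000, seed=3):
    """Hypothetical wage data with educ in {10, 16}; return residual variance per group.

    Homoskedastic: sd(u | educ) = 2 for everyone.
    Heteroskedastic (assumed form): sd(u | educ) = 0.2 * educ.
    """
    rng = random.Random(seed)
    data = []
    for educ in (10, 16):
        sd = 0.2 * educ if heteroskedastic else 2.0
        for _ in range(n_per_group):
            data.append((educ, 1.0 + 0.5 * educ + rng.gauss(0, sd)))
    # OLS fit on all observations
    n = len(data)
    x_bar = sum(e for e, _ in data) / n
    y_bar = sum(w for _, w in data) / n
    sxx = sum((e - x_bar) ** 2 for e, _ in data)
    b1 = sum((e - x_bar) * (w - y_bar) for e, w in data) / sxx
    b0 = y_bar - b1 * x_bar
    # Group the residuals by education level
    resid = {10: [], 16: []}
    for e, w in data:
        resid[e].append(w - b0 - b1 * e)
    return {e: statistics.pvariance(r) for e, r in resid.items()}


print(residual_variance_by_educ(False))  # both groups near 4
print(residual_variance_by_educ(True))   # about 4 at educ = 10, about 10.24 at educ = 16
```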

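Theorem 2.2 (Variances of the OLS estimators)
Under assumptions SLR.1 to SLR.5, with $SST_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$:
$$\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_x}, \qquad \operatorname{Var}(\hat{\beta}_0) = \frac{\sigma^2\, n^{-1} \sum_{i=1}^{n} x_i^2}{SST_x}$$
Conclusion: the sampling variability of the estimated regression coefficients is higher the larger the variability of the unobserved factors, and lower the higher the variation in the explanatory variable.

Estimating the error variance
The variance of $u$ does not depend on $x$, i.e., it is equal to the unconditional variance:
$$\operatorname{Var}(u_i \mid x_i) = \sigma^2 = \operatorname{Var}(u_i)$$
One could estimate the variance of the errors by calculating the variance of the residuals in the sample; unfortunately, this estimate would be biased:
$$\tilde{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2$$
An unbiased estimate of the error variance is obtained by subtracting the number of estimated regression coefficients (here two) from the number of observations, where SSR is the residual sum of squares:
$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2 = \frac{SSR}{n-2}$$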
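Both claims can be checked by simulation: the variance of $\hat{\beta}_1$ across repeated samples should match $\sigma^2 / SST_x$, and dividing SSR by $n$ should understate $\sigma^2$ by the factor $(n-2)/n$. The sketch holds the $x$ values fixed across replications, since the variance formula is conditional on them; the design $x = 1, \ldots, 10$ and the parameter values are assumptions for illustration.

```python
import random


def check_variance_formulas(reps=10_000, beta0=1.0, beta1=2.0, sigma=2.0, seed=11):
    """Monte Carlo check of Var(b1_hat) and of the two error-variance estimators."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 11)]        # hypothetical fixed design, n = 10
    n = len(x)
    x_bar = sum(x) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)  # 82.5 for x = 1..10
    b1s, naive, unbiased = [], [], []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
        y_bar = sum(y) / n
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
        b0 = y_bar - b1 * x_bar
        ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        b1s.append(b1)
        naive.append(ssr / n)           # divides by n: biased downward
        unbiased.append(ssr / (n - 2))  # divides by n - 2: unbiased
    mean_b1 = sum(b1s) / reps
    return {
        "var_b1_simulated": sum((b - mean_b1) ** 2 for b in b1s) / reps,
        "var_b1_formula": sigma ** 2 / sst_x,
        "mean_naive": sum(naive) / reps,        # about sigma^2 * (n - 2) / n = 3.2
        "mean_unbiased": sum(unbiased) / reps,  # about sigma^2 = 4.0
    }


for name, value in check_variance_formulas().items():
    print(name, round(value, 4))
```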

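Theorem 2.3 (Unbiasedness of the error variance)
Under assumptions SLR.1 to SLR.5: $E(\hat{\sigma}^2) = \sigma^2$.

Calculation of standard errors for the regression coefficients
Plug in $\hat{\sigma}^2$ for the unknown $\sigma^2$, with $SST_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$:
$$\operatorname{se}(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{SST_x}}, \qquad \operatorname{se}(\hat{\beta}_0) = \sqrt{\frac{\hat{\sigma}^2\, n^{-1} \sum_{i=1}^{n} x_i^2}{SST_x}}$$
The estimated standard deviations of the regression coefficients are called "standard errors". They measure how precisely the regression coefficients are estimated.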
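Putting the pieces together, a minimal sketch that computes the OLS estimates, $\hat{\sigma}^2 = SSR/(n-2)$, and both standard errors for a small hypothetical data set. The function name `ols_with_standard_errors` is our own; regression software reports the same quantities under the homoskedasticity assumption.

```python
import math


def ols_with_standard_errors(x, y):
    """Return b0_hat, b1_hat, sigma2_hat, se(b0_hat), se(b1_hat) for a simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 2)  # unbiased error-variance estimate
    se_b1 = math.sqrt(sigma2_hat / sst_x)
    se_b0 = math.sqrt(sigma2_hat * sum(xi ** 2 for xi in x) / (n * sst_x))
    return b0, b1, sigma2_hat, se_b0, se_b1


b0, b1, s2, se_b0, se_b1 = ols_with_standard_errors([1, 2, 3, 4], [2, 4, 5, 4])
print(round(b0, 3), round(b1, 3), round(s2, 3))  # 2.0 0.7 1.15
print(round(se_b0, 3), round(se_b1, 3))          # 1.313 0.48
```

Here $SST_x = 5$, $SSR = 2.3$ and $n = 4$, so $\hat{\sigma}^2 = 2.3/2 = 1.15$ and $\operatorname{se}(\hat{\beta}_1) = \sqrt{1.15/5} \approx 0.48$.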