Economics 300: Econometrics
Econometric Approaches to Causal Inference: Instrumental Variables
Dennis C. Plott
University of Illinois at Chicago, Department of Economics
www.dennisplott.com
Fall 2014

Overview: Difference-in-Differences & Instrumental Variables
Difference-in-Differences (DID)
- Data: randomized experiment or natural experiment
- Problem: we cannot observe the counterfactual (what would have happened had the treatment group not received treatment)
Instrumental Variables (IV)
- Data: observational data
- Problem: omitted variable bias, reverse causality

Review: Regression Assumptions
The classical regression model assumes that the error term (u) is independent and identically distributed (i.i.d.):
- The error variance is the same for all values of the regressors (homoskedasticity).
- Errors are not correlated with one another (no serial correlation).
- Errors are often also assumed to be normally distributed.
A separate and more fundamental assumption is the zero-conditional-mean assumption, $E(u_i \mid x_i) = 0$. It is violated when an independent variable is correlated with the unobservable error. Four common reasons:
1. Omitted variable bias: a variable that affects the dependent variable, and is correlated with an included regressor, is left out of the model. The included regressor then picks up part of the omitted variable's effect, so its coefficient no longer measures its own effect; the omitted variable needs to be accounted for on its own.
2. Measurement error: a regressor is measured with error, so the observed regressor is correlated with the error term and the causal effect cannot be recovered.
3. Reverse causality: the dependent variable also affects the regressor, so the direction of causality cannot be determined.
4. Selection bias: individuals, groups, or entities are selected for analysis without proper randomization, so the sample is not representative of the population intended to be analyzed.
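
To make omitted variable bias concrete, here is a minimal simulation sketch in Python (standard library only). It assumes a hypothetical data-generating process in which an omitted variable `w` raises `y` and is positively correlated with the regressor `x`; all coefficients are illustrative assumptions, not estimates from any data. The long regression is computed with the Frisch-Waugh-Lovell shortcut to avoid a matrix library.

```python
import random


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    # Sample covariance with an n - 1 denominator
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    # Slope from a simple regression of y on x (with intercept)
    return cov(x, y) / cov(x, x)


random.seed(0)
n = 100_000
# Hypothetical DGP: w raises y and is positively correlated with x.
w = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * wi + random.gauss(0, 1) for wi in w]
y = [1.0 + 2.0 * xi + 3.0 * wi + random.gauss(0, 1) for xi, wi in zip(x, w)]

# Short regression (w omitted): w's effect is pushed into the error term,
# which is now correlated with x, so the slope is biased upward.
short_slope = ols_slope(x, y)

# Long regression (w included): by the Frisch-Waugh-Lovell theorem, the
# coefficient on x equals the slope of y on the part of x not explained by w.
b_w = ols_slope(w, x)
a_w = mean(x) - b_w * mean(w)
x_resid = [xi - (a_w + b_w * wi) for xi, wi in zip(x, w)]
long_slope = ols_slope(x_resid, y)

print(f"omitting w: {short_slope:.2f}, including w: {long_slope:.2f}")
```

Under these assumptions the short-regression slope converges to $2 + 3 \times 0.8 / 1.64 \approx 3.46$ rather than the true value of 2, while including `w` recovers roughly 2.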

Review: Endogeneity
When an independent variable is correlated with the unobservable error, we call this endogeneity, and such variables are called endogenous. In that case we cannot conclude that the independent variable causes the dependent variable. Often the factors that affect an outcome are themselves affected by that outcome (reverse causality).
Example: The more passes Chicago Bears quarterback Jay Cutler throws, the lower the Bears' winning percentage. Do more pass attempts by Cutler cause the Bears to lose? Or does falling behind in a game, with teammates not making plays, cause Cutler to throw more passes?
In a linear model, some of the regressors may be endogenous, meaning they are correlated with the error term. An exogenous variable, called an instrument, can fix endogeneity: it is correlated with the endogenous regressor but uncorrelated with the error term.
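
Reverse causality of the Cutler kind can be simulated. The sketch below assumes a hypothetical two-equation system in which $x$ lowers $y$ ($\beta = -0.5$) but $x$ also rises when $y$ rises ($\gamma = 1$); the numbers are illustrative only. Solving the system for $x$ shows that $x$ depends on the error $u$, so OLS is biased.

```python
import random


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    # Sample covariance with an n - 1 denominator
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


random.seed(1)
n = 100_000
beta = -0.5   # hypothetical true effect of x on y
gamma = 1.0   # hypothetical feedback: x rises when y rises

xs, ys, us = [], [], []
for _ in range(n):
    u = random.gauss(0, 1)  # structural error in the y equation
    v = random.gauss(0, 1)  # structural error in the x equation
    # Reduced form of the system y = beta*x + u and x = gamma*y + v
    x = (gamma * u + v) / (1 - gamma * beta)
    y = beta * x + u
    xs.append(x)
    ys.append(y)
    us.append(u)

ols_slope = cov(xs, ys) / cov(xs, xs)
print(f"cov(x, u) = {cov(xs, us):.2f}, OLS slope = {ols_slope:.2f} (true beta = {beta})")
```

Because the feedback makes $x$ move with $u$ (theoretical $\operatorname{cov}(x, u) = 2/3$), OLS returns a positive slope of about $0.25$ even though the assumed true effect is negative.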

IV: Basic Idea
Causality is difficult to prove, even in experimental research, and we cannot always randomize, run a true experiment, or exploit a natural experiment.
Suppose we want to estimate a treatment effect using observational data. The OLS estimator is biased and inconsistent (because the regressor is correlated with the error term) if there is
- omitted variable bias
- selection bias
- reverse causality
If a direct solution (e.g., including the omitted variable) is not available, instrumental variables (IV) regression offers an alternative way to obtain a consistent estimator.

IV: Basic Idea (Continued)
Consider the regression model
$y_i = \beta_0 + \beta_1 x_i + u_i$
Variation in the endogenous regressor $x_i$ has two parts:
1. the part that is uncorrelated with the error ("good" variation)
2. the part that is correlated with the error ("bad" variation)
The basic idea behind instrumental variables regression is to isolate the good variation and disregard the bad variation.
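
The good/bad split can be made concrete in a simulation, where (unlike in real data) the error $u$ is observable. This sketch assumes a hypothetical instrument `z` and a regressor `x` that depends on both `z` and `u`; projecting `x` on `z` yields the good variation, and the residual carries the bad variation.

```python
import random


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    # Sample covariance with an n - 1 denominator
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


random.seed(2)
n = 100_000
z = [random.gauss(0, 1) for _ in range(n)]  # hypothetical instrument
u = [random.gauss(0, 1) for _ in range(n)]  # error term (observable only in a simulation)
# Hypothetical endogenous regressor: depends on z and on u
x = [1.0 * zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]

# Project x on z: fitted values are the "good" variation,
# residuals contain the "bad" variation.
a1 = cov(z, x) / cov(z, z)
a0 = mean(x) - a1 * mean(z)
x_good = [a0 + a1 * zi for zi in z]
x_bad = [xi - gi for xi, gi in zip(x, x_good)]

print(f"cov(good, u) = {cov(x_good, u):.3f}, cov(bad, u) = {cov(x_bad, u):.3f}")
```

The good part is essentially uncorrelated with $u$, while the bad part inherits the full covariance of about 0.8 built into the assumed model.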

IV: History
Historically, IV has mostly been used by economists and statisticians (Angrist and Krueger, 2001).
Who invented IV? Philip G. Wright (econometrician) vs. Sewell Wright (biologist) (Wright, 1928).
- Philip had written about the problem of endogenous variation in previous papers.
- Sewell had discovered the use of an instrument, but the variables he studied were already exogenous, so the analysis was unnecessary.
- A stylometric analysis of their writing (Stock and Trebbi, 2003) found Philip to be the author and founder of IV.
1940s: IV was rediscovered.
1953: Theil introduced the two-stage least squares method for computing IV.

IV: Conditions for a Valid Instrument
The first step is to identify a valid instrument. A variable $z_i$ is a valid instrument for the endogenous regressor $x_i$ if it satisfies two conditions:
1. Relevance: $\operatorname{corr}(z_i, x_i) \neq 0$
2. Exogeneity (exclusion restriction): $\operatorname{corr}(z_i, u_i) = 0$
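
A short derivation shows why both conditions are needed to identify $\beta_1$. Take the covariance of both sides of $y_i = \beta_0 + \beta_1 x_i + u_i$ with $z_i$:

```latex
\begin{aligned}
\operatorname{cov}(z_i, y_i) &= \operatorname{cov}(z_i,\ \beta_0 + \beta_1 x_i + u_i) \\
  &= \beta_1 \operatorname{cov}(z_i, x_i) + \operatorname{cov}(z_i, u_i) \\
  &= \beta_1 \operatorname{cov}(z_i, x_i) && \text{(exogeneity: } \operatorname{cov}(z_i, u_i) = 0\text{)} \\
\beta_1 &= \frac{\operatorname{cov}(z_i, y_i)}{\operatorname{cov}(z_i, x_i)} && \text{(relevance: } \operatorname{cov}(z_i, x_i) \neq 0\text{)}
\end{aligned}
```

Replacing population covariances with sample covariances gives the IV estimator $\hat\beta_1^{IV} = \widehat{\operatorname{cov}}(z_i, y_i) / \widehat{\operatorname{cov}}(z_i, x_i)$. With one instrument and one endogenous regressor, this is numerically identical to the 2SLS estimate.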

IV: Two-Stage Least Squares (2SLS)
The most common IV method is two-stage least squares (2SLS).
Stage 1: Decompose $x_i$ into the component that can be predicted by $z_i$ and the problematic component:
$x_i = \alpha_0 + \alpha_1 z_i + v_i$
Stage 2: Use the predicted value $\hat{x}_i = \hat\alpha_0 + \hat\alpha_1 z_i$ from the first-stage regression to estimate its effect on $y_i$:
$y_i = \beta_0 + \beta_1 \hat{x}_i + u_i$
Note: running the second stage by hand with OLS gives the correct point estimate but incorrect standard errors; software packages like Stata perform both stages in a single regression, producing the correct standard errors.
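
The two stages can be sketched by hand on simulated data. This is a minimal illustration under assumed values (true $\beta_1 = 2$, a hypothetical instrument `z`, and a regressor `x` that depends on the error `u`); it computes point estimates only, not standard errors.

```python
import random


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    # Sample covariance with an n - 1 denominator
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope


random.seed(3)
n = 100_000
beta0, beta1 = 1.0, 2.0  # hypothetical true coefficients
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
# x responds to the instrument (relevance) and to u (endogeneity)
x = [0.7 * zi + 0.6 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# Stage 1: regress x on z and form fitted values x_hat
a0, a1 = ols(z, x)
x_hat = [a0 + a1 * zi for zi in z]

# Stage 2: regress y on x_hat (point estimate only; OLS standard errors
# from this step would be wrong)
b0_2sls, b1_2sls = ols(x_hat, y)

_, b1_ols = ols(x, y)  # naive OLS for comparison
print(f"OLS: {b1_ols:.2f}, 2SLS: {b1_2sls:.2f} (true beta1 = {beta1})")
```

Under these assumptions OLS converges to about $2 + 0.6/1.85 \approx 2.32$, while 2SLS is close to 2 and matches the covariance ratio $\widehat{\operatorname{cov}}(z, y)/\widehat{\operatorname{cov}}(z, x)$ exactly (up to floating-point error).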

IV: Example Levitt (1997)
Levitt (1997)¹: What is the effect of increasing the police force on the crime rate?
This is a classic case of simultaneous causality (high-crime areas tend to need large police forces), which biases OLS and can produce an incorrectly signed (positive) coefficient.
To address this problem, Levitt uses the timing of mayoral and gubernatorial elections as an instrumental variable.
Is this instrument valid?
- Relevance: police forces increase in election years
- Exogeneity (exclusion restriction): election cycles are predetermined
¹ Levitt, "Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime," American Economic Review, June 1997.

IV: Example Levitt (1997) (Continued)
Two-stage least squares:
Stage 1: Decompose police hires into the component that can be predicted by the electoral cycle and the problematic component:
$\text{police}_i = \alpha_0 + \alpha_1 \text{election}_i + v_i$
Stage 2: Use the predicted value $\widehat{\text{police}}_i$ from the first-stage regression to estimate its effect on $\text{crime}_i$:
$\text{crime}_i = \beta_0 + \beta_1 \widehat{\text{police}}_i + u_i$
Finding: an increased police force reduces violent crime but has little effect on property crime.
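
With a single binary instrument such as an election-year indicator, 2SLS reduces to the Wald estimator: the difference in mean crime between election and non-election years divided by the difference in mean police. The sketch below uses simulated data from a hypothetical simultaneous system (not Levitt's data or estimates); all coefficients are assumptions.

```python
import random


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    # Sample covariance with an n - 1 denominator
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


random.seed(4)
n = 100_000
election, police, crime = [], [], []
for _ in range(n):
    e = 1 if random.random() < 0.5 else 0  # hypothetical election-year indicator
    u, v = random.gauss(0, 1), random.gauss(0, 1)
    # Hypothetical system: crime = 10 - 0.5*police + u (police reduce crime)
    #                      police = crime + e + v (crime and elections raise police)
    p = (10 + u + e + v) / 1.5  # reduced form for police
    c = 10 - 0.5 * p + u
    election.append(e)
    police.append(p)
    crime.append(c)


def mean_if(values, e_value):
    # Mean of values among observations with election == e_value
    return mean([v for v, e in zip(values, election) if e == e_value])


ols_slope = cov(police, crime) / cov(police, police)
wald = (mean_if(crime, 1) - mean_if(crime, 0)) / (mean_if(police, 1) - mean_if(police, 0))
print(f"OLS: {ols_slope:.2f}, Wald/2SLS: {wald:.2f} (true effect = -0.5)")
```

In this setup OLS is pushed to a positive value (about 0.17 in theory), mirroring the incorrectly signed coefficient described above, while the Wald ratio recovers the assumed effect of -0.5.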

IV: Example Angrist (1990)
Effect of military service on future earnings (Angrist, 1990).
Military service is endogenous. Does military service cause a veteran's future earnings to be what they are after leaving the service? Or do characteristics of those who join the military influence their future earnings?
An individual's choice to enter the service might reflect his or her expected future earnings. Some individuals choose the military because their expected civilian earnings are low, so enrollment is related to the fact that those who join the service might on average have lower future earnings.
Veterans also have observed and unobserved characteristics that affect their decision to enlist, and these could be related to earnings.

IV: Example Angrist (1990) (Continued)
In his 1990 work, Joshua Angrist analyzed the difference in earnings between veterans and non-veterans. But this difference alone does not tell us the causal impact of military service on future earnings.
A young person's decision to enter the military could be affected by his or her expectations of future earnings. This is an endogeneity problem: does military service affect future earnings, or does the prospect of future earnings affect the decision to enter the military?
Veterans have observed and unobserved characteristics that affect their reasons for entering the military, and we cannot control for the unobserved characteristics.

IV: Example Angrist (1990)
Angrist used the Vietnam draft lottery as an instrument (exogenous variable).
- The draft lottery is correlated with serving in the military (relevance).
- The draft lottery affects future earnings only through military service (exclusion restriction).
Because earnings depend on factors, such as expected earnings, that also drive enlistment, the simple veteran/non-veteran comparison is biased. The lottery number is correlated with entering the service, but because it was randomly assigned it is uncorrelated with the unobserved determinants of earnings: the draft lottery is exogenous.
Problem: What about those who were drafted but avoided service? Or those who were not drafted but felt compelled to serve anyway?
Solution: The lottery does not change these individuals' military status, so IV cannot recover the effect of service for them. The IV estimate is therefore not an average treatment effect for the whole sample but a local average treatment effect (LATE).
In the military earnings example, the estimate is the treatment effect for individuals whose service depended on their lottery number: they served if they drew a bad (draft-eligible) number and did not serve if they drew a good one. Because we measure a treatment effect only for these compliers, the result is less generalizable.
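
The LATE point can be checked by simulation. This sketch assumes a hypothetical population of compliers, always-takers (serve regardless of the lottery), and never-takers (never serve), each with a different, made-up effect of service on earnings. The IV (Wald) estimate recovers the compliers' effect, not the population average effect.

```python
import random


def mean(a):
    return sum(a) / len(a)


random.seed(5)
n = 200_000
# Hypothetical types: (population share, baseline earnings shift, effect of serving),
# earnings in $1,000s; all numbers are made up for illustration.
types = {
    "complier": (0.3, 0.0, -2.0),
    "always_taker": (0.2, -1.0, 1.0),
    "never_taker": (0.5, 1.0, 3.0),
}
names = list(types)
weights = [types[t][0] for t in names]

z_list, d_list, y_list, effects = [], [], [], []
for _ in range(n):
    t = random.choices(names, weights=weights)[0]
    _, base, effect = types[t]
    z = 1 if random.random() < 0.5 else 0  # draft-eligible lottery number
    # Service: compliers follow the lottery; the other types ignore it
    d = 1 if t == "always_taker" or (t == "complier" and z == 1) else 0
    y = base + effect * d + random.gauss(0, 1)
    z_list.append(z)
    d_list.append(d)
    y_list.append(y)
    effects.append(effect)


def mean_where(values, z_value):
    # Mean of values among observations with lottery status z_value
    return mean([v for v, z in zip(values, z_list) if z == z_value])


# Wald / IV estimate: reduced-form difference over first-stage difference
wald = (mean_where(y_list, 1) - mean_where(y_list, 0)) / (
    mean_where(d_list, 1) - mean_where(d_list, 0)
)
ate = mean(effects)  # average effect over everyone (knowable only in a simulation)
print(f"IV (LATE): {wald:.2f}, ATE: {ate:.2f}")
```

Under these assumptions the IV estimate is about -2.0 (the compliers' effect), while the average effect across the whole population is about 1.1.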

How to Find a Good Instrument
The biggest challenge in an IV analysis is finding a credible instrument, i.e., a $z$ that is correlated with $x$ but not with $y$ (other than via $x$).
Common sources of instruments include:
- Nature: geography, weather, or biology, in which a truly random source of variation influences $x$ (no possible reverse causation)
- History: things determined long ago, which were possibly endogenous at the time but no longer plausibly influence $y$ directly
- Institutions: formal or informal rules that influence the assignment of $x$ in a way unrelated to $y$
Above all, finding a good IV requires deep substantive knowledge of the processes shaping $x$ and $y$. Fancy econometrics will not help you if you don't have this substantive knowledge.

Acemoglu, Johnson, and Robinson (2001)
Daron Acemoglu, Simon Johnson, and James A. Robinson, 2001. "The Colonial Origins of Comparative Development: An Empirical Investigation," American Economic Review, American Economic Association, vol. 91(5), pages 1369–1401, December.
- Question: Do institutions affect economic performance?
- Basic story: (1) (potential) settler mortality → (2) settlements → (3) early institutions → (4) current institutions → (5) current performance
- Endogeneity concern: economic prosperity could lead to better institutions
- Outcome variable (y): log of GDP per capita today
- Endogenous variable (x): contemporary institutional quality (protection against expropriation)
- Instrumental variable (z): European settler mortality

Acemoglu, Johnson, and Robinson (2001)
Settler mortality is used as an instrument for current institutions.
- Instrument relevance: settler mortality must be correlated with contemporary institutions.
- Exclusion restriction: mortality rates of European settlers more than 100 years ago have no effect on current income per capita other than through their effect on institutions.
Data
- 75 colonized nations in total; 64 observations with complete data
- Mortality rates of soldiers, bishops, and sailors as indicators of the mortality rates European settlers could expect to encounter
- Risk of expropriation, which measures differences in institutions originating from different state policies
- Economic performance: income per capita (GDP per capita)

Miguel, Satyanath, and Sergenti (2004)
Edward Miguel, Shanker Satyanath, and Ernest Sergenti, 2004. "Economic Shocks and Civil Conflict: An Instrumental Variables Approach," Journal of Political Economy, University of Chicago Press, vol. 112(4), pages 725–753, August.
- Question: Do economic conditions affect civil conflict?
- Endogeneity concern: civil war is bad for the economy
- Outcome variable (y): dummy variable for civil conflict with more than twenty-five battle deaths
- Endogenous variable (x): per capita income growth
- Instrumental variable (z): change in rainfall

Miguel, Satyanath, and Sergenti (2004)
Motivation: Since WWII, civil wars have resulted in three times as many deaths as wars between states. What causes civil wars?
Theory: Economic conditions promote or inhibit civil conflict through one of two mechanisms:
- Opportunity costs: in a bad economy, the returns to taking up arms are greater relative to those from economic activities (Collier and Hoeffler)
- State capacity: in a bad economy, the state (military) is weaker, making it more difficult to repress insurgents (Fearon and Laitin)
This paper does not try to distinguish between the two mechanisms.
Estimand: the effect of economic conditions on the probability of civil war (a comparative static)

Miguel, Satyanath, and Sergenti (2004)
The identification problem
- Because civil conflict negatively affects economic performance (e.g., GDP), we cannot simply regress war on GDP: the regressor is endogenous (reverse causation).
- The lead-up to civil war (expectations of conflict) also likely hurts the economy, so simply lagging GDP [more on what this means in the time series section of the course] won't suffice either.
Econometric strategy
- Use weather (rainfall) as an instrument for economic growth.
- Weather strongly predicts GDP in countries that rely on rain-fed agriculture (no irrigation) and are prone to drought.
- This works for sub-Saharan Africa but may not work elsewhere.
- Exclusion restriction: weather should not affect the likelihood of conflict except through its influence on economic growth.

IV: Problems, Limitations, and Advantages
Problems and limitations
- It can be incredibly difficult to find an instrument that is both relevant (not weak) and exogenous.
- IV can be difficult to explain to those who are unfamiliar with it.
- LATE: the estimate generalizes only to compliers, i.e., those whose treatment status is changed by the instrument.
Advantage
- IV can be used to estimate a causal relationship when randomization is not feasible.
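
Relevance, unlike exogeneity, can be checked in the data. A common check is the first-stage F statistic for $H_0: \alpha_1 = 0$; with a single instrument it equals the squared t statistic, and a widely used rule of thumb treats values below about 10 as a sign of a weak instrument. This sketch computes the homoskedastic version on simulated data for a hypothetical strong instrument and a hypothetical irrelevant one.

```python
import random


def first_stage_f(z, x):
    """Homoskedastic F statistic for H0: alpha_1 = 0 in x = alpha_0 + alpha_1*z + v.

    With a single instrument this equals the squared t statistic on alpha_1.
    """
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    a1 = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)) / szz
    a0 = mx - a1 * mz
    ssr = sum((xi - a0 - a1 * zi) ** 2 for zi, xi in zip(z, x))
    se_a1 = (ssr / (n - 2) / szz) ** 0.5
    return (a1 / se_a1) ** 2


random.seed(6)
n = 1_000
z = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
x_strong = [0.5 * zi + e for zi, e in zip(z, noise)]  # z clearly shifts x
x_weak = noise[:]                                     # z plays no role in x

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f"first-stage F: strong = {f_strong:.1f}, weak = {f_weak:.1f}")
```

Under these assumptions the strong instrument yields an F statistic in the hundreds, while the irrelevant one typically stays near 1. No comparable test exists for the exclusion restriction, which is why it rests on substantive arguments.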