Quasi-Experimental Methods. Technical Track, East Asia Regional Impact Evaluation Workshop, Seoul, South Korea. Joost de Laat, World Bank.
IE Methods Toolbox: Randomized Assignment, Discontinuity Design, Difference-in-Differences, Matching.
Discontinuity Design. Many social programs select beneficiaries using an index or score. Anti-poverty programs: targeted to households below a given poverty index or income. Pensions: targeted to the population above a certain age. Education: scholarships targeted to students with high scores on standardized tests. Agriculture: fertilizer programs targeted to small farms with less than a given number of hectares.
Example: effect of a scholarship program on school attendance. Goal: improve school attendance for poor students. Method: households with an asset score (Pa) ≤ 50 are poor; households with an asset score (Pa) > 50 are not poor. Intervention: poor households receive scholarships to send their children to school.
[Figure: enrollment (share enrolled, 0 to 1) plotted against score (0 to 100), with households labeled POOR or NON POOR on either side of the cut-off.]
Regression Discontinuity Design: Baseline. [Figure: outcome plotted against score, with eligible and not-eligible households on either side of the cut-off.]
Regression Discontinuity Design: Post-Intervention. [Figure: the jump in the outcome at the cut-off, labeled IMPACT.]
For a discontinuity design you need: 1) a continuous eligibility index (e.g., income); 2) a clearly defined cut-off. Households with a score ≤ cutoff are eligible; households with a score > cutoff are not eligible (or vice versa).
Example: Progresa CCT. Eligibility for Progresa is based on a national poverty index. A household is poor if its score ≤ 750. Eligibility for Progresa: eligible if score ≤ 750; not eligible if score > 750.
Example of Progresa: score vs. consumption at baseline (no treatment). [Figure: consumption and fitted values on each side of the cut-off, plotted against the poverty index (estimated targeting score, 276 to 1294); consumption axis from 153.578 to 379.224.]
Example of Progresa: score vs. consumption in the post-intervention period (treatment). [Figure: consumption and fitted values on each side of the cut-off, plotted against the poverty index (estimated targeting score, 276 to 1294); consumption axis from 183.647 to 399.51.] Estimated impact on consumption (Y): 30.58**. (**) Significant at 1%.
Example: Cambodia CCT. Eligibility is based on an index of the likelihood of dropping out of school, with 2 cut-off points within each school: applicants with the highest dropout risk were offered a US$60 per year scholarship; applicants with intermediate dropout risk were offered a US$45 per year scholarship; applicants with low dropout risk were not offered a scholarship by the program. [Figure: likelihood of dropping out of school, divided by Cutoff 1 and Cutoff 2 into No scholarship, US$45 scholarship, and US$60 scholarship.]
Large impact of the US$45 scholarship. [Figure, two panels: "No scholarship versus $45 scholarship" and "$60 versus $45 scholarship"; each plots the probability (0 to 1) against relative ranking (-25 to 25) for recipients and non-recipients, with the estimate of impact at the cut-off.] Source: Filmer and Schady. 2011. "Does More Cash in Conditional Cash Transfer Programs Always Lead to Larger Impacts on School Attendance?" Journal of Development Economics.
Sharp and fuzzy discontinuity. Sharp discontinuity: the discontinuity precisely determines treatment; this is equivalent to random assignment in a neighborhood of the cut-off. Fuzzy discontinuity: the discontinuity is highly correlated with treatment but does not fully determine it. Use the assignment rule to estimate the probability of enrollment in the program.
Identification for sharp discontinuity:
$y_i = \beta_0 + \beta_1 D_i + \delta(\text{score}_i) + \varepsilon_i$
where $D_i = 1$ if household $i$ receives the transfer and $D_i = 0$ if it does not, and $\delta(\text{score}_i)$ is a function that is continuous around the cut-off point. Assignment rule under sharp discontinuity: $D_i = 1$ if $\text{score}_i \le 50$; $D_i = 0$ if $\text{score}_i > 50$.
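A minimal sketch of how the sharp-discontinuity impact $\beta_1$ might be estimated, assuming a local linear approach: keep only households within a bandwidth of the cut-off, fit a separate straight line on each side (a simple stand-in for $\delta(\text{score}_i)$), and take the difference between the two fitted values at the cut-off. The bandwidth, the uniform weighting, the function names, and the simulated data are illustrative assumptions, not the workshop's code; only the rule $D_i = 1$ if $\text{score}_i \le 50$ comes from the slide.

```python
import random


def _fit_at_zero(xs, ys):
    """Least-squares line y = a + b*x; returns a, the fitted value at x = 0."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return my - (sxy / sxx) * mx


def sharp_rdd(scores, outcomes, cutoff, bandwidth):
    """Local linear estimate of the jump in the outcome at the cut-off.

    Treatment rule (from the slide): D = 1 if score <= cutoff, else 0.
    Only observations within `bandwidth` of the cut-off are used (uniform weights),
    and a separate line is fitted on each side, with the score centred at the cut-off.
    Returns the fitted outcome at the cut-off on the treated side minus the untreated side.
    """
    left = [(s - cutoff, y) for s, y in zip(scores, outcomes)
            if cutoff - bandwidth <= s <= cutoff]
    right = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff < s <= cutoff + bandwidth]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("need at least two observations on each side of the cut-off")
    treated_side = _fit_at_zero([x for x, _ in left], [y for _, y in left])
    untreated_side = _fit_at_zero([x for x, _ in right], [y for _, y in right])
    return treated_side - untreated_side


# Hypothetical simulated data: the true jump for households with score <= 50 is 0.2.
random.seed(1)
scores = [random.uniform(0, 100) for _ in range(5000)]
outcomes = [0.3 + 0.004 * s + (0.2 if s <= 50 else 0.0) + random.gauss(0, 0.05)
            for s in scores]
print(round(sharp_rdd(scores, outcomes, cutoff=50, bandwidth=10), 3))  # close to 0.2
```

A narrower bandwidth relies less on the straight-line approximation but uses fewer observations, which is the power trade-off discussed under the disadvantages of RDD.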
Identification for fuzzy discontinuity: $y_i = \beta_0 + \beta_1 D_i + \delta(\text{score}_i) + \varepsilon_i$, where $D_i = 1$ if the household receives the transfer and $D_i = 0$ if it does not. However, some who are ineligible take up the program, and some who are eligible do not. The reasons why they do this could be correlated with the outcome of interest, so simply comparing recipients with non-recipients would be biased.
Estimation for fuzzy discontinuity: in a first regression, use the score to predict whether the individual takes up the program or not:
$D_i = \gamma_0 + \gamma_1 I(\text{score}_i > 50) + \eta_i$, where $I(\cdot)$ is a dummy variable.
In the second equation, use this predicted value of enrollment, $\hat{D}_i$, rather than actual enrollment:
$y_i = \beta_0 + \beta_1 \hat{D}_i + \delta(\text{score}_i) + \varepsilon_i$, where $\delta(\cdot)$ is a continuous function.
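The two-step procedure can be sketched in its ratio ("local Wald") form: the jump in the outcome at the cut-off divided by the jump in the take-up rate. With uniform weights, the same bandwidth, and a separate linear score term on each side in both equations, this ratio coincides with the two-stage estimate that uses the cut-off dummy as the instrument. Unlike the slide's simplified first stage, this sketch also controls for the score in the take-up equation. The simulated data, the bandwidth, and the hypothetical unobserved `ability` variable (which drives both take-up and outcomes, the concern raised for fuzzy designs) are illustrative assumptions.

```python
import random


def _fit_at_zero(xs, ys):
    """Least-squares line y = a + b*x; returns a, the fitted value at x = 0."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return my - (sxy / sxx) * mx


def _jump(scores, values, cutoff, bandwidth):
    """Fitted value at the cut-off from the left (score <= cutoff) minus from the right."""
    left = [(s - cutoff, v) for s, v in zip(scores, values)
            if cutoff - bandwidth <= s <= cutoff]
    right = [(s - cutoff, v) for s, v in zip(scores, values)
             if cutoff < s <= cutoff + bandwidth]
    return (_fit_at_zero([x for x, _ in left], [v for _, v in left])
            - _fit_at_zero([x for x, _ in right], [v for _, v in right]))


def fuzzy_rdd(scores, outcomes, takeup, cutoff, bandwidth):
    """Local Wald estimate: jump in the outcome divided by jump in take-up."""
    jump_y = _jump(scores, outcomes, cutoff, bandwidth)  # reduced form
    jump_d = _jump(scores, takeup, cutoff, bandwidth)    # first stage
    if abs(jump_d) < 1e-12:
        raise ValueError("take-up does not jump at the cut-off")
    return jump_y / jump_d


# Hypothetical data: eligibility (score <= 50) raises take-up from about 10% to about 80%,
# but who takes up also depends on unobserved `ability`, which also raises outcomes.
random.seed(2)
scores, takeup, outcomes = [], [], []
for _ in range(20000):
    s = random.uniform(0, 100)
    ability = random.gauss(0, 1)
    threshold = -0.84 if s <= 50 else 1.28  # P(ability > threshold) is about 0.8 vs 0.1
    d = 1.0 if ability > threshold else 0.0
    scores.append(s)
    takeup.append(d)
    outcomes.append(0.3 + 0.004 * s + 0.2 * d + 0.05 * ability + random.gauss(0, 0.05))

print(round(fuzzy_rdd(scores, outcomes, takeup, cutoff=50, bandwidth=10), 3))  # close to 0.2
```

Because `ability` is unrelated to which side of the cut-off a household falls on, it shifts both groups equally near the cut-off and drops out of the ratio, even though it biases a direct comparison of takers and non-takers.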
Example: social assistance to the unemployed (Lemieux & Milligan, 2008): low social assistance payments to individuals under 30; higher payments for individuals over 30. What is the effect of increased social assistance on employment?
Advantages of RDD for evaluation: it yields an unbiased estimate of the treatment effect at the discontinuity; it can take advantage of a known rule for assigning the benefit, which is common in the design of social interventions; there is no need to exclude a group of eligible households or individuals from treatment.
Potential disadvantages of RDD. Local average treatment effects: we estimate the effect of the program around the cut-off point, which is not always generalizable. Power: the effect is estimated at the discontinuity, so we generally have fewer observations than in a randomized experiment with the same sample size. Specification can be sensitive to functional form: make sure the relationship between the assignment variable and the outcome variable is correctly modeled, including (1) nonlinear relationships and (2) interactions.
False RDD
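A minimal sketch of how a false discontinuity can arise, assuming hypothetical data with a smooth, nonlinear relationship between score and outcome and no true jump at 50. Fitting straight lines on each side over a wide range produces a large spurious "impact"; restricting the fit to a narrow bandwidth around the cut-off shrinks it toward zero. The helpers fit a separate least-squares line on each side and compare the two fitted values at the cut-off; names and data are illustrative, not the workshop's code.

```python
def _fit_at_zero(xs, ys):
    """Least-squares line y = a + b*x; returns a, the fitted value at x = 0."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return my - (sxy / sxx) * mx


def rdd_jump(scores, outcomes, cutoff, bandwidth):
    """Fitted outcome at the cut-off from the left (score <= cutoff) minus from the right,
    using a separate straight line on each side within `bandwidth` of the cut-off."""
    left = [(s - cutoff, y) for s, y in zip(scores, outcomes)
            if cutoff - bandwidth <= s <= cutoff]
    right = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff < s <= cutoff + bandwidth]
    return (_fit_at_zero([x for x, _ in left], [y for _, y in left])
            - _fit_at_zero([x for x, _ in right], [y for _, y in right]))


# Hypothetical smooth relationship with NO discontinuity at the cut-off of 50.
scores = [i / 10 for i in range(1001)]             # 0.0, 0.1, ..., 100.0
outcomes = [((s - 50) / 10) ** 3 for s in scores]  # curved, but continuous at 50

wide = rdd_jump(scores, outcomes, cutoff=50, bandwidth=50)   # lines over the whole range
narrow = rdd_jump(scores, outcomes, cutoff=50, bandwidth=5)  # lines only near the cut-off
print(f"wide: {wide:.2f}  narrow: {narrow:.3f}")  # large spurious jump vs. close to zero
```

Trying several bandwidths and flexible specifications, and checking that the estimate does not swing with them, is one practical guard against this problem.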
Keep in mind: discontinuity design requires continuous eligibility criteria with a clear cut-off. It gives an unbiased estimate of the treatment effect: observations just across the cut-off are good comparisons. There is no need to exclude a group of eligible households or individuals from treatment. It can sometimes be used for programs that are already ongoing.
Difference-in-Differences: Outline. 1. What is difference-in-differences (diff-in-diff)? 2. Weaknesses. 3. Tests of the strength of internal validity. 4. When to use.
What is difference-in-differences (diff-in-diff)? Compare the change in outcomes for those who enrolled in the program with the change in outcomes for those who did not enroll. If we cannot randomize, can we try to mimic randomization? Natural experiments: unexpected changes in policy, natural disasters. Exploit variation in policies across time and space.
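Group affected by the policy change (treatment, $D_i = 1$): after the program start $Y_1 \mid D_i = 1$; before the program start $Y_0 \mid D_i = 1$; difference $(Y_1 \mid D=1) - (Y_0 \mid D=1)$.
Group not affected by the policy change (comparison, $D_i = 0$): after the program start $Y_1 \mid D_i = 0$; before the program start $Y_0 \mid D_i = 0$; difference $(Y_1 \mid D=0) - (Y_0 \mid D=0)$.
$DD = [(Y_1 \mid D=1) - (Y_0 \mid D=1)] - [(Y_1 \mid D=0) - (Y_0 \mid D=0)]$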
Difference-in-differences (diff-in-diff). Y = school attendance; P = girls' scholarship program.
Enrolled (T): after (1) 0.74; before (0) 0.60; difference +0.14.
Not enrolled (C): after (1) 0.81; before (0) 0.78; difference +0.03.
Diff-in-diff: 0.14 - 0.03 = 0.11.
Impact = $(Y_{T1} - Y_{T0}) - (Y_{C1} - Y_{C0})$
Impact = (A - B) - (C - D) = (A - C) - (B - D) = 0.11. [Figure: school attendance at t=0 and t=1 for the enrolled (B = 0.60, A = 0.74) and the not enrolled (D = 0.78, C = 0.81), with similar trends before the program.]
Example of Progresa: consumption (Y).
Enrolled: follow-up (t=1) 268.75; baseline (t=0) 233.47; difference 35.28.
Not enrolled: follow-up (t=1) 290; baseline (t=0) 281.74; difference 8.26.
Difference (enrolled - not enrolled): follow-up -21.25; baseline -48.27; diff-in-diff 27.02.
Estimated impact on consumption (Y): linear regression 27.06**; multivariate linear regression 25.53**.
Note: if the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
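The arithmetic in the two tables above can be checked with a short sketch: the impact is the change for the enrolled minus the change for the not enrolled, which equals the gap after minus the gap before, $(A - B) - (C - D) = (A - C) - (B - D)$. The function names are illustrative.

```python
def diff_in_diff(enrolled_after, enrolled_before, comparison_after, comparison_before):
    """DD = (change for the enrolled) - (change for the not enrolled), i.e. (A - B) - (C - D)."""
    return (enrolled_after - enrolled_before) - (comparison_after - comparison_before)


def diff_in_diff_by_period(enrolled_after, enrolled_before, comparison_after, comparison_before):
    """Same quantity, ordered as (gap after) - (gap before), i.e. (A - C) - (B - D)."""
    return (enrolled_after - comparison_after) - (enrolled_before - comparison_before)


# School attendance example: A = 0.74, B = 0.60, C = 0.81, D = 0.78
attendance = diff_in_diff(0.74, 0.60, 0.81, 0.78)
# Progresa consumption example: enrolled 268.75 / 233.47, not enrolled 290 / 281.74
consumption = diff_in_diff(268.75, 233.47, 290.0, 281.74)
print(round(attendance, 2), round(consumption, 2))  # 0.11 27.02
```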
Progresa policy recommendation? Impact of Progresa on consumption (Y):
Case 1: Before & After: 34.28**
Case 2: Enrolled & Not Enrolled: -4.15
Case 3: Randomized Assignment: 29.75**
Case 4: Randomized Promotion: 30.4**
Case 5: Discontinuity Design: 30.58**
Case 6: Difference-in-Differences: 25.53**
Note: if the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
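Regression (for 2 time periods):
$Y_{it} = \alpha + \beta t_t + \gamma D_i + \delta (t_t \cdot D_i) + \varepsilon_{it}$
where $Y$ is the outcome, $D_i$ is the treatment dummy, $t_t$ is a dummy for the second period, and $t_t \cdot D_i$ equals 1 for the treatment group in the second period.
$DD = [(Y_1 \mid D=1) - (Y_0 \mid D=1)] - [(Y_1 \mid D=0) - (Y_0 \mid D=0)]$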
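Conditional expectations, given $Y_{it} = \alpha + \beta t_t + \gamma D_i + \delta (t_t \cdot D_i) + \varepsilon_{it}$:
Treatment group: $E(Y_{i1} \mid D_i = 1) = \alpha + \beta + \gamma + \delta$ and $E(Y_{i0} \mid D_i = 1) = \alpha + \gamma$.
Control group: $E(Y_{i1} \mid D_i = 0) = \alpha + \beta$ and $E(Y_{i0} \mid D_i = 0) = \alpha$.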
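Conditional expectations, given $Y_{it} = \alpha + \beta t_t + \gamma D_i + \delta (t_t \cdot D_i) + \varepsilon_{it}$:
$E(Y_{i1} - Y_{i0} \mid D_i = 1) - E(Y_{i1} - Y_{i0} \mid D_i = 0) = (\text{change in } Y \text{ for treatment}) - (\text{change in } Y \text{ for control}) = (\beta + \delta) - \beta = \delta$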
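A minimal sketch, assuming simulated data, showing that in the two-period regression the coefficient $\delta$ on $t_t \cdot D_i$ reproduces the difference-in-differences of the four group means. The small `ols` helper (normal equations solved by Gaussian elimination) and the simulated coefficients are illustrative assumptions; in practice a statistics package would be used.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by Gaussian
    elimination with partial pivoting. Adequate for small, well-conditioned designs."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    m = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (m[i][k] - sum(m[i][j] * b[j] for j in range(i + 1, k))) / m[i][i]
    return b


# Hypothetical two-period data: alpha = 1.0, beta = 0.5, gamma = 0.3, delta = 0.2
random.seed(3)
X, y = [], []
for d in (0, 1):          # D_i: 1 = treatment group
    for t in (0, 1):      # t_t: 1 = second period
        for _ in range(200):
            X.append([1.0, t, d, t * d])
            y.append(1.0 + 0.5 * t + 0.3 * d + 0.2 * t * d + random.gauss(0, 0.1))

alpha, beta, gamma, delta = ols(X, y)


def cell_mean(d, t):
    """Average outcome for group d in period t."""
    vals = [yi for row, yi in zip(X, y) if row[2] == d and row[1] == t]
    return sum(vals) / len(vals)


dd = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(round(delta, 4), round(dd, 4))  # the interaction coefficient equals the 2x2 DD
```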
If we have more than 2 time periods or groups: regression with fixed effects for time and group:
$Y_{it} = \alpha + \lambda_t + \theta_i + \delta D_{it} + \varepsilon_{it}$
where $D_{it}$ indicates treatment in group $i$ in period $t$, $\lambda_t$ are year dummies, and $\theta_i$ are group dummies.
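A sketch of the fixed-effects regression, assuming a hypothetical panel in which some groups start treatment in the same period and the true effect is 0.4. Period and group dummies (omitting one of each, since the regression also has a constant) absorb the common period shocks $\lambda_t$ and the fixed group differences $\theta_i$, and the coefficient on $D_{it}$ estimates $\delta$. The panel dimensions, effect sizes, and helper names are illustrative.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by Gaussian
    elimination with partial pivoting. Adequate for small, well-conditioned designs."""
    k = len(X[0])
    m = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (m[i][k] - sum(m[i][j] * b[j] for j in range(i + 1, k))) / m[i][i]
    return b


def simulate_panel(noise_sd, seed=4):
    """Hypothetical panel: 6 groups x 5 periods x 20 units per cell.
    Groups 0-2 are treated from period 3 onward; the true effect is 0.4."""
    rng = random.Random(seed)
    n_groups, n_periods, per_cell = 6, 5, 20
    theta = [0.5 * g for g in range(n_groups)]      # fixed group differences
    lam = [0.1 * t ** 2 for t in range(n_periods)]  # common period shocks
    X, y = [], []
    for g in range(n_groups):
        for t in range(n_periods):
            treated = 1.0 if (g < 3 and t >= 3) else 0.0
            # constant, period dummies (period 0 omitted), group dummies (group 0 omitted), D_gt
            row = ([1.0]
                   + [1.0 if t == s else 0.0 for s in range(1, n_periods)]
                   + [1.0 if g == h else 0.0 for h in range(1, n_groups)]
                   + [treated])
            for _ in range(per_cell):
                X.append(row)
                y.append(theta[g] + lam[t] + 0.4 * treated + rng.gauss(0, noise_sd))
    return X, y


X, y = simulate_panel(noise_sd=0.05)
delta = ols(X, y)[-1]   # coefficient on the treatment indicator D_it
print(round(delta, 3))  # close to 0.4
```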
Problem 1: Common trends or shocks across groups. Diff-in-diff is only valid if both groups would have had similar trends without the program; then the change in observed outcomes for those not enrolled is a good counterfactual. What if attendance for those enrolled would have increased by more than for those not enrolled in any case? VIOLATION OF EQUAL TRENDS!
Same trend. [Figure: school attendance at T=0 and T=1. Enrolled: B = 0.60 to A = 0.74; not enrolled: D = 0.78 to C = 0.81; similar trends before the program.]
Different trend. [Figure: school attendance at T=0 and T=1 with the same points (B = 0.60, A = 0.74, D = 0.78, C = 0.81), but different trends before the program.] Diff-in-diff cannot measure the impact of the program.
What if an event affects only one group? Case 1: Training program. Only highly motivated people participated in the program. A new company opens in the village, and only the more motivated people apply for a job. Job prospects for those in the treatment group would have improved even in the absence of the training program, so DD overestimates the effect of the program. Case 2: Subsidies on fertilizer (weather shocks). Treatment group: subsidized farmers; control group: unsubsidized farmers. A drought severely affects farmers who use fertilizer, so DD underestimates the effect of the program.
Problem 2: Changes in group composition over time. Diff-in-diff requires that we follow the same types of people over time. For example, if all the healthy people drop out of a health-care program because they don't need the treatment, average health outcomes for those in the program are lower at the end of the program, and DD underestimates the effect of the program. If instead all the sick people drop out of a health-care program because they cannot walk to the clinic, DD overestimates the effect of the program.
Considerations. If program impact is heterogeneous across individual characteristics, pre-treatment differences in observed characteristics can create non-parallel outcome dynamics (Abadie, 2005). Similarly, bias occurs when the size of the response depends non-linearly on the size of the intervention and we compare a group with high treatment intensity with a group with low treatment intensity. When outcomes within a time/group unit are correlated, OLS standard errors understate the standard deviation of the DD estimator (Bertrand et al., 2004).
Test for trend. [Figure: school attendance over time.] To test this, at least 3 observations in time are needed: 2 observations before treatment (t = -1 and t = 0) and 1 observation after treatment (t = 1).
Sensitivity analysis (placebo tests): use a fake treatment group, which should show no impact; use a different comparison group, which should still show an impact; use a different outcome that should not be affected by the program, which should show no impact.
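The trend test and the placebo idea can be sketched as a DD computed between two pre-program periods, where the program cannot yet have had an effect, so the estimate should be close to 0. The $t = 0$ and $t = 1$ means are the school attendance example; the $t = -1$ values are hypothetical, chosen so that both groups share the same pre-program trend. The function name and dictionary layout are illustrative.

```python
def dd_between(means, t0, t1):
    """DD of the change from period t0 to t1: enrolled minus not enrolled.

    `means` maps (group, period) -> average outcome, with group in {"enrolled", "not_enrolled"}.
    """
    change_enrolled = means[("enrolled", t1)] - means[("enrolled", t0)]
    change_not_enrolled = means[("not_enrolled", t1)] - means[("not_enrolled", t0)]
    return change_enrolled - change_not_enrolled


# t = 0 and t = 1 come from the school attendance example; t = -1 values are HYPOTHETICAL.
means = {
    ("enrolled", -1): 0.57, ("enrolled", 0): 0.60, ("enrolled", 1): 0.74,
    ("not_enrolled", -1): 0.75, ("not_enrolled", 0): 0.78, ("not_enrolled", 1): 0.81,
}
placebo = dd_between(means, -1, 0)  # pre-program "effect": should be close to 0
impact = dd_between(means, 0, 1)    # diff-in-diff estimate of the program effect
print(placebo, round(impact, 2))
```

If the placebo DD were clearly different from 0, the groups were already on different trends before the program, and the post-program DD would mix that trend gap with the program effect.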
Example: "Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment." Esther Duflo, MIT. American Economic Review, September 2001.
Research questions: What is the effect of school infrastructure on educational achievement? What is the effect of educational achievement on salary level? What is the economic return on schooling?
Program description. 1973-1978: the Indonesian government built 61,000 schools, equivalent to one school per 500 children between 5 and 14 years old. The enrollment rate increased from 69% to 85% between 1973 and 1978. Assignment rule: the number of schools built in each region depended on the number of children out of school in that region in 1972, before the start of the program.
Identification of the treatment effect. There are 2 sources of variation in the intensity of the program for a given individual. By region: there is variation in the number of schools received in each region. By age: children who were older than 12 years in 1972 did not benefit from the program; the younger a child was in 1972, the more she benefited from the program, because she spent more time in the new schools.
Sources of data: the 1995 population census, with individual-level data on birth date, 1995 salary level, and 1995 level of education; and the intensity of the building program in the birth region of each person in the sample. Sample: men born between 1950 and 1972.
A first estimation of the impact. Step 1: Let's simplify the problem and estimate the impact of the program. We simplify the intensity of the program to high or low, and we simplify the groups of children affected by the program to a young cohort of children who benefited and an older cohort of children who did not benefit.
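Let's look at the average of the outcome variable, years of schooling, by intensity of the building program and age in 1974:
2-6 (young cohort): high intensity 8.49; low intensity 9.76
12-17 (older cohort): high intensity 8.02; low intensity 9.40
Difference (young - older): high intensity 0.47; low intensity 0.36
DD: 0.12 (0.089)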
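Let's look at the average of the outcome variable, years of schooling, by intensity of the building program and age in 1974, this time differencing across program intensity:
2-6 (young cohort): high 8.49; low 9.76; difference (high - low) -1.27
12-17 (older cohort): high 8.02; low 9.40; difference (high - low) -1.39
DD: 0.12 (0.089)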
Placebo DD (cf. p. 798, Table 3, panel B). Idea: look for 2 groups that you know did not benefit, compute a DD, and check whether the estimated effect is 0. If it is NOT 0, we're in trouble.
Years of schooling by intensity of the building program and age in 1974:
12-17: high 8.02; low 9.40
18-24: high 7.70; low 9.12
Difference (12-17 minus 18-24): high 0.32; low 0.28
DD: 0.034 (0.098)
Step 2: Let's estimate this with a regression:
$S_{ijk} = c + \alpha_j + \beta_k + \gamma (P_j \cdot T_i) + \delta (C_j \cdot T_i) + \varepsilon_{ijk}$
with $S_{ijk}$ = education level of person $i$ in region $j$ in cohort $k$; $P_j$ = 1 if the person was born in a region with high program intensity; $T_i$ = 1 if the person belongs to the "young" cohort; $C_j$ = dummy variable for region $j$; $\beta_k$ = cohort fixed effect; $\alpha_j$ = district-of-birth fixed effect; $\varepsilon_{ijk}$ = error term for person $i$ in region $j$ in cohort $k$.
Step 3: Let's use additional information: the intensity of the program in each region.
$S_{ijk} = c + \alpha_j + \beta_k + \gamma (P_j \cdot T_i) + \delta (C_j \cdot T_i) + \varepsilon_{ijk}$
where $P_j$ = the intensity of building activity in region $j$ and $C_j$ = a vector of regional characteristics.
We estimate the effect of the program for each cohort separately:
$S_{ijk} = c + \alpha_j + \beta_k + \sum_{l=2}^{23} \gamma_l (P_j \cdot d_{il}) + \sum_{l=2}^{23} \delta_l (C_j \cdot d_{il}) + \varepsilon_{ijk}$
where $d_{il}$ = a dummy variable for individual $i$ belonging to cohort $l$.
Program effect per cohort. [Figure: estimated coefficients $\gamma_l$ plotted against age in 1974.]
[Figure: the same cohort-by-cohort estimates with salary as the dependent variable y.]
Conclusion. Results: for each school built per 1000 students, average educational achievement increased by 0.12 to 0.19 years, and average salaries increased by 2.6 to 5.4%. Making sure the DD estimation is accurate: a placebo DD gave a 0 estimated effect; various alternative specifications were used; the impact estimates for each age cohort were checked to make sure they make sense.
Keep in mind! Difference-in-differences combines enrolled vs. not enrolled with before vs. after. Slope: it generates the counterfactual for the change in outcome. FUNDAMENTAL ASSUMPTION: trends (slopes) are the same in the treatment and comparison groups. To test this, at least 3 observations in time are needed: 2 observations before and 1 observation after.
Matching. The group that enrolled is, on average, different from the group that did not enroll. However, some individuals are similar, so we can match similar individuals with each other.
[Figure: enrolled and not-enrolled individuals arranged by wealth group: very poor, poor, rich, very rich.]
Compare outcomes for similar people. [Figure: outcome Y for enrolled and not-enrolled individuals within each wealth group (very poor, poor, rich, very rich).]
More complicated in practice. Match on all observable characteristics (e.g., income, gender, education). Comparison group: non-participants with similar characteristics. Create one aggregate propensity score to match: compute everyone's probability of participating, based on their observable characteristics, and choose matches that have the same probability of participation as the treated units.
Density of propensity scores. [Figure: density of the propensity score (0 to 1) for participants and non-participants; the region where the two densities overlap is the common support.]
Estimation strategy. Predict the propensity scores for participants and nonparticipants: if participation status is binary, run a limited dependent variable regression (e.g., logit or probit) and predict the probability of participation for all units. Common support: restrict the analysis to participants whose P(X) values are comparable to the P(X) values of nonparticipants.
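A minimal sketch of the first step and the common-support restriction, assuming a logit model (one limited dependent variable option; the Progresa case uses a probit) fitted by Newton-Raphson, a single hypothetical covariate, and simulated data. Common support is implemented with a simple min/max overlap rule applied to all units, which is one convention among several. All names are illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    k = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x


def fit_logit(X, d, iters=25):
    """Logit coefficients by Newton-Raphson. Each row of X starts with 1.0 (the constant)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum((di - pi) * row[j] for row, di, pi in zip(X, d, p)) for j in range(k)]
        hess = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def common_support(pscores, d):
    """Overlap of the participants' and nonparticipants' score ranges, and the units inside it."""
    p_t = [p for p, di in zip(pscores, d) if di == 1]
    p_c = [p for p, di in zip(pscores, d) if di == 0]
    lo, hi = max(min(p_t), min(p_c)), min(max(p_t), max(p_c))
    keep = [i for i, p in enumerate(pscores) if lo <= p <= hi]
    return keep, lo, hi


# Hypothetical data: households with a lower (standardized) asset index are more likely to enroll.
random.seed(5)
X, d = [], []
for _ in range(2000):
    assets = random.gauss(0, 1)
    X.append([1.0, assets])
    d.append(1 if random.random() < sigmoid(-0.5 - 1.0 * assets) else 0)

beta = fit_logit(X, d)
pscores = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
keep, lo, hi = common_support(pscores, d)
print([round(b, 2) for b in beta], round(lo, 3), round(hi, 3), len(keep))
```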
Estimation strategy (continued). Estimate the treatment effect for each participant by finding the set of nonparticipants with P(X) similar to that of the participant, and take the difference between the participant's outcome and the mean outcome of those similar nonparticipants. Repeat the exercise for all participants. Take the weighted average of the outcome differences across all matched participants to obtain the average treatment effect on the matched treated. Estimate the standard error around the treatment effect for statistical inference.
Finding similar nonparticipants: different weighting functions can be used to match nonparticipants whose P(X) is similar to the participant's P(X): stratification, nearest neighbor, radius, and kernel.
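Two of the weighting schemes listed above, nearest neighbor and radius, can be sketched in a few lines once propensity scores are available. Both compute the average treatment effect on the treated as the mean difference between each participant's outcome and its matched comparison outcome, with equal weights across matched participants. Matching with replacement, the radius values, and the (propensity score, outcome) pairs are hypothetical; stratification and kernel weighting are not shown.

```python
def nearest_neighbor_att(treated, controls):
    """ATT from one-to-one nearest-neighbour matching on the propensity score, with replacement.

    `treated` and `controls` are lists of (pscore, outcome) pairs.
    """
    diffs = []
    for p_t, y_t in treated:
        _, y_c = min(controls, key=lambda c: abs(c[0] - p_t))  # closest nonparticipant
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)


def radius_att(treated, controls, radius):
    """ATT from radius matching: compare each participant with the mean outcome of all
    nonparticipants within `radius`; participants with no such match are dropped.
    Returns (att, number of matched participants)."""
    diffs = []
    for p_t, y_t in treated:
        close = [y_c for p_c, y_c in controls if abs(p_c - p_t) <= radius]
        if close:
            diffs.append(y_t - sum(close) / len(close))
    if not diffs:
        raise ValueError("no participant has a nonparticipant within the radius")
    return sum(diffs) / len(diffs), len(diffs)


# Hypothetical (pscore, outcome) pairs
treated = [(0.20, 3.0), (0.50, 5.0), (0.80, 7.0)]
controls = [(0.18, 1.0), (0.52, 4.0), (0.79, 6.0), (0.95, 9.0)]
print(nearest_neighbor_att(treated, controls))    # (2 + 1 + 1) / 3
print(radius_att(treated, controls, radius=0.2))  # 0.80 also uses 0.95: (2 + 1 - 0.5) / 3
```

A wider radius uses more comparison units per participant but allows less similar matches; a very narrow radius can leave participants unmatched, which is the common-support problem in another form.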
Main Problems
Problem 1: Need similar people. [Figure: enrolled and not-enrolled individuals by wealth group (very poor, poor, rich, very rich).]
Problem 2: Can only match on observables. MATCHING DOES NOT OVERCOME THE SELECTION PROBLEM! What if we can't collect data on people's characteristics that are relevant for program participation and outcomes?
Summary. Requirements for successful matching implementation: data on the variables that matter for participation; common support; no selection based on unobservables. Matching can be combined with DiD: matching is performed on baseline X's, and DD controls for time-invariant unobservables.
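Combining matching with DiD can be sketched as: match each participant to the nonparticipant with the closest baseline propensity score, then compare changes over time rather than levels, so that a time-invariant unobserved difference between matched units cancels out. The data are hypothetical, constructed so that participants start from higher levels; the levels-only comparison is shown for contrast. Function names are illustrative.

```python
def matched_did_att(treated, controls):
    """Matching on the baseline propensity score combined with diff-in-diff.

    Each unit is (baseline pscore, outcome before, outcome after). Each participant is matched
    to the nonparticipant with the closest baseline score (with replacement), and the ATT is the
    average difference in *changes*, so a time-invariant gap in levels cancels.
    """
    diffs = []
    for p_t, y0_t, y1_t in treated:
        _, y0_c, y1_c = min(controls, key=lambda c: abs(c[0] - p_t))
        diffs.append((y1_t - y0_t) - (y1_c - y0_c))
    return sum(diffs) / len(diffs)


def matched_levels_att(treated, controls):
    """Same matches, but comparing follow-up levels only (no differencing), for contrast."""
    diffs = []
    for p_t, _, y1_t in treated:
        _, _, y1_c = min(controls, key=lambda c: abs(c[0] - p_t))
        diffs.append(y1_t - y1_c)
    return sum(diffs) / len(diffs)


# Hypothetical units: participants start from higher levels (a time-invariant gap).
treated = [(0.30, 10.0, 15.0), (0.60, 12.0, 18.0)]
controls = [(0.31, 7.0, 9.0), (0.58, 8.0, 11.0), (0.90, 20.0, 21.0)]
print(matched_did_att(treated, controls))     # ((5 - 2) + (6 - 3)) / 2
print(matched_levels_att(treated, controls))  # ((15 - 9) + (18 - 11)) / 2, inflated by the gap
```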
Looking for a Volunteer
Case 7: Progresa matching (propensity score). Probit regression, Prob(Enrolled = 1); estimated coefficients on baseline characteristics:
Head's age (years): -0.022**
Spouse's age (years): -0.017**
Head's education (years): -0.059**
Spouse's education (years): -0.03**
Head is female = 1: -0.067
Indigenous = 1: 0.345**
Number of household members: 0.216**
Dirt floor = 1: 0.676**
Bathroom = 1: -0.197**
Hectares of land: -0.042**
Distance to hospital (km): 0.001*
Constant: 0.664**
Note: if the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
Case 7: Progresa common support. [Figure: densities of Pr(Enrolled).]
Case 7: Progresa matching (propensity score). Estimated impact on consumption (Y): multivariate linear regression 7.06+. Note: if the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**); if significant at the 10% level, we label the impact with +.
When to use: when selection into program participation is based on observable variables. Requirements: understand which variables matter for participation in the program (e.g., program rules); data on these variables must be available from before units became participants or nonparticipants (baseline data); common support. Best when combined with diff-in-diff. Key assumption: there are no remaining unobservable differences between participants and nonparticipants.
Keep in mind! Matching requires large samples and good-quality data. Matching at baseline can be very useful: know the assignment rule and match based on it; combine with other techniques (e.g., diff-in-diff). Ex-post matching is risky: if there is no baseline, be careful! Matching on endogenous ex-post variables gives bad results.
Progresa policy recommendation? Impact of Progresa on consumption (Y):
Case 1: Before & After: 34.28**
Case 2: Enrolled & Not Enrolled: -4.15
Case 3: Randomized Assignment: 29.75**
Case 4: Randomized Promotion: 30.4**
Case 5: Discontinuity Design: 30.58**
Case 6: Difference-in-Differences: 25.53**
Case 7: Matching: 7.06+
Note: if the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**); if significant at the 10% level, we label the impact with +.
IE Methods Toolbox: choose your method. Randomized Assignment, Discontinuity Design, Difference-in-Differences, Matching, Matching + Diff-in-Diff.
Where do comparison groups come from? The rules of program operation determine the evaluation strategy. We can almost always find a valid comparison group if the operational rules for selecting beneficiaries are equitable, transparent, and accountable, and the evaluation is designed prospectively.
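Choosing your IE method(s), by money (excess demand vs. no excess demand), targeting (targeted vs. universal), and timing (phased vs. immediate roll-out):
Phased roll-out, excess demand, targeted: 1 Randomized assignment; 4 RDD.
Phased roll-out, excess demand, universal: 1 Randomized assignment; 2 Randomized promotion; 3 DD with 5 Matching.
Phased roll-out, no excess demand, targeted: 1 Randomized assignment; 4 RDD.
Phased roll-out, no excess demand, universal: 1 Randomized assignment to phases; 2 Randomized promotion to early take-up; 3 DD with 5 Matching.
Immediate roll-out, excess demand, targeted: 1 Randomized assignment; 4 RDD.
Immediate roll-out, excess demand, universal: 1 Randomized assignment; 2 Randomized promotion; 3 DD with 5 Matching.
Immediate roll-out, no excess demand, targeted: 4 RDD.
Immediate roll-out, no excess demand, universal: if less than full take-up, 2 Randomized promotion; 3 DD with 5 Matching.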
Test
Q1: What are the shortcomings of difference-in-differences? A. Those enrolled in the program might have a different trend over time than those not enrolled. B. It does not have a counterfactual. C. Sample size might be too small. D. People who are different from the comparison group might drop out of the program. E. Both A and C. F. Both A and D.
Q2: You are evaluating a school management reform program that targets poor schools. You decide to perform a diff-in-diff, comparing target schools with schools that did not receive the program. Over the same period, the government deployed more teachers to poor areas. Would this overestimate or underestimate the impact of the program? A. Overestimate B. Underestimate C. Neither
Q3: What is the biggest shortcoming of propensity score matching? A. It cannot match on observable characteristics. B. It cannot match on unobservable characteristics. C. Different trends between treatment and comparison groups.
Q4: When is it possible to do a regression discontinuity design? A. When there are continuous eligibility criteria with a clear cut-off. B. When there is a comparison group of people who do not receive the program. C. When the government randomly assigns some people to receive the program and some not.