Regression Discontinuity Design
Aniceto Orbeta, Jr.
Philippine Institute for Development Studies
Stream 2: Impact Evaluation Methods (Intermediate)
Making Impact Evaluation Matter: Better Evidence for Effective Policies and Programs
September 1-3, 2014, ADB Headquarters
Outline
- Basic Characteristics
- Validity Tests
- Specification Decisions
- Estimation Methods
- Classic Examples
Basic Characteristics
There is a continuous indicator (forcing variable) that can order observation units in some manner:
- Poverty index (e.g. PMT)
  - Impact of development projects on communities/households above a poverty incidence threshold (e.g. Pantawid Pamilya, the Philippine CCT)
- Age
  - Impact of discounts for senior citizens (above 60 years old) on access to public goods
- Exam scores
  - Impact of a remedial school program that is mandatory for children whose test score is below some cutoff level
  - Impact of migration when eligibility is based on qualifying exam scores
Basic Characteristics: RDD before and after intervention
[Figure: two panels, "Before Intervention" and "After Intervention"; the jump at the cutoff in the second panel is the impact. Source: Gertler et al. (2011), Impact Evaluation in Practice, World Bank.]
Validity Tests
Shown mostly by scatter plots:
- No jump in outcome before treatment. Without the intervention, there exists a smooth relationship between the outcome and the predictor of treatment assignment (plot of outcomes against the running variable).
- No jumps in relevant covariates at the cutoff. Only the treatment variable should cause a jump in the outcome variable; any other covariate that exhibits a similar jump at the threshold invalidates the design (plot of covariates against the running variable).
- Eligibility cannot be manipulated. McCrary (2008) density test (density plot of the running variable around the cutoff).
Decision to Make: Choice of functional form
RDD requires a break in outcome at the threshold point.
- Panel A, linear: will require a linear model
- Panel B, non-linear: will require a non-linear model
- Panel C, not an RD: there is no break, only a non-linear relationship
Decision to Make: Choice of Bandwidth (BW)
- How wide a range around the threshold should be included?
- Trade-off between bias and variance: the wider the BW, the more bias but the smaller the variance, and vice versa
- Optimum BW, i.e. optimality based on mean squared error (MSE)
- Common practice: provide several estimates at different BWs to reveal bias
Data-Determined Optimum Bandwidth
Trade-off between bias and variance:
- Imbens and Kalyanaraman (2012)
- Calonico, Cattaneo, and Titiunik (2014)
- Cross-validation (Ludwig & Miller, 2007)
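The bias-variance trade-off behind an MSE-optimal bandwidth can be written out explicitly. The following is only a sketch for the standard local linear case; the constants $B$ and $V$ and the sample size $n$ are symbols introduced here for illustration and do not appear in the slides.

```latex
\mathrm{MSE}(h) = \mathrm{Bias}(h)^2 + \mathrm{Var}(h)
\approx B^2 h^4 + \frac{V}{n h},
\qquad
h_{opt} = \arg\min_h \mathrm{MSE}(h)
= \left( \frac{V}{4 B^2 n} \right)^{1/5} \propto n^{-1/5}
```

A wider $h$ raises the bias term and lowers the variance term, which is why estimates are usually reported at several bandwidths.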
Two Types of RD
Treatment is determined completely by the predictor / forcing variable (sharp) or only partly (fuzzy).
- Outcome: $Y_i = Y_i(0)$ if $W_i = 0$, and $Y_i = Y_i(1)$ if $W_i = 1$, where $W_i$ is the treatment assignment
- Let $X$ be the predictor and $c$ the threshold value
- Sharp: $W_i = 1\{X_i \ge c\}$ (deterministic assignment)
- Fuzzy: $\lim_{x \uparrow c} P(W_i = 1 \mid X_i = x) \ne \lim_{x \downarrow c} P(W_i = 1 \mid X_i = x)$; rather than treatment itself jumping, it is the probability of getting treatment that jumps (probabilistic assignment)
Impact Estimates
- Sharp: $\tau_{sharp} = E[Y(1) - Y(0) \mid X = c]$
- Fuzzy: $\tau_{fuzzy} = \dfrac{E[Y(1) \mid X = c] - E[Y(0) \mid X = c]}{E[W(1) \mid X = c] - E[W(0) \mid X = c]}$
- Here (1) denotes quantities measured from above and (0) from below the threshold $c$
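The two estimands can be illustrated with a deliberately crude calculation that replaces each conditional expectation at the cutoff with a raw mean inside a small window. This is a minimal sketch, not the estimator used in the examples that follow; the function name `rd_wald_estimates` and the toy data are hypothetical.

```python
from statistics import mean


def rd_wald_estimates(x, y, w, c, h):
    """Return (jump in Y, fuzzy ratio) from raw means within c +/- h.

    x: running variable, y: outcome, w: treatment received (0/1),
    c: cutoff, h: half-width of the window (all names are illustrative).
    """
    below = [i for i, xi in enumerate(x) if c - h <= xi < c]
    above = [i for i, xi in enumerate(x) if c <= xi <= c + h]
    if not below or not above:
        raise ValueError("need observations on both sides of the cutoff")
    # Numerator: jump in the mean outcome at the cutoff.
    jump_y = mean(y[i] for i in above) - mean(y[i] for i in below)
    # Denominator: jump in the share treated (equals 1 in a sharp design).
    jump_w = mean(w[i] for i in above) - mean(w[i] for i in below)
    if jump_w == 0:
        raise ValueError("no jump in treatment probability at the cutoff")
    return jump_y, jump_y / jump_w


x = [-2, -1, 1, 2]
y = [10, 10, 14, 14]
sharp = rd_wald_estimates(x, y, [0, 0, 1, 1], c=0, h=2)
fuzzy = rd_wald_estimates(x, y, [0, 0, 1, 0], c=0, h=2)
print(sharp)  # jump of 4 in Y, divided by a treatment jump of 1
print(fuzzy)  # same jump in Y, divided by a treatment jump of 0.5
```

In the sharp case the denominator is one, so the ratio equals the jump in the outcome; in the fuzzy case the same jump is scaled up by the smaller jump in treatment probability.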
Importance of Graphical Analysis
- Validity tests are shown through graphs
- Specifications are often revealed by a graph
- If there is no visible jump in the graph, chances are there is no significant impact
- Of course, a graph cannot give us a precise numerical estimate of the impact
Estimation
- Global Polynomial
- Local Linear
- Local Randomization
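Global Polynomial
- Naïve (assumes a constant treatment effect with equal slopes on both sides):
  $Y_i = \alpha + \tau_{sharp} W_i + \beta_1 (X_i - c) + \dots + \beta_p (X_i - c)^p + \varepsilon_i$
- Flexible:
  $Y_i = \alpha + \tau_{sharp} W_i + \beta_1 (X_i - c) + \dots + \beta_p (X_i - c)^p + \gamma_1 W_i (X_i - c) + \dots + \gamma_p W_i (X_i - c)^p + \varepsilon_i$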
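Local-Polynomial
Estimate a weighted regression (using kernel weights) separately on each side of the cutoff, locally (within a bandwidth $h$):
- Left of the cutoff, $X_i \in [c - h, c)$: $Y_i = \alpha_- + \beta_- (X_i - c) + \varepsilon_i$
- Right of the cutoff, $X_i \in [c, c + h]$: $Y_i = \alpha_+ + \beta_+ (X_i - c) + \varepsilon_i$
- Estimate: $\hat{\tau}_{sharp}(h) = \hat{\alpha}_+ - \hat{\alpha}_-$
- Or combined: $Y_i = \alpha + \tau_{sharp} W_i + \beta_1 (X_i - c) + \gamma_1 W_i (X_i - c) + \varepsilon_i$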
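The separate-regressions version can be sketched in a few lines of plain Python. The sketch below assumes a triangular kernel, which is one common choice but is not specified in the slides; the function names and the toy data are hypothetical.

```python
def weighted_line_intercept(x, y, w):
    """Intercept of the weighted least-squares line y = a + b * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar


def local_linear_rd(x, y, c, h):
    """Sharp RD estimate: right intercept minus left intercept at the cutoff."""
    left = ([], [], [])   # centered x, y, kernel weight
    right = ([], [], [])
    for xi, yi in zip(x, y):
        d = xi - c                 # center the running variable at the cutoff
        k = 1 - abs(d) / h         # triangular kernel weight (an assumption)
        if k <= 0:
            continue               # outside the bandwidth
        side = right if d >= 0 else left
        side[0].append(d)
        side[1].append(yi)
        side[2].append(k)
    return weighted_line_intercept(*right) - weighted_line_intercept(*left)


# Toy data: intercept 1 left of the cutoff, 3 right of it, different slopes.
x = list(range(-10, 11))
y = [1 + 0.5 * xi if xi < 0 else 3 + 0.2 * xi for xi in x]
print(round(local_linear_rd(x, y, c=0, h=5), 6))
```

Because the running variable is centered at the cutoff, each fitted intercept is the predicted outcome at the threshold, so their difference is the estimated jump.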
Local Randomization
- Idea: near the cutoff, treat units as if they were randomly assigned
- Find a window $c - h < X_i < c + h$ such that, for all $X_i$ in the window, $W_i$ is independent of the potential outcomes $Y(0)$, $Y(1)$
- Employ RCT methods in the window, i.e. test the difference in means below and above the cutoff
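Within the chosen window, the analysis reduces to a comparison of two groups. The sketch below uses an exact permutation test for the difference in means, which is one way to "employ RCT methods" but is not necessarily the test used in the slides; the function name and the toy outcomes are hypothetical, and full enumeration is only practical for small windows.

```python
from itertools import combinations
from statistics import mean


def diff_in_means_permutation(below, above):
    """Difference in means (above - below) and its exact permutation p-value."""
    pooled = list(below) + list(above)
    n, k = len(pooled), len(above)
    total = sum(pooled)
    observed = mean(above) - mean(below)
    extreme = 0
    count = 0
    # Enumerate every way of labelling k of the n units as "above".
    for idx in combinations(range(n), k):
        s_above = sum(pooled[i] for i in idx)
        diff = s_above / k - (total - s_above) / (n - k)
        count += 1
        # Two-sided: count relabellings at least as extreme as observed.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / count


outcomes_below = [1, 2, 3, 4]   # units just below the cutoff
outcomes_above = [5, 6, 7, 8]   # units just above the cutoff
diff, p_value = diff_in_means_permutation(outcomes_below, outcomes_above)
print(diff, p_value)
```

The p-value is the share of relabellings that produce a gap at least as large as the observed one, which is valid only if assignment inside the window is as good as random.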
Classic Examples
- Impact of social assistance on labor market outcomes (Lemieux and Milligan, 2008) (Sharp RD)
- Impact of class size on achievement (Angrist and Lavy, 1999) (Sharp and Fuzzy RD)
Example 1: Impact of social assistance on labor market outcomes
Lemieux, T. and K. Milligan (2008), "Incentive effects of social assistance: A regression discontinuity approach", Journal of Econometrics (also NBER Working Paper 10541)
Evaluation Issue
- What is the incentive effect of social assistance? What is the impact of social assistance on labor market behavior?
Evaluation Model
- Output: social assistance rule
- Intermediate outcome: amount of social assistance
- Final outcome: labor market outcome
Background
- Before 1989, childless social assistance recipients in Quebec under age 30 received much lower benefits than recipients over age 30
- The study uses this policy rule to estimate the effects of social assistance on labor market outcomes
Social assistance benefits for those under 30 and those 30 and over, 1980-1993
- Benefits are indeed very different before 1989 between those under 30 and those 30 and above
Data
- Canadian census: a detailed questionnaire (long form) is assigned to approximately 20% of households, with questions on labor market characteristics and participation, education, income, and the demographics of respondents
- From the 20% sample, obtained samples of around 3,000 male high school dropouts for each age group; kept only men without children in each age group around the discontinuity at age 30
- Employment data: employment status during the reference period, hours worked
- Labor Force Survey: provides too small a sample per age group; used to provide data on the labor market context
3-year moving average of employment rate by age group (25-29; 30-34), 1976-1997
Observations
- Cyclicality of employment rates => need for a control group to separate business cycle effects from the policy effects of interest
- The employment patterns of those aged 25-29 and those aged 30-34 are similar => labor market conditions are similar for the two groups
- Quebec has a lower employment rate than the rest of Canada
Empirical Model
$Y_{ia} = \beta_0 + \beta_1 TREAT_{ia} + \delta(a) + \varepsilon_{ia}$
where
- $Y_{ia}$ = outcome for individual $i$ of age $a$
- $\delta(a)$ = effect of age on the outcome variable
- $TREAT_{ia}$ = treatment dummy: 0 if $a < 30$ and 1 if $a \ge 30$
- $\beta_1$ is the parameter of interest
Key identifying assumption
- δ(.) is a smooth (continuous) function
- We have a sharp RD design since the treatment variable is a deterministic function of the regression variable (age)
- The assumption that δ(.) is continuous means that differential benefits are the only source of discontinuity in outcomes around age 30
Threats to the validity of the assumption
- The employment rate by age is a well-known profile
- Violations can happen when:
  - Some people find ways to cheat on their age, for example by falsifying their birth certificates; this is difficult to do because age can be easily verified
  - Differential benefits are only for individuals without dependent children; to the extent that fertility and living arrangement decisions (whether to live with your children or not) are endogenous, this generates a problem of non-random selection
Outcomes
- Employment rate during the reference week
- Employment rate based on the fraction of weeks worked in the previous year [how many weeks of work constitute employment was not defined]
Estimation
- Estimated a variety of specifications for the regression function δ(a):
  - Linear
  - Quadratic
  - Cubic
  - Linear spline
  - Quadratic spline
- Used age-specific cell averages in the estimation
- Used weighted OLS with the inverse of the sampling variance as weights, which is similar to using the number of observations per cell as weights
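The similarity between the two weighting schemes follows from the variance of a cell mean. As a sketch, assume age cell $a$ has $n_a$ observations and within-cell outcome variance $\sigma_a^2$ (both symbols are introduced here for illustration):

```latex
\mathrm{Var}(\bar{Y}_a) = \frac{\sigma_a^2}{n_a}
\quad\Longrightarrow\quad
w_a = \frac{1}{\mathrm{Var}(\bar{Y}_a)} = \frac{n_a}{\sigma_a^2} \propto n_a
\quad \text{if } \sigma_a^2 \text{ is roughly constant across } a
```

The two schemes differ only to the extent that the within-cell variance changes with age.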
Graphical Evidence: Employment at census week, Quebec, 1986
- Very strong evidence that employment drops abruptly once the individual is eligible for higher social assistance
- Employment also tends to trend down faster (more steeply) as a function of age, especially after age 30
Graphical Evidence: Employment rate in previous year, Quebec, 1986
- Similar patterns are observable using the employment rate based on weeks worked in the previous year
Estimation Results
RD estimate of impact of social assistance on labor market outcome, Quebec, 1986
- Employment impacts are more precisely estimated (lower standard errors) by the first four models than by the quadratic spline
- The impact is even stronger, and more precisely estimated, using the employment rate based on status during the reference week than using the rate based on weeks worked during the past year
- Estimates are similar across models except for the quadratic spline; they range from -0.038 to -0.056
- Goodness-of-fit tests show that even the simpler models (linear and linear spline) fit the data very well: fitted values are not significantly different from actual values
Example 2: Impact of Class Size on Achievement
Angrist and Lavy (1999), "Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement", Quarterly Journal of Economics, 114(2), 533-575
Evaluation Issue
- Impact of school output on school outcomes: impact of class size on test scores
Logic Model
- Output: Maimonides' rule
- Intermediate outcome: class size
- Final outcome: test score
Background
- Class size in Israeli schools is capped at 40. Students in a grade with up to 40 students can expect to be in classes as large as 40, but a grade with 41 students is split into two classes, a grade with 81 students is split into three classes, and so on
- This is called Maimonides' Rule; the rule was proposed by the medieval Talmudic scholar Maimonides
Implications of Maimonides' Rule for class size
- Maimonides' Rule implies that the predicted class size (in a given grade) assigned to class $c$ in school $s$, $m_{sc}$, is
  $m_{sc} = \dfrac{e_s}{\mathrm{int}\left(\frac{e_s - 1}{40}\right) + 1}$
  where $e_s$ = enrollment in the grade and int(a) = integer part of a
- Enrollment of 1-40 is in a single class; enrollment of 41-80 is split into two classes; enrollment of 81-120 into three classes, and so on
- $m_{sc}$ is a function of $e_s$ (increasing between cutoffs), making enrollment an important control
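The rule is easy to check numerically. The following is a minimal sketch of the formula above; the function name `maimonides_class_size` and the `cap` parameter are introduced here for illustration.

```python
def maimonides_class_size(enrollment, cap=40):
    """Predicted class size: enrollment split evenly over the fewest classes
    that keep each class at or below the cap."""
    if enrollment < 1:
        raise ValueError("enrollment must be at least 1")
    n_classes = (enrollment - 1) // cap + 1   # int((e - 1) / 40) + 1
    return enrollment / n_classes


for e in (40, 41, 80, 81, 120, 121):
    print(e, maimonides_class_size(e))
```

Predicted class size rises one-for-one with enrollment up to 40, then drops to 20.5 at an enrollment of 41, and drops again at 81 and 121. These drops are the discontinuities the design exploits.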
Maimonides' Rule and actual class size, 5th grade
- Actual data on class size reveal that Maimonides' rule was not strictly followed (or that other factors also determine class size), but class size is clearly largely determined by the rule
- While class sizes are not in multiples of 40, class size increases with enrollment size and drops sharply at integer multiples of 40
Class size and (reading) test scores
- Test scores are generally higher in schools with larger enrollments (positive relationship)
- Test scores show, in part, an up-and-down pattern that mirrors the class size pattern
- The apparent positive correlation is partly attributable to the fact that larger schools are more likely to be located in relatively prosperous cities, while smaller schools are more likely to be located in relatively poor towns outside of major urban centers
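Model
$Y_{isc} = \beta_0 + X_s \beta + n_{sc} \alpha + \mu_c + \eta_s + \varepsilon_{isc}$
- $Y_{isc}$ = test score of student $i$ in school $s$ and class $c$
- $X_s$ = vector of school characteristics
- $n_{sc}$ = size of class $c$ in school $s$
- $\mu_c$, $\eta_s$, $\varepsilon_{isc}$ = class-, school-, and student-level error components
- Use class-level estimating equations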
Fuzzy RD Implementation
- Fuzzy version: $m_{sc}$ is an instrument for $n_{sc}$
- First stage: $n_{sc} = X_s \pi_0 + m_{sc} \pi_1 + \xi_{sc}$
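With a single instrument and no other controls, the IV estimate of the class-size coefficient reduces to a ratio of covariances. The sketch below uses made-up numbers constructed so that an unobserved factor `u` is correlated with actual class size but not with predicted class size; the function names and data are hypothetical and are not taken from Angrist and Lavy.

```python
def cov(a, b):
    """Population covariance of two equal-length sequences."""
    abar = sum(a) / len(a)
    bbar = sum(b) / len(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b)) / len(a)


def ols_slope(y, n):
    """Slope from regressing y on n (with an intercept)."""
    return cov(y, n) / cov(n, n)


def iv_slope(y, n, m):
    """IV (Wald) slope using m as the instrument for n."""
    return cov(y, m) / cov(n, m)


m = [20, 20, 40, 40]                              # predicted class size (instrument)
u = [1, -1, 1, -1]                                # unobserved factor, uncorrelated with m
n = [mi + 5 * ui for mi, ui in zip(m, u)]         # actual class size
y = [70 - 0.3 * ni + 4 * ui for ni, ui in zip(n, u)]  # score; true effect is -0.3

print(round(ols_slope(y, n), 4))    # -0.14, biased toward zero by u
print(round(iv_slope(y, n, m), 4))  # -0.3, recovers the assumed effect
```

OLS is pulled toward zero because `u` raises both class size and scores, while the instrument only uses the variation in class size that the rule predicts.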
OLS Estimates, 1991
- With no controls, estimates show a strong positive correlation between class size and achievement test scores (0.221 for reading; 0.322 for math)
- When the percentage of disadvantaged children is added as a control, the estimate falls to -0.031 and is insignificant; the coefficient for math scores remains positive
- Adding enrollment as a control does not significantly affect the estimates either
IV Estimates, 5th grade
- With IV estimates, larger class sizes are associated with lower test scores
- Impact of class size on reading without enrollment controls is -0.16 (0.04); with linear and quadratic controls for enrollment size it ranges from -0.26 (0.08) to -0.28 (0.07)
- Impact on math scores is virtually zero without enrollment controls; with linear and quadratic controls, the impact on math scores is -0.23 (0.09) to -0.261 (0.11)
- Even bigger impact with the RD sample
- Note: the full sample includes everyone; the discontinuity sample includes only those in the vicinity of the thresholds (enrollment within +/- 5 of the class size thresholds)
Strengths of RDD
- Strong internal validity: near the threshold, the treatment and comparison groups are as if chosen by randomized assignment to treatment; strongest among the quasi-experimental methods
- Fewer ethical issues: no need to exclude eligible units from receiving treatment
Issues with RDD
- External validity: the impact is valid only around the threshold and not for the whole population; RDD should not be used if the policy issue is the impact on the whole population
- Statistical power: requires a larger sample size than other methods (e.g. approx. 3 times that of an RCT*); there may not be enough observations around the threshold, and expanding the band around the threshold weakens internal validity
- Functional form dependence: may require additional functional form assumptions to obtain credible impact estimates as one gets farther from the threshold
*Lee, H. & Munk, T. (2008), 'Using Regression Discontinuity Design in Program Evaluation', Survey Research Methods; Schochet, P. Z. (2009), 'Statistical Power for Regression Discontinuity Designs in Education Evaluations', Journal of Educational and Behavioral Statistics 34(2), 238-266.
Basic References Lee, D. S. & Lemieux, T. (2010), 'Regression Discontinuity Designs in Economics', Journal of Economic Literature 48(2), 281-355. Imbens, G. W. & Lemieux, T. (2008), 'Regression discontinuity designs: A guide to practice', Journal of Econometrics 142(2), 615-635.
Thank You
Reproducing Angrist and Lavy (1999)
Estimate to reproduce
Data
Available at http://economics.mit.edu/faculty/angrist/data1/data/anglavy99
Preliminaries
use AL1999_final5, clear
lab var c_size "Enrollment"
lab var tipu "Percent disadvantaged"
lab var classize "Class size"
lab var avgverb "Score, Reading"
lab var avgmath "Score, Math"
** Variables generation
replace avgverb= avgverb-100 if avgverb>100
replace avgmath= avgmath-100 if avgmath>100
g func1= c_size/(int((c_size-1)/40)+1)
g func2= cohsize/(int(cohsize/40)+1)
replace avgverb=. if verbsize==0
replace passverb=. if verbsize==0
replace avgmath=. if mathsize==0
replace passmath=. if mathsize==0
keep if 1<classize & classize<45 & c_size>5
keep if avgverb~=.
* RD sample indicator: enrollment within +/- 5 of each cutoff
g byte disc= (c_size>=36 & c_size<=45) | (c_size>=76 & c_size<=85) | (c_size>=116 & c_size<=125)
g c_size2= (c_size^2)/100
* GENERATE TREND
g trend= c_size if c_size>=0 & c_size<=40
replace trend= 20+(c_size/2) if c_size>=41 & c_size<=80
replace trend= (100/3)+(c_size/3) if c_size>=81 & c_size<=120
replace trend= (130/3)+(c_size/4) if c_size>=121 & c_size<=160
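The two derived variables that are least obvious from the commands are the discontinuity-sample indicator and the piecewise-linear enrollment trend. The Python sketch below mirrors their definitions so the values can be checked by hand; it is an illustration, not part of the replication code, and the function names are hypothetical.

```python
def in_discontinuity_sample(enrollment):
    """True when enrollment is in 36-45, 76-85, or 116-125 (mirrors disc)."""
    return any(lo <= enrollment <= lo + 9 for lo in (36, 76, 116))


def enrollment_trend(enrollment):
    """Piecewise-linear trend in enrollment (mirrors trend).

    Returns None above 160, where the Stata variable is left missing.
    """
    e = enrollment
    if 0 <= e <= 40:
        return float(e)
    if 41 <= e <= 80:
        return 20 + e / 2
    if 81 <= e <= 120:
        return 100 / 3 + e / 3
    if 121 <= e <= 160:
        return 130 / 3 + e / 4
    return None


for e in (35, 36, 45, 46, 80, 120, 161):
    print(e, in_discontinuity_sample(e), enrollment_trend(e))
```

The slope of the trend falls from 1 to 1/2, 1/3, and 1/4 across segments, matching the number of classes the rule implies. Because the trend is undefined above an enrollment of 160, any regression that includes it uses fewer observations than the full sample.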
Descriptive stats
. summ avgverb avgmath classize func1 tipu c_size schlcode, sep(0)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     avgverb |       2019    74.38641    7.684038       34.8      93.86
     avgmath |       2018    67.29267    9.598066      27.69      93.93
    classize |       2019    29.93512    6.545885          8         44
       func1 |       2019    30.95594    6.107924          8         40
     tipuach |       2019    14.10203    13.49887          0         76
      c_size |       2019    77.74195    38.81073          8        226
    schlcode |       2019    39637.98    15266.16      11005      61365
Full sample (col 1)
. ivregress 2sls avgverb (classize=func1) tipu, vce(cl schlcode) // col 1

Instrumental variables (2SLS) regression          Number of obs   =       2019
                                                  Wald chi2(2)    =     595.02
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.3568
                                                  Root MSE        =     6.1612

                            (Std. Err. adjusted for 1002 clusters in schlcode)
------------------------------------------------------------------------------
             |               Robust
     avgverb |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    classize |  -.1584777   .0416256    -3.81   0.000    -.2400624   -.0768929
     tipuach |  -.3714599   .0159817   -23.24   0.000    -.4027836   -.3401363
       _cons |   84.36879   1.344373    62.76   0.000     81.73387    87.00371
------------------------------------------------------------------------------
Instrumented:  classize
Instruments:   tipuach func1

Note: Angrist and Lavy (1999) used the Moulton (1986) correction; here we used the cluster option in ivregress.
Moulton (1986), Random group effects and the precision of regression estimates, J. of Econometrics, 32, 385-97
Full sample (col 2)
. ivregress 2sls avgverb (classize=func1) tipu c_size, vce(cl schlcode) // col 2

Instrumental variables (2SLS) regression          Number of obs   =       2019
                                                  Wald chi2(3)    =     582.76
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.3397
                                                  Root MSE        =     6.2424

                            (Std. Err. adjusted for 1002 clusters in schlcode)
------------------------------------------------------------------------------
             |               Robust
     avgverb |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    classize |  -.2770197   .0758487    -3.65   0.000    -.4256804    -.128359
     tipuach |  -.3687071   .0160188   -23.02   0.000    -.4001034   -.3373107
      c_size |   .0222903    .009124     2.44   0.015     .0044076     .040173
       _cons |   86.14565   1.785436    48.25   0.000     82.64626    89.64504
------------------------------------------------------------------------------
Instrumented:  classize
Instruments:   tipuach c_size func1
Full sample (col 3)
. ivregress 2sls avgverb (classize=func1) tipu c_size c_size2, vce(cl schlcode) // col 3

Instrumental variables (2SLS) regression          Number of obs   =       2019
                                                  Wald chi2(4)    =     606.12
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.3428
                                                  Root MSE        =     6.2279

                            (Std. Err. adjusted for 1002 clusters in schlcode)
------------------------------------------------------------------------------
             |               Robust
     avgverb |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    classize |  -.2631278   .0937161    -2.81   0.005    -.4468079   -.0794477
     tipuach |  -.3687087   .0159984   -23.05   0.000     -.400065   -.3373524
      c_size |   .0131031   .0261633     0.50   0.616    -.0381759    .0643822
     c_size2 |   .0041682   .0099564     0.42   0.675     -.015346    .0236823
       _cons |   86.12938   1.797086    47.93   0.000     82.60715     89.6516
------------------------------------------------------------------------------
Instrumented:  classize
Instruments:   tipuach c_size c_size2 func1
Full sample (col 4)
. ivregress 2sls avgverb (classize=func1) trend, vce(cl schlcode) // col 4

Instrumental variables (2SLS) regression          Number of obs   =       1961
                                                  Wald chi2(2)    =      42.25
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =          .
                                                  Root MSE        =     7.7144

                             (Std. Err. adjusted for 990 clusters in schlcode)
------------------------------------------------------------------------------
             |               Robust
     avgverb |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    classize |  -.1898637   .1216194    -1.56   0.118    -.4282333    .0485059
       trend |   .1369107    .035901     3.81   0.000     .0665461    .2072753
       _cons |   72.55187   1.977625    36.69   0.000      68.6758    76.42795
------------------------------------------------------------------------------
Instrumented:  classize
Instruments:   trend func1
Discontinuity Sample (+/- 5) (col 5)
. ivregress 2sls avgverb (classize=func1) tipu if disc==1, vce(cl schlcode) // col 5

Instrumental variables (2SLS) regression          Number of obs   =        471
                                                  Wald chi2(2)    =     111.51
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.3139
                                                  Root MSE        =     6.7689

                             (Std. Err. adjusted for 224 clusters in schlcode)
------------------------------------------------------------------------------
             |               Robust
     avgverb |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    classize |   -.410168   .1176445    -3.49   0.000    -.6407469   -.1795891
     tipuach |  -.4772855   .0484219    -9.86   0.000    -.5721907   -.3823803
       _cons |      93.62   4.001931    23.39   0.000     85.77636    101.4636
------------------------------------------------------------------------------
Instrumented:  classize
Instruments:   tipuach func1
Discontinuity Sample (+/- 5) (col 6)
. ivregress 2sls avgverb (classize=func1) tipu c_size if disc==1, vce(cl schlcode) // col 6

Instrumental variables (2SLS) regression          Number of obs   =        471
                                                  Wald chi2(3)    =     108.98
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.2401
                                                  Root MSE        =      7.124

                             (Std. Err. adjusted for 224 clusters in schlcode)
------------------------------------------------------------------------------
             |               Robust
     avgverb |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    classize |  -.5823683    .205291    -2.84   0.005    -.9847313   -.1800053
     tipuach |  -.4611878   .0464016    -9.94   0.000    -.5521333   -.3702424
      c_size |   .0529979   .0316929     1.67   0.094     -.009119    .1151149
       _cons |   94.66023   4.621571    20.48   0.000     85.60212    103.7183
------------------------------------------------------------------------------
Instrumented:  classize
Instruments:   tipuach c_size func1