Modeling Panel Data: Choosing the Correct Strategy. Roberto G. Gutierrez

#analyticsx

Overview
- Panel data are ubiquitous, not only in economics but in all fields
- Panel data have intrinsic modeling advantages
- You model panel data in SAS with the PANEL procedure
- Different model alternatives are available, depending on assumptions and properties
- Key new features in SAS/ETS 14.1

Panel Data
- Panel data consist of a set of individuals measured at several points in time
- Known by other names: longitudinal data, cross-sectional time series, clustered data, multilevel (two-level) data, etc.
- Data collected in this manner offer key design advantages for modeling
- The greatest advantage is that the individuals act as their own control group

Panel Regression Model
Formally, consider the linear regression model

$$y_{it} = \beta_0 + \beta_x X_{it} + \beta_z Z_i + \nu_i + \varepsilon_{it}$$

for $i = 1, \ldots, N$ individuals on $t = 1, \ldots, T_i$ time periods.
- The $X$ variables vary over time
- The $Z$ variables are constant within individuals
- The $\nu_i$ are individual (or cross section) effects
- The $\varepsilon_{it}$ are the observation-level errors
- Different estimation strategies apply, depending on what you are willing to assume about $X$, $Z$, $\nu_i$, and $\varepsilon_{it}$

Grocery Data
- Consumer-loyalty data from 330 households who shopped at a grocery chain in Raleigh, North Carolina
- Monthly expenditures for the year 2011; some monthly data missing
- Model meat expenditures on the following factors:
  - Received government assistance for that month?
  - Household size
  - Rural store location visited during the month?
  - Were at least 10% of total expenditures for alcohol?
  - The number of meals per week outside household, as provided on initial survey
- Specifically, assess the association with government assistance, controlling for the other factors and for latent household effects

Data Statement

    data Grocery;
       input HouseID Month Meat Govt Hsize Rural Alcohol MealsOut;
       datalines;
    1 1 55.841 1 5 0 1 3
    1 3 49.372 1 5 0 1 3
    1 4 59.43 1 5 0 1 3
    1 5 52.25 1 5 0 1 3
    1 6 41.623 1 5 0 0 3
    1 7 59.357 1 5 0 1 3
    1 9 58.512 1 5 0 0 3
    ...
    330 9 55.264 1 4 0 0 2
    330 10 55.096 1 4 0 0 2
    330 12 49.676 1 4 0 0 2
    ;

Random-Effects Estimation
Model is

$$\text{Meat}_{it} = \beta_0 + \beta_1 \text{Govt}_{it} + \beta_2 \text{Hsize}_{i} + \beta_3 \text{Rural}_{it} + \beta_4 \text{Alcohol}_{it} + \beta_5 \text{MealsOut}_{i} + \nu_i + \varepsilon_{it}$$

- Random-effects estimation is the most common strategy
- Treats the households as a random sample and the $\nu_i$ as uncorrelated with $X$, $Z$, and $\varepsilon_{it}$
- A Hausman test is provided as a referendum on that assumption
- Also known as generalized least squares (GLS)

Random-Effects Estimation

    proc panel data = Grocery;
       id HouseID Month;
       model Meat = Govt Hsize Rural Alcohol MealsOut / ranone;
    run;

Wansbeek and Kapteyn Variance Components (RanOne)
Dependent Variable: Meat (Meat purchases per store visit)

Model Description
  Estimation Method           RanOne
  Number of Cross Sections    330
  Time Series Length          12

Fit Statistics
  SSE         84930.9948
  DFE         3567
  MSE         23.8102
  Root MSE    4.8796
  R-Square    0.1232

Random-Effects Estimation

Wansbeek and Kapteyn Variance Components (RanOne)
Dependent Variable: Meat (Meat purchases per store visit)

Variance Component Estimates
  Variance Component for Cross Sections    190.123
  Variance Component for Error             24.99832

Hausman Test for Random Effects
  Coefficients   DF   m Value   Pr > m
             3    3     25.72   <.0001

Parameter Estimates
  Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
  Intercept    1   20.50606           2.3327      8.79     <.0001   Intercept
  Govt         1   5.050562           0.5989      8.43     <.0001   1 if used government assistance that month
  Hsize        1   5.145648           0.4774     10.78     <.0001   Household size
  Rural        1   -1.41068           0.3449     -4.09     <.0001   1 if rural location visited at least once
  Alcohol      1   2.982397           0.1960     15.22     <.0001   1 if at least 10% spent on alcohol
  MealsOut     1   -2.82761           0.3848     -7.35     <.0001   Meals per week outside of household (survey)
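The GLS label can be made concrete with the standard quasi-demeaning form of the one-way random-effects estimator. The sketch below is textbook material rather than anything shown in the PROC PANEL output. It writes $\nu_i$ for the household effect and $\varepsilon_{it}$ for the observation-level error, with variances $\sigma_\nu^2$ and $\sigma_\varepsilon^2$, and the numerical value assumes a household observed in all 12 months.

```latex
\begin{align*}
y_{it} - \theta_i \bar{y}_i
  &= (1 - \theta_i)\beta_0 + \beta_x (X_{it} - \theta_i \bar{X}_i)
     + (1 - \theta_i)\beta_z Z_i + u_{it}, \\
u_{it} &= (1 - \theta_i)\nu_i + (\varepsilon_{it} - \theta_i \bar{\varepsilon}_i), \\
\theta_i &= 1 - \sqrt{\frac{\sigma_\varepsilon^2}{T_i \sigma_\nu^2 + \sigma_\varepsilon^2}}, \\
\hat{\theta}_i &= 1 - \sqrt{\frac{24.99832}{12 \times 190.123 + 24.99832}}
  \approx 0.90 \quad (T_i = 12).
\end{align*}
```

Ordinary least squares on the transformed data gives the GLS estimates. A value of $\theta_i$ near 1 means most of each household mean is subtracted, so the estimator sits close to, but not at, the within-household end of the spectrum.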

Correlated Individual Effects
- The Hausman test calls the random-effects results into question
- The problem is that the individual effects are likely correlated with the explanatory variables
- This does not happen in designed experiments, but who has that these days?
- Does the regression coefficient on Govt represent
  A. The effect of a household going on government assistance; or
  B. A comparison of two different households, one on government assistance throughout and one not?
- What do you want it to represent?
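For reference, the Hausman statistic contrasts the fixed-effects and random-effects estimates of the coefficients that both estimators can identify. The form below is the standard textbook one; the exact computation inside PROC PANEL is not shown in these slides, so treat the notation ($\hat{\beta}_{FE}$, $\hat{\beta}_{RE}$, $\hat{V}_{FE}$, $\hat{V}_{RE}$) as an assumption.

```latex
\[
m = (\hat{\beta}_{FE} - \hat{\beta}_{RE})^{\top}
    \left( \hat{V}_{FE} - \hat{V}_{RE} \right)^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
    \;\sim\; \chi^2_k \quad \text{under } H_0,
\]
```

Here $H_0$ is that the individual effects are uncorrelated with the regressors, and $k$ is the number of coefficients compared. In the grocery example $k = 3$, presumably the three time-varying regressors Govt, Rural, and Alcohol, and $m = 25.72$ rejects $H_0$ decisively.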

Correlated Individual Effects [figure]

Fixed-Effects Estimation
- Fixed-effects estimation does not assume that individual effects are uncorrelated with the regressors
- It produces regression coefficients that are based solely on within-household comparisons
- Equivalent to inserting a dummy regressor for each household
- You lose some efficiency from not using any between-household data
- Regressors are required to vary within households
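The requirement that regressors vary within households follows from the within transformation, sketched here in the notation of the panel regression model ($\nu_i$ for the household effect, $\varepsilon_{it}$ for the observation-level error). Averaging the model over time for household $i$ and subtracting the result from the original equation gives:

```latex
\begin{align*}
\bar{y}_i &= \beta_0 + \beta_x \bar{X}_i + \beta_z Z_i + \nu_i + \bar{\varepsilon}_i \\
y_{it} - \bar{y}_i &= \beta_x (X_{it} - \bar{X}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
\end{align*}
```

Both $\nu_i$ and $\beta_z Z_i$ cancel. The first cancellation is why correlation between $\nu_i$ and the regressors no longer matters; the second is why coefficients on time-invariant variables such as Hsize and MealsOut cannot be estimated by fixed effects.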

Fixed-Effects Estimation

    proc panel data = Grocery;
       id HouseID Month;
       model Meat = Govt Hsize Rural Alcohol MealsOut / fixone;
    run;
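As a cross-check on the dummy-regressor interpretation, the same slopes can be obtained outside PROC PANEL. The following is a minimal sketch, not part of the original presentation, that uses the ABSORB statement of PROC GLM to sweep out one effect per household. The data set name GrocerySorted is hypothetical, and the time-invariant regressors are left out because they are not estimable here.

```sas
/* ABSORB requires the input data to be sorted by the absorbed variable. */
/* GrocerySorted is a hypothetical name chosen for this sketch.          */
proc sort data=Grocery out=GrocerySorted;
   by HouseID;
run;

/* Absorbing HouseID is equivalent to including one dummy per household. */
/* Hsize and MealsOut are omitted: they do not vary within a household.  */
proc glm data=GrocerySorted;
   absorb HouseID;
   model Meat = Govt Rural Alcohol / solution;
run;
quit;
```

The slope estimates for Govt, Rural, and Alcohol should agree with the FIXONE results; PROC GLM does not report an intercept when ABSORB is used.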

Fixed-Effects Estimation

Fixed One-Way Estimates
Dependent Variable: Meat (Meat purchases per store visit)

F Test for No Fixed Effects
  Num DF   Den DF   F Value   Pr > F
     329     3240     32.06   <.0001

Parameter Estimates
  Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
  Intercept    1   53.89442           1.6500     32.66     <.0001   Intercept
  Govt         1   3.591205           0.6650      5.40     <.0001   1 if used government assistance that month
  Hsize        0          0                .         .          .   Household size
  Rural        1   -1.45444           0.3578     -4.07     <.0001   1 if rural location visited at least once
  Alcohol      1   2.992035           0.2013     14.86     <.0001   1 if at least 10% spent on alcohol
  MealsOut     0          0                .         .          .   Meals per week outside of household (survey)

Between-Groups Estimation
- Rarely useful, but provided for comparison

    proc panel data = Grocery;
       id HouseID Month;
       model Meat = Govt Hsize Rural Alcohol MealsOut / btwng;
    run;

Between-Groups Estimation

Between-Groups Estimates
Dependent Variable: Meat (Meat purchases per store visit)

Parameter Estimates
  Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   Label
  Intercept    1   16.98442           1.7004      9.99     <.0001   Intercept
  Govt         1   13.40059           0.9886     13.56     <.0001   1 if used government assistance that month
  Hsize        1   5.092447           0.3032     16.80     <.0001   Household size
  Rural        1   0.005439           1.4038      0.00     0.9969   1 if rural location visited at least once
  Alcohol      1   1.082457           1.7681      0.61     0.5408   1 if at least 10% spent on alcohol
  MealsOut     1   -2.67669           0.2629    -10.18     <.0001   Meals per week outside of household (survey)
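The between-groups estimator uses only differences across households, which is the mirror image of fixed effects. One way to see what it does is to collapse the data to one row of means per household and run ordinary least squares on those means. The sketch below is an illustration under stated assumptions, not code from the presentation: HouseMeans is a hypothetical data set name, and it assumes no variable is missing within an observed month.

```sas
/* Collapse to one observation per household: the mean of every variable. */
/* HouseMeans is a hypothetical name chosen for this sketch.              */
proc means data=Grocery noprint nway;
   class HouseID;
   var Meat Govt Hsize Rural Alcohol MealsOut;
   output out=HouseMeans mean=;
run;

/* OLS on the 330 household means uses between-household variation only. */
proc reg data=HouseMeans;
   model Meat = Govt Hsize Rural Alcohol MealsOut;
run;
quit;
```

Under those assumptions the coefficients should be comparable to the BTWNG estimates. The large Govt coefficient then reads as interpretation B from the earlier slide: a comparison of different households, not the effect of a household going on assistance.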

Hausman-Taylor Estimation
- Random effects: All regressors uncorrelated with $\nu_i$
- Fixed effects: They might all be correlated
- Hausman-Taylor: Why not stipulate some regressors as correlated, and have the best of both worlds?
- Choose correlated variables based on substantive knowledge, or guess; there's a test for that
- Estimation is done using instrumental variables, determined internally
- This is a new feature of SAS/ETS 14.1

Hausman-Taylor Estimation

    proc panel data = Grocery;
       id HouseID Month;
       instruments correlated = (Govt MealsOut);
       model Meat = Govt Hsize Rural Alcohol MealsOut / htaylor;
    run;

Hausman-Taylor Estimation

Hausman and Taylor Model for Correlated Individual Effects (HTaylor)
Dependent Variable: Meat (Meat purchases per store visit)

Variance Component Estimates
  Variance Component for Cross Sections    97.29627
  Variance Component for Error             24.97519

Hausman Test against Fixed Effects
  Coefficients   DF   m Value   Pr > m
             3    1      0.76   0.3824

Parameter Estimates
  Variable    Type   DF   Estimate   Standard Error   t Value   Pr > |t|   Label
  Intercept           1   19.12589           2.4038      7.96     <.0001   Intercept
  Govt        C       1   3.583391           0.6649      5.39     <.0001   1 if used government assistance that month
  Hsize       TI      1    5.17389           0.3523     14.68     <.0001   Household size
  Rural               1   -1.43991           0.3573     -4.03     <.0001   1 if rural location visited at least once
  Alcohol             1   2.974996           0.2004     14.85     <.0001   1 if at least 10% spent on alcohol
  MealsOut    C TI    1   -1.92242           0.8090     -2.38     0.0175   Meals per week outside of household (survey)

  C: correlated with the individual effects
  TI: constant (time-invariant) within cross sections
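The C and TI flags correspond to the four-way partition of regressors in the textbook Hausman-Taylor setup. The sketch below maps the grocery variables onto that partition, writing $\nu_i$ for the household effect and $\varepsilon_{it}$ for the observation-level error. The instrument set shown is the classical one; the slides say only that instruments are determined internally, so whether PROC PANEL builds exactly this set is an assumption.

```latex
\begin{align*}
y_{it} &= \beta_0 + X_{1,it}\beta_1 + X_{2,it}\beta_2
          + Z_{1,i}\gamma_1 + Z_{2,i}\gamma_2 + \nu_i + \varepsilon_{it} \\[4pt]
X_1 &= (\text{Rural}, \text{Alcohol}) && \text{time-varying, uncorrelated with } \nu_i \\
X_2 &= (\text{Govt})                  && \text{time-varying, correlated with } \nu_i \\
Z_1 &= (\text{Hsize})                 && \text{time-invariant, uncorrelated with } \nu_i \\
Z_2 &= (\text{MealsOut})              && \text{time-invariant, correlated with } \nu_i \\[4pt]
\text{Instruments:} &\quad X_{1,it} - \bar{X}_{1,i},\;\; X_{2,it} - \bar{X}_{2,i},\;\;
                      \bar{X}_{1,i},\;\; Z_{1,i} \\
\text{Order condition:} &\quad \dim(X_1) \ge \dim(Z_2), \text{ here } 2 \ge 1
\end{align*}
```

The household means of the uncorrelated time-varying regressors serve as instruments for MealsOut. With two such means and one correlated time-invariant regressor there is one surplus instrument, which appears consistent with the single degree of freedom in the Hausman test against fixed effects.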

The COMPARE Statement
- The COMPARE statement is another new feature of PROC PANEL in SAS/ETS 14.1
- Makes it easy to compare various models and estimators side by side

    proc panel data = Grocery;
       id HouseID Month;
       instruments correlated = (Govt MealsOut);
       model Meat = Govt Hsize Rural Alcohol MealsOut / ranone fixone btwng htaylor;
       compare;
    run;

The COMPARE Statement

Model Comparison
Dependent Variable: Meat (Meat purchases per store visit)

Comparison of Model Parameter Estimates
                          Model 1      Model 1      Model 1      Model 1
  Variable                 FixOne       RanOne      HTaylor      BtwGrps
  Intercept   Estimate   53.894415    20.506060    19.125895    16.984418
              Std Err     1.649992     2.332669     2.403772     1.700415
  Govt        Estimate    3.591205     5.050562     3.583391    13.400587
              Std Err     0.665025     0.598942     0.664876     0.988573
  Hsize       Estimate           0     5.145648     5.173890     5.092447
              Std Err            .     0.477447     0.352344     0.303155
  Rural       Estimate   -1.454439    -1.410680    -1.439905     0.005439
              Std Err     0.357766     0.344892     0.357340     1.403805
  Alcohol     Estimate    2.992035     2.982397     2.974996     1.082457
              Std Err     0.201343     0.196014     0.200391     1.768137
  MealsOut    Estimate           0    -2.827608    -1.922421    -2.676694
              Std Err            .     0.384842     0.808967     0.262948

Other Capabilities
PROC PANEL can also do much more:
- Two-way models
- Dynamic-panel models
- Adjustments for serial correlation, heteroscedasticity, and clustering
- Unit root tests
- Model specification tests (e.g., Durbin-Watson)

Summary
- Panel data offer modeling advantages
- Use PROC PANEL in SAS/ETS for panel data regression
- Many estimators available depending on assumptions
- Correlated individual effects can be problematic, but there are solutions
- New features in SAS/ETS 14.1