Improving Timeliness and Quality of SILC Data through Sampling Design, Weighting and Variance Estimation

Thomas Glaser, Nadja Lamei, Richard Heuberger
Statistics Austria, Directorate Social Statistics
Workshop on best practice for EU-SILC, London, 17 September 2015
www.statistik.at

Prerequisites: EU-SILC in Austria
- "Community Statistics on Income and Living Conditions"
- Carried out in Austria since 2003; cross-sectional and longitudinal survey since 2004
- Rotating panel design with 4 subsamples and a 4-year panel duration
- Sample of first wave 2014: one-stage stratified probability sample from the population register (ZMR)
- Minimum effective sample size (net value): 4,500 households; 5,909 households surveyed in the 2014 cross-section
- Sample survey of private households with voluntary participation
- Unit response rate, 2014 operation: 76% (first wave: 64%, follow-up waves: 84%)

Prerequisites: Indicators
- Europe 2020 indicators on poverty and social exclusion; main indicator: at-risk-of-poverty or social exclusion (AROPE)
- Measurement of the Europe 2020 targets from EU-SILC 2008 onwards
- Main income-based indicator: at-risk-of-poverty rate (AROP)
  - Net household income of the year preceding the survey year
  - Equivalised net household income (EPINC)
  - Measurement period: year preceding the survey year
  - Households with EPINC below 60% of the national median of EPINC are AROP
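The AROP rule above can be illustrated with a minimal sketch. It assumes a vector of equivalised incomes and survey weights; the function names and the toy data are hypothetical and are not taken from the Statistics Austria production system.

```python
import numpy as np

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, 0.5 * cum[-1])]

def arop_rate(epinc, weights, share=0.6):
    """Return the poverty threshold (share * weighted median) and the weighted AROP rate."""
    epinc = np.asarray(epinc, dtype=float)
    weights = np.asarray(weights, dtype=float)
    threshold = share * weighted_median(epinc, weights)
    at_risk = epinc < threshold
    return threshold, weights[at_risk].sum() / weights.sum()

# Toy data (hypothetical): five units with equal weights
epinc = np.array([8000.0, 12000.0, 20000.0, 24000.0, 30000.0])
weights = np.ones(5)
threshold, rate = arop_rate(epinc, weights)
print(threshold, rate)
```

In the toy data the median is 20,000, so the threshold is 12,000 and only the unit with 8,000 falls strictly below it.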

Prerequisites: Income from administrative registers
- Fully implemented use of register data started with EU-SILC 2012
- About 85% of total household income comes from administrative register information (remaining income components from the questionnaire)
- Back-calculation of EU-SILC 2008-2011 to ensure an unbroken time series from 2008 onwards (also in the UDB)
- Linkage of register and survey data
  - Encrypted personal identifier (bpk) provided by the Federal Ministry of the Interior
  - Provision of bpk for the gross sample and for persons living in households
- Eurostat grant agreement on "Improving Methodology on Sampling, Weighting, Imputation and Variance Estimation in the Austrian EU-SILC with regard to Administrative Data" (final results in 2016)

1. Efficiency gains by sampling

Selection of first wave sample
- Type of sampling: one-stage stratified probability sample
- Sampling units: dwellings registered in the central residence register (ZMR)
- Sample size (gross): 3,229 households (EU-SILC 2014)
- Stratification criteria:
  - Interviewer units (geographical units below NUTS2 level)
  - Disproportional allocation per NUTS2 level according to expected response rates (based on the average response of the two preceding years)
- Large discrepancies between population and net sample in the relative distribution of NUTS2 regions would lead to a higher dispersion of weight adjustment factors
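One way to read the disproportional allocation is that the gross sample of each region is inflated by the inverse of its expected response rate, so that the expected net sample is roughly proportional to the population. The sketch below illustrates that idea only; the function name, the two regions and all numbers are hypothetical.

```python
def gross_allocation(pop_share, expected_rr, gross_total):
    """Allocate the gross sample proportionally to pop_share / expected response rate."""
    raw = {region: pop_share[region] / expected_rr[region] for region in pop_share}
    scale = gross_total / sum(raw.values())
    return {region: raw[region] * scale for region in raw}

# Hypothetical example: two equally large regions with different expected response
pop_share = {"A": 0.5, "B": 0.5}
expected_rr = {"A": 0.8, "B": 0.5}
alloc = gross_allocation(pop_share, expected_rr, gross_total=1300)

# Expected net sample per region = gross allocation * expected response rate
expected_net = {region: alloc[region] * expected_rr[region] for region in alloc}
print(alloc, expected_net)
```

Region B receives the larger gross sample, and the expected net samples come out equal, matching the equal population shares.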

Efficiency gains by sampling
[Figure: Relative distribution of private households by province (NUTS2) for the first wave of EU-SILC 2014, in %. Series shown: population; net sample with disproportional allocation; hypothetical net sample with proportional allocation. Source: EU-SILC 2014 (unpublished results)]

Efficiency gains by sampling
- Income register data allow for an economic evaluation of (almost) every address in the sampling frame
- Addresses can therefore be selected from strata built on percentiles of register-based household income (HINC_REG)
- Optimal allocation (Neyman allocation, cf. Cochran 1977) based on the register income distribution to obtain a smaller standard error of AROP
- Since AROP is a component of AROPE, this may also yield a smaller standard error for the main indicator AROPE
- Ongoing work will be finalised in 2016
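For reference, Neyman allocation assigns stratum sample sizes proportional to N_h * S_h, where N_h is the stratum size and S_h the stratum standard deviation of the study variable (Cochran 1977). The following is a minimal sketch with hypothetical strata; it is not the allocation actually used for EU-SILC.

```python
import numpy as np

def neyman_allocation(stratum_sizes, stratum_sds, n_total):
    """n_h = n_total * N_h * S_h / sum_h(N_h * S_h); returns unrounded sample sizes."""
    sizes = np.asarray(stratum_sizes, dtype=float)
    sds = np.asarray(stratum_sds, dtype=float)
    share = sizes * sds
    return n_total * share / share.sum()

# Hypothetical example: two income strata of equal size, the second three times as variable
n_h = neyman_allocation([1000, 1000], [1.0, 3.0], n_total=400)
print(n_h)
```

The more heterogeneous stratum receives three quarters of the sample. In practice the results would still need rounding and minimum stratum sizes.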

2. Improving weighting with income register data

Cross-sectional weighting procedure
Three steps of the SILC weighting procedure:
household weight = design weight * nonresponse weight * adjustment weight
1) Design weight: inverse selection probability ($S$ strata)
   $d_s = \dfrac{N_s}{n_s}$
2) Unit nonresponse weight: inverse estimated response probability $r_h$ (based on stepwise logistic regression) multiplied by the design weight
   $b_h = d_s \cdot \dfrac{1}{r_h}, \quad s \in \{1, \dots, S\}, \quad h \in \{1, \dots, n^{(r)}\}$
3) Adjustment of weights to external sources: calibration to known marginal distributions
   => Calibrated household weights (first wave), obtained by applying weight adjustment factors to $b_h$
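Steps 1 and 2 can be sketched as follows. The sketch takes the estimated response probabilities as given; in the procedure described above they come from a stepwise logistic regression, which is not reproduced here. Function name and numbers are hypothetical.

```python
def base_weights(stratum_pop, stratum_sample, response_probs):
    """Design weight d_s = N_s / n_s, then base weight b_h = d_s / r_h per respondent."""
    d_s = stratum_pop / stratum_sample
    return [d_s / r_h for r_h in response_probs]

# Hypothetical stratum: 1000 dwellings in the frame, 10 sampled, 2 responding households
b = base_weights(stratum_pop=1000, stratum_sample=10, response_probs=[0.5, 0.8])
print(b)
```

A household with a low estimated response probability receives a larger base weight, since it also represents similar households that did not respond.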

Calibration using marginal distribution from administrative income registers
- Auxiliary information is available for the sampling frame and thus also for the gross sample of the first wave
- Same source for variables in the sample and in the marginal distribution
- New marginal distributions provided by variables from the wage tax register:
  - Number of persons receiving income from employment (at least 15 years old)
  - Number of persons receiving income from old-age benefits

Cross-sectional weighting procedure: Calibration to external marginal distribution

Household level

Province (NUTS2)      Households
Burgenland               119,482
Carinthia                245,103
Lower Austria            695,689
Upper Austria            606,248
Salzburg                 230,563
Styria                   524,291
Tyrol                    308,263
Vorarlberg               157,532
Vienna                   874,619

Household size        Households
1 Person               1,391,569
2 Persons              1,120,747
3 Persons                567,120
4+ Persons               682,355

Tenure status         Households
Owner                  1,870,325
Not owner              1,891,465

Personal level

Age & sex             Men          Women
0-13                    578,094      548,868
14-34                 1,116,783    1,090,455
35-64                 1,770,060    1,799,043
65+                     649,693      850,378

Citizenship (persons aged 16+)      Persons
Austria                           6,240,604
Not Austria                         860,356

Register-based margins                                   Persons
Recipients of unemployment benefits                      597,568
Employees (at least 60 days) (persons aged 15+)        3,813,936
Retirees                                               1,989,331

Source: Statistics Austria, EU-SILC 2014, Microcensus 2014, social security and wage tax register 2013
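Calibration to several marginal distributions is often done by raking (iterative proportional fitting). The slides do not state which calibration algorithm is used, so the sketch below is only an illustration of the general idea, with hypothetical toy margins rather than the totals in the table above.

```python
import numpy as np

def rake(base_weights, groups, margins, n_iter=50):
    """Rescale weights in turn so that each variable's weighted totals match its margin."""
    w = np.asarray(base_weights, dtype=float).copy()
    for _ in range(n_iter):
        for var, codes in groups.items():
            codes = np.asarray(codes)
            for level, target in margins[var].items():
                mask = codes == level
                w[mask] *= target / w[mask].sum()
    return w

# Hypothetical toy sample of four households with base weight 10 each
base = [10.0, 10.0, 10.0, 10.0]
groups = {
    "size": ["1", "1", "2+", "2+"],
    "tenure": ["owner", "renter", "owner", "renter"],
}
margins = {
    "size": {"1": 30.0, "2+": 20.0},
    "tenure": {"owner": 25.0, "renter": 25.0},
}
w = rake(base, groups, margins)
adjustment_factors = w / np.asarray(base)
print(w, adjustment_factors)
```

The ratio of calibrated to base weights corresponds to the weight adjustment factors mentioned earlier; a wide spread of these factors is what the disproportional allocation is meant to avoid.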

Unit nonresponse analysis
- Research question: Is there a bias in the main income-based indicator (AROP) caused by selective unit nonresponse?
- Household income based only on income registers (HINC_REG) is used as study variable Y for the nonresponse analysis
  - Highly correlated with overall household income (HINC)
- Unit nonresponse rate in the first wave of EU-SILC 2014: 36%
- High rate of persons with potential data in income registers (99%) in EU-SILC 2014

Unit nonresponse analysis: definitions
- Nonresponse
  - All observations are missing for units of a selected sample (unit nonresponse)
  - Nonresponse of a unit is a random variable: the occurrence of unit nonresponse has a certain probability (cf. Groves et al. 2004)
- Bias
  - Systematic deviation of the expected value of an estimator $\hat{Y}$ from the true value $Y$ in the population (cf. Särndal 2003)
- Unit nonresponse (UNR) bias
  - Bias caused by unit nonresponse: systematic deviation of the expected value of an estimator $\hat{Y}$ based on the respondent set $r$ from the value $Y$ in the entire gross sample $s$ (cf. Groves 2006, Särndal & Lundström 2005)

Unit nonresponse bias analysis
Comparison with design weighted estimate

No.    Weight                                  Dataset        mean(hinc_reg)   Estimate of absolute bias   Estimate of relative bias
(1)    d_s                                     gross sample   47566               0                         0
(2)    d_s                                     net sample     49161            1595                         3.35%
(3)    d_s * 1/r_h (version 1)                 net sample     48122             557                         1.17%
(4)    d_s * 1/r_h (version 2)                 net sample     48302             736                         1.55%
(5)    d_s * 1/c_h * 1/r_h (version 1)         net sample     47912             346                         0.73%
(6)    d_s * 1/c_h * 1/r_h (version 2)         net sample     48213             648                         1.36%
(7)    Calibration, base weight: (2)           net sample     47316            -250                        -0.53%
(8)    Calibration, base weight: (3)           net sample     46905            -660                        -1.39%
(9)    Calibration, base weight: (5)           net sample     46837            -729                        -1.53%
(10)   Calibration, base weight: (4)           net sample     47140            -425                        -0.89%
(11)   Calibration, base weight: (6)           net sample     46966            -599                        -1.26%
       Calibration, base weight: (4)
       (only persons present in hh)            net sample     46923            -642                        -1.35%

r_h: estimated response rate; c_h: estimated contact rate; r_h (adjusted): estimated adjusted response rate
Source: Statistics Austria, EU-SILC 2014 (unpublished results)

Unit nonresponse bias analysis
Comparison with sampling frame

No.    Weight                                  Dataset          mean(hinc_reg)   Estimate of absolute bias   Estimate of relative bias
       none                                    sampling frame   46948               0                         0
(1)    d_s                                     gross sample     47566             617                         1.31%
(2)    d_s                                     net sample       49161            2213                         4.71%
(3)    d_s * 1/r_h (version 1)                 net sample       48122            1174                         2.50%
(4)    d_s * 1/r_h (version 2)                 net sample       48302            1354                         2.88%
(5)    d_s * 1/c_h * 1/r_h (version 1)         net sample       47912             963                         2.05%
(6)    d_s * 1/c_h * 1/r_h (version 2)         net sample       48213            1265                         2.69%
(7)    Calibration, base weight: (2)           net sample       47316             367                         0.78%
(8)    Calibration, base weight: (3)           net sample       46905             -43                        -0.09%
(9)    Calibration, base weight: (5)           net sample       46837            -111                        -0.24%
(10)   Calibration, base weight: (4)           net sample       47140             192                         0.41%
(11)   Calibration, base weight: (6)           net sample       46966              18                         0.04%
       Calibration, base weight: (4)
       (only persons present in hh)            net sample       46923             -25                        -0.05%

Source: Statistics Austria, EU-SILC 2014 (unpublished results)
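The bias columns follow from simple arithmetic: the absolute bias is the estimate minus the reference mean, and the relative bias is the absolute bias divided by the reference mean. The sketch below reproduces row (2) against both references using the rounded means reported above; for other rows the published figures can differ by one unit because they were presumably computed from unrounded means.

```python
def bias(estimate, reference):
    """Return (absolute bias, relative bias) of an estimate against a reference mean."""
    absolute = estimate - reference
    return absolute, absolute / reference

GROSS_SAMPLE_MEAN = 47566.0   # design weighted gross sample, row (1)
FRAME_MEAN = 46948.0          # sampling frame
NET_DESIGN_MEAN = 49161.0     # design weighted net sample, row (2)

abs_gross, rel_gross = bias(NET_DESIGN_MEAN, GROSS_SAMPLE_MEAN)
abs_frame, rel_frame = bias(NET_DESIGN_MEAN, FRAME_MEAN)
print(abs_gross, round(100 * rel_gross, 2))
print(abs_frame, round(100 * rel_frame, 2))
```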

Conclusions of unit nonresponse analysis
- Assuming a missing completely at random (MCAR) nonresponse mechanism could lead to substantial bias
- Modelling unit nonresponse in two steps seems to have a slight bias-reducing effect
- Calibration seems to reduce bias even if applied directly to the design weights
- Calibrated weights using register income only for persons who were actually in the net sample show very similar results to the sampling frame for HINC_REG

3. Measuring improvement of precision caused by calibration weights

Precision of EU-SILC indicators
- Variance estimation of AROP and AROPE does not include the calibration effect, yielding a conservative estimate
- New precision requirements therefore demand a more precise estimation of the standard error (SE) of the indicator estimates $\hat{p}$:
  $SE(\hat{p}) < \sqrt{\dfrac{\hat{p}(1-\hat{p})}{a\sqrt{N}+b}}$
- Austrian case for AROPE of EU-SILC 2014 ($\hat{p} = 19.2\%$):
  $SE(\hat{p}) = 0.7011\% > \sqrt{\dfrac{\hat{p}(1-\hat{p})}{a\sqrt{N}+b}} = 0.5975\%$
- Taking into account calibration variables correlated with AROP may reduce the standard error (cf. Berger/Skinner 2003, Deville 1999, Deville/Särndal 1992)
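The threshold can be checked numerically. The slide does not state the parameters, so the sketch below assumes a = 900 and b = 2600 with N measured in millions of private households, and takes N = 3.76179 from the sum of the household-level calibration margins; under these assumptions the quoted 0.5975% is reproduced. All constant names are hypothetical.

```python
import math

def se_threshold(p, n_households_millions, a, b):
    """Upper bound for the standard error: sqrt(p * (1 - p) / (a * sqrt(N) + b))."""
    return math.sqrt(p * (1.0 - p) / (a * math.sqrt(n_households_millions) + b))

# Assumed values, not given on the slide
A_ASSUMED = 900.0
B_ASSUMED = 2600.0
N_MILLIONS = 3.76179      # private households in millions (sum of household margins)

P_AROPE = 0.192           # AROPE estimate, EU-SILC 2014
SE_AROPE = 0.007011       # estimated SE without calibration effect

threshold = se_threshold(P_AROPE, N_MILLIONS, A_ASSUMED, B_ASSUMED)
meets_requirement = SE_AROPE < threshold
print(round(100 * threshold, 4), meets_requirement)
```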

Precision of EU-SILC indicators: Evaluation
- Instead of AROP, the residuals $\varepsilon_n$ of the regression of AROP on the $K$ calibration variables $x_k$ are used (cf. Eurostat 2013, p. 13):
  $\widehat{AROP}_n = \sum_{k=1}^{K} \beta_k x_{nk}, \qquad \varepsilon_n = AROP_n - \widehat{AROP}_n$
- Results show a reduction of the estimated standard error

Standard error of AROP on person level in %

                                                      Dichotomous   Linearized   Bootstrapping
                                                      indicator     variable     (1000 resamples)
Indicator without calibration effect                  0.6262        0.4809       0.6195
Residuals including calibration effect                0.6005        0.4662       0.6061
Change of standard error (in %) caused by
calibration effect                                    -4.1          -3.1         -2.2

Source: EU-SILC 2014 (unpublished results)
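The residual idea can be sketched on simulated data. The example below assumes simple random sampling with equal weights, two hypothetical calibration dummies and an explicit intercept column; it therefore only illustrates why residuals give a smaller variance estimate and does not reproduce the design-based estimators behind the table.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical calibration variables: employed and retired (mutually exclusive dummies)
employed = rng.integers(0, 2, n)
retired = (1 - employed) * rng.integers(0, 2, n)

# Simulated dichotomous AROP indicator whose probability depends on both variables
p_poor = 0.30 - 0.20 * employed - 0.10 * retired
arop = (rng.random(n) < p_poor).astype(float)

# Regress the indicator on the calibration variables and keep the residuals
X = np.column_stack([np.ones(n), employed, retired])
beta, *_ = np.linalg.lstsq(X, arop, rcond=None)
resid = arop - X @ beta

# Standard error of the mean under simple random sampling (assumption)
se_indicator = arop.std(ddof=1) / np.sqrt(n)
se_residual = resid.std(ddof=1) / np.sqrt(n)
change_pct = 100 * (se_residual / se_indicator - 1)
print(se_indicator, se_residual, change_pct)
```

Because the regression includes a constant, the residual variance can never exceed the variance of the indicator itself, so the change is zero or negative; its size depends on how strongly the calibration variables are correlated with the indicator.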

Precision of EU-SILC indicators: Evaluation
- Using the residuals $AROPE_n - \widehat{AROPE}_n$ instead of AROPE reduces the estimated standard error by 14.7%

Standard error of AROPE on person level in %

                                                      Dichotomous indicator
Indicator without calibration effect                  0.7011
Residuals including calibration effect                0.5981
Change of standard error (in %) caused by
calibration effect                                    -14.7

Source: EU-SILC 2014 (unpublished results)

- Incorporating calibration in variance estimation yields a slightly smaller estimated standard error and could make it possible to meet the new precision requirements

Concluding remarks
- The availability of income register data opens various opportunities for improving the quality of indicators through sampling, weighting and variance estimation
- For sampling, optimal allocation may reduce the standard error
- For weighting, marginal distributions from income registers may reduce bias
- For variance estimation, using the residuals from the regression of the indicators on the calibration variables (including variables from income registers) instead of the indicators themselves makes it possible to measure this improvement in efficiency correctly in order to meet the precision requirements

Bibliography
- Berger, Y. G.; Skinner, C. J. (2003): Variance estimation for a low income proportion. Applied Statistics 52, Part 4, 457-468.
- Cochran, W. G. (1977): Sampling Techniques. New York: Wiley.
- Deville, J. C. (1999): Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology 25, 193-203.
- Deville, J. C.; Särndal, C.-E. (1992): Calibration estimators in survey sampling.
- Eurostat (2013): Standard error estimation for the EU-SILC indicators of poverty and social exclusion. Eurostat Statistical Working Papers. Luxembourg.
- Groves, R. M.; Fowler Jr., F. J.; Couper, M. P.; Lepkowski, J. M.; Singer, E.; Tourangeau, R. (2004): Survey Methodology. Hoboken: Wiley.
- Groves, R. M. (2006): Nonresponse rates and nonresponse bias in household surveys. Public Opinion Quarterly 70 (5, Special Issue), 646-675. DOI: 10.1093/poq/nfl033.
- Särndal, C.-E.; Lundström, S. (2005): Estimation in Surveys with Nonresponse. West Sussex: Wiley.

Thank you for your attention!
Any questions or comments?
Contact: Guglgasse 13, 1110 Wien
Tel: +43 (1) 71128-7039
thomas.glaser@statistik.gv.at
http://www.statistik.at