Weighting issues in EU-LFS
Carlo Lucarelli, Frank Espelage, Eurostat
LFS Workshop May 2018, Reykjavik
carlo.lucarelli@ec.europa.eu, frank.espelage@ec.europa.eu
1. Introduction

The current legislation behind the EU-LFS does not provide any details about weighting procedures, and the same holds for the forthcoming IESS Framework Regulation. However, general requirements related to the topic exist: Council Regulation (EC) No 577/98 on the EU-LFS stipulates "The weighting factors are calculated taking into account in particular the probability of selection and external data relating to the distribution of the population being surveyed, by sex, age (five-year age groups) and region (NUTS 2 level), where such external data are held to be sufficiently reliable by the Member States concerned" (Article 3(5)).

Against this background, countries adopted diverse weighting procedures based on the broader weighting theory, but usually adapted to their national situation, e.g. the sampling strategy applied. Since no common framework was established by the EU-LFS regulations, there were no detailed rules for the countries to follow. Nevertheless, over time countries have clearly converged towards post-stratification and calibration procedures encompassing a wider range of auxiliary variables. Weighting schemes became more complex than just using census distributions; auxiliary variables coming from demographic registers and also from non-demographic sources were introduced in calibration processes.

2. State of play

The methods of calculating the weights differ considerably between countries. Two main methods are used, depending on the detail of the external information available and whether or not this external information can be cross-tabulated:

- Post-stratification, where the inverses of the selection probabilities are adjusted a posteriori to the population's distribution by sex, age groups and other external (administrative) sources;
- Calibration, which consists of different variations of adjusting to marginal totals [1].
Most countries adjust for non-response either directly in the weighting process or in a preliminary step before adjusting the weights to external sources. All countries use data on sex in the weighting process. Due to the complexity and number of factors taken into account in some of the weighting schemes, the requirement of the Council Regulation to use five-year age groups is not implemented in all countries, but almost all countries adjust the weighting factors to regional levels, even if the regions used do not necessarily correspond to the NUTS 2 classification. An overall picture of the variables used in the weighting procedure by country is shown in Table 1.

Other external distributions or sources that are used both for stratification and weighting are the urban/rural distinction, nationality, ethnicity, size classes of regions or local areas, and register statistics on employment/unemployment.

Thirteen countries, namely Belgium, the Czech Republic, Germany, Estonia, Ireland, Portugal, Romania, Slovenia, Slovakia, Finland, Sweden, Iceland and Norway, gross the sample up to the total population, i.e. including people living in institutional households, although some of them do not cover the institutional population in the data collection (Belgium, the Czech Republic, Ireland and Slovenia) or cover it only partially (Portugal, Romania, Slovakia).

All countries adopt some non-response adjustment method in their weighting procedure, mainly through post-stratification/calibration.

[1] The most common reference is Deville and Särndal (1992).
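A preliminary non-response step of the kind mentioned in this section can be sketched as a simple weighting-class adjustment: within each class, the design weights of the respondents are inflated so that they also represent the non-respondents of that class. This is only one possible method, shown under the assumption that response behaviour is homogeneous within classes; the classes (here regions), the field names and the function `nonresponse_adjust` are hypothetical.

```python
from collections import defaultdict


def nonresponse_adjust(sample, class_var):
    """Inflate respondents' design weights by the inverse weighted response rate of their class."""
    sampled = defaultdict(float)    # sum of design weights of all sampled units
    responded = defaultdict(float)  # sum of design weights of respondents only
    for unit in sample:
        sampled[unit[class_var]] += unit["d"]
        if unit["responded"]:
            responded[unit[class_var]] += unit["d"]
    # A class without any respondent would divide by zero here;
    # in practice such classes have to be collapsed with neighbouring ones.
    return [
        unit["d"] * sampled[unit[class_var]] / responded[unit[class_var]]
        if unit["responded"] else 0.0
        for unit in sample
    ]


# Hypothetical gross sample with design weights "d".
sample = [
    {"region": "A", "d": 50.0, "responded": True},
    {"region": "A", "d": 50.0, "responded": True},
    {"region": "A", "d": 50.0, "responded": True},
    {"region": "A", "d": 50.0, "responded": False},
    {"region": "B", "d": 80.0, "responded": True},
    {"region": "B", "d": 80.0, "responded": False},
]
adjusted = nonresponse_adjust(sample, "region")
```

The adjusted weights of the respondents add up to the sum of the design weights of the whole gross sample, so a subsequent post-stratification or calibration step starts from weights that already account for non-response.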
TABLE 1: VARIABLES USED IN THE WEIGHTING PROCEDURE IN 2016

Variables in the weighting procedure: Sex, Five-year age groups, NUTS II, Nationality, Urban/rural, Employment/unemployment register info.

Country | Weighting method | Sex | Five-year age groups | NUTS II | Nationality | Urban/rural | Employment/unemployment register info
Belgium | Post-stratification | Yes | Yes | NUTS2 | No | No | No
Bulgaria | Calibration | Yes | Yes | NUTS3 | No | Yes | No
Czech Republic | Post-stratification | Yes | Yes* | LAU1 | No | No | No
Denmark | Calibration | Yes | Yes* | NUTS2 | Yes | No | Yes
Germany | Post-stratification | Yes | Age groups: 0-14, 15-44, 45 and more | NUTS2-NUTS3 | Yes | No | No
Estonia | Calibration | Yes | Yes | NUTS4 | Yes | Yes | No
Ireland | Calibration | Yes | Yes | NUTS3 | Yes | No | No
Greece | Post-stratification | Yes | Ten-year age groups | NUTS2 | No | No | No
Spain | Calibration | Yes | Yes* | NUTS3 | Yes | No | No
France | Calibration | Yes | Yes | NUTS2 | No | No | No
Croatia | Calibration | Yes | Yes | NUTS2-NUTS3 | No | Yes | No
Italy | Calibration | Yes | Yes | NUTS3 | Yes | No | No
Cyprus | Calibration | Yes | Yes | NUTS2 | No | Yes | No
Latvia | Post-stratification | Yes | Yes | NUTS3 | No | Yes | Yes
Lithuania | Calibration | Yes | Yes | NUTS4 | No | Yes | No
Luxembourg | Post-stratification | Yes | Yes* | NUTS3 | Yes | No | No
Hungary | Calibration (generalized raking) | Yes | Yes | NUTS3 | No | Yes | No
Malta | Calibration | Yes | Yes | NUTS4 | No | No | No
Netherlands | Multiple post-stratifications | Yes | Yes | NUTS3-NUTS4 | Yes | No | Yes
Austria | Calibration | Yes | Yes | NUTS2 | Yes | No | Yes
Poland | Post-stratification | Yes | Yes* | NUTS2 | No | Yes | No
Portugal | Calibration | Yes | Yes | NUTS3 | No | No | No
Romania | Calibration | Yes | Yes | NUTS2 | No | Yes | No
Slovenia | Calibration | Yes | Ten-year age groups (between 20-69) | NUTS3 | No | No | No
Slovakia | Post-stratification | Yes | Yes | NUTS3 | No | No | No
Finland | Calibration | Yes | Yes | NUTS3 | No | No | Yes
Sweden | Generalised regression estimation | Yes | Yes | NUTS3 | Yes | No | Yes
United Kingdom | Calibration | Yes | Yes | LAU1 | No | No | No
Iceland | Post-stratification | Yes | Yes | NUTS2 | No | No | No
Norway | Post stratification / calibration | Yes | Yes | NUTS3 | No | No | Yes
Switzerland | Calibration | Yes | Yes | NUTS3 | Yes | No | Yes
Former Yugoslav Republic of Macedonia | Calibration | Yes | Yes | NUTS3 | No | No | No
Turkey | Calibration | Yes | Yes* | NUTS2 | No | Yes | No

* Five-year age groups up to 64 years old.

3. Specific issues

Sub-sampling and wave approach

Commission Regulation No 377/2008 established the conditions for the use of a sub-sample for the collection of data on structural variables. Annex I of the Regulation states: "The word yearly [...] identifies structural variables which optionally need only to be surveyed as annual averages, using a sub-sample of independent observations with reference to 52 weeks, rather than as quarterly averages. Core variables to be surveyed each quarter are identified as quarterly. [...] Consistency between annual sub-sample totals and full-sample annual averages shall be ensured for employment, unemployment and inactive population by sex and for the following age groups: 15 to 24, 25 to 34, 35 to 44, 45 to 54, 55+".

The implementation of the wave approach is aimed at reducing response burden and in consequence also non-response, but it can affect the reliability of estimates. For that reason, it is advisable to ask the questions on structural variables in the first wave, in order to have a higher response rate and avoid the attrition effect.

In 2017 twelve countries implemented sub-sampling for the core LFS. Regarding ad-hoc modules (hereinafter AHM) the situation is a bit more advanced, with more than half of the
countries applying a kind of sub-sampling there (see Table 2). Denmark, Luxembourg, Finland and Sweden collect household information using sub-sampling. In all 12 countries the method of calculating the weights for the core sub-sample is similar to the one used for the quarterly weights, but more variables are included in the calibration to ensure the consistency of annual sub-sample totals with the given full-sample annual averages for the ILO status by sex and ten-year age groups.

TABLE 2: COUNTRIES ADOPTING SUBSAMPLING IN 2016/2017

Subsampling | Number of countries | Countries
Structural variables * | 12 | BE, BG, CZ, ES, FR, LV, LU, NL, FI, UK, NO, CH
Ad hoc modules ** | 16 | BG, CZ, DK, DE, ES, FR, LV, LU, MT, NL, AT, FI, SE, UK, NO, CH
Household * | 4 | DK, LU, FI, SE

* Information refers to 2017
** Information refers to 2016

The IESS regulation introduces just minor changes to sub-sampling for the core LFS, namely a minimum size of the sub-sample (at least one wave per quarter, not less than one eighth of the full quarterly sample) and, for the consistency requirements, a split of the age group 55+ into 55-64 and 65+. Otherwise core LFS sub-sampling remains an option countries can choose (or not). As the weighting of annual core sub-samples works quite well already, it can be expected to work equally well in the future.

The future regular modules under IESS will be the only blocks of variables for which yearly sub-sampling will become compulsory. Some proposals have been discussed in TF3 which take into account different ways of implementing sub-sampling and its impact on the sampling design (see Eurostat (a) 2017).
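The consistency requirement between annual sub-sample totals and full-sample annual averages can be illustrated with the following minimal Python sketch. It assumes a one-step ratio adjustment within cells defined by ILO status, sex and age group; in practice countries include these constraints in their calibration together with the demographic ones, so this is a simplification. All figures, field names (`w_q`, `w_y`) and function names are hypothetical.

```python
from collections import defaultdict


def cell(record):
    """Consistency cell: ILO status by sex by age group."""
    return (record["ilo"], record["sex"], record["age"])


def annual_average_totals(full_sample):
    """Average of the four quarterly weighted totals per cell."""
    totals = defaultdict(float)
    for r in full_sample:
        totals[cell(r)] += r["w_q"] / 4.0
    return dict(totals)


def align_subsample(sub_sample, targets):
    """Rescale yearly sub-sample weights so each cell total equals its target."""
    sums = defaultdict(float)
    for r in sub_sample:
        sums[cell(r)] += r["w_y"]
    return [r["w_y"] * targets[cell(r)] / sums[cell(r)] for r in sub_sample]


# Hypothetical full sample: one aggregated record per quarter and ILO status,
# carrying the quarterly weighted total for women aged 25-34.
full_sample = []
for employed, unemployed in [(900.0, 100.0), (920.0, 80.0),
                             (880.0, 120.0), (900.0, 100.0)]:
    full_sample.append({"ilo": "employed", "sex": "F", "age": "25-34", "w_q": employed})
    full_sample.append({"ilo": "unemployed", "sex": "F", "age": "25-34", "w_q": unemployed})

# Hypothetical annual sub-sample with preliminary yearly weights.
sub_sample = [
    {"ilo": "employed", "sex": "F", "age": "25-34", "w_y": 400.0},
    {"ilo": "employed", "sex": "F", "age": "25-34", "w_y": 400.0},
    {"ilo": "unemployed", "sex": "F", "age": "25-34", "w_y": 150.0},
]

targets = annual_average_totals(full_sample)
aligned = align_subsample(sub_sample, targets)
```

After the adjustment, the sub-sample estimates of employment and unemployment for the cell coincide with the full-sample annual averages, which is the property the Regulation asks for.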
Overall there will be several clearly defined possibilities in the future, and the situation could vary a lot, from household surveys which collect yearly variables including household information for the whole sample and just AHM variables for a wave sub-sample (usually the first wave or the last), to individual surveys with maximum use of sub-sampling, which collect yearly, household and AHM variables e.g. in the same wave (the first one is preferable to avoid the effect of panel attrition).

Household weighting

The current LFS regulations do not provide any recommendation on household weighting, but LFS-based household estimates are disseminated by Eurostat and also by several countries at national level. Eurostat uses the same 'household' weights [2] to produce estimates at household level (like number and size of households) [3] as well as for estimates at individual level against the household background (like employment by household composition/by number of children). The weights should hence be appropriately calculated to provide correct results for both purposes.

[2] Special household weights in case of household sub-sampling, yearly weights for other countries. Yearly weights can be further distinguished into real yearly weights in case of core sub-sampling, and the average of the quarterly weights for the rest.
[3] So far household level results have been calculated using the weight of the reference person, but the December 2017 LAMAS decided to switch to the average weight of the household members.
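The difference between the two household-level options mentioned in footnote 3 (weight of the reference person versus average weight of the household members) can be sketched as follows. The person records, the field names (`hh`, `w`, `ref`) and the function `household_weights` are hypothetical and only serve to illustrate the computation.

```python
from collections import defaultdict


def household_weights(persons, method="average"):
    """Derive one weight per household from the individual weights of its members."""
    members_by_household = defaultdict(list)
    for p in persons:
        members_by_household[p["hh"]].append(p)

    weights = {}
    for hh, members in members_by_household.items():
        if method == "average":
            weights[hh] = sum(m["w"] for m in members) / len(members)
        elif method == "reference_person":
            weights[hh] = next(m["w"] for m in members if m["ref"])
        else:
            raise ValueError(f"unknown method: {method}")
    return weights


# Hypothetical respondents: household id, individual weight, reference-person flag.
persons = [
    {"hh": 1, "w": 100.0, "ref": True},
    {"hh": 1, "w": 120.0, "ref": False},
    {"hh": 1, "w": 140.0, "ref": False},
    {"hh": 2, "w": 200.0, "ref": True},
]

by_average = household_weights(persons, "average")
by_reference = household_weights(persons, "reference_person")

# Estimated number of households under each option.
households_average = sum(by_average.values())
households_reference = sum(by_reference.values())
```

In this toy example the two options give different estimates of the number of households (320 against 300), because the individual weights differ within household 1. When the weighting procedure produces identical weights for all household members, both options coincide.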
So, how do the countries perform household weighting? Around 20 countries use the standard core weighting procedure for computing both individual and household weights, with further special adaptation in countries like France and Poland [4]. Finland, Sweden, the United Kingdom and Norway adopt different calibration procedures for core weights and household weights. In five countries (Bulgaria, Greece, Ireland, Malta and Slovakia) household weights are not computed at all.

Apart from information at individual level, 20 countries also make use of information at household level in their weighting procedure: number of households, household size, household type or composition. This information is derived mainly from demographic registers, which often represent the same base used for the sample. Six countries (Germany, Estonia, Latvia, Portugal, Slovenia and Sweden) rely only on survey data about the household structure in their household weighting. The weighting procedure in more than half of the countries produces individual weights which are identical for all household members, while in a few others this is limited to adults.

A discussion on household weighting took place during several LAMAS meetings in 2015 [5] and 2017 [6]. Summarizing the discussions, the following recommendations for household weights might become part of the future LFS implementing act under IESS:

At household level:
1. Household weighting should take the number of households and household size into account.
2. The households, weighted with the average household weight, should equal the number of households in the country.

At individual level:
3. The weighted number of records should equal the total population (i.e. 0+).
4. The weighted number of records defined as adults should equal a similar figure in the population (and, in consequence, the number of children fulfils a similar requirement).
5. The ILO labour status distribution should equal full-sample annual average results (by sex, age group).

General:
6. If possible, weights should be equal for all persons in a household.

As only few countries apply real household sub-sampling and Eurostat uses either yearly or even quarterly weights for the other countries, these conditions might be requested for the latter as well, as the requirements concern household results for all countries.

[4] Poland calculates special household weights for all household members, including the ones that did not participate in the LFS and have a quarterly weight equal to 0. As household weight, Poland calculates the mean value of the individual weights of all household members (including the non-respondents).
[5] See Eurostat 2015.
[6] See Eurostat (b) 2017.

Ad-hoc module weighting

As for household weighting, the current LFS regulations do not provide any recommendation on ad-hoc module weighting either. However, the situation is a bit simpler, as it generally concerns results at individual level only. Looking at module sampling requirements, Council Regulation 545/2014 amending Council Regulation (EC) No 577/98 stipulates (Article 7a):

"2. The sample used to collect information on ad hoc modules shall also provide information on structural variables.
3. The sample used to collect information on ad hoc modules shall fulfil one of the following conditions:
(a) collecting the information on ad hoc modules in the 52 reference weeks and being subject to the same requirements as any yearly estimate; or
(b) collecting the information on ad hoc modules in the complete sample of at least one quarter."

In the past, AHM sampling started in one quarter, while sampling throughout the reference year will be the future requirement under IESS. The AHM implementation followed by the countries in 2016 is summarized in Table 3.

TABLE 3. AD-HOC MODULE IMPLEMENTATION IN 2016

AHM sampling | Number of countries | Countries
Complete sample in at least one quarter * | 16 | BE, EE, IE, EL, HR, IT, CY, LV, LT, HU, PL, PT, RO, SI, SK, TR
Sub-sample, wave approach ** | 14 | BG, CZ, DK, ES, FR, LU, MT, NL, AT, FI, SE, UK, NO, CH
Sub-sample, no wave approach | 1 | DE

* One quarter: Q2 for all countries except EE (Q2 and Q4)
** All 4 quarters except MT (Q1 and Q2).

The AHM weighting basically follows the approach chosen (Table 4). All countries that survey the AHM in one specific quarter apply the quarterly core weights, while most countries with annual sub-sampling of core and AHM apply the yearly weights.

Under IESS, it is envisaged to define weighting and consistency requirements for the regular modules and ad-hoc subjects in line with the approach for yearly sub-sampling:
1. Weighting taking the distribution of the population being surveyed by sex and age group(s) into account, and
2. Consistency between annual module sub-sample totals and full-sample annual averages for employment, unemployment and inactive population by sex and age group(s).

Please note that a module collected on the annual sub-sample already used for general core LFS sub-sampling should automatically fulfil these conditions.

TABLE 4. AD-HOC MODULE WEIGHTING IN 2016

AHM weights | Number of countries | Countries
Quarterly core weights | 15 | BE, DK, IE, EL, HR, IT, CY, LV, LT, HU, PT, RO, SI, SK, TR
Annual sub-sample core weights | 8 | BG, CZ, ES, LU, NL, FI, UK, NO
Specific for AHM | 8 | DE, EE, FR, MT, AT, PL, SE, CH
Use of registers in weighting

In the current legislation there are no clear rules on the use of administrative data in the weighting. Council Regulation (EC) No 577/98 states "whereas the use of existing administrative sources should be encouraged insofar as they can usefully supplement the information obtained through interviews or serve as a sampling basis", but it does not directly refer to weighting.

Currently 9 countries (Denmark, France, Latvia, the Netherlands, Austria, Finland, Sweden, Norway and Switzerland) declare making use of non-demographic administrative data in their weighting procedures. The most widely used source is the unemployment register (8 countries), followed by employment registers and income/tax registers, each used in 5 countries.

To bring register information into their weighting procedure, countries mostly add variables derived from non-demographic register(s) to their national LFS datasets and use them for post-stratification and/or calibration.

The use of registers in the LFS weighting procedures can support and improve accuracy and internal comparability, but also coherence with other sources. On the other hand, differences in concepts or definitions between the LFS and register information have to be taken into account when different sources are integrated. Following a consultation and careful evaluation, LAMAS in its December 2017 meeting agreed that no country directly calibrates a variable determining the ILO labour status to match non-demographic administrative data.

The impact of the use of administrative sources in the weighting procedure is not yet documented in a standardised and comparable way; Eurostat is considering how to cover this issue better in the quality reports.
In the future most of the 9 countries plan to introduce improvements as regards the use of non-demographic data in their weighting scheme, in particular by introducing other variables coming from further administrative sources. Other countries like Italy and the United Kingdom are also interested in the introduction of administrative information in their weighting procedures.

Under IESS, the use of registers is encouraged, as are methods based on the integration of different sources, as far as these techniques can enhance the quality or reduce the costs of the survey [7]. However, any use of register employment or unemployment in the weighting must not impact on the harmonised measurement of the ILO labour force status in the LFS.

[7] See European Commission 2016.

4. Conclusion

The weighting procedure is an important and sensitive step in the whole process of LFS data production. It is also directly influenced by organizational choices linked to the cost reduction of the survey and the reduction of the statistical burden on respondents. Approaches like sub-sampling and the use of administrative data allow producing information even in the presence of such constraints.

This document provided an overview of the different (sub-)samples for which weighting might be needed, and described the current legal and practical situation and possible future requirements. It also broadly assessed the situation regarding the use of register information in the weighting. The future LFS implementing regulation(s) will have to take these aspects into account and provide a general framework to which national weighting approaches should adapt in order to enhance the comparability between labour market estimates of different countries.

The future LFS implementing act(s) will include more weighting rules than in the past, explicitly adding module and household weighting requirements and further consistency requirements across sub-samples. The right balance between all possible requirements proposed and practical restrictions in their implementation has still to be discussed.

References

Commission Regulation No 377/2008, implementing Council Regulation (EC) No 577/98 on the organisation of a labour force sample survey in the Community as regards the codification to be used for data transmission from 2009 onwards, the use of a sub-sample for the collection of data on structural variables and the definition of the reference quarters, 25 April 2008.

Council Regulation No 577/98, on the organisation of a labour force sample survey in the Community, 9 March 1998.

Council Regulation No 545/2014, amending Council Regulation (EC) No 577/98 on the organisation of a labour force sample survey in the Community, 15 May 2014.

Deville J.C., Särndal C.E., "Calibration Estimators in Survey Sampling", JASA, 1992, pp. 376-382.

European Commission, Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL establishing a common framework for European statistics relating to persons and households, based on data at individual level collected from samples, Brussels, 25 August 2016.

Eurostat (a), "Item 6 Subsampling in the future LFS", TF3 Meeting LAMAS Working Group, Luxembourg, 7-8 February 2017.

Eurostat (b), "Item 2.6 TF3: Household weighting Results of the LAMAS consultation and comparison of different weighting methods", LAMAS Working Group, Luxembourg, 6-7 December 2017.

Eurostat (c), "Item 2.5 TF3: Use of registers for weighting Results of the LAMAS consultation", LAMAS Working Group, Luxembourg, 6-7 December 2017.

Eurostat, "Item 2.3 Household subsampling in the future LFS", LAMAS Working Group, Luxembourg, 17-19 June 2015.

Jørn Ivar Hamre, "Towards a new weighting procedure in the Norwegian LFS", Statistics Norway, 2015.

Jørn Ivar Hamre, Johan Heldal, "Improved calculation and dissemination of coefficients of variation in the Norwegian LFS", Statistics Norway, Notater Documents No 46/2013.

Meraner A., Gumprecht D., Kovarik A., "Weighting Procedure of the Austrian Microcensus using Administrative Data", Austrian Journal of Statistics, September 2016, pp. 3-14.

ONS, "Family and Households", Information paper Quality and Methodology Information, 17 November 2015, Newport.

Statistics Denmark, "The Danish LFS and register based statistics", 2017.