Correcting for non-response bias using socio-economic register data

Liisa Larja & Riku Salonen
liisa.larja@stat.fi / riku.salonen@stat.fi

Introduction

Increasing non-response is a problem for population surveys such as the Labour Force Survey (LFS) in many countries. Non-response may bias survey estimates if the propensity to respond is not equally distributed in the population. This is especially the case when the distribution of non-response is correlated with the distribution of labour market status across population groups. Typically, demographic data (e.g. age, sex, region) are used in the estimation process to correct for this bias. This paper discusses how socio-economic register data can also be used to correct for non-response bias in the estimates.

In the Finnish LFS, register unemployment has been part of the estimation procedure since 1997, when it was noted that the post-stratification design based on sex, age group and region did not capture the overrepresentation of unemployed persons among the non-respondents (Djerf, 1997). We have now noticed some shortcomings in our estimation procedure and are investigating the possibility of adding new socio-economic data to our model.

Research questions

In this paper, we present results from the Finnish LFS, where we already employ data on register unemployment, and investigate the possibility of adding data on education and income. First, we show how non-response varies across groups. Then we present how adding these data to the calibration model affects the level and precision of the estimates of employment and unemployment.

Current estimation procedure

The Finnish LFS has been conducted by Statistics Finland since 1959. Over the years the estimation procedure has changed several times. Until 1997, only demographic data were used in the estimation process (post-stratification). The current estimation procedure was introduced in 1997.
Register data on unemployment were used as auxiliary information at the estimation stage. The use of such socio-economic auxiliary data significantly improved the estimates of unemployment by reducing sampling errors and non-response bias (Djerf, 1997). GREG estimation was used in this procedure [1].

[1] In the Finnish LFS, the non-response adjustment is a two-step re-weighting procedure. In the first step, post-stratification is used to improve the precision of estimation. 194 post-strata are constructed: the stratum of Mainland Finland is divided into 192 post-strata by sex (2 groups), age (6 groups) and region (16 groups), and the stratum of the Autonomous Territory of the Åland Islands is divided into two post-strata by sex. In the second step, to allow the use of more variables, the post-stratified weights are calibrated according to sex (2 groups), age (12 groups), region (20 groups), reference week (4 or 5 groups) and status in the Ministry of Labour's job-seeker register (8 groups). This matches the distributions of all variables included in the calibration process to the distributions in the calibration frame.

Since the introduction of the current estimation procedure in 1997, non-response has increased considerably, from 8 % in 1996 to 33 % in 2017. The most problematic aspect of non-response is that it may not be evenly distributed in the population and that its distribution may be correlated with the distribution of labour market status. In the case of the Finnish LFS, we see that the non-response rate of the least educated (ISCED 0-2) has grown to as high as 40 percent, whereas the non-response rate of persons with higher education (ISCED 5-8) is only 25 percent (Figure 1).

Figure 1: Non-response rate in the Finnish Labour Force Survey according to the level of education
[Chart; series: ISCED 0-2, ISCED 3-4, ISCED 5-8; x-axis: 2008 to 2018; y-axis: 0 to 45]

Furthermore, education is highly correlated with labour market status: respondents with higher education have an employment rate of 73 %, and those with the least education only 31 % (Figure 2). The same pattern is repeated for status in the job-seeker register and for migration status. Those who respond most frequently, i.e. persons who are not registered job seekers and Finland-Swedes, also tend to have a better labour market position. Hence, as the more frequent respondents are overrepresented in our data and are often in a better labour market position, there is a risk that we overestimate the number of employed persons in the population.

Figure 2: Non-response rate and employment rate in the Finnish Labour Force Survey in 2017

[Figure 2 chart; categories: ISCED 0-2, ISCED 3-4, ISCED 5-8, registered job seeker, not registered job seeker, foreign background, Finnish, Finland-Swedish; series: non-response rate, employment rate (among the respondents); axis: 0 to 80]

We see the effect of this bias when comparing LFS data on the educational attainment of the population with data from the Finnish register of completed degrees (which has very good coverage, except for foreign degrees). As shown in Figure 3, the tertiary educational attainment of the population aged 30-34 appears very high, and even increasing, in the LFS, whereas the register data show exactly the opposite.

Figure 3: Tertiary educational attainment, % of population aged 30-34
[Chart; series: LFS, Register of completed degrees; x-axis: 2008 to 2016; y-axis: 36 to 47]

Owing to these developments in non-response, and based on our previous good experience with data on status in the job-seeker register, we started studying the possibility of adding more socio-economic data to the calibration model.
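The mechanism described above can be illustrated with a small worked example. The sketch below uses the rounded non-response and employment rates quoted in the text for the least and most educated groups. The equal population shares are purely hypothetical (the paper does not report them), and the sketch assumes that non-respondents in a group have the same employment rate as respondents in that group. All function names are illustrative.

```python
# Rounded rates quoted in the text for the least and most educated groups.
# The population shares are hypothetical (the paper does not give them), and
# the employment rates are assumed to hold for non-respondents as well.
GROUPS = {
    # group: (population share, non-response rate, employment rate)
    "ISCED 0-2": (0.5, 0.40, 0.31),
    "ISCED 5-8": (0.5, 0.25, 0.73),
}


def population_rate(groups):
    """Employment rate if everybody responded."""
    return sum(share * emp for share, _, emp in groups.values())


def respondent_rate(groups):
    """Unadjusted employment rate computed from respondents only."""
    responding = {g: share * (1 - nr) for g, (share, nr, _) in groups.items()}
    employed = sum(responding[g] * groups[g][2] for g in groups)
    return employed / sum(responding.values())


def reweighted_rate(groups):
    """Respondents weighted by the inverse of their group's response rate."""
    num = den = 0.0
    for share, nr, emp in groups.values():
        respondents = share * (1 - nr)
        weight = 1 / (1 - nr)
        num += weight * respondents * emp
        den += weight * respondents
    return num / den


print(f"population:  {population_rate(GROUPS):.3f}")
print(f"respondents: {respondent_rate(GROUPS):.3f}")
print(f"reweighted:  {reweighted_rate(GROUPS):.3f}")
```

Under these assumed shares the unadjusted respondent rate lies roughly two percentage points above the population rate, and weighting by education group removes the difference. This only holds to the extent that the auxiliary variable explains both response behaviour and labour market status, which is the motivation for testing register variables in the calibration model.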

Methodology

To study the bias and uncertainty that non-response causes in the estimates of labour market status, we compare estimation results from different calibration models using the Finnish Labour Force Survey data from March 2018. As a baseline, we present estimates calibrated using only the demographic variables (sex, age, region). After that, we add other socio-economic variables to the model (status in the job-seeker register, educational attainment, native/foreign origin, urban/rural). We look at the effect of the proposed calibration on the level as well as on the precision (standard error) of the estimates. Seven different weighting schemes were constructed as follows:

GREG_base: sex, 5-year age group and region (20 areas based on NUTS3)
GREG_current: GREG_base + status in the unemployment register
GREG1: GREG_base + level of education
GREG2: GREG_base + origin
GREG3: GREG_base + urban/rural
GREG7: GREG_current + level of education
GREG8: GREG_current + level of education + origin + urban/rural

Results

We first present the technical and quality details of the alternative calibrated weights. After that, we present the impact of the different weights on the LFS headline indicators.

Impacts on weights

The calibrated weights are obtained with ETOS, a program developed by Statistics Sweden for calibration and GREG estimation. It is well known that one goal of a calibration weighting system is to keep the calibrated weights as close as possible to the pre-calibrated weights. We evaluate the correlations between the pre-calibrated weights and the alternative calibrated weights (see Table 1 below).

Table 1: Correlations between the pre-calibrated weights and the alternative calibrated weights, March 2018

Calibrated weights   Correlation with pre-calibrated weights
GREG_base            0.94303
GREG_current         0.91491
GREG1                0.86442
GREG2                0.88131
GREG3                0.92906
GREG7                0.83603
GREG8                0.80812

Table 1 indicates that the closeness between the pre-calibrated weights and the alternative calibrated weights depends on the calibration variables chosen (e.g. level of education). Another well-known goal of a calibration weighting system is to avoid negative and extreme weights. We compared the distributions of the alternative calibrated weights (see Table 2 below). The results show that the alternative calibration schemes do not lead to undesirable weights. We can therefore assume that the categories of the calibration variables are quite well specified.
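The two diagnostics discussed above (closeness to the pre-calibrated weights and absence of negative or extreme weights) can be sketched on toy data. The example below is a minimal pure-Python version of linear (GREG-type) calibration, in which each weight becomes w_i = d_i (1 + x_i'λ) and λ is chosen so that the weighted auxiliary totals match the target totals. It is not the ETOS implementation. The respondents, pre-calibrated weights and target totals are all hypothetical, and the function names are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def linear_calibration(d, X, totals):
    """Return w_i = d_i * (1 + x_i . lam) such that sum_i w_i x_i = totals."""
    p = len(totals)
    # A = sum_i d_i x_i x_i', rhs = totals - sum_i d_i x_i
    A = [[sum(di * x[j] * x[k] for di, x in zip(d, X)) for k in range(p)]
         for j in range(p)]
    rhs = [totals[j] - sum(di * x[j] for di, x in zip(d, X)) for j in range(p)]
    lam = solve(A, rhs)
    return [di * (1 + sum(l * xj for l, xj in zip(lam, x)))
            for di, x in zip(d, X)]


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Hypothetical respondents: intercept, sex indicator, registered job seeker.
n = 210
d = [400 + (37 * i) % 200 for i in range(n)]  # pre-calibrated weights
X = [[1, i % 2, 1 if i % 7 == 0 else 0] for i in range(n)]

# Hypothetical targets: population size and sex total as estimated from d,
# but 20 % more registered job seekers than the respondents represent.
current = [sum(di * x[j] for di, x in zip(d, X)) for j in range(3)]
totals = [current[0], current[1], 1.2 * current[2]]

w = linear_calibration(d, X, totals)
calibrated = [sum(wi * x[j] for wi, x in zip(w, X)) for j in range(3)]

print("target totals:    ", [round(t) for t in totals])
print("calibrated totals:", [round(t) for t in calibrated])
print("correlation(d, w):", round(pearson(d, w), 5))
print("min, max, mean:   ", round(min(w), 1), round(max(w), 1),
      round(sum(w) / n, 2))
```

Because the population size is one of the calibration totals, the mean weight is the same before and after calibration, which is consistent with the identical averages across schemes in Table 2. Linear calibration does not by itself rule out negative weights, so checking the minimum and maximum as in Table 2 remains necessary.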

Table 2: Distributions of the alternative calibrated weights, March 2018

Weights        Minimum   Maximum   Average   Respondents
GREG_base      151       1240      503.81    8181
GREG_current   152       1424      503.81    8181
GREG1          130       1411      503.81    8181
GREG2          142       1228      503.81    8181
GREG3          146       1301      503.81    8181
GREG7          135       1417      503.81    8181
GREG8          126       1450      503.81    8181

Estimates on employment

Comparing the employment estimates, we see that with only demographic data (greg_base) the estimate is as high as 2 506 000 persons (Figure 4). Controlling for either status in the job-seeker register (greg_current) or the level of education (greg1) in addition to the demographic distribution lowers the estimate by some 27 000-30 000 persons, that is, 0,7-0,9 percentage points of the employment rate. This means that demographic data alone are not sufficient to control the bias caused by non-response, and that socio-economic auxiliary information, such as the level of education or status in the unemployment register, can help to correct for this bias. Information on foreign/native background (greg2) or the degree of urbanity of the place of residence (greg3) has a somewhat smaller effect, lowering the estimate of employed persons by 3 000-13 000 persons (or 0,1-0,4 percentage points of the employment rate).

By adding information on both status in the job-seeker register and the level of education (greg7) to the calibration model, we see a decrease of 50 000 persons in the estimate of the employed population. This corresponds to a decrease of 1,4 percentage points in the employment rate. As the effect is larger than with either variable alone, we conclude that the variables do not entirely overlap and hence that including both in the calibration model would be beneficial.
Adding further data on foreign background and urbanity (greg8) has a very limited influence on the estimate, but may have important implications for estimates on subpopulations, such as the number of foreign-born employed persons. These analyses, however, remain outside the scope of this paper.

In addition to correcting for the bias caused by non-response, using socio-economic auxiliary information in the calibration model may help to increase the precision of the estimates. Using only demographic auxiliary data, the standard error of the employment estimate is 18 538, but adding appropriate socio-economic data decreases the standard error to 16 500-16 700 (greg_current, greg7 and greg8). However, adding more data to the model may also increase the standard error if the resulting population classes become too small.

Figure 4a: Estimates on employed population (15-74 years) using alternative calibrated weights for March 2018

[Figure 4a chart; one value per weighting scheme (GREG_base, GREG_current, GREG1, GREG2, GREG3, GREG7, GREG8); axis: employed population 15-74, 2 360 000 to 2 560 000]

Figure 4b: Estimates on the employment rate (15-64 years) using alternative calibrated weights for March 2018
[Figure 4b chart; one value per weighting scheme (GREG_base, GREG_current, GREG1, GREG2, GREG3, GREG7, GREG8); axis: employment rate 15-64, 66,0 to 73,0]

Estimates on unemployment

The results for the estimates of (ILO) unemployment are presented in Figures 5a and 5b. Compared with the calibration model using only demographic variables (greg_base), adding auxiliary data on status in the job-seeker register (greg_current) increases the number of (ILO) unemployed by 12 000 persons, or 0,5 percentage points in the unemployment rate. Furthermore, the precision of the estimate improves, as the standard error decreases from 10 865 to 9 690. The current calibration variable is divided into 8 categories (e.g. three indicators according to the length of unemployment in the register and four indicators that tell whether the person in question is included in the register of unemployed job seekers). We have also tested a registered-unemployment indicator classified into four categories (Male 15-24, Female 15-24, Male 25-74 and Female 25-74); the estimation results show substantial gains in efficiency for these categories.

In contrast to the results for the employment estimates, adding further socio-economic auxiliary data has little effect on the estimates of unemployment. The model with all tested auxiliary information (greg8) gives almost exactly the same result as our current model (greg_current).

Figure 5a: Estimates on unemployed population (15-74 years) using alternative calibrated weights for March 2018
[Figure 5a chart; one value per weighting scheme (GREG_base, GREG_current, GREG1, GREG2, GREG3, GREG7, GREG8); axis: unemployed persons 15-74, 0 to 300 000]

Figure 5b: Estimates on unemployment rate (15-74 years) using alternative calibrated weights for March 2018

[Unemployment rate chart; one value per weighting scheme (GREG_base, GREG_current, GREG1, GREG2, GREG3, GREG7, GREG8); axis: unemployment rate 15-74, 0 to 12,0]

Estimates on the population not in the labour force

The results for the estimates of the population not in the labour force reflect the pattern already observed for the employment estimates. With a calibration model using only the demographic variables (greg_base, Figure 6), the number of persons not in the labour force is underestimated by some 40 000 persons, or by 2,8 %, when compared with the models using all tested auxiliary information (greg8, greg7). As with the results on employment, both status in the job-seeker register and the level of education seem to have an independent effect on the bias, as using both variables in the model (greg7) has a larger effect on the estimate than either variable alone (greg_current, greg1). Again, the data on foreign/native origin and on the degree of urbanity of the area of residence seem to have a smaller effect than the education and job-seeker data. However, the effect of these auxiliary data should be reviewed separately for the estimates of the relevant subpopulations, which is outside the scope of this paper.

Figure 6: Estimates on the population not in the labour force (15-74 years) using alternative calibrated weights for March 2018

[Figure 6 chart; one value per weighting scheme (GREG_base, GREG_current, GREG1, GREG2, GREG3, GREG7, GREG8); axis: not in the labour force 15-74, 1 320 000 to 1 440 000]

Discussion

This paper presents the first results of the work of Statistics Finland on improving estimation in the Labour Force Survey. We demonstrate that using appropriate socio-economic auxiliary data in the estimation process may significantly improve estimation by correcting the bias caused by non-response and by improving the precision of the estimates. Compared with models using only demographic auxiliary data, our estimates using socio-economic auxiliary data indicate a bias as large as 1,5 percentage points in the employment rate.

Auxiliary data on status in the job-seeker register were introduced to our calibration model in 1997, as they were shown to improve precision and decrease bias (Djerf, 1997). Since then, we have witnessed a significant increase in non-response. In this paper we show that the current model is no longer up to date and would benefit from the addition of new auxiliary data. Based on our results, adding at least data on the level of education would be important, as the current model seems to overestimate the number of employed by some 20 000 persons (or 0,5 percentage points in the employment rate). Accordingly, the number of persons outside the labour force is underestimated by the corresponding amount. The effects on the estimates of unemployment are small, as our current model, which already uses the job-seeker register data, seems to capture the non-response bias sufficiently.

In this paper, we have shown the first results of the work aimed at improving the estimation model.
Further analyses of the effect of other auxiliary data, such as income, student status or the age of the youngest child, remain to be done. Also, we have presented results only for the headline indicators of the Labour Force Survey. Analyses of the effects on subpopulations (men/women, youth, elderly, foreign-born, highly/least educated, etc.), as well as on other indicators (NEET rate, early leavers from education, working time, etc.), remain to be conducted.

References

Djerf, K. (1997). Effects of Post-Stratification on the Estimates of the Finnish Labour Force Survey. Journal of Official Statistics, Vol. 13(1), pp. 29-39.