Survey Methodology. - Lasse Sluth, - Søren Kühl,


Survey Methodology - Lasse Sluth, lbs@dst.dk - Søren Kühl, ska@dst.dk

Contents
- Populations
- Stratification and allocation
- Rotating panel designs
- Estimation
- Quality indicators


The different populations
- The target population is the population the survey seeks to provide information about, e.g. all enterprises in the retail industry with an annual turnover of more than 350k euro.
- The sampling frame is the population from which the sample is drawn, e.g. a subset of a Statistical Business Register (SBR).

Frame imperfections
- Coverage problems, both undercoverage and overcoverage
- Misclassifications
- Time lags
- Variables that could be used as size measures? E.g. employment and/or turnover

Illustration of coverage errors
[Figure: the sample, the sampling frame and the target population]
- If the sample is probabilistic and representative, it is possible to assess the overcoverage.
- Undercoverage is a more serious problem.
- Both will lead to bias if not attended to.

From Danish RTI
- Target population: retail trade enterprises with a yearly turnover of more than 2,500,000 DKK.
- Frame: all active retail trade enterprises are drawn from the SBR, with monthly turnover from VAT.
- Yearly turnover is grossed up (missing months).
- Yearly turnover is used for the cut-off (and later for stratification).

From Danish RTI
Coverage issues:
- The SBR is synchronized with the CBR (Central Business Register).
- Enterprises can change their activity code in the CBR, and the change is carried over to the SBR (CBR -> SBR).
- Misclassifications cause both overcoverage and undercoverage and can cause bias in the estimation.
- There is a time lag from drawing the frame to its actual use: births and deaths, corrections (activity, turnover).


Stratification and allocation
- Focus will be on the most commonly used design in business surveys: stratified simple random sampling (STSI).
- Use classification variables (e.g. size and industry) to divide the frame into disjoint subsets called strata.
- Draw a simple random sample from each stratum.
- Given a total sample size n, how large should the sample size in each stratum be?

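Stratification and allocation
Proportional allocation: the sample size n is distributed among the H strata in proportion to the number of enterprises in each stratum:
$$n_h = n \, \frac{N_h}{N}, \qquad h = 1, \dots, H,$$
where N_h is the number of enterprises in stratum h and N = N_1 + ... + N_H is the size of the frame.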
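A minimal sketch of proportional allocation, assuming the stratum sizes N_h are known from the frame. The function name and the example stratum sizes are hypothetical, and the simple rounding used here is an assumption: rounded allocations need not sum exactly to n.

```python
def proportional_allocation(n, stratum_sizes):
    """Distribute a total sample size n in proportion to the stratum sizes N_h."""
    N = sum(stratum_sizes)
    # Real-valued allocation n * N_h / N, rounded to the nearest integer.
    # Note: after rounding, the allocations may not sum exactly to n.
    return [round(n * N_h / N) for N_h in stratum_sizes]


# Hypothetical frame with three strata.
sizes = [6000, 3000, 1000]
print(proportional_allocation(500, sizes))  # prints [300, 150, 50]
```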

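Stratification and allocation
Optimal allocation (Neyman allocation): the sample size n is distributed among strata in proportion to both the number of enterprises and the standard deviation of the stratum:
$$n_h = n \, \frac{N_h S_h}{\sum_{k=1}^{H} N_k S_k},$$
where S_h is the standard deviation of the variable of interest in stratum h.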
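A minimal sketch of Neyman allocation, assuming the stratum standard deviations S_h are available (in practice they have to be estimated, e.g. from earlier rounds or from an auxiliary variable). The function name and the numbers are hypothetical. Capping an allocation at N_h without redistributing the surplus is a simplification.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Distribute n in proportion to N_h * S_h, capped at the stratum size."""
    products = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    allocation = [n * p / total for p in products]
    # A stratum cannot contribute more units than it contains (take-all stratum).
    # Simplification: the surplus is not redistributed to the other strata.
    return [min(N_h, round(a)) for N_h, a in zip(stratum_sizes, allocation)]


# Hypothetical example: the small stratum of large enterprises is the most variable.
sizes = [6000, 3000, 1000]
sds = [1.0, 2.0, 6.0]
print(neyman_allocation(300, sizes, sds))  # prints [100, 100, 100]
```

Compared with proportional allocation, which would give 180, 90 and 30, the sample shifts towards the small but heterogeneous stratum.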

Stratification and allocation
- Why is it called optimal? Because, for a given total sample size, it minimizes the sampling variance of the estimate, which is an important quality indicator.
- A good stratification ensures that the enterprises respond as homogeneously as possible within strata, such that most of the variance occurs between strata and not within them.
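The optimality claim can be made concrete with the standard variance formula for a stratified simple random sample. This is a textbook result rather than something stated on the slides, and the symbols are assumptions: \hat{t} is the estimated total, N_h and n_h are the stratum population and sample sizes, and S_h is the stratum standard deviation.

$$V(\hat{t}) = \sum_{h=1}^{H} N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^2}{n_h} = \sum_{h=1}^{H} \frac{N_h^2 S_h^2}{n_h} - \sum_{h=1}^{H} N_h S_h^2$$

Only the first sum depends on the allocation. Minimizing it subject to \sum_h n_h = n (for example with a Lagrange multiplier) gives n_h proportional to N_h S_h, which is the Neyman allocation. The formula also shows why homogeneous strata help: small values of S_h give a small variance.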

Stratification and allocation
- The probability that an enterprise will be included in the sample is called the inclusion probability.
- In take-all strata the inclusion probability is 1 for all enterprises.
- In other strata it will be n_h / N_h.
- If the allocation is proportional, all enterprises will have the same inclusion probability. This is called a balanced design.

Stratification and allocation
- At Statistics Denmark, allocation is done by weighting different allocations together:
  - more than one variable: a trade-off
  - domains and total: a trade-off
- Stratification is done by employment or turnover.
- Analysis of where to select all enterprises (take-all strata).
- Gradual transition based on several years.
- The gradual transition is important. Otherwise outliers and external events can skew the allocation.

From Danish RTI
The sample is renewed yearly by the methodology department.
Stratification:
Stratum                            0      1      2      3      4
Yearly turnover (mill. DKK/year)   <2.5   2.5-5  5-10   10-20  >20
There is an alternative stratum 1 (1-5 mill. DKK/year) for certain industries.

From Danish RTI
Sample allocation:
Stratum                          0   1      2      3       4
Coverage (% by number of ent.)   0   15-35  15-55  40-100  100
Total coverage is about 35% (3,500 out of 10,000).
Weighted coverage is larger than 80%.


Rotating panel design
- A rotating panel design is a design in which a pre-determined proportion of the sample from the previous period is reselected.
- Panels are used to optimize the estimation of changes, reduce sampling costs and reduce response burden.
- The challenge with such a design is to keep the sample representative.

Rotating panel designs
[Figure: illustration of a rotation scheme]

Rotating panel design
- Such a rotation scheme is only unbiased if the frame is constant over time.
- There are several methods to account for changes in the frame. One of them is the permanent random number technique, which has recently been implemented in the Danish RTI.

From Danish RTI
Rotation:
- All enterprises get a random number between 0 and 1.
- They also get a random rotation group number (1, 2 or 3).
- When sampling in a stratum, all enterprises with a random number lower than the inclusion probability of that stratum are included in the sample.
- Every year the enterprises of one rotation group (starting with group 1) have 1/3 added to their random number.
- New enterprises get a random number and a rotation group number.

From Danish RTI
Consequences:
- In strata with an inclusion probability lower than 1/3, enterprises are in the sample for 3 years and then out for at least 6.
- If the inclusion probability is between 1/3 and 2/3, enterprises are in the sample for between 3 and 6 years and out for at least 3.
- If the inclusion probability is larger than 2/3, enterprises are in for more than 6 years but still out for at least 3.
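The rotation rules above can be sketched as a small simulation for a single stratum. This is an illustration under stated assumptions, not the production implementation: the frame is fixed (no births or deaths), the inclusion probability of 0.25 is hypothetical, and random numbers that exceed 1 after the shift are assumed to wrap around to the start of the interval. All function and field names are hypothetical.

```python
import random


def make_frame(n_units, seed=0):
    """Give each (hypothetical) enterprise a random number and a rotation group."""
    rng = random.Random(seed)
    return [
        {"id": i, "prn": rng.random(), "group": rng.choice([1, 2, 3])}
        for i in range(n_units)
    ]


def select_sample(frame, inclusion_prob):
    """Include every enterprise whose random number is below the inclusion probability."""
    return {u["id"] for u in frame if u["prn"] < inclusion_prob}


def rotate(frame, group):
    """Add 1/3 to the random numbers of one rotation group.

    Assumption: values above 1 wrap around (modulo 1).
    """
    for u in frame:
        if u["group"] == group:
            u["prn"] = (u["prn"] + 1 / 3) % 1.0


frame = make_frame(3000)
previous = select_sample(frame, 0.25)
for year in range(1, 7):
    rotate(frame, (year - 1) % 3 + 1)  # groups 1, 2, 3, 1, 2, 3
    current = select_sample(frame, 0.25)
    kept = len(previous & current) / len(previous)
    print(f"year {year}: sample size {len(current)}, kept {kept:.0%} of last year's sample")
    previous = current
```

Because only one of the three groups is shifted each year, roughly two thirds of the sample carries over from one year to the next. Selecting on a threshold also means that the realized sample size varies slightly around its expected value.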


Estimation
- The usual technique for estimating a population total consists of summing appropriately weighted variable values for the responding enterprises in a sample.
- Different weighting systems can be considered.

Estimation
- One can use the design weights, given by inverting the inclusion probabilities. This gives the Horvitz-Thompson (HT) estimator.
- With an STSI design every enterprise has the same inclusion probability within a stratum, namely n_h / N_h.
- If the variable of interest is denoted y, the estimator is given by:
$$\hat{t}_{y,HT} = \sum_{h=1}^{H} \frac{N_h}{n_h} \sum_{k \in s_h} y_k,$$
where s_h is the sample in stratum h.
- This estimator is unbiased if all sampled enterprises respond.
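A minimal sketch of the HT estimator under an STSI design, assuming full response. The data layout (dictionaries keyed by stratum) and all numbers are hypothetical.

```python
def ht_total(sample, stratum_sizes):
    """Horvitz-Thompson estimate of a total under stratified simple random sampling.

    sample: dict mapping stratum -> list of y values of the sampled enterprises
    stratum_sizes: dict mapping stratum -> N_h, the number of enterprises in the frame
    """
    total = 0.0
    for h, ys in sample.items():
        design_weight = stratum_sizes[h] / len(ys)  # N_h / n_h = 1 / inclusion probability
        total += design_weight * sum(ys)
    return total


# Hypothetical data: a sampled stratum and a take-all stratum (weight 1).
sample = {"small": [10, 20], "large": [100, 200, 300]}
sizes = {"small": 10, "large": 3}
print(ht_total(sample, sizes))  # prints 750.0
```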

Estimation
- Suppose you have an auxiliary variable x that is known for every enterprise in the frame. Then the ratio estimator is given by
$$\hat{t}_{y,R} = t_x \, \frac{\hat{t}_{y,HT}}{\hat{t}_{x,HT}},$$
where t_x is the known frame total of x, and \hat{t}_{y,HT} and \hat{t}_{x,HT} are the HT estimates of the totals of y and x.
- This leads to a weighting scheme where each design weight is adjusted by the same factor.
- It is possible to do the ratio adjustment per stratum or industry. This is often the best solution and is called the separate ratio estimator.
- This estimator is only effective if the auxiliary variable is correlated with the variable of interest, e.g. VAT reports from administrative sources and retail trade turnover.
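A minimal sketch of the ratio estimator, showing that it amounts to multiplying every design weight by one common factor. The function names and the data are hypothetical. A separate ratio estimator would apply the same computation within each stratum or industry and sum the results.

```python
def ratio_adjustment_factor(x, weights, x_total):
    """Common factor applied to all design weights: known x total / HT estimate of x total."""
    ht_x = sum(w * xi for w, xi in zip(weights, x))
    return x_total / ht_x


def ratio_total(y, x, weights, x_total):
    """Ratio estimate of the y total: known x total times (HT total of y / HT total of x)."""
    ht_y = sum(w * yi for w, yi in zip(weights, y))
    ht_x = sum(w * xi for w, xi in zip(weights, x))
    return x_total * ht_y / ht_x


# Hypothetical sample of three enterprises: survey turnover y, VAT turnover x.
y = [12, 24, 110]
x = [10, 20, 100]
weights = [5, 5, 1]   # design weights N_h / n_h
x_total = 300         # VAT turnover summed over the whole frame

print(ratio_total(y, x, weights, x_total))              # prints 348.0
print(ratio_adjustment_factor(x, weights, x_total))     # prints 1.2
```

Here the HT estimate of the x total (250) falls short of the known frame total (300), so all weights are scaled up by 1.2 and the HT estimate of the y total (290) becomes 348.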

Estimation
The ratio estimator is a popular choice in business surveys:
- It is easy to calculate.
- If the correlation between the auxiliary variable and the variable of interest is strong, it reduces the sampling variance.
- In case of non-response, it decreases bias if the auxiliary variable is correlated with the response probability.

Estimation
- In the case of non-response, bias will occur, especially if the response probability is correlated with the variable of interest. For example, enterprises with decreasing (or a small amount of) turnover might not find the time to answer the survey.
- In that case you have to adjust the weights for non-response.

Estimation
- A simple solution is to assume that every enterprise within a stratum responds with the same probability, and to modify the design weights proportionally.
- More advanced methods use auxiliary information to model the response probabilities and then adjust the design weights accordingly. The ratio estimator is a simple example of this.
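A minimal sketch of the simple solution, assuming a uniform response probability within each stratum, estimated by the observed response rate. The function name and the numbers are hypothetical.

```python
def nonresponse_adjusted_weight(N_h, n_h, r_h):
    """Design weight N_h / n_h divided by the stratum response rate r_h / n_h.

    N_h: enterprises in the frame, n_h: enterprises sampled, r_h: respondents.
    The result equals N_h / r_h.
    """
    if r_h == 0:
        raise ValueError("no respondents in the stratum; the weight is undefined")
    design_weight = N_h / n_h
    response_rate = r_h / n_h
    return design_weight / response_rate


# Hypothetical stratum: 120 enterprises, 30 sampled, 24 responded.
# The design weight 4 is inflated to about 5, so the respondents still
# represent all 120 enterprises.
print(nonresponse_adjusted_weight(120, 30, 24))
```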

From Danish RTI
- Ratio estimate, with the grossing-up population updated quarterly
- VAT turnover as auxiliary variable
- Ratio estimation by industries (not strata)
- Imputation only for special units

From Danish RTI
Month-to-month chain-linking:
- Re-estimation of the previous month
- Same grossing-up population
- Calculation of the monthly growth rate
- Eliminates the impact of structural changes (in frame, grossing-up population, sample)
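One way to read the chain-linking steps is sketched below. It assumes that each month's growth rate is the current estimate divided by the previous month re-estimated with the same frame, grossing-up population and sample, and that the published series is built by multiplying these rates onto a base level. The function names and all figures are hypothetical, and this is an interpretation rather than the documented production method.

```python
def monthly_growth(current_estimate, reestimated_previous):
    """Growth rate between two months estimated with the same setup."""
    return current_estimate / reestimated_previous


def chain_link(base_level, growth_rates):
    """Build a series by applying successive monthly growth rates to a base level."""
    levels = [base_level]
    for g in growth_rates:
        levels.append(levels[-1] * g)
    return levels


# Hypothetical pairs: (estimate for month t, re-estimate for month t-1).
# The jump in level between the pairs (102 vs 200) mimics a structural change
# such as a new sample; only the within-pair growth enters the chained series.
pairs = [(102.0, 100.0), (210.0, 200.0)]
rates = [monthly_growth(cur, prev) for cur, prev in pairs]
print(chain_link(100.0, rates))  # roughly [100.0, 102.0, 107.1]
```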


Quality indicators
- There exists a vast number of quality indicators for surveys.
- Some of them are simple, such as the response rate or the weighted response rate.

Quality indicators
Focus will be on 2 important indicators:
- Sampling error
- Mean square error

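Quality indicators
If the design is unstratified simple random sampling (SRS) and the estimator is the HT estimator, then the sampling error is estimated by:
$$\widehat{SE}(\hat{t}_{y,HT}) = N \sqrt{\left(1 - \frac{n}{N}\right) \frac{s_y^2}{n}}, \qquad s_y^2 = \frac{1}{n-1} \sum_{k \in s} (y_k - \bar{y})^2,$$
where N is the population size, n the sample size, s the sample and \bar{y} the sample mean of y.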

Quality indicators
- The quotient between the sampling error and the estimated total is called the coefficient of variation (CV).
- In take-all strata there is no sampling error.
- The standard deviations are the only stochastic elements under a fixed design and allocation.
- The ratio estimator reduces the standard deviations.
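A minimal sketch of the sampling error and the CV for the HT estimator of a total under unstratified SRS, using the standard textbook formula with the finite population correction. The function name and the data are hypothetical.

```python
import math
import statistics


def srs_total_se_cv(ys, N):
    """Estimated total, sampling error and CV under simple random sampling.

    ys: observed values for the n sampled enterprises, N: population size.
    """
    n = len(ys)
    total = N * sum(ys) / n
    s2 = statistics.variance(ys)              # sample variance, divisor n - 1
    se = N * math.sqrt((1 - n / N) * s2 / n)  # includes the finite population correction
    return total, se, se / total


# Hypothetical sample of 4 enterprises from a population of 100.
total, se, cv = srs_total_se_cv([10, 20, 30, 40], N=100)
print(f"total={total:.0f}, sampling error={se:.1f}, CV={cv:.1%}")
```

If the whole population is sampled (n = N), the correction factor is zero and so is the sampling error, which matches the statement about take-all strata.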

Quality indicators
- The sampling error doesn't account for bias. In fact, an estimate corrected for non-response bias will typically have a higher sampling error than the uncorrected, biased estimate.
- The mean square error is defined as:
$$MSE(\hat{t}) = V(\hat{t}) + \left(Bias(\hat{t})\right)^2,$$
where V is the sampling variance, i.e. the squared sampling error.
- The magnitude of the bias will often be a judgement call.
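A small numerical illustration of the trade-off, with purely hypothetical figures: the corrected estimate has the larger sampling error but the smaller mean square error, because most of the bias has been removed.

```python
def mean_square_error(sampling_error, bias):
    """MSE = sampling variance + squared bias."""
    return sampling_error ** 2 + bias ** 2


# Hypothetical values, in the same unit as the estimate.
uncorrected = mean_square_error(sampling_error=50, bias=100)
corrected = mean_square_error(sampling_error=60, bias=20)
print(uncorrected, corrected)  # prints 12500 4000
```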

Quality indicators
Many more quality indicators are available through the ESS guidelines.