Applications of Data Analysis (EC969)
Simonetta Longhi and Alita Nandi (ISER)
Contact: slonghi and anandi; @essex.ac.uk

Week 2 Lecture 1: Sampling

(I) Constructing sampling distributions and estimating their characteristics
    Example: estimating mean number of children among women
(II) Computing unbiased estimates with correct standard errors
    Example: estimating mean pay/wage in UK

Input datasets: Week2Lecure1.dta
Do file in Week2Lecure1_DoFile.pdf
New commands: ci, test, tabstat, aweight, pweight, ciplot, svyset, svydes, svy: mean, estat

(I) Constructing sampling distributions and estimating their characteristics

Example: estimating mean number of children among women

Suppose there is a population of 6 women with children. The distribution of children is shown in Table 1. We are interested in estimating the average number of children of these women, and so would like to draw a sample and estimate this number from that sample. In this exercise we will compare the characteristics of the sampling distribution of the sample mean number of children under two different sampling plans (sample designs). In the first sampling plan we draw a sample of 2 women from the population of 6 women. In the second, we draw a sample of 3 women.

Table 1: Distribution of children across a population of 6 women

    Woman    Total no. of children
    1        4
    2        5
    3        3
    4        3
    5        7
    6        8
    Total    30

Population mean = 30/6 = 5

Sampling Plan 1

Table 2: All possible size 2 samples and corresponding sample averages

    Sample No.   Women in the sample   Mean number of children per woman in the sample (d)
    1            1,2                   4.5
    2            1,3                   3.5
    3            1,4                   3.5
    4            1,5                   5.5
    5            1,6                   6
    6            2,3                   4
    7            2,4                   4
    8            2,5                   6
    9            2,6                   6.5
    10           3,4                   3
    11           3,5                   5
    12           3,6                   5.5
    13           4,5                   5
    14           4,6                   5.5
    15           5,6                   7.5
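As a check on Table 2, the enumeration can be reproduced in Stata. The sketch below is not part of the course do file. It types in the six values from Table 1, pairs every woman with every other woman using cross, and summarises the 15 sample means. The variable names (w1, c1, w2, c2, d) are made up for illustration, and the grouped rename syntax assumes Stata 12 or later.

```stata
* Population from Table 1 (woman identifier and her number of children)
clear
input w1 c1
1 4
2 5
3 3
4 3
5 7
6 8
end

* Save one copy, then rename so that two copies can sit side by side
tempfile pop
save `pop'
rename (w1 c1) (w2 c2)

* cross forms all 6 x 6 pairings; keep each unordered pair of women once
cross using `pop'
keep if w1 < w2

* Sample mean for each of the 15 possible samples of size 2
generate d = (c1 + c2) / 2
sort w1 w2
list w1 w2 d, noobs

* Mean of the sampling distribution of d
summarize d
display "Expected value of d = " r(mean)

* summarize divides by N-1; rescale to the N denominator, because the
* 15 samples are the whole sampling distribution, not a sample from it
display "Standard error of d = " sqrt(r(Var) * (r(N) - 1) / r(N))
```

The same idea should extend to Sampling Plan 2: a second cross, after renaming so that the variable names do not clash, and a condition such as w1 < w2 & w2 < w3, would give the 20 samples of size 3.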

Table 3: Sampling distribution of d

    d        Frequency (f)
    3        1
    3.5      2
    4        2
    4.5      1
    5        2
    5.5      3
    6        2
    6.5      1
    7.5      1
    Total    15

Expected value of d:

\[ E(d) = \frac{\sum_{i=1}^{9} d_i f_i}{\sum_{i=1}^{9} f_i} = \frac{75}{15} = 5 \]

Standard error of d:

\[ SE(d) = \sqrt{\frac{\sum_{i=1}^{9} \left(d_i - E(d)\right)^2 f_i}{\sum_{i=1}^{9} f_i}} = 1.21 \]

Bias = E(d) - Population mean = 5 - 5 = 0
MSE = Bias^2 + Sampling Variance = Bias^2 + (Standard Error^2) = 0 + (1.21^2) = 1.47
RMSE = Square root of MSE = 1.21

Sampling Plan 2

Table 4: All possible size 3 samples and corresponding sample averages (fill in the blanks)

    Sample No.   Women in the sample   Mean number of children per woman in the sample (d)
    1            1,2,3                 4.00
    2            1,2,4                 4.00
    3            1,2,5                 5.33
    4            1,2,6                 5.67
    5            1,3,4                 3.33
    6            1,3,5                 4.67
    7            1,3,6                 5.00
    8            1,4,5                 4.67
    9            1,4,6                 5.00
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20

Compute the sampling distribution of d.
Calculate the expected value and standard deviation of the sampling distribution of d.
Calculate the bias, standard error, MSE and RMSE of d.
How does sampling plan 2 compare with sampling plan 1 in terms of the bias and standard error of d?

(II) Computing unbiased estimates with correct standard errors

Example: estimating mean pay/wage in UK

Input dataset: Week2Lecure1.dta
Do file in Week2Lecure1_DoFile.pdf

We have provided a dataset, Week2Lecure1.dta, from the 15th wave (corresponding to year 2005) of the British Household Panel Survey (BHPS); see the section "How to create dataset Week2Lecure1.dta" for how it was built.

Before doing any analysis and estimation using survey data, ask yourself these questions:

What is the population of interest, i.e., the population to which you want the results of your analysis of the survey sample to generalize?
What is the survey design? Specifically, is it a clustered, stratified sample? Are there unequal selection probabilities?
Is there nonresponse (the answer is almost always yes!)?
Are weights provided which account for unequal selection probabilities and/or non-response?

The BHPS

The original BHPS is a clustered, stratified sample but with an almost equal probability sampling design. It was designed to be representative of Great Britain in 1990 south of the Caledonian Canal. So the original sample included households in England, Wales and Scotland, and they all had (almost) equal selection probabilities. In later waves over-samples, or boosts, from Wales, Scotland and Northern Ireland were added to the original sample, i.e., the proportion of households from Wales, Scotland and Northern Ireland in the BHPS sample is much higher than in the UK population. In other words, sample units from the four countries had unequal selection probabilities. While the Scottish and Welsh boosts had a similar clustered, stratified design to the original sample, the Northern Ireland boost was a simple random sample.

Variables are provided which identify the primary sampling unit and stratum from which each sample household was drawn. Weights are provided which account for unequal selection probability and non-response. In this example we will only look at cross-sectional respondent weights, i.e., weights for respondents that account for unequal selection probability, non-response (at the household and individual level) and post-stratification. In other words, weighted estimates using the cross-sectional respondent weight for wave 15 will provide unbiased estimates of the corresponding parameter for the UK population in 2005.

The BHPS is conducted primarily by face-to-face interviews. Some respondents refuse a face-to-face interview but opt for a telephone interview or a proxy interview (someone from their household answers on their behalf). Note that the respondent cross-sectional weight is zero for proxy and telephone respondents and missing for those who have died or are out of scope.

Inspecting the data

Open Week2Lecure1.dta. First inspect the different variables; in particular, find out which variables represent wages (wage), response (ivfio), weights (xrwtuk1), stratification (strata) and clustering (psu).

Let's see what these variables look like:
What are the variable names, value labels, their mean, standard deviation and frequency distribution?
Are there any missing values?
What are the weights for proxy and telephone respondents?

Examine weights and the sample identifier, memorig

Look at the distribution of the cross-sectional respondent weight and see how it varies by sample.

    tabstat xrwtuk1, stat(mean min max sd) by(memorig) longstub nototal

Notes:
tabstat displays summary statistics for a series of numeric variables (specified right after the command name) in one table, possibly broken down by a second variable.
stat() takes the list of statistics to be displayed for each of the variables specified.
by() takes the variable by which the table is to be broken down.
longstub specifies that the left stub of the table be made wider so that it can include names of the statistics or variables in addition to the categories of by().
nototal suppresses the overall statistics; use with by().

Why are the weights, on average, higher for the Original sample than for the country boosts?
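The inspection questions above can be answered with a handful of standard commands. The following is only a sketch of one way to do it, assuming Week2Lecure1.dta is in the working directory and contains the variables named in the text (wage, ivfio, xrwtuk1, strata, psu, memorig). misstable is available from Stata 11 onwards as far as we know; on older versions codebook gives similar information.

```stata
use Week2Lecure1.dta, clear

* Names, storage types and labels of the key variables
describe wage ivfio xrwtuk1 strata psu memorig

* Means, standard deviations and ranges
summarize wage xrwtuk1

* Frequency distributions, with missing values shown as their own category
tabulate ivfio, missing
tabulate memorig, missing

* Weights by interview outcome: according to the text, proxy and
* telephone respondents should have a weight of zero
tabstat xrwtuk1, stat(n mean min max) by(ivfio) longstub

* Number of missing values, variable by variable
misstable summarize wage xrwtuk1 strata psu
```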

Examine the variable of interest, wage

Why is this missing for some people? Hint: use ivfio and employed.

Do weights matter: are weighted estimates likely to be different from unweighted estimates?

Yes, if the weights vary across observations. This variation in weights became more pronounced with the introduction of the extension samples (Wales, etc.), because selection probabilities differ widely between samples.

Stata allows four types of weights: pweight, aweight, fweight and iweight. pweight and aweight are the ones that we will be using. See the Stata Manual for more explanation.

pweight denotes probability or sampling weights, i.e., the inverse of the probability that the observation is included in the sample. As the BHPS weights are probability weights, the Stata weight type that we should ALWAYS use is pweight. However, Stata does not allow pweight with certain commands such as summarize; it only allows aweight (http://www.stata.com/support/faqs/stat/supweight.html). The estimated mean and standard deviation using pweight and aweight are the same, but the standard error (and confidence interval) are not. If Stata will not allow pweight and you have to use aweight, be careful about its interpretation.

aweight denotes analytical weights, which are inversely proportional to the variance of the observation. An example of analytical weights: if the observations are averages, then the number of observations used to compute each average would be an analytical weight. (See [U] 20.18 Weighted estimation in the Stata Manual.)

To compute the mean and standard deviation use summarize, table or tabstat:

    summ wage

To compute the mean, standard error and confidence intervals use ci:

    ci wage

Compute the unweighted mean, standard error and confidence interval for wage.
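The claim that aweight and pweight give the same mean but different standard errors can be seen directly with the mean command, which, unlike summarize, accepts both weight types. This is a sketch rather than part of the course do file. It assumes the dataset and variable names used in the text.

```stata
use Week2Lecure1.dta, clear

* Unweighted mean, standard error and confidence interval
mean wage

* Weights treated as analytical weights
mean wage [aweight = xrwtuk1]

* Weights treated as probability (sampling) weights:
* same point estimate as above, but a different standard error
mean wage [pweight = xrwtuk1]
```

Comparing the last two outputs side by side shows how much the standard error depends on the way the weights are interpreted, which is why the weight type matters even when the point estimate does not change.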
To compute the weighted mean, standard error, confidence interval and standard deviation for wage, but without correcting for clustering and stratification, there are two options.

First, you could use summarize and ci with the option for weights. For these commands Stata only allows aweight, which means the weights will be treated as analytical weights. This produces the same weighted mean as pweight (which treats the weights as probability weights) but different estimates of the standard errors. As the BHPS weights are not analytical weights but probability weights, this is not the best choice (see the note above and Stata Help).

    summ wage [aweight = xrwtuk1]
    ci wage [aweight = xrwtuk1]

Compute the weighted (but without correcting for clustering and stratification) mean, standard error, confidence interval and standard deviation for wage using summarize and ci.

Stata does not allow pweight with summarize and ci; if you do use it, Stata will give an error message and the program will stop running.

[Optional box]
To see what happens if you use pweight instead, type

    summ wage [pweight = xrwtuk1]

To see what happens if you use pweight instead, but to prevent Stata from stopping, type

    capture noisily summ wage [pweight = xrwtuk1]

capture tells Stata not to show the error message and to continue running the program in spite of the error.
noisily tells Stata to show the output.
Together, capture noisily asks Stata to show the error message but to continue running the program.
[End of optional box]

The second option is to use Stata's svy suite of commands. Here you tell Stata what the survey design is, and Stata then computes the estimates taking the survey design into account. The other advantage of this option is that Stata treats the weights as probability weights. As the BHPS weights are probability weights (as are the weights in almost all such micro-panel surveys), this will produce the correct estimates of standard errors.

To do this you first need to inform Stata about the survey design. For this part of the exercise, we will ignore the clustering and stratification aspects of the survey and just focus on the weights.

    svyset [pweight = xrwtuk1]

Then, to compute the weighted mean, standard error and confidence interval:

    svy: mean wage

If you want to produce an estimate of the standard deviation:

    estat sd

svyset tells Stata that the dataset is complex survey data. The different features of the survey design are given by the weight (pweight), strata and psu settings. Not all of these need to be specified. Once we have told Stata what the survey design is, any command typed in the format svy: command is carried out taking the structure of the dataset into account.
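Put together, the weights-only workflow might look like the sketch below. It assumes the dataset and variable names from the text. The final two lines illustrate capture noisily: after a captured command, Stata stores its return code in _rc (zero means no error), which is what lets a do file carry on.

```stata
use Week2Lecure1.dta, clear

* Declare the weights only; clustering and stratification are ignored for now
svyset [pweight = xrwtuk1]

* Typing svyset on its own displays the current survey settings
svyset

* Weighted mean, standard error and confidence interval
svy: mean wage

* Estimated standard deviation of wage
estat sd

* pweight is not allowed with summarize: show the error but keep going
capture noisily summarize wage [pweight = xrwtuk1]
display "Return code from the captured command: " _rc
```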

estat displays scalar- and matrix-valued statistics after estimation; it complements predict, which calculates variables after estimation. Exactly which statistics estat can calculate depends on the previous estimation command.

Compute the weighted (but without correcting for clustering and stratification) mean, standard error, confidence interval and standard deviation for wage, treating the BHPS weights as probability weights.

What will happen if you use aweight instead of pweight in the above command?

Next we want to take the complete survey design into account, i.e., we want to compute estimates of the mean, standard error and confidence interval of wage that correct for clustering and stratification in addition to unequal selection probability and non-response. To do this we again use the svy suite of commands, but this time we specify the strata and psu variables in addition to the weight variable.

First clear Stata's memory of any previous svy settings:

    svyset, clear

Next inform Stata about the survey design variables:

    svyset [pweight = xrwtuk1], psu(psu) strata(strata)

Then compute the weighted mean etc. as before:

    svy: mean wage

Compute the weighted mean, standard error, confidence interval and standard deviation for wage after correcting for clustering and stratification and treating the BHPS weights as probability weights.

This returns the mean wage, but does not return a standard error or confidence interval. Find out why. Hint: use svydes, which describes the structure of the survey data.

    svydes

You will find that there is a stratum (-8) with just 1 unit (psu) within it. Which sample is that?

    tab memorig if strata==-8

NB: The values of psu and strata for all cases in Northern Ireland are "-8" because the NI sample is a simple random sample, i.e., there is no clustering or stratification. Stata cannot compute standard errors when a stratum contains only one PSU, which is what happens here because part of the sample has a different sampling design. So exclude the Northern Ireland sample from the analysis:

    svy: mean wage if memorig ~= 7
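The full-design steps can be strung together as in the sketch below, again assuming the dataset and variable names from the text. The svyset line is written in the form shown in current Stata manuals, with the PSU variable placed before the weight; to our understanding it declares the same design as the psu() option form used above, but treat that equivalence as an assumption if you are on an older release.

```stata
use Week2Lecure1.dta, clear

* Remove any earlier survey settings, then declare the full design
svyset, clear
svyset psu [pweight = xrwtuk1], strata(strata)

* Describe strata and PSUs; a stratum with a single PSU is flagged here
svydescribe

* The point estimate is shown, but the standard error is missing
svy: mean wage

* Identify which sample the single-PSU stratum belongs to
tabulate memorig if strata == -8

* Exclude Northern Ireland (memorig == 7) so standard errors can be computed
svy: mean wage if memorig != 7
```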

Computing estimates of mean wages in the different countries of UK

Next, we would like to compare the average hourly wage in the four countries of the UK. For that we need to compute the weighted mean wage for each country and test the difference.

Look at the distribution of the cross-sectional respondent weights and see how it varies across the four countries.

There are two ways to compute estimates for sub-samples: use either the subpop or the over option. Using an if statement to estimate weighted means of subpopulations will result in incorrect standard errors (see Stata Survey Data Reference Manual Release 11, p. 53).

To use the subpop option:

    svy: mean var, subpop(varname)

This asks Stata to compute estimates for the ONE single subpopulation identified by varname. The subpopulation is defined by the observations for which varname != 0. Typically, varname = 1 defines the subpopulation and varname = 0 indicates observations not belonging to the subpopulation. For observations whose subpopulation status is uncertain, varname should be set to a missing value; such observations are dropped from the estimation sample. Alternatively, an if condition can be used with varname:

    svy: mean var, subpop(if varname == x)

To use the over option:

    svy: mean var, over(varname)

This asks Stata to compute the estimates for ALL values of the categorical variable varname. You can use more than one variable in over(), separated by spaces.

Now you have the tools to compute estimates of mean wage in each country. To avoid complications caused by Northern Ireland and by missing region/country values, remember to eliminate those cases from the sample:

    drop if memorig == 7
    drop if country==.

Compute the unweighted mean wage for each country.
Estimate mean wage for each country separately using the if statement. What happens?
Estimate mean wage for each country by using the subpop option.
Estimate mean wage for the four countries by using the over option.
Are the estimates obtained by the two methods (subpop and over) exactly the same?

subpop and over can be used to compute estimates for multiple subpopulations. Here are a couple of tasks to illustrate that:

Estimate mean wage for men and women in the four countries by using the over option.
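A sketch of the subpop and over tasks follows. Two things in it are assumptions, not taken from the dataset documentation: that England is coded country == 1, and that the sex variable is called sex. Check both with tabulate and describe before relying on the output. The subpop() option is written here as an option of the svy prefix, which is the form we are sure is accepted.

```stata
use Week2Lecure1.dta, clear
drop if memorig == 7
drop if country == .
svyset psu [pweight = xrwtuk1], strata(strata)

* One subpopulation via a 0/1 indicator.
* ASSUMPTION: country == 1 is England; england is a made-up variable name
generate byte england = (country == 1)
svy, subpop(england): mean wage

* The same subpopulation defined with an if condition inside subpop()
svy, subpop(if country == 1): mean wage

* All countries at once
svy: mean wage, over(country)

* Men and women within each country.
* ASSUMPTION: the sex variable is named sex
svy: mean wage, over(country sex)
```

Unlike an if qualifier, subpop() keeps every observation in the variance calculation, which is why the standard errors from subpop and over should agree with each other but can differ from those obtained with if.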

Estimate mean wage for men and women in the four countries by using the subpop option.

You can also test the differences in these estimated means across the countries. Note that the test command should follow immediately after the estimation.

    svy: mean wage, over(country)
    test [wage]england = [wage]scotland = [wage]wales

Test if men earn higher wages than women in England, Scotland and Wales.

Design effect (deff): the ratio of the variance of a statistic under the actual sample design to the variance of the same statistic had the sample been a simple random sample (SRS) of the same size. In other words, it indicates by how much the variance is inflated because of the sampling design. deft is the square root of deff.

Compute design effects, design factor and effective sample size:

    quietly svy: mean wage
    estat effects, deff deft

[Optional] Plot the weighted mean and the confidence interval.

ciplot shows means and confidence intervals. Means are shown by point symbols and intervals by capped bars. ci is used for the calculations. If ciplot is not already installed, use findit ciplot to locate it and then install it.

    ciplot paygu, by(country) saving(graph1, replace)
    ciplot paygu [aw=xrwtuk1], by(country) saving(graph2, replace)
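Two practical details may help with the tasks above; both are offered as a sketch under stated assumptions. First, the coefficient names that test expects differ between Stata releases, so it is safer to list the coefficient vector e(b) and copy the names from there. Second, the effective sample size is the number of observations divided by deff; the code assumes that estat effects leaves the design effects in the matrix r(deff), which is our reading of the manual.

```stata
use Week2Lecure1.dta, clear
drop if memorig == 7
drop if country == .
svyset psu [pweight = xrwtuk1], strata(strata)

* Country means; list the coefficient vector to see the exact
* names to use in the test command on your Stata release
svy: mean wage, over(country)
matrix list e(b)

* Design effect (deff) and design factor (deft) for the overall mean
quietly svy: mean wage
estat effects, deff deft

* Effective sample size = number of observations / deff
* ASSUMPTION: estat effects stores the design effects in r(deff)
matrix D = r(deff)
display "Effective sample size = " e(N) / D[1,1]
```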

[Optional] To estimate mean wage in all four countries of UK, i.e., including Northern Ireland

We asked you to drop Northern Ireland from the dataset because it has a different sample design, and Stata cannot handle data with a mixed sample design in which one part is clustered and stratified and the other is a simple random sample. It has been suggested that Stata can be tricked into treating the SRS part of the sample as a clustered, stratified sample simply by using the unique household identifier (whid) as the psu variable (wpsu). In other words, each such psu has just one household.

Remember that the current dataset does not include the Northern Ireland sub-sample, as we dropped it earlier. So open Week2Lecure1.dta again and this time, instead of dropping the Northern Ireland sub-sample, replace the value of wpsu for the Northern Ireland sub-sample with whid. Then follow the earlier steps to estimate mean wage in all four countries.

[Optional] How to create dataset Week2Lecure1.dta

We have provided these datasets, but if you want to create them yourself, here is a guide. See Week2_dataprep_DoFile.pdf, which contains the corresponding do file.

1. Get information on strata and primary sampling unit from mhhsamp.dta (you can use any wave from wave 12/wave L onwards).

2. Get information about the household that was asked in the household questionnaire from mhhresp.dta: number of children of different ages in the household (use the same wave that you used for step 1).

3. Get information about the individual that was asked in the individual questionnaire from mindresp.dta: individual-level respondent weight, interview outcome, education, marital status, number of own children in the household, wages, work hours, employment status, health problem, region of residence (use the same wave that you used for step 1).

4. Get some fixed information from xwavedat.dta: race, sex, sample origin and date of birth.

In addition to these variables, always remember to include the appropriate unique identifiers in each of the datasets: pid, hid and pno. As a general rule we have dropped the wave prefix in each of these files, because this makes it easier to use the same program for data from a different wave. This is not necessary, just convenient.

5. Merge all these datasets sequentially. Finally, keep only those observations present in all datasets.

Points to remember about merging:
Datasets being merged should be sorted on the variable or variables used to merge them.
Check _merge to see how many cases were available in both datasets and how many in only one.
_merge is created by Stata at every merge, so if you don't drop _merge or rename it after each merge, Stata will produce an error message saying _merge already exists and will not allow you to perform another merge until you have dropped or renamed it.

6. Create the following variables:

(i) Usual hourly wage rate (does NOT include overtime pay) using PAYGU (usual gross pay per month: current job) and JBHRS (hours expected to work in a normal week). Drop cases that have negative values for PAYGU or JBHRS. Also drop cases for which JBHRS=0, which means no expected working hours.

(ii) A 0-1 dummy variable that takes the value 1 if currently employed, using JBHAS (did paid work last week) and JBOFF (no work last week but has job).

(iii) A variable for the number of young children (defined as being < 5 years old) in the household (we are not differentiating between others' and own children) using NCH02 and NCH34. There were some discrepancies in the data: some households had more young children than NCHILD, the number of own children in the household. So restrict the number of young children to the number of own children in the household.

(iv) 0-1 dummies for country of residence using REGION. Also create a categorical variable to represent the country of residence.

(v) A 0-1 dummy variable that takes the value 1 if the person's ethnicity is white and 0 otherwise.

7. Sample restriction for Week2Lecure1.dta

(i) Restrict the sample to those who are not self-employed.
(ii) Drop those for whom EMPLOYED is missing.
(iii) Drop cases where wage is missing even though the person is employed and was interviewed face-to-face.
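The merging and wage-construction steps could be sketched as follows. This is not the course do file (see Week2_dataprep_DoFile.pdf for that) and rests on several assumptions: the wave prefix has already been removed from the variable names in every file, the file names are those given in steps 1 to 4, and monthly pay is converted to an hourly rate using 52/12 weeks per month, a conversion the text does not specify. The merge 1:1 and m:1 syntax needs Stata 11 or later and does not require the data to be sorted first. For simplicity the sketch leaves wage missing for invalid cases instead of dropping them.

```stata
* Start from the individual file and add the fixed cross-wave information
use mindresp.dta, clear
merge 1:1 pid using xwavedat.dta
tabulate _merge              // how many cases matched, how many did not
keep if _merge == 3          // keep cases present in both files
drop _merge                  // must be dropped before the next merge

* Add household-level information (many individuals per household)
merge m:1 hid using mhhresp.dta
tabulate _merge
keep if _merge == 3
drop _merge

* Add strata and primary sampling unit
merge m:1 hid using mhhsamp.dta
tabulate _merge
keep if _merge == 3
drop _merge

* Usual hourly wage, computed only for valid pay and positive hours.
* ASSUMPTION: 52/12 weeks per month; negative values are missing-data codes
generate wage = paygu / (jbhrs * 52 / 12) ///
    if paygu >= 0 & jbhrs > 0 & !missing(paygu, jbhrs)
summarize wage
```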