Session 1B: Exercises on Simple Random Sampling
Please join Channel 41
National Council for Applied Economic Research
Sistemas Integrales
Delhi, March 18, 2013

We will now address some issues about Simple Random Sampling
You will find the answers on your computer, using the simulator Juan showed in the previous session
Open the Excel workbook Lesson1, enable macros and wait for the questions
If you don't have a computer, sit next to a colleague and observe attentively

Warming up
Let us reproduce Juan's experiment
Open the Lesson1 workbook with the following initial parameters:
Population size: 1,000 electors (N=1,000)
Prevalence of Green: 52% (P=52%)
Sample size: 100 electors (n=100)
What is the standard error?
(Remember that the standard error is the limit value of the root mean square error)

N=1,000 n=100 P=52%
What is the standard error?
1. About 47%
2. About 4.7%
3. About 0.47%
4. About 0.047%
5. None of the above

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)
1,000                 100               52%              4.74%

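The warm-up can also be reproduced outside Excel. The sketch below is not Juan's simulator; it is a minimal Python stand-in, assuming the simulator draws repeated simple random samples without replacement and tracks the root mean square error of the estimated prevalence. The function name `simulate_rmse`, the number of trials and the seed are illustrative choices.

```python
import math
import random

def simulate_rmse(N=1000, n=100, P=0.52, trials=5000, seed=1):
    """Root mean square error of the sample prevalence over repeated SRS draws."""
    rng = random.Random(seed)
    greens = round(N * P)
    # 1 = Green elector, 0 = any other elector
    population = [1] * greens + [0] * (N - greens)
    sum_squared_errors = 0.0
    for _ in range(trials):
        sample = rng.sample(population, n)  # simple random sample, without replacement
        estimate = sum(sample) / n
        sum_squared_errors += (estimate - P) ** 2
    return math.sqrt(sum_squared_errors / trials)

rmse = simulate_rmse()
print(f"RMSE after 5,000 samples: {rmse:.2%}")
```

With a few thousand trials the RMSE should settle close to the standard error in the table; more trials bring it closer.
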
Effect of the population size
Remember that in Juan's example N=1,000, n=100, P=52%, e=4.74%
Now suppose that the town is twice the size of Juan's town, but our budget does not permit a bigger sample
N=2,000, n=100, P=52%: what would be the standard error?
Then suppose that the town is a lot bigger than Juan's, but we still cannot afford a bigger sample
N=5,000, n=100, P=52%: what would be the standard error?
Then suppose that the town is one half the size of Juan's
N=500, n=100, P=52%: what would be the standard error?

If the town is twice the size of Juan's town
N=2,000 n=100 P=52%
What is the standard error?
1. About twice as much as in Juan's town (4.74% x 2 = 9.48%)
2. A little more than in Juan's town
3. The same as in Juan's town (4.74%)

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)
1,000                 100               52%              4.74%
2,000                 100               52%              4.87%

If the town is a lot bigger than Juan's town
N=5,000 n=100 P=52%
What is the standard error?
1. About five times as much as in Juan's town (4.74% x 5 = 23.7%)
2. A little more than in Juan's town
3. The same as in Juan's town (4.74%)

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)
1,000                 100               52%              4.74%
2,000                 100               52%              4.87%
5,000                 100               52%              4.95%

If the town is one half the size of Juan's town
N=500 n=100 P=52%
What is the standard error?
1. About one half as much as in Juan's town (4.74% / 2 = 2.37%)
2. A little less than in Juan's town
3. The same as in Juan's town (4.74%)

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)
1,000                 100               52%              4.74%
2,000                 100               52%              4.87%
5,000                 100               52%              4.95%
500                   100               52%              4.47%

Effect of the population size
Conclusion: The size of the population has very little influence on the precision of a sample of a given size
Explore at home the case of much smaller towns

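For the "explore at home" exercise, here is a small Python sketch that sweeps the population size while holding n=100 and P=52%. It uses the standard error formula for simple random sampling without replacement, e = sqrt((1 - n/N) P(1 - P)/n), which this deck states on a later slide; the function name `standard_error` and the extra town sizes (150, 200, one million) are illustrative assumptions, not values from the session.

```python
import math

def standard_error(N, n, P):
    """Standard error of an estimated proportion under SRS without replacement."""
    return math.sqrt((1 - n / N) * P * (1 - P) / n)

# Same sample (n=100) and prevalence (52%), towns of very different sizes
for N in (150, 200, 500, 1_000, 2_000, 5_000, 1_000_000):
    e = standard_error(N, 100, 0.52)
    print(f"N={N:>9,}  n=100  P=52%  e={e:.2%}")
```

The error only drops noticeably when the sample is a large share of the town; for large towns it levels off just below 5%.
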
Effect of the sample size
Remember that in Juan's example N=1,000, n=100, P=52%, e=4.74%
Now suppose we could double the sample size
N=1,000, n=200, P=52%: what would be the standard error?
Then suppose that we had to reduce the sample size to one half
N=1,000, n=50, P=52%: what would be the standard error?

If the sample is twice the size of Juan's
N=1,000 n=200 P=52%
What is the standard error?
1. About the same as with Juan's sample (4.74%)
2. About half as much as with Juan's sample (4.74% / 2 = 2.37%)
3. Less than with Juan's sample but more than one half
4. More than with Juan's sample

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)
1,000                 100               52%              4.74%
2,000                 100               52%              4.87%
5,000                 100               52%              4.95%
500                   100               52%              4.47%
1,000                 200               52%              3.16%

If the sample is one half the size of Juan's
N=1,000 n=50 P=52%
What is the standard error?
1. About the same as with Juan's sample (4.74%)
2. About twice as much as with Juan's sample (4.74% x 2 = 9.48%)
3. More than with Juan's sample but less than twice
4. Less than with Juan's sample

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)
1,000                 100               52%              4.74%
2,000                 100               52%              4.87%
5,000                 100               52%              4.95%
500                   100               52%              4.47%
1,000                 200               52%              3.16%
1,000                 50                52%              6.89%

Effect of the sample size
Conclusion: The error is reduced when the sample size is increased, but it is not inversely proportional to the sample size
It is inversely proportional to the square root of the sample size (exactly so for a very large population)

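The square-root rule can be checked numerically. The sketch below compares the infinite-population error, sqrt(P(1 - P)/n), with the error for a town of N=1,000; the function names are illustrative, and the sample sizes 400 is an added case beyond those asked in the session.

```python
import math

def standard_error(N, n, P):
    """SRS without replacement, population of size N."""
    return math.sqrt((1 - n / N) * P * (1 - P) / n)

def standard_error_infinite(n, P):
    """Same formula with the (1 - n/N) factor dropped, i.e. N very large."""
    return math.sqrt(P * (1 - P) / n)

base = standard_error_infinite(100, 0.52)
for n in (50, 100, 200, 400):
    e_inf = standard_error_infinite(n, 0.52)
    e_fin = standard_error(1_000, n, 0.52)
    print(f"n={n:>3}  infinite: {e_inf:.2%} (x{e_inf / base:.2f})  N=1,000: {e_fin:.2%}")
```

Quadrupling the sample halves the infinite-population error. In a town of 1,000 the reduction is somewhat stronger, because a larger sample also covers a larger share of the population.
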
Effect of the prevalence
Remember that in Juan's example N=1,000, n=100, P=52%, e=4.74%
Let us find the standard error for other prevalences
N=1,000, n=100, P=40%
N=1,000, n=100, P=25%
N=1,000, n=100, P=75%
N=1,000, n=100, P=10%
N=1,000, n=100, P=1%

N=1,000, n=100, P=40%
What is the standard error?
1. The same as in Juan's example (4.74%)
2. A little less than in Juan's example (more than 4%)
3. A lot less than in Juan's example (less than 4%)
4. More than in Juan's example

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)
1,000                 100               52%              4.74%
2,000                 100               52%              4.87%
5,000                 100               52%              4.95%
500                   100               52%              4.47%
1,000                 200               52%              3.16%
1,000                 50                52%              6.89%
1,000                 100               40%              4.65%

N=1,000, n=100, P=25%
What is the standard error?
1. The same as in Juan's example (4.74%)
2. A little less than in Juan's example (more than 4%)
3. A lot less than in Juan's example (less than 4%)
4. More than in Juan's example

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)
1,000                 100               52%              4.74%
2,000                 100               52%              4.87%
5,000                 100               52%              4.95%
500                   100               52%              4.47%
1,000                 200               52%              3.16%
1,000                 50                52%              6.89%
1,000                 100               40%              4.65%
1,000                 100               25%              4.11%

N=1,000, n=100, P=75%
What is the standard error?
1. The same as in Juan's example (4.74%)
2. A little more than in Juan's example (less than 5%)
3. A lot more than in Juan's example (more than 5%)
4. Less than in Juan's example

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)
1,000                 100               52%              4.74%
2,000                 100               52%              4.87%
5,000                 100               52%              4.95%
500                   100               52%              4.47%
1,000                 200               52%              3.16%
1,000                 50                52%              6.89%
1,000                 100               40%              4.65%
1,000                 100               25%              4.11%
1,000                 100               75%              4.11%
$$e = \sqrt{\left(1 - \frac{n}{N}\right) \frac{P(1-P)}{n}}$$

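As a check on the table, the formula $e = \sqrt{(1 - n/N)\,P(1-P)/n}$ can be evaluated by hand. The first line below plugs in Juan's parameters (N=1,000, n=100, P=0.52); the second shows why P=25% and P=75% give the same standard error. The intermediate values are plain arithmetic, not figures from the session.

```latex
e = \sqrt{\left(1 - \frac{100}{1000}\right) \frac{0.52 \times 0.48}{100}}
  = \sqrt{0.9 \times 0.002496}
  = \sqrt{0.0022464}
  \approx 0.0474

0.25 \times (1 - 0.25) = 0.75 \times (1 - 0.75) = 0.1875
\;\Longrightarrow\;
e = \sqrt{0.9 \times 0.001875} \approx 0.0411
```
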
N=1,000, n=100, P=10%
What is the standard error?
1. The same as in Juan's example (4.74%)
2. A little less than in Juan's example (more than 3%)
3. A lot less than in Juan's example (less than 3%)
4. More than in Juan's example

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)
1,000                 100               52%              4.74%
2,000                 100               52%              4.87%
5,000                 100               52%              4.95%
500                   100               52%              4.47%
1,000                 200               52%              3.16%
1,000                 50                52%              6.89%
1,000                 100               40%              4.65%
1,000                 100               25%              4.11%
1,000                 100               75%              4.11%
1,000                 100               10%              2.85%

N=1,000, n=100, P=1%
What is the standard error?
1. The same as in Juan's example (4.74%)
2. Less than in Juan's example but more than 2%
3. Less than 2%
4. More than in Juan's example

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)
1,000                 100               52%              4.74%
2,000                 100               52%              4.87%
5,000                 100               52%              4.95%
500                   100               52%              4.47%
1,000                 200               52%              3.16%
1,000                 50                52%              6.89%
1,000                 100               40%              4.65%
1,000                 100               25%              4.11%
1,000                 100               75%              4.11%
1,000                 100               10%              2.85%
1,000                 100               1%               0.94%

Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)   Relative error (e/P)
1,000                 100               52%              4.74%                9%
2,000                 100               52%              4.87%                9%
5,000                 100               52%              4.95%                10%
500                   100               52%              4.47%                9%
1,000                 200               52%              3.16%                6%
1,000                 50                52%              6.89%                13%
1,000                 100               40%              4.65%                12%
1,000                 100               25%              4.11%                16%
1,000                 100               75%              4.11%                5%
1,000                 100               10%              2.85%                28%
1,000                 100               1%               0.94%                94%
$$e = \sqrt{\left(1 - \frac{n}{N}\right) \frac{P(1-P)}{n}}$$

Effect of the prevalence
[Figure: curve of $\sqrt{P(1-P)}$ (vertical axis, 0.00 to 0.50) against P (horizontal axis, 0.00 to 1.00)]
Error is maximum when P=0.5
The maximum is flat: error does not change much between P=0.2 and P=0.8
When P goes down, absolute error goes down too, but relative error grows

Effect of the prevalence
Conclusions:
Error is maximum when the prevalence is 50%
The maximum is flat: if the prevalence is neither too small nor too large, the standard error is close to the maximum
If the prevalence is very low,
  the standard error goes down,
  but the relative standard error goes up
This is a problem and a limitation of Simple Random Sampling for the study of rare events (disability, unemployment, ...)
Juan will tell us how to deal with this in the next session

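The rare-event problem is easy to see numerically. A minimal Python sketch, assuming the standard error formula e = sqrt((1 - n/N) P(1 - P)/n) and defining relative error as e/P; the function names are illustrative:

```python
import math

def standard_error(N, n, P):
    """Standard error of an estimated proportion (SRS without replacement)."""
    return math.sqrt((1 - n / N) * P * (1 - P) / n)

def relative_error(N, n, P):
    """Standard error expressed as a fraction of the prevalence itself."""
    return standard_error(N, n, P) / P

# Fixed town (N=1,000) and sample (n=100), falling prevalence
for P in (0.52, 0.25, 0.10, 0.01):
    e = standard_error(1_000, 100, P)
    print(f"P={P:>4.0%}  e={e:.2%}  e/P={relative_error(1_000, 100, P):.1%}")
```

As P falls, the first column of errors shrinks while the second grows: at P=1% the standard error is almost as large as the quantity being estimated.
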
Summary and conclusions
Population size (N)   Sample size (n)   Prevalence (P)   Standard error (e)   Relative error (e/P)
1,000                 100               52%              4.74%                9.11%
2,000                 100               52%              4.87%                9.36%
5,000                 100               52%              4.95%                9.51%
500                   100               52%              4.47%                8.59%
1,000                 200               52%              3.16%                6.08%
1,000                 50                52%              6.89%                13.24%
1,000                 100               40%              4.65%                11.62%
1,000                 100               25%              4.11%                16.43%
1,000                 100               75%              4.11%                5.48%
1,000                 100               10%              2.85%                28.46%
1,000                 100               1%               0.94%                94.39%
Population size doesn't matter much
Sample size matters, but can be expensive
Prevalence only matters when it is very small
Error is maximum when P=50%

Switching to organic cotton farming
The ACFAP (Association of Cotton Farmers of Andhra Pradesh) wants to know what percentage of its members would be willing to switch to organic farming
The association has a database with the names and phone numbers of its members
A telephone survey is proposed, but calling all members would be too costly and time-consuming
How could we help?
How big a sample do we need?

What do we need to solve this problem?
1. A sampling strategy
2. A sample frame
3. A margin of error and a confidence level
4. The prevalence
5. The total number of cotton farmers in the association

What do we need? A checklist:
A sampling strategy and sample frame
  Simple random sampling from ACFAP's database
Margin of error and confidence level
  For instance, 5 percentage points at the 95% confidence level
A guess at what the prevalence might be
  If we have no clue, we put ourselves in the worst-case scenario: 50 percent
Total number of ACFAP members
  It is 10,356, but we don't really need this. We can also put ourselves in the worst-case scenario: an infinite population

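Of course, we also need some formulas, but remember that, in sampling, insights are much more important than formulas
Standard error:
$$e = \sqrt{\left(1 - \frac{n}{N}\right) \frac{P(1-P)}{n}}$$
For an infinite population ($N = \infty$):
$$e = \sqrt{\frac{P(1-P)}{n}} \quad\Longrightarrow\quad n = \frac{P(1-P)}{e^2}$$
For a maximum error $E$ at a given confidence level $\alpha$:
$$n = \frac{t_\alpha^2 \, P(1-P)}{E^2}$$
For a population size $N$:
$$n_N = \frac{n}{1 + n/N}$$
With $t_{95\%} = 1.96$, $t_{99\%} = 2.58$, etc.
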
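The finite-population formula is not an extra assumption; it follows from the general standard error. A short derivation, taking $E = t_\alpha e$ as the maximum error, $n$ as the sample size for an infinite population and $n_N$ as the sample size for a population of size $N$:

```latex
E^2 = t_\alpha^2 \left(1 - \frac{n_N}{N}\right) \frac{P(1-P)}{n_N}
\;\Longrightarrow\;
\frac{1}{n_N} - \frac{1}{N} = \frac{E^2}{t_\alpha^2 \, P(1-P)} = \frac{1}{n}
\;\Longrightarrow\;
n_N = \frac{1}{\frac{1}{n} + \frac{1}{N}} = \frac{n}{1 + n/N}
```

Since $1 + n/N > 1$, the corrected size $n_N$ is always smaller than $n$, which is why the infinite population is the worst case.
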
For an infinite population
$$n = \frac{t_\alpha^2 \, P(1-P)}{E^2}$$
Confidence level $\alpha$ = 95%: $t_\alpha = 1.96$
Prevalence P = 50 percent: P = 0.5
Maximum error E = 5 percent: E = 0.05
$$n = \frac{1.96^2 \times 0.5 \times (1 - 0.5)}{0.05^2} = \; ?$$

How many farmers do we need to call?
$n = 1.96^2 \times 0.5 \times (1 - 0.5) / 0.05^2 = \; ?$
1. 384 farmers
2. 3,842 farmers
3. 10,356 farmers
4. 100 farmers
5. None of the above

If we wanted to account for the actual number of ACFAP members
$$n_N = \frac{n}{1 + n/N}$$
n = 384, N = 10,356
$$n_N = \frac{384}{1 + 384/10{,}356} = \; ?$$

If we accounted for the actual number of ACFAP members, how many farmers do we need to call?
$n_N = 384 / (1 + 384 / 10{,}356) = \; ?$
1. 384 members
2. 370 members
3. 10,356 members
4. 399 members
5. None of the above

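Both ACFAP questions are arithmetic on the sample size formulas, n = t² P(1 - P)/E² and n_N = n/(1 + n/N). A minimal Python sketch; the function names are illustrative, and rounding to the nearest whole farmer follows the answer options on the slides (rounding up with `math.ceil` would be the more cautious choice in practice).

```python
def sample_size_infinite(t, P, E):
    """Sample size for maximum error E at the confidence level implied by t."""
    return t ** 2 * P * (1 - P) / E ** 2

def sample_size_finite(n_infinite, N):
    """Correct an infinite-population sample size for a population of size N."""
    return n_infinite / (1 + n_infinite / N)

# Worst-case prevalence (50%), 5 percentage points at the 95% confidence level
n_inf = sample_size_infinite(t=1.96, P=0.5, E=0.05)
# Same requirement, accounting for the 10,356 ACFAP members
n_fin = sample_size_finite(round(n_inf), N=10_356)

print(f"Infinite population: {n_inf:.2f} -> {round(n_inf)} farmers")
print(f"N = 10,356:          {n_fin:.2f} -> {round(n_fin)} farmers")
```

Knowing the true membership saves only a handful of calls, which matches the earlier conclusion that population size matters little.
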