Session 1B: Exercises on Simple Random Sampling


National Council for Applied Economic Research
Sistemas Integrales
Delhi, March 18, 2013
Please join Channel 41

We will now address some issues about Simple Random Sampling.
You will find the answers on your computer, using the simulator Juan showed in the previous session.
Open the Excel workbook Lesson1, enable macros, and wait for the questions.
If you don't have a computer, sit next to a colleague and observe attentively.

Warming up
Let us reproduce Juan's experiment. Open the Lesson1 workbook with the following initial parameters:
  Population size: 1,000 electors (N=1,000)
  Prevalence of Green: 52% (P=52%)
  Sample size: 100 electors (n=100)
What is the standard error?
(Remember that the standard error is the limit value of the root mean square error.)

N=1,000, n=100, P=52%
What is the standard error?
1. About 47%
2. About 4.7%
3. About 0.47%
4. About 0.047%
5. None of the above

Population size   Sample size   Prevalence   Standard error
N                 n             P            e
1,000             100           52%          4.74%
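The Excel simulator itself is not reproduced in this transcript, so the following is only a minimal sketch of the same idea, assuming a population coded as 1 (Green) and 0 (other) and a hypothetical helper named `simulate_rmse`. It draws many simple random samples without replacement and reports the root mean square error of the sample proportion, which should settle near the standard error as the number of samples grows.

```python
import math
import random

def simulate_rmse(N=1000, n=100, P=0.52, trials=5000, seed=1):
    """Root mean square error of the sample proportion over repeated samples.

    Hypothetical stand-in for the Excel simulator: names and defaults are
    assumptions, not taken from the Lesson1 workbook.
    """
    rng = random.Random(seed)
    greens = round(N * P)
    population = [1] * greens + [0] * (N - greens)  # 1 = elector votes Green
    squared_errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)  # simple random sample, no replacement
        estimate = sum(sample) / n          # sample prevalence
        squared_errors.append((estimate - P) ** 2)
    return math.sqrt(sum(squared_errors) / trials)

rmse = simulate_rmse()
print(f"RMSE over 5000 samples: {rmse:.2%}")
```

Increasing `trials` should make the printed value fluctuate less from one seed to another.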

Effect of the population size
Remember that in Juan's example N=1,000, n=100, P=52%, e=4.73%.
Now suppose that the town is twice the size of Juan's town, but our budget does not permit a bigger sample:
  N=2,000, n=100, P=52%. What would be the standard error?
Then suppose that the town is a lot bigger than Juan's, but we still cannot afford a bigger sample:
  N=5,000, n=100, P=52%. What would be the standard error?
Then suppose that the town is one half the size of Juan's:
  N=500, n=100, P=52%. What would be the standard error?

If the town is twice the size of Juan's town
N=2,000, n=100, P=52%
What is the standard error?
1. About twice as much as in Juan's town (4.73% x 2 = 9.46%)
2. A little more than in Juan's town
3. The same as in Juan's town (4.73%)

Population size   Sample size   Prevalence   Standard error
N                 n             P            e
1,000             100           52%          4.74%
2,000             100           52%          4.87%

If the town is a lot bigger than Juan's town
N=5,000, n=100, P=52%
What is the standard error?
1. About five times as much as in Juan's town (4.73% x 5 = 23.6%)
2. A little more than in Juan's town
3. The same as in Juan's town (4.73%)

Population size   Sample size   Prevalence   Standard error
N                 n             P            e
1,000             100           52%          4.74%
2,000             100           52%          4.87%
5,000             100           52%          4.96%

If the town is one half the size of Juan's town
N=500, n=100, P=52%
What is the standard error?
1. About one half as much as in Juan's town (4.73% / 2 = 2.36%)
2. A little less than in Juan's town
3. The same as in Juan's town (4.73%)

Population size   Sample size   Prevalence   Standard error
N                 n             P            e
1,000             100           52%          4.74%
2,000             100           52%          4.87%
5,000             100           52%          4.96%
500               100           52%          4.47%

Effect of the population size
Conclusion: The size of the population has very little influence on the precision of a sample of a given size.
Explore at home the case of much smaller towns.
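For exploring at home without the workbook, here is a minimal sketch based on the standard-error formula given later in the session, e = sqrt((1 - n/N) P(1 - P)/n). The function name `standard_error` and the list of town sizes are illustrative choices, not part of the original material.

```python
import math

def standard_error(N, n, P):
    """Standard error of a sample proportion, simple random sampling without replacement."""
    fpc = 1 - n / N  # finite population correction
    return math.sqrt(fpc * P * (1 - P) / n)

# Same sample size and prevalence, towns from very small to very large.
for N in (150, 200, 500, 1000, 2000, 5000, 1_000_000):
    print(f"N={N:>9,}  e={standard_error(N, 100, 0.52):.2%}")
```

The error moves noticeably only when the sample is a large share of the town; for large towns it levels off.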

Effect of the sample size
Remember that in Juan's example N=1,000, n=100, P=52%, e=4.73%.
Now suppose we could double the sample size:
  N=1,000, n=200, P=52%. What would be the standard error?
Then suppose that we had to reduce the sample size to one half:
  N=1,000, n=50, P=52%. What would be the standard error?

If the sample is twice the size of Juan's
N=1,000, n=200, P=52%
What is the standard error?
1. About the same as with Juan's sample (4.73%)
2. About half of that with Juan's sample (4.73% / 2 = 2.36%)
3. Less than with Juan's sample, but more than one half
4. More than with Juan's sample

Population size   Sample size   Prevalence   Standard error
N                 n             P            e
1,000             100           52%          4.74%
2,000             100           52%          4.87%
5,000             100           52%          4.96%
500               100           52%          4.47%
1,000             200           52%          3.16%

If the sample is one half the size of Juan's
N=1,000, n=50, P=52%
What is the standard error?
1. About the same as with Juan's sample (4.73%)
2. About twice that with Juan's sample (4.73% x 2 = 9.46%)
3. More than with Juan's sample, but less than twice
4. Less than with Juan's sample

Population size   Sample size   Prevalence   Standard error
N                 n             P            e
1,000             100           52%          4.74%
2,000             100           52%          4.87%
5,000             100           52%          4.96%
500               100           52%          4.47%
1,000             200           52%          3.16%
1,000             50            52%          6.89%

Effect of the sample size
Conclusion: The error is reduced when the sample size is increased, but it is not inversely proportional to the sample size.
It is roughly inversely proportional to the square root of the sample size (exactly so only when the sample is a negligible share of the population).
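As a rough check of this conclusion, the sketch below compares the actual change in the standard error with the pure inverse-square-root rule. It assumes the session's formula e = sqrt((1 - n/N) P(1 - P)/n); the function name `standard_error` is an illustrative choice.

```python
import math

def standard_error(N, n, P):
    """Standard error of a sample proportion, simple random sampling without replacement."""
    return math.sqrt((1 - n / N) * P * (1 - P) / n)

base = standard_error(1000, 100, 0.52)  # Juan's sample
for n in (50, 100, 200, 400):
    ratio = standard_error(1000, n, 0.52) / base
    rule = math.sqrt(100 / n)  # inverse-square-root rule, ignoring the population size
    print(f"n={n:>3}  e/e(100)={ratio:.3f}  sqrt(100/n)={rule:.3f}")
```

The two columns are close but not identical, because in a town of 1,000 the term (1 - n/N) also shrinks as the sample grows.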

Effect of the prevalence
Remember that in Juan's example N=1,000, n=100, P=52%, e=4.73%.
Let us find the standard error for other prevalences:
  N=1,000, n=100, P=40%
  N=1,000, n=100, P=25%
  N=1,000, n=100, P=75%
  N=1,000, n=100, P=10%
  N=1,000, n=100, P=1%

N=1,000, n=100, P=40%
What is the standard error?
1. The same as in Juan's example (4.73%)
2. A little less than in Juan's example (more than 4%)
3. A lot less than in Juan's example (less than 4%)
4. More than in Juan's example

Population size   Sample size   Prevalence   Standard error
N                 n             P            e
1,000             100           52%          4.74%
2,000             100           52%          4.87%
5,000             100           52%          4.96%
500               100           52%          4.47%
1,000             200           52%          3.16%
1,000             50            52%          6.89%
1,000             100           40%          4.65%

N=1,000, n=100, P=25%
What is the standard error?
1. The same as in Juan's example (4.73%)
2. A little less than in Juan's example (more than 4%)
3. A lot less than in Juan's example (less than 4%)
4. More than in Juan's example

Population size   Sample size   Prevalence   Standard error
N                 n             P            e
1,000             100           52%          4.74%
2,000             100           52%          4.87%
5,000             100           52%          4.96%
500               100           52%          4.47%
1,000             200           52%          3.16%
1,000             50            52%          6.89%
1,000             100           40%          4.65%
1,000             100           25%          4.11%

N=1,000, n=100, P=75%
What is the standard error?
1. The same as in Juan's example (4.73%)
2. A little more than in Juan's example (less than 5%)
3. A lot more than in Juan's example (more than 5%)
4. Less than in Juan's example

Population size   Sample size   Prevalence   Standard error
N                 n             P            e
1,000             100           52%          4.74%
2,000             100           52%          4.87%
5,000             100           52%          4.96%
500               100           52%          4.47%
1,000             200           52%          3.16%
1,000             50            52%          6.89%
1,000             100           40%          4.65%
1,000             100           25%          4.11%
1,000             100           75%          4.11%

$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1-P)}{n}}$
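As a worked check of the formula, here is the first row of the table (N=1,000, n=100, P=0.52) computed by hand. Because P(1-P) takes the same value for P=0.25 and P=0.75, the same steps also explain why those two rows share one standard error.

```latex
\begin{aligned}
e &= \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1-P)}{n}}
   = \sqrt{\left(1 - \frac{100}{1000}\right)\frac{0.52 \times 0.48}{100}} \\
  &= \sqrt{0.9 \times 0.002496}
   = \sqrt{0.0022464}
   \approx 0.0474 = 4.74\%
\end{aligned}
```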

N=1,000, n=100, P=10%
What is the standard error?
1. The same as in Juan's example (4.73%)
2. A little less than in Juan's example (more than 3%)
3. A lot less than in Juan's example (less than 3%)
4. More than in Juan's example

Population size   Sample size   Prevalence   Standard error
N                 n             P            e
1,000             100           52%          4.74%
2,000             100           52%          4.87%
5,000             100           52%          4.96%
500               100           52%          4.47%
1,000             200           52%          3.16%
1,000             50            52%          6.89%
1,000             100           40%          4.65%
1,000             100           25%          4.11%
1,000             100           75%          4.11%
1,000             100           10%          2.85%

N=1,000, n=100, P=1%
What is the standard error?
1. The same as in Juan's example (4.73%)
2. Less than in Juan's example, but more than 2%
3. Less than 2%
4. More than in Juan's example

Population size   Sample size   Prevalence   Standard error
N                 n             P            e
1,000             100           52%          4.74%
2,000             100           52%          4.87%
5,000             100           52%          4.96%
500               100           52%          4.47%
1,000             200           52%          3.16%
1,000             50            52%          6.89%
1,000             100           40%          4.65%
1,000             100           25%          4.11%
1,000             100           75%          4.11%
1,000             100           10%          2.85%
1,000             100           1%           0.94%

Population size   Sample size   Prevalence   Standard error   Relative error
N                 n             P            e                e/P
1,000             100           52%          4.74%            9%
2,000             100           52%          4.87%            9%
5,000             100           52%          4.96%            10%
500               100           52%          4.47%            9%
1,000             200           52%          3.16%            6%
1,000             50            52%          6.89%            13%
1,000             100           40%          4.65%            12%
1,000             100           25%          4.11%            16%
1,000             100           75%          4.11%            8%
1,000             100           10%          2.85%            28%
1,000             100           1%           0.94%            94%

$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1-P)}{n}}$

Effect of the prevalence
[Figure: $\sqrt{P(1-P)}$ plotted against P; horizontal axis P from 0.00 to 1.00, vertical axis from 0.00 to 0.50]
Error is maximum when P=0.5.
The maximum is flat: error does not change much between P=0.2 and P=0.8.
When P goes down, absolute error goes down too, but relative error grows.

Effect of the prevalence
Conclusions:
Error is maximum when the prevalence is 50%.
The maximum is flat: if the prevalence is neither too small nor too large, the standard error is close to the maximum.
If the prevalence is very low:
  The standard error goes down
  But the relative standard error goes up
This is a problem and a limitation of Simple Random Sampling for the study of rare events (disability, unemployment, ...).
Juan will tell us how to deal with this in the next session.
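The opposite movement of absolute and relative error can be seen in a few lines. This is a minimal sketch assuming the session's formula e = sqrt((1 - n/N) P(1 - P)/n) with N=1,000 and n=100; the function name `standard_error` and the chosen prevalences are illustrative.

```python
import math

def standard_error(N, n, P):
    """Standard error of a sample proportion, simple random sampling without replacement."""
    return math.sqrt((1 - n / N) * P * (1 - P) / n)

# As P falls, e falls too, but e/P (the relative error) rises.
for P in (0.50, 0.25, 0.10, 0.01):
    e = standard_error(1000, 100, P)
    print(f"P={P:.0%}  e={e:.2%}  e/P={e / P:.1%}")
```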

Summary and conclusions

Population size   Sample size   Prevalence   Standard error   Relative error
N                 n             P            e                e/P
1,000             100           52%          4.74%            9.11%
2,000             100           52%          4.87%            9.36%
5,000             100           52%          4.96%            9.51%
500               100           52%          4.47%            8.59%
1,000             200           52%          3.16%            6.08%
1,000             50            52%          6.89%            13.24%
1,000             100           40%          4.65%            11.62%
1,000             100           25%          4.11%            16.43%
1,000             100           75%          4.11%            7.75%
1,000             100           10%          2.85%            28.46%
1,000             100           1%           0.94%            94.34%

Population size doesn't matter much
Sample size matters, but can be expensive
Prevalence only matters when it is very small
Error is maximum when P=50%

Switching to organic cotton farming

The ACFAP (Association of Cotton Farmers of Andhra Pradesh) wants to know what percentage of its members would be willing to switch to organic farming.
The association has a database with the names and phone numbers of its members.
A telephone survey is proposed, but calling all members would be too costly and time-consuming.
How could we help? How big a sample do we need?

What do we need to solve this problem?
1. A sampling strategy
2. A sample frame
3. A margin of error and a confidence level
4. The prevalence
5. The total number of cotton farmers in the association

What do we need? A checklist:
A sampling strategy and sample frame
  Simple random sampling from ACFAP's database
Margin of error and confidence level
  For instance, 5 percentage points at the 95% confidence level
A guess at what the prevalence might be
  If we have no clue, we put ourselves in the worst-case scenario: 50 percent
Total number of ACFAP members
  It is 10,356, but we don't really need this. We can also put ourselves in the worst-case scenario:

Of course, we also need some formulas, but remember that, in sampling, insights are much more important than formulas.

$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1-P)}{n}}$

For an infinite population ($N = \infty$):
$e = \sqrt{\frac{P(1-P)}{n}}$, so $n = \frac{P(1-P)}{e^2}$

For a maximum error E at a given confidence level α:
$n = \frac{t_\alpha^2 \, P(1-P)}{E^2}$
With $t_{95\%} = 1.96$, $t_{99\%} = 2.58$, etc.

For a population size N:
$n_N = \frac{n}{1 + \frac{n}{N}}$
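The slide states the finite-population adjustment without deriving it. One way to see where it comes from, sketched here under the assumption that the same target error e is kept and that n denotes the infinite-population sample size P(1-P)/e^2, is to solve the general standard-error formula for the sample size:

```latex
\begin{aligned}
e^2 &= \left(1 - \frac{n_N}{N}\right)\frac{P(1-P)}{n_N}
     = P(1-P)\left(\frac{1}{n_N} - \frac{1}{N}\right) \\
\frac{1}{n_N} &= \frac{e^2}{P(1-P)} + \frac{1}{N} = \frac{1}{n} + \frac{1}{N} \\
n_N &= \frac{n}{1 + \frac{n}{N}}
\end{aligned}
```

Replacing e by E / t_α gives the same adjustment for a maximum error E at confidence level α.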

For an infinite population
$n = \frac{t_\alpha^2 \, P(1-P)}{E^2}$
Confidence level α = 95%: $t_\alpha = 1.96$
Prevalence P = 50 percent: P = 0.5
Maximum error E = 5 percent: E = 0.05
$n = \frac{1.96^2 \times 0.5 \times (1 - 0.5)}{0.05^2} = \ ?$

How many farmers do we need to call?
$n = 1.96^2 \times 0.5 \times (1 - 0.5) / 0.05^2 = \ ?$
1. 384 farmers
2. 3,842 farmers
3. 10,356 farmers
4. 100 farmers
5. None of the above
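The arithmetic in this question can be checked with a short sketch of the infinite-population formula n = t² P(1 - P)/E². The function name `sample_size_infinite` is a hypothetical helper, not something from the Lesson1 workbook.

```python
def sample_size_infinite(t, P, E):
    """Sample size for a maximum error E, ignoring the population size."""
    return t ** 2 * P * (1 - P) / E ** 2

# 95% confidence (t = 1.96), worst-case prevalence 50%, maximum error 5 points.
n = sample_size_infinite(t=1.96, P=0.5, E=0.05)
print(f"n = {n:.2f}, about {round(n)} farmers")
```

The result is a fraction of a farmer above a whole number; the answer options on the slide use the nearest whole number.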

If we wanted to account for the actual number of ACFAP members
$n_N = \frac{n}{1 + n/N}$
n = 384, N = 10,356
$n_N = \frac{384}{1 + 384/10{,}356} = \ ?$

If we accounted for the actual number of ACFAP members, how many farmers do we need to call?
$n_N = 384 / (1 + 384 / 10{,}356) = \ ?$
1. 384 members
2. 370 members
3. 10,356 members
4. 399 members
5. None of the above
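The finite-population adjustment can be checked the same way. This is a minimal sketch of n_N = n / (1 + n/N); the function name `adjust_for_population` is a hypothetical helper introduced only for illustration.

```python
def adjust_for_population(n, N):
    """Shrink an infinite-population sample size n for a population of size N."""
    return n / (1 + n / N)

# 384 calls from the infinite-population formula, 10,356 ACFAP members.
n_N = adjust_for_population(384, 10_356)
print(f"n_N = {n_N:.2f}, about {round(n_N)} members")
```

The saving is small here because 384 is a small share of 10,356, which is consistent with the earlier conclusion that population size matters little.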