Comparing Estimators


The Median

For the sake of discussion, assume that we are measuring the heights of randomly selected adult men from the U.S. Also for the sake of discussion, let's suppose that this population follows a N(70", 3") distribution, with height measured in inches. Suppose we plan to take a random sample of size 10. (Obtaining such a sample isn't easy, but assume we can do it.) Here's one such simulated sample:

> (def x (+ (* 3 (normal-rand 10)) 70))
X
> x
(70.4137146579604 66.64387360086086 65.02215265160767 68.59955985316095 67.50177488231989 72.28927908590187 72.43601901835297 72.1645650707586 66.94914404152433 69.76333582979076)
> (median x)

The median of this list is 69.18 (rounding off), which is pretty close. The error is -0.82. But one wonders whether we were this close just because of luck, or whether this is typical. So take another sample:

(68.33798052106891 70.54899024675639 73.90930552039957 71.20938752129774 69.13832575867107 70.79102487857833 68.91376948312231 69.9605805576679 70.79784104468536 71.2276041211891)
> (median x)
70.67000756266737

The error is +0.67. Even closer. To understand how this estimator would do in the long run, we simulate 1000 repetitions:

> (def manymedians (sample-distr #'median 70 1000 10 3))

(This is a home-brewed function that takes as input an estimator function (median, preceded by #' to make sure LISP evaluates it as a function and not a value), the mean (70), the number of simulations (1000), the size of each sample (10), and the standard deviation (3).)

Let's treat this list, manymedians, as we would any data set, and look at a picture:
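If you want to replicate this outside XLISP-STAT, here is a rough Python analogue using only the standard library. The function name sample_distr mirrors the LISP sample-distr but is otherwise an assumption, not part of the original code:

```python
import random
import statistics

random.seed(1)  # for reproducibility

def sample_distr(estimator, center, M, n, sigma):
    """Apply the estimator to M simulated samples of size n drawn
    from a N(center, sigma) population; a Python analogue of the
    document's LISP sample-distr."""
    return [estimator([random.gauss(center, sigma) for _ in range(n)])
            for _ in range(M)]

many_medians = sample_distr(statistics.median, 70, 1000, 10, 3)

# Center and spread of the simulated sampling distribution of the median
print(statistics.mean(many_medians))   # should land near 70
print(statistics.stdev(many_medians))  # roughly 1.1 to 1.2 -- the simulated SE
```

With 1000 repetitions the printed mean should sit very close to 70, matching the unbiasedness suggested by the histogram below.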

[Histogram of the 1000 simulated medians; the horizontal axis runs from about 66 to 76.]

The sampling distribution appears rather symmetric, and indeed appears very close to normal. The median of the medians is 70.11; the average of the medians is 70.09 (which strongly suggests that the sample median is an unbiased estimator of this population mean -- at least for this normal population), and the SE is approximately 1.136, which is smaller than the SD of the population (3). Compare this to the theoretical SE for the average, sigma/sqrt(n) = 3/sqrt(10) = 0.9486, and you see that they are roughly in the same ballpark. In practice, of course, we don't know the mean, standard deviation, and median before we begin. But this simulation tells us that if we take the sample median of a sample of 10 people, we can be fairly certain our median will be close (within 1.136, approximately, about 68% of the time, using the "empirical rule").

Standard Deviation

Suppose we wish to estimate the standard deviation of this population. There are two popular estimators. The so-called "sample standard deviation" divides by (n-1); the "population standard deviation", which has the distinction of also being the maximum likelihood estimator, divides by n. Let's call the first s1 and the second s2. Imagine we take a sample of size 10, again, and compute both estimators. Here's the sampling distribution of s1:
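The two estimators differ only in the divisor. A small Python sketch (function names s1 and s2 follow the text; the code itself is an illustration, not the original LISP) makes the contrast explicit:

```python
import math
import random

def s1(xs):
    """Sample standard deviation: divide the sum of squares by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def s2(xs):
    """Population (maximum-likelihood) standard deviation: divide by n."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / n)

random.seed(2)
sample = [random.gauss(70, 3) for _ in range(10)]
n = len(sample)

# The algebraic relation from the text: s1 = sqrt(n / (n - 1)) * s2,
# so s1 is always slightly larger than s2.
print(s1(sample), s2(sample))
print(math.isclose(s1(sample), math.sqrt(n / (n - 1)) * s2(sample)))  # True
```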

[Histogram of 1000 simulated values of s1; the horizontal axis runs from 0 to 6.]

It looks normal, too. But in fact it is not. For one thing, negative values are impossible. The result is a distribution skewed to the right (although only slightly in this case). Note that some of our samples had quite small standard deviations, and some quite large, but these were rare. The average of this list is 2.92, very close to the "target" value of 3. Now for s2: the histogram looks pretty much the same. (We did an entirely new simulation, so the histograms are not exactly the same.) The average of s2 was 2.77, a little less close to the target. As you can see from some algebra, s1 = sqrt(n/(n-1)) * s2, which shows that s1 is always slightly larger than s2. This is because s2 is biased downwards: it consistently gives values that are smaller than the target. The "correction" factor fixes this. Incidentally, you can get a better sense of the skewed sampling distribution if we instead sample from a N(70, .5) distribution. Now here's a histogram of 1000 such simulations:
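The downward bias of s2 is easy to reproduce by simulation. A Python sketch (standard library only; statistics.stdev divides by n-1 like our s1, statistics.pstdev divides by n like our s2):

```python
import random
import statistics

random.seed(3)

def simulate(estimator, M=1000, n=10, mu=70, sigma=3):
    # Apply the estimator to M fresh samples of size n from N(mu, sigma)
    return [estimator([random.gauss(mu, sigma) for _ in range(n)])
            for _ in range(M)]

# statistics.stdev plays the role of s1; statistics.pstdev plays s2
print(statistics.mean(simulate(statistics.stdev)))   # near 2.92
print(statistics.mean(simulate(statistics.pstdev)))  # near 2.77, biased low
```

The averages should come out close to the 2.92 and 2.77 reported in the text, with the n-divisor version consistently smaller.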

[Histogram of 1000 simulated values of s1 when sampling from N(70, .5); the horizontal axis runs from 0 to 1.2.]

This should look a little more skewed to you.

Sample Mean

In the case of the sample mean, we have well-known theory to guide us. We know that the sample mean (the average of the sample) will be N(70, .9486). Let's check:

[Histogram of 1000 simulated sample means; the horizontal axis runs from about 66 to 74.]

The average of these averages is 70.0018, and the SD is 0.94568. Both quite close.
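The same check can be run in Python (an illustration, not the original LISP session): simulate many sample means and compare their SD to the theoretical SE, sigma/sqrt(n) = 3/sqrt(10).

```python
import math
import random
import statistics

random.seed(4)

# 1000 sample means, each computed from a sample of 10 draws from N(70, 3)
means = [statistics.mean([random.gauss(70, 3) for _ in range(10)])
         for _ in range(1000)]

print(statistics.mean(means))   # close to 70
print(statistics.stdev(means))  # close to the theoretical SE below
print(3 / math.sqrt(10))        # 0.9486...
```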

The situation in reverse

Suppose we know only that we have sampled from a N(?, 3) population. We see these numbers:

(64.09791552690261 67.03127774411125 71.23499658281865 74.64023086443113 72.74757233302203 74.18724752373062 69.96287696287911 66.52040596863715 64.2658942174255 72.9117640212181)

What's the true value of the mean? Of course there's no way of knowing. The average of this list is 69.76. But our theory tells us that sample averages are almost always (at least 95% of the time) within 2 SEs of the true mean. So the true mean must be somewhere between 69.76 - 2*.9486 and 69.76 + 2*.9486, or (67.86, 71.66). Of course there's no guarantee. We only know that 95% of all samples will produce averages that are within 2 SEs of the true mean, and so we can be reasonably confident that ours is one of them. In fact, the true mean here was 68, so we win.

Appendix: Here's the code for sample-distr:

(defun sample-distr (estimator center M n sigma)
  (let ((resultlist '()))
    (dolist (i (iseq 1 M) resultlist)
      (def x (normal-rand n))
      (def newx (+ (* x sigma) center))
      (def resultlist
        (append (list (funcall estimator newx)) resultlist)))))

;; Examples:
;; (sample-distr #'mean 10 1000 100 4)
;; (sample-distr #'median 10 1000 100 4)
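The interval arithmetic in this last section can be checked with a few lines of Python (the data are the ten values from the transcript; the plus-or-minus 2 SE rule is the rough 95% interval used in the text):

```python
import math
import statistics

# The ten observed heights from the transcript
data = [64.09791552690261, 67.03127774411125, 71.23499658281865,
        74.64023086443113, 72.74757233302203, 74.18724752373062,
        69.96287696287911, 66.52040596863715, 64.2658942174255,
        72.9117640212181]

sigma = 3                       # known population SD
n = len(data)
se = sigma / math.sqrt(n)       # theoretical SE of the mean, 0.9486...
xbar = statistics.mean(data)    # 69.76 (rounded)

# Rough 95% interval: sample mean plus or minus 2 standard errors
low, high = xbar - 2 * se, xbar + 2 * se
print(round(low, 2), round(high, 2))  # 67.86 71.66
```

The true mean (68) indeed falls inside the interval, as the text notes.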