Module 4: Point Estimation Statistics (OA3102)


Professor Ron Fricker, Naval Postgraduate School, Monterey, California
Reading assignment: WM&S chapter 8.1-8.4
Revision: 1-12

Goals for this Module
- Define and distinguish between point estimates and point estimators
- Discuss characteristics of good point estimates: unbiasedness and minimum variance; mean square error; consistency, efficiency, robustness
- Quantify and calculate the precision of an estimator via the standard error
- Discuss the bootstrap as a way to empirically estimate standard errors

Welcome to Statistical Inference!
Problem: We have a simple random sample of data, and we want to use it to estimate a population quantity (usually a parameter of a distribution). In point estimation, the estimate is a number.
Issue: There are often lots of possible estimates. E.g., should we estimate E(X) with X̄, with X̃, or with something else?
This module: What's a good point estimate? Module 5: interval estimators. Module 6: methods for finding good estimators.

Point Estimation
A point estimate of a parameter θ is a single number that is a sensible value for θ, i.e., it's a numerical estimate of θ. We'll use θ to represent a generic parameter; it could be μ, σ, p, etc.
The point estimate is a statistic calculated from a sample of data. The statistic is called a point estimator. Using hat notation, we will denote it as θ̂. For example, we might use x̄ to estimate μ, so in this case μ̂ = x̄.

Definition: Estimator
An estimator is a rule, often expressed as a formula, that tells how to calculate the value of an estimate based on the measurements contained in a sample.

An Example
You're testing a new missile and want to estimate the probability of kill (against a particular target under specific conditions). You do a test with n = 25 shots. The parameter to be estimated is p_k, the probability of kill. Let X be the number of kills; in your test you observed x = 15. A reasonable estimator and estimate is
estimator: p̂_k = X/n
estimate: p̂_k = x/n = 15/25 = 0.6

A More Difficult Example
On another test, you're estimating the mean time to failure (MTTF) of a piece of electronic equipment. Measurements were taken for n = 20 tests (in units of 1,000 hrs). It turns out a normal distribution fits the data quite well. So, what we want to do is estimate μ, the MTTF. How best to do this?

Example, cont'd
Here are some possible estimators for μ and their values for this data set:
(1) μ̂ = X̄, the sample mean: x̄ = (1/20) Σ_{i=1}^{20} x_i = 27.793
(2) μ̂ = X̃, the sample median: x̃ = (27.94 + 27.98)/2 = 27.960
(3) μ̂ = X_e = [min(X_i) + max(X_i)]/2, the midrange: x_e = (24.46 + 30.88)/2 = 27.670
(4) μ̂ = X̄_tr(10), the 10% trimmed mean: x̄_tr(10) = (1/16) Σ_{i=3}^{18} x_(i) = 27.838
Which estimator should you use? I.e., which is likely to give estimates closer to the true (but unknown) population value?
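The four competing estimators are easy to compute for any sample. A minimal sketch in Python (the data below are hypothetical, since the slide's 20 measurements are not reproduced in this transcript; the outlier illustrates why the estimators can disagree):

```python
import numpy as np

def competing_estimators(x, trim_frac=0.10):
    """Compute the four location estimators from the slide for a sample x."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    mean = x.mean()                    # (1) sample mean
    median = np.median(x)              # (2) sample median
    midrange = (x[0] + x[-1]) / 2      # (3) average of min and max
    k = int(trim_frac * n)             # (4) drop k points from each end
    trimmed = x[k:n - k].mean()
    return mean, median, midrange, trimmed

# Hypothetical sample with one large outlier
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
print(competing_estimators(x))  # (14.5, 5.5, 50.5, 5.5)
```

Note how the outlier drags the mean and (especially) the midrange away from the bulk of the data, while the median and trimmed mean are barely affected; this foreshadows the discussion of robustness below.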

Another Example
In a wargame computer simulation, you want to estimate a scenario's run-time variability (σ²). The run times (in seconds) were recorded for n = 8 runs. Two possible estimators:
(1) σ̂² = S² = (1/(n−1)) Σ_{i=1}^{n} (X_i − X̄)², giving s² = 0.25125
(2) σ̂² = (1/n) Σ_{i=1}^{n} (X_i − X̄)², giving an estimate of 0.220
Why prefer (1) over (2)?
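The two estimators differ only in the divisor, so each is a fixed multiple of the other. A quick sketch (the run times below are hypothetical; only the n = 8 sample size comes from the slide):

```python
import numpy as np

def variance_estimates(x):
    """Return the n-1 (unbiased) and n (biased) divisor variance estimates."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s2_unbiased = x.var(ddof=1)  # divides by n - 1: estimator (1)
    s2_biased = x.var(ddof=0)    # divides by n: estimator (2)
    return s2_unbiased, s2_biased

# Hypothetical run times (seconds) for n = 8 simulation runs
x = [30.1, 30.4, 29.9, 30.8, 30.2, 29.6, 30.5, 30.0]
s2_1, s2_2 = variance_estimates(x)
print(s2_1, s2_2, s2_2 / s2_1)  # the ratio is always (n - 1)/n = 7/8
```

The slide's numbers obey the same relation: 0.25125 × 7/8 ≈ 0.220.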

Bias
Definition: Let θ̂ be a point estimator for a parameter θ. θ̂ is an unbiased estimator if E(θ̂) = θ. If E(θ̂) ≠ θ, θ̂ is said to be biased.
Definition: The bias of a point estimator θ̂ is given by B(θ̂) = E(θ̂) − θ.
* Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Proving Unbiasedness
Proposition: Let X be a binomial r.v. with parameters n and p. The sample proportion p̂ = X/n is an unbiased estimator of p.
Proof:
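The proof is worked in lecture (E(p̂) = E(X)/n = np/n = p), but the claim is also easy to check by simulation; a minimal sketch:

```python
import numpy as np

# Simulate many realizations of p-hat = X/n for X ~ Bin(n, p)
rng = np.random.default_rng(0)
n, p, reps = 25, 0.6, 200_000
phat = rng.binomial(n, p, size=reps) / n
print(phat.mean())  # ≈ 0.6: the long-run average of p-hat matches p
```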

Remember: Rules for Linear Combinations of Random Variables
For random variables X_1, X_2, …, X_n, whether or not the X_i's are independent,
E(a_1 X_1 + a_2 X_2 + … + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + … + a_n E(X_n)
If X_1, X_2, …, X_n are independent,
Var(a_1 X_1 + a_2 X_2 + … + a_n X_n) = a_1² Var(X_1) + a_2² Var(X_2) + … + a_n² Var(X_n)

Example 8.1
Let Y_1, Y_2, …, Y_n be a random sample with E(Y_i) = μ and Var(Y_i) = σ². Show that
S'² = (1/n) Σ_{i=1}^{n} (Y_i − Ȳ)²
is a biased estimator for σ², while S² is an unbiased estimator of σ².
Solution:

Example 8.1 (continued)

Example 8.1 (continued)
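The continuation slides are worked by hand in lecture; a sketch of the standard derivation, using the linear-combination rules above:

```latex
\begin{align*}
\sum_{i=1}^{n}(Y_i-\bar{Y})^2 &= \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2,\\
E(Y_i^2) &= \operatorname{Var}(Y_i) + [E(Y_i)]^2 = \sigma^2 + \mu^2,\\
E(\bar{Y}^2) &= \operatorname{Var}(\bar{Y}) + [E(\bar{Y})]^2 = \sigma^2/n + \mu^2,\\
E\!\left[\sum_{i=1}^{n}(Y_i-\bar{Y})^2\right] &= n(\sigma^2+\mu^2) - n(\sigma^2/n+\mu^2) = (n-1)\sigma^2.
\end{align*}
```

Hence E(S'²) = ((n−1)/n) σ² ≠ σ², so S'² is biased, while E(S²) = (1/(n−1)) E[Σ (Y_i − Ȳ)²] = σ², so S² is unbiased.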

Another Biased Estimator
Let X be the reaction time to a stimulus with X ~ U[0, θ], where we want to estimate θ based on a random sample X_1, X_2, …, X_n. Since θ is the largest possible reaction time, consider the estimator
θ̂_1 = max(X_1, X_2, …, X_n)
However, unbiasedness implies that we can observe estimates both bigger and smaller than θ. Why can't that happen here? Thus, θ̂_1 must be a biased estimator.

Fixing the Biased Estimator
For the same problem, consider the estimator
θ̂_2 = ((n+1)/n) max(X_1, X_2, …, X_n)
Show this estimator is unbiased.
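A simulation sketch of the claim (the analytic fact E[max] = nθ/(n+1) for U[0, θ] samples is what motivates the (n+1)/n correction; θ = 10 and n = 5 below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 10.0, 5, 200_000
samples = rng.uniform(0, theta, size=(reps, n))
mx = samples.max(axis=1)            # theta-hat-1: the sample maximum
corrected = (n + 1) / n * mx        # theta-hat-2: the bias-corrected estimator
print(mx.mean())                    # ≈ n/(n+1) * theta ≈ 8.33, biased low
print(corrected.mean())             # ≈ theta = 10.0, unbiased
```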


One Criterion for Choosing Among Estimators
Principle of minimum variance unbiased estimation: Among all estimators of θ that are unbiased, choose the one that has the minimum variance. The resulting estimator is called the minimum variance unbiased estimator (MVUE) of θ.
[Figure: estimator θ̂_1 is preferred to θ̂_2.]
* Figure from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Example of an MVUE
Let X_1, X_2, …, X_n be a random sample from a normal distribution with parameters μ and σ. Then the estimator μ̂ = X̄ is the MVUE for μ. (Proof beyond the scope of the class.)
Note this only applies to the normal distribution. When estimating the population mean E(X) = μ for other distributions, X̄ may not be the appropriate estimator. E.g., for the Cauchy distribution, E(X) does not even exist!

How Variable is My Point Estimate? The Standard Error
The precision of a point estimate is given by its standard error. The standard error of an estimator is its standard deviation:
σ_θ̂ = √Var(θ̂)
If the standard error itself involves unknown parameters whose values are estimated, substituting these estimates into σ_θ̂ yields the estimated standard error. The estimated standard error is denoted by s_θ̂ or σ̂_θ̂.

Deriving Some Standard Errors (1)
Proposition: If Y_1, Y_2, …, Y_n are distributed iid with variance σ², then, for a sample of size n, Var(Ȳ) = σ²/n. Thus σ_Ȳ = σ/√n.
Proof:

Deriving Some Standard Errors (2)
Proposition: If Y ~ Bin(n, p), then p̂ = Y/n has standard error σ_p̂ = √(pq/n), where q = 1 − p.
Proof:

Expected Values and Standard Errors of Some Common Point Estimators

Target parameter θ | Sample size(s) | Point estimator θ̂ | E(θ̂)      | Standard error σ_θ̂
μ                  | n              | Ȳ                  | μ          | σ/√n
p                  | n              | p̂ = Y/n            | p          | √(pq/n)
μ_1 − μ_2          | n_1 and n_2    | Ȳ_1 − Ȳ_2          | μ_1 − μ_2  | √(σ_1²/n_1 + σ_2²/n_2)
p_1 − p_2          | n_1 and n_2    | p̂_1 − p̂_2          | p_1 − p_2  | √(p_1 q_1/n_1 + p_2 q_2/n_2)

(The two-sample rows assume the populations are independent.)

However, Unbiased Estimators Aren't Always to be Preferred
Sometimes an estimator with a small bias can be preferred to an unbiased estimator. Example:
More detailed discussion is beyond the scope of the course; just know that unbiasedness isn't necessarily required for a good estimator.

Mean Square Error
Definition: The mean square error (MSE) of a point estimator θ̂ is
MSE(θ̂) = E[(θ̂ − θ)²]
The MSE of an estimator θ̂ is a function of both its variance and its bias, i.e., it can be shown (extra credit problem) that
MSE(θ̂) = E[(θ̂ − θ)²] = Var(θ̂) + [B(θ̂)]²
So, for unbiased estimators, MSE(θ̂) = Var(θ̂).

Error of Estimation
Definition: The error of estimation ε is the distance between an estimator and its target parameter: ε = |θ̂ − θ|.
Since θ̂ is a random variable, so is the error of estimation ε. But we can bound the error:
Pr(|θ̂ − θ| ≤ b) = Pr(−b ≤ θ̂ − θ ≤ b) = Pr(θ − b ≤ θ̂ ≤ θ + b)

Bounding the Error of Estimation
Tchebysheff's Theorem: Let Y be a random variable with finite mean μ and variance σ². Then for any k > 0,
Pr(|Y − μ| < kσ) ≥ 1 − 1/k²
Note that this holds for any distribution, and it is a (generally conservative) bound. E.g., for any distribution, we're guaranteed that the probability Y is within 2 standard deviations of the mean is at least 0.75. So, for unbiased estimators, a good bound to use on the error of estimation is b = 2σ_θ̂.
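A quick empirical look at how conservative the k = 2 bound is, using an exponential distribution as an arbitrary skewed example (mean 1, standard deviation 1):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.exponential(scale=1.0, size=200_000)  # mean 1, std dev 1, skewed
k = 2
frac_within = np.mean(np.abs(y - 1.0) < k * 1.0)
print(frac_within)  # Tchebysheff guarantees >= 1 - 1/4 = 0.75; actual ≈ 0.95
```

For this distribution the exact probability is P(Y < 3) = 1 − e^(−3) ≈ 0.950, well above the guaranteed 0.75, illustrating the conservatism of the bound.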

Example 8.2
In a sample of n = 1,000 randomly selected voters, y = 560 are in favor of candidate Jones. Estimate p, the fraction of voters in the population favoring Jones, and put a 2-standard-error bound on the error of estimation.
Solution:

Example 8.2 (continued)
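The solution is worked on the slides; the arithmetic can be sketched as:

```python
import math

n, y = 1000, 560
phat = y / n                           # point estimate of p
se = math.sqrt(phat * (1 - phat) / n)  # estimated standard error, sqrt(pq/n)
bound = 2 * se                         # 2-standard-error bound
print(phat, round(se, 4), round(bound, 4))  # 0.56 0.0157 0.0314
```

So we estimate that 56% favor Jones, and the error of estimation is unlikely to exceed about 0.031 (3.1 percentage points).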

Example 8.3
Car tire durability was measured on samples of two types of tires, n_1 = n_2 = 100. The number of miles until wear-out was recorded, with the following results:
ȳ_1 = 26,400 miles; s_1² = 1,440,000 miles²
ȳ_2 = 25,100 miles; s_2² = 1,960,000 miles²
Estimate the difference in mean miles to wear-out and put a 2-standard-error bound on the error of estimation.

Example 8.3
Solution:

Example 8.3 (continued)
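Again the worked solution is on the slides; using the two-sample standard error from the table above, the arithmetic is:

```python
import math

n1 = n2 = 100
ybar1, ybar2 = 26_400, 25_100
s2_1, s2_2 = 1_440_000, 1_960_000

diff = ybar1 - ybar2                   # estimate of mu_1 - mu_2
se = math.sqrt(s2_1 / n1 + s2_2 / n2)  # estimated standard error
print(diff, round(se, 1), round(2 * se, 1))  # 1300 184.4 368.8
```

So the estimated difference in mean wear-out mileage is 1,300 miles, with a 2-standard-error bound of about 369 miles.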

Other Properties of Good Estimators
An estimator is efficient if it has a small standard deviation compared to other unbiased estimators.
An estimator is robust if it is not sensitive to outliers, distributional assumptions, etc. That is, robust estimators work reasonably well under a wide variety of conditions.
An estimator θ̂_n is consistent if Pr(|θ̂_n − θ| ≥ ε) → 0 as n → ∞.
For more detail, see Chapter 9.1-9.5.

A Useful Aside: Using the Bootstrap to Empirically Estimate Standard Errors
The hard way to empirically estimate standard errors: Draw multiple (R) samples x_1, x_2, …, x_R from the population ~F, where x_i = {x_1i, x_2i, …, x_ni}. Calculate multiple parameter estimates θ̂(x_1), θ̂(x_2), …, θ̂(x_R). Then estimate the s.e. of the parameter using the standard deviation of the estimates:
s.e.[θ̂(X)] ≈ √[ (1/(R−1)) Σ_{i=1}^{R} (θ̂(x_i) − θ̄)² ], where θ̄ = (1/R) Σ_{i=1}^{R} θ̂(x_i)

The Bootstrap
The hard way is either not possible or is wasteful in practice. The bootstrap:
- Is useful when you don't know or, worse, simply cannot analytically derive the sampling distribution
- Provides a computer-intensive method to empirically estimate the sampling distribution
- Became feasible only recently with the widespread availability of significant computing power

Plug-in Principle
We've been doing this throughout the class: if you need a parameter for a calculation, simply plug in the equivalent statistic. For example, we defined
Var(X) = E[(X − E(X))²]
and then we sometimes did the calculation using X̄ in place of E(X). This is relevant for the bootstrap because we will plug in the empirical distribution in place of the population distribution.

Empirically Estimating Standard Errors Using the Bootstrap
Given the sample x = {x_1, x_2, …, x_n} ~ F̂: Draw multiple (B) resamples x*_1, x*_2, …, x*_B from the data, where x*_i = {x*_1i, x*_2i, …, x*_ni}. Calculate multiple bootstrap estimates θ̂(x*_1), θ̂(x*_2), …, θ̂(x*_B). Then estimate the s.e. from the bootstrap estimates:
s.e.[θ̂] ≈ √[ (1/(B−1)) Σ_{i=1}^{B} (θ̂(x*_i) − θ̄*)² ], where θ̄* = (1/B) Σ_{i=1}^{B} θ̂(x*_i)
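The procedure above is only a few lines of code. A minimal sketch (`bootstrap_se` and the demo data are illustrative, not from the slides; the sanity check compares against the known s.e. of the sample mean, σ/√n):

```python
import numpy as np

def bootstrap_se(x, stat, B=2000, seed=0):
    """Bootstrap estimate of the standard error of stat(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Resample n points with replacement B times; compute the statistic each time
    boot = np.array([stat(rng.choice(x, size=n, replace=True)) for _ in range(B)])
    return boot.std(ddof=1)  # std dev of the bootstrap estimates

# Sanity check against the analytic formula s.e.(Y-bar) = s / sqrt(n)
rng = np.random.default_rng(3)
x = rng.normal(loc=50, scale=10, size=100)
print(bootstrap_se(x, np.mean))         # bootstrap answer, close to...
print(x.std(ddof=1) / np.sqrt(len(x)))  # ...the analytic standard error
```

The payoff is that the same function works for statistics like the median or trimmed mean, whose standard errors have no simple analytic formula.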

Some Key Ideas
Bootstrap samples are drawn with replacement from the empirical distribution, so observations can occur in a bootstrap sample more frequently than they occurred in the actual sample. The empirical distribution substitutes for the actual population distribution, and we can draw lots of bootstrap samples from it to calculate the statistic of interest. Make B as big as can run in a reasonable timeframe. Bootstrap resamples are the same size as the original sample (n). Because this is all empirical, we don't need to analytically solve for the sampling distribution of the statistic of interest.

What We Covered in this Module
- Defined and distinguished between point estimates and point estimators
- Discussed characteristics of good point estimates: unbiasedness and minimum variance; mean square error; consistency, efficiency, robustness
- Quantified and calculated the precision of an estimator via the standard error
- Discussed the bootstrap as a way to empirically estimate standard errors

Homework
WM&S chapter 8.1-8.4. Required exercises: 2, 8, 21, 23, 27. Extra credit: 1, 6.
Useful hints:
- Problem 8.2: Don't just give the obvious answer; show why it's true mathematically
- Problem 8.8: Don't do the calculations for the estimator
- Extra credit problem 8.6: The a term is a constant with 0 ≤ a ≤ 1