Review of key points about estimators

Review of key points about estimators

Populations can be at least partially described by population parameters. Population parameters include the mean, proportion, variance, etc. Because populations are often very large (maybe infinite, like the output of a process) or otherwise hard to investigate, we often have no way to know the exact values of the parameters. Statistics, or point estimators, are used to estimate population parameters. An estimator is calculated using a function that depends on information taken from a sample from the population. We are interested in evaluating the goodness of our estimator (the topic of sections 8.1-8.4). To evaluate goodness, it's important to understand facts about the estimator's sampling distribution: its mean, its variance, etc.

Different estimators are possible for the same parameter

In everyday life, people working from the same information arrive at different ideas and decisions. Likewise, given the same sample measurements, people may derive different estimators for a population parameter (mean, variance, etc.). For this reason, we need to evaluate the estimators on some criteria (bias, etc.) to determine which is best. Complication: the criteria used to judge estimators may differ.

Example: For estimating the variance $\sigma^2$, which is better: the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, or the estimator $\tilde{s}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, which more closely resembles the formula for the population variance? (A small numerical comparison appears below.)
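
As a minimal numerical sketch of how the two formulas disagree on the same data (the seed, sample size, and population here are all assumed purely for illustration; NumPy's ddof argument selects the divisor):

```python
import numpy as np

rng = np.random.default_rng(0)                 # assumed seed, for reproducibility
sample = rng.normal(loc=10, scale=2, size=20)  # hypothetical sample; true sigma^2 = 4

s2_unbiased = sample.var(ddof=1)  # divides by n - 1 (the sample variance)
s2_plugin = sample.var(ddof=0)    # divides by n (resembles the population formula)

print(s2_unbiased, s2_plugin)     # the two estimates differ by the factor (n - 1)/n
```

For n = 20 the two estimates differ by the factor (n - 1)/n = 0.95; the gap shrinks as n grows, so the choice matters most in small samples.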

Repeated estimation yields a sampling distribution

If you use an estimator once and it works well, is that enough proof that you should always use that estimator for that parameter? Visualize calculating the estimator over and over with different samples from the same population: take a sample, calculate an estimate using that rule, then repeat. This process yields the sampling distribution of the estimator. We look at the mean of this sampling distribution to see what value our estimates are centered around, and we look at its spread to see how much our estimates vary. The simulation below makes this concrete.
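
A minimal simulation of this "sample, estimate, repeat" process, using the sample mean as the estimator (the population, sample size, and number of repetitions are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)   # assumed seed
n, reps = 25, 10_000             # hypothetical sample size and number of repeated samples

# Draw many samples from the same population and apply the estimator (here, the mean)
estimates = np.array([rng.exponential(scale=3, size=n).mean() for _ in range(reps)])

print(estimates.mean())  # center of the sampling distribution (close to mu = 3)
print(estimates.std())   # spread of the sampling distribution (close to sigma/sqrt(n))
```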

Bias

We may want to make sure that the estimates are centered around the parameter of interest (the population parameter that we're trying to estimate). One measure of center is the mean, so we may want to see how far the mean of the estimates is from the parameter of interest: the bias. Assume we're using the estimator $\hat{\theta}$ to estimate the population parameter $\theta$. Then

$$\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta.$$

If the bias equals 0, the estimator is unbiased. Two common unbiased estimators are:

1. The sample proportion $\hat{p}$ for the population proportion $p$
2. The sample mean $\bar{X}$ for the population mean $\mu$

(A short check of the second claim follows below.)
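
As a quick check of the second claim, a short derivation using only linearity of expectation, for an i.i.d. sample $X_1, \dots, X_n$ with $E(X_i) = \mu$:

```latex
% Unbiasedness of the sample mean, via linearity of expectation.
\[
E(\bar{X})
  = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
  = \frac{1}{n}\cdot n\mu
  = \mu ,
\]
% so Bias(Xbar) = E(Xbar) - mu = 0.
```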

Bias and the sample variance

What is the bias of the sample variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$? Contrast this case with that of the estimator $\tilde{s}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, which looks more like the formula for the population variance.
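
A sketch of one way to work the exercise, starting from the standard identity $E\big[\sum_{i=1}^{n}(X_i - \bar{X})^2\big] = (n-1)\sigma^2$ for an i.i.d. sample:

```latex
% Applying the identity to each estimator:
\[
E(s^2) = \frac{1}{n-1}\,(n-1)\sigma^2 = \sigma^2
\qquad\Longrightarrow\qquad
\mathrm{Bias}(s^2) = 0 ,
\]
\[
E(\tilde{s}^2) = \frac{1}{n}\,(n-1)\sigma^2 = \frac{n-1}{n}\,\sigma^2
\qquad\Longrightarrow\qquad
\mathrm{Bias}(\tilde{s}^2) = -\frac{\sigma^2}{n} .
\]
% So s^2 is unbiased, while the n-denominator version underestimates sigma^2 on average.
```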

Variance of an estimator

Say you're considering two possible estimators for the same population parameter, and both are unbiased. Variance is another factor that might help you choose between them. It's desirable to have the most precision possible when estimating a parameter, so you would prefer the estimator with the smaller variance (given that both are unbiased). For two of the estimators that we have discussed so far, we have the variances:

1. $\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}$
2. $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$

(A short derivation of the second formula follows below.)
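
Where the second formula comes from, sketched for an i.i.d. sample with $\mathrm{Var}(X_i) = \sigma^2$ (the first formula is then the special case of Bernoulli data, where $\sigma^2 = p(1-p)$):

```latex
% Independence lets the variance pass through the sum with no covariance terms.
\[
\mathrm{Var}(\bar{X})
  = \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)
  = \frac{1}{n^2}\cdot n\sigma^2
  = \frac{\sigma^2}{n} .
\]
```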

Mean square error of an estimator

If one or more of the estimators are biased, it may be harder to choose between them. For example, one estimator may have a very small bias and a small variance, while another is unbiased but has a very large variance. In this case, you may prefer the biased estimator over the unbiased one. Mean square error (MSE) is a criterion which tries to take into account concerns about both the bias and the variance of estimators:

$$\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2],$$

the expected size of the squared error, where the error is the difference between the estimate $\hat{\theta}$ and the actual parameter $\theta$.

MSE can be restated

Show that the MSE of an estimator can be restated in terms of its variance and its bias, so that

$$\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2.$$
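
One standard route, sketched below: add and subtract $E(\hat{\theta})$ inside the square, expand, and note that the cross term has expectation zero.

```latex
% Bias-variance decomposition of the MSE (uses amsmath).
\begin{align*}
\mathrm{MSE}(\hat{\theta})
  &= E\big[(\hat{\theta} - \theta)^2\big]
   = E\Big[\big((\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)\big)^2\Big] \\
  &= E\big[(\hat{\theta} - E(\hat{\theta}))^2\big]
     + 2\,(E(\hat{\theta}) - \theta)\,\underbrace{E\big[\hat{\theta} - E(\hat{\theta})\big]}_{=\,0}
     + (E(\hat{\theta}) - \theta)^2 \\
  &= \mathrm{Var}(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2 .
\end{align*}
```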

Moving from one population of interest to two

The parameters and sample statistics discussed so far apply to only one population. What if we want to compare two populations?

Example: We want to estimate the difference in mean income in the year after graduation between economics majors and other social science majors, $\mu_1 - \mu_2$.

Example: We want to estimate the difference in the proportion of students who go on to grad school between economics majors and other social science majors, $p_1 - p_2$.

Comparing two populations

We try to develop point estimates for these quantities based on estimators we already have. For the difference between two means, $\mu_1 - \mu_2$, we try the estimator $\bar{x}_1 - \bar{x}_2$. For the difference between two proportions, $p_1 - p_2$, we try the estimator $\hat{p}_1 - \hat{p}_2$. We want to evaluate the goodness of these estimators. What do we know about the sampling distributions of these estimators? Are they unbiased? What is their variance?

Mean and variance of $\bar{x}_1 - \bar{x}_2$

Show that $\bar{X}_1 - \bar{X}_2$ is an unbiased estimator for $\mu_1 - \mu_2$. Also show that the variance of this estimator is

$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
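
A sketch, assuming the two samples are independent and using the one-sample facts $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$:

```latex
% Expectation is linear regardless of dependence:
\[
E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2 .
\]
% Independence makes the variances add, with no covariance term:
\[
\mathrm{Var}(\bar{X}_1 - \bar{X}_2)
  = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2)
  = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} .
\]
```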

Mean and variance of $\hat{p}_1 - \hat{p}_2$

Show that $\hat{p}_1 - \hat{p}_2$ is an unbiased estimator for $p_1 - p_2$. Also show that the variance of this estimator is

$$\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}.$$
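
The same argument goes through, again assuming independent samples and using $E(\hat{p}) = p$ and $\mathrm{Var}(\hat{p}) = p(1-p)/n$:

```latex
% Linearity of expectation, then additivity of variance under independence:
\[
E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2 ,
\qquad
\mathrm{Var}(\hat{p}_1 - \hat{p}_2)
  = \mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2)
  = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} .
\]
```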

Summary of two-sample estimators

We have just shown that $\bar{x}_1 - \bar{x}_2$ and $\hat{p}_1 - \hat{p}_2$ are unbiased estimators, as were $\bar{x}$ and $\hat{p}$. The CLT doesn't apply directly to these estimators, since they are not sample means but differences of sample means. Other theorems do state that, given at least moderate sample sizes ($n \geq 30$), these estimators have sampling distributions that are approximately normal.

Estimation errors

Imagine that we have a point estimate $\hat{\theta}$ for the population parameter $\theta$. Even with a good point estimate, there is very likely to be some error ($\hat{\theta} = \theta$ is not likely). We can express this error of estimation, denoted $\varepsilon$, as $\varepsilon = |\hat{\theta} - \theta|$. This is the number of units that our estimate $\hat{\theta}$ is off from $\theta$ (it doesn't take into account the direction of the error). Since the estimator $\hat{\theta}$ is a random variable, the error $\varepsilon$ is also random. We can use the sampling distribution of $\hat{\theta}$ to help place some bounds on how big the error is likely to be.

Placing bounds on the error of estimation

We know that our estimate might not be exactly correct, but we would like to use it to calculate an interval such that the error of estimation is less than some number $b$ with a certain probability $\pi$. We can write this as

$$P(\varepsilon < b) = \pi.$$

This can be rewritten as

$$P(|\hat{\theta} - \theta| < b) = P(-b < \hat{\theta} - \theta < b) = \pi.$$

In a given sample, we cannot know whether $\varepsilon < b$, because we do not know the actual value of the population parameter $\theta$. However, in repeated sampling (and therefore repeated calculation of $\hat{\theta}$), we can say that for approximately a fraction $\pi$ of samples taken, the estimate $\hat{\theta}$ is within $b$ units of the parameter $\theta$.
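
A small simulation of this repeated-sampling interpretation for $\hat{\theta} = \bar{X}$ with normal data (the population, sample size, and the two-standard-error choice of $b$ are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)               # assumed seed
mu, sigma, n, reps = 50.0, 8.0, 40, 10_000   # hypothetical population and sampling setup

b = 2 * sigma / np.sqrt(n)   # bound set at two standard errors of the mean

# Repeatedly sample, estimate, and check whether the estimate landed within b of mu
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
coverage = np.mean(np.abs(xbars - mu) < b)

print(coverage)              # approximately 0.95: pi for a two-standard-error bound
```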

Example

We want to compare the mean family income in two states. For state 1, we had a random sample of $n_1 = 100$ families with a sample mean of $\bar{x}_1 = 35000$. For state 2, we had a random sample of $n_2 = 144$ families with a sample mean of $\bar{x}_2 = 36000$. Past studies have shown that for both states $\sigma = 4000$. Estimate $\mu_1 - \mu_2$ and place a two-standard-error bound on the error of estimation.
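
One way the numbers work out, sketched below; the bound uses the variance formula for $\bar{x}_1 - \bar{x}_2$ derived earlier, and the interpretation that the error rarely exceeds this bound relies on the approximate normality noted above:

```latex
% Point estimate of mu_1 - mu_2:
\[
\bar{x}_1 - \bar{x}_2 = 35000 - 36000 = -1000 .
\]
% Two-standard-error bound, with sigma_1 = sigma_2 = 4000:
\[
2\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}
  = 2\sqrt{\frac{4000^2}{100} + \frac{4000^2}{144}}
  = 2\sqrt{160000 + 111111.1}
  \approx 2(520.7)
  \approx 1041.4 .
\]
% So we estimate mu_1 - mu_2 as -1000, and in repeated sampling the error of
% estimation would exceed about 1041 only rarely (roughly 5% of samples).
```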