
6 Point Estimation

Stat 4570/5570. Material from Devore's book (Ed. 8) and Cengage.

Point Estimation

Statistical inference is directed toward drawing conclusions about one or more parameters. We will use the generic Greek letter θ for the parameter of interest.

Process:
- Obtain sample data from each population under study.
- Based on the sample data, estimate θ.
- Draw conclusions based on the sample estimates.

The objective of point estimation is to estimate θ.

Some General Concepts of Point Estimation

A point estimate of a parameter θ is a value, based on a sample, that is a sensible guess for θ. A point estimate is obtained by a formula (an "estimator") that takes the sample data and produces a point estimate. Such formulas are called point estimators of θ. Different samples produce different estimates, even when you use the same estimator.

Example

20 observations on breakdown voltage for some material:

24.46 25.61 26.25 26.42 26.66 27.15 27.31 27.54 27.74 27.94
27.98 28.04 28.28 28.49 28.50 28.87 29.11 29.13 29.50 30.88

Assume that, after looking at the histogram, we think the distribution of breakdown voltage is normal with mean value µ. What are some point estimators for µ?
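Since the deck closes with an R code slide, here is a minimal R sketch (our addition, not from the original slides) computing three natural point estimates of µ from these data; under a normal model, all three are reasonable estimators of the mean:

```r
# Breakdown voltage observations from the example above
x <- c(24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
       27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88)

mean(x)              # sample mean
median(x)            # sample median (also estimates mu, since a normal is symmetric)
mean(x, trim = 0.1)  # 10% trimmed mean, a compromise between mean and median
```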

Estimator quality

Which estimator is the best? What does "best" mean?

Estimator quality

In the best of all possible worlds, we could find an estimator $\hat{\theta}$ for which $\hat{\theta} = \theta$ always, in all samples. Why doesn't this estimator exist? Because for some samples $\hat{\theta}$ will be too big, and for others too small. If we write

$$\hat{\theta} = \theta + \text{error of estimation},$$

then an accurate estimator is one whose estimation errors are small, so that estimated values are near the true value. It is the distribution of these errors (over all samples) that actually matters for the quality of an estimator.

Measures of estimator quality

A sensible way to quantify the idea of $\hat{\theta}$ being close to θ is to consider the squared error $(\hat{\theta} - \theta)^2$ and the mean squared error

$$\mathrm{MSE} = E[(\hat{\theta} - \theta)^2].$$

If one of two estimators has a smaller MSE than the other, the first is usually the better one. Another good quality is unbiasedness: $E[\hat{\theta}] = \theta$. Yet another is small variance, $\mathrm{Var}[\hat{\theta}]$.
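A small R simulation (our addition) makes this concrete by estimating the MSE of the sample mean and the sample median as estimators of a normal mean; the population values here are arbitrary choices for illustration:

```r
set.seed(1)
mu <- 28; sigma <- 1.4; n <- 20; reps <- 10000

means   <- replicate(reps, mean(rnorm(n, mu, sigma)))
medians <- replicate(reps, median(rnorm(n, mu, sigma)))

mean((means - mu)^2)    # MSE of sample mean, close to sigma^2/n = 0.098
mean((medians - mu)^2)  # MSE of sample median, larger (roughly (pi/2) * sigma^2/n)
```

Both estimators are unbiased for µ here, so the MSE comparison is really a variance comparison, which previews the minimum variance idea below.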

Unbiased Estimators

Suppose we have two measuring instruments: one is accurately calibrated, and the other systematically gives readings smaller than the true value. When each instrument is used repeatedly on the same object, the observed measurements will not be identical, because of measurement error. The measurements produced by the first instrument are distributed symmetrically about the true value, so it is called an unbiased instrument. The second one has a systematic bias, and its measurements are centered around the wrong value.

Example: unbiased estimator of a proportion

If X denotes the number of sample successes and has a binomial distribution with parameters n and p, then the sample proportion X/n can be used as an estimator of p. Can we show that this is an unbiased estimator?
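Yes; the one-line argument the slide asks for (our addition) uses only linearity of expectation and the binomial mean:

$$E\left[\frac{X}{n}\right] = \frac{1}{n}E[X] = \frac{1}{n}\,np = p,$$

so $\hat{p} = X/n$ is unbiased for p.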

Estimators with Minimum Variance

Suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are two estimators of θ that are both unbiased. Then, although the distribution of each estimator is centered at the true value of θ, the spreads of the distributions about the true value may be different. Among all estimators of θ that are unbiased, we will always choose the one that has minimum variance. Why? The resulting $\hat{\theta}$ is called the minimum variance unbiased estimator (MVUE) of θ.

Estimators with Minimum Variance

The figure below pictures the pdfs of two unbiased estimators, $\hat{\theta}_1$ having smaller variance than $\hat{\theta}_2$. Then $\hat{\theta}_1$ is more likely than $\hat{\theta}_2$ to produce an estimate close to the true θ. The MVUE is, in a certain sense, the most likely among all unbiased estimators to produce an estimate close to the true θ.

[Figure: graphs of the pdfs of two different unbiased estimators.]

Reporting a Point Estimate: The Standard Error

Besides reporting the value of a point estimate, some indication of its precision should be given. The standard error of an estimator is its standard deviation. It is the magnitude of a typical or representative deviation between an estimate and the true value θ. Basically, the standard error tells us roughly within what distance of the true value θ the estimator is likely to fall.

The Mean is Unbiased

The following result shows that the arithmetic average is unbiased:

Proposition. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean µ and standard deviation σ. Then

1. $E(\bar{X}) = \mu$
2. $V(\bar{X}) = \sigma^2/n$, and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$

Thus the arithmetic average is an unbiased estimator of the mean, for a random sample of any size from any distribution.
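The proof is short enough to add here: by linearity of expectation and, for the variance, independence of the $X_i$,

$$E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\mu}{n} = \mu,
\qquad
V(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$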

General methods for constructing estimators

We have a sample from a probability distribution (the "model"), but we don't know the parameters of that distribution. How do we find the parameter values that best match our sample data?

Method 1: Method of Moments (MoM):
1. Equate sample characteristics (e.g., the mean or variance) to the corresponding population values.
2. Solve these equations for the unknown parameter values.
3. The solution formula is the estimator (check it for bias).

Method 2: Maximum Likelihood Estimation (MLE).

Statistical Moments

For k = 1, 2, 3, ..., define the k-th population moment, or k-th moment of the distribution f(x), to be $E(X^k)$, and the k-th sample moment to be

$$M_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k.$$

Thus the first population moment is $E(X) = \mu$, and the first sample moment is $M_1 = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$. The second population and sample moments are $E(X^2)$ and $M_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2$, respectively.

The Method of Moments

Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with pmf or pdf $f(x; \theta_1, \ldots, \theta_m)$, where $\theta_1, \ldots, \theta_m$ are parameters whose values are unknown. The moment estimators $\hat{\theta}_1, \ldots, \hat{\theta}_m$ are obtained by equating the first m sample moments to the corresponding first m population moments and solving for $\hat{\theta}_1, \ldots, \hat{\theta}_m$.

If, for example, m = 2, then $E(X)$ and $E(X^2)$ will be functions of $\theta_1$ and $\theta_2$. Setting $E(X) = M_1$ and $E(X^2) = M_2$ gives two equations in $\theta_1$ and $\theta_2$; the solution then defines the estimators.

Example for MoM

Let $X_1, X_2, \ldots, X_n$ represent a random sample of service times of n customers at a certain facility, where the underlying distribution is assumed exponential with parameter λ. What is the MoM estimate for λ?
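Working the answer out (our addition): the exponential distribution has $E(X) = 1/\lambda$, so equating the first population and sample moments gives $1/\lambda = \bar{X}$, and solving yields the moment estimator

$$\hat{\lambda} = \frac{1}{\bar{X}} = \frac{n}{\sum_{i=1}^{n} X_i}.$$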

Example 2 for MoM

Let $X_1, X_2, \ldots, X_n$ represent a random sample from a gamma distribution with parameters a and b. How do we use MoM to estimate a and b?
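A sketch of the solution (our addition, assuming the shape-scale parameterization with $E(X) = ab$ and $V(X) = ab^2$): equating the first two moments,

$$\bar{X} = ab, \qquad M_2 - \bar{X}^2 = ab^2
\;\Rightarrow\;
\hat{b} = \frac{M_2 - \bar{X}^2}{\bar{X}}, \qquad
\hat{a} = \frac{\bar{X}}{\hat{b}} = \frac{\bar{X}^2}{M_2 - \bar{X}^2}.$$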

Method 2: Maximum Likelihood Estimation (MLE)

The method of maximum likelihood was first introduced by R. A. Fisher, a geneticist and statistician, in the 1920s. Most statisticians recommend this method, at least when the sample size is large, since the resulting estimators have many desirable mathematical properties.

Example for MLE

A sample of ten independent bike helmets, just made in factory A, was put up for testing; 3 helmets are flawed. Let p = P(flawed helmet). The probability of X = 3 is

$$P(X = 3) = \binom{10}{3} p^3 (1-p)^7,$$

but the likelihood function is given as

$$L(p \mid \text{sample data}) = p^3 (1-p)^7.$$

The likelihood function is a function of the parameter only. For what value of p is the obtained sample most likely to have occurred? I.e., what value of p maximizes the likelihood?

Example, MLE cont'd

[Figure: graph of the likelihood function $L(p \mid \text{sample data}) = p^3(1-p)^7$ as a function of p.]

Example, MLE cont'd

The natural logarithm of the likelihood:

$$\ell(p \mid \text{sample data}) = \log L(p \mid \text{sample data}) = 3\log(p) + 7\log(1-p).$$

Example, MLE cont'd

We can verify our visual guess by using calculus to find the actual value of p that maximizes the likelihood. Working with the natural log of the likelihood is often easier than working with the likelihood itself. Why? How do you find the maximum of a function?
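Carrying out the calculus step (our addition): set the derivative of the log-likelihood to zero and solve,

$$\frac{d\ell}{dp} = \frac{3}{p} - \frac{7}{1-p} = 0
\;\Rightarrow\; 3(1-p) = 7p
\;\Rightarrow\; \hat{p} = \frac{3}{10}.$$

The log turns the product $p^3(1-p)^7$ into a sum, which is why it is easier to differentiate.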

Example, MLE cont'd

That is, the maximum likelihood estimate produced by the estimator is 0.30. It is called the maximum likelihood estimate because it is the value that maximizes the likelihood of the observed sample; it is the parameter value most strongly supported by the data in the sample.

Question: Why doesn't the likelihood care about constants in the pdf?

Example 2, MLE (in the book's notation)

Suppose $X_1, \ldots, X_n$ is a random sample (iid) from Exp(λ). Because of independence, the joint probability of the data, i.e., the likelihood function, is the product of the pdfs:

$$L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}.$$

How do we find the MLE? What if our data is normally distributed?
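Filling in the steps (our addition): take logs, differentiate, and set to zero,

$$\ell(\lambda) = n\log\lambda - \lambda\sum_{i=1}^{n} x_i,
\qquad
\ell'(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
\;\Rightarrow\;
\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}},$$

which here agrees with the method-of-moments estimator. For normally distributed data, the same approach yields $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$.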

Estimating Functions of Parameters

We've now learned how to obtain MLE formulas for several estimators. Now we look at functions of them.

The Invariance Principle. Let $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m$ be the mles of the parameters $\theta_1, \theta_2, \ldots, \theta_m$. Then the mle of any function $h(\theta_1, \theta_2, \ldots, \theta_m)$ of these parameters is the function $h(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m)$ of the mles.

Example

In the normal case, the mles of µ and σ² are $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$. To obtain the mle of the function $h(\mu, \sigma^2) = \sqrt{\sigma^2} = \sigma$, substitute the mles into the function:

$$\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}.$$

The mle of σ is not the sample standard deviation S, though they are close unless n is quite small.

The Central Limit Theorem

Estimators and Their Distributions

Any estimator, being based on a sample, is a random variable with its own probability distribution, often referred to as the sampling distribution of the estimator. The sampling distribution of a particular estimator depends on:

1) the population distribution (normal, uniform, etc.)
2) the sample size n
3) the method of sampling

The standard deviation of this distribution is called the standard error of the estimator.

Random Samples

The r.v.'s $X_1, X_2, \ldots, X_n$ are said to form a (simple) random sample of size n if

1. The $X_i$'s are independent r.v.'s.
2. Every $X_i$ has the same probability distribution.

We say that such $X_i$'s are independent and identically distributed (iid).

Example

A certain brand of MP3 player comes in three models:
- 2 GB model, priced at $80
- 4 GB model, priced at $100
- 8 GB model, priced at $120

Suppose the probability distribution of the cost X of a single randomly selected MP3 player purchase is given by:

x:    80    100   120
p(x): 0.2   0.3   0.5

From here, µ = 106 and σ² = 244.

Example, cont'd

Suppose on a particular day only two MP3 players are sold. Let $X_1$ = the revenue from the first sale and $X_2$ = the revenue from the second. $X_1$ and $X_2$ are independent, and each has the probability distribution shown above. In other words, $X_1$ and $X_2$ constitute a random sample from that distribution. How do we find the mean and variance of this random sample?

Example, cont'd

The complete sampling distribution of $\bar{X}$ (for n = 2) is:

x̄:    80    90    100   110   120
p(x̄): 0.04  0.12  0.29  0.30  0.25

Original distribution: µ = 106, σ² = 244.

Example, cont'd

What are the mean and variance of this estimator? What do you think the mean and variance would be if we had a sample of four instead of two?

Example, cont'd

If there had been four purchases on the day of interest, the sample average revenue would be based on a random sample of four $X_i$'s, each having the same distribution. More calculation eventually yields the pmf of $\bar{X}$ for n = 4 (table not captured in the transcription). From it,

$$\mu_{\bar{X}} = 106 = \mu \qquad \text{and} \qquad \sigma^2_{\bar{X}} = 61 = \sigma^2/4.$$

Simulation Experiments

With a larger sample size, any unusual x values, when averaged in with the other sample values, still tend to yield an $\bar{x}$ value close to µ. Combining these insights yields a result: $\bar{X}$ based on a large n tends to be closer to µ than $\bar{X}$ based on a small n.

The Distribution of the Sample Mean

Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean value µ and standard deviation σ. Then

1. $E(\bar{X}) = \mu_{\bar{X}} = \mu$
2. $V(\bar{X}) = \sigma^2_{\bar{X}} = \sigma^2/n$

The standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ is also called the standard error of the mean.

Great, but what is the *distribution* of the sample mean?

The Case of a Normal Population Distribution

Proposition: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean µ and standard deviation σ. Then for any n, $\bar{X}$ is normally distributed, with mean µ and standard deviation $\sigma/\sqrt{n}$.

We know everything there is to know about the $\bar{X}$ distribution when the population distribution is normal. In particular, probabilities such as $P(a \le \bar{X} \le b)$ can be obtained simply by standardizing.
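For instance (our added sketch, borrowing µ = 106 and σ² = 244 from the MP3 example and pretending the population were normal; the bounds 100 and 110 are our own illustration), standardizing in R with n = 4:

```r
mu <- 106; sigma <- sqrt(244); n <- 4
se <- sigma / sqrt(n)  # standard deviation (standard error) of the sample mean

# P(100 <= xbar <= 110), computed directly and via standardized z-values
pnorm(110, mean = mu, sd = se) - pnorm(100, mean = mu, sd = se)
pnorm((110 - mu) / se) - pnorm((100 - mu) / se)  # same number
```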

The Case of a Normal Population Distribution

[Figure not captured in the transcription.]

But what if the underlying distribution of the $X_i$'s is not normal?

The Central Limit Theorem

The Central Limit Theorem (CLT)

When the $X_i$'s are normally distributed, so is $\bar{X}$, for every sample size n. Even when the population distribution is highly nonnormal, averaging produces a distribution more bell-shaped than the one being sampled. A reasonable conjecture is that if n is large, a suitable normal curve will approximate the actual distribution of $\bar{X}$. The formal statement of this result is one of the most important theorems in probability: the CLT.

The Central Limit Theorem

Theorem (The Central Limit Theorem, CLT). Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean µ and variance σ². Then if n is sufficiently large, $\bar{X}$ has approximately a normal distribution with $\mu_{\bar{X}} = \mu$ and $\sigma^2_{\bar{X}} = \sigma^2/n$. The larger the value of n, the better the approximation.

The Central Limit Theorem

[Figure: the Central Limit Theorem illustrated.]
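A quick way to see the theorem in action (our addition) is to simulate sample means from a skewed population and watch the histograms become bell-shaped as n grows:

```r
set.seed(42)
reps <- 10000
for (n in c(1, 5, 30)) {
  # 10,000 sample means of size n from a skewed exponential population (mu = 1)
  xbar <- replicate(reps, mean(rexp(n, rate = 1)))
  hist(xbar, breaks = 50, main = paste("n =", n), xlab = "sample mean")
}
```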

Example

The amount of impurity in a batch of a chemical product is a random variable with mean value 4.0 g and standard deviation 1.5 g (distribution unknown). If 50 batches are independently prepared, what is the (approximate) probability that the average amount of impurity in these 50 batches is between 3.5 and 3.8 g?

Side note: according to the rule of thumb stated shortly, n = 50 is large enough for the CLT to be applicable.
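Carrying out the computation (our addition): the CLT gives $\bar{X}$ approximately $N(4.0,\ 1.5/\sqrt{50})$, so

$$P(3.5 \le \bar{X} \le 3.8)
\approx \Phi\!\left(\frac{3.8 - 4.0}{1.5/\sqrt{50}}\right) - \Phi\!\left(\frac{3.5 - 4.0}{1.5/\sqrt{50}}\right)
= \Phi(-0.94) - \Phi(-2.36) \approx 0.164.$$

In R: `pnorm(3.8, 4, 1.5/sqrt(50)) - pnorm(3.5, 4, 1.5/sqrt(50))`.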

The Central Limit Theorem

The CLT provides insight into why many random variables have probability distributions that are approximately normal. For example, the measurement error in a scientific experiment can be thought of as the sum of a number of underlying perturbations and errors of small magnitude.

A practical difficulty in applying the CLT is knowing when n is sufficiently large. The problem is that the accuracy of the approximation for a particular n depends on the shape of the original underlying distribution being sampled.

The Central Limit Theorem

If the underlying distribution is close to a normal density curve, the approximation will be good even for small n, whereas if it is far from normal, a large n will be required.

Rule of Thumb: if n > 30, the Central Limit Theorem can be used.

R Code

[The R code from this slide was not captured in the transcription.]
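As a stand-in (our sketch; we cannot know what the original slide showed), here is R code tying together the estimation ideas above: simulate exponential service times and recover λ both with the closed-form MoM/MLE formula and with a numerical maximum likelihood fit:

```r
library(MASS)  # for fitdistr()

set.seed(7)
x <- rexp(40, rate = 0.5)   # simulated service times; true lambda = 0.5

1 / mean(x)                 # closed-form MoM / MLE estimate of lambda
fitdistr(x, "exponential")  # numerical MLE; the rate estimate equals 1/mean(x)
```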