Applied Statistics I


Liang Zhang, Department of Mathematics, University of Utah
July 14, 2008

Point Estimation

Problem: when there is more than one point estimator for a parameter $\theta$, which one should we use?

There are a few criteria for selecting the best point estimator: unbiasedness, minimum variance, and mean square error.

Point Estimation

Definition
A point estimator $\hat{\theta}$ is said to be an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$ for every possible value of $\theta$. If $\hat{\theta}$ is not unbiased, the difference $E(\hat{\theta}) - \theta$ is called the bias of $\hat{\theta}$.

Principle of Unbiased Estimation
When choosing among several different estimators of $\theta$, select one that is unbiased.

Point Estimation

Proposition
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then the estimators
$$\hat{\mu} = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad \text{and} \qquad \hat{\sigma}^2 = S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
are unbiased estimators of $\mu$ and $\sigma^2$, respectively.
If in addition the distribution is continuous and symmetric, then the sample median $\tilde{X}$ and any trimmed mean are also unbiased estimators of $\mu$.
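The unbiasedness of $S^2$ can be checked exactly on a toy case. The sketch below assumes a hypothetical population that is uniform on {0, 1, 2} (not from the slides) and enumerates every equally likely sample of size 3, so the expectations are exact fractions rather than simulation estimates. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def sample_variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by (n - ddof)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


def expected_value(stat, support, n):
    """Exact expectation of stat over all equally likely samples of size n."""
    samples = list(product(support, repeat=n))
    return sum(stat(s) for s in samples) / len(samples)


support = (0, 1, 2)  # hypothetical population, each value equally likely
n = 3

mu = Fraction(sum(support), len(support))
sigma2 = sum((x - mu) ** 2 for x in support) / len(support)

e_s2 = expected_value(lambda s: sample_variance(s, 1), support, n)      # divisor n - 1
e_biased = expected_value(lambda s: sample_variance(s, 0), support, n)  # divisor n

print(sigma2, e_s2, e_biased)  # 2/3 2/3 4/9
```

With divisor $n - 1$ the expectation equals $\sigma^2$ exactly, while divisor $n$ gives $\frac{n-1}{n}\sigma^2$, which is too small on average.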

Point Estimation

Principle of Minimum Variance Unbiased Estimation
Among all estimators of $\theta$ that are unbiased, choose the one that has minimum variance. The resulting $\hat{\theta}$ is called the minimum variance unbiased estimator (MVUE) of $\theta$.

Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then the estimator $\hat{\mu} = \bar{X}$ is the MVUE for $\mu$.

Point Estimation

Definition
Let $\hat{\theta}$ be a point estimator of a parameter $\theta$. Then the quantity $E[(\hat{\theta} - \theta)^2]$ is called the mean square error (MSE) of $\hat{\theta}$.

Proposition
$$\mathrm{MSE} = E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2$$
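The slides state the decomposition without proof. One standard way to see it is to add and subtract $E(\hat{\theta})$ inside the square; the cross term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$ and $E(\hat{\theta}) - \theta$ is a constant:

```latex
\begin{aligned}
E[(\hat{\theta} - \theta)^2]
  &= E\big[\big(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta\big)^2\big] \\
  &= E\big[(\hat{\theta} - E(\hat{\theta}))^2\big]
     + 2\,[E(\hat{\theta}) - \theta]\,E\big[\hat{\theta} - E(\hat{\theta})\big]
     + [E(\hat{\theta}) - \theta]^2 \\
  &= V(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2 .
\end{aligned}
```

In particular, for an unbiased estimator the MSE is just its variance.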

Point Estimation

Definition
The standard error of an estimator $\hat{\theta}$ is its standard deviation $\sigma_{\hat{\theta}} = \sqrt{V(\hat{\theta})}$. If the standard error itself involves unknown parameters whose values can be estimated, substitution of these estimates into $\sigma_{\hat{\theta}}$ yields the estimated standard error (estimated standard deviation) of the estimator. The estimated standard error can be denoted either by $\hat{\sigma}_{\hat{\theta}}$ or by $s_{\hat{\theta}}$.
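As an illustration, take $\hat{\theta} = \bar{X}$, whose standard error is $\sigma/\sqrt{n}$. The sketch below borrows the ten manufacturing times and the value $\sigma = 2.7$ from the example later in these slides; when $\sigma$ is unknown, the sample standard deviation $s$ is substituted for it. The variable names are illustrative only.

```python
import math
import statistics

# Ten manufacturing times from the example later in the slides.
times = [63.8, 60.5, 65.3, 65.7, 61.9, 68.2, 68.1, 64.8, 65.8, 65.4]
n = len(times)

# Standard error of the sample mean when sigma is known (sigma = 2.7 in the example).
se_known = 2.7 / math.sqrt(n)

# Estimated standard error: replace sigma with the sample standard deviation s
# (statistics.stdev uses the n - 1 divisor).
s = statistics.stdev(times)
se_estimated = s / math.sqrt(n)

print(round(se_known, 4), round(se_estimated, 4))
```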

Methods of Point Estimation

The Invariance Principle
Let $\hat{\theta}$ be the mle of the parameter $\theta$. Then the mle of any function $h(\theta)$ of this parameter is the function $h(\hat{\theta})$.

Proposition
Under very general conditions on the joint distribution of the sample, when the sample size $n$ is large, the maximum likelihood estimator of any parameter $\theta$ is approximately unbiased [$E(\hat{\theta}) \approx \theta$] and has variance that is nearly as small as can be achieved by any estimator. Stated another way, the mle $\hat{\theta}$ is approximately the MVUE of $\theta$.

Example (a variant of Problem 62, Ch5)
The total time for manufacturing a certain component is known to have a normal distribution. However, the mean $\mu$ and variance $\sigma^2$ of the normal distribution are unknown. In an experiment in which we manufactured 10 components, we recorded the following sample times:

Component   1     2     3     4     5     6     7     8     9     10
Time        63.8  60.5  65.3  65.7  61.9  68.2  68.1  64.8  65.8  65.4

We know that both the MME and the MLE for the population mean $\mu$ are the sample mean $\bar{X}$, i.e. $\hat{\mu} = \bar{x} = 64.95$. How accurate is this estimate?

Assume the other parameter $\sigma$ is known, e.g. $\sigma = 2.7$.

$\bar{X}$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$. Therefore,
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is a standard normal random variable.

For the interval $[-A, A]$, how large should $A$ be so that $Z$ falls in that interval with probability 0.95?
$$P(-A < Z < A) = 0.95$$
$A$ is the 97.5th percentile of the standard normal distribution, which is 1.96. Hence
$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
and, solving the inequalities for $\mu$,
$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
The interval $\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ is called the 95% confidence interval for $\mu$.

In our case ($\bar{x} = 64.95$, $\sigma = 2.7$, $n = 10$), the 95% confidence interval for $\mu$ is (63.28, 66.62).

Interpretation of Confidence Interval

The 95% confidence interval (63.28, 66.62) for $\mu$ does not mean
$$P(\mu \text{ falls in the interval } (63.28, 66.62)) = 0.95$$
Once the interval is computed, its endpoints are fixed numbers and $\mu$ is a fixed (unknown) constant, so nothing random remains.

It is a long-run effect: if we draw 1000 random samples and compute the interval from each one, then for approximately 950 of them $\mu$ falls in the interval $\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$.
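The long-run reading can be illustrated by simulation. The sketch below assumes a hypothetical true mean of 65 (in practice $\mu$ is unknown) together with $\sigma = 2.7$ and $n = 10$ from the example, draws 1000 samples, and counts how often the interval built from each sample contains the true mean. The seed and function name are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean


def interval_covers(mu, sigma, n, rng, z):
    """Draw one normal sample of size n and report whether its z-interval contains mu."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(sample)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width < mu < xbar + half_width


rng = random.Random(2008)            # fixed seed so the run is reproducible
z = NormalDist().inv_cdf(0.975)      # 97.5th percentile, about 1.96
mu_true, sigma, n, reps = 65.0, 2.7, 10, 1000   # mu_true is a hypothetical value

hits = sum(interval_covers(mu_true, sigma, n, rng, z) for _ in range(reps))
coverage = hits / reps
print(f"{hits} of {reps} intervals contain mu (coverage {coverage:.3f})")
```

The count varies from run to run around 950; it is the procedure, not any single interval, that has the 95% property.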

Example (a variant of Problem 62, Ch5)
The total time for manufacturing a certain component is known to have a normal distribution, with unknown mean $\mu$. For the same ten recorded times as before, both the MME and the MLE for $\mu$ are the sample mean, i.e. $\hat{\mu} = \bar{x} = 64.95$. We further assume the standard deviation is known to be $\sigma = 2.7$. What is the 99% confidence interval for $\mu$?

Definition
A $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by
$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
or, equivalently, by $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
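A minimal sketch of this definition in Python, using the standard library's `statistics.NormalDist` for $z_{\alpha/2}$. The function name `z_interval` is illustrative; the inputs are the values from the manufacturing example ($\bar{x} = 64.95$, $\sigma = 2.7$, $n = 10$).

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """100*confidence % interval for a normal mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


ci95 = z_interval(64.95, 2.7, 10, 0.95)
ci99 = z_interval(64.95, 2.7, 10, 0.99)
print([round(v, 2) for v in ci95])  # [63.28, 66.62]
print([round(v, 2) for v in ci99])  # [62.75, 67.15]
```

Raising the confidence level from 95% to 99% widens the interval, since $z_{\alpha/2}$ grows from about 1.96 to about 2.58.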

Graphical interpretation: [figure not reproduced in the transcription]

Example (a variant of Problem 62, Ch5)
The total time for manufacturing a certain component is known to have a normal distribution. However, the mean $\mu$ of the normal distribution is unknown. We therefore decide to run an experiment in which we manufacture $n$ components to estimate the population mean $\mu$. We know that both the MME and the MLE for $\mu$ are the sample mean, i.e. $\hat{\mu} = \bar{X}$. We further assume the standard deviation is known to be $\sigma = 2.7$. If we want a 99% confidence interval for $\mu$ with width 3.34, how large should $n$ be?

Proposition
To obtain a $100(1 - \alpha)\%$ confidence interval with width $w$ for the mean $\mu$ of a normal population when the value of $\sigma$ is known, we need a random sample of size at least
$$n = \left(2 z_{\alpha/2} \cdot \frac{\sigma}{w}\right)^2$$

Remark: The half-width $\frac{w}{2}$ of the $100(1 - \alpha)\%$ CI is called the bound on the error of estimation associated with a $100(1 - \alpha)\%$ confidence level.
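Applied to the preceding example ($\sigma = 2.7$, $w = 3.34$, 99% confidence), the formula can be evaluated as below. This is a sketch with an illustrative function name; the result is rounded up because $n$ must be an integer and the formula gives a lower bound.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, width, confidence):
    """Smallest integer n with 2 * z_{alpha/2} * sigma / sqrt(n) <= width."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil((2.0 * z * sigma / width) ** 2)


n_needed = required_sample_size(2.7, 3.34, 0.99)
print(n_needed)  # 18
```

The unrounded value is about 17.3, so 17 observations would give an interval slightly wider than 3.34.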

Example:
Extensive experience with fans of a certain type used in diesel engines has suggested that the exponential distribution provides a good model for time until failure. However, the parameter $\lambda$ is unknown. The following table records the data for a sample of size 10:

Fan    1      2      3      4      5      6      7      8      9      10
Time   1.199  0.105  0.373  0.266  0.888  0.574  0.244  0.008  0.689  0.235

What is a 95% confidence interval for $\lambda$?

Proposition
Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables from an exponential distribution with parameter $\lambda$. Then the random variable $Y = 2\lambda \sum_{i=1}^{n} X_i$ has the chi-squared distribution with $2n$ degrees of freedom, i.e., $Y \sim \chi^2(2n)$.
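The slides end with this proposition. One standard way to use it, sketched here as an assumption about the intended next step, is to treat $Y$ as a pivot: if $a$ and $b$ are the 2.5th and 97.5th percentiles of $\chi^2(2n)$, then $P(a < 2\lambda \sum X_i < b) = 0.95$, which gives the interval $\left(\frac{a}{2\sum x_i},\ \frac{b}{2\sum x_i}\right)$ for $\lambda$. To stay within the Python standard library, the code below uses the closed-form CDF available for even degrees of freedom and finds the percentiles by bisection; the helper names are illustrative.

```python
import math


def chi2_cdf_even(x, df):
    """CDF of chi-squared with even df: 1 - exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p, df, lo=0.0, hi=1000.0):
    """Invert the CDF by bisection (the CDF is increasing in x)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Failure times from the fan example above.
times = [1.199, 0.105, 0.373, 0.266, 0.888, 0.574, 0.244, 0.008, 0.689, 0.235]
n, total_time = len(times), sum(times)

a = chi2_quantile_even(0.025, 2 * n)   # about 9.59 for 20 degrees of freedom
b = chi2_quantile_even(0.975, 2 * n)   # about 34.17 for 20 degrees of freedom

lambda_lo = a / (2.0 * total_time)
lambda_hi = b / (2.0 * total_time)
print(round(lambda_lo, 2), round(lambda_hi, 2))  # 1.05 3.73
```

Unlike the normal-mean interval, this interval is not symmetric about the point estimate $1/\bar{x}$, because the chi-squared distribution is skewed.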