Chapter 8 continued

Chapter 8: Sampling distributions of estimators

Sections:
8.1 Sampling distribution of a statistic
8.2 The Chi-square distributions
8.3 Joint Distribution of the sample mean and sample variance (skip pp. 476-478)
8.4 The t distributions (skip the derivation of the pdf, pp. 483-484)
8.5 Confidence intervals
8.6 Bayesian Analysis of Samples from a Normal Distribution
8.7 Unbiased Estimators
8.8 Fisher Information

Sampling Distributions 1 / 30
Review from Sections 8.1-8.4

- Chi-square distribution: $\chi^2_m$ is the same as Gamma$(\alpha = m/2, \beta = 1/2)$.
- The $t_m$ distribution: if $Y \sim \chi^2_m$ and $Z \sim N(0, 1)$ are independent, then $\frac{Z}{\sqrt{Y/m}} \sim t_m$.
- Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$.
  - If $\mu$ is known but $\sigma$ is not: $\frac{n \hat{\sigma}_0^2}{\sigma^2} \sim \chi^2_n$, where $\hat{\sigma}_0^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2$.
  - If both $(\mu, \sigma)$ are unknown: $\frac{S_n}{\sigma^2} \sim \chi^2_{n-1}$, where $S_n = \sum_{i=1}^n (X_i - \bar{X}_n)^2$, and $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\hat{\sigma}} \sim t_{n-1}$, where $\hat{\sigma} = \left[ \frac{\sum_{i=1}^n (X_i - \bar{X}_n)^2}{n-1} \right]^{1/2}$.
8.5 Confidence intervals

Confidence Interval: a frequentist tool

- Say we want to estimate $\theta$, or in general $g(\theta)$.
- We also want to know how good that estimate is.

Def: Confidence Interval (CI)
Let $X_1, \ldots, X_n$ be a random sample from $f(x \mid \theta)$, where $\theta$ is unknown (but not random). Let $g(\theta)$ be a real-valued function and let $A$ and $B$ be statistics such that
$$P(A < g(\theta) < B) \geq \gamma \quad \text{for all } \theta.$$
The random interval $(A, B)$ is called a $100\gamma\%$ confidence interval for $g(\theta)$. If equality holds for all $\theta$, the CI is exact.

After the random variables $X_1, \ldots, X_n$ have been observed and the values $A = a$ and $B = b$ have been computed, the interval $(a, b)$ is called the observed confidence interval.
8.5 Confidence intervals

Confidence Interval - Mean of a Normal Distribution

Last time we saw the following example.
- Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$.
- Let $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$ and $\hat{\sigma} = \left( \frac{\sum_{i=1}^n (X_i - \bar{X}_n)^2}{n-1} \right)^{1/2}$.
- Then we know that $U = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\hat{\sigma}}$ has the $t_{n-1}$ distribution.
- We can therefore calculate $\gamma = P(-c < U < c)$. Turning this around, we get
$$\gamma = P\left( \bar{X}_n - c \frac{\hat{\sigma}}{\sqrt{n}} < \mu < \bar{X}_n + c \frac{\hat{\sigma}}{\sqrt{n}} \right).$$
8.5 Confidence intervals

Confidence Interval - Mean of a Normal Distribution

Let $T_m(x)$ denote the cdf of the $t_m$ distribution. Given $\gamma$ we can find $c$ so that $P(-c < U < c) = \gamma$:
$$\gamma = P(-c < U < c) = 2T_{n-1}(c) - 1,$$
since the $t$ distribution is symmetric around 0. Solving for $c$ we get
$$c = T_{n-1}^{-1}\left( \frac{\gamma + 1}{2} \right),$$
where $T_{n-1}^{-1}$ is the quantile function of the $t_{n-1}$ distribution. So a $100\gamma\%$ confidence interval for $\mu$ is
$$\left( \bar{X}_n - T_{n-1}^{-1}\left( \frac{\gamma + 1}{2} \right) \frac{\hat{\sigma}}{\sqrt{n}}, \;\; \bar{X}_n + T_{n-1}^{-1}\left( \frac{\gamma + 1}{2} \right) \frac{\hat{\sigma}}{\sqrt{n}} \right).$$
8.5 Confidence intervals

Example: Hotdogs (Exercise 8.5.7 in the book)

Data on calorie content in 20 different beef hot dogs from Consumer Reports (June 1986 issue):
186, 181, 176, 149, 184, 190, 158, 139, 175, 148, 152, 111, 141, 153, 190, 157, 131, 149, 135, 132

Assume that these numbers are observed values from a random sample of twenty independent $N(\mu, \sigma^2)$ random variables, where $\mu$ and $\sigma^2$ are unknown.
- The observed sample mean and $\hat{\sigma}$ are $\bar{X}_n = 156.85$ and $\hat{\sigma} = 22.64201$.
- Find a 95% confidence interval for $\mu$.
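As a sketch of how this interval can be computed numerically (Python with NumPy and SciPy; the variable names are my own):

```python
import numpy as np
from scipy import stats

# Hotdog calorie data from Exercise 8.5.7
calories = np.array([186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
                     152, 111, 141, 153, 190, 157, 131, 149, 135, 132])
n = len(calories)
xbar = calories.mean()             # sample mean, 156.85
sigma_hat = calories.std(ddof=1)   # divides by n - 1, about 22.642
gamma = 0.95
c = stats.t.ppf((gamma + 1) / 2, df=n - 1)  # quantile T^{-1}_{n-1}((gamma+1)/2)
half_width = c * sigma_hat / np.sqrt(n)
ci = (xbar - half_width, xbar + half_width)
```

This gives roughly $(146.25, 167.45)$, which agrees with the observed 95% interval quoted on the interpretation slide.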
8.5 Confidence intervals

Interpretation of a confidence interval

Confidence intervals are a frequentist tool. We know that
$$P\left( \bar{X}_n - T_{n-1}^{-1}\left( \frac{\gamma + 1}{2} \right) \frac{\hat{\sigma}}{\sqrt{n}} < \mu < \bar{X}_n + T_{n-1}^{-1}\left( \frac{\gamma + 1}{2} \right) \frac{\hat{\sigma}}{\sqrt{n}} \right) = \gamma.$$
After observing the data we obtain the observed interval. For example: $(146.25, 167.45)$ is an observed 95% confidence interval for $\mu$.

That does NOT mean that $P(146.25 < \mu < 167.45) = 0.95$. For this statement to make sense we need Bayesian thinking and Bayesian methods.
8.5 Confidence intervals

Interpretation of a confidence interval

Confidence intervals are a frequentist tool. One way of thinking of this: repeated samples.
- Take a random sample of size $n$ from $N(\mu, \sigma^2)$ and calculate the 95% confidence interval.
- Take another random sample (of the same size $n$) and do the same calculations.
- Repeat. Many times.
Since there is a 95% chance that the random interval covers the value of $\mu$, we expect 95% of the intervals to cover the actual value of $\mu$.

Problem: We never take more than one sample!
8.5 Confidence intervals

Properties of a confidence interval - Simulation Study
- I simulated $n = 20$ random variables from $N(8, 2^2)$ and calculated the 95% CI.
- I repeated that 100 times.
- 4 of the 100 intervals do not cover $\mu = 8$ (red intervals).
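A sketch of this kind of coverage experiment in Python (the seed is arbitrary; any run should give coverage close to 95%):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)          # arbitrary seed
n, mu, sigma, reps = 20, 8.0, 2.0, 100
c = stats.t.ppf(0.975, df=n - 1)
covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)
    half = c * x.std(ddof=1) / np.sqrt(n)
    if x.mean() - half < mu < x.mean() + half:
        covered += 1
# covered is typically close to 95 out of 100
```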
8.5 Confidence intervals

Non-symmetric confidence intervals - Mean of the normal distribution

More generally we want to find $P(c_1 < U < c_2) = \gamma$.
- Symmetric confidence interval: equal probability on either side,
$$P(U \leq c_1) = P(U \geq c_2) = \frac{1 - \gamma}{2}.$$
Since the distribution of $U$ is symmetric around 0, the shortest possible confidence interval for $\mu$ is the symmetric one.
- One-sided confidence interval: all the extra probability is on one side. That is, either $c_1 = -\infty$ or $c_2 = \infty$.
8.5 Confidence intervals

One-sided Confidence Interval

Def: Lower bound
Let $A$ be a statistic such that
$$P(A < g(\theta)) \geq \gamma \quad \text{for all } \theta.$$
- The random interval $(A, \infty)$ is a one-sided $100\gamma\%$ confidence interval for $g(\theta)$.
- $A$ is a $100\gamma\%$ lower confidence limit for $g(\theta)$.
8.5 Confidence intervals

One-sided Confidence Interval

Def: Upper bound
Let $B$ be a statistic such that
$$P(g(\theta) < B) \geq \gamma \quad \text{for all } \theta.$$
- The random interval $(-\infty, B)$ is a one-sided $100\gamma\%$ confidence interval for $g(\theta)$.
- $B$ is a $100\gamma\%$ upper confidence limit for $g(\theta)$.
8.5 Confidence intervals

One-sided Confidence Interval - Mean of a normal

Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, both $\mu$ and $\sigma^2$ unknown.
- Find the one-sided $100\gamma\%$ confidence intervals for $\mu$.
- Find the observed 95% upper confidence limit for $\mu$ for the hotdog example.
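One way to compute the observed one-sided limits for the hotdog data (a sketch in Python; variable names are mine). The only change from the two-sided case is using the $\gamma$ quantile rather than the $(\gamma + 1)/2$ quantile:

```python
import numpy as np
from scipy import stats

calories = np.array([186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
                     152, 111, 141, 153, 190, 157, 131, 149, 135, 132])
n = len(calories)
xbar, sigma_hat = calories.mean(), calories.std(ddof=1)
gamma = 0.95
c = stats.t.ppf(gamma, df=n - 1)                   # note: gamma, not (gamma+1)/2
upper_limit = xbar + c * sigma_hat / np.sqrt(n)    # interval (-inf, B)
lower_limit = xbar - c * sigma_hat / np.sqrt(n)    # interval (A, inf)
```

The observed 95% upper confidence limit for $\mu$ comes out to about 165.6.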
8.5 Confidence intervals

Confidence intervals for other distributions

Def: Pivotal quantity
Let $X = (X_1, \ldots, X_n)$ be a random sample from a distribution that depends on a parameter $\theta$. Let $V(X, \theta)$ be a random variable whose distribution is the same for all $\theta$. Then $V$ is called a pivotal quantity.

To use this we need to be able to invert the pivotal relationship: find a function $r(v, x)$ such that
$$r(V(X, \theta), X) = g(\theta).$$
If the function $r$ is increasing in $v$ for every $x$, $V$ has a continuous distribution with cdf $F(v)$, and $\gamma_2 - \gamma_1 = \gamma$, then
$$A = r\left(F^{-1}(\gamma_1), X\right) \quad \text{and} \quad B = r\left(F^{-1}(\gamma_2), X\right)$$
are the endpoints of an exact $100\gamma\%$ confidence interval (Theorem 8.5.3).
8.5 Confidence intervals

Confidence intervals using pivotal quantities

Example: the rate parameter $\theta$ of the exponential distribution. $X_1, \ldots, X_n$ i.i.d. Expo$(\theta)$.
- Find the $100\gamma\%$ upper confidence limit for $\theta$.
- Find a symmetric $100\gamma\%$ confidence interval for $\theta$.

Example: the variance of the normal distribution. $X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$, both unknown.
- Find a symmetric $100\gamma\%$ confidence interval for $\sigma^2$.
- Find the observed symmetric $100\gamma\%$ confidence interval for $\sigma^2$ for the hotdog example.
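For the normal-variance case the pivotal quantity is $S_n / \sigma^2 \sim \chi^2_{n-1}$, which inverts to the interval $\left( S_n / \chi^2_{n-1, (1+\gamma)/2}, \; S_n / \chi^2_{n-1, (1-\gamma)/2} \right)$. A sketch of the computation for the hotdog data (note the large chi-square quantile produces the lower endpoint):

```python
import numpy as np
from scipy import stats

calories = np.array([186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
                     152, 111, 141, 153, 190, 157, 131, 149, 135, 132])
n = len(calories)
s_n = ((calories - calories.mean()) ** 2).sum()   # S_n = sum of squared deviations
gamma = 0.95
# dividing S_n by the large quantile gives the lower endpoint, and vice versa
lower = s_n / stats.chi2.ppf((1 + gamma) / 2, df=n - 1)
upper = s_n / stats.chi2.ppf((1 - gamma) / 2, df=n - 1)
```

For this data the observed interval for $\sigma^2$ is roughly $(296.5, 1093.6)$.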
8.5 Confidence intervals

Problems with the interpretation of a confidence interval

Example 8.5.11 is an interesting example.
- Say $X_1, X_2$ are i.i.d. Uniform$(\theta - 0.5, \theta + 0.5)$.
- Let $Y_1 = \min(X_1, X_2)$ and $Y_2 = \max(X_1, X_2)$. Then $(Y_1, Y_2)$ is a 50% confidence interval for $\theta$.
- However: if we observe values of $Y_1$ and $Y_2$ that are more than 0.5 apart, that is $y_2 - y_1 > 0.5$, then we know with certainty that $(y_1, y_2)$ contains $\theta$! Yet we only assign 50% confidence to that interval, which ignores information we have.
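Both claims can be checked by simulation (a sketch; the seed and value of $\theta$ are arbitrary): overall the interval $(Y_1, Y_2)$ covers $\theta$ about half the time, yet every interval wider than 0.5 covers $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, reps = 0.0, 100_000
x = rng.uniform(theta - 0.5, theta + 0.5, size=(reps, 2))
y1, y2 = x.min(axis=1), x.max(axis=1)
covers = (y1 < theta) & (theta < y2)   # overall coverage is about 0.5
wide = (y2 - y1) > 0.5                 # these intervals always contain theta
```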
8.7 Unbiased Estimators

Suppose that we use an estimator $\delta(X)$ to estimate the parameter $g(\theta)$.
- Properties of an estimator so far: consistency and sufficiency.
- Another property of an estimator: unbiasedness.

Def: Unbiased Estimator / Bias
An estimator $\delta(X)$ is an unbiased estimator of $g(\theta)$ if
$$E(\delta(X)) = g(\theta) \quad \text{for all } \theta.$$
Otherwise it is called a biased estimator. The bias is defined as $E(\delta(X)) - g(\theta)$.
8.7 Unbiased Estimators

Examples
- $X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$: $\bar{X}_n$ is an unbiased estimator of $\mu$ since $E(\bar{X}_n) = \mu$ for all $\mu$.
- Unbiased estimators of the mean and variance of any distribution: let $X_1, \ldots, X_n$ be a random sample from $f(x \mid \theta)$. The mean and variance of the distribution (if they exist) are functions of $\theta$.
  - $\bar{X}_n$ is an unbiased estimator of the mean $E(X_1)$.
  - Theorem 8.7.1: if the variance is finite then $\hat{\sigma}_1^2$ is an unbiased estimator of Var$(X)$, where
$$\hat{\sigma}_1^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2.$$
- Note: This means that the MLE of $\sigma^2$ in $N(\mu, \sigma^2)$ is a biased estimator.
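A quick numerical check of Theorem 8.7.1 (a sketch; the exponential distribution, seed, and sample size are arbitrary choices, since any distribution with finite variance works):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000
x = rng.exponential(scale=2.0, size=(reps, n))   # Var(X) = 2^2 = 4
unbiased = x.var(axis=1, ddof=1)    # divides by n - 1
mle_style = x.var(axis=1, ddof=0)   # divides by n
# E[unbiased] is about 4, while E[mle_style] is about 4 * (n-1)/n = 3.6
```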
8.7 Unbiased Estimators

Mean Squared Error (MSE)

Is unbiasedness good enough?
- An unbiased estimator is useless if it has high variance.
- Look for unbiased estimators with the lowest variance.

Mean squared error: $E\left( (\delta(X) - g(\theta))^2 \right)$. We want estimators with small MSE.

Corollary 8.7.1: Let $\delta(X)$ be an estimator with finite variance. Then
$$\text{MSE}(\delta(X)) = \text{Var}(\delta(X)) + \text{bias}(\delta(X))^2.$$
- The MSE of an unbiased estimator is equal to its variance.
- Searching for unbiased estimators with small variance is equivalent to searching for unbiased estimators with small MSE.
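The decomposition in Corollary 8.7.1 follows by adding and subtracting $E(\delta(X))$ inside the square; a sketch of the one-line derivation:

```latex
\begin{align*}
\mathrm{MSE}(\delta(X))
  &= E\bigl[(\delta(X) - g(\theta))^2\bigr] \\
  &= E\bigl[\bigl(\delta(X) - E(\delta(X)) + E(\delta(X)) - g(\theta)\bigr)^2\bigr] \\
  &= \underbrace{E\bigl[(\delta(X) - E(\delta(X)))^2\bigr]}_{\mathrm{Var}(\delta(X))}
   + \underbrace{\bigl(E(\delta(X)) - g(\theta)\bigr)^2}_{\mathrm{bias}(\delta(X))^2},
\end{align*}
% the cross term 2 (E(\delta(X)) - g(\theta)) \, E[\delta(X) - E(\delta(X))]
% vanishes because E[\delta(X) - E(\delta(X))] = 0
```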
8.7 Unbiased Estimators

Example

Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$ (both $\mu$ and $\sigma^2$ unknown). Consider two estimators of $\sigma^2$:
- $\delta_1 = S_n / n$ (the MLE of $\sigma^2$)
- $\delta_2 = \hat{\sigma}_1^2 = S_n / (n-1)$ (unbiased)

Find the MSE of each estimator. Which estimator has smaller MSE? Which estimator do you prefer?
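A simulation sketch comparing the two estimators. For $\sigma^2 = 4$ and $n = 20$ I worked out the theoretical MSEs from the $\chi^2_{n-1}$ moments as $(2n-1)\sigma^4/n^2 \approx 1.56$ for the MLE and $2\sigma^4/(n-1) \approx 1.68$ for the unbiased estimator, so treat these exact constants as my own working rather than a quote from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 20, 4.0, 200_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s_n = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
mse_mle = np.mean((s_n / n - sigma2) ** 2)             # theory: (2n-1) sigma^4 / n^2
mse_unbiased = np.mean((s_n / (n - 1) - sigma2) ** 2)  # theory: 2 sigma^4 / (n-1)
# the biased MLE has the smaller MSE
```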
8.7 Unbiased Estimators

Why unbiased?
- It sounds good: who wants to be biased?
- However, the variance and the MSE are better measures of the quality of an estimator.
- In many cases there exist biased estimators with smaller MSE.
8.8 Fisher Information

Let the pdf of $X$ be $f(x \mid \theta)$. The Fisher information $I(\theta)$ in the random variable $X$ is defined as
$$I(\theta) = E\left\{ \left[ \frac{d \log f(X \mid \theta)}{d\theta} \right]^2 \right\}.$$
Under mild conditions, we have (Theorem 8.8.1)
$$I(\theta) = \text{Var}\left[ \frac{d \log f(X \mid \theta)}{d\theta} \right] = -E\left[ \frac{d^2 \log f(X \mid \theta)}{d\theta^2} \right].$$
For a random sample $X_1, \ldots, X_n$, the Fisher information $I_n(\theta)$ satisfies
$$I_n(\theta) = n I(\theta).$$
8.8 Fisher Information

Cramér-Rao Inequality

Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf $f(x \mid \theta)$. For any statistic $T$, let $m(\theta) = E(T)$. Then under mild conditions, we have
$$\text{Var}(T) \geq \frac{[m'(\theta)]^2}{n I(\theta)}.$$
- (Corollary 8.8.1) If $T$ is an unbiased estimator of $\theta$, then
$$\text{Var}(T) \geq \frac{1}{n I(\theta)}.$$
- An estimator is called an efficient estimator of its expectation if it achieves the lower bound in the Cramér-Rao inequality.
- Example: $X_1, \ldots, X_n$ is a random sample from Poisson$(\theta)$. Show that the MLE is an efficient estimator of $\theta$.
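For the Poisson example, $\frac{d^2}{d\theta^2} \log f(X \mid \theta) = -X/\theta^2$, so $I(\theta) = 1/\theta$; the MLE is $\bar{X}_n$ with $\text{Var}(\bar{X}_n) = \theta/n = 1/(n I(\theta))$, which is exactly the Cramér-Rao bound. A simulation sketch of this (the values of $\theta$, $n$, and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 3.0, 50, 100_000
x = rng.poisson(theta, size=(reps, n))
mle = x.mean(axis=1)   # the MLE of theta is the sample mean
var_mle = mle.var()
crlb = theta / n       # 1 / (n * I(theta)) with I(theta) = 1 / theta
# var_mle matches crlb, so the MLE attains the bound (it is efficient)
```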
8.8 Fisher Information

Asymptotic Distribution of the MLE

Theorem 8.8.5: Let $\hat{\theta}_n$ be the MLE of $\theta$. Then under mild conditions, we have
$$[n I(\theta)]^{1/2} (\hat{\theta}_n - \theta) \xrightarrow{d} N(0, 1).$$
- The MLE is asymptotically efficient.
8.6 Bayesian Analysis of Samples from a Normal Distribution

A Bayesian alternative to confidence intervals
- Bayesian inference is based on the posterior distribution.
- Reporting a whole distribution may not be what you (or your client) want.
- Point estimates: Bayesian estimators minimize the expected loss.
- Interval estimates: simply use quantiles of the posterior distribution. For example, we can find constants $c_1$ and $c_2$ so that
$$P(c_1 < \theta < c_2 \mid X = x) \geq \gamma.$$
The interval $(c_1, c_2)$ is called a $100\gamma\%$ credible interval for $\theta$.
- Note: The interpretation is very different from the interpretation of confidence intervals.
8.6 Bayesian Analysis of Samples from a Normal Distribution

Example: the normal distribution

Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. In Chapter 7.3 we saw: if $\sigma^2$ is known, the normal distribution is a conjugate prior for $\mu$.

Theorem 7.3.3: If the prior is $\mu \sim N(\mu_0, \nu_0^2)$, the posterior of $\mu$ is also normal, with mean and variance
$$\mu_1 = \frac{\sigma^2 \mu_0 + n \nu_0^2 \bar{x}_n}{\sigma^2 + n \nu_0^2} \quad \text{and} \quad \nu_1^2 = \frac{\sigma^2 \nu_0^2}{\sigma^2 + n \nu_0^2}.$$
We can obtain credible intervals for $\mu$ from this $N(\mu_1, \nu_1^2)$ posterior distribution.
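A sketch of computing a credible interval from this posterior. The prior parameters and data summary below are made-up illustration values, not numbers from the slides:

```python
import numpy as np
from scipy import stats

# hypothetical values, for illustration only
sigma2 = 4.0             # known data variance sigma^2
mu0, nu0_sq = 0.0, 1.0   # prior mean mu_0 and prior variance nu_0^2
n, xbar = 25, 0.8        # sample size and observed sample mean

# posterior parameters from Theorem 7.3.3
mu1 = (sigma2 * mu0 + n * nu0_sq * xbar) / (sigma2 + n * nu0_sq)
nu1_sq = sigma2 * nu0_sq / (sigma2 + n * nu0_sq)

# a 95% credible interval: central quantiles of the N(mu1, nu1_sq) posterior
lo, hi = stats.norm.ppf([0.025, 0.975], loc=mu1, scale=np.sqrt(nu1_sq))
```

Unlike a confidence interval, here the statement $P(\text{lo} < \mu < \text{hi} \mid X = x) = 0.95$ is a direct probability statement about $\mu$.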
8.6 Bayesian Analysis of Samples from a Normal Distribution

Example: the normal distribution

What if both $\mu$ and $\sigma^2$ are unknown?
- Use the joint distribution of $\mu$ and $\sigma^2$ as the prior.
- Conjugate priors are available: the Normal-Inverse Gamma distribution.
- To give credible intervals for $\mu$ and $\sigma^2$ individually we need the marginal posterior distributions.
8.6 Bayesian Analysis of Samples from a Normal Distribution

END OF CHAPTER 8