Estimation

Example: Cholesterol levels of heart-attack patients

Data: Observational study at a Pennsylvania medical center
  blood cholesterol levels
  patients treated for heart attacks
  measurements 2, 4, and 14 days after the attack

  Id   Y1   Y2   Y3      Id   Y1   Y2   Y3
   1  270  218  156      15  294  240  264
   2  236  234  193      16  282  294  220
   3  210  214  242      17  234  220  264
   4  142  116  120      18  224  200  213
   5  280  200  181      19  276  220  188
   6  272  276  256      20  282  186  182
   7  160  146  142      21  360  352  294
   8  220  182  216      22  310  202  214
   9  226  238  248      23  280  218  170
  10  242  288  298      24  278  248  198
  11  186  190  168      25  288  278  236
  12  266  236  236      26  288  248  256
  13  206  244  238      27  244  270  280
  14  318  258  200      28  236  242  204

Aim: Make inference on the distribution of
  cholesterol level 14 days after the attack: $Y_3$
  decrease in cholesterol level: $D = Y_1 - Y_3$
  relative decrease in cholesterol level: $R = \frac{Y_1 - Y_3}{Y_3}$

Confidence intervals I, Feb 11, 2004
Estimation

Data: $d_1, \ldots, d_{28}$, the observed decreases in cholesterol level

In this example, parameters of interest might be
  $\mu_D = E(D)$, the mean decrease in cholesterol level,
  $\sigma_D^2 = \mathrm{var}(D)$, the variance of the decrease in cholesterol level,
  $p_D = P(D \le 0)$, the probability of no decrease in cholesterol level.

These parameters are naturally estimated by the following sample statistics:
  $\hat\mu_D = \frac{1}{n} \sum_{i=1}^{n} d_i$  (sample mean)
  $\hat\sigma_D^2 = \frac{1}{n} \sum_{i=1}^{n} (d_i - \bar d)^2$  (sample variance)
  $\hat p_D = \frac{\#\{d_i \mid d_i \le 0\}}{n}$  (sample proportion)

Such statistics are point estimators, since they estimate the corresponding parameter by a single numerical value.
  Point estimates provide no information about their chance variation.
  Estimates without an indication of their variability are of limited value.
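As a rough illustration, the three point estimates can be computed directly from the cholesterol data. The sketch below uses only the Python standard library; the variable names (`y1`, `y3`, `d`, `mu_hat`, and so on) are illustrative and not part of the lecture. Both the divide-by-n and the divide-by-(n-1) versions of the sample variance are computed, since the two conventions give slightly different values for n = 28.

```python
from statistics import mean, pvariance, variance

# Cholesterol levels 2 days (y1) and 14 days (y3) after the attack, ids 1..28.
y1 = [270, 236, 210, 142, 280, 272, 160, 220, 226, 242, 186, 266, 206, 318,
      294, 282, 234, 224, 276, 282, 360, 310, 280, 278, 288, 288, 244, 236]
y3 = [156, 193, 242, 120, 181, 256, 142, 216, 248, 298, 168, 236, 238, 200,
      264, 220, 264, 213, 188, 182, 294, 214, 170, 198, 236, 256, 280, 204]

# Observed decreases d_i = y1_i - y3_i.
d = [a - b for a, b in zip(y1, y3)]
n = len(d)

mu_hat = mean(d)                          # sample mean
var_hat_n = pvariance(d)                  # sample variance, dividing by n
var_hat_n1 = variance(d)                  # sample variance, dividing by n - 1
p_hat = sum(1 for x in d if x <= 0) / n   # proportion with no decrease

print(f"n = {n}")
print(f"mean decrease      = {mu_hat:.2f}")
print(f"sd (divide by n)   = {var_hat_n ** 0.5:.2f}")
print(f"sd (divide by n-1) = {var_hat_n1 ** 0.5:.2f}")
print(f"proportion d <= 0  = {p_hat:.3f}")
```

For these data the sample mean is about 36.89, and 6 of the 28 patients show no decrease, so the sample proportion is 6/28, roughly 0.21. The standard deviation from the divide-by-(n-1) version is about 50.94.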
Confidence Intervals for the Mean

Recall: CLT for the sample mean: For large n we have
  $\bar X \approx N\left(\mu, \frac{\sigma^2}{n}\right)$

68-95-99.7 rule: With 95% probability the sample mean differs from its mean $\mu$ by less than two standard deviations (of $\bar X$, that is, $\sigma/\sqrt{n}$).

More precisely, we have
  $P\left(\mu - 1.96 \frac{\sigma}{\sqrt{n}} \le \bar X \le \mu + 1.96 \frac{\sigma}{\sqrt{n}}\right) = 0.95,$
or equivalently, after rearranging the terms,
  $P\left(\bar X - 1.96 \frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + 1.96 \frac{\sigma}{\sqrt{n}}\right) = 0.95.$

Interpretation: There is 95% probability that the random interval
  $\left[\bar X - 1.96 \frac{\sigma}{\sqrt{n}},\ \bar X + 1.96 \frac{\sigma}{\sqrt{n}}\right]$
will cover the mean $\mu$.

Example: Cholesterol levels
  $\bar d = 36.89$, $\sigma = 51.00$, $n = 28$.
Therefore, the 95% confidence interval for $\mu$ is $[18.00, 55.78]$.
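The interpretation as a random interval can be checked with a small simulation: draw many samples from a population with known mean and standard deviation, form the interval each time, and count how often it covers the mean. The following is a minimal sketch under stated assumptions: the population is taken to be exactly normal, the value `mu=40.0` is an arbitrary choice for illustration, and the function name `coverage` is hypothetical.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, reps, seed=0):
    """Fraction of simulated intervals xbar +/- 1.96*sigma/sqrt(n) that cover mu."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    half_width = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n from N(mu, sigma^2) and take its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

# Assumed population values; sigma and n mimic the cholesterol example.
rate = coverage(mu=40.0, sigma=51.0, n=28, reps=5000)
print(f"fraction of intervals covering mu: {rate:.3f}")
```

The reported fraction should be close to 0.95. The interval changes from sample to sample while the mean stays fixed, which is why the probability statement refers to the interval and not to the mean.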
Confidence Intervals for the Mean

Assumption: The population standard deviation $\sigma$ is known.
  In the next lecture, we will drop this unrealistic assumption.
  Assumption is approximately satisfied for large sample sizes, since then $\hat\sigma \approx \sigma$ by the law of large numbers.

Definition: Confidence interval for $\mu$ ($\sigma$ known)
The interval
  $\left[\bar X - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar X + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right]$
is called a $1 - \alpha$ confidence interval for the population mean $\mu$. $(1 - \alpha)$ is the confidence level.

For large sample sizes n, an approximate $(1 - \alpha)$ confidence interval for $\mu$ is given by
  $\left[\bar X - z_{\alpha/2} \frac{\hat\sigma}{\sqrt{n}},\ \bar X + z_{\alpha/2} \frac{\hat\sigma}{\sqrt{n}}\right].$

Here, $z_\alpha$ is the $\alpha$-critical value of the standard normal distribution:
  $z_\alpha$ has area $\alpha$ to its right
  $\Phi(z_\alpha) = 1 - \alpha$

[Figure: standard normal density f(x); the area to the right of $z_\alpha$ is $\alpha$.]
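The definition translates into a few lines of code. This is a minimal sketch in Python, assuming Python 3.8 or later for `statistics.NormalDist`; the function names `z_critical` and `z_confidence_interval` are illustrative and not taken from the lecture. The critical value is obtained from the relation $\Phi(z_\alpha) = 1 - \alpha$ by inverting the standard normal distribution function.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(alpha):
    """Return z_alpha, the point with area alpha to its right under N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha)

def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Return (lower, upper) of the confidence interval for the mean, sigma known."""
    alpha = 1.0 - level
    half_width = z_critical(alpha / 2.0) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Cholesterol example: mean decrease 36.89, sigma = 51.00, n = 28.
lo, hi = z_confidence_interval(36.89, 51.00, 28, level=0.95)
print(f"95% confidence interval: [{lo:.2f}, {hi:.2f}]")
```

The code uses the unrounded critical value (about 1.95996) instead of 1.96, so results can differ from hand calculations in the last digit.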
Confidence Interval for the Mean

Example: Community banks
  Community banks are banks with less than a billion dollars of assets.
  Approximately 7500 such banks in the United States.
  Annual survey of the Community Bankers Council of the American Bankers Association (ABA)

  Population: Community banks in the United States.
  Variable of interest: Total assets of community banks.
  Sample size: $n = 110$
  Sample mean: $\bar X = 220$ millions of dollars
  Sample standard deviation: $SD = 161$ millions of dollars

[Figure: histogram of sampled values, "Assets of Community Banks in the U.S. (sample of 110 community banks)"; x-axis: Assets (in millions of dollars), y-axis: Frequency.]

Suppose we want to give a 95% confidence interval for the mean total assets of all community banks in the United States.
  $\alpha = 0.05$, $z_{\alpha/2} = 1.96$

A 95% confidence interval for the mean assets (in millions of dollars) is
  $\left[220 - 1.96 \frac{161}{\sqrt{110}},\ 220 + 1.96 \frac{161}{\sqrt{110}}\right] \approx [190, 250].$
Sample Size

Example: Cholesterol levels
Suppose we want a 99% confidence interval for the decrease in cholesterol level:
  $\alpha = 0.01$, $z_{0.005} = 2.58$
The 99% confidence interval for $\mu_D$ is
  $\left[36.89 - 2.58 \frac{50.93}{\sqrt{28}},\ 36.89 + 2.58 \frac{50.93}{\sqrt{28}}\right] = [12.06, 61.72].$

Note: If we raise the confidence level, the confidence interval becomes wider.

Suppose we want to increase the confidence level without increasing the error of estimation (indicated by the half-width of the confidence interval). For this we have to increase the sample size n.

Question: What sample size n is needed to estimate the mean decrease in cholesterol with error $e = 20$ and confidence level 99%?

The error (half-width of the confidence interval) is
  $e = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}.$
Therefore the sample size $n_e$ needed is given by
  $n_e = \left(\frac{z_{\alpha/2}\, \sigma}{e}\right)^2 = \left(\frac{2.58 \cdot 50.93}{20}\right)^2 = 43.16,$
that is, a sample of 44 patients is needed to estimate $\mu_D$ with error $e = 20$ and 99% confidence.
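The sample size calculation can be sketched in Python as follows. The function name `required_sample_size` is hypothetical, and `statistics.NormalDist` (Python 3.8 or later) is assumed for the critical value. The result of the formula is rounded up, since a fractional observation cannot be sampled and rounding down would give an error slightly larger than the target.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, error, level):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= error."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}
    return ceil((z * sigma / error) ** 2)

# Cholesterol example: sigma = 50.93, error e = 20, confidence level 99%.
n_needed = required_sample_size(sigma=50.93, error=20.0, level=0.99)
print(f"required sample size: {n_needed}")
```

Because the code uses the unrounded critical value instead of 2.58, the value before rounding up is slightly smaller than 43.16, but the required sample size is still 44.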
Estimation of the Mean

Example: Banks' loan-to-deposit ratio
The ABA survey of community banks also asked about the loan-to-deposit ratio (LTDR), a bank's total loans as a percent of its total deposits.

Sample statistics:
  $n = 110$
  $\hat\mu_{LTDR} = 76.7$
  $\hat\sigma_{LTDR} = 12.3$

[Figure: histogram, "Loan To Deposit Ratio of Community Banks (sample of 110 community banks)"; x-axis: LTDR (in %), y-axis: Frequency.]

Construction of 95% confidence interval:
  $\alpha = 0.05$, $z_{\alpha/2} = 1.96$
  Standard error: $\sigma_{\bar X} = \frac{\sigma_{LTDR}}{\sqrt{n}} = 1.17$
  95% confidence interval for $\mu_{LTDR}$:
  $\left[\bar X - z_{\alpha/2} \frac{\sigma_{LTDR}}{\sqrt{n}},\ \bar X + z_{\alpha/2} \frac{\sigma_{LTDR}}{\sqrt{n}}\right] = [74.4, 79.0]$

To get an estimate with error $e = 3.0$ (half-width of the confidence interval) it suffices to sample $n_e$ banks,
  $n_e = \left(\frac{z_{\alpha/2}\, \sigma_{LTDR}}{e}\right)^2 = \left(\frac{1.96 \cdot 12.3}{3.0}\right)^2 = 64.6.$
Thus a sample of $n_e = 65$ banks is sufficient.