Confidence Intervals

Review

If $X_1, \ldots, X_n$ have mean $\mu$ and SD $\sigma$, then $E(\bar{X}) = \mu$ and $SD(\bar{X}) = \sigma/\sqrt{n}$, no matter what, provided the $X$'s are independent.

If $X_1, \ldots, X_n$ are iid Normal(mean=$\mu$, SD=$\sigma$), then $\bar{X} \sim$ Normal(mean=$\mu$, SD=$\sigma/\sqrt{n}$).

If $X_1, \ldots, X_n$ are iid with mean $\mu$ and SD $\sigma$ and the sample size $n$ is large, then $\bar{X} \approx$ Normal(mean=$\mu$, SD=$\sigma/\sqrt{n}$).
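A minimal R sketch of these facts, simulating many sample means; the values of µ, σ, and n here are arbitrary choices for illustration:

```r
## Simulate the sampling distribution of the sample mean.
set.seed(1)
mu <- 5; sigma <- 2; n <- 25
xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
mean(xbars)   # close to mu
sd(xbars)     # close to sigma / sqrt(n) = 0.4
```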

Confidence intervals

Suppose we measure the log10 cytokine response in 100 male mice of a certain strain, and find that the sample average ($\bar{x}$) is 3.52 and the sample SD ($s$) is 1.61.

Our estimate of the SE of the sample mean is $1.61/\sqrt{100} = 0.161$.

A 95% confidence interval for the population mean ($\mu$) is roughly $3.52 \pm (2 \times 0.16) = 3.52 \pm 0.32 = (3.20, 3.84)$.

What does this mean?

Confidence intervals

Suppose that $X_1, \ldots, X_n$ are iid Normal(mean=$\mu$, SD=$\sigma$), and suppose that we actually know $\sigma$.

Then $\bar{X} \sim$ Normal(mean=$\mu$, SD=$\sigma/\sqrt{n}$). ($\sigma$ is known but $\mu$ is not!)

How close is $\bar{X}$ to $\mu$?

$$\Pr\left( \left| \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \right| \le 1.96 \right) = 95\%$$

$$\Pr\left( -1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le 1.96\,\frac{\sigma}{\sqrt{n}} \right) = 95\%$$

$$\Pr\left( \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} \right) = 95\%$$
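A minimal R sketch of the rough calculation above, using 1.96 in place of 2 and plugging in the sample SD for σ:

```r
## z-based 95% CI for the cytokine example.
xbar <- 3.52   # sample mean
s    <- 1.61   # sample SD (stand-in for sigma here)
n    <- 100
se   <- s / sqrt(n)        # 0.161
z    <- qnorm(0.975)       # about 1.96
c(lower = xbar - z * se, upper = xbar + z * se)   # roughly (3.20, 3.84)
```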

What is a confidence interval?

A 95% confidence interval is an interval calculated from the data that, in advance, has a 95% chance of covering the population parameter.

In advance, $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ has a 95% chance of covering $\mu$. Thus, it is called a 95% confidence interval for $\mu$.

Note that, after the data are gathered (for instance, n = 100, $\bar{x}$ = 3.52, σ = 1.61), the interval becomes fixed: $\bar{x} \pm 1.96\,\sigma/\sqrt{n} = 3.52 \pm 0.32$.

We can't say that there's a 95% chance that $\mu$ is in the interval 3.52 ± 0.32. It either is or it isn't; we just don't know.

What is a confidence interval?

[Figure: 500 confidence intervals for µ (σ known), plotted side by side.]
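A minimal R sketch of the "in advance" interpretation: simulate many samples with σ known and count how often the interval covers µ. The values of µ, σ, and n are arbitrary choices for illustration:

```r
## Coverage of the z-interval when sigma is known.
set.seed(1)
mu <- 3.5; sigma <- 1.6; n <- 100
covers <- replicate(500, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  ci <- mean(x) + c(-1, 1) * 1.96 * sigma / sqrt(n)
  ci[1] <= mu && mu <= ci[2]
})
mean(covers)   # close to 0.95
```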

Longer and shorter intervals

If we use 1.64 in place of 1.96, we get shorter intervals with lower confidence.

Since $\Pr\left( \left| \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \right| \le 1.64 \right) = 90\%$, $\bar{X} \pm 1.64\,\sigma/\sqrt{n}$ is a 90% confidence interval for $\mu$.

If we use 2.58 in place of 1.96, we get longer intervals with higher confidence.

Since $\Pr\left( \left| \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \right| \le 2.58 \right) = 99\%$, $\bar{X} \pm 2.58\,\sigma/\sqrt{n}$ is a 99% confidence interval for $\mu$.

What is a confidence interval? (cont)

A 95% confidence interval is obtained from a procedure for producing an interval, based on data, that 95% of the time will produce an interval covering the population parameter.

In advance, there's a 95% chance that the interval will cover the population parameter. After the data have been collected, the confidence interval either contains the parameter or it doesn't. Thus we talk about confidence rather than probability.
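The multipliers 1.64, 1.96, and 2.58 are standard normal quantiles, which can be checked in R:

```r
qnorm(0.95)    # about 1.64  -> 90% interval
qnorm(0.975)   # about 1.96  -> 95% interval
qnorm(0.995)   # about 2.58  -> 99% interval
```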

But we don't know the SD

Use of $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ as a 95% confidence interval for $\mu$ requires knowledge of σ.

That the above is a 95% confidence interval for $\mu$ is a result of the following:

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \text{Normal}(0, 1)$$

What if we don't know σ? We plug in the sample SD $S$, but then we need to widen the intervals to account for the uncertainty in $S$.

What is a confidence interval? (cont)

[Figure: 500 BAD confidence intervals for µ (σ unknown).]

What is a confidence interval? (cont)

[Figure: 500 confidence intervals for µ (σ unknown).]

The Student t distribution

If $X_1, X_2, \ldots, X_n$ are iid Normal(mean=$\mu$, SD=$\sigma$), then

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(\text{df} = n - 1)$$

Discovered by William Gosset ("Student"), who worked for Guinness.

In R, use the functions pt(), qt(), and dt().

[Figure: t densities with df = 2, 4, 14, compared to the normal density.]

qt(0.975, 9) returns 2.26 (compare to 1.96).
pt(1.96, 9) - pt(-1.96, 9) returns 0.918 (compare to 0.95).
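A minimal R sketch reproducing the comparisons on this slide (n = 10, so 9 df), plus the density curves:

```r
qt(0.975, df = 9)                     # 2.26 (vs qnorm(0.975) = 1.96)
pt(1.96, df = 9) - pt(-1.96, df = 9)  # 0.918 (vs 0.95 under the normal)
curve(dnorm(x), from = -4, to = 4, lty = 2)  # normal density (dashed)
curve(dt(x, df = 9), add = TRUE)             # t density, 9 df (heavier tails)
```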

The t interval

If $X_1, \ldots, X_n$ are iid Normal(mean=$\mu$, SD=$\sigma$), then

$$\bar{X} \pm t(\alpha/2,\, n-1)\; S/\sqrt{n}$$

is a $1 - \alpha$ confidence interval for $\mu$.

$t(\alpha/2, n-1)$ is the $1 - \alpha/2$ quantile of the t distribution with $n - 1$ degrees of freedom.

In R: qt(0.975, 9) for the case n = 10, α = 5%.

[Figure: t density with the upper α/2 tail area beyond $t(\alpha/2, n-1)$ shaded.]

Example 1

Suppose we have measured the log10 cytokine response of 10 mice, and obtained the following numbers:

Data: 0.2 1.3 1.4 2.3 4.2 4.7 4.7 5.1 5.9 7.0

$\bar{x} = 3.68$, $s = 2.24$, $n = 10$, qt(0.975, 9) = 2.26

95% confidence interval for $\mu$ (the population mean):

$$3.68 \pm 2.26 \times 2.24/\sqrt{10} \approx 3.68 \pm 1.60 = (2.1, 5.3)$$

[Figure: the data and the 95% CI plotted on a common axis.]
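A minimal R sketch of Example 1; t.test() returns the same interval:

```r
y <- c(0.2, 1.3, 1.4, 2.3, 4.2, 4.7, 4.7, 5.1, 5.9, 7.0)
n <- length(y)
mean(y) + c(-1, 1) * qt(0.975, df = n - 1) * sd(y) / sqrt(n)  # about (2.1, 5.3)
t.test(y)$conf.int                                            # same interval
```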

Example 2

Suppose we have measured (by RT-PCR) the log10 expression of a gene in 3 tissue samples, and obtained the following numbers:

Data: 1.17 6.35 7.76

$\bar{x} = 5.09$, $s = 3.47$, $n = 3$, qt(0.975, 2) = 4.30

95% confidence interval for $\mu$ (the population mean):

$$5.09 \pm 4.30 \times 3.47/\sqrt{3} \approx 5.09 \pm 8.62 = (-3.5, 13.7)$$

[Figure: the data and the 95% CI.]

Example 3

Suppose we have weighed the tumor mass in 20 mice, and obtained the following numbers:

Data: 34.9 28.5 34.3 38.4 29.6 28.2 25.3 ... 32.1

$\bar{x} = 30.7$, $s = 6.06$, $n = 20$, qt(0.975, 19) = 2.09

95% confidence interval for $\mu$ (the population mean):

$$30.7 \pm 2.09 \times 6.06/\sqrt{20} \approx 30.7 \pm 2.84 = (27.9, 33.5)$$

[Figure: the data and the 95% CI.]

Confidence interval for the mean

[Figure: the population distribution (mean µ, SD σ) and the distribution of $\bar{X}$ (mean µ, SD σ/√n), with the standardized statistics $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ and $t = (\bar{X} - \mu)/(S/\sqrt{n})$ and the cutoffs $z_{\alpha/2}$ and $t_{\alpha/2}$.]

$X_1, X_2, \ldots, X_n$ independent Normal($\mu$, $\sigma$).

95% confidence interval for $\mu$: $\bar{X} \pm t\, S/\sqrt{n}$, where t = the 97.5 percentile of the t distribution with $(n - 1)$ d.f.

Differences between means

Suppose I measure the treatment response on 10 mice from strain A and 10 mice from strain B.

How different are the responses of the two strains? I am not interested in these particular mice, but in the strains generally.

[Figure: dot plots of IL10 for strains A and B, on the original scale and on the log scale.]

$\bar{X} - \bar{Y}$

Suppose that $X_1, X_2, \ldots, X_n$ are iid Normal(mean=$\mu_A$, SD=$\sigma$), and $Y_1, Y_2, \ldots, Y_m$ are iid Normal(mean=$\mu_B$, SD=$\sigma$). Then

$$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_A - \mu_B$$

$$SD(\bar{X} - \bar{Y}) = \sqrt{SD(\bar{X})^2 + SD(\bar{Y})^2} = \sqrt{\left(\frac{\sigma}{\sqrt{n}}\right)^2 + \left(\frac{\sigma}{\sqrt{m}}\right)^2} = \sigma\sqrt{\frac{1}{n} + \frac{1}{m}}$$

Note: If $n = m$, then $SD(\bar{X} - \bar{Y}) = \sigma\sqrt{2/n}$.

Pooled estimate of the population SD

We have two different estimates of the population SD, σ:

$$\hat\sigma_A = S_A = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \qquad \hat\sigma_B = S_B = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{m - 1}}$$

We can use all of the data together to obtain an improved estimate of σ, which we call the pooled estimate:

$$\hat\sigma_{\text{pooled}} = \sqrt{\frac{\sum (X_i - \bar{X})^2 + \sum (Y_i - \bar{Y})^2}{n + m - 2}} = \sqrt{\frac{S_A^2\,(n-1) + S_B^2\,(m-1)}{n + m - 2}}$$

Note: If $n = m$, then $\hat\sigma_{\text{pooled}} = \sqrt{(S_A^2 + S_B^2)/2}$.

Estimated SE of $(\bar{X} - \bar{Y})$

$$\widehat{SD}(\bar{X} - \bar{Y}) = \hat\sigma_{\text{pooled}} \sqrt{\frac{1}{n} + \frac{1}{m}} = \sqrt{\left[\frac{S_A^2\,(n-1) + S_B^2\,(m-1)}{n + m - 2}\right] \left[\frac{1}{n} + \frac{1}{m}\right]}$$

In the case $n = m$, $\widehat{SD}(\bar{X} - \bar{Y}) = \sqrt{\dfrac{S_A^2 + S_B^2}{n}}$.

CI for the difference between the means

$$\frac{(\bar{X} - \bar{Y}) - (\mu_A - \mu_B)}{\widehat{SD}(\bar{X} - \bar{Y})} \sim t(\text{df} = n + m - 2)$$

The procedure:

1. Calculate $(\bar{X} - \bar{Y})$.
2. Calculate $\widehat{SD}(\bar{X} - \bar{Y})$.
3. Find the 97.5 percentile of the t distribution with $n + m - 2$ d.f. → $t$.
4. Calculate the interval: $(\bar{X} - \bar{Y}) \pm t \cdot \widehat{SD}(\bar{X} - \bar{Y})$.
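A minimal R sketch of this procedure; pooled_ci() is a hypothetical helper name, not something from the lecture:

```r
## Pooled two-sample t confidence interval, written out by hand.
pooled_ci <- function(x, y, conf = 0.95) {
  n <- length(x); m <- length(y)
  sp <- sqrt(((n - 1) * var(x) + (m - 1) * var(y)) / (n + m - 2))  # pooled SD
  se <- sp * sqrt(1/n + 1/m)                                       # SE of xbar - ybar
  tcrit <- qt(1 - (1 - conf)/2, df = n + m - 2)
  (mean(x) - mean(y)) + c(-1, 1) * tcrit * se
}
## For comparison: t.test(x, y, var.equal = TRUE)$conf.int gives the same interval.
```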

Example

Strain A: 2.67 2.86 2.87 3.04 3.09 3.09 3.13 3.27 3.35 (n = 9, $\bar{x} \approx 3.04$, $s_A \approx 0.214$)

Strain B: 3.78 3.06 3.64 3.31 3.31 3.51 3.22 3.67 (m = 8, $\bar{y} \approx 3.44$, $s_B \approx 0.250$)

$$\hat\sigma_{\text{pooled}} = \sqrt{\frac{s_A^2\,(n-1) + s_B^2\,(m-1)}{n + m - 2}} \approx 0.231$$

$$\widehat{SD}(\bar{X} - \bar{Y}) = \hat\sigma_{\text{pooled}} \sqrt{\frac{1}{n} + \frac{1}{m}} \approx 0.112$$

97.5 percentile of t(df = 15) ≈ 2.13

Example

95% confidence interval: $(3.04 - 3.44) \pm 2.13 \times 0.112 \approx -0.40 \pm 0.24 = (-0.64, -0.16)$.

[Figure: the data for strains A and B, and the confidence interval for $\mu_A - \mu_B$.]
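The same interval can be checked in R with t.test(), assuming equal variances:

```r
a <- c(2.67, 2.86, 2.87, 3.04, 3.09, 3.09, 3.13, 3.27, 3.35)
b <- c(3.78, 3.06, 3.64, 3.31, 3.31, 3.51, 3.22, 3.67)
t.test(a, b, var.equal = TRUE)$conf.int   # roughly (-0.64, -0.16)
```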

Example

Strain A: n = 10, sample mean = 55.22, sample SD $s_A$ = 7.64; t value = qt(0.975, 9) = 2.26.

95% CI for $\mu_A$: $55.22 \pm 2.26 \times 7.64/\sqrt{10} = 55.2 \pm 5.5 = (49.8, 60.7)$

Strain B: n = 16, sample mean = 68.2, sample SD $s_B$ = 18.1; t value = qt(0.975, 15) = 2.13.

95% CI for $\mu_B$: $68.2 \pm 2.13 \times 18.1/\sqrt{16} = 68.2 \pm 9.7 = (58.6, 77.9)$

Example

$$\hat\sigma_{\text{pooled}} = \sqrt{\frac{(7.64)^2\,(10-1) + (18.1)^2\,(16-1)}{10 + 16 - 2}} = 15.1$$

$$\widehat{SD}(\bar{X} - \bar{Y}) = \hat\sigma_{\text{pooled}} \sqrt{\frac{1}{n} + \frac{1}{m}} = 15.1 \sqrt{\frac{1}{10} + \frac{1}{16}} = 6.08$$

t value: qt(0.975, 10+16-2) = 2.06

95% confidence interval for $\mu_A - \mu_B$:

$$(55.2 - 68.2) \pm 2.06 \times 6.08 = -13.0 \pm 12.6 = (-25.6, -0.5)$$

Example

[Figure: the data for strains A and B, and the CI for $\mu_A - \mu_B$.]

One problem

What if the two populations really have different SDs, $\sigma_A$ and $\sigma_B$?

Suppose that $X_1, X_2, \ldots, X_n$ are iid Normal($\mu_A$, $\sigma_A$) and $Y_1, Y_2, \ldots, Y_m$ are iid Normal($\mu_B$, $\sigma_B$). Then

$$SD(\bar{X} - \bar{Y}) = \sqrt{\frac{\sigma_A^2}{n} + \frac{\sigma_B^2}{m}} \qquad \widehat{SD}(\bar{X} - \bar{Y}) = \sqrt{\frac{S_A^2}{n} + \frac{S_B^2}{m}}$$

The problem:

$$\frac{(\bar{X} - \bar{Y}) - (\mu_A - \mu_B)}{\widehat{SD}(\bar{X} - \bar{Y})}$$

does not follow a t distribution.

An approximation

In the case that $\sigma_A \neq \sigma_B$: Let

$$k = \frac{\left( \frac{s_A^2}{n} + \frac{s_B^2}{m} \right)^2}{\frac{(s_A^2/n)^2}{n - 1} + \frac{(s_B^2/m)^2}{m - 1}}$$

Let $t$ be the 97.5 percentile of the t distribution with $k$ d.f.

Use $(\bar{X} - \bar{Y}) \pm t \cdot \widehat{SD}(\bar{X} - \bar{Y})$ as a 95% confidence interval.

Example

$$k = \frac{\left[ (7.64)^2/10 + (18.1)^2/16 \right]^2}{\frac{\left[ (7.64)^2/10 \right]^2}{9} + \frac{\left[ (18.1)^2/16 \right]^2}{15}} = \frac{(5.84 + 20.6)^2}{\frac{(5.84)^2}{9} + \frac{(20.6)^2}{15}} = 21.8$$

t value = qt(0.975, 21.8) = 2.07.

$$\widehat{SD}(\bar{X} - \bar{Y}) = \sqrt{\frac{s_A^2}{n} + \frac{s_B^2}{m}} = \sqrt{\frac{(7.64)^2}{10} + \frac{(18.1)^2}{16}} = 5.14$$

95% CI for $\mu_A - \mu_B$: $-13.0 \pm 2.07 \times 5.14 = -13.0 \pm 10.7 = (-23.7, -2.4)$
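A minimal R sketch of this approximation, using the summary statistics from the example (the raw data are not shown on the slide):

```r
sA <- 7.64; n <- 10
sB <- 18.1; m <- 16
se <- sqrt(sA^2/n + sB^2/m)                                            # about 5.14
k  <- (sA^2/n + sB^2/m)^2 / ((sA^2/n)^2/(n - 1) + (sB^2/m)^2/(m - 1))  # about 21.8
(55.2 - 68.2) + c(-1, 1) * qt(0.975, k) * se                           # about (-23.7, -2.4)
## With the raw data vectors, t.test()'s default (var.equal = FALSE) does the same.
```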

Example

[Figure: the data for strains A and B, with the new CI and the previous CI for $\mu_A - \mu_B$.]

Degrees of freedom

One sample of size n: $X_1, X_2, \ldots, X_n$

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(\text{df} = n - 1)$$

Two samples, of size n and m: $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$

$$\frac{(\bar{X} - \bar{Y}) - (\mu_A - \mu_B)}{\hat\sigma_{\text{pooled}} \sqrt{\frac{1}{n} + \frac{1}{m}}} \sim t(\text{df} = n + m - 2)$$

What are these degrees of freedom?

Degrees of freedom

The degrees of freedom concern our estimate of the population standard deviation.

We use the residuals $(X_1 - \bar{X}), \ldots, (X_n - \bar{X})$ to estimate σ. But we really only have $n - 1$ independent data points ("degrees of freedom"), since $\sum (X_i - \bar{X}) = 0$.

In the two-sample case, we use $(X_1 - \bar{X}), (X_2 - \bar{X}), \ldots, (X_n - \bar{X})$ and $(Y_1 - \bar{Y}), \ldots, (Y_m - \bar{Y})$ to estimate σ. But $\sum (X_i - \bar{X}) = 0$ and $\sum (Y_i - \bar{Y}) = 0$, and so we really have just $n + m - 2$ independent data points.

Confidence interval for the population SD

Suppose we observe $X_1, X_2, \ldots, X_n$ iid Normal($\mu$, $\sigma$), and we wish to create a 95% CI for the population SD, σ.

Our estimate of σ is the sample SD, $S$. The sampling distribution of $S$ is such that

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(\text{df} = n - 1)$$

[Figure: χ² densities with df = 4, 9, and 19.]

Confidence interval for the population SD

Choose L and U such that

$$\Pr\left( L \le \frac{(n-1)S^2}{\sigma^2} \le U \right) = 95\%$$

$$\Pr\left( \frac{1}{U} \le \frac{\sigma^2}{(n-1)S^2} \le \frac{1}{L} \right) = 95\%$$

$$\Pr\left( \frac{(n-1)S^2}{U} \le \sigma^2 \le \frac{(n-1)S^2}{L} \right) = 95\%$$

$$\Pr\left( S\sqrt{\frac{n-1}{U}} \le \sigma \le S\sqrt{\frac{n-1}{L}} \right) = 95\%$$

[Figure: χ² density with L and U marking the 2.5th and 97.5th percentiles.]

Thus $\left( S\sqrt{\frac{n-1}{U}},\ S\sqrt{\frac{n-1}{L}} \right)$ is a 95% CI for σ.

Example

Strain A: n = 10; sample SD: $s_A$ = 7.64. L = qchisq(0.025, 9) = 2.70, U = qchisq(0.975, 9) = 19.0.

95% CI for $\sigma_A$:

$$\left( 7.64\sqrt{\frac{9}{19.0}},\ 7.64\sqrt{\frac{9}{2.70}} \right) = (7.64 \times 0.69,\ 7.64 \times 1.83) = (5.3, 14.0)$$

Strain B: n = 16; sample SD: $s_B$ = 18.1. L = qchisq(0.025, 15) = 6.25, U = qchisq(0.975, 15) = 27.5.

95% CI for $\sigma_B$:

$$\left( 18.1\sqrt{\frac{15}{27.5}},\ 18.1\sqrt{\frac{15}{6.25}} \right) = (18.1 \times 0.74,\ 18.1 \times 1.55) = (13.4, 28.1)$$
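A minimal R sketch of the strain A calculation:

```r
## Chi-square based CI for a population SD, from the sample SD alone.
s <- 7.64; n <- 10
L <- qchisq(0.025, df = n - 1)   # 2.70
U <- qchisq(0.975, df = n - 1)   # 19.0
c(lower = s * sqrt((n - 1) / U), upper = s * sqrt((n - 1) / L))  # roughly (5.3, 14.0)
```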

Confidence interval for what..?

[Figure: estimates of the speed of light (million mph) with confidence intervals, plotted by year, roughly 1900 to 1950. From Youden W (1972), Technometrics 14: 1-11.]

Confidence interval for what..?

[Figure: confidence intervals for individual experimental runs (jl0303b_11, se0325b_04, ..., se0326b_03), grouped by condition (56.Gu, 56.TFE, 46.Gu).]

Summarizing data

[Figure: a "bad plot" and a "good plot" of the same data for groups A and B.]