μ: ESTIMATES, CONFIDENCE INTERVALS, AND TESTS Business Statistics

CONTENTS
- Estimating parameters
- The sampling distribution
- Confidence intervals for μ
- Hypothesis tests for μ
- The t-distribution
- Comparison of z and t
- Old exam question
- Further study

ESTIMATING PARAMETERS
Central task in inferential statistics
- Estimation: estimating a parameter (population value) from a sample
- Example: what proportion of cars in Amsterdam is electric?
  - population value: $\pi$
  - a sample of size $n = 200$ cars yields 26 electric cars
  - so $p = 26/200 = 0.13$
  - this suggests $\pi \approx 0.13$

ESTIMATING PARAMETERS
Terminology
- Parameter: a characteristic descriptive of the population
  - e.g., $\mu$, $\pi$, $\sigma$ (or $\sigma^2$)
- Estimator: a statistic derived from a sample to infer the value of a population parameter
  - e.g., $\bar{X}$, $P$, $S$ (or $S^2$)
- Estimate: the value of the estimator in a particular sample
  - e.g., $\bar{x}$, $p$, $s$ (or $s^2$)

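ESTIMATING PARAMETERS

| | Estimator | Estimate | Population parameter |
|---|---|---|---|
| Mean | $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ | $\mu$ |
| Standard deviation | $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$ | $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$ | $\sigma$ |
| Proportion | $P = \frac{X}{n}$ | $p = \frac{x}{n}$ | $\pi$ |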
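The three estimates can be computed directly from their definitions. The sketch below is only an illustration: the list `prices` is hypothetical data (not from the slides), and the helper names are assumptions. The proportion uses the 26 electric cars out of 200 from the Amsterdam example.

```python
import math

def sample_mean(xs):
    """x-bar: the sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """s: square root of the sum of squared deviations divided by n - 1."""
    m = sample_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def sample_proportion(successes, n):
    """p: number of successes divided by the sample size."""
    return successes / n

# Hypothetical sample of five beer prices (illustration only).
prices = [1.90, 2.10, 2.00, 2.20, 2.10]

x_bar = sample_mean(prices)
s = sample_sd(prices)
p = sample_proportion(26, 200)  # electric cars example

print(f"x_bar = {x_bar:.2f}, s = {s:.3f}, p = {p:.2f}")
```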

ESTIMATING PARAMETERS
Another example (Amsterdam, 2015): what is the mean price of a glass of beer?
- population value: $\mu$
- a sample of size $n = 64$ glasses of beer yields $\bar{x} = 2.06$
- this suggests that $\mu \approx 2.06$
But suppose we had taken a different sample
- again with sample size $n = 64$
- but now perhaps yielding $\bar{x} = 2.13$
- then we would estimate $\mu \approx 2.13$
Obviously there is sampling variation
- so there is a distribution of $\bar{x}$-values (the sampling distribution of $\bar{X}$)
Solution: point estimates and confidence intervals

THE SAMPLING DISTRIBUTION
Example
- Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}
- The population parameters are: $\mu = 1.5$ and $\sigma = 1.118$

THE SAMPLING DISTRIBUTION
- Sample $n = 2$ values and calculate $\bar{x}$
- Do this for all possible samples of size $n = 2$
- You will get a distribution of $\bar{x}$-values: the distribution of $\bar{X}$
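The enumeration can be sketched in a few lines. The slide does not say whether sampling is with or without replacement; this sketch assumes ordered samples drawn with replacement, which gives 16 equally likely samples. Variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
import math

population = [0, 1, 2, 3]
n = 2

# Assumption: ordered samples drawn with replacement (4 * 4 = 16 samples).
samples = list(product(population, repeat=n))
means = [Fraction(sum(sample), n) for sample in samples]

# Probability of each possible value of the sample mean.
counts = Counter(means)
probs = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mean_of_means = sum(m * pr for m, pr in probs.items())
var_of_means = sum((m - mean_of_means) ** 2 * pr for m, pr in probs.items())
sd_of_means = math.sqrt(var_of_means)

for m, pr in probs.items():
    print(f"x_bar = {float(m):.1f}  probability = {pr}")
print(f"mean = {float(mean_of_means)}, sd = {sd_of_means:.4f}")
```

Under this assumption the mean of the sample means equals the population mean 1.5, and their standard deviation equals $\sigma/\sqrt{2}$, smaller than the population value 1.118.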

THE SAMPLING DISTRIBUTION
- We will study the variance of the estimate of a population parameter from a sample statistic
- We will do so by studying how the sample statistic varies when you draw a different sample
- Example: GMAT score of MBA students
  - $N = 2637$
  - $\mu = 520.78$
  - $\sigma = 86.60$

THE SAMPLING DISTRIBUTION
Consider eight random samples, each of size $n = 5$
- the sample means ($\bar{x}_1 = 504.0$, $\bar{x}_2 = 576.0$, ..., $\bar{x}_8 = 582$) tend to be close to the population mean ($\mu = 520.78$)
- sometimes a bit lower, sometimes a bit higher

THE SAMPLING DISTRIBUTION
The dot plots show that the sample means ($\bar{x}_1, \ldots, \bar{x}_8$) have much less variation than the individual data points ($x_1, \ldots, x_{2637}$)

THE SAMPLING DISTRIBUTION
- An estimator is a random variable, since samples vary
  - so we write it as a capital letter, e.g., $X$, $\bar{X}$, $S$, etc.
- The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of (a fixed) size $n$ is taken
  - so we write $X \sim N(\mu, \sigma)$, etc.

THE SAMPLING DISTRIBUTION
The sampling distribution of $\bar{X}$ for a population with $\mu = \mu_X$ and $\sigma^2 = \sigma_X^2$
- If the CLT holds: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$
- So, the statistic $\bar{X}$ (3 things: shape, mean, dispersion)
  - is normally distributed
  - has mean $\mu_X$
  - has standard deviation $\frac{\sigma_X}{\sqrt{n}}$
- Fortunately, the CLT holds pretty often

THE SAMPLING DISTRIBUTION
The standard deviation of the distribution of sample means $\bar{X}$
- is given by $\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$
- has a special name: standard error of the mean
- is often abbreviated as the standard error (SE)
  - that's a bit confusing, because we will meet more standard errors later on
- decreases with increasing sample size
  - but only according to the law of diminishing returns ($1/\sqrt{n}$)
- is often calculated by software (SPSS, etc.)
- is the basis for confidence intervals and hypothesis tests (see later)
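The diminishing returns can be made concrete with the GMAT value $\sigma = 86.60$. This is a minimal sketch; the function name `standard_error` and the chosen sample sizes are illustrative assumptions.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

sigma = 86.60  # population standard deviation from the GMAT example

# Each step quadruples the sample size, which only halves the standard error.
for n in (5, 20, 80, 320):
    print(f"n = {n:3d}  SE = {standard_error(sigma, n):.2f}")
```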

EXERCISE 1 What is the meaning of the standard error?

CONFIDENCE INTERVALS FOR μ
- A sample mean $\bar{x}$ is a point estimate of the population mean $\mu$
  - it is the best possible estimate of $\mu$
  - but it will probably not be completely right
- A confidence interval (CI) for the mean is a range of possible values for $\mu$: $\mu_{\text{lower}} \le \mu \le \mu_{\text{upper}}$
  - such that the interval $CI_{\mu} = [\mu_{\text{lower}}, \mu_{\text{upper}}]$ contains the true value ($\mu$) with a certain probability (e.g., 95%)
- To simplify notation, we will drop the $X$ from $\mu_X$ now, and write just $\mu$

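CONFIDENCE INTERVALS FOR μ
From the CLT it follows that, under certain conditions:
- the distribution of $\bar{X}$ is normal
- the best estimate of $\mu$ is $\bar{x}$
- the standard deviation of $\bar{X}$ is $\sigma/\sqrt{n}$
This implies that:
- with probability 2.5%, $\bar{X} < \mu - 1.96\,\frac{\sigma}{\sqrt{n}}$, which is the same as $\mu > \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}$
- with probability 2.5%, $\bar{X} > \mu + 1.96\,\frac{\sigma}{\sqrt{n}}$, which is the same as $\mu < \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}}$
- so with probability 95%, $\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}$
So, if we find a sample mean $\bar{x}$, we can construct the following 95% confidence interval for $\mu$:
$CI_{\mu,0.95} = \left[\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right]$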

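CONFIDENCE INTERVALS FOR μ
Three notations for a confidence interval for $\mu$
- $\left[\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right]$
- $\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}$
- $\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$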

CONFIDENCE INTERVALS FOR μ
Example
- Population
  - $\mu = 520.78$ (unknown)
  - $\sigma = 86.60$ (known)
  - normally distributed (assumed)
- Sample
  - $n = 5$ (chosen)
  - $\bar{x} = 504.0$ (estimated)
- Calculation
  - standard error of the mean: $86.60/\sqrt{5} = 38.73$
  - $1.96 \times 38.73 = 75.91$
  - $CI_{\mu,0.95} = [428.09, 579.91]$
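The calculation in this example can be reproduced as follows. This is a sketch only; the function name `z_confidence_interval` is an assumption, and the factor 1.96 is the 95% value used on the slides.

```python
import math

def z_confidence_interval(x_bar, sigma, n, z=1.96):
    """CI for the mean when sigma is known: x_bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# GMAT example: x_bar = 504.0, sigma = 86.60 (known), n = 5
lower, upper = z_confidence_interval(504.0, 86.60, 5)
print(f"95% CI for mu: [{lower:.2f}, {upper:.2f}]")
```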

EXERCISE 2 Write the confidence interval $[428.09, 579.91]$ in two alternative ways.

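CONFIDENCE INTERVALS FOR μ
- The factor 1.96 is of course related to the 95% probability
- Other confidence levels use a different factor $z_{\alpha/2}$
  - where $z_{\alpha/2}$ is such that $P(|Z| \ge z_{\alpha/2}) = \alpha$ if $Z$ is drawn from a Z-distribution
- General form of a $(1 - \alpha) \cdot 100\%$ confidence interval of the mean:
  $CI_{\mu,1-\alpha} = \left[\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right]$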
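Instead of reading a table, the factor for a given confidence level can be computed from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `z_critical` is an assumption made for this illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: the point with alpha/2 in the upper tail."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```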

CONFIDENCE INTERVALS FOR μ
Trade-off
- narrow CI: low confidence level
- wide CI: high confidence level
Choice of confidence level depends on application
- more precision required for a refinery than for a dairy farm

CONFIDENCE INTERVALS FOR μ
- A confidence interval either does or does not contain $\mu$
- The confidence level quantifies the risk
- Out of 100 confidence intervals at the 95% level, approximately 95 will contain $\mu$, while approximately 5 will not
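This long-run interpretation can be checked by simulation. The sketch below assumes a normal population with the GMAT parameters and known $\sigma$; the function name, the number of trials, and the seed are arbitrary choices for illustration.

```python
import math
import random

def coverage(mu, sigma, n, trials, z=1.96, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from N(mu, sigma) and take its mean.
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage(mu=520.78, sigma=86.60, n=5, trials=10_000)
print(f"share of intervals containing mu: {rate:.3f}")
```

The printed share should be close to 0.95, although any single run differs slightly because of sampling variation.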

HYPOTHESIS TESTS FOR μ
- We can use the standard error to perform a hypothesis test
  - recall that $CI_{\mu,0.95} = [428.09, 579.91]$
- Suppose we hypothesize $\mu = 550$
- The value 550 is inside the 95% confidence interval for $\mu$
  - therefore the sample statistic and its confidence interval do not suggest that the hypothesis ($\mu = 550$) is wrong
  - and we will not reject the hypothesis
  - notice that we didn't say that $\mu = 550$; we only said that we can't reject it (at a 5% significance level)

HYPOTHESIS TESTS FOR μ
- Another example: suppose we hypothesize that $\mu = 600$
- The value 600 is outside the confidence interval for $\mu$
  - finding a confidence interval not containing $\mu$ happens only in 5% of the cases
  - so we conclude that $\mu \ne 600$ (at a 5% significance level)
  - therefore the sample statistic and its confidence interval suggest that the hypothesis ($\mu = 600$) is wrong
  - and we will reject the hypothesis
- Much more on hypothesis tests later on!
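The decision rule used in both examples amounts to checking whether the hypothesized value lies inside the interval. A minimal sketch, with the hypothetical helper name `rejects`:

```python
def rejects(mu_0, interval):
    """Reject the hypothesis mu = mu_0 if mu_0 lies outside the interval."""
    lower, upper = interval
    return not (lower <= mu_0 <= upper)

ci_95 = (428.09, 579.91)  # 95% confidence interval from the GMAT example

for mu_0 in (550, 600):
    decision = "reject" if rejects(mu_0, ci_95) else "do not reject"
    print(f"hypothesis mu = {mu_0}: {decision} (5% significance level)")
```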

THE t-distribution
A closer look at $CI_{\mu,0.95} = \left[\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right]$
- Given a sample mean $\bar{x}$, you can find a 95% confidence interval for the population mean $\mu$
- Sounds great when you don't know $\mu$ ...
- ... but it assumes you do know $\sigma$!
- There are many situations in which you don't know $\mu$ and you also don't know $\sigma$
- So what to do?

THE t-distribution
A simple strategy
- If the population standard deviation $\sigma$ is unknown, we can estimate it with the sample standard deviation $s$
- Then we use $\pm 1.96\,\frac{s}{\sqrt{n}}$ instead of $\pm 1.96\,\frac{\sigma}{\sqrt{n}}$
- But we pay a price for that
  - the reason is that $s$ is itself an estimate of $\sigma$, and therefore uncertain
  - the price we pay is that the factor 1.96 must be somewhat larger

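THE t-distribution
- Recall that the CLT yields that
  $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
  where $Z$ is the standard normal distribution
- Likewise, it can be shown that
  $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t$
  where $t$ is the t-distribution (or Student's t-distribution)
- which has an even more complicated formula than the normal distribution:
  $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$ vs. $f(t; \nu) = \frac{\Gamma\left(\frac{1}{2}(\nu+1)\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{1}{2}\nu\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{1}{2}(\nu+1)}$
- Arrrgh: forget quickly!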
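The two density formulas are easier to compare numerically than on paper. The sketch below implements both with the standard library; the function names are assumptions, and `math.lgamma` is used instead of the gamma function itself only to avoid overflow for large $\nu$.

```python
import math

def normal_pdf(z):
    """Standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def t_pdf(t, nu):
    """Student t density f(t; nu) with nu degrees of freedom."""
    # Gamma((nu+1)/2) / Gamma(nu/2), computed on the log scale.
    log_ratio = math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
    constant = math.exp(log_ratio) / math.sqrt(nu * math.pi)
    return constant * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)

# Density at t = 3: larger for small nu, close to the normal for large nu.
for nu in (5, 13, 1000):
    print(f"nu = {nu:4d}  f(3; nu) = {t_pdf(3.0, nu):.5f}")
print(f"normal     f(3)     = {normal_pdf(3.0):.5f}")
```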

THE t-distribution
- The confidence interval for $\mu$ with unknown $\sigma$ is
  $CI_{\mu,1-\alpha} = \left[\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}\right]$
  - where $t_{\alpha/2}$ is such that $P(|T| \ge t_{\alpha/2}) = \alpha$ if $T$ is drawn from a t-distribution
- What is the t-distribution?
  - quite similar to the Z-distribution ($\mu = 0$, continuous, symmetric, bell-shaped, infinite range, ...)
  - a little bit fatter tails
  - it has 1 parameter, usually denoted with $df$ or $\nu$, and called degrees of freedom

THE t-distribution
[Figure: graph of the pdf $f(x)$ against $x$ for the Z (standard normal) distribution and for t-distributions with df = 1000, df = 13 and df = 5]

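THE t-distribution
- Different notations: $t_{13}$, $t_{df=13}$, etc.
- And likewise: $t_{13;\alpha/2}$, $t_{13}(\alpha/2)$, etc.
- So altogether for the confidence interval:
  $CI_{\mu,1-\alpha} = \left[\bar{x} - t_{n-1;\alpha/2}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1;\alpha/2}\,\frac{s}{\sqrt{n}}\right]$
- Compare to:
  $\left[\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right]$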

THE t-distribution
How to choose the parameter $df$?
- it is a parameter based on the sample size that is used to determine the value of the t-statistic
- it tells how many observations are used to estimate $\sigma$, less the number of intermediate estimates used in the calculation
- the $df$ for the t-distribution in the case of a confidence interval for $\mu$ when $\sigma$ is unknown is $df = n - 1$
- but in other cases, it may be different
Properties of t
- as $n$ increases, the t-distribution approaches the shape of the normal distribution
- for a given confidence level $1 - \alpha$, $t$ is always larger than $z$, so a confidence interval based on $t$ is always wider than if $z$ were used

THE t-distribution
Reading the table of critical t-values
- e.g., $t_{9;0.025} = 2.262$ (with $\alpha/2 = 0.025$ and $df = 9$)
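A table entry such as 2.262 can be checked numerically. The sketch below is one possible approach, not a standard routine: it integrates the t density with Simpson's rule and searches for the critical value by bisection. The function names, step count, and search bounds are assumptions chosen for illustration.

```python
import math

def t_pdf(t, df):
    """Student t density with df degrees of freedom."""
    log_ratio = math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
    constant = math.exp(log_ratio) / math.sqrt(df * math.pi)
    return constant * (1.0 + t * t / df) ** (-(df + 1) / 2.0)

def upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: one half minus the Simpson integral over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3.0

def t_critical(tail_prob, df):
    """Value t with upper-tail probability tail_prob, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if upper_tail(mid, df) > tail_prob:
            lo = mid  # tail still too large, so the critical value is higher
        else:
            hi = mid
    return (lo + hi) / 2.0

print(f"t(df=9, alpha/2=0.025) = {t_critical(0.025, 9):.3f}")
```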

THE t-distribution
Look carefully at tables for z and t:
- z usually runs from left to right: $P(X \le z) = \int_{-\infty}^{z} f(x)\,dx$
- t usually runs from right to left: $P(X \ge t) = \int_{t}^{\infty} f(x)\,dx$

THE t-distribution
Background of t
- developed by William Gosset in 1908 while working at the Guinness Brewery, Dublin
- published under the pen name "Student"

THE t-distribution
Example for confidence interval
- Population
  - $\mu = 520.78$ (unknown)
  - $\sigma = 86.60$ (unknown: now we have a situation in which $\sigma$ is not known to us)
  - normally distributed (assumed)
- Sample
  - $n = 5$ (chosen)
  - $\bar{x} = 504.0$ (estimated)
  - $s = 73.01$ (estimated)
- Calculation
  - standard error of the mean: $s/\sqrt{n} = 32.65$
  - $2.776 \times 32.65 = 90.65$ (critical t-value for $df = 4$)
  - $CI_{\mu,0.95} = [413.35, 594.65]$
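The same calculation as a short sketch. The function name `t_confidence_interval` is an assumption; the critical value for $df = 4$ is entered by hand as 2.7764, which is the table value 2.776 carried to one more digit.

```python
import math

def t_confidence_interval(x_bar, s, n, t_crit):
    """CI for the mean when sigma is unknown: x_bar +/- t * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# GMAT example with sigma unknown: x_bar = 504.0, s = 73.01, n = 5, df = 4.
# 2.7764 is the critical t-value for alpha/2 = 0.025 and df = 4
# (2.776 in a three-decimal table).
lower, upper = t_confidence_interval(504.0, 73.01, 5, t_crit=2.7764)
print(f"95% CI for mu: [{lower:.2f}, {upper:.2f}]")
```

With the three-decimal table value 2.776 the bounds shift by about 0.01, which is only a rounding effect.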

THE t-distribution
- Repeat the hypothesis test for this case
  - now $CI_{\mu,0.95} = [413.35, 594.65]$
- So we will reject the hypothesis $\mu = 600$
- while we will not reject the hypothesis $\mu = 550$
- Exactly the same reasoning as with the z-test, but with (slightly) different numbers

COMPARISON OF z AND t
When to use which?
- for a confidence interval for $\mu$ if $\sigma^2$ is known: use $z$
- for a confidence interval for $\mu$ if $\sigma^2$ is unknown: use $t$, and estimate $\sigma^2$ by $s^2$
How to find?
- from a table with z-values: given $\alpha$, look up $z$
- from a table with t-values: given $\alpha$ and $df$, look up $t$
What is the difference?
- confidence intervals with $t$ are a bit wider than with $z$
- the difference is small for $n \ge 30$ and negligible for $n \ge 100$

COMPARISON OF z AND t
Example: 50 confidence intervals with z and t
[Figure: 50 samples of size $n = 10$, simulated from an $N(2, 9)$ distribution, plotted against sample number $i$. Left panel, based on $\sigma^2$: $\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$. Right panel, based on $s^2$: $\bar{X} \pm t_{\alpha/2;n-1}\,\frac{S}{\sqrt{n}}$.]
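A comparable experiment can be run with a short simulation. This sketch is not the code behind the figure; it assumes that $N(2, 9)$ means variance 9 (so $\sigma = 3$), uses the table value $t_{9;0.025} = 2.262$, and picks an arbitrary seed.

```python
import math
import random
import statistics

def simulate(num_samples=50, n=10, mu=2.0, sigma=3.0,
             z=1.96, t=2.262, seed=0):
    """Return (x_bar, z_margin, t_margin) for each simulated sample."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_samples):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(xs)
        s = statistics.stdev(xs)              # sample sd with n - 1
        z_margin = z * sigma / math.sqrt(n)   # sigma known: fixed width
        t_margin = t * s / math.sqrt(n)       # sigma unknown: width varies
        rows.append((x_bar, z_margin, t_margin))
    return rows

rows = simulate()
z_cover = sum(abs(x_bar - 2.0) <= zm for x_bar, zm, tm in rows)
t_cover = sum(abs(x_bar - 2.0) <= tm for x_bar, zm, tm in rows)
mean_t_margin = statistics.mean(tm for _, _, tm in rows)

print(f"z-intervals containing mu: {z_cover} of {len(rows)}")
print(f"t-intervals containing mu: {t_cover} of {len(rows)}")
print(f"z margin: {rows[0][1]:.2f}, average t margin: {mean_t_margin:.2f}")
```

The z-intervals all have the same width because $\sigma$ is fixed, while the t-intervals vary in width from sample to sample because each uses its own $s$.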

OLD EXAM QUESTION 23 March 2015, Q1l

FURTHER STUDY
- Doane & Seward 5/E 8.4-8.5, 10.4
- Tutorial exercises week 2
  - point estimate
  - confidence interval, z test for mean
  - t test for mean
  - z versus t