Module 3: Sampling Distributions and the CLT Statistics (OA3102)


Module 3: Sampling Distributions and the CLT Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chpt 7.1-7.3, 7.5 Revision: 1-12 1

Goals for this Module Statistics and their distributions Deriving a sampling distribution Analytically Using simulation Sampling Distributions Distribution of the sample mean Distributions related to the normal Central Limit Theorem Normal approximation to the binomial Revision: 1-12 2

Definition: Statistic A statistic is a function of observable random variables in a sample and known constants Revision: 1-12 3

Statistics and Their Distributions (1) Remember, we denote random variables with upper-case Roman letters, e.g., Y_1, Y_2, Y_3, .... These represent placeholders for the actual values once we observe them. We use lower-case Roman letters to denote the observed values: y_1, y_2, y_3, .... Thus: Y_1, Y_2, Y_3, ... are random quantities and so are described by probability distributions; y_1, y_2, y_3, ... are just numbers. Revision: 1-12 4

Statistics and Their Distributions (2) Since Y_1, Y_2, Y_3, ... are random variables, so is any function of them. E.g., Ȳ = (1/n) Σ_{i=1}^n Y_i is a random variable: it's the mean of n random variables before we observe their values. Thus, statistics of random variables are random variables themselves, so they have their own probability distribution. It's called the sampling distribution. Revision: 1-12 5

Definition: Sampling Distribution A sampling distribution is the probability distribution of a statistic Revision: 1-12 6

Illustrating Random Statistics Consider drawing samples from a Weibull distribution with α=2 and β=5 (so that μ=E(X)=4.43, the median is 4.16, and σ=2.32). Six samples of size n=10 are drawn from this Weibull distribution. Note that the sample means, medians, and standard deviations are all different: randomness! Revision: 1-12 7 * Figure and table from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.
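
To see this randomness yourself, here is a minimal R sketch that repeats the slide's experiment with fresh random samples (not the book's exact ones): it draws six samples of size n=10 from a Weibull(shape=2, scale=5) population and tabulates each sample's mean, median, and standard deviation.

# Draw six samples of size n = 10 from a Weibull(shape = 2, scale = 5) population
# and compare their sample statistics (the seed is arbitrary).
set.seed(1)
samples <- replicate(6, rweibull(10, shape = 2, scale = 5))
round(rbind(mean   = apply(samples, 2, mean),
            median = apply(samples, 2, median),
            sd     = apply(samples, 2, sd)), 2)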

Demonstrating Randomness This is a demonstration showing that statistics (i.e., functions of random variables) are random variables too. TO DEMO Applets created by Prof Gary McClelland, University of Colorado, Boulder You can access them at www.thomsonedu.com/statistics/book_content/0495110817_wackerly/applets/seeingstats/index.html Revision: 1-12 8

Simple Random Sampling (1) The sampling distribution of a statistic depends on the population distribution, the sample size, and the method of sampling. For this class, we will always assume simple random sampling (SRS): each X (or Y) in the sample comes from the same distribution and is independent of the other Xs. Shorthand: they're independent and identically distributed (iid). Revision: 1-12 9

Simple Random Sampling (2) In this class, we will be thinking of iid random variables from a probability distribution. It's an idealized model of the real world that implies the population is infinite in size. In the real world, populations are often finite: if we sample with replacement, then SRS still holds; if we sample without replacement but sample less than 5 percent of the population, it is a close-enough approximation. Revision: 1-12 10

Example (Review) A balanced (i.e., "fair") die is tossed three times. Let Y_1, Y_2, and Y_3 be the outcomes, and denote the average of the three outcomes by Ȳ ("Y-bar"). Find the mean and standard deviation of Ȳ; that is, find μ_Ȳ and σ_Ȳ. Revision: 1-12 11

Example (Review) Revision: 1-12 12

Analytically Deriving a Sampling Distribution Consider the following problem: The NEX automobile service center charges $40, $45, or $50 for a tune-up on 4-, 6-, and 8-cylinder cars, respectively. The pmf of revenue for a random car, X, is p(40)=0.2, p(45)=0.3, p(50)=0.5, so μ=46.5 and σ²=15.25. What's the distribution of the average revenue from two tune-ups, (X_1+X_2)/2, assuming they are independent? Revision: 1-12 13

Analytically Deriving a Sampling Distribution, cont'd Tabulating all outcomes, associated probabilities, and statistics gives the table below. Thus, we calculate: Revision: 1-12 14 * Table from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.
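
The table itself is an image in the original slides; here is a small R sketch that reproduces the calculation, assuming the pmf p(40)=0.2, p(45)=0.3, p(50)=0.5 given above.

# Exact sampling distribution of the average revenue from two independent tune-ups.
x <- c(40, 45, 50)
p <- c(0.2, 0.3, 0.5)
outcomes <- expand.grid(x1 = x, x2 = x)             # all 9 (x1, x2) pairs
probs    <- expand.grid(p1 = p, p2 = p)
outcomes$prob <- probs$p1 * probs$p2                # independence
outcomes$xbar <- (outcomes$x1 + outcomes$x2) / 2
dist <- tapply(outcomes$prob, outcomes$xbar, sum)   # pmf of the sample mean
dist
sum(as.numeric(names(dist)) * dist)                 # E(Xbar) = 46.5
sum(as.numeric(names(dist))^2 * dist) - 46.5^2      # Var(Xbar) = 15.25/2 = 7.625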

Picturing the Sampling Distribution The two distributions look like this: [figures: the distribution of X and the sampling distribution of (X_1+X_2)/2]. Note that the means of the two distributions look to be the same, while the variability of the sampling distribution looks smaller. This is not an accident. Revision: 1-12 15 * Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Another Sampling Distribution Consider the same service center, but now calculate the sampling distribution of the average revenue from four (independent) tune-ups: X̄ = (1/4) Σ_{i=1}^4 X_i. The sampling distribution looks like this: [figure]. Revision: 1-12 16 * Figure from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Back to the Die Example We could do the same thing to derive the sampling distribution for the mean of three rolls of the die. E.g., we know: the outcomes range from Ȳ = 1 (roll three ones) to Ȳ = 6 (roll three sixes); there are 6³ = 216 possible outcomes of the three rolls, but not all translate into unique values of Ȳ; the specific values the sampling distribution can take on are 3/3, 4/3, 5/3, 6/3, 7/3, ..., 17/3, 18/3. Revision: 1-12 17

Example: Analytically Calculating the Sampling Distribution Calculate Pr(Ȳ = 1): Now calculate Pr(Ȳ = 4/3): Revision: 1-12 18

Example: Analytically Calculating the Sampling Distribution And now calculate Pr(Ȳ = 5/3): Etc. Revision: 1-12 19

Using Simulation to Approximate the Sampling Distribution These calculations are tedious. Use R to simulate for approximate results. Revision: 1-12 20
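
The original slides show live R output here; the following is a minimal sketch of that simulation, assuming 10,000 replications (the slide does not state the number actually used).

# Approximate the sampling distribution of the mean of three fair-die rolls.
set.seed(1)
ybar <- replicate(10000, mean(sample(1:6, size = 3, replace = TRUE)))
table(round(ybar, 3)) / 10000   # approximate pmf of Y-bar
mean(ybar); sd(ybar)            # compare to mu = 3.5 and sigma/sqrt(3), about 0.986
hist(ybar, breaks = seq(0.5, 6.5, by = 1/3), main = "Approximate sampling distribution")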

Now, Fancier The previous plot ("Approximate sampling distribution") shows frequencies using a histogram. Let's do some more calculations, clean things up, and check against the exact answer. Revision: 1-12 21

So, Here's a Nicer Plot Revision: 1-12 22

Simulation Experiments As we've seen, we can use simulation to empirically estimate sampling distributions, which can be useful when analytic derivation is hard or impossible. We need to specify: the statistic of interest, the population distribution, the sample size, and the number of replications. Revision: 1-12 23

Example Statistic: sample mean. Population distribution: N(8.25, 0.75²). Sample size: (a) n=5, (b) n=10, (c) n=20, (d) n=30. Number of replications: 500 each. Revision: 1-12 24 * Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.
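
The figures on this slide are images in the original; a minimal R sketch of this simulation experiment, using the parameters listed above, would be:

# 500 sample means for each sample size, drawn from a N(8.25, 0.75^2) population.
set.seed(1)
par(mfrow = c(2, 2))
for (n in c(5, 10, 20, 30)) {
  xbar <- replicate(500, mean(rnorm(n, mean = 8.25, sd = 0.75)))
  hist(xbar, main = paste("n =", n), xlab = "sample mean")
}

The lognormal example on the next slide works the same way, replacing rnorm(...) with rlnorm(n, meanlog = 3, sdlog = 0.4), assuming the 0.16 in LN(3, 0.16) is the variance of log X.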

Another Example Statistic: sample mean. Population distribution: LN(3, 0.16). Sample size: (a) n=5, (b) n=10, (c) n=20, (d) n=30. Number of replications: 500 each. Revision: 1-12 25 * Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Sampling Distributions Related to the Normal Distribution of the sample mean (when the population is normally distributed); chi-squared (χ²) distribution: sums of squared normally distributed r.v.s; t distribution: ratio of a standard normal r.v. to a function of a chi-squared random variable; F distribution: ratio of (functions of) chi-squared r.v.s. Revision: 1-12 26

Why Should We Care??? Eventually we will be doing hypothesis tests and constructing confidence intervals Important statistics that we will want to test have these sampling distributions So, it seems pretty esoteric here, but all of these distributions will play important roles in practical, real-world problems Revision: 1-12 27

Remember Linear Combinations of Random Variables (see Theorem 5.12) Given a collection of n random variables Y_1, Y_2, ..., Y_n and n numerical constants a_1, a_2, ..., a_n, the random variable X = Σ_{i=1}^n a_i Y_i = a_1 Y_1 + a_2 Y_2 + ... + a_n Y_n is called a linear combination of the Y_i's. Note that we get the total, X = T_0, if a_1 = a_2 = ... = a_n = 1, and the sample mean, X = Ȳ, if a_1 = a_2 = ... = a_n = 1/n. But also note the Y_i's are not necessarily iid. Revision: 1-12 28

Some Useful Facts (1) Let Y_1, Y_2, ..., Y_n have mean values μ_1, μ_2, ..., μ_n, respectively, and variances σ_1², σ_2², ..., σ_n², respectively. 1. Whether or not the Y_i's are independent, E(a_1 Y_1 + a_2 Y_2 + ... + a_n Y_n) = a_1 E(Y_1) + a_2 E(Y_2) + ... + a_n E(Y_n) = a_1 μ_1 + a_2 μ_2 + ... + a_n μ_n = Σ_{i=1}^n a_i μ_i. Revision: 1-12 29

Some Useful Facts (2) 2. If Y_1, Y_2, ..., Y_n are independent, Var(a_1 Y_1 + a_2 Y_2 + ... + a_n Y_n) = a_1² Var(Y_1) + a_2² Var(Y_2) + ... + a_n² Var(Y_n) = a_1² σ_1² + a_2² σ_2² + ... + a_n² σ_n², so σ_{a_1 Y_1 + ... + a_n Y_n} = √(a_1² σ_1² + a_2² σ_2² + ... + a_n² σ_n²). 3. For any Y_1, Y_2, ..., Y_n, Var(a_1 Y_1 + a_2 Y_2 + ... + a_n Y_n) = Σ_{i=1}^n Σ_{j=1}^n a_i a_j Cov(Y_i, Y_j). Revision: 1-12 30
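
As a quick numerical sanity check of facts 1 and 2 (my own illustration, not from the slides), simulate independent Y_i with known means and variances and compare the empirical mean and variance of a linear combination to the formulas; the constants and parameters below are arbitrary.

# Numerical check of E and Var of a linear combination of independent r.v.s.
set.seed(1)
a  <- c(2, -1, 3)                       # constants a_1, a_2, a_3
mu <- c(1, 4, 2); sigma <- c(1, 2, 0.5)
X <- replicate(1e5, sum(a * rnorm(3, mean = mu, sd = sigma)))
c(empirical = mean(X), formula = sum(a * mu))           # fact 1
c(empirical = var(X),  formula = sum(a^2 * sigma^2))    # fact 2 (independence)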

Sampling Distribution of the Sample Mean (Population Normally Dist'd) Theorem 7.1: Let Y_1, Y_2, ..., Y_n be a random sample of size n from a normal distribution with mean μ_Y and standard deviation σ_Y. Then Ȳ = (1/n) Σ_{i=1}^n Y_i ~ N(μ_Y, σ_Y²/n). In particular, note that the sample mean of normally distributed random variables is normally distributed, with μ_Ȳ = μ_Y and σ_Ȳ² = σ_Y²/n. This is true for any sample size n. Revision: 1-12 31

Proof Revision: 1-12 32

Proof (continued) Revision: 1-12 33

Proof (continued) Revision: 1-12 34

Example 7.2 The amount dispensed (in ounces) by a beer bottling machine is normally distributed with σ²=1.0. For a sample of size n=9, find the probability that the sample mean is within 0.3 ounces of the true mean μ. Solution: Revision: 1-12 35

Example 7.2 (continued) Revision: 1-12 36
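
The worked solution is not reproduced in this transcription; a quick R check of the answer, using Ȳ ~ N(μ, 1/9) so that σ_Ȳ = 1/3, is:

# Example 7.2 check: P(|Ybar - mu| <= 0.3) = P(|Z| <= 0.3/(1/3)) = P(|Z| <= 0.9).
pnorm(0.9) - pnorm(-0.9)   # approximately 0.632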

Table 4, Appendix 3 Revision: 1-12 37

Example 7.3 In Example 7.2, how big of a sample size do we need if we want the sample mean to be within 0.3 ounces of μ with probability 0.95? Solution: Revision: 1-12 38

Example 7.3 (continued) Revision: 1-12 39
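
Again, the board work is not transcribed; assuming the standard approach of solving 0.3√n = z_{0.025}·σ with σ = 1, the calculation can be checked in R:

# Example 7.3 check: n = (qnorm(0.975)/0.3)^2, rounded up to the next integer.
n <- (qnorm(0.975) / 0.3)^2
n            # about 42.7
ceiling(n)   # need n = 43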

Sampling Distribution of the Sum of Squared Standard Normal R.V.s Theorem 7.2: Let Y_1, Y_2, ..., Y_n be defined as in Theorem 7.1. Then Z_i = (Y_i - μ_Y)/σ_Y are iid standard normal r.v.s and Σ_{i=1}^n Z_i² = Σ_{i=1}^n ((Y_i - μ_Y)/σ_Y)² ~ χ²_n, where χ²_n denotes a chi-square distribution with n degrees of freedom. The proof is based on a theorem from Chapter 6, so we'll skip it. Revision: 1-12 40

The Chi-squared Distribution The chi-squared distribution has one parameter, ν, which can take on values 1, 2, 3, .... The distribution is very skewed for lower values of ν. The density f(x; ν) is positive only for values of x > 0. Graphs of three χ² density functions: Revision: 7-10 41 * Figure from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

Looking Up Chi-squared Quantiles We can look these up in WM&S Table 6. Note that, because the distribution is not symmetric, we must look up each tail separately. The table gives the probability in the right tail: Revision: 7-10 42 * Figures from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

WM&S Table 6 43

Example 7.4 Let Z_1, Z_2, ..., Z_6 be a random sample from the standard normal distribution. Find the number b such that Pr(Σ_{i=1}^6 Z_i² ≤ b) = 0.95. Solution: Revision: 1-12 44
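
The solution itself is board work; since Σ Z_i² ~ χ²_6 by Theorem 7.2, b is the 0.95 quantile of a chi-squared distribution with 6 degrees of freedom, which R gives directly:

# Example 7.4 check: b is the 0.95 quantile of a chi-squared(6) distribution.
qchisq(0.95, df = 6)   # approximately 12.59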

Sampling Distribution: Ratio of Sample Variance to Population Variance Theorem 7.3: Let Y_1, Y_2, ..., Y_n be an iid sample from a normal distribution with mean μ_Y and standard deviation σ_Y. Then (n-1)S²/σ_Y² = (1/σ_Y²) Σ_{i=1}^n (Y_i - Ȳ)² ~ χ²_{n-1}, where χ²_{n-1} denotes a chi-square distribution with n-1 degrees of freedom. Also, Ȳ and S² are independent random variables. Revision: 1-12 45

Proof (for n=2) Revision: 1-12 46

Proof (continued) Revision: 1-12 47

Example 7.5 In Example 7.2, the amount dispensed (in ounces) is normally distributed with σ²=1.0. For a sample of size n=10, find b_1 and b_2 such that Pr(b_1 ≤ S² ≤ b_2) = 0.90. Solution: Revision: 1-12 48

Example 7.5 (continued) Revision: 1-12 49
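
A quick R check, using (n-1)S²/σ² ~ χ²_9 with σ² = 1 and splitting 0.05 into each tail (the usual equal-tail choice; the slide's board work is not transcribed, so this is an assumption):

# Example 7.5 check: (n-1)S^2/sigma^2 ~ chi-squared(9), sigma^2 = 1, n = 10.
n <- 10
c(b1 = qchisq(0.05, df = n - 1) / (n - 1),   # approximately 0.369
  b2 = qchisq(0.95, df = n - 1) / (n - 1))   # approximately 1.880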

Sampling Distribution: Sample Mean (Popul'n Normally Dist'd, σ Unknown) Definition 7.2: Let Z be a standard normal r.v. and let W be a chi-square distributed r.v. with ν degrees of freedom. Then, if Z and W are independent, T = Z/√(W/ν) ~ t_ν, where t_ν is the t distribution with ν degrees of freedom. In particular, note that with Z = (Ȳ - μ)/(σ/√n) and W = (n-1)S²/σ², we get T = (Ȳ - μ)/(S/√n) ~ t_{n-1}. Revision: 1-12 50

Illustrating the t Distribution [Figure: densities f(x) of the standard normal and the t distributions with 3, 10, and 100 df, plotted for x from -4 to 4.] Revision: 7-10 51

WM&S Table (Inside Front Cover) Revision: 7-10 52

Example 7.6 The tensile strength of a type of wire is normally distributed with unknown mean μ and variance σ². Six pieces are randomly selected from a large roll and the tensile strength of each will be measured (Y_1, ..., Y_6). We usually use Ȳ to estimate μ and S² for σ², so it's reasonable to estimate σ_Ȳ = σ/√n with S/√n. So, find the probability that Ȳ will be within 2S/√n of the true population mean μ. Revision: 1-12 53

Example 7.6 Solution Revision: 1-12 54
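
The board solution reduces this to a t probability: |Ȳ - μ| ≤ 2S/√n is the same event as |T| ≤ 2 with T ~ t_5, which R evaluates as:

# Example 7.6 check: P(|T| <= 2) for T ~ t with n - 1 = 5 degrees of freedom.
pt(2, df = 5) - pt(-2, df = 5)   # approximately 0.898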

Sampling Distribution: Ratio of Chi-Squared RVs (and Their DFs) Definition 7.3: Let W_1 and W_2 be independent chi-square distributed r.v.s with ν_1 and ν_2 dfs, respectively. Then F = (W_1/ν_1)/(W_2/ν_2) ~ F_{ν_1, ν_2}, where F_{ν_1, ν_2} is the F distribution with ν_1 and ν_2 dfs. In particular, note that with W_1 = (n_1-1)S_1²/σ_1² and W_2 = (n_2-1)S_2²/σ_2², (S_1²/σ_1²)/(S_2²/σ_2²) = (W_1/(n_1-1))/(W_2/(n_2-1)) ~ F_{n_1-1, n_2-1}. Revision: 1-12 55

The F Distribution The F distribution is specified by the two degrees of freedom, ν_1 and ν_2. We will often be interested in right-tail probabilities; notation: F_{α, ν_1, ν_2}. That's how WM&S Table 7 is set up (next slide). For left-tail probabilities, we must use F_{1-α, ν_1, ν_2} = 1/F_{α, ν_2, ν_1}. Revision: 2-10 56 * Figure from Probability and Statistics for Engineering and the Sciences, 7th ed., Duxbury Press, 2008.

WM&S Table 7 Revision: 2-10 57

Example 7.7 If we take independent samples of size n_1=6 and n_2=10 from two normal populations with equal population variances, find the number b such that Pr(S_1²/S_2² ≤ b) = 0.95. Solution: Revision: 1-12 58

Exercise 7.7: Table 7 Excerpt Revision: 1-12 59
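
Since the equal population variances cancel, S_1²/S_2² ~ F with 5 and 9 df, so b is an F quantile that R supplies directly (a check of the table lookup above):

# Example 7.7 check: S1^2/S2^2 ~ F(n1 - 1, n2 - 1) = F(5, 9) when the variances are equal,
# so b is the 0.95 quantile of that F distribution.
qf(0.95, df1 = 5, df2 = 9)   # approximately 3.48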

Finding Probabilities and Quantiles Using R R functions: Note: the functions are based on cumulative probabilities (i.e., the left tails), not the right tails. To do calculations like in the tables, either use the lower.tail=FALSE option (so p=α) or use the function as is, but remember p=1-α. Revision: 1-12 60
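
The list of functions on this slide did not survive transcription; for the distributions in this module, the relevant base-R pairs are presumably pnorm/qnorm, pchisq/qchisq, pt/qt, and pf/qf. A quick illustration of the two equivalent ways to do a table-style (right-tail) lookup:

# Two equivalent ways to get the chi-squared value with right-tail area 0.05 and 9 df
# (the kind of lookup WM&S Table 6 is set up for).
qchisq(0.05, df = 9, lower.tail = FALSE)   # here p is the right-tail area alpha
qchisq(1 - 0.05, df = 9)                   # or use the default left tail with 1 - alpha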

Back to the Examples Example 7.2: Example 7.3: Example 7.4: Example 7.6: Example 7.7: Revision: 1-12 61

The Central Limit Theorem (CLT) The Central Limit Theorem says that, for sufficiently large n [1], sums of iid r.v.s are approximately normally distributed. As n gets bigger, the approximation gets better. More precisely, as n → ∞, the distribution of Z = (Ȳ - μ)/(σ/√n) converges to a standard normal distribution, where E(Y) = μ and Var(Y) = σ². [1] A generally conservative rule of thumb is n > 30. Revision: 1-12 62

CLT (continued) So, let Y_1, Y_2, ..., Y_n be a random sample from any distribution with mean μ_Y and standard deviation σ_Y. Then, if n is sufficiently large, Ȳ has an approximate normal distribution with μ_Ȳ = μ_Y and σ_Ȳ² = σ_Y²/n. Similarly, if n is sufficiently large, then Z = (Ȳ - μ_Y)/(σ_Y/√n) has an approximate standard normal distribution, with mean 0 and variance 1. Revision: 1-12 63

Example: Sums of Dice Rolls [Figure: four frequency histograms of simulated dice totals: "Roll of a Single Die" (sum of 1 roll), "Sum of Two Dice" (sum of 2 rolls), "Sum of 5 Dice" (sum of 5 rolls), and "Sum of 10 Dice" (sum of 10 rolls); x-axis is the sum, y-axis is frequency.] Revision: 1-12 64

Demonstrating the CLT This is a simulation demonstrating the Central Limit Theorem. TO DEMO Applets created by Prof Gary McClelland, University of Colorado, Boulder You can access them at www.thomsonedu.com/statistics/book_content/0495110817_wackerly/applets/seeingstats/index.html Revision: 1-12 65

Illustrating the CLT in R
> # 10,000 rows of 100 Uniform(0,1) draws; each histogram shows the means of the
> # first k columns, with the CLT normal density N(0.5, 1/(12k)) overlaid.
> m<-matrix(data=runif(10000*100),nrow=10000,ncol=100)
> avg1col <- m[,1]
> avg2col <- apply(m[,1:2],1,mean)
> avg3col <- apply(m[,1:3],1,mean)
> avg4col <- apply(m[,1:4],1,mean)
> avg5col <- apply(m[,1:5],1,mean)
> avg10col <- apply(m[,1:10],1,mean)
> avg20col <- apply(m[,1:20],1,mean)
> avg50col <- apply(m[,1:50],1,mean)
> avg100col <- apply(m[,1:100],1,mean)
> par(mfrow=c(3,3))
> hist(avg1col,prob=TRUE,xlim=c(0,1))
> curve(dnorm(x,.5,sqrt(1/12)),lwd=2,col="red",add=TRUE)
> hist(avg2col,prob=TRUE,xlim=c(0,1))
> curve(dnorm(x,.5,sqrt(1/(12*2))),lwd=2,col="red",add=TRUE)
> hist(avg3col,prob=TRUE,xlim=c(0,1))
> curve(dnorm(x,.5,sqrt(1/(12*3))),lwd=2,col="red",add=TRUE)
> hist(avg4col,prob=TRUE,xlim=c(0,1))
> curve(dnorm(x,.5,sqrt(1/(12*4))),lwd=2,col="red",add=TRUE)
> hist(avg5col,prob=TRUE,xlim=c(0,1))
> curve(dnorm(x,.5,sqrt(1/(12*5))),lwd=2,col="red",add=TRUE)
> hist(avg10col,prob=TRUE,xlim=c(0,1))
> curve(dnorm(x,.5,sqrt(1/(12*10))),lwd=2,col="red",add=TRUE)
> hist(avg20col,prob=TRUE,xlim=c(0,1))
> curve(dnorm(x,.5,sqrt(1/(12*20))),lwd=2,col="red",add=TRUE)
> hist(avg50col,prob=TRUE,xlim=c(0,1))
> curve(dnorm(x,.5,sqrt(1/(12*50))),lwd=2,col="red",add=TRUE)
> hist(avg100col,prob=TRUE,xlim=c(0,1))
> curve(dnorm(x,.5,sqrt(1/(12*100))),lwd=2,col="red",add=TRUE)
Revision: 1-12 66

The CLT More Formally Theorem 7.4: Let Y_1, Y_2, ..., Y_n be iid r.v.s with mean E(Y_i) = μ_Y and variance Var(Y_i) = σ_Y². Define U_n = (Ȳ - μ_Y)/(σ_Y/√n) = (Σ_{i=1}^n Y_i - nμ_Y)/(σ_Y √n). Then, as n → ∞, the distribution function of U_n converges to the standard normal distribution: lim_{n→∞} Pr(U_n ≤ u) = ∫_{-∞}^{u} (1/√(2π)) e^{-t²/2} dt for all u. Revision: 1-12 67

Example 7.8 For the whole population, achievement scores on a certain test have mean μ_Y = 60 and σ_Y = 8. For a random sample of n=100 scores from students at one school, the average score is 58. Is there evidence to suggest this school is inferior? That is, what's the probability of seeing an average score of 58 if the true school average matches the population? Revision: 1-12 68

Example 7.8 Solution: Revision: 1-12 69

Example 7.9 The service times (in minutes) for customers coming through a Navy Exchange checkout counter are iid with μ_Y = 1.5 and σ_Y = 1.0. Approximate the probability that n=100 customers can be served in less than 2 hours. Solution: Revision: 1-12 70

Example 7.9 Revision: 1-12 71

Checking the Solutions in R Example 7.8 Example 7.9 Revision: 1-12 72
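
The R commands on this slide were not captured in the transcription; a sketch of the checks they presumably perform, using the CLT approximations set up above, is:

# Example 7.8: P(Ybar <= 58) when mu = 60, sigma = 8, n = 100,
# so Ybar is approximately N(60, 0.8^2).
pnorm(58, mean = 60, sd = 8/sqrt(100))               # approximately 0.0062

# Example 7.9: P(total service time < 120 minutes) for n = 100, mu = 1.5, sigma = 1,
# so the total is approximately N(150, 10^2).
pnorm(120, mean = 100 * 1.5, sd = sqrt(100) * 1)     # approximately 0.0013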

Normal Approximation to the Binomial A r.v. Y ~ Bin(n, p) is the number of successes out of n independent trials with probability of success p for each trial. Define indicator variables X_1, X_2, ..., X_n as X_i = 1 if the ith trial is a success and X_i = 0 if the ith trial is a failure. So, X_1, X_2, ..., X_n are iid Bernoulli r.v.s and we have Y = X_1 + X_2 + ... + X_n. That is, Y is a sum of iid random variables, so for large enough n the CLT applies. Revision: 1-12 73

Exercise 7.10 Candidate A believes she can win an election if she can get 55% of the votes in precinct 1. Assuming 50% of the precinct 1 voters favor her and n=100 random voters show up, what is the (approximate) probability she will receive at least 55% of their votes? Solution: Revision: 1-12 74

Exercise 7.10 Revision: 1-12 75
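
A hedged R check of this calculation, with and without a continuity correction (the board work is not transcribed, and it may use either form):

# Exercise 7.10: Y ~ Bin(100, 0.5); want P(Y >= 55), i.e., P(Y/n >= 0.55).
# CLT approximation without continuity correction:
pnorm(0.55, mean = 0.5, sd = sqrt(0.5 * 0.5 / 100), lower.tail = FALSE)   # about 0.159
# With continuity correction (P(Y >= 55) = P(Y > 54.5)):
pnorm(54.5, mean = 50, sd = sqrt(100 * 0.5 * 0.5), lower.tail = FALSE)    # about 0.184
# Exact binomial for comparison:
pbinom(54, size = 100, prob = 0.5, lower.tail = FALSE)                    # about 0.184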

When to Use the Approximation? Y and Y/n have an approximate normal distribution for large enough n, but "large enough n" depends on p. Rule of thumb: the approximation works well when p ± 3√(pq/n) lies in the interval (0, 1). An equivalent criterion is n > 9 max(p, q)/min(p, q). See extra credit Exercise 7.70. Revision: 1-12 76

Exercise 7.11 Suppose Y has a binomial distribution with n=25 and p=0.4. Find the exact probabilities that Y ≤ 8 and Y = 8 and compare these with the corresponding values from the normal approximation. Exact solutions: Table 1 in Appendix 3 gives Pr(Y ≤ 8) = 0.274 and Pr(Y = 8) = Pr(Y ≤ 8) - Pr(Y ≤ 7) = 0.274 - 0.154 = 0.120. In R: Revision: 1-12 77
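
The R commands on the slide were not transcribed; the exact values come straight from pbinom and dbinom:

# Exercise 7.11 exact probabilities in R:
pbinom(8, size = 25, prob = 0.4)   # P(Y <= 8), approximately 0.274
dbinom(8, size = 25, prob = 0.4)   # P(Y = 8),  approximately 0.120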

Exercise 7.11 Solution: Revision: 1-12 78

The Continuity Correction The issue is that we are approximating a discrete distribution with a continuous one. So, to improve the approximation, rather than evaluating the continuous distribution exactly at the discrete value, use its value halfway between the two adjacent discrete values. In other words: add 0.5 to the value we're approximating for Pr(Y ≤ y) calculations, and subtract 0.5 from the value we're approximating for Pr(Y ≥ y) calculations. Revision: 1-12 79

Exercise 7.11 Solution with continuity correction: Revision: 1-12 80
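
A sketch of that continuity-corrected normal approximation in R, using μ = np = 10 and σ = √(npq) = √6 (the slide's own numbers are not transcribed):

# Exercise 7.11 normal approximation with continuity correction:
mu <- 25 * 0.4; sigma <- sqrt(25 * 0.4 * 0.6)
pnorm(8.5, mean = mu, sd = sigma)                                      # P(Y <= 8), about 0.270
pnorm(8.5, mean = mu, sd = sigma) - pnorm(7.5, mean = mu, sd = sigma)  # P(Y = 8),  about 0.117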

What We Covered in this Module Statistics and their distributions Deriving a sampling distribution Analytically Using simulation Sampling Distributions Distribution of the sample mean Distributions related to the normal Central Limit Theorem Normal approximation to the binomial Revision: 1-12 81

Homework WM&S chapter 7 Required exercises: 1, 2, 9, 25, 31a-c, 48, 49, 72, 73 Extra credit: 15a&b, 70 Useful hints: Problem 7.1: Get to the applet more directly at www.thomsonedu.com/statistics/book_content/0495110817_wackerly/applets/seeingstats/index.html. Click on 7. Distributions to the Normal > DiceSample. Problem 7.25 part b: Use R, not the applet. The relevant R function is qt(p, df, lower.tail=FALSE). Problem 7.31: Solutions in the back of the book are wrong. Revision: 1-12 82