Chapter 5. Sampling Distributions


Lecture notes, Lang Wu, UBC

5.1. Introduction

In statistical inference, we attempt to estimate an unknown population characteristic, such as the population mean, µ, using data in a sample, such as the sample mean, x̄. That is, we might use the sample mean, x̄, to estimate the population mean, µ. The accuracy of this estimation depends on the sample size, n, and the variability of the data. We can better understand the uncertainty in the estimation, as well as the basic idea behind statistical inference, by introducing an important concept called the sampling distribution.

A sample statistic (e.g., x̄) can be conceptually viewed as a random variable, because before we collect the data, we do not know what value the statistic will take. The statistic might take on any number in a range of values. Thus, it has a probability distribution, with some values more probable than others. The mean and variance of this distribution can be used to assess the accuracy of using the statistic to estimate a population parameter.

The probability distribution of a statistic is called the sampling distribution of that statistic. In other words, the sampling distribution of a statistic may be viewed as the distribution of all possible values of the statistic. For example, the sampling distribution of the sample mean, x̄, is the distribution of all possible values of x̄. So if we take many samples from the same population and calculate x̄ for each of them, the values we get will all fall somewhere along this distribution. By examining the sampling distribution of x̄, we can get an idea of the variability and range of x̄, which are used to determine the accuracy of using x̄ to estimate µ.

For example, suppose we wish to estimate the average sleep time of all students in a university. Here, the population is all students in the university, and the population parameter of interest is the average sleep time, denoted by µ.
We can randomly select 10 students from this university and record the average sleep time of these 10 students, which is the sample mean x̄ with sample size n = 10. Suppose that x̄ = 6.5 (hours) for these 10 students. If we randomly select another sample of 10 students, we may obtain a different value of x̄, say, x̄ = 7 (hours). Repeating this procedure many times, we obtain many values of x̄, such as 6.5, 7, and so on. The probability distribution of all possible values of x̄ is called the sampling distribution of x̄. This procedure is used as an illustration
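The thought experiment above can be simulated. The sketch below draws many samples of size 10 from a hypothetical population of sleep times (normally distributed with mean 7 hours and standard deviation 1 hour; these numbers are illustrative assumptions, not from the notes) and collects the sample mean of each:

```python
import random

# Hypothetical population: sleep times ~ Normal(mean 7 h, sd 1 h).
random.seed(1)

sample_means = []
for _ in range(10_000):                                # repeat the sampling many times
    sample = [random.gauss(7, 1) for _ in range(10)]   # one sample of size n = 10
    sample_means.append(sum(sample) / 10)              # x-bar for this sample

# The collection of sample means approximates the sampling distribution of x-bar.
grand_mean = sum(sample_means) / len(sample_means)
print(round(grand_mean, 1))   # 7.0 (close to the population mean)
```

A histogram of `sample_means` would trace out the sampling distribution of x̄.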

since it may not be feasible in practice. For some populations, such as a normally distributed population, we can obtain the sampling distribution of the sample mean, x̄, via theoretical derivations. We examine this in more detail later.

Note that the population distribution and the sampling distribution are two different concepts. The population distribution refers to the distribution of a characteristic in the population, while the sampling distribution refers to the distribution of a particular statistic for repeated samples taken from the same population. Note also that if we randomly choose an individual from the population, the value of their characteristic can be seen as a random variable, X, whose probability distribution follows the population distribution.

There are many different sample statistics, so there are many different sampling distributions. Here, we focus on the sampling distributions of the two most important statistics:

- the sampling distribution of the sample proportion
- the sampling distribution of the sample mean

We focus on these two sampling distributions because they are crucial to two respective population distributions: the binomial distribution (for discrete data) and the normal distribution (for continuous data). The sample proportion is the most important statistic for a population with a binomial distribution, and the sample mean is the most important statistic for a population with a normal distribution. Moreover, the sampling distributions of these two statistics can be derived theoretically. For sampling distributions of other statistics, such as the sample variance, readers are referred to more advanced textbooks.

5.2. Sampling Distribution of the Sample Proportion

5.2.1. The Binomial Distribution

Before we discuss the sampling distribution of a sample proportion, we first introduce an important distribution for a discrete binary population: the Bernoulli distribution.
In practice, many random variables take on only two possible values, often denoted by the

Lecture notes, Lang Wu, UBC 3 binary numbers 0 and 1 (or thought of as success and failure ). Random variables of this nature are said to follow a Bernoulli distribution. For example, a student taking a course can either pass (1) or fail (0). If you toss a coin, you will get either heads (1) or tails (0). In an election, a randomly selected person can either vote for candidate A (1) or vote against candidate A (0). We can view these examples as experiments with only two possible outcomes, often called Bernoulli trials. Going back to the example regarding taking a course, let s say we randomly select 10 students from a large class. Each student can pass or fail the course. We can view this as an experiment consisting of 10 trials, with each trial having two possible outcomes (pass or fail), and our interest lying in the number of students who pass the course. Moreover, the 10 students may be viewed as independent and identically distributed. Independent because they are randomly selected; identically distributed because we do not know who will be selected so the probability of passing the course is the same for all students in the class (e.g., each student has a passing probability of 0.8 and failing probability of 0.2). The other examples above may be viewed in a similar way. Example 1. In an election, a recent poll shows that 40% of people will vote for candidate A. Suppose that three people are randomly selected. (1) What is the probability that exactly two people vote for candidate A? (2) What is the probability that at least one person votes for candidate A? Solution: Here, each person has two options: vote for candidate A or vote for someone else, so we can view each person as a random variable that follows a Bernoulli distribution. We can assume the three people are independent. Let X i = 1 if person i votes for candidate A and X i = 0 otherwise, i = 1, 2, 3. Let X be the total number of people, among the three who were selected, who vote for candidate A. 
Then X = X_1 + X_2 + X_3, with X = 3 meaning all three people vote for candidate A.

(1) The probability that exactly two people vote for candidate A is

P(X = 2) = C(3,2) × 0.4^2 × (1 − 0.4) = 0.288,

where C(3,2) = 3 is the number of possible ways to have 2 out of 3 people vote for candidate A, and 0.4^2 × (1 − 0.4) is the probability that 2 specific people vote for candidate A and the other one does not, assuming the 3 people are independent (so we can use the multiplication rule and multiply the probabilities).

(2) The probability that at least one person votes for candidate A is

P(X ≥ 1) = 1 − P(X = 0) = 1 − 0.6^3 = 0.784.

Alternatively, we can use P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) to get the same answer, but the computation is more tedious.

In general, we can consider n independent and identically distributed Bernoulli trials at once, where each trial has only two possible outcomes ("success" or "failure"). We are often interested in the probability of a certain number of successes. Let p be the probability of success for each trial, and let X be the total number of successes. Then the probability distribution of X is given by

P(X = k) = C(n,k) p^k (1 − p)^(n−k),  k = 0, 1, 2, ..., n,

where C(n,k) = n!/(k!(n − k)!) is the binomial coefficient. The above distribution is called the binomial distribution, denoted by X ~ B(n, p) or X ~ Bin(n, p). Thus, a binomial distribution is determined by two numbers: the number of trials, n, and the probability of success, p, with p being the only unknown parameter (since n is usually known). This is different from the normal distribution N(µ, σ), which is determined by two unknown parameters: the mean, µ, and the standard deviation, σ.

Remarks:

1) In practice, a binomial random variable X arises in the following setting: (i) there are n i.i.d. Bernoulli trials, with n known and fixed; (ii) the probability of success, p, is the same for each trial; (iii) X is the number of successes out of the n trials.

2) The above n trials may be viewed as a sample of size n. The number of successes, X, is the sample count. The proportion X/n, denoted by p̂, is the sample proportion, and it indicates the proportion of the sample trials that were successful (i.e., the number of successes divided by the total number of trials). The probability of success, p, is the population proportion, and it represents the (usually unknown) true proportion of successes in the population. Since X is a count from a sample, the distribution of X may be viewed as the sampling distribution of a count.
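The binomial pmf above is easy to compute with Python's standard library; the sketch below writes it directly from the formula and re-checks Example 1 (the sanity-check values n = 10, p = 0.3 are arbitrary illustrative choices):

```python
import math

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Sanity check: the probabilities over k = 0, ..., n must sum to 1.
total = sum(binom_pmf(k, 10, 0.3) for k in range(11))
print(round(total, 10))                              # 1.0

# Re-checking Example 1 (n = 3 voters, p = 0.4):
p_two = binom_pmf(2, 3, 0.4)                         # P(X = 2)
p_at_least_one = 1 - binom_pmf(0, 3, 0.4)            # P(X >= 1)
print(round(p_two, 3), round(p_at_least_one, 3))     # 0.288 0.784
```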
Remember that X follows a binomial distribution.

Theorem 1. If X ~ B(n, p), then E(X) = np and

Var(X) = np(1 − p).

Thus, for a binomial random variable, X, we can immediately obtain its mean and variance using the above formulas.

Example 2. Suppose the probability of getting a certain disease is 0.001, and suppose 50 people are randomly selected. (1) What is the probability that exactly one person has the disease? (2) What is the probability that at least one person has the disease? (3) How many people should be selected so that there is a 90% chance of at least one of them having the disease? (4) Find the mean and standard deviation of the number of people who have the disease among the 50 people.

Solution: Each randomly selected person either has the disease or does not have the disease. Let X be the number of people who have the disease among n randomly selected people. We are working with a binomial distribution where n = 50 and p = 0.001.

(1) The probability that exactly one person has the disease is

P(X = 1) = C(50,1) × 0.001 × 0.999^49 = 0.0476.

(2) The probability that at least one person has the disease is

P(X ≥ 1) = 1 − P(X = 0) = 1 − C(50,0) × 0.001^0 × 0.999^50 = 0.0488.

(3) In this case, n is unknown and needs to be determined. We need to find the value of n so that P(X ≥ 1) = 0.9, i.e., 1 − P(X = 0) = 0.9, or P(X = 0) = 0.1. Thus

0.999^n = 0.1, i.e., n log(0.999) = log(0.1).

Solving this equation, we have n = log(0.1)/log(0.999) ≈ 2301.4, which we round up to n = 2302. That is, we must select 2302 people to ensure there is a 90% chance of at least one of them having the disease.

(4) When n = 50 and p = 0.001, the mean and standard deviation of X are given by E(X) = np = 50 × 0.001 = 0.05.

σ_X = √(Var(X)) = √(np(1 − p)) = √(50 × 0.001 × 0.999) = 0.22.

For n = 50, the mean number of people having the disease is 0.05, and the standard deviation is 0.22 people.

Example 3. The probability of a battery's life exceeding 4 hours is 0.135. Three batteries are in use. (1) Find the probability that at most 2 batteries last for 4 or more hours. (2) Find the mean and standard deviation of the number of batteries lasting 4 or more hours.

Solution: A battery's life will either exceed 4 hours or not. Let X be the number of batteries lasting 4 or more hours. Here we have n = 3 and p = 0.135. Thus,

(1) P(X ≤ 2) = 1 − P(X = 3) = 1 − 0.135^3 × (1 − 0.135)^0 = 0.9975.

(2) E(X) = np = 3 × 0.135 = 0.405, and σ_X = √(np(1 − p)) = √(3 × 0.135 × 0.865) = 0.59.

For n = 3, the mean number of batteries lasting 4 or more hours is 0.405, and the standard deviation is 0.59 batteries.

5.2.2. Sampling Distribution of the Sample Proportion

A major goal in statistics is to make inferences about unknown population parameters. We do this by using sample statistics to estimate the corresponding population parameters. For example, we might use sample proportions to estimate population proportions, or use sample means to estimate population means. There is uncertainty in these estimates because the value of a statistic will vary from one sample to the next. To measure the uncertainty of each estimate, we look at the variability of the statistic (i.e., how much its value might vary from one sample to the next). To do this, we need to find the distribution of the sample statistic that is used to estimate the population parameter. This distribution is called the sampling distribution of the corresponding statistic. In this section, we consider a discrete population that follows a Bernoulli distribution (i.e., a population that is split into two groups, or a binary population), as described in the previous section.
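The computations in Examples 2 and 3 above, including solving for n in part (3) of Example 2, can be reproduced with a short script (a sketch using only the standard library):

```python
import math

# Example 2: n = 50 people, disease probability p = 0.001.
n, p = 50, 0.001
p1 = math.comb(n, 1) * p * (1 - p)**(n - 1)          # (1) P(X = 1)
p_at_least_1 = 1 - (1 - p)**n                        # (2) P(X >= 1)
# (3) smallest n with P(X >= 1) >= 0.9, i.e. (1 - p)^n <= 0.1:
n_needed = math.ceil(math.log(0.1) / math.log(1 - p))
mean, sd = n * p, math.sqrt(n * p * (1 - p))         # (4) mean and sd

print(round(p1, 4), round(p_at_least_1, 4))          # 0.0476 0.0488
print(n_needed)                                      # 2302
print(round(mean, 2), round(sd, 2))                  # 0.05 0.22

# Example 3: n = 3 batteries, p = 0.135.
p_at_most_2 = 1 - 0.135**3                           # P(X <= 2) = 1 - P(X = 3)
print(round(p_at_most_2, 4))                         # 0.9975
```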
For a population that follows a Bernoulli distribution, the parameter of interest is the population proportion, p, which is the proportion of success in the population (or the proportion of the population with the attribute of

interest). Recall the difference between a proportion and a percentage: a percentage is a proportion multiplied by 100. A proportion is a number between 0 and 1, while a percentage is a number between 0 and 100. Examples of population proportions include the proportion of people who are literate, the proportion of people who smoke, the proportion of people with cancer, etc.

Recall also the difference between a parameter and a statistic: a parameter is a population characteristic, while a statistic is a function or measure of the data in a sample. The difference between the population proportion, p, and the sample proportion, p̂, is the difference between a parameter and a statistic.

Let p be an (unknown) population proportion of success. We select a sample of size n and think of it as n independent Bernoulli trials, with x being the number of successes. Using the information in the sample, we can calculate the sample proportion (denoted by p̂):

p̂ = (number of successes in the sample)/(sample size) = x/n.

We can then use p̂ as an estimate of p. For example, if the unknown parameter p is the proportion of people who smoke in Canada, then p̂ might be the proportion of people who smoke in a randomly selected sample of n individuals in Canada. Here, p̂ is a number we can calculate, and it gives us an estimate of p.

From the previous section, we know that before we collect the data, the number of successes, X, is a random variable that follows a binomial distribution. Likewise, before we observe the data, we are interested in the distribution of the sample proportion p̂ (i.e., the sampling distribution of p̂), which is unknown. Remember that the sampling distribution of p̂ is the distribution of all possible values of p̂ if p̂ is calculated for an infinite number of samples of equal size taken from the same population. This distribution will allow us to be fairly confident that the actual value of p lies within a certain interval.
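The sample-to-sample variability of p̂ is easy to see in a simulation. The sketch below uses a hypothetical population proportion p = 0.25 and sample size n = 50 (both illustrative assumptions) and computes p̂ for a few independent samples:

```python
import random

# Hypothetical setting: population proportion p = 0.25, sample size n = 50.
random.seed(2)
p, n = 0.25, 50

def one_sample_phat():
    # Each trial is Bernoulli(p); p-hat is the number of successes over n.
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

phats = [one_sample_phat() for _ in range(5)]
print(phats)   # five different values of p-hat, scattered around p
```

Each run of `one_sample_phat` plays the role of one new sample; the spread of these values is exactly what the sampling distribution of p̂ describes.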
The exact distribution of p̂ is difficult to work with, so we often approximate it with a normal distribution, as described below in Theorem 3. In addition, the mean and standard deviation of the distribution of p̂ can be found easily, as shown in Theorem 2 below. Note that a normal distribution is completely determined by its mean and standard deviation, but this property does not hold for all distributions.

Theorem 2. The mean and variance of the sampling distribution of the sample proportion p̂ are given by

E(p̂) = p,  Var(p̂) = p(1 − p)/n,

where p is the population proportion.

Note that Theorem 2 only gives the mean and variance (or standard deviation) of the distribution of p̂. We still do not know what the exact distribution of p̂ is. However, Theorem 3 below shows that the distribution of p̂ can be approximated by a normal distribution.

Theorem 3. If the sample size n is sufficiently large that

np ≥ 10 and n(1 − p) ≥ 10,

then

(i) the sampling distribution of the sample proportion, p̂, can be approximated by the normal distribution N(p, √(p(1 − p)/n));

(ii) the distribution of the number of successes, X, can be approximated by the normal distribution N(np, √(np(1 − p))).

Theorem 3 shows that both p̂ and X may be approximated by normal distributions when the sample size, n, is large. Here, "large" means np ≥ 10 and n(1 − p) ≥ 10. Some books use the condition np ≥ 5 and n(1 − p) ≥ 5. Readers should not worry about the specific numbers 5 or 10. The key point is that, in order for the normal approximations to be accurate, n should be large and p should not be too close to 0 or 1. The larger the sample size, n, the more accurate the normal approximations.

Theorem 3(ii) can be used to quickly approximate binomial probabilities. We know that X follows B(n, p). However, computing probabilities such as P(X < k) can be quite tedious if k is not small. For example, P(X < 10) requires computing 10 binomial probabilities and adding them together. If we instead use the normal approximation in Theorem 3(ii), the normal distribution quickly gives us an approximate answer to P(X < 10), as shown in the examples below.

We will explore the idea of inferring population parameters from sample statistics in more detail in the next chapter. For now, we focus on familiarizing ourselves with the relationships between parameters and sampling distributions (Theorem 2), as well as how information can be gathered from an approximated sampling distribution (Theorem 3).
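The quality of the approximation in Theorem 3(ii) can be checked numerically. The sketch below compares the exact binomial tail probability P(X > 30) for X ~ B(100, 0.2) (the setting of Example 4 below) against the normal approximation, using only the standard library:

```python
import math

n, p = 100, 0.2

# Exact binomial tail: P(X > 30) = sum of P(X = k) for k = 31, ..., 100.
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(31, n + 1))

def normal_cdf(z):
    # Standard normal cdf via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

sd = math.sqrt(n * p * (1 - p))                # sd of X: sqrt(np(1-p)) = 4
approx = 1 - normal_cdf((30 - n * p) / sd)     # P(X > 30), no continuity correction
print(round(exact, 4), round(approx, 4))
```

Both numbers are small and close to each other, which is what Theorem 3 promises when np ≥ 10 and n(1 − p) ≥ 10.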
In the following examples, the population proportion, p, is already known, so we explore how knowing this proportion allows us to approximate the sampling distribution of p̂ and then gather information from it.

Example 4. Suppose 20% of people in a certain city smoke. A sample of 100 people is randomly selected from this city. Find the probability that more than 30% of the people in this sample smoke.

Solution: Here the population proportion is known to be p = 0.2, and the sample size is n = 100. The sample proportion p̂ can be viewed as a random variable before we observe the data in the sample. Since np = 20 > 10 and n(1 − p) = 80 > 10, we can use a normal approximation to find the probability P(p̂ > 0.3). By Theorem 3, we have, approximately,

p̂ ~ N(0.2, √(0.2 × 0.8/100)) = N(0.2, 0.04).

Thus, an approximation of P(p̂ > 0.3) is given by

P(p̂ > 0.3) = P((p̂ − 0.2)/0.04 > (0.3 − 0.2)/0.04)  (standardization)
≈ P(Z > 2.5) = 0.0062,

where we first use standardization (i.e., subtract the mean and divide by the standard deviation) to get to the standard normal distribution, and then look up the probability for the resulting z-value.

Note that for this problem we could also do an exact computation using the binomial distribution (where n = 100 and we are finding the probability that more than 30 people smoke):

P(p̂ > 0.3) = P(X > 0.3 × 100) = P(X > 30) = P(X = 31) + P(X = 32) + ... + P(X = 100)
= C(100,31) × 0.2^31 × 0.8^69 + C(100,32) × 0.2^32 × 0.8^68 + ... + C(100,100) × 0.2^100 × 0.8^0,

which is very tedious to compute!

Example 5. A fair coin is tossed 60 times. Find the probability that less than 1/3 of the results are heads.

Solution: Let X be the number of heads. Here we have n = 60, p = 0.5, and np = n(1 − p) = 30 > 10, so we can use the normal approximation

p̂ ~ N(0.5, √(0.5 × 0.5/60)) = N(0.5, 0.0645).

Thus

P(p̂ < 1/3) = P((p̂ − 0.5)/0.0645 < (1/3 − 0.5)/0.0645) ≈ P(Z < −2.58) = 0.0049.

This problem can also be solved exactly using the binomial distribution, but the computation is again very tedious.

The general method for these types of problems is to approximate the binomial distribution with a normal distribution (after checking that the requirements are met), convert this distribution to the standard normal distribution using standardization, and then look up the probability for the resulting z-value using a standard normal table or statistical software.

5.3. The Sampling Distribution of a Sample Mean

In the previous section, we considered (discrete) binary populations that follow Bernoulli distributions, as well as the sampling distribution of the sample proportion. In this section, we consider a population distribution that is continuous and has mean µ and standard deviation σ. The population is not necessarily normally distributed. (Remember that a normal distribution is completely determined by µ and σ, but a general continuous distribution may not be.) The parameters µ and σ are unknown.

We will use the sample mean, x̄, as an estimate of the population mean, µ. To measure the accuracy of this estimate, we need to find the sampling distribution of the sample mean x̄, i.e., the distribution of all possible values of x̄ if infinitely many samples of equal size are taken from the same population and the mean is calculated for each of them. (Note: when we talk about the sampling distribution of x̄, we are viewing x̄ as a random variable because we are considering all possible samples. If we instead focus on a specific sample with observed data, then x̄ is a number.) When the population distribution is unknown (except that it is continuous), the exact sampling distribution of the sample mean, x̄, cannot be known either.
However, if we know the population parameters, we can still obtain the mean and standard deviation of the sampling distribution of the sample mean, x̄, as shown in the theorem below. Moreover, when the sample size is large, we can use a normal distribution to approximate the sampling distribution of the sample mean, x̄.

Theorem 4. Consider a continuous population with mean µ and standard deviation σ. Even when the population distribution is unknown, we have:

(i) the mean of all possible values of x̄ (i.e., the mean of the sampling distribution of x̄, or the mean of the sample mean) is equal to the population mean:

E(x̄) = µ;

(ii) the standard deviation of all possible values of x̄ (i.e., the standard deviation of the sampling distribution of x̄, or the standard deviation of the sample mean) is √n times smaller than the population standard deviation:

σ_x̄ = σ/√n, or Var(x̄) = σ^2/n.

As you can see, the formulas for the mean and standard deviation of the distribution of the sample mean depend on the population parameters µ and σ. This shows the relationship between the parameters and the sampling distribution of the sample mean. In practice, however, the parameters µ and σ are usually unknown, so we must estimate them using statistics computed from a sample. We use the sample mean, x̄, to estimate the population mean, µ. Plugging x̄ instead of µ into Theorem 4(i), we get an estimate of the mean of the sampling distribution of the sample mean. Similarly, we use the sample standard deviation, σ̂ = s, to estimate the population standard deviation, σ. Plugging σ̂ instead of σ into Theorem 4(ii), we get an estimate of the standard deviation of the sampling distribution of the sample mean. We call this estimate the standard error of the sample mean x̄, given by

σ̂_x̄ = σ̂/√n = s/√n.

In other words, the standard error is an estimate of the standard deviation of the distribution of the sample mean. Since σ_x̄ = σ/√n, the larger the sample size, n, the smaller the standard error of the distribution of x̄ (i.e., the less variability in x̄), and so the more accurate x̄ is as an estimate of µ. As an example, suppose that you wish to get an accurate measure of your blood pressure.
One way to increase your accuracy is to measure your blood pressure as many times as possible and then take the average of the measurements.

Theorem 4 gives the mean and standard deviation of the distribution of the sample mean, x̄. We still do not know the exact distribution of x̄, since the population distribution is unknown and the mean and standard deviation cannot completely determine a continuous distribution (unless it is a normal distribution). However, if the population

distribution is known to be normal, or if the sample size, n, is large, the distribution of x̄ is exactly or approximately normal, respectively, as shown in the theorem below.

Theorem 5.

(i) If the population follows a normal distribution, N(µ, σ), then the sample mean x̄ also follows a normal distribution exactly:

x̄ ~ N(µ, σ/√n).

(ii) If the population distribution is unknown but the sample size, n, is large (say, n ≥ 25), then the sample mean, x̄, approximately follows the normal distribution

x̄ ~ N(µ, σ/√n),

which is the same distribution as the one in (i).

Based on Theorem 5, when the sample size, n, is reasonably large, the distribution of the sample mean, x̄, will approximately follow the distribution N(µ, σ/√n). Some books use n ≥ 25 as the condition and some use n ≥ 30 or another number. Readers should not worry too much about the specific number, since it just sets a benchmark of accuracy for the normal approximation. The larger the value of n, the more accurately the normal distribution approximates the distribution of x̄. Generally, if n < 10, the normal approximation may be poor.

Example 6. Suppose the weights of all adults in a large city form a distribution with mean µ = 140 (pounds) and standard deviation σ = 20 (pounds). A sample of 25 adults in the city is randomly selected. Find the probability that the mean weight of the adults in the sample is at least 144 pounds.

Solution: Here, we know the values of the parameters µ and σ, so we can calculate the mean and standard deviation of the distribution of the sample mean, x̄:

E(x̄) = µ = 140 and σ_x̄ = σ/√n = 20/√25 = 4.

Since n = 25, we can approximate the distribution of the sample mean by a normal distribution: x̄ ~ N(140, 4). Now that we have approximated the sampling distribution, we can calculate probabilities. We have

P(x̄ ≥ 144) = P(Z ≥ (144 − 140)/4) = P(Z ≥ 1) = 0.1587.
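Example 6 can be computed directly, without a normal table, by evaluating the standard normal cdf with `math.erf`:

```python
import math

def normal_cdf(z):
    # Standard normal cdf via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 140, 20, 25
se = sigma / math.sqrt(n)                  # sigma / sqrt(n) = 4.0
prob = 1 - normal_cdf((144 - mu) / se)     # P(x-bar >= 144) = P(Z >= 1)
print(round(se, 1))                        # 4.0
print(round(prob, 4))                      # 0.1587
```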

Example 7. The weights of large eggs follow a normal distribution with a mean of 1 oz and a standard deviation of 0.1 oz. What is the probability that a dozen (12) eggs weigh more than 13 oz?

Solution: We are given the population mean, standard deviation, and distribution, so we can use the above theorems directly. Since the population follows N(1, 0.1), the sample mean x̄ follows N(1, 0.1/√12), i.e., N(1, 0.029). Let X_i be the weight of egg i, i = 1, 2, ..., 12. Then the total weight of the 12 eggs is X_1 + X_2 + ... + X_12, and the mean weight is x̄ = (X_1 + X_2 + ... + X_12)/12. Thus,

    P(X_1 + ... + X_12 > 13) = P(x̄ > 13/12) = P(x̄ > 1.083)
                             = P(Z > (1.083 − 1)/0.029)
                             = P(Z > 2.86) = 0.0021.

In this example the sample size n = 12 is not large, but we know the population distribution, so we have the exact sampling distribution of the sample mean x̄.

5.4. The Central Limit Theorem

In the previous sections we saw that, regardless of whether the population is discrete or continuous, the distributions of the sample proportion and the sample mean can be approximated by normal distributions when the sample size is large. There is a reason for this: the normal approximations are justified by the so-called central limit theorem (CLT). The central limit theorem is one of the most important theorems in statistics. Roughly, the CLT says that, no matter what the population distribution may be, when the sample size is sufficiently large, the mean of i.i.d. random variables is approximately normally distributed. Note that both the sample proportion p̂ and the sample mean x̄ can be written as means of independent and identically distributed (i.i.d.) random variables. This is obvious for the sample mean x̄. The sample proportion p̂ can also be written as a mean:

    p̂ = (x_1 + x_2 + ... + x_n)/n,

where each x_i takes only the values 0 or 1. Note also that a simple random sample (SRS) {x_1, x_2, ..., x_n} can be viewed as consisting of i.i.d. random variables,

as noted earlier.

The Central Limit Theorem (CLT). The CLT can be stated as follows:

(i) If a continuous population has mean µ and standard deviation σ, then when the sample size n of an SRS is large, the sample mean approximately follows the normal distribution

    x̄ ~ N(µ, σ/√n).

(ii) If a binary (or Bernoulli) population has proportion of success p, then when the sample size n of an SRS is large, the sample proportion approximately follows the normal distribution

    p̂ ~ N(p, √(p(1 − p)/n)).

Remark: In the CLT above, the sample size n needs to be large in order for the normal approximations to be accurate. For a continuous population we usually need n ≥ 25, while for a binary population we need np ≥ 10 and n(1 − p) ≥ 10. These are rough guidelines: the larger n is, the more accurate the normal approximations are. An SRS ensures i.i.d. random variables because each individual is randomly selected. Note that, for a continuous population, we do not need to know the population distribution when applying the CLT. The CLT holds not only for binary and continuous populations but also for other populations, such as counts. The key requirements are that the data in the sample be i.i.d. (e.g., from an SRS) and that the statistic be a sum or a mean.

The CLT can be used to provide an approximate distribution for a statistic if the statistic can be written as a mean (or a sum) of i.i.d. random variables. Since many statistics may be expressed (or approximated) as sums or means of i.i.d. random variables, many statistics may be assumed to approximately follow normal distributions. This explains why the normal distribution is the most common distribution in statistics. However, some statistics, such as the median or the sample standard deviation, cannot be written as a sum or mean of i.i.d. random variables.
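The CLT is easy to check by simulation. The sketch below (an illustration added here, not part of the original notes; the exponential population and the values of `n` and `reps` are arbitrary choices) draws many samples from a strongly skewed population with µ = σ = 1 and confirms that the sample means cluster around µ with spread close to σ/√n:

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)
mu = sigma = 1.0       # exponential(rate = 1) population: mean 1, sd 1, very skewed
n, reps = 40, 20000    # sample size and number of simulated samples

# Draw `reps` independent samples of size n and record each sample mean.
xbars = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(round(mean(xbars), 3))   # close to mu = 1
print(round(stdev(xbars), 3))  # close to sigma/sqrt(n) = 1/sqrt(40), about 0.158
```

Replacing `random.expovariate(1.0)` with a 0/1 draw such as `1 if random.random() < p else 0` gives the binary case of part (ii).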
When this is the case, the CLT cannot be applied directly, so such statistics need not approximately follow normal distributions even when the sample size is large. The sampling distributions of the sample proportion and the sample mean are examples of applications of the CLT. We give one more example below.

Example 8. Suppose the scores in a standard test have an average of 500 and

a standard deviation of 60. A group of 49 students take the test. (1) What is the probability that the average score of the group falls between 480 and 520? (2) Find a range of scores such that the group average falls within this range with probability 0.95.

Solution: In this example we do not know the exact population distribution, but we know it is continuous with mean µ = 500 and standard deviation σ = 60. We can assume the 49 students form an SRS. Since the sample size n = 49 is large, we may apply the central limit theorem and approximate the distribution of the sample mean by a normal distribution. Let x̄ be the group mean. Then the distribution of x̄ can be approximated by x̄ ~ N(500, 60/√49), i.e., N(500, 60/7).

(1) We have

    P(480 < x̄ < 520) = P((480 − 500)/(60/7) < Z < (520 − 500)/(60/7))
                     = P(−2.33 < Z < 2.33)
                     = 2P(0 < Z < 2.33)
                     = 2(P(Z < 2.33) − 0.5) = 0.9802.

(2) From (1), x̄ ~ N(500, 60/7) approximately. By the 68-95-99.7 rule for a normal distribution, we have

    P(µ − 2σ_x̄ < x̄ < µ + 2σ_x̄) ≈ 0.95.

So

    2σ_x̄ = 2 × 60/7 = 17.14,

and

    500 − 17.14 = 482.86,   500 + 17.14 = 517.14.

Thus, with probability 0.95, x̄ falls between 482.86 and 517.14. In this example we do not have to use the 68-95-99.7 rule: if we use a standard normal table, we should replace 2 with 1.96 in the above calculations.

Note that a continuous population can be converted into a binary population. For example, in the above example, if we are only interested in the proportion of students who scored over 600, then we have a binary population. The corresponding sample can

also be converted into binary data: each student's score is either above 600 or below 600. When we convert continuous data into binary data, we lose some information. However, sometimes we are only interested in certain pieces of information, such as whether a student's score is above 600 or not. In that sense, we do not lose any crucial information.

5.5. Chapter Summary

In this chapter, we examined the sampling distributions of the sample proportion, p̂, and the sample mean, x̄. These sampling distributions are important when making statistical inferences about the unknown population proportion, p, or population mean, µ, as will be shown in the next few chapters. When the sample size is large, the sampling distributions of p̂ and x̄ can be approximated by normal distributions, which can then be used in statistical inference. When the sample size is small, we must know the population distribution in order to know the sampling distributions. The CLT can be used to approximate the sampling distribution of a statistic if the statistic can be written as a sum or mean of i.i.d. random variables.

5.6. Review Questions

1. What is a sampling distribution? Why do we need to consider sampling distributions?

2. Can you think of a sample that does not consist of i.i.d. random variables?

3. Can we use the CLT to find the sampling distribution of a sample correlation r? Why or why not?

4. I have a box containing a number of tickets numbered between -10 and +10. The mean of the numbers is 0 and the standard deviation is 5. I am going to make a number of draws, with replacement, from the box. If the mean of the numbers that I draw falls between -1 and +1, I win and you give me $10. Otherwise, you win and I give you $10. Which of the following numbers of draws gives you the best chance of winning?

A. 10
B. 20
C. 100
D. There is insufficient information to tell

5. Suppose the daily precipitation in a city in December is uniformly distributed between 0 mm and 15 mm. For the month of December (with 31 days), what is the probability that the daily precipitation is less than 10 mm on at least 20 days? Assume the daily precipitation amounts for different days are independent. Choose the most appropriate answer.

A. Less than 0.16
B. Between 0.16 and 0.5
C. Between 0.5 and 0.84
D. Between 0.84 and 0.975
E. Greater than 0.975

6. True or false: For a continuous population, the sampling distribution of the sample mean has the same mean as the population but a smaller standard deviation, as long as the sample size is larger than 1.

7. True or false: The sample mean always under-estimates the population mean because of sampling variation.

8. True or false: If the population is uniformly distributed on an interval, the sample mean of a sample taken from this population is still approximately normally distributed if the sample size is large (say, larger than 30).

9. The waiting time for a bus follows a uniform distribution with a mean of 5 hours and a standard deviation of 1 hour. A student takes the bus 100 times in a semester. With probability approximately 95%, the student's average waiting time over the semester is within which of the following amounts of 5 hours?

(a) 0.1
(b) 0.2
(c) 2
(d) 1
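The 1/√n behaviour behind several of these questions can be explored numerically. The following sketch (an added illustration, not part of the notes; the Uniform(0, 1) population and the sample sizes are arbitrary choices) shows that the standard deviation of the sample mean tracks σ/√n, halving each time the sample size quadruples:

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)
reps = 10000
sigma = sqrt(1 / 12)  # sd of the Uniform(0, 1) population

for n in (4, 16, 64):
    xbars = [mean(random.random() for _ in range(n)) for _ in range(reps)]
    # Observed spread of the sample means vs. the theoretical sigma/sqrt(n).
    print(n, round(stdev(xbars), 4), round(sigma / sqrt(n), 4))
```

Rerunning with different populations (or with the setups in questions 4 and 9) is a useful way to check your answers.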