MAT 2379 (Spring 2012)

Sampling Distribution

Definition: Let $X_1, \ldots, X_n$ be a collection of random variables. We say that they are identically distributed if they have a common distribution.

Definition: Let $X_1, \ldots, X_n$ be random variables. We say that $X_1, \ldots, X_n$ is a random sample if they are independent and identically distributed.

We will model an experiment with $n$ independent trials as a random sample. The common distribution is called the population. The mean of this common distribution is called the population mean and is denoted $\mu$; in other words, $E(X_i) = \mu$ for $i = 1, 2, \ldots, n$. The variance of this common distribution is called the population variance and is denoted $\sigma^2$; in other words, $V(X_i) = \sigma^2$ for $i = 1, 2, \ldots, n$.

Definition: A function of the random sample is called a statistic. A statistic is a random variable, and its probability distribution is called its sampling distribution.

Remark: In these notes, we will discuss some results concerning the sampling distribution of the mean $\bar{X}$ of a sample of size $n$ from a population with mean $\mu$ and variance $\sigma^2$. We have already shown that $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$. But what is the sampling distribution of $\bar{X}$? Here are a few results that will help us answer this question.
Theorem: Let $X_1, \ldots, X_n$ be independent normal random variables such that $X_i \sim N(\mu_i, \sigma_i^2)$ for $i = 1, \ldots, n$. Let $Y = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$, that is, $Y$ is a linear combination of $X_1, \ldots, X_n$. Then $Y$ has the following normal distribution:

$$N(E[Y], \mathrm{Var}[Y]) = N\left(\sum_{i=1}^{n} a_i \mu_i,\ \sum_{i=1}^{n} a_i^2 \sigma_i^2\right).$$

Consequence: Consider a random sample $X_1, \ldots, X_n$ from a normal population with mean $\mu$ and variance $\sigma^2$. The mean of the $n$ normal random variables,

$$\bar{X} = \frac{X_1 + \cdots + X_n}{n},$$

is a normal random variable. That is, $\bar{X} \sim N(\mu, \sigma^2/n)$. Thus,

$$Z = \frac{\bar{X} - E(\bar{X})}{\sqrt{V(\bar{X})}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

Remark: We often do not know the population standard deviation $\sigma$, so we will instead work with the following statistic, in which $\sigma$ is replaced by the sample standard deviation $S$:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$

The sampling distribution of $T$ is not a normal distribution. However, if the population is normal, then $T$ has a T distribution (often called Student's t distribution) with $\nu = n - 1$ degrees of freedom. The probability density of $T$ is centered at zero, but it is more dispersed than the standard normal distribution $N(0, 1)$. In fact, it can be shown that, for $\nu > 2$, $\sigma_T = \sqrt{\nu/(\nu - 2)} > 1$.
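The linear-combination theorem can be checked numerically. The sketch below is only an illustration: the helper name `linear_combination` and the population values $N(10, 2^2)$ with $n = 4$ are assumptions chosen for the example, not values from these notes. It uses Python's standard `statistics.NormalDist` class and recovers $\bar{X} \sim N(\mu, \sigma^2/n)$ by taking every coefficient equal to $1/n$.

```python
from math import sqrt
from statistics import NormalDist


def linear_combination(coeffs, dists):
    """Distribution of Y = sum(a_i * X_i) for independent normal X_i.

    Hypothetical helper: applies E[Y] = sum(a_i * mu_i) and
    Var[Y] = sum(a_i^2 * sigma_i^2) from the theorem.
    """
    mean = sum(a * d.mean for a, d in zip(coeffs, dists))
    var = sum(a ** 2 * d.variance for a, d in zip(coeffs, dists))
    return NormalDist(mean, sqrt(var))


# Illustrative values only (assumed, not from the notes).
n = 4
population = NormalDist(mu=10, sigma=2)

# The sample mean is the linear combination with a_i = 1/n for every i.
xbar = linear_combination([1 / n] * n, [population] * n)

print(xbar.mean, xbar.stdev)  # mean mu, standard deviation sigma / sqrt(n)
```

With these assumed values the standard deviation of the sample mean is $2/\sqrt{4} = 1$, which agrees with $\sigma/\sqrt{n}$.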
As the number of degrees of freedom $\nu \to \infty$, the T distribution tends to the standard normal distribution $N(0, 1)$. There is a table for the T distribution in the textbook; see Table 17.4.

Notation: $t_{A,\nu}$ is the quantile such that $P(T > t_{A,\nu}) = A$, where $T$ has a T distribution with $\nu$ degrees of freedom.

Example 1: Consider a certain large population of mammals whose weight is normally distributed with a mean of 25 kg and a standard deviation of 5 kg. We will select a random sample of 25 of these mammals.

(a) What is the probability that the mean weight of the 25 mammals is between 23 kg and 27 kg?

(b) Let $\bar{X}$ and $S$ be the sample mean and the sample standard deviation, respectively, of a sample of size $n = 25$. Define

$$Z = \frac{\bar{X} - 25}{5/\sqrt{25}} \quad \text{and} \quad T = \frac{\bar{X} - 25}{S/\sqrt{25}}.$$

(i) What is the distribution of $Z$ and of $T$?
(ii) Compute $P(Z > 2)$ and $P(T > 2)$.
(iii) Compute $P(Z > 4)$ and $P(T > 4)$.
(iv) Compute $P(Z < 1.55)$ and $P(T < 1.55)$.
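The normal-distribution parts of Example 1 can be evaluated directly. The following is a minimal sketch, assuming the sample size $n = 25$ used in parts (a) and (b), so that $\bar{X} \sim N(25, 5^2/25)$. It relies only on Python's standard `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 25, 5, 25  # population mean, population sd, assumed sample size

# Sampling distribution of the mean: N(mu, sigma^2 / n), here N(25, 1).
xbar = NormalDist(mu, sigma / sqrt(n))

# Part (a): P(23 < X-bar < 27), which equals P(-2 < Z < 2).
p_a = xbar.cdf(27) - xbar.cdf(23)

# Part (b)(ii), normal half only: P(Z > 2) for Z ~ N(0, 1).
p_z_gt_2 = 1 - NormalDist().cdf(2)

print(round(p_a, 4), round(p_z_gt_2, 4))
```

The probability in (a) comes out to about 0.9545. The probabilities involving $T$ require the T distribution with $\nu = 24$ degrees of freedom, which the Python standard library does not provide; they can be bounded with Table 17.4 or computed with statistical software.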
Remark: When using Table 17.4 to compute probabilities involving a T random variable, we will often not be able to give the exact probability. We will only be able to find an interval that contains the probability. However, we can find the probability with Minitab.

Question: If the population is not normal, then what is the sampling distribution of $\bar{X}$? A partial answer is given in the following theorem.

Theorem: Consider a random sample $X_1, \ldots, X_n$ from a population with mean $\mu$ and variance $\sigma^2$. Let $\Phi$ be the cumulative distribution function of the standard normal distribution $N(0, 1)$, and let $F_{Z_n}$ and $F_{T_n}$ denote the cumulative distribution functions of $Z_n$ and $T_n$ defined below.

(a) If
$$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}},$$
then
$$\lim_{n \to \infty} F_{Z_n} = \Phi.$$

(b) If
$$T_n = \frac{\bar{X} - \mu}{S/\sqrt{n}},$$
then
$$\lim_{n \to \infty} F_{T_n} = \Phi.$$

The result in (a) is often called the central limit theorem. As long as $n$ is sufficiently large, we can approximate the sampling distribution of the sample mean with a normal distribution. We will use the following rule of thumb: if $n \ge 30$, then

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \quad \text{approximately.}$$
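A small simulation can make the central limit theorem concrete. The sketch below is an illustration under assumed choices that do not come from these notes: an exponential population with rate 1 (so $\mu = \sigma = 1$), $n = 30$, 20000 replications, and a fixed random seed. It compares the empirical value of $P(Z_n \le 1)$ with $\Phi(1)$.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is reproducible

# Assumed non-normal population: exponential with rate 1 (mean 1, sd 1).
mu, sigma = 1.0, 1.0
n, reps = 30, 20000


def standardized_mean():
    """Draw one sample of size n and return Z_n = (xbar - mu) / (sigma / sqrt(n))."""
    sample = [random.expovariate(1.0) for _ in range(n)]
    xbar = sum(sample) / n
    return (xbar - mu) / (sigma / sqrt(n))


z_values = [standardized_mean() for _ in range(reps)]

# Empirical P(Z_n <= 1) versus the standard normal value Phi(1).
empirical = sum(z <= 1.0 for z in z_values) / reps
theoretical = NormalDist().cdf(1.0)

print(empirical, theoretical)
```

The two printed values should be close even though the population is strongly skewed. Repeating the experiment with a smaller $n$ shows a poorer agreement, especially in the tails.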
With $n$ sufficiently large, we can also approximate the sampling distribution of the $T$ statistic with a standard normal distribution. We will use the following rule of thumb: if $n \ge 40$, then

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1) \quad \text{approximately.}$$

Example 2: Suppose we observe the age of 30 female grizzly bears in a particular region at the time of their first surviving litter. Suppose that the population has a mean of 8.4 years and a standard deviation of 3 years. Approximate the probability that the average age of the 30 females, when they produced their first surviving litter, is less than 7 years.
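Example 2 can be worked with the normal approximation, since $n = 30$ and the population standard deviation is given. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the variable names are illustrative, and the result is an approximation because the population is not assumed to be normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.4, 3, 30  # population mean, population sd, sample size

# Standardize the threshold of 7 years: z = (7 - mu) / (sigma / sqrt(n)).
z = (7 - mu) / (sigma / sqrt(n))

# Central limit theorem: P(X-bar < 7) is approximately Phi(z).
p = NormalDist().cdf(z)

print(round(z, 2), round(p, 4))
```

The standardized value is about $-2.56$, so the approximate probability is roughly 0.005: a sample mean below 7 years would be unusual for this population.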