Chapter 5. Sampling Distributions


Lecture notes, Lang Wu, UBC

5.1. Introduction

In statistical inference, we attempt to estimate an unknown population characteristic, such as the population mean, $\mu$, using data in a sample, such as the sample mean, $\bar{x}$. That is, we might use the sample mean, $\bar{x}$, to estimate the population mean, $\mu$. The accuracy of this estimation depends on the sample size, $n$, and the variability of the data. We can better understand the uncertainty in the estimation, as well as the basic idea behind statistical inference, by introducing an important concept called the sampling distribution.

A sample statistic (e.g., $\bar{x}$) can be conceptually viewed as a random variable, because before we collect the data, we do not know what value the statistic will take. The statistic might take on any number in a range of values. Thus, it has a probability distribution, with the probability of certain values higher than the probability of others. The mean and variance of this distribution can be used to estimate the accuracy of using this statistic to estimate a population parameter.

The probability distribution of a statistic is called the sampling distribution of this statistic. In other words, the sampling distribution of a statistic may be viewed as the distribution of all possible values of this statistic. For example, the sampling distribution of the sample mean, $\bar{x}$, is the distribution of all possible values of $\bar{x}$. So if we take many samples from the same population and calculate $\bar{x}$ for each of them, the values we get will all fall somewhere along the distribution. By examining the sampling distribution of $\bar{x}$, we can get an idea of the variability and range of $\bar{x}$, which are used to determine the accuracy of using $\bar{x}$ to estimate $\mu$.

For example, suppose we wish to estimate the average sleep time of all students in a university. Here, the population is all students in the university, and the population parameter of interest is average sleep time, denoted by $\mu$. We can randomly select 10 students from this university and record the average sleep time of these 10 students, which is the sample mean $\bar{x}$ with sample size $n = 10$. Suppose that $\bar{x} = 6.5$ (hours) for these 10 students. If we randomly select another sample of 10 students, we may obtain a different value of $\bar{x}$, say, $\bar{x} = 7$ (hours). Repeating this procedure many times, we obtain many values of $\bar{x}$, such as 6.5, 7, ... The probability distribution of all possible values of $\bar{x}$ is called the sampling distribution of $\bar{x}$. This procedure is used as an illustration since it may not be feasible in practice. For some populations, such as a normally distributed population, we can obtain the sampling distribution of the sample mean, $\bar{x}$, via theoretical derivations. We examine this more later.

Note that the population distribution and the sampling distribution are two different concepts. The population distribution refers to the distribution of a characteristic in the population, while the sampling distribution refers to the distribution of a particular statistic for repeated samples taken from the same population. Note also that if we randomly choose an individual from the population, the value of their characteristic can be seen as a random variable, $X$, whose probability distribution follows the population distribution.

There are many different sample statistics, so there are many different sampling distributions. Here, we focus on the sampling distributions of the two most important statistics:

- the sampling distribution of the sample proportion
- the sampling distribution of the sample mean

We focus on the above two sampling distributions because they are crucial to two respective population distributions: the binomial distribution (for discrete data) and the normal distribution (for continuous data). The sample proportion is the most important statistic for a population with a binomial distribution, and the sample mean is the most important statistic for a population with a normal distribution. Moreover, the sampling distributions of these two statistics can be derived theoretically. For sampling distributions of other statistics, such as the sample variance, readers are referred to more advanced textbooks.

5.2 Sampling Distribution of the Sample Proportion

5.2.1 The Binomial Distribution

Before we discuss the sampling distribution of a sample proportion, we first introduce an important distribution for a discrete binary population: the Bernoulli distribution.
In practice, many random variables take on only two possible values, often denoted by the binary numbers 0 and 1 (or thought of as "success" and "failure"). Random variables of this nature are said to follow a Bernoulli distribution. For example, a student taking a course can either pass (1) or fail (0). If you toss a coin, you will get either heads (1) or tails (0). In an election, a randomly selected person can either vote for candidate A (1) or vote against candidate A (0). We can view these examples as experiments with only two possible outcomes, often called Bernoulli trials.

Going back to the example regarding taking a course, let's say we randomly select 10 students from a large class. Each student can pass or fail the course. We can view this as an experiment consisting of 10 trials, with each trial having two possible outcomes (pass or fail), and our interest lying in the number of students who pass the course. Moreover, the 10 students may be viewed as independent and identically distributed. Independent because they are randomly selected; identically distributed because we do not know who will be selected so the probability of passing the course is the same for all students in the class (e.g., each student has a passing probability of 0.8 and failing probability of 0.2). The other examples above may be viewed in a similar way.

Example 1. In an election, a recent poll shows that 40% of people will vote for candidate A. Suppose that three people are randomly selected. (1) What is the probability that exactly two people vote for candidate A? (2) What is the probability that at least one person votes for candidate A?

Solution: Here, each person has two options: vote for candidate A or vote for someone else, so we can view each person as a random variable that follows a Bernoulli distribution. We can assume the three people are independent. Let $X_i = 1$ if person $i$ votes for candidate A and $X_i = 0$ otherwise, $i = 1, 2, 3$. Let $X$ be the total number of people, among the three who were selected, who vote for candidate A. Then, $X = X_1 + X_2 + X_3$, with $X = 3$ meaning all three people vote for candidate A.

(1) The probability that exactly two people vote for candidate A is given by

$$P(X = 2) = \binom{3}{2} \, 0.4^2 \, (1 - 0.4) = 0.288,$$

where the term $\binom{3}{2}$ is the number of possible ways to have 2 out of 3 people vote for candidate A, and the term $0.4^2 (1 - 0.4)$ is the probability that 2 people vote for candidate A and the other one does not, assuming the 3 people are independent (so we can use the multiplication rule and multiply the probabilities).

(2) The probability that at least one person votes for candidate A is

$$P(X \ge 1) = 1 - P(X = 0) = 1 - (1 - 0.4)^3 = 0.784.$$

Alternatively, we can use $P(X \ge 1) = P(X = 1) + P(X = 2) + P(X = 3)$ to get the same answer, but the computation is more tedious.

In general, we can consider $n$ independent and identically distributed Bernoulli trials at once, where each trial has only two possible outcomes ("success" or "failure"). We are often interested in the probability of a certain number of successes. Let $p$ be the probability of success for each trial, and let $X$ be the total number of successes. Then, the probability distribution of $X$ is given by

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k = 0, 1, 2, \ldots, n.$$

The above distribution is called the binomial distribution, denoted by $X \sim B(n, p)$, or $X \sim Bin(n, p)$. Thus, a binomial distribution is determined by two numbers: the number of trials, $n$, and the probability of success, $p$, with $p$ being the only unknown parameter (since $n$ is usually known). This is different from the normal distribution $N(\mu, \sigma)$, which is determined by two unknown parameters: the mean, $\mu$, and the standard deviation, $\sigma$.

Remarks:

1) In practice, a binomial random variable $X$ arises in the following settings: i) there are $n$ i.i.d. Bernoulli trials, with $n$ known and fixed; ii) the probability of success $p$ is the same for each trial; iii) $X$ is the number of successes out of the $n$ trials.

2) The above $n$ trials may be viewed as a sample of size $n$. The number of successes, $X$, is the sample count. The proportion $X/n$, denoted by $\hat{p}$, is the sample proportion, and it indicates the proportion of the sample trials that were successful (i.e., the number of successes divided by the total number of trials). The probability of success, $p$, is the population proportion, and it represents the (usually unknown) true proportion of success in the population. Since $X$ is a count from a sample, the distribution of $X$ may be viewed as the sampling distribution of a count.
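As a quick numerical check of the binomial formula, the following Python sketch recomputes the two probabilities from Example 1. The helper name `binom_pmf` is an illustrative choice and does not come from the notes.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ B(n, p) using the binomial formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example 1: three people, each voting for candidate A with probability 0.4
n, p = 3, 0.4
p_exactly_two = binom_pmf(2, n, p)       # P(X = 2)
p_at_least_one = 1 - binom_pmf(0, n, p)  # P(X >= 1) = 1 - P(X = 0)

print(round(p_exactly_two, 3))   # 0.288
print(round(p_at_least_one, 3))  # 0.784
```

Summing `binom_pmf(k, n, p)` over `k = 1, 2, 3` gives the same value as `p_at_least_one`, which is the "more tedious" route mentioned in the solution.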
Remember that $X$ follows a binomial distribution.

Theorem 1. If $X \sim B(n, p)$, then

$$E(X) = np, \qquad Var(X) = np(1 - p).$$

Thus, for a binomial random variable, $X$, we can immediately obtain its mean and variance using the above formulas.

Example 2. Suppose the probability of getting a certain disease is 0.001, and suppose 50 people are randomly selected. (1) What is the probability of exactly one person having the disease? (2) What is the probability of at least one person having the disease? (3) How many people should be selected so there is a 90% chance of at least one of them having the disease? (4) Find the mean and standard deviation of the number of people who have the disease among the 50 people.

Solution: Each randomly selected person either has the disease or does not have the disease. Let $X$ be the number of people who have the disease among $n$ randomly selected people. We are working with a binomial distribution where $n = 50$ and $p = 0.001$.

(1) The probability that exactly one person has the disease is given by

$$P(X = 1) = \binom{50}{1} \, 0.001 \times 0.999^{49} \approx 0.0476.$$

(2) The probability that at least one person has the disease is given by

$$P(X \ge 1) = 1 - P(X = 0) = 1 - 0.999^{50} \approx 0.0488.$$

(3) In this case, $n$ is unknown and needs to be determined. We need to find the value of $n$ so that $P(X \ge 1) = 0.9$, i.e., $1 - P(X = 0) = 0.9$ or $P(X = 0) = 0.1$. Thus

$$\binom{n}{0} \, 0.001^0 \times 0.999^n = 0.999^n = 0.1,$$

i.e., $n \log(0.999) = \log(0.1)$. Solving the above equation, we have

$$n = \frac{\log(0.1)}{\log(0.999)} \approx 2301.4.$$

Rounding up, we must select at least 2302 people to ensure there is at least a 90% chance of at least one of them having the disease.

(4) When $n = 50$ and $p = 0.001$, the mean and standard deviation of $X$ are given by

$$E(X) = np = 50 \times 0.001 = 0.05,$$

$$\sigma_X = \sqrt{Var(X)} = \sqrt{np(1 - p)} = \sqrt{50 \times 0.001 \times 0.999} \approx 0.22.$$

For $n = 50$, we have a mean of 0.05 people having the disease and a standard deviation of 0.22 people.

Example 3. The probability of a battery life exceeding 4 hours is [...]. There are three batteries in use. (1) Find the probability that at most 2 batteries last for 4 or more hours; (2) Find the mean and standard deviation of the number of batteries lasting 4 or more hours.

Solution: A battery's life will either exceed 4 hours or not exceed 4 hours. Let $X$ be the number of batteries lasting 4 or more hours. Here we have $n = 3$ and $p = [\ldots]$. Thus,

(1) $P(X \le 2) = 1 - P(X = 3) = 1 - \binom{3}{3} p^3 (1 - p)^0 = 1 - p^3 = [\ldots]$

(2) $E(X) = np = 3p = [\ldots]$ and $\sigma_X = \sqrt{np(1 - p)} = \sqrt{3p(1 - p)} = 0.59$.

For $n = 3$, the mean number of batteries that exceed 4 hours is [...] and the standard deviation is 0.59 batteries.

5.2.2 Sampling Distribution of the Sample Proportion

A major goal in statistics is to make inferences about unknown population parameters. We do this by using sample statistics to estimate corresponding population parameters. For example, we might use sample proportions to estimate population proportions or use sample means to estimate population means. There is uncertainty in these estimations because the value of a statistic will vary from one sample to the next. To measure the uncertainty of each estimation, we look at the variability of the statistic (i.e., how much its value might vary from one sample to the next). To do this, we need to find the distribution of the sample statistic that is used to estimate the population parameter. This distribution is called the sampling distribution of the corresponding statistic.

In this section, we consider a discrete population that follows a Bernoulli distribution (i.e., a population that is split into two groups, or a binary population), as described in the previous section.
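Returning briefly to Example 2 before moving on, the arithmetic in its four parts can be reproduced with a few lines of Python. This is only a sketch of the calculation; the variable names are illustrative. For part (3) the code reports the smallest whole number of people for which the chance of at least one case reaches 90%.

```python
from math import ceil, log, sqrt

p = 0.001  # probability that a randomly selected person has the disease
n = 50     # number of people selected in parts (1), (2) and (4)

p_exactly_one = n * p * (1 - p) ** (n - 1)  # (1) P(X = 1)
p_at_least_one = 1 - (1 - p) ** n           # (2) P(X >= 1) = 1 - P(X = 0)

# (3) smallest n with 1 - (1 - p)^n >= 0.9, i.e. n >= log(0.1) / log(1 - p)
n_needed = ceil(log(0.1) / log(1 - p))

mean_x = n * p                # (4) E(X) = np
sd_x = sqrt(n * p * (1 - p))  # (4) sigma_X = sqrt(np(1 - p))

print(round(p_exactly_one, 4), round(p_at_least_one, 4))
print(n_needed)
print(mean_x, round(sd_x, 2))
```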
For a population that follows a Bernoulli distribution, the parameter of interest is the population proportion, $p$, which is the proportion of success in the population (or the proportion of the population with the attribute of interest). Recall the difference between a proportion and a percentage: a percentage is a proportion multiplied by 100. A proportion is a number between 0 and 1, while a percentage is a number between 0 and 100. Examples of population proportions include the proportion of people who are literate, the proportion of people who smoke, the proportion of people with cancer, etc.

Recall also the difference between a parameter and a statistic: a parameter is a population characteristic, while a statistic is a function or measure of data in a sample. The difference between the population proportion, $p$, and the sample proportion, $\hat{p}$, is the difference between a parameter and a statistic.

Let $p$ be an (unknown) population proportion of success. We select a sample of size $n$ and think of it as $n$ independent Bernoulli trials, with $x$ being the number of successes. Using the information in the sample, we can calculate the sample proportion (denoted by $\hat{p}$):

$$\hat{p} = \frac{\text{number of successes in the sample}}{\text{sample size}} = \frac{x}{n}.$$

We can then use $\hat{p}$ as an estimate of $p$. For example, if the unknown parameter $p$ is the proportion of people who smoke in Canada, then perhaps $\hat{p}$ is the proportion of people who smoke in a randomly selected sample of $n$ individuals in Canada. Here, $\hat{p}$ is a number we can calculate and it gives us an estimate of $p$.

From the previous section, we know that before we collect the data, the number of successes, $X$, is a random variable that follows a binomial distribution. Once we have the data, we are interested in the distribution of the sample proportion $\hat{p}$ (i.e., the sampling distribution of $\hat{p}$), which is unknown. Remember that the sampling distribution of $\hat{p}$ is the distribution of all possible values of $\hat{p}$ if $\hat{p}$ is calculated for an infinite number of samples of equal size taken from the same population. This distribution will allow us to be fairly confident that the actual value of $p$ lies within a certain interval.
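We cannot draw infinitely many samples, but a computer can draw a large number of them, which gives a feel for what "the distribution of all possible values of $\hat{p}$" looks like. The sketch below is a simulation under assumed values ($p = 0.2$, $n = 100$, 5000 repeated samples, a fixed random seed); none of these numbers or function names come from the notes.

```python
import random
from statistics import mean, pstdev


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of n Bernoulli(p) trials; return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Assumed illustrative setting: true proportion 0.2, samples of size 100
p_hats = simulate_sample_proportions(p=0.2, n=100, num_samples=5000)

print(min(p_hats), max(p_hats))  # range of the simulated sample proportions
print(round(mean(p_hats), 3))    # centre of the simulated sampling distribution
print(round(pstdev(p_hats), 3))  # spread of the simulated sampling distribution
```

Each entry of `p_hats` is one possible value of the sample proportion; a histogram of the list is an empirical picture of the sampling distribution.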
The distribution of $\hat{p}$ is difficult to find, so we often approximate it with the normal distribution, as described below in Theorem 3. In addition, the mean and standard deviation of the distribution of $\hat{p}$ can be easily found, as shown in Theorem 2 below. Note that the normal distribution is completely determined by its mean and standard deviation, but this property does not hold for all distributions.

Theorem 2. The mean and variance of the sampling distribution of the sample proportion $\hat{p}$ are respectively given by

$$E(\hat{p}) = p, \qquad Var(\hat{p}) = \frac{p(1 - p)}{n},$$

where $p$ is the population proportion.

Note that Theorem 2 only gives the mean and variance (or standard deviation) of the distribution of $\hat{p}$. We still do not know what the exact distribution of $\hat{p}$ is. However, Theorem 3 below shows that the distribution of $\hat{p}$ can be approximated by a normal distribution.

Theorem 3. If the sample size $n$ is sufficiently large such that

$$np \ge 10 \quad \text{and} \quad n(1 - p) \ge 10,$$

then

(i) the sampling distribution of the sample proportion, $\hat{p}$, can be approximated by the normal distribution $N\left(p, \sqrt{\frac{p(1 - p)}{n}}\right)$;

(ii) the distribution of the number of successes, $X$, can be approximated by the normal distribution $N\left(np, \sqrt{np(1 - p)}\right)$.

Theorem 3 shows that both $\hat{p}$ and $X$ may be approximated by normal distributions when the sample size, $n$, is large. Here, "large" means $np \ge 10$ and $n(1 - p) \ge 10$. Some books use the condition $np \ge 5$ and $n(1 - p) \ge 5$. Readers should not worry about the specific numbers 5 or 10. The key thing is, in order for the normal approximations to be accurate, $n$ should be large and $p$ should not be too close to 0 or 1. The larger the sample size, $n$, the more accurate the normal approximations.

Theorem 3 (ii) can be used to quickly calculate binomial probabilities. We know that $X$ follows $B(n, p)$. However, computation of probabilities such as $P(X < k)$ can be quite tedious if $k$ is not small. For example, $P(X < 10)$ requires computation of 10 binomial probabilities that are then added together. If we instead use the normal approximation in Theorem 3 (ii), the normal distribution will quickly give us an approximate answer to $P(X < 10)$, as shown in the examples below.

We will explore the idea of inferring population parameters from sample statistics in more detail in the next chapter. For now, we focus on familiarizing ourselves with the relationships between parameters and sampling distributions (Theorem 2), as well as how information can be gathered from an approximated sampling distribution (Theorem 3).
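To see how the shortcut in Theorem 3 (ii) compares with the exact binomial sum, the following Python sketch computes a tail probability both ways. The setting ($n = 100$, $p = 0.2$, the event $X > 30$) and the helper names are assumptions chosen for illustration. The normal cumulative distribution function is built from `math.erf`, and no continuity correction is applied, which mirrors the plain approximation described above.

```python
from math import comb, erf, sqrt


def binom_tail_gt(k, n, p):
    """Exact P(X > k) for X ~ B(n, p), summing the individual binomial terms."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1, n + 1))


def normal_cdf(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n, p, k = 100, 0.2, 30  # assumed illustrative values

exact = binom_tail_gt(k, n, p)             # adds 70 binomial probabilities
z = (k - n * p) / sqrt(n * p * (1 - p))    # standardize with mean np, sd sqrt(np(1-p))
approx = 1 - normal_cdf(z)                 # normal approximation to P(X > k)

print(exact, approx)
```

By hand the exact sum is impractical, whereas the approximation needs only one standardization and one table lookup; with software either route is cheap, so the comparison mainly shows how close the approximation is.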
In the following examples, the population proportion, $p$, is already known, so we explore how knowing this proportion can allow us to approximate the sampling distribution of $\hat{p}$ and then gather information from it.

Example 4. Suppose 20% of people in a certain city smoke. A sample of 100 people is randomly selected from this city. Find the probability that more than 30% of people in this sample smoke.

Solution: Here the population proportion is known to be $p = 0.2$, and the sample size is $n = 100$. The sample proportion $\hat{p}$ can be viewed as a random variable before we observe the data in the sample. Since $np = 20 > 10$ and $n(1 - p) = 80 > 10$, we can use a normal approximation to find the probability $P(\hat{p} > 0.3)$. By Theorem 3, we have, approximately,

$$\hat{p} \sim N\left(0.2, \sqrt{\frac{0.2 \times 0.8}{100}}\right) = N(0.2, 0.04).$$

Thus, an approximation of $P(\hat{p} > 0.3)$ is given by

$$P(\hat{p} > 0.3) = P\left(\frac{\hat{p} - 0.2}{0.04} > \frac{0.3 - 0.2}{0.04}\right) \quad \text{(standardization)}$$
$$\approx P(Z > 2.5) = 1 - P(Z \le 2.5) = 1 - 0.9938 = 0.0062,$$

where we first use standardization (i.e., subtract the mean and divide by the standard deviation) to get to the standard normal distribution, and then look up the probability for the specific $z$ value.

Note that for this problem, we can also do exact computation using the binomial distribution (where $n = 100$ and we are finding the probability that more than 30 people smoke):

$$P(\hat{p} > 0.3) = P(X > 100 \times 0.3) = P(X > 30) = P(X = 31) + P(X = 32) + \cdots + P(X = 100),$$

which is very tedious to compute!

Example 5. A fair coin is tossed 60 times. Find the probability that fewer than 1/3 of the results are heads.

Solution: Let $X$ be the number of heads. Here we have $n = 60$, $p = 0.5$, and $np = n(1 - p) = 30 > 10$, so we can use the normal approximation

$$\hat{p} \sim N\left(0.5, \sqrt{\frac{0.5 \times 0.5}{60}}\right) = N(0.5, 0.0645).$$

Thus

$$P(\hat{p} < 1/3) = P\left(\frac{\hat{p} - 0.5}{0.0645} < \frac{1/3 - 0.5}{0.0645}\right) \approx P(Z < -2.58) = 0.0049.$$

This problem can also be solved exactly using binomial distributions, but the computation is again very tedious.

The general method for these types of problems is to approximate the binomial distribution with the normal distribution (after checking all requirements are met), convert this distribution to a standard normal distribution using standardization, and then look up the probability for the resulting $z$ value using a standard normal table or statistical software.

5.3 The Sampling Distribution of a Sample Mean

In the previous section, we considered (discrete) binary populations that follow Bernoulli distributions, as well as the sampling distribution of the sample proportion. In this section, we consider a population distribution that is continuous and has mean $\mu$ and standard deviation $\sigma$. The population is not necessarily normally distributed. (Remember that a normal distribution is completely determined by $\mu$ and $\sigma$, but a general continuous distribution may not be completely determined by $\mu$ and $\sigma$.) The parameters $\mu$ and $\sigma$ are unknown. We will use the sample mean, $\bar{x}$, as an estimate of the population mean, $\mu$. To measure the accuracy of this estimation, we need to find the sampling distribution of the sample mean $\bar{x}$, i.e., the distribution of all possible values of the sample mean, $\bar{x}$, if infinitely many samples of equal size are taken from the same population and the mean is calculated for each of them. (Note: when we talk about the sampling distribution of $\bar{x}$, we are viewing $\bar{x}$ as a random variable because we are considering all possible samples. If we instead focus on a specific sample with observed data, then $\bar{x}$ is a number.)

When the population distribution is unknown (except that it is continuous), the exact sampling distribution of the sample mean, $\bar{x}$, cannot be known either. However, if we know the population parameters, we can still obtain the mean and standard deviation of the sampling distribution of the sample mean, $\bar{x}$, as shown in the theorem below. Moreover, when the sample size is large, we can use a normal distribution to approximate the sampling distribution of the sample mean, $\bar{x}$.
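The idea of "all possible values of the sample mean" can be imitated by simulation. The sketch below repeatedly draws samples from a population that is continuous but clearly not normal (a uniform distribution on [0, 1], an assumption made only for this illustration) and records the mean of each sample. The sample size, number of repetitions, seed, and function name are likewise illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def simulate_sample_means(n, num_samples, seed=0):
    """Return the means of num_samples samples of size n from Uniform(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.random() for _ in range(n)) / n for _ in range(num_samples)]


n = 30
sample_means = simulate_sample_means(n=n, num_samples=5000)

# Uniform(0, 1) has population mean 0.5 and standard deviation sqrt(1/12)
mu, sigma = 0.5, sqrt(1 / 12)

print(round(mean(sample_means), 3), mu)                   # centre vs population mean
print(round(pstdev(sample_means), 3), round(sigma / sqrt(n), 3))  # spread vs sigma / sqrt(n)
```

The two printed pairs compare the centre and spread of the simulated sample means with the population mean and with $\sigma/\sqrt{n}$, which are the quantities the theorem below refers to.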

Theorem 4. Consider a continuous population with mean $\mu$ and standard deviation $\sigma$. When the population distribution is unknown, we have

(i) the mean of all possible values of $\bar{x}$ (i.e., the mean of the sampling distribution of $\bar{x}$, or the mean of the sample mean) is equal to the population mean:

$$E(\bar{x}) = \mu;$$

(ii) the standard deviation of all possible values of $\bar{x}$ (i.e., the standard deviation of the sampling distribution of $\bar{x}$, or the standard deviation of the sample mean) is $\sqrt{n}$ times smaller than the population standard deviation:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}, \quad \text{or} \quad Var(\bar{x}) = \frac{\sigma^2}{n}.$$

As you can see, the formulas for the mean and standard deviation of the sample mean distribution depend on the population $\mu$ and $\sigma$. This shows the relationship between the parameters and the sampling distribution of the sample mean.

In practice, however, the parameters $\mu$ and $\sigma$ are usually unknown, so we must estimate them using the statistics we have from a sample. We use the sample mean, $\bar{x}$, to estimate the population mean, $\mu$. Plugging $\bar{x}$ instead of $\mu$ into Theorem 4(i), we get an estimate of the mean of the sampling distribution of the sample mean. Similarly, we use the sample standard deviation, $\hat{\sigma} = s$, to estimate the population standard deviation, $\sigma$. Plugging $\hat{\sigma}$ instead of $\sigma$ into Theorem 4(ii), we get an estimate of the standard deviation of the sampling distribution of the sample mean. We call this estimate the standard error of the sample mean $\bar{x}$, given by

$$\hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}}.$$

In other words, the standard error is an estimate of the standard deviation of the sample mean distribution.

Since $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, the larger the sample size, $n$, the smaller the standard error of the distribution of $\bar{x}$ is (i.e., less variability in $\bar{x}$), and so the more accurate $\bar{x}$ is as an estimate of $\mu$. As an example, suppose that you wish to get an accurate measure of your blood pressure. One way to increase your accuracy is to measure your blood pressure as many times as possible and then take an average of the measurements.

In Theorem 4, we give the mean and standard deviation of the distribution of the sample mean, $\bar{x}$. We still do not know the exact distribution of $\bar{x}$ since the population distribution is unknown and the mean and standard deviation cannot completely determine a continuous distribution (unless it is a normal distribution). However, if the population distribution is known to be normal or if the sample size, $n$, is large, the distribution of $\bar{x}$ is either exactly or approximately normal, as shown in the theorem below.

Theorem 5.

(i) If the population follows a normal distribution, $N(\mu, \sigma)$, then the sample mean $\bar{x}$ also follows a normal distribution exactly:

$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right).$$

(ii) If the population distribution is unknown but the sample size, $n$, is large (say, $n \ge 25$), then the sample mean, $\bar{x}$, approximately follows the normal distribution

$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right),$$

which is the same distribution as the one in (i).

Based on Theorem 5, when the sample size, $n$, is reasonably large, the distribution of the sample mean, $\bar{x}$, will approximately follow the distribution $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$. Some books use $n \ge 25$ as a condition and some books use $n \ge 30$ or another number as a condition. Readers should not worry too much about the specific number, since it just sets a benchmark of accuracy for the normal approximation. The larger the value of $n$, the more accurately the normal distribution will approximate the distribution of $\bar{x}$. Generally, if $n < 10$, the normal approximation may be poor.

Example 6. Suppose the weights of all adults in a large city form a distribution with mean $\mu = 140$ (pounds) and standard deviation $\sigma = 20$ (pounds). A sample of 25 adults in the city is randomly selected. Find the probability that the mean weight of the adults in the sample is at least 144 pounds.

Solution: Here, we know the values of the parameters $\mu$ and $\sigma$, so we can calculate the mean and standard deviation of the distribution of the sample mean, $\bar{x}$:

$$E(\bar{x}) = \mu = 140 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{25}} = 4.$$

Since $n = 25$, we can approximate the sample mean distribution by a normal distribution: $\bar{x} \sim N(140, 4)$. Now that we have approximated the sample mean distribution, we can calculate probabilities of certain values. We have

$$P(\bar{x} \ge 144) = P\left(Z \ge \frac{144 - 140}{4}\right) = P(Z \ge 1) = 0.1587.$$
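The steps in Example 6 (compute $\sigma/\sqrt{n}$, standardize, look up the tail probability) can be written as a small Python function. This is a sketch rather than a prescribed method: the function names are illustrative, and the standard normal table lookup is replaced by a formula based on `math.erf`.

```python
from math import erf, sqrt


def normal_cdf(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def prob_sample_mean_at_least(threshold, mu, sigma, n):
    """Normal approximation to P(sample mean >= threshold) for a sample of size n."""
    se = sigma / sqrt(n)        # standard deviation of the sample mean
    z = (threshold - mu) / se   # standardization
    return 1 - normal_cdf(z)


# Example 6: mu = 140, sigma = 20, n = 25, threshold 144 pounds
prob = prob_sample_mean_at_least(144, mu=140, sigma=20, n=25)
print(round(prob, 4))  # 0.1587
```

Changing `n` in the call shows directly how a larger sample shrinks $\sigma/\sqrt{n}$ and therefore the probability of a sample mean this far above $\mu$.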

Example 7. The weights of large eggs follow a normal distribution with a mean of 1 oz and a standard deviation of 0.1 oz. What is the probability that a dozen (12) eggs weigh more than 13 oz?

Solution: We are given the population mean, standard deviation, and distribution, so we can directly use the above theorems. Since the population follows $N(1, 0.1)$, the sample mean $\bar{x}$ follows $N(1, 0.1/\sqrt{12})$ or $N(1, 0.029)$. Let $X_i$ be the weight of egg $i$, $i = 1, 2, \ldots, 12$. Then, the total weight of 12 eggs is $\sum_{i=1}^{12} X_i$ and the mean weight is $\frac{1}{12} \sum_{i=1}^{12} X_i = \bar{x}$. Thus,

$$P\left(\sum_{i=1}^{12} X_i > 13\right) = P\left(\bar{x} > \frac{13}{12}\right) = P(\bar{x} > 1.083) = P\left(Z > \frac{1.083 - 1}{0.029}\right) = P(Z > 2.86) = 0.0021.$$

In this example, the sample size $n = 12$ is not large, but we know the population distribution, so we have the exact sampling distribution for the sample mean, $\bar{x}$.

5.4 The Central Limit Theorem

In the previous sections, we have seen that regardless of whether the population is discrete or continuous, the distributions of the sample proportion and sample mean can be approximated by normal distributions when the sample sizes are large. There is a reason for this: the normal approximations are justified by the so-called central limit theorem (CLT). The central limit theorem is one of the most important theorems in statistics. Basically, the CLT says that, no matter what the population distribution may be, when the sample size is sufficiently large, the mean of i.i.d. random variables will be approximately normally distributed.

Note that both the sample proportion, $\hat{p}$, and the sample mean, $\bar{x}$, can be written as means of independent and identically distributed (i.i.d.) random variables. This is obvious for the sample mean, $\bar{x}$. We can see that the sample proportion $\hat{p}$ can also be written as a mean:

$$\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n},$$

where each $x_i$ only takes on a value of 0 or 1. Note also that a simple random sample (SRS) $\{x_1, x_2, \ldots, x_n\}$ can be viewed as having i.i.d. random variables, as noted earlier.

The Central Limit Theorem (CLT). The CLT can be stated as follows:

(i) If a continuous population has mean $\mu$ and standard deviation $\sigma$, when the sample size $n$ in an SRS is large, the sample mean approximately follows the normal distribution

$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right).$$

(ii) If a binary (or Bernoulli) population has proportion of success $p$, when the sample size $n$ in an SRS is large, the sample proportion approximately follows the normal distribution

$$\hat{p} \sim N\left(p, \sqrt{\frac{p(1 - p)}{n}}\right).$$

Remark: In the CLT above, the sample size, $n$, needs to be large in order for the normal approximations to be accurate. For a continuous population, we usually need $n \ge 25$, while for a binary population, we need $np \ge 10$ and $n(1 - p) \ge 10$. These are rough guidelines. The larger $n$ is, the more accurate the normal distributions are as approximations. An SRS ensures i.i.d. random variables because each individual is randomly selected. Note that, for a continuous population, we do not need to know the population distribution when applying the CLT.

The CLT not only holds for binary and continuous populations, but also holds for other populations, such as counts. The key here is that the data in the sample must be i.i.d. (e.g., in an SRS) and the statistic must be a sum or a mean.

The CLT can be used to provide an approximate distribution for a statistic if the statistic can be written as a mean (or a sum) of i.i.d. random variables. Since many statistics may be expressed (or approximated) as sums or means of i.i.d. random variables, many statistics may be assumed to approximately follow normal distributions. This explains why the normal distribution is the most common distribution in statistics. However, some statistics, such as the median or the sample standard deviation, cannot be written as a sum or mean of i.i.d. random variables.
When this is the case, the CLT cannot be used, so these statistics will not approximately follow normal distributions even when the sample size is large. The sampling distributions of the sample proportion and the sample mean are examples of applications of the CLT. We give one more example below. Example 8. Suppose the scores in a standard test have an average of 500 and

15 Lecture notes, Lang Wu, UBC 15 a standard deviation of 60. A group of 49 students take the test. (1) What is the probability that the average score of the group will fall between 480 and 520? (2) Find a range of scores such that the group average will fall within this range with a probability of Solution: In this example, we do not know the exact population distribution, but we know it is continuous and has mean µ = 500 and standard deviation σ = 60. We can assume the 49 students are an SRS. Since the sample size is 49, which is large, we may apply the central limit theorem and approximate the distribution of the sample mean by a normal distribution. Let x be the group mean. Then, the distribution of x can be approximated by x N(500, 60/ 49) (i.e., N(500, 60/7)). (1) We have P (480 < x < 520) = P ( < Z < ) 60/7 60/7 = P ( 2.33 < Z < 2.33) = 2P (0 < Z < 2.33) = 2(P (Z < 2.33) 0.5) = (2) From (1), x N(500, 60/7) approximately. By the rule for a normal distribution, we have P (µ 2σ x < x < µ + 2σ x ) So and 2σ x = = 17.14, = , = Thus, with probability 0.95, x will fall between and In this example, we do not have to use the rule. If we use a standard normal table, then we should replace 2 with 1.96 in the above calculations. Note that a continuous population can be converted into a binary population. For example, in the above example, if we are only interested in the proportion of students who scored over 600, then we have a binary population. The corresponding sample can

16 Lecture notes, Lang Wu, UBC 16 also be converted into binary data: each student s score is either above 600 or below 600. When we convert continuous data into binary data, we will lose some information. However, sometimes we are only interested in certain pieces of information, such as if a student s score is above 600 or not. In this sense, we do not actually lose any crucial information Chapter Summary In this chapter, we examined the sampling distributions of the sample proportion, ˆp, and the sample mean, x. These sampling distributions are important when making statistical inferences for the unknown population proportion, p, or population mean, µ, as will be shown in the next few chapters. When the sample size is large, the sampling distributions of ˆp and x can be approximated by normal distributions, which can then be used in statistical inference. When the sample size is small, we must know the population distributions in order to know the sampling distributions. The CLT can be used to approximate the sampling distribution of a statistic if the statistic can be written as a sum or mean of i.i.d. random variables Review Questions 1. What is a sampling distribution? Why do we need to consider sampling distributions? 2. Can you think of a sample that does not have i.i.d. random variables? 3. Can we use the CLT to find the sampling distribution of a sample correlation r? Why? 4. I have a box containing a number of tickets numbered between -10 and +10. The mean of the numbers is 0 and the standard deviation is 5. I am going to make a number of draws, with replacement, from the box. If the mean of the numbers that I draw falls between -1 and +1, I win and you will give me $10. Otherwise, you win and I will give you $10. Which of the following number of draws will give you the best chance of winning?

A. 10
B. 20
C. 100
D. There is insufficient information to tell

5. Suppose the daily precipitation in a city in December is uniformly distributed between 0 mm and 15 mm. For the month of December (with 31 days), what is the probability that the daily precipitation is less than 10 mm on at least 20 days? Assume the daily precipitation amounts on different days are independent. Choose the most appropriate answer.
A. Less than 0.16
B. Between 0.16 and 0.5
C. Between 0.5 and 0.84
D. Between 0.84 and
E. Greater than

6. True or false: For a continuous population, the sampling distribution of the sample mean has the same mean as the population mean but has a smaller standard deviation as long as the sample size is larger than

7. True or false: The sample mean always underestimates the population mean because of sampling variation.

8. True or false: If the population is uniformly distributed on an interval, the sample mean of a sample taken from this population will still be approximately normally distributed if the sample size is large (say larger than 30).

9. The waiting time for a bus follows a uniform distribution with a mean of 5 hours and a standard deviation of 1 hour. A student takes the bus 100 times in a semester. There is a 95% chance that the average waiting time for this student during that semester is within approximately how many hours of 5 hours?
(a) 0.1
(b) 0.2
(c) 2
(d) 1
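
The normal-approximation calculations in the group-mean example of this section (µ = 500, σ = 60, n = 49) can be checked numerically. The sketch below is not part of the original notes; it assumes Python's standard-library `statistics.NormalDist`, and the variable names are illustrative only. Because it uses the exact normal CDF rather than a table rounded to z = 2.33, the probability may differ from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the worked example: population mean, population SD, sample size.
mu, sigma, n = 500, 60, 49

# Standard deviation of the sample mean under the CLT approximation: sigma / sqrt(n) = 60/7.
se = sigma / sqrt(n)

# Approximate sampling distribution of the group mean.
xbar_dist = NormalDist(mu, se)

# Part (1): P(480 < xbar < 520).
p_between = xbar_dist.cdf(520) - xbar_dist.cdf(480)

# Part (2), using the rule-of-thumb multiplier 2.
lo2, hi2 = mu - 2 * se, mu + 2 * se

# Part (2), using the standard normal quantile (about 1.96) instead of 2.
z = NormalDist().inv_cdf(0.975)
lo, hi = mu - z * se, mu + z * se

print(f"P(480 < xbar < 520) = {p_between:.4f}")
print(f"Range with multiplier 2:    ({lo2:.2f}, {hi2:.2f})")
print(f"Range with multiplier {z:.2f}: ({lo:.2f}, {hi:.2f})")
```

With the multiplier 2 the range is (482.86, 517.14), matching the hand calculation; with 1.96 it narrows slightly to about (483.20, 516.80).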


STA 220H1F LEC0201. Week 7: More Probability: Discrete Random Variables STA 220H1F LEC0201 Week 7: More Probability: Discrete Random Variables Recall: A sample space for a random experiment is the set of all possible outcomes of the experiment. Random Variables A random variable

More information

= 0.35 (or ˆp = We have 20 independent trials, each with probability of success (heads) equal to 0.5, so X has a B(20, 0.5) distribution.

= 0.35 (or ˆp = We have 20 independent trials, each with probability of success (heads) equal to 0.5, so X has a B(20, 0.5) distribution. Chapter 5 Solutions 51 (a) n = 1500 (the sample size) (b) The Yes count seems like the most reasonable choice, but either count is defensible (c) X = 525 (or X = 975) (d) ˆp = 525 1500 = 035 (or ˆp = 975

More information

Homework: (Due Wed) Chapter 10: #5, 22, 42

Homework: (Due Wed) Chapter 10: #5, 22, 42 Announcements: Discussion today is review for midterm, no credit. You may attend more than one discussion section. Bring 2 sheets of notes and calculator to midterm. We will provide Scantron form. Homework:

More information

STAT Chapter 7: Central Limit Theorem

STAT Chapter 7: Central Limit Theorem STAT 251 - Chapter 7: Central Limit Theorem In this chapter we will introduce the most important theorem in statistics; the central limit theorem. What have we seen so far? First, we saw that for an i.i.d

More information