Properties of the Normal Distribution
Example: a friend who is always late. Let X be the number of minutes that pass from the moment you are supposed to meet your friend until the moment your friend shows up. Suppose your friend is equally likely to arrive at any time between x = 0 and x = 30 minutes late; that is, the random variable X can take any value in the interval [0, 30]. In this case we say that X follows a uniform probability distribution. What is the probability that your friend will arrive exactly 5 minutes late? What is the probability that your friend will arrive between 5 and 10 minutes late? To answer questions like these we use probability density functions.
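A minimal Python sketch of the two questions above, assuming X is uniform on [0, 30] so that its density is the constant 1/30 on that interval. Each probability is then the area of a rectangle of height 1/30, and the "interval" for exactly 5 minutes has width zero. The function name `uniform_prob` is hypothetical.

```python
# Late-friend example: X is assumed uniform on [LOW, HIGH] = [0, 30],
# so its density is the constant 1 / (HIGH - LOW) on that interval.
LOW, HIGH = 0.0, 30.0


def uniform_prob(a: float, b: float) -> float:
    """P(a <= X <= b): area of the rectangle of height 1/(HIGH - LOW) over [a, b]."""
    a, b = max(a, LOW), min(b, HIGH)  # only the part inside [LOW, HIGH] has area
    if b <= a:
        return 0.0  # a single point (or an empty interval) has zero area
    return (b - a) / (HIGH - LOW)


print(uniform_prob(5, 5))   # exactly 5 minutes late: 0.0
print(uniform_prob(5, 10))  # between 5 and 10 minutes late: 5/30, about 0.1667
```

So P(X = 5) = 0 and P(5 ≤ X ≤ 10) = 1/6 ≈ 0.167. For any continuous random variable, the probability of one exact value is zero.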
Probability Density Functions
A probability density function is a function f(x) (given as an equation) used to compute probabilities for a continuous random variable. It must satisfy the following:
1. The total area under the graph of the function is equal to 1 (the total probability is 1).
2. The function is never negative: f(x) ≥ 0 for every x (probabilities are always greater than or equal to zero).
How do we use density functions? The area under the graph of a density function over an interval represents the probability of observing a value of the random variable in that interval.
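The two defining properties can be checked numerically. The sketch below approximates areas with the midpoint rule (a numerical-integration choice made here for illustration, not something the slides use) and tests two candidate functions on [0, 30]: the late-friend density 1/30, and a hypothetical function of constant height 1/20, whose total area is 1.5 and which therefore is not a density. All function names are hypothetical.

```python
def area_under(f, a, b, n=10_000):
    """Approximate the area under f on [a, b] with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def looks_like_density(f, a, b, n=10_000, tol=1e-6):
    """True if f >= 0 at every sample point and the total area on [a, b] is 1."""
    width = (b - a) / n
    nonnegative = all(f(a + (i + 0.5) * width) >= 0 for i in range(n))
    return nonnegative and abs(area_under(f, a, b, n) - 1.0) < tol


def late_friend(x):
    return 1 / 30 if 0 <= x <= 30 else 0.0  # uniform density on [0, 30]


def too_tall(x):
    return 1 / 20 if 0 <= x <= 30 else 0.0  # total area 30/20 = 1.5


print(looks_like_density(late_friend, 0, 30))  # True
print(looks_like_density(too_tall, 0, 30))     # False
print(area_under(late_friend, 5, 10))          # about 1/6 = P(5 <= X <= 10)
```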
Mean, median and mode of a density function
For any distribution, the mode is the x-value at the highest point of the graph. The median is the point that has 50% of the area under the curve to its left. The mean is the balance point of the graph.
Graph of a normal curve
A continuous random variable has a normal probability distribution if its relative frequency histogram has the shape of a normal curve. The normal probability density function is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
In a normal distribution, mode = mean = median. The points µ − σ and µ + σ are the inflection points of the graph.
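A sketch of the density formula written out by hand, cross-checked against Python's standard-library `statistics.NormalDist.pdf` (used here only as a reference implementation). It also shows symmetry about µ, the peak at x = µ, and how the peak height $1/(\sigma\sqrt{2\pi})$ drops as σ grows. The helper name `normal_pdf` is hypothetical.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma**2))


# The hand-written formula matches the standard library's NormalDist.pdf.
print(math.isclose(normal_pdf(1.3, 0, 1), NormalDist(0, 1).pdf(1.3)))  # True

# Symmetry about mu, and the highest point at x = mu.
print(math.isclose(normal_pdf(-0.7, 0, 1), normal_pdf(0.7, 0, 1)))      # True
print(normal_pdf(0, 0, 1) > normal_pdf(0.1, 0, 1))                      # True

# Peak height 1 / (sigma * sqrt(2 pi)) shrinks as sigma grows: a wider, flatter curve.
for sigma in (1, 2, 4):
    print(sigma, round(normal_pdf(0, 0, sigma), 4))
```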
Varying µ and σ
Changing µ shifts the graph left or right without changing its shape. Changing σ changes the spread: a larger σ gives a wider, flatter curve, and a smaller σ gives a narrower, taller one (the total area stays equal to 1).
Properties of the normal curve
1. It is symmetric about its mean.
2. Its highest point occurs at x = µ.
3. It has inflection points at x = µ − σ and at x = µ + σ.
4. The total area under the curve is 1.
5. The area under the curve to the left of µ is 1/2.
6. It has the x-axis as a horizontal asymptote.
7. It follows the empirical rule.
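The empirical rule
For a normal distribution, about 68% of the area lies within one standard deviation of the mean (between µ − σ and µ + σ), about 95% lies within two standard deviations, and about 99.7% lies within three standard deviations.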
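The empirical-rule percentages can be recomputed as areas under the curve. This sketch uses Python's `statistics.NormalDist` as a stand-in for a table or calculator; `within_k_sigma` is a hypothetical helper name. It also shows that the percentages are the same for any µ and σ (here µ = 2200, σ = 200, chosen only as an example).

```python
from statistics import NormalDist


def within_k_sigma(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under the N(mu, sigma) curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(within_k_sigma(k), 4))  # 0.6827, 0.9545, 0.9973

# The percentages do not depend on mu and sigma.
print(round(within_k_sigma(1, 2200, 200), 4))  # 0.6827 again
```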
Histograms vs. density functions
Histograms are frequently bell shaped, and we can approximate such histograms with normal curves. When the normal curve is close to the histogram, it is a good approximation, and we use the probability density function to model the situation we are studying.
Using a normal distribution
Assume that the distribution of giraffe weights is modeled by a normal distribution with µ = 2200 pounds and σ = 200 pounds. What does the area under the curve to the left of 2100 represent? It is the proportion of giraffes that weigh 2100 pounds or less.
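To put a number on that area, here is a short sketch using Python's `statistics.NormalDist` (a stand-in for the table or calculator used in class):

```python
from statistics import NormalDist

# Giraffe weights modeled as normal with mu = 2200 lb and sigma = 200 lb.
giraffe = NormalDist(mu=2200, sigma=200)

# Area to the left of 2100 = proportion of giraffes weighing 2100 lb or less.
p = giraffe.cdf(2100)
print(round(p, 4))  # 0.3085
```

Under this model, about 31% of giraffes weigh 2100 pounds or less.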
Standard normal random variable
We have tables for computing areas under normal curves. If we needed a different table for each pair of values µ and σ, this would not be manageable, so we use the standard normal distribution: the normal distribution with µ = 0 and σ = 1. Any x-value can be transformed into a z-score by using the formula
$z = \frac{x - \mu}{\sigma}$
Suppose X is a normally distributed random variable that is not standard (µ ≠ 0 or σ ≠ 1), and we want to compute the area to the left of a certain value x. We first convert x to the corresponding value z in the standard normal distribution, and then find the area to the left of z under the standard normal curve. For now, we will concentrate on finding areas under the standard normal curve.
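A quick numerical check of why one standard table is enough: the area to the left of x under N(µ, σ) equals the area to the left of its z-score under N(0, 1). This is a sketch with Python's `statistics.NormalDist` standing in for the table; `z_score` is a hypothetical helper name, and the numbers are the giraffe model from the previous slide.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (positive) or below (negative) mu."""
    return (x - mu) / sigma


mu, sigma, x = 2200, 200, 2100        # giraffe model
standard = NormalDist()               # mu = 0, sigma = 1

direct = NormalDist(mu, sigma).cdf(x)         # area under the original curve
via_z = standard.cdf(z_score(x, mu, sigma))   # area under the standard curve
print(z_score(x, mu, sigma), round(direct, 4), round(via_z, 4))
```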
Calculating the area to the left of $z_0$
Example: find the area under the standard normal curve that lies to the left of z = 1.68.
Example (continued): the area to the left of z = 1.68 is 0.9535.
[Figure: standard normal curve centered at 0, shaded to the left of z = 1.68; shaded area 0.9535.]
Example: find the area under the standard normal curve that lies to the left of z = 2.53.
We are finding the area before z = 2.53, that is, the area under the curve over the interval $(-\infty, 2.53]$. We cannot type this interval directly, because the calculator has no ∞ symbol. Instead, we approximate the area over $[-M, 2.53]$, where M is a very large number. The calculator can handle numbers as large as $9 \times 10^{99}$, so we use the interval $[-9 \times 10^{99}, 2.53]$. We do this by typing normalcdf(-9E99, 2.53, 0, 1).
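To check these areas without a calculator, here is a sketch of a Python stand-in for normalcdf. The function name and its argument order (lower, upper, mu, sigma) are chosen to mirror the calculator command; it is a hypothetical helper built on the standard library's `statistics.NormalDist`, not a calculator API. Unlike the calculator, Python accepts `float("-inf")` as a lower bound, and it gives the same answer as the -9E99 trick.

```python
from statistics import NormalDist


def normalcdf(lower: float, upper: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Hypothetical stand-in for the calculator's normalcdf: the area under
    the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


print(round(normalcdf(-9e99, 1.68), 4))          # area left of z = 1.68: 0.9535
print(round(normalcdf(-9e99, 2.53), 4))          # area left of z = 2.53: 0.9943
print(round(normalcdf(float("-inf"), 2.53), 4))  # same area with a true -infinity
```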
Calculating the area to the right of z
Sometimes we need the area under the standard normal curve that lies to the right of a value z. For this we use
(area to the right of z) = 1 − (area to the left of z)
Example: find the area under the standard normal curve to the right of z = 0.45.
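The complement rule for this example, sketched with Python's `statistics.NormalDist` in place of the table or calculator:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve

z = 0.45
left = Z.cdf(z)   # area to the left of z
right = 1 - left  # (area to the right of z) = 1 - (area to the left of z)
print(round(left, 4), round(right, 4))  # 0.6736 0.3264
```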
Calculating the area between two z-values
To find the area between two z-values $z_0$ and $z_1$ with $z_0 < z_1$, we compute
(area between $z_0$ and $z_1$) = (area to the left of $z_1$) − (area to the left of $z_0$)
Example: find the area under the standard normal curve between $z_0 = 0.25$ and $z_1 = 0.75$.
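The same subtraction for this example, as a sketch with Python's `statistics.NormalDist` standing in for the table:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve

z0, z1 = 0.25, 0.75
between = Z.cdf(z1) - Z.cdf(z0)  # (area left of z1) - (area left of z0)
print(round(Z.cdf(z1), 4), round(Z.cdf(z0), 4), round(between, 4))  # 0.7734 0.5987 0.1747
```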
Homework 5.1: 3, 5, 9, 15, 17, 21, 23, 31, 35, 39, 41, 45, 47, 53, 55, 59
5.2 Normal Distributions: Finding Probabilities
If a random variable x is normally distributed, you can find the probability that x falls in a given interval by calculating the area under the normal curve over that interval. To find an area under any normal curve, we can first convert the lower and upper bounds of the interval into z-scores, and then use the standard normal distribution to find the area.
For example, if x is normal with µ = 500 and σ = 100, the value x = 600 corresponds to $z = \frac{600 - 500}{100} = 1$, so P(x < 600) = P(z < 1): the two areas are the same.
[Figure: a normal curve with µ = 500, σ = 100 shaded to the left of x = 600, next to the standard normal curve (µ = 0, σ = 1) shaded to the left of z = 1; caption "Same Area".]
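A numerical check of the "same area" picture, sketched with Python's `statistics.NormalDist` in place of a table:

```python
from statistics import NormalDist

x_dist = NormalDist(mu=500, sigma=100)
z = (600 - 500) / 100  # 1.0

area_x = x_dist.cdf(600)        # P(x < 600) under N(500, 100)
area_z = NormalDist().cdf(z)    # P(z < 1) under the standard normal
print(round(area_x, 4), round(area_z, 4))  # 0.8413 0.8413
```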
Example: IQ scores follow a normal distribution with µ = 102 and σ = 13. What proportion of individuals have an IQ score of 95 or less?
1. Find the z-score: $z = \frac{x - \mu}{\sigma} = \frac{95 - 102}{13} \approx -0.54$
2. Find P(z < −0.54): normalcdf(-10000, -0.54) ≈ 0.295
3. Interpret the result: about 29.5% of individuals who take an IQ test will score 95 points or less.
Without obtaining the z-scores: your calculator can find the probability for any normal distribution, so you don't have to convert to z-scores. The input has the form normalcdf(-10000, x, µ, σ); in this example, normalcdf(-10000, 95, 102, 13) ≈ 0.295.
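Both routes from the slide, sketched with Python's `statistics.NormalDist` as a stand-in for the table and for normalcdf. Rounding z to two decimals mimics a table lookup.

```python
from statistics import NormalDist

mu, sigma, x = 102, 13, 95
iq = NormalDist(mu, sigma)

# Route 1: standardize, round z to two decimals as a table would, use the standard curve.
z = round((x - mu) / sigma, 2)   # -0.54
table_style = NormalDist().cdf(z)

# Route 2: use the original curve directly, like normalcdf(-10000, 95, 102, 13).
direct = iq.cdf(x)

print(z, round(table_style, 4), round(direct, 4))
```

Both give about 0.295 (0.2946 and 0.2951); the tiny difference comes from rounding z.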
Homework 5.2: 1, 3, 5, 7, 9, 13, 15, 17, 19, 21, 23, 25, 29
5.3 Finding values
Finding z-scores
We will work with the standard normal distribution for now. Suppose we want to solve the reverse problem: for what value of z is the area to the left of z equal to 0.90? That is, for what value of z do we have P(Z < z) = 0.90?
Using a table: look in the body of the table for the area closest to 0.90 and read off the corresponding z-value.
Using your calculator: invNorm(0.9) ≈ 1.28
Percentiles
With this same method we can find the z-score that corresponds to a given percentile of the standard normal distribution. For example, to find the z-score corresponding to the 25th percentile we compute invNorm(0.25) ≈ −0.6745.
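The reverse lookups on the last two slides, sketched with `statistics.NormalDist.inv_cdf` from Python's standard library, which plays the same role as the calculator's invNorm for the standard normal:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

print(round(Z.inv_cdf(0.90), 4))  # z with 90% of the area to its left: 1.2816
print(round(Z.inv_cdf(0.25), 4))  # 25th percentile: -0.6745

# A few more percentiles of the standard normal distribution.
for pct in (10, 50, 75, 95):
    print(pct, round(Z.inv_cdf(pct / 100), 4))
```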
From z-scores to x-values
We have a formula to convert x-values to z-scores: $z = \frac{x - \mu}{\sigma}$. Solving this formula for x gives $x = \mu + z\sigma$.
Example: IQ scores follow a normal distribution with µ = 102 and σ = 13. Find the IQ scores corresponding to z-scores of z = 1 and z = 2.5.
For z = 1: x = 102 + 1(13) = 115
For z = 2.5: x = 102 + 2.5(13) = 134.5
Example: IQ scores follow a normal distribution with µ = 102 and σ = 13. What is the lowest score you need in order to be in the top 10% of individuals who take the IQ exam? We are looking for the 90th percentile. We already found the corresponding z-score for the standard normal distribution: z ≈ 1.28. To find x we use the formula x = 102 + 1.28(13) = 118.64.
With your calculator you don't need to find z-scores; you can type directly invNorm(0.9, 102, 13) ≈ 118.66 (the small difference comes from rounding z to 1.28).
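Both computations of the 90th percentile, sketched in Python. `NormalDist(mu, sigma).inv_cdf` is used here as the analogue of invNorm(0.9, 102, 13); the other line applies $x = \mu + z\sigma$ with the rounded z.

```python
from statistics import NormalDist

mu, sigma = 102, 13
iq = NormalDist(mu, sigma)

# Rounded z from the standard normal, then x = mu + z * sigma.
x_rounded = mu + 1.28 * sigma
# Exact 90th percentile of N(102, 13), like invNorm(0.9, 102, 13).
x_exact = iq.inv_cdf(0.90)

print(round(x_rounded, 2), round(x_exact, 2))  # 118.64 118.66
```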
Homework 5.3: 1, 5, 7, 15, 25, 27, 31, 35, 37, 39, 41, 43, 45
5.4 Distribution of the Sample mean
Statistical Inference
In many situations we cannot take a census of the entire population we are studying, so we take a sample. What can we say about the population from our data? In particular, what is the relationship between the sample mean $\bar{x}$ and the population mean µ? What is the relationship between the sample standard deviation s and the population standard deviation σ? This is what we call statistical inference: using information from a sample to draw conclusions about a population.
Estimating the Population Mean
How do we estimate the population mean µ using the sample mean $\bar{x}$? The main idea is to compute several sample means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots$ by taking a series of random samples:
Sample 1: calculate $\bar{x}_1$
Sample 2: calculate $\bar{x}_2$
Sample 3: calculate $\bar{x}_3$, and so on.
Each time we obtain a random sample we may get a different value of $\bar{x}$ (once we pick a particular sample, it should not be used a second time). So $\bar{x}$ is a random variable.
The sample mean as a random variable
Because $\bar{x}$ is a random variable, it has a mean, a standard deviation, and a probability distribution. This distribution is called the sampling distribution of the sample mean.
In-class example
Consider the data set 7, 10, 11, 12, and consider samples of size 3. How many possible samples can we form? $_4C_3 = 4$. The samples and their means are:
sample | sample mean $\bar{x}_i$
{7, 10, 11} | 9.33
{7, 10, 12} | 9.67
{7, 11, 12} | 10
{10, 11, 12} | 11
Now let's find the mean of the $\bar{x}_i$:
$\mu_{\bar{x}} = \frac{9.33 + 9.67 + 10 + 11}{4} = \frac{40}{4} = 10$
Compare this with the population mean:
$\mu = \frac{7 + 10 + 11 + 12}{4} = \frac{40}{4} = 10$
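The in-class enumeration can be reproduced in a few lines of Python. This sketch lists every sample of size 3 with `itertools.combinations` and uses `fractions.Fraction` so the means are exact (no 9.33 vs. 9.333... rounding).

```python
from fractions import Fraction
from itertools import combinations
from math import comb

population = [7, 10, 11, 12]
n = 3

samples = list(combinations(population, n))                    # every sample of size 3
sample_means = [Fraction(sum(s), n) for s in samples]           # exact sample means
mean_of_means = sum(sample_means) / len(sample_means)           # mu_xbar
population_mean = Fraction(sum(population), len(population))    # mu

print(len(samples), comb(len(population), n))  # 4 4
for s, m in zip(samples, sample_means):
    print(s, round(float(m), 2))
print(mean_of_means, population_mean)  # 10 10
```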
All possible samples of size n?
In practice, we cannot obtain all possible samples of size n from a population. In fact, that would be more work than necessary: if we had the data for the entire population, there would be no need to work with samples. Looking at all possible samples is the theory behind why we can use samples to estimate the mean of the population.
Example: we want to find the mean age of students in this class using samples of size 4. How many samples of size 4 are possible? We are not going to list them all! Let's take a subset of all possible samples...
Mean and standard deviation of the sampling distribution
Suppose that a simple random sample of size n is drawn from a large population that follows a normal distribution with mean µ and standard deviation σ. The mean of the sampling distribution of $\bar{x}$ equals the population mean, that is, $\mu_{\bar{x}} = \mu$. The standard deviation of the sampling distribution of $\bar{x}$ is called the standard error of the mean, and
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
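Both formulas can be checked exactly on a tiny population if we assume the n draws are independent (sampling with replacement), which is the setting where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ holds exactly; a very large population behaves almost the same way. This sketch reuses the data 7, 10, 11, 12 from the in-class example, lists every ordered sample with `itertools.product`, and compares variances ($\sigma_{\bar{x}}^2 = \sigma^2/n$) so that exact fractions can be used. The helper name is hypothetical.

```python
from fractions import Fraction
from itertools import product

population = [7, 10, 11, 12]
N = len(population)
mu = Fraction(sum(population), N)                            # population mean: 10
var = sum((Fraction(x) - mu) ** 2 for x in population) / N   # sigma^2 = 7/2


def sample_means_with_replacement(n: int) -> list[Fraction]:
    """Means of all N**n equally likely ordered samples of n independent draws."""
    return [Fraction(sum(s), n) for s in product(population, repeat=n)]


for n in (2, 3, 4):
    means = sample_means_with_replacement(n)
    mean_of_means = sum(means) / len(means)
    var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
    # mu_xbar = mu, and sigma_xbar^2 = sigma^2 / n (that is, sigma_xbar = sigma / sqrt(n))
    print(n, mean_of_means == mu, var_of_means == var / n)
```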
Example: a bank auditor claims that credit card balances are normally distributed, with a mean of $2870 and a standard deviation of $900. If 25 cardholders are selected randomly, what is the probability that their mean credit card balance is less than $2500?
Because the population is normal, the sampling distribution of $\bar{x}$ is normal with
$\mu_{\bar{x}} = \mu = 2870, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{900}{\sqrt{25}} = \frac{900}{5} = 180$
We need the area to the left of 2500 under the normal distribution with µ = 2870 and σ = 180, and we know how to do this: normalcdf(-10000, 2500, 2870, 180) ≈ 0.0199. There is about a 2% chance that the mean balance of a sample of 25 cardholders is less than $2500.
Using z-scores:
$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{2500 - 2870}{180} \approx -2.06$
and a table gives P(z < −2.06) ≈ 0.0197; the small difference comes from rounding z.
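The credit card calculation, sketched in Python with `statistics.NormalDist` standing in for normalcdf and for the table. The last value rounds z to two decimals, as a table lookup would.

```python
import math
from statistics import NormalDist

mu, sigma, n = 2870, 900, 25
se = sigma / math.sqrt(n)            # standard error: 900 / 5 = 180.0

xbar = NormalDist(mu, se)            # sampling distribution of the sample mean
p = xbar.cdf(2500)                   # like normalcdf(-10000, 2500, 2870, 180)

z = (2500 - mu) / se                 # about -2.06
p_table = NormalDist().cdf(round(z, 2))  # table lookup with z rounded to -2.06

print(se, round(p, 4), round(z, 2), round(p_table, 4))
```

This prints a standard error of 180.0, a probability of about 0.0199 from the unrounded z, and about 0.0197 from the table-style lookup.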
The Central Limit Theorem
The shape of the sampling distribution of $\bar{x}$ when X is normal
Example: suppose we know (somehow) that the random variable X has a normal distribution with µ = 20 and σ = 12.
1. If we use samples of size n = 4, then the sampling distribution of $\bar{x}$ is a normal distribution with $\mu_{\bar{x}} = 20$ and $\sigma_{\bar{x}} = 12/\sqrt{4} = 6$.
2. If we use samples of size n = 9, then the sampling distribution of $\bar{x}$ is a normal distribution with $\mu_{\bar{x}} = 20$ and $\sigma_{\bar{x}} = 12/\sqrt{9} = 4$.
What if the population is not normal?
If the population does not follow a normal distribution, the result above does not apply, at least not exactly.
The Central Limit Theorem
1. Regardless of the shape of the population, the sampling distribution of $\bar{x}$ becomes approximately normal as the sample size n increases.
2. If the population itself follows a normal distribution, the sampling distribution of $\bar{x}$ is normal for any sample size.
In either case, the mean of the sampling distribution of $\bar{x}$ equals the population mean, $\mu_{\bar{x}} = \mu$, and the standard deviation of the sampling distribution of $\bar{x}$ (the standard error of the mean) is
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
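A small simulation can make the Central Limit Theorem concrete. This sketch assumes an exponential population with rate 1 (mean 1, standard deviation 1), a strongly right-skewed shape chosen only for illustration, and draws 20,000 samples of size n = 30 with Python's `random` module. The mean and spread of the sample means should be close to µ and σ/√n, and about 95% of them should land within 1.96 standard errors of µ, as they would for a normal distribution.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is reproducible


def sample_means(n: int, reps: int) -> list[float]:
    """Means of `reps` independent samples of size n drawn from an exponential
    population with rate 1 (mean 1, standard deviation 1), a skewed shape."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


n, reps = 30, 20_000
means = sample_means(n, reps)
se = 1 / n**0.5  # sigma / sqrt(n) with sigma = 1

print(round(statistics.fmean(means), 3))  # close to mu = 1
print(round(statistics.stdev(means), 3))  # close to 1 / sqrt(30), about 0.183
# If the sampling distribution is close to normal, about 95% of the sample
# means fall within 1.96 standard errors of mu.
share = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps
print(round(share, 3))
```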
How big?
How big must n be for the sampling distribution of $\bar{x}$ to behave like a normal distribution? It depends on the shape of the distribution of X; usually n ≥ 30 is enough.
Example: the average number of pounds of red meat a person consumes each year is 196, with a standard deviation of 22 pounds (Source: American Dietetic Association). If a sample of 50 individuals is randomly selected, find the probability that the mean of the sample will be less than 200 pounds.
Since the sample size is at least 30, the Central Limit Theorem tells us that the sampling distribution of $\bar{x}$ is approximately normal with
$\mu_{\bar{x}} = 196, \quad \sigma_{\bar{x}} = \frac{22}{\sqrt{50}} \approx 3.11$
So we are finding the area to the left of 200 under a normal distribution with µ = 196 and σ = 3.11. We know how to do this: normalcdf(-10000, 200, 196, 3.11) ≈ 0.90.
Note: if the population does not follow a normal distribution and the sample size is less than 30, we cannot guarantee that the sampling distribution of $\bar{x}$ will be approximately normal.
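The red meat example, sketched in Python with `statistics.NormalDist` in place of normalcdf; the normal model for $\bar{x}$ is the Central Limit Theorem approximation from the slide.

```python
import math
from statistics import NormalDist

mu, sigma, n = 196, 22, 50
se = sigma / math.sqrt(n)          # standard error, about 3.11
p = NormalDist(mu, se).cdf(200)    # P(xbar < 200), approximately normal by the CLT
print(round(se, 2), round(p, 2))   # 3.11 0.9
```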
Homework 5.4: 9, 11, 13, 15, 19, 25, 27, 29, 31, 33
5.5 Normal Approximations to Binomial Distributions
Approximating a Binomial Distribution
The normal distribution is used to approximate the binomial distribution when it would be impractical to use the binomial distribution to find a probability. For example: 52% of adults say that they prefer coffee over tea in the morning. If you randomly select 60 adults and ask whether they prefer coffee or tea in the morning, what is the probability that at least 20 of them will say coffee? Using the binomial distribution we would need to compute 41 different probabilities (if we calculate P(20) + P(21) + ... + P(60)) or 20 different probabilities (if we calculate 1 − [P(0) + P(1) + ... + P(19)]). In either case this is a lot of work. Recall that in Section 4.2 we saw that as the number of trials increases, the histogram of a binomial distribution gets closer and closer to a normal curve. We will use this fact to approximate binomial probabilities.
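A computer does not mind the 41 terms, so this sketch computes the coffee probability exactly (both ways described above) with `math.comb`, and compares it with a normal approximation using µ = np and σ = √(npq). The 19.5 cutoff applies the correction for continuity described later in this section. `binom_pmf` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

n, p = 60, 0.52
q = 1 - p


def binom_pmf(k: int) -> float:
    """Binomial probability P(X = k) = C(n, k) p^k q^(n - k)."""
    return math.comb(n, k) * p**k * q ** (n - k)


# Exact answer, two ways: 41 terms P(20) + ... + P(60), or 1 minus 20 terms P(0) + ... + P(19).
exact_41_terms = sum(binom_pmf(k) for k in range(20, n + 1))
exact_20_terms = 1 - sum(binom_pmf(k) for k in range(0, 20))

# Normal approximation with mu = np and sigma = sqrt(npq), continuity-corrected cutoff 19.5.
approx = 1 - NormalDist(n * p, math.sqrt(n * p * q)).cdf(19.5)

print(round(exact_41_terms, 4), round(exact_20_terms, 4), round(approx, 4))
```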
Approximating a Binomial
Suppose x is a binomial random variable, with n the number of trials, p the probability of success, and q = 1 − p the probability of failure. If np ≥ 5 and nq ≥ 5, then x approximately follows a normal distribution with
$\mu = np, \quad \sigma = \sqrt{npq}$
Example: fifty-one percent of adults in the U.S. whose New Year's resolution was to exercise more achieved their resolution. You randomly select 65 adults in the U.S. whose resolution was to exercise more and ask each whether he or she achieved that resolution. Since n = 65, p = 0.51 and q = 0.49,
np = (65)(0.51) = 33.15 ≥ 5
nq = (65)(0.49) = 31.85 ≥ 5
so we can use a normal distribution as an approximation, with
$\mu = np = 65(0.51) = 33.15, \quad \sigma = \sqrt{npq} = \sqrt{65(0.51)(0.49)} \approx 4.03$
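The check of the conditions and the computation of µ and σ can be wrapped in one small function. This is a sketch; the name `normal_approx_params` and the choice to raise an error when the rule of thumb fails are assumptions made here for illustration.

```python
import math


def normal_approx_params(n: int, p: float) -> tuple[float, float]:
    """Return (mu, sigma) = (np, sqrt(npq)) for the normal approximation to a
    binomial, after checking the rule of thumb np >= 5 and nq >= 5."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError(f"normal approximation not advised: np = {n * p}, nq = {n * q}")
    return n * p, math.sqrt(n * p * q)


mu, sigma = normal_approx_params(65, 0.51)   # New Year's resolution example
print(round(mu, 2), round(sigma, 2))         # 33.15 4.03
```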
Correction for continuity
The binomial distribution is discrete and can be represented by a probability histogram; as the number of trials increases, the histogram takes the shape of a normal curve. We are using a (continuous) normal distribution to approximate the actual distribution, which is binomial. To calculate an exact binomial probability, the binomial formula must be applied to each value in the interval and all the results added together; geometrically, this corresponds to adding the areas of the bars in the histogram (each bar has base 1). For a better approximation, we move 0.5 to each side of each value so that the whole bar of every x-value in the interval is included. This is called a correction for continuity:
exact binomial probability P(x = c) ≈ normal approximation P(c − 0.5 < x < c + 0.5)
[Figure: the histogram bar at c, spanning c − 0.5 to c + 0.5, compared with the area under the normal curve over the same interval.]
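The effect of the correction on a single value, sketched in Python for the New Year's resolution binomial (n = 65, p = 0.51) at c = 33, a value chosen here only for illustration. Without the correction, the normal curve assigns zero area to a single point; with it, the area over [c − 0.5, c + 0.5] is close to the exact bar height.

```python
import math
from statistics import NormalDist

n, p = 65, 0.51                       # New Year's resolution example
q = 1 - p
approx = NormalDist(n * p, math.sqrt(n * p * q))

c = 33
exact = math.comb(n, c) * p**c * q ** (n - c)                # height of the bar at c (base 1)
with_correction = approx.cdf(c + 0.5) - approx.cdf(c - 0.5)  # area from c - 0.5 to c + 0.5
without_correction = approx.cdf(c) - approx.cdf(c)           # a single point has area 0

print(round(exact, 4), round(with_correction, 4), without_correction)
```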
Example (continued): using the same New Year's resolutions example, we know we can use a normal distribution with µ = 33.15 and σ ≈ 4.03 to approximate the binomial distribution. If we want to calculate the probability that fewer than 40 of the 65 adults respond yes, instead of finding P(0) + P(1) + ... + P(39), we find (using the correction for continuity) P(x < 39.5) for a normal distribution with µ = 33.15 and σ = 4.03.
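A sketch comparing the exact sum P(0) + ... + P(39) with the continuity-corrected approximation P(x < 39.5), using `math.comb` and Python's `statistics.NormalDist`. For contrast it also shows P(x < 40) without the correction; that comparison line is an addition made here for illustration.

```python
import math
from statistics import NormalDist

n, p = 65, 0.51
q = 1 - p
approx = NormalDist(n * p, math.sqrt(n * p * q))   # mu = 33.15, sigma about 4.03

# Exact: P(0) + P(1) + ... + P(39).
exact = sum(math.comb(n, k) * p**k * q ** (n - k) for k in range(0, 40))
# Normal approximation with the correction for continuity: P(x < 39.5).
corrected = approx.cdf(39.5)
# For comparison, the approximation without the correction: P(x < 40).
uncorrected = approx.cdf(40)

print(round(exact, 4), round(corrected, 4), round(uncorrected, 4))
```

The corrected value, about 0.942, lands much closer to the exact sum than the uncorrected one.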
Homework 5.5: 1, 3, 5, 7, 9, 17, 21, 23, 25