Section 15.0: The Normal Distribution

Hollie West
5 years ago

The Normal Distribution is the most widely recognized of all probability distributions. It is a continuous distribution, which means its graph has no gaps. The shape of its graph is classically described as bell-shaped. It is symmetric around the mean, with long tails. It was originally discovered in the early part of the 18th century by the French mathematician De Moivre and independently rediscovered in the 19th century by the German mathematician Gauss. The distribution is described by a complex formula that cannot be used analytically to answer probability questions. Instead, all probability questions are answered by inserting values into a calculator or spreadsheet.

The Normal Distribution is the most important probability distribution used in statistics. This is due to a number of desirable mathematical properties that the Normal Distribution has. Statisticians have been able to solve many problems in statistics only by assuming a Normal Distribution and using its mathematical properties. The assumption is often reasonable because there are many variables, such as intelligence or height, that when plotted appear similar to it. There might be an infinite number of variables of interest that have this characteristic.

However, the Normal Distribution is only a model. There are, in fact, no variables that completely match it. Furthermore, there are many other models, such as exponential growth (e.g., bacteria) or exponential decay (e.g., radioactivity), each of which may also have an infinite number of important variables that appear similar to it when plotted. Because other models exist, assuming the Normal Distribution when it is not reasonable to do so can lead to seriously wrong conclusions.

The Normal Distribution arises naturally when the random variable is the sum or mean of several other independent random variables. The distribution has two independent parameters: the mean μ (mu) and the standard deviation σ (sigma). Sometimes the variance is used instead of the standard deviation. In this chapter, the graphing calculator is the primary tool used to calculate probabilities.

North Carolina State University Chapter 15 Page 1
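For reference, the "complex formula" mentioned above is usually written as the density below. This is the standard textbook form, given here as background rather than taken from this chapter; $\mu$ and $\sigma$ are the mean and standard deviation just introduced.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
```

Probabilities correspond to areas under this curve. Those areas have no closed-form expression, which is why a calculator or spreadsheet is used to find them.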

Section 15.1: Cutting Fabric for Parachutes

Skydiving equipment has advanced considerably over the last several years. Round parachutes are seldom seen these days and have been replaced by modern, rectangular ram-air canopies that have better directional control and offer softer landings. Ram-air canopies are made of a series of inflatable tubes or cells, connected side-by-side along their length. Each cell is designed to form the cross section of an airfoil, so when the parachute inflates, it forms a wing-shaped canopy, ready for flight. The front of each cell is open to the air, and the back is sewn closed. Once inflated, the ram-air canopy is a semi-rigid, rectangular plane, similar to an airplane wing. The figure shows a ram-air canopy parachute with seven cells.

Figure: A ram-air parachute

The fabric most often used to make parachutes is called "rip stop" nylon. It is commercially available in 20-meter rolls that are 150 centimeters wide. Depending on the final dimensions of the parachute, the size of each cell will vary, as will the length of fabric that must be cut. Regardless of the size, the fabric must be cut with great precision. If the pieces are cut too small, the material cannot be used. If they are cut too large, the material can be used, but some of the fabric will be wasted and may need to be re-cut down to size.

Sampling the Cut Pieces

Sky's the Limit is a small company that makes parachutes. Twin brothers Tom and Ken Speedy work in the cutting department at Sky's the Limit. They cut the large panels of fabric used in making the parachutes. Ray Monitor, the production manager at Sky's the Limit, is studying Tom's work and Ken's work to determine if one or both of them should receive a pay raise.

Tom and Ken are cutting pieces of fabric that are supposed to be 50 cm wide and 2 m long. Administrators at Sky's the Limit established that the margin of error for cutting the fabric is 5 millimeters. This means that the fabric can still be used if it is within 5 mm of 50 cm wide and within 5 mm of 2 m long.

Q1. What is the acceptable range of widths of fabric (in centimeters)?

Q2. What is the acceptable range of lengths of fabric (in meters)?

Ray wants to decide if one, neither, or both of the Speedy brothers should get a pay raise. He is basing his decision on the quality of the work they do. He wants to know how often Tom's and Ken's cut pieces of fabric are within specifications: 50 cm wide ± 5 mm and 2 m long ± 5 mm. However, Ray cannot measure every single piece of fabric they cut because they cut many every day. Instead, Ray takes a random sample of 10 pieces of Tom's work and 10 pieces of Ken's work, to see if they are within specifications.

The measurements for all cut pieces of fabric would make up the population of cut pieces of fabric. Since it is usually impossible to collect data for an entire population, a random sample is often used instead. A sample refers to a portion of the population. By choosing the sample randomly, Ray expects that the sample data are representative of the data for the entire population. That is, Ray can use the sample data to make predictions and gain a better understanding of the population data. When a sample is representative of the population, the sample data closely match the characteristics of the population data. This helps Ray to generalize his results from the small sample to the large population.

The length and width of each sampled piece, for both Tom and Ken, appear in the table.

Worker | Tom | Tom | Ken | Ken
Sample | Length (m) | Width (cm) | Length (m) | Width (cm)

Table: Lengths and widths of Tom's and Ken's samples measured to the nearest mm

Q3. Based on the data presented in the table, would you recommend that Tom or Ken should receive a raise? Explain your reasoning.

Ray Monitor decides he needs to compute some statistics to help him make sense of the data. A statistic is a fact or a piece of data, usually calculated from a sample. The first statistics Ray decides to calculate are the means of the lengths of the cloth pieces in each of the two samples.

The Mean of a Sample

The mean is one way to describe the center of a set of data. The mean of a sample is just the arithmetic average: add the sampled items and divide by the number of items.

Note: If the elements of a data set are represented by $x_i$ and there are $n$ elements in the set, the usual notation for the mean of the data set is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

where the term $\bar{x}$ is used to represent the mean of a sample.
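The mean formula can be tried out with a few lines of Python. This is only a sketch: the five lengths below are made-up values, not Ray's measurements, and `sample_mean` is an illustrative name.

```python
# Hypothetical sample of five lengths in meters (NOT Ray's measurements).
sample_lengths = [1.998, 2.001, 2.003, 1.999, 2.004]


def sample_mean(values):
    """Add the sampled items and divide by the number of items."""
    return sum(values) / len(values)


mean_length = sample_mean(sample_lengths)
print(f"mean length = {mean_length:.3f} m")
```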

Q4. Compute the mean of the lengths of each worker's sample. What do you observe?

Ray looks at the two means and does not find this information very helpful. He decides to look at the data more closely.

Q5. Do the two sets of lengths look different to you? If so, how are they different? If not, what similarities did you see?

Ray realizes that Tom's lengths varied between 1.996 and 2.004, while Ken's varied between 1.999 and 2.002. Ray uses a statistic called the range to express this realization mathematically.

The Range of a Sample

The range is one way to measure how spread-out the individual data points are. To find the range, subtract the smallest number from the largest number in the data set.

Q6. Compute the range of the lengths for each worker's sample. What do you observe?

At this point, Ray is thinking that maybe Ken is more deserving of a pay raise than Tom. He wonders what he will observe about the width data.

Q7. Compute the mean and the range for each worker's sample of widths. What do you observe?

Ray is unsure what to do with this information. He sees similarities and differences in the two data sets.

Q8. Do you see any differences in the two width data sets? If yes, how are they different? If not, what similarities do they have?

Ray decides that the range was not helpful for the width data. He needs another way to measure the spread of the data. He decides to use a statistic called the standard deviation.

Note: For the remainder of his analysis, Ray only considers the sample widths to make his decision.

The Standard Deviation of a Sample

The standard deviation is another way to measure how spread-out the individual data points are. The symbol $S_x$ is used to represent the standard deviation of a sample. The formula for $S_x$ is:

$S_x = \sqrt{\frac{\sum_{i=1}^{n} (\bar{x} - x_i)^2}{n - 1}}$

This formula is explained in more detail below. The standard deviation uses the difference, or deviation, of each data point from the mean to measure the spread of the data. To find the standard deviation, complete the following steps.

Step 1: Calculate the difference between each data point and the mean

For each data point, $x_i$, compute the difference $\bar{x} - x_i$. For example, using Tom's data for width of pieces, the data points refer to each individual width collected by Ray. Thus, the first step would be as follows:

$\bar{x} - x_1 = -0.3$
$\bar{x} - x_2 = -0.3$
$\bar{x} - x_3 = -0.2$
$\bar{x} - x_4 = -0.3$
$\bar{x} - x_5 = 0.3$
$\bar{x} - x_6 = 0.2$
$\bar{x} - x_7 = 0.3$
$\bar{x} - x_8 = -0.2$
$\bar{x} - x_9 = 0.2$
$\bar{x} - x_{10} = 0.3$

Step 2: Square each difference

Next, finding an average difference would make sense. However, some of the differences are positive, and some are negative, so if they were just added together, the positives and negatives would cancel each other. To avoid that, each of the differences is squared. This is the typical way for mathematicians to address an issue like this. For example, using Tom's width data, the second step would be as follows:

$(\bar{x} - x_1)^2 = (-0.3)^2 = 0.09$
$(\bar{x} - x_2)^2 = (-0.3)^2 = 0.09$
$(\bar{x} - x_3)^2 = (-0.2)^2 = 0.04$
$(\bar{x} - x_4)^2 = (-0.3)^2 = 0.09$
$(\bar{x} - x_5)^2 = (0.3)^2 = 0.09$
$(\bar{x} - x_6)^2 = (0.2)^2 = 0.04$
$(\bar{x} - x_7)^2 = (0.3)^2 = 0.09$
$(\bar{x} - x_8)^2 = (-0.2)^2 = 0.04$
$(\bar{x} - x_9)^2 = (0.2)^2 = 0.04$
$(\bar{x} - x_{10})^2 = (0.3)^2 = 0.09$

Step 3: Sum these squares

Next, the squared differences are added. For Tom's width data, this sum would be:

$0.09 + 0.09 + 0.04 + 0.09 + 0.09 + 0.04 + 0.09 + 0.04 + 0.04 + 0.09 = 0.7$

Q9. Why does adding the squared differences avoid the problem of positives and negatives canceling each other?

Step 4: Divide this sum by n − 1, where n is the number of points in the data set

Next, divide the sum from Step 3 by n − 1. In this example, n is 10 because there were 10 pieces of fabric in the sample. Therefore, continuing with Tom's width data set, this step yields:

$\frac{0.7}{10 - 1} = \frac{0.7}{9} \approx 0.0778$

Step 5: Take the square root of the result

Lastly, take the square root of this quotient. In this example, this would be:

$\sqrt{0.0778} \approx 0.279$

This step is important because it allows the units of measurement to be consistent. In this example, the width of the fabric is measured in cm. Ray would like the standard deviation also to be measured in cm. Taking the square root allows the units of the standard deviation to be the same as the units of the points in the data set.

These steps together give the formula for standard deviation:

$S_x = \sqrt{\frac{\sum_{i=1}^{n} (\bar{x} - x_i)^2}{n - 1}}$

The standard deviation tells Ray how spread-out the data points are. It measures the typical amount each data point deviates from the mean. Therefore, the more spread-out the data points are, the greater the standard deviation.

Q10. Using this process, find the standard deviation for Ken's width data. What do you observe about the two standard deviations? Why do you think that happened?

Q11. Based on Tom's and Ken's standard deviations, should Tom, Ken, both, or neither get a pay raise? Explain your reasoning.

Sometimes it is useful to work with a statistic called the variance. The variance is simply the square of the standard deviation. Therefore, it is denoted by $S_x^2$. Notice that the square root of the variance is the standard deviation, which is as it should be:

$\sqrt{S_x^2} = S_x = \sqrt{\frac{\sum_{i=1}^{n} (\bar{x} - x_i)^2}{n - 1}}$

Q12. Calculate the variances of Tom's and Ken's width samples.

Recall that Ray wanted to know the mean, standard deviation, and variance for the widths of all of the panels cut. That information would not be available unless every panel had been measured after it was cut. That would have been very costly. Instead, Ray decides to use the means and standard deviations from the samples that were measured. He reasons that these statistics are probably good estimates of the actual mean and standard deviation. In other words, Ray cannot know the mean or the standard deviation of the widths of all of the panels Tom and Ken have ever cut (i.e., the population of cut panels). However, he does know the mean and the standard deviation of the sample that they cut.
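The five steps for the standard deviation can be sketched in Python. The deviations are the ten values listed for Tom's widths in Step 1; the variable names are illustrative, and the script starts from the deviations because the underlying widths are not reproduced here.

```python
import math

# Deviations of Tom's ten widths from the mean, in cm (Step 1 of the text).
deviations = [-0.3, -0.3, -0.2, -0.3, 0.3, 0.2, 0.3, -0.2, 0.2, 0.3]

squared = [d ** 2 for d in deviations]   # Step 2: square each difference
sum_of_squares = sum(squared)            # Step 3: sum the squares
n = len(deviations)
variance = sum_of_squares / (n - 1)      # Step 4: divide by n - 1
std_dev = math.sqrt(variance)            # Step 5: take the square root

print(f"sum of squares     = {sum_of_squares:.2f}")
print(f"variance           = {variance:.4f} cm^2")
print(f"standard deviation = {std_dev:.4f} cm")
```

Given a list of raw widths instead, Python's built-in `statistics.stdev` applies the same n − 1 formula directly.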

Sample vs. Population

Different notation is used for population and sample means, variances, and standard deviations. These are detailed in the table. Ray can use the sample statistics, $\bar{x}$ and $S_x$, as estimates of the actual parameters, $\mu$ and $\sigma_x$, of Tom's and Ken's widths.

Statistic | Sample | Population
Mean | $\bar{x}$ | $\mu$
Variance | $S_x^2 = \frac{\sum_{i=1}^{n} (\bar{x} - x_i)^2}{n - 1}$ | $\sigma_x^2 = \frac{\sum_{i=1}^{n} (\mu - x_i)^2}{n}$
Standard Deviation | $S_x = \sqrt{\frac{\sum_{i=1}^{n} (\bar{x} - x_i)^2}{n - 1}}$ | $\sigma_x = \sqrt{\frac{\sum_{i=1}^{n} (\mu - x_i)^2}{n}}$

Table: Sample statistics and population parameters

Ray also notices that the variance is always computed as part of the computation of the standard deviation. He wonders, "Why, then, are there two different statistics?" One reason is that the units of measure for the variance might not be meaningful, and even when they are meaningful, they are not appropriate to the problem context. For example, in the width example, the units in which the widths were measured are centimeters. However, the variance squares differences that are measured in centimeters, so the units of the variance are square centimeters. Centimeters are used to measure distances, as in the width example, but square centimeters are used to measure area. It is not appropriate to use a unit of area to measure variation in a distance.

There is another reason why there are two different statistics, the variance and the standard deviation. The standard deviation, which is the more meaningful statistic, lacks an important property that the less meaningful statistic, the variance, has. This property is explored in Appendix A. This Appendix also explains how to find several statistics using the graphing calculator.

Hiring a New Worker

Sky's the Limit has been quite successful lately. Sales have increased, and they need to hire a new worker for the cutting department. They want this new worker to cut with precision and consistency. Ray Monitor has been tasked with hiring this new worker. Five potential workers have been interviewed for the job. These potential workers were asked to cut several pieces of fabric. Ray will use these samples to decide which worker to hire.

Ray decides to compare the samples from the potential workers to the samples from the current workers (including Tom and Ken Speedy), who he knows are precise and consistent. However, he is unsure how to proceed. He asks his brother, a mathematician named Jacob, to help him with the next steps.

There are currently 20 workers in the cutting department. Jacob asks Ray to collect five samples from each worker. Then, rather than consider all 100 collected samples, Jacob finds the mean for each worker, based on the five collected samples.

Jacob explains to Ray that by using the means, they can treat the data as approximately Normally Distributed. He says this is due to something called the Central Limit Theorem. When data are Normally Distributed, they can make assumptions, predictions, and conclusions that they would not typically be able to make. Therefore, this is a very useful property.

Ray is convinced by his brother's argument. So he sets out to collect data from the 20 workers and finds their means. For example, Worker A provided the sample widths shown in the table.

Sample | Width (cm)

Table: Sample widths provided by Worker A

The mean of Worker A's sample is then the sum of these five widths divided by 5. Ray repeats this process for all 20 workers to obtain a total of 20 means. These means are shown in the table.

Mean Sample Widths (cm): one mean for each of Worker A through Worker T (for example, Worker O: 50)

Table: Mean sample widths for all 20 workers

Jacob wants to provide Ray a picture of the data. He starts by taking the means shown in the table and creating a histogram. He creates bins based on the data. For example, he knows his bins need to range from approximately 49.6 to approximately 50.3 (based on the range of means in the table). He also knows that in order to gain a good picture of the data, the bins should be neither too small nor too large.

Jacob also needs to decide how tall the bars of the histogram are. Rather than simply counting the number of means that fit into each bin, Jacob uses relative frequencies. To find a relative frequency, Jacob divides the count by the total number of means (in this case, there are 20 means). The table of bins begins this process. For example, there is only one mean that fits in the 49.5 ≤ T ≤ 49.7 bin (Worker S). Therefore, the relative frequency of a mean fitting into this bin is 1/20 = 0.05.
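The relative-frequency calculation can be sketched as follows. The 20 means are hypothetical stand-ins (the workers' actual means are not reproduced here), and the bin edges assume five equal bins of width 0.2 cm running from 49.5 to 50.5.

```python
# HYPOTHETICAL means for 20 workers, in cm (illustration only).
means = [
    49.62, 49.74, 49.80, 49.86, 49.88,
    49.92, 49.95, 49.97, 50.00, 50.00,
    50.02, 50.04, 50.06, 50.08, 50.12,
    50.15, 50.18, 50.22, 50.26, 50.32,
]

# Assumed bin edges: 49.5 <= T <= 49.7, then 49.7 < T <= 49.9, and so on.
edges = [49.5, 49.7, 49.9, 50.1, 50.3, 50.5]

counts = [0] * (len(edges) - 1)
for m in means:
    for i in range(len(counts)):
        # Only the first bin includes its lower edge.
        above_lower = m >= edges[i] if i == 0 else m > edges[i]
        if above_lower and m <= edges[i + 1]:
            counts[i] += 1
            break

# Relative frequency = count in the bin / total number of means.
relative_frequencies = [c / len(means) for c in counts]

for i, (c, rf) in enumerate(zip(counts, relative_frequencies)):
    print(f"{edges[i]:.1f} to {edges[i + 1]:.1f}: count = {c}, relative frequency = {rf:.2f}")
```

Because every mean lands in exactly one bin, the relative frequencies always add to 1.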

Bin | Total Count | Relative Frequency
49.5 ≤ T ≤ 49.7 | 1 | 1/20 = 0.05
49.7 < T ≤ 49.9 | |
49.9 < T ≤ 50.1 | |
50.1 < T ≤ 50.3 | |
50.3 < T ≤ 50.5 | |
Total | 20 | 1

Table: Calculating the relative frequency for each bin

Q13. Complete the table by counting the number of means (from the table of mean sample widths) and calculating the relative frequency for each bin.

Q14. Create a histogram by drawing bars on the figure. The height of each bar should be the relative frequency found in Q13.

Figure: Creating a histogram for the relative frequencies of the workers' mean sample widths

Q15. Describe the general shape of this histogram.

Then, Jacob points out the area of the bars to Ray. He notes that the width of each bar is one unit, and the height of each bar is the relative frequency of a panel width being within the respective bin. For example, the height of the first bar (the 49.5 ≤ T ≤ 49.7 bin) is 0.05 because the relative frequency of a panel width being between 49.5 and 49.7 cm is 0.05. Therefore, the area of this bar is:

Area = (width of bar)(height of bar) = (1)(0.05) = 0.05

Q16. Using this same logic, calculate the areas of the remaining bars of the histogram.

Q17. Calculate the sum of all the areas of the bars (don't forget to include the first area). Why does this make sense?

Now that Ray has a basic idea of what these means look like on a histogram, Jacob introduces him to the Normal Distribution curve. The shape of the curve is similar to the shape of the histogram, but it is continuous and smooth. To draw the curve, Jacob uses his graphing calculator. He instructs Ray to complete the following steps.

Step 1: Identify the mean and standard deviation

In this case, the mean of the data listed in the table of mean sample widths is 50. The standard deviation, $S_x$, is calculated from the same 20 means.

Step 2: Input the normalpdf function in the Y= menu

Jacob presses the Y= button on his calculator. Then, he accesses the distribution menu by pressing 2nd and then VARS. In this menu, Jacob chooses 1:normalpdf(, as shown in the figure.

Figure: Choosing the normalpdf function on the graphing calculator

After pressing ENTER, Jacob is brought back to the Y= screen. Now, he types in the syntax for the normalpdf function:

normalpdf(X, $\bar{x}$, $S_x$)

The syntax for graphing the mean width data is shown in the figure.

Figure: Graphing the Normal Distribution curve for the mean width data
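For readers without a graphing calculator, the curve that normalpdf draws can be evaluated with a short Python function. This is a sketch using the standard Normal density formula; `normal_pdf` is an illustrative name, the mean of 50 comes from the text, and the standard deviation of 0.2 is a placeholder assumption rather than the value Ray computed.

```python
import math


def normal_pdf(x, mean, sd):
    """Height of the Normal curve at x, analogous to normalpdf(x, mean, sd)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


MEAN = 50.0  # mean of the workers' mean widths, from the text
SD = 0.2     # ASSUMED value for illustration only

for x in (49.6, 49.8, 50.0, 50.2, 50.4):
    print(f"x = {x:.1f}  height = {normal_pdf(x, MEAN, SD):.4f}")
```

The heights at 49.8 and 50.2 match, reflecting the symmetry of the curve about the mean.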

Step 3: Set up the appropriate window settings

Before pressing Graph, Jacob changes his window settings. To do so, he presses the Window button. Then, based on what he knows about the Normal Distribution, he chooses the settings shown in the figure.

Figure: Window settings for the Normal Distribution curve for the mean width data

Step 4: Press Graph and analyze as appropriate

Finally, Jacob presses Graph and sees the graph shown in part (a) of the figure. He uses the Pen function in the Draw menu (2nd then PRGM) to draw the vertical line along the mean, as shown in part (b). (Note: to clear this vertical line, go back to the Draw menu and choose 1:ClrDraw.)

Figure (a): Graph of the Normal Distribution curve for the mean width data
Figure (b): Graph of the Normal Distribution curve with a vertical line along the mean

Q18. What proportion of the area under the bell-shaped curve and above the x-axis appears to be to the left of the vertical line? What proportion appears to be to the right?

Q19. What do you think is the probability that a value of x is greater than 50 cm? What do you think is the probability that a value of x is less than 50 cm?

Q20. Based on your response to Q17, what do you think the area under this curve is? How do you know?

Now that Ray has spent some time thinking about the Normal Distribution curve, he comes back to his original problem. He is hiring a new worker, and he needs to see how the potential workers compare to his current workers. Recall that Ray collected data from five potential workers. He asked each potential worker to cut five pieces of fabric. Then, he found the mean for each potential worker. These means are listed in the table.

Potential Worker | Mean Sample Width (cm)

Table: Mean sample widths for the five potential workers

Now, Jacob instructs Ray to locate each of these means on the Normal Distribution curve for the current workers' mean widths. This will allow Ray to see how these potential workers compare to the current workers because the curve was created based on the current workers' data.

Jacob again uses his graphing calculator. While on the main screen, he chooses the Vertical option from the Draw menu. Then, he types in the mean sample width for Potential Worker 1, as shown in the figure.

Figure: Graphing a vertical line at the sample mean for Potential Worker 1

When Jacob hits Enter, a vertical line is drawn on the Normal Distribution curve (created previously), as shown in the figure.

Figure: Graph of the Normal Distribution curve with a vertical line at the sample mean for Potential Worker 1

Q21. Consider the location of this vertical line.
a. How do you think Potential Worker 1 compares to the current workers in the cutting department?
b. Would you recommend that Ray hire this worker? Why or why not?

Q22. Graph vertical lines on the curve for the remaining four potential workers. (Remember: to clear previous drawings, choose ClrDraw from the Draw menu.)

Q23. Based on these graphs, which potential worker should Ray hire? Explain your reasoning.

Ray feels confident that he knows who the best worker will be. However, he remembers from his previous analysis with Tom and Ken that he also needs to consider the spread of the data. Therefore, he calculates the standard deviation for each potential worker, shown in the table.

Potential Worker | Mean Sample Width (cm) | Standard Deviation

Table: Mean sample widths for the five potential workers

Q24. Graph the Normal Distribution curve for each of the potential workers using the normalpdf function on the graphing calculator.

Q25. Describe how these five graphs differ from one another.

Q26. Based on these graphs, which potential worker should Ray hire? Explain your reasoning.

Section 15.2: Automobile Battery Warranties

Alex Murphy is a senior at Chesterfield High School in Chesterfield Township, Michigan. In a case of what he likes to call "Murphy's Law," the battery in his recently purchased used car just died and he needs to replace it as soon as possible. He intends to keep this car through his senior year of high school as well as through his four years of college, so a reliable car battery with a high-quality warranty is essential.

After doing some research, Alex considers purchasing a car battery that costs $110 and includes a 36-month full replacement warranty with an additional 6-month pro-rated warranty. This means that if the battery fails after 36 months or less, he will get all his money back. If the battery fails between 36 and 42 months, he will only get part of his money back. If the battery fails after 42 months or more, he will not get any money back. He lets T represent the time, in months, from the purchase of the car battery until it fails. Then, he creates a table that shows the replacement values for the warranty period.

Time, in months, from the purchase of the car battery until it fails | Replacement Value (% of original purchase price) | Replacement Value (amount of money he will get back)
T ≤ 36 | 100 |
36 < T ≤ 39 | 75 |
39 < T < 42 | |
T ≥ 42 | 0 |

Table: Replacement values for battery failure within warranty period

Q1. The car battery costs $110. Complete the table to determine how much money Alex will get back if his car battery fails within each time frame.

Using a Normal Distribution to Find the Probability of Battery Failure

After doing some additional research, Alex found an article in an automotive magazine with some data about battery life for the same battery that he is considering purchasing. The magazine collected data on a sample of 50 such batteries. They analyzed their data and found that the battery life is Normally Distributed with a mean of 48 months and a standard deviation of 6 months. By assuming the battery life is Normally Distributed, Alex can make many predictions about when the battery may fail.

Q2. In your own words, interpret the statement, "the battery life is Normally Distributed with a mean of 48 months and a standard deviation of 6 months."

Alex decides to look at the graph of this Normal Distribution curve. He does this on his graphing calculator. The figure shows the calculator screens for the steps he takes. Note: these steps are explained in greater detail earlier in this chapter.

Figure: Steps to graphing a Normal Distribution on a graphing calculator (panels a, b, and c)

The most challenging step in graphing the Normal Distribution on the graphing calculator is setting up an appropriate viewing window. In order to set the Xmin and Xmax, Alex reasons that the mean should be in the middle, halfway between the Xmin and Xmax. He decides to set the x-axis from 18 months below the mean (30 months) to 18 months above the mean (66 months). He also thinks that it makes sense to scale the x-axis using the standard deviation, which is 6 months. His real challenge comes in setting up the y-axis. Fortunately, he remembers learning that the area between the graph of a Normal Distribution curve and the x-axis is always 1. Since his x-values were fairly large, he decides to use fairly small values for y. The viewing window Alex uses and his graph of this Normal Distribution curve are given in the figure.

Figure: Alex's viewing window (a) and graph (b) of the Normal Distribution curve with μ = 48 and σ = 6

Q3. What are some features that you notice about the graph of this Normal Distribution curve?

The Battery Failing Between 36 and 39 Months of Purchase

Alex now wants to find the probability that this battery will fail within the ranges given in the terms of the battery warranty (see the table of replacement values). He is not too concerned about the probability of the battery failing within the first 36 months. In that case, he would get 100% of his money back. Instead, he considers the first range in the pro-rated warranty period. If the car battery fails between 36 and 39 months of his purchase, Alex would be refunded 75% of the original purchase price.

Alex wants to know the probability that the battery will fail between 36 and 39 months of purchase. That is, he wants to find the total probability of the battery failing at 36 months, 39 months, or any time in between. The figure shows a picture of this.

Figure: A graph with the shaded region showing the probabilities of interest

This picture helps Alex visualize the probability he is looking for. He knows that the area (i.e., the total probability) under the curve is 1. He is interested in the proportion of the area that is between 36 and 39 months.

Q4. Based on the picture shown in the figure, do you think it is likely that the battery will fail between 36 and 39 months? Why or why not?

Now, Alex actually finds the probability of the battery failing between 36 and 39 months. He does so using his graphing calculator. There are two methods for finding this probability on the graphing calculator.

The first method is to use the normalcdf function on the calculator. This function, known as the Normal Cumulative Distribution Function, calculates the cumulative probability between two values. Since Alex is interested in finding the probability between 36 and 39 months, this function makes sense. The syntax for this function is:

normalcdf(lower limit, upper limit, mean, standard deviation)

Recall that for this particular car battery, Alex assumes that the mean is 48 months and the standard deviation is 6 months. The figure shows the calculator screenshot for this example. The lower limit is 36 months, and the upper limit is 39 months.

Figure: Calculating a probability using the Normal Cumulative Distribution Function

Q5. Use the normalcdf function to calculate the probability that the battery will fail between 36 and 39 months of purchase.
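The same calculation can be checked off the calculator. The sketch below mirrors the normalcdf argument order using Python's `math.erf`; the function name `normal_cdf_between` is illustrative, and the approach is the standard error-function identity for the Normal distribution, not the calculator's internal method.

```python
import math


def normal_cdf_between(lower, upper, mean, sd):
    """Area under the Normal curve between lower and upper.

    Mirrors normalcdf(lower limit, upper limit, mean, standard deviation).
    """
    def cumulative(x):
        # Probability that a Normal(mean, sd) value is at most x.
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

    return cumulative(upper) - cumulative(lower)


# Battery life: mean 48 months, standard deviation 6 months (from the text).
probability = normal_cdf_between(36, 39, 48, 6)
print(f"P(36 <= T <= 39) = {probability:.4f}")
```

Running the script gives a value to compare against the calculator's answer for Q5.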

Q6. On average, out of every 100 of these batteries that are made, how many fail between 36 and 39 months after purchase? How many out of every 1,000?

The second way to calculate this probability on the graphing calculator uses the graph of the Normal Distribution curve. This is done by first choosing the ShadeNorm function. This function is found by pressing DISTR (2nd VARS) and scrolling over to the DRAW menu, as shown in the figure.

Figure: Choosing the ShadeNorm function

Then, the following syntax is used:

ShadeNorm(lower limit, upper limit, mean, standard deviation)

This is shown in the figure.

Figure: Calculating a probability using the ShadeNorm function

Q7. Use the ShadeNorm function to calculate the probability that the battery will fail between 36 and 39 months of purchase. Describe what happens when you use this function.

Q8. Based on the probability that the battery will fail between 36 and 39 months of purchase, would you recommend Alex purchase this battery? Explain your reasoning.

The Battery Failing After 42 Months of Purchase

Next, Alex decides to calculate the probability that the battery fails beyond the warranty period. Since he would not get any money back if this happens, he is very interested in the likelihood that the battery will fail after 42 months. He wants to use the normalcdf command, but he needs to know both the lower and upper limits of that range. The lower limit is 42 months, but he does not initially know what the upper limit should be. The figure shows the area under the curve that represents the probability of interest.

Figure: The area showing the probability that the battery will fail after 42 months of purchase

Alex realizes that, in theory, the upper limit for the range of months when he would not get any money back is infinite. Since there is no infinity button on the calculator, he wonders how to tell the calculator that the upper limit is without bound. So, Alex uses a number that is as large as he can conveniently enter. In scientific notation, the largest power of ten the calculator can interpret is 10^99, which is usually written as 1E99. Alex accesses the EE command by pressing 2nd and then the comma key. The command is shown in the figure.

Figure: Alex's use of scientific notation to represent infinity

Q9. What is the probability that the car battery will fail beyond the warranty period?

Note: Alex does not need to use an absurdly large number. With any Normal Distribution, there is less than a one in ten thousand chance of observing a value that is more than four standard deviations above the mean (this idea will be discussed in more detail later in this section).

Q10. What value do you obtain when you multiply the standard deviation by 4 and add it to the mean? Input this value as the upper limit and compare your answer to that of the previous question.

Q11. Use the ShadeNorm function on the calculator to find the probability that the battery will fail beyond the warranty period. (Recall: To clear previous drawings, choose ClrDraw from the Draw menu.)
a. What was your lower limit?
b. What was your upper limit?
c. Did you get the same answer as when you used the normalcdf function?
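The note about not needing an absurdly large upper limit can be checked numerically. The sketch below assumes Python 3.8 or later and uses the standard statistics module; the function name area_between is our own illustration. It compares three ways of getting the area to the right of 42 months: an upper limit of 1E99, an upper limit four standard deviations above the mean, and the complement of the left-hand area.

```python
from statistics import NormalDist

# Battery life: mean 48 months, standard deviation 6 months (from the text).
battery = NormalDist(mu=48, sigma=6)


def area_between(lower, upper):
    """Area under the battery-life curve between lower and upper (hypothetical helper)."""
    return battery.cdf(upper) - battery.cdf(lower)


p_huge = area_between(42, 1e99)            # "infinity" entered as 1E99
p_four_sd = area_between(42, 48 + 4 * 6)   # upper limit = mean + 4 standard deviations
p_exact = 1 - battery.cdf(42)              # complement of the left-hand area

print(f"upper limit 1E99:      {p_huge:.6f}")
print(f"upper limit mean + 4s: {p_four_sd:.6f}")
print(f"1 - P(T < 42):         {p_exact:.6f}")
print(f"difference:            {p_huge - p_four_sd:.6f}")
```

The difference between the first two results is the area more than four standard deviations above the mean, which is why it is so small.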

d. Which method do you prefer (ShadeNorm or normalcdf)? Explain your reasoning.

Q12. Use either technique (ShadeNorm or normalcdf) to find the probabilities for all the ranges in the warranty period. Record your answers in the table below.

Time T, in months, from purchase        Probability of battery
until the car battery fails             failure at time T
T ≤ 36                                  P(T ≤ 36)
36 < T ≤ 39                             P(36 < T ≤ 39)
39 < T < 42                             P(39 < T < 42)
T ≥ 42                                  P(T ≥ 42)

Table: Recording the probabilities of battery failure at time T

Q13. Do your values make sense? Why or why not?

Q14. What happens when you add all the probabilities together? Why?

Q15. What is the probability that the battery will last longer than the average? That is, find P(T > 48).

Q16. Since Alex intends to keep his car for 5 years, what is the probability that the battery will fail before 60 months have passed?

Q17. Do you think he will need to buy another car battery before 5 years have passed? Explain your reasoning.

Since the average life of the car battery is 48 months and Alex intends to keep this car for 5 years, he is concerned about what may happen outside of the warranty period.

Q18. What is the probability that the battery will fail after the warranty period ends?

Q19. What is the probability that the battery will fail before the warranty period ends?

Q20. Based on these probabilities, do you think this battery is the right choice for Alex? Why or why not?

Expected Value of the Warranty

EverLast, Inc. is the company that produces the battery Alex is considering. The president of EverLast, Inc., Caroline Myers, is interested in determining how much money the company is paying, on average, for the car battery warranty. Caroline has experience with the Normal Distribution, and she has already calculated the probabilities recorded in the table above. Since she knows the probability for each range, she can calculate the expected value of the warranty.

Recall that the expected value of a random variable is its long-term average value. So, the expected value of a battery warranty is the long-term average amount of money that the company pays out per battery. It is calculated by multiplying the probability of each range by the amount of money the company would pay out for a failure in that range and then adding the products together.
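The multiply-and-add recipe for the expected value can be sketched as follows. This is an illustration only, assuming Python 3.8 or later. The payout amounts are HYPOTHETICAL placeholders chosen for this sketch; they are not the replacement values from Q1, which should be substituted before drawing any conclusion. The range boundaries (36, 39, and 42 months) and the Normal model (mean 48, standard deviation 6) come from the text.

```python
from statistics import NormalDist

# Battery life: mean 48 months, standard deviation 6 months (from the text).
battery = NormalDist(mu=48, sigma=6)

# Upper ends of the first three warranty ranges, in months (from the table).
cutoffs = [36, 39, 42]

# HYPOTHETICAL payouts in dollars for a failure in each of the four ranges:
# T <= 36, 36 < T <= 39, 39 < T < 42, T >= 42. Replace with the values from Q1.
payouts = [100.0, 50.0, 25.0, 0.0]

# Left-hand areas at each cutoff, then the probability of each range.
left = [battery.cdf(c) for c in cutoffs]
probs = [left[0], left[1] - left[0], left[2] - left[1], 1 - left[2]]

# Expected value: multiply each probability by its payout and add.
expected_payout = sum(p * pay for p, pay in zip(probs, payouts))

for p, pay in zip(probs, payouts):
    print(f"probability {p:.4f}  payout ${pay:.2f}")
print(f"total probability: {sum(probs):.4f}")
print(f"expected payout per battery (hypothetical payouts): ${expected_payout:.2f}")
```

Because the four ranges cover every possible failure time, the probabilities add to 1, which is a useful check before computing the expected value.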

Q21. What is the expected value of the car battery warranty? (Hint: use the replacement values from Q1 and the probabilities from Q12.)

Q22. From the company's perspective, what does that value mean?

Note: the expected value can also be thought of in terms of the customer. However, a customer will only purchase one car battery, in which case the notion of expected value as a long-run average does not really make sense. For example, in the long run, Alex still has only the one car battery, which either fails during the warranty period or fails outside it. On the other hand, the company that sells the battery is interested in the expected value because of the large number of car batteries it has sold.

From the company's perspective, the expected value of the warranty should be as small as possible, because the warranty requires the company to pay in the event of a battery's failure.

Q23. Suggest changes to the warranty policy that would reduce the expected value of the warranty cost to less than $10.

Q24. What happens to the expected value of the warranty cost if the company can improve the life of the battery so it lasts, on average:
a. 49 months?
b. 50 months?

The Oldest Batteries

Suppose the federal government is trying to decrease the number of older, more hazardous car batteries in the public. They have developed a new way to recycle and reuse toxic batteries. To remove some of these older batteries from the public, they now require car battery companies to offer exchange vouchers to the customers with the oldest batteries. Specifically, the federal government will offer exchange vouchers to those with batteries in the top 10% age range. They choose to look at the top 10% age range for all batteries, rather than a certain age (e.g., 10 years old), because the life of a battery depends on the quality of the battery, which varies by company and type.

Caroline Myers, the president of EverLast, Inc., needs to determine at what age of the battery's life customers would qualify for this exchange program. That is, she needs to know the age at which there is only a 10% chance that a battery is still alive. In other words, at what point in the battery's life will only 10% of the batteries still be alive?

Q25. How could the company contact those people with older batteries?

Caroline needs to know the age that a battery has only a 10% chance of surpassing. Another way of phrasing this is that she needs the age by which 90% of the batteries will have failed. That is, instead of finding the age at which only 10% of the batteries are still alive, she can find the age at which 90% of the batteries have failed. Mathematically, Caroline is looking for the value of t such that P(T > t) = 0.1 and P(T < t) = 0.9. This is shown graphically in Figure 15.2.9.

Figure 15.2.9: A graphical representation of P(T < t) = 0.9 and P(T > t) = 0.1

Caroline knows the probability of interest, but does not know the value of t that will give this probability. This is the reverse of what was done earlier. Caroline now needs to invert the Normal Distribution. When given the probability, she can use her calculator to find the corresponding value on the normal curve. To do this, she uses the distribution menu on her graphing calculator. Since she is inverting the Normal Distribution, she uses the inverse normal function. The command is option 3, invnorm(, and the parameters are the probability, the mean, and the standard deviation. The syntax is as follows:

invnorm(probability, mean, standard deviation)

Note: when using the inverse normal function, the calculator only reads probabilities (or areas) from left to right. Therefore, in this example, Caroline Myers needs to use 0.90 as the probability rather than 0.10. The figure shows the calculator screens.

Figure: Using the invnorm command

Q26. There is a 90% chance that the battery will fail before how many months?

Q27. Suppose the federal government instead offers vouchers to the 5% oldest batteries. There is a 95% chance that the battery will fail before how many months?

Q28. Suppose the federal government instead offers vouchers to the 1% oldest batteries. There is a 99% chance that the battery will fail before how many months?
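Inverting the Normal Distribution can also be sketched in Python. The example below assumes Python 3.8 or later and uses NormalDist.inv_cdf from the standard statistics module, which, like the calculator, reads areas from left to right. The helper name age_for_top_fraction is our own and is there only to make the "use 0.90 rather than 0.10" step explicit.

```python
from statistics import NormalDist

# Battery life: mean 48 months, standard deviation 6 months (from the text).
battery = NormalDist(mu=48, sigma=6)


def age_for_top_fraction(top_fraction):
    """Age t such that P(T > t) = top_fraction (hypothetical helper).

    inv_cdf expects the area to the LEFT of t, so the right-hand
    fraction is converted with the complement first.
    """
    return battery.inv_cdf(1 - top_fraction)


t_top_10 = age_for_top_fraction(0.10)
print(f"Only 10% of batteries last beyond {t_top_10:.2f} months")

# Check by going forward again: the left-hand area at t should be 0.90.
print(f"P(T < t) = {battery.cdf(t_top_10):.4f}")
```

Passing 0.10 directly to inv_cdf would instead return the age by which only 10% of the batteries have failed, which is the mistake the note above warns about.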

The Empirical Rule

Returning to the original problem, Alex is trying to determine whether he should purchase this particular car battery made by EverLast, Inc. He makes a list of everything he knows at this point:

The battery's life follows the Normal Distribution, with an average battery life of 48 months and a standard deviation of 6 months.
When given two months, he can calculate the probability that the battery will fail between them using the Normal Distribution.
When given a probability P(T < t), he can calculate the month, t, before which the battery fails with that probability, using the inverse normal function on his calculator.

The last thing Alex wants to know is between which two months the battery is most likely to fail. One of the important features of any Normal Distribution is that the probabilities of being within 1, 2, 3, or 4 standard deviations of the mean are well known. Those probabilities are given in the table below, where μ (mu) is the mean and σ (sigma) is the standard deviation.

Domain of X                 Probability (approx.)
μ - σ ≤ X ≤ μ + σ           68.26%
μ - 2σ ≤ X ≤ μ + 2σ         95.44%
μ - 3σ ≤ X ≤ μ + 3σ         99.74%
μ - 4σ ≤ X ≤ μ + 4σ         99.99%

Table: Probabilities based on standard deviations in a Normal Distribution

Thus, 0.68 is a good estimate of the probability of being within 1 standard deviation of the mean, 0.95 is a good estimate of the probability of being within 2 standard deviations of the mean, and 0.997 is a good estimate of the probability of being within 3 standard deviations. The table also shows that there is very little chance of being 4 standard deviations or more from the mean. Therefore, nearly all values lie within 3 standard deviations of the mean. This is an important fact known as the empirical rule. The figure shows a graph of the Normal Distribution including these probabilities for the car battery problem.
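The probabilities behind the empirical rule can be checked numerically. The sketch below assumes Python 3.8 or later and uses the standard Normal distribution from the statistics module; because the rule holds for any Normal Distribution, the mean and standard deviation do not matter. Values computed this way can differ in the last digit from values rounded from printed tables.

```python
from statistics import NormalDist

# Standard Normal: mean 0, standard deviation 1.
z = NormalDist()

# Probability of falling within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3, 4)}

for k, p in within.items():
    print(f"within {k} standard deviation(s): {100 * p:.2f}%")

# The same intervals for the battery (mean 48, standard deviation 6, from the text).
for k in (1, 2, 3, 4):
    print(f"k = {k}: {48 - 6 * k} to {48 + 6 * k} months")
```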

Figure: Probabilities based on standard deviations in a Normal Distribution

Q29. Interpret each of the probabilities in the table or figure in terms of the problem context.

Earlier in this section, Alex used the value 4 standard deviations above the mean as an upper limit rather than choosing an extremely large number (like 1E99). He stated that there was less than a one in ten thousand chance of observing a value that is more than four standard deviations above the mean. The empirical rule shows this in more detail because there is a 99.99% chance of a value being within 4 standard deviations of the mean.

Alex recalls that the Normal Distribution curve is symmetric about the mean. This fact, together with the probabilities in the table, can be used to determine the probability of a value, x, being between the mean and 1 standard deviation above the mean. This range is half of the interval within 1 standard deviation of the mean, so it has a probability of about 0.3413. In terms of the problem context, there is approximately a 34% chance that the battery will fail between 48 and 54 months. This is shown in the figure.

Figure: The probability that a value is between μ and μ + σ is approximately 34.13%

Q30. What is the approximate probability of a value, x, being between the mean and 2 standard deviations above the mean? Interpret this question in terms of the car battery problem context.

Q31. What is the approximate probability of x being between the mean and 2 standard deviations below the mean? Interpret this question in terms of the car battery problem context.

Q32. What is the approximate probability of x being between 1 and 2 standard deviations above the mean? Interpret this question in terms of the car battery problem context.

Q33. What is the approximate probability of x being between 1 and 2 standard deviations below the mean? Interpret this question in terms of the car battery problem context.

Q34. What is the approximate probability of x being between 2 and 3 standard deviations below the mean? Interpret this question in terms of the car battery problem context.

Alex can also use the empirical rule to calculate probabilities at the ends of the curve. For example, he can find the probability that the battery will fail before 42 months have passed. The probability that the battery fails between 42 and 54 months is 68.26%. Thus, the probability that the battery fails outside this range is 100% - 68.26% = 31.74%. However, this value includes the probability that the battery fails before 42 months have passed as well as the probability that the battery fails after 54 months. Since the normal curve is symmetric about the mean, Alex can simply divide 31.74% in half to determine these two pieces. This is shown in Figure 15.2.12.
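The tail calculation just described can be written out step by step. This is a minimal sketch, assuming Python 3.8 or later; the variable names are our own. It applies the empirical-rule value 0.6826 quoted in the text and then compares the result with the area computed directly from the Normal model (mean 48, standard deviation 6).

```python
from statistics import NormalDist

# Battery life: mean 48 months, standard deviation 6 months (from the text).
battery = NormalDist(mu=48, sigma=6)

# Empirical-rule value quoted in the text for 42 to 54 months (within 1 sd).
p_within_one_sd = 0.6826

# Everything outside 42-54 months, both tails together.
p_outside = 1 - p_within_one_sd

# By symmetry, each tail gets half of the outside area.
p_below_42_rule = p_outside / 2

# Direct calculation of the left-hand tail for comparison.
p_below_42_direct = battery.cdf(42)

print(f"outside 42-54 months:     {p_outside:.4f}")
print(f"before 42 months (rule):  {p_below_42_rule:.4f}")
print(f"before 42 months (direct): {p_below_42_direct:.4f}")
```

The two results agree to about four decimal places; the small gap comes from the rounding in the quoted 68.26%.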

Figure 15.2.12: The probability that the battery fails before 42 months is approximately 15.87%

Q35. Use the empirical rule and the fact that the area under the curve is 1 to complete the following.
a. What is the probability that the battery fails before 36 months have passed?
b. What is the probability that the battery fails after 66 months have passed?
c. What is the probability that the battery fails between 36 and 54 months?

Q36. Fill in the blank probabilities using the empirical rule.

Figure: Calculating probabilities using the empirical rule

Note: A common error in this situation is subtracting in the wrong order. There is an easy way to tell if the order is wrong: if the values are subtracted in the wrong order, the result is negative.

Q37. How can you tell that a negative number cannot be the correct answer to a question regarding probability (or area)?

Section 15.3: Rappin Skoop Dogg Seasonal Demand

The Pineapple Technology Corp. has developed a new item just in time for the holiday season. Rappin Skoop Dogg is an upright stuffed hound dog, dressed in a leather-like jacket and pants costume. It sings and dances while standing on a replica of a stage and holding a microphone. There are ten different rap songs that Rappin Skoop Dogg can sing as well as dance to. Additional songs can be downloaded from the Pineapple Technology Corp. web site. There are three settings for activating the dog. First, Rappin Skoop Dogg has a motion detector and can be set to activate when anyone approaches. Second, it can be activated by clapping your hands. Third, it can be activated by pushing a button.

Chillioz Toys, a toy store chain, is planning on placing a large order in anticipation of the holiday season. The Pineapple Technology Corp. charges $45 per item for purchases of more than 10,000 units. Chillioz Toys plans to price Rappin Skoop Dogg at $99. However, items left over after the holiday season will be steeply discounted to $30. There is an inventory carrying cost of $2 per item left over after the holiday season ends. These data are summarized in the table below.

Variable    Description                       Amount
P           Purchase Price                    $45
S           Selling Price                     $99
D           After-Christmas Selling Price     $30
H           Inventory Carry-Over Cost         $2

Table: Chillioz Toys data regarding Rappin Skoop Dogg

Therefore, the net profit on a regular pre-holiday sale is:

Net Profit = $99 - $45 = $54

The net cost at the end of the season for each leftover item is:

Net Cost = $30 - $45 - $2 = -$17

Based on past experience with similar products, executives at Chillioz Toys know there is high potential, but also a great deal of uncertainty. Due to their extensive experience, they can assume that the demand for this item is Normally Distributed with a mean of 21,000 and a standard deviation of 4,200 (one-fifth of the mean).

Q1. Sketch a Normal Distribution curve for the demand of the Rappin Skoop Dogg toy.

Roy Charles, the Vice President for marketing at Chillioz Toys, is interested in analyzing the impact of different ordering policies. He starts by wondering how much to order so that there is only a 40% chance that the demand will be less than this number. He therefore wants to determine the value of x such that:

P(X < x) = 0.40

Q2. Use the invnorm function on the graphing calculator to find the value of x such that P(X < x) = 0.40. Interpret this value in terms of the problem context.

Q3. Use the normalcdf function on the graphing calculator to verify your answer to Q2 by doing the problem in reverse. (Hint: Choose 4200 as your lower limit because it is 4 standard deviations below the mean.)
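The "find x, then verify in reverse" idea from Q2 and Q3 can be sketched in Python. This sketch assumes Python 3.8 or later and the standard statistics module; the variable names are our own. It inverts the demand distribution at 0.40 and then recomputes the area using the lower limit suggested in the hint.

```python
from statistics import NormalDist

# Demand: mean 21,000 units, standard deviation 4,200 units (from the text).
demand = NormalDist(mu=21_000, sigma=4_200)

# Inverse step: the order size x with P(X < x) = 0.40.
x_40 = demand.inv_cdf(0.40)

# Reverse step: area between the hinted lower limit and x.
lower = 21_000 - 4 * 4_200  # four standard deviations below the mean
p_check = demand.cdf(x_40) - demand.cdf(lower)

print(f"x with P(X < x) = 0.40: {x_40:.0f} units")
print(f"lower limit from the hint: {lower} units")
print(f"area between lower limit and x: {p_check:.4f}")
```

The check comes out very slightly below 0.40 because the small area more than four standard deviations below the mean is left out.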

Roy Charles is considering ordering more than the average demand. This time, he wants to find out how many to order so that there is only a 40% chance that the demand will be more than this number. That is, he is looking for the value of x such that:

P(X > x) = 0.40

This is shown in the figure.

Figure: Determining the value of x such that P(X > x) = 0.40

However, the calculator only provides the left-hand tail of the Normal Distribution. He must first apply the principle of the complement:

P(X > x) = 1 - P(X < x)

Q4. If the area to the right of x is going to be 0.4, what should the area to the left equal?

Q5. Use the inverse normal function to determine how many toys Roy Charles should order.

Roy Charles wants to look at the decision from a different point of view. He is hoping to sell as many units as possible and does not want to run out of stock.

Q6. Use the normalcdf or the ShadeNorm function on the calculator to determine the following probabilities. Interpret the probability in terms of the problem context for each case.
a. P(X > 22,000)
b. P(X < 22,000)
c. P(20,000 < X < 22,000)
d. P(16,800 < X < 25,200)

Q7. Consider the empirical rule.
a. There is approximately a 68% chance that the demand will be between what two values? (Assume the interval is symmetric about the mean.)
b. There is approximately a 95% chance that the demand will be between what two values? (Assume the interval is symmetric about the mean.)
c. There is approximately a 99.7% chance that the demand will be between what two values? (Assume the interval is symmetric about the mean.)
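The complement step used for a right-hand tail, P(X > x) = 0.40, can be sketched as follows. This is a minimal illustration assuming Python 3.8 or later and the standard statistics module; the variable names are our own. As on the calculator, inv_cdf works with the area to the left, so the right-hand area is converted first.

```python
from statistics import NormalDist

# Demand: mean 21,000 units, standard deviation 4,200 units (from the text).
demand = NormalDist(mu=21_000, sigma=4_200)

right_tail = 0.40            # desired P(X > x)
left_area = 1 - right_tail   # principle of the complement: P(X < x)

# Inverse normal on the left-hand area gives the order size.
order_qty = demand.inv_cdf(left_area)

print(f"area to the left of x: {left_area:.2f}")
print(f"order size x: {order_qty:.0f} units")

# Check: the area to the right of x should be 0.40 again.
print(f"P(X > x) = {1 - demand.cdf(order_qty):.4f}")
```

Because the left-hand area is greater than 0.5, the resulting order size lies above the mean demand, as expected when ordering more than the average.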
