Stat 226 Introduction to Business Statistics I
Spring 2009
Professor: Dr. Petrutza Caragea
Section A, Tuesdays and Thursdays 9:30-10:50 a.m.

Chapter 1, Section 1.3: The Normal Distribution

Density Curves

So far we have:
- graphically displayed data: histogram, stemplot, boxplot
- described the overall pattern and identified deviations and outliers
- numerically quantified center and spread of the distribution

If the distribution (as displayed by the histogram) appears sufficiently regular, we can approximate it with a smooth curve, a so-called density curve. The density curve is a simplified and idealized version of reality, but can still be useful!

Example: [figure]

Stat 226 (Spring 2009, Section A) Introduction to Business Statistics I, Section 1.3, 1 / 38

The Density Curve: Properties

- A density curve is a curve that is always on or above the horizontal axis and has an area of exactly 1 underneath it.
- A density curve describes the overall pattern of a distribution.
- The area under the curve and above any range of values is the proportion of all observations that fall in that range.

Gas mileage example from textbook: [figure]

Examples: [figures]
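The two density-curve properties above (total area 1, and area over a range equals a proportion) can be checked numerically. The sketch below uses a hypothetical triangular density on [0, 2]; this curve and the helper names `density` and `area` are illustrative assumptions, not taken from the slides or the textbook.

```python
def density(x):
    """Hypothetical triangular density on [0, 2] (an assumption for illustration)."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


def area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with a midpoint Riemann sum."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


total = area(density, 0.0, 2.0)    # area under the whole curve
middle = area(density, 0.5, 1.5)   # proportion of observations between 0.5 and 1.5

print(round(total, 4), round(middle, 4))
```

For this curve the total area comes out at about 1, and about 75% of the observations fall between 0.5 and 1.5.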
Median and Mean of a Density Curve

- Median: the equal-areas point, with 50% of the mass on either side.
- Mean: the balancing point of the curve, if it were a solid mass.

[Figures: density curve examples]

Introduction to Normal Distributions

- The Normal (or Gaussian) distribution is the single most important distribution in Statistics.
- Many variables can be modeled (described) using the Normal distribution, e.g.
    - height of humans
    - SAT scores
    - length of human pregnancies, etc.
- It is characterized by the following two parameters: the mean µ and the standard deviation σ.

Normal Distribution (by Carl Friedrich Gauss (1777-1855))

The overall shape: symmetric, unimodal, and bell-shaped.
[Figure: pictures of various normal distributions]

Notation: to denote the normal distribution we use N(µ, σ).

Example: ____ denotes a normal distribution with mean ____ and standard deviation ____, while ____ denotes a normal distribution with mean ____ and standard deviation ____.

To denote that a variable X (e.g. heights, SAT scores, etc.) follows a normal distribution we write X ~ N(µ, σ).

The 68-95-99.7 Rule

The rule holds for all normal distributions (i.e. for any choice of µ and σ).

68-95-99.7 Rule: for a variable that follows a normal distribution N(µ, σ), we have that
- approx. 68% of the data fall within 1 standard deviation of the mean, i.e. within µ ± σ
- approx. 95% of all the data fall within 2 standard deviations of the mean, i.e. within µ ± 2σ
- approx. 99.7% of all the data fall within 3 standard deviations of the mean, i.e. within µ ± 3σ

[Figure: normal curve with the horizontal axis marked at µ - 3σ, µ - 2σ, µ - σ, µ, µ + σ, µ + 2σ, µ + 3σ; the areas from left to right are 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, 0.15%, with brackets for 68%, 95% and 99.7%]
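The 68-95-99.7 percentages can be checked with a short computation. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `proportion_within` is an assumption made for illustration.

```python
from statistics import NormalDist


def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a N(mu, sigma) distribution within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    # The result does not depend on the choice of mu and sigma.
    print(k, round(proportion_within(k), 4), round(proportion_within(k, 70, 3), 4))
```

The computed proportions are about 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.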
Example: The length of human pregnancies follows a normal distribution with mean µ = 266 days and a standard deviation of σ = 16 days.
1. How long do the middle 95% of all pregnancies last?
2. How long do the shortest 16% of all pregnancies last (at most)?
3. How long do the longest 0.15% of all pregnancies last (at least)?

The Standard Normal Distribution

- is a special normal distribution.
- has a mean µ = 0 and a standard deviation σ = 1.
- is denoted by N(0, 1).
- Nearly all the area is between -3 and 3.

[Figure: Standard Normal Distribution, density plotted against z from -3 to 3]

Knowing the mean and the standard deviation of a normal distribution allows us to determine
1. What proportion of individuals fall in a specified range.
2. What percentile a given individual falls at if you know their data value.
3. What data value corresponds to a given percentile.

For the standard normal distribution, the proportion of observations falling into a specified range is tabulated. This is the only normal distribution for which we have tabulated values.

We therefore need to convert any given normal distribution to a standard normal distribution, i.e. the values from any N(µ, σ) are transformed to the corresponding values from a N(0, 1). This is called standardizing.
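The three pregnancy questions earlier in this section can be answered directly from the 68-95-99.7 rule. The sketch below assumes the parameters are µ = 266 and σ = 16 days and that the questions ask about the middle 95%, the shortest 16% and the longest 0.15%; the variable names are illustrative.

```python
mu, sigma = 266, 16  # assumed mean and standard deviation, in days

# Middle 95%: within 2 standard deviations of the mean.
middle_95 = (mu - 2 * sigma, mu + 2 * sigma)

# Shortest 16%: 68% lie within 1 sd, so 32% lie outside, half of that below mu - sigma.
shortest_16_max = mu - sigma

# Longest 0.15%: 99.7% lie within 3 sd, so 0.3% lie outside, half of that above mu + 3 sigma.
longest_015_min = mu + 3 * sigma

print(middle_95, shortest_16_max, longest_015_min)
```

Under these assumptions the middle 95% of pregnancies last between 234 and 298 days, the shortest 16% last at most 250 days, and the longest 0.15% last at least 314 days.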
Standardizing, z-score

If x is an observation from a normal distribution that has mean µ and standard deviation σ, the standardized value of x is given by

    z = (x - µ) / σ

A standardized value is often called a z-score.

Example: (length of human pregnancies continued)

A z-score tells us how many standard deviations the original observation is away from the mean, and in which direction. Observations larger than the mean have a positive z-score when standardized, and observations smaller than the mean have a negative z-score.

Finding z-scores and corresponding proportions/areas under the normal curve

Why are z-scores helpful?
- IQs follow a normal distribution with mean µ = 100 and standard deviation σ = 16
- heights of males follow approx. a normal distribution with mean µ = 70 inches and σ = 3 inches

Who is more unusual? A man being 73 inches tall or a man having an IQ of ____?

Once we know the z-score of an observation, we can look up the proportion (percentage) of men in that population having a height of 73 inches or more.
- need to know how to read Table A (Table of the Standard Normal Distribution)
- Table A is in your textbook

Note: in the following, the terms proportion, probability, percentage, and area are all interchangeable, i.e. proportion = probability = percentage = area.
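Standardizing puts observations from different normal distributions on a common scale, which is what makes the "who is more unusual?" comparison possible. The sketch below assumes IQ ~ N(100, 16) and heights ~ N(70, 3); the IQ value of 132 is a hypothetical stand-in, since the value used on the slide is not available, and `z_score` is an illustrative helper name.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


z_height = z_score(73, 70, 3)   # a man who is 73 inches tall
z_iq = z_score(132, 100, 16)    # hypothetical IQ value, chosen only for illustration

# The observation with the larger absolute z-score is the more unusual one.
more_unusual = "IQ" if abs(z_iq) > abs(z_height) else "height"
print(z_height, z_iq, more_unusual)
```

With these numbers the height is 1 standard deviation above its mean and the hypothetical IQ is 2 standard deviations above its mean, so the IQ would be the more unusual observation.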
To find the proportion (corresponding to the area under the normal curve) of observations that fall into a given range, e.g. between -z and z:

The first column of Table A gives the z-score values correct to one decimal place, and the first row gives the second decimal place for a z-score. For example, if we want to find the area below z = ____, we find ____ in the first column, then look for ____ along the first row. The cell where the corresponding row and column intersect gives the value ____.

Using Table A to find proportions under the normal curve

Consider the following situations:
- What proportion of observations is below z = .67, i.e. what is the probability of observing a z-score of .67 or less?
- What proportion of observations is greater than z = .67?
- What proportion is less than z = .00 and greater than z = .00?
- What proportion is below z = .67?
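Table A lists the area to the left of each z-score, and the other kinds of area follow from it by subtraction. Here is a minimal sketch that uses `statistics.NormalDist` in place of the printed table; the z-values 1.67 and ±1.00 are illustrative assumptions, and the helper names are not from the slides.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # N(0, 1), playing the role of Table A


def area_below(z):
    """Proportion of observations below z (what Table A tabulates)."""
    return STANDARD_NORMAL.cdf(z)


def area_above(z):
    """Proportion above z: the total area 1 minus the area below."""
    return 1.0 - area_below(z)


def area_between(z_low, z_high):
    """Proportion between two z-scores: difference of the two areas below."""
    return area_below(z_high) - area_below(z_low)


print(round(area_below(1.67), 4))           # illustrative z-value
print(round(area_above(1.67), 4))
print(round(area_below(-1.67), 4))          # equals area_above(1.67) by symmetry
print(round(area_between(-1.0, 1.0), 4))
```

By symmetry, the area below -z always equals the area above z, so one table of left-hand areas is enough for every case.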
- What is the area between z = ____ and z = ____?
- What z-score does the 30th percentile correspond to?
- What proportion is between z = 0.96 and z = .33?
- What z-scores bound the middle 60%?

Applications of the Normal Distribution

1. State the problem, i.e. state the mean µ, the standard deviation σ and the value of the observation x.
2. Standardize x, i.e. find the corresponding z-score using z = (x - µ) / σ.
3. Draw a picture, i.e. locate the z-score under the normal curve and shade the area of interest.
4. Use Table A to find the shaded area.

Applications of the Normal Distribution

Example: male heights ~ N(70, 3)
1. What proportion of men is shorter than 7 inches?
2. What proportion of men is taller than 65 inches?
3. What proportion of men is taller than 73 inches?
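The four steps can be traced in code for the male heights example, N(70, 3). In this sketch `statistics.NormalDist` stands in for Table A, the drawing step is left to the reader, and the function names are illustrative assumptions.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # stands in for Table A


def proportion_below(x, mu, sigma):
    """State mu, sigma and x; standardize; return the area to the left of z."""
    z = (x - mu) / sigma
    return STANDARD_NORMAL.cdf(z)


def proportion_above(x, mu, sigma):
    """Same steps, but the shaded area lies to the right of z."""
    return 1.0 - proportion_below(x, mu, sigma)


taller_than_65 = proportion_above(65, 70, 3)   # z = (65 - 70) / 3, about -1.67
taller_than_73 = proportion_above(73, 70, 3)   # z = (73 - 70) / 3 = 1.00

print(round(taller_than_65, 4), round(taller_than_73, 4))
```

About 95% of men are taller than 65 inches and about 16% are taller than 73 inches. Table A gives slightly different last digits for the first answer because z is rounded to two decimal places before the lookup.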
What proportion of men has an IQ of ____ or more? (IQ ~ N(100, 16))

Backwards Calculations

We can also work backwards: given a certain percentile (or proportion), what is the corresponding value of x?

Example: Heights ~ N(70, 3)
- What value does the 50th percentile of men's height correspond to?
- What value does the 0th percentile of men's height correspond to?

In general, to do backward calculations use the following formula:

    x = z σ + µ

- What value does the 85th percentile correspond to?
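A backward calculation first finds the z-score for the given percentile and then applies x = z σ + µ. The sketch below uses `NormalDist().inv_cdf` in place of a reverse lookup in Table A; the function name `value_at_percentile` is an assumption made for illustration.

```python
from statistics import NormalDist


def value_at_percentile(p, mu, sigma):
    """Return x such that a proportion p of a N(mu, sigma) distribution lies below x."""
    z = NormalDist().inv_cdf(p)   # z-score with area p to its left
    return z * sigma + mu         # un-standardize: x = z * sigma + mu


median_height = value_at_percentile(0.50, 70, 3)
height_85th = value_at_percentile(0.85, 70, 3)

print(round(median_height, 2), round(height_85th, 2))
```

For heights ~ N(70, 3), the 50th percentile is the mean, 70 inches (z = 0), and the 85th percentile is roughly 73.1 inches (z is about 1.04).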
Assessing Normality of Data

- Based on experience and/or past data, the assumption of normality might be justified.
- In general, though, it is quite risky to assume normality without looking at the data and verifying normality.
- Normally distributed data allow the application of further statistical procedures, which enable us to learn more about the data and to derive additional information about the variable we are interested in. (We will learn about such procedures in Chapters 6&7.)
- If data are not normally distributed and we still apply statistical procedures that require the assumption of normality, the derived information can be wrong and misleading.

How to assess Normality

- Histogram, stemplot or boxplot: these reveal non-normal features, such as
    - skewness
    - multiple modes
    - outliers
- If the above graphical displays appear somewhat normal, i.e. they indicate a symmetric, unimodal, bell-shaped distribution, we can use a so-called normal quantile plot.
- Normal quantile plots are a more sensitive tool that allows us to take a closer look and judge the adequacy of normality.

Normal quantile plots:
- hard to construct by hand (use JMP)
- for the main idea see pages 67 & 68 of the textbook
- If the distribution is close to a normal distribution, the plotted points in a normal quantile plot will lie close to a straight line.

Some Caution:
- Real data almost always show some departure from normality (i.e. from a perfect normal distribution).
- It is important to restrict the examination of a normal quantile plot to searching for clear departures from normality.
- We can ignore minor wiggles in the plot. Most common methods will work well as long as the data are reasonably close to a normal distribution with no extreme outliers.

[Figure: normal quantile plots of observations from a standard normal distribution for various sample sizes, e.g. n = 50]
[Figure: normal quantile plots for small sample sizes]

[Figure: normal quantile plots of observations from a skewed right and a triangular distribution]
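The points of a normal quantile plot can be computed by pairing each sorted observation with the standard normal quantile for its rank. This is only a sketch of the idea: the plotting position (i - 0.5) / n is one common convention and an assumption here (JMP may use a different one), and the sample values are made up for illustration.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (normal quantile, observation) pairs for a normal quantile plot.

    The i-th smallest of n observations is paired with the standard normal
    quantile at (i - 0.5) / n, an assumed plotting-position convention.
    """
    ordered = sorted(data)
    n = len(ordered)
    standard_normal = NormalDist()
    return [
        (standard_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


# Made-up heights in inches, for illustration only.
sample = [70.1, 66.5, 73.9, 68.0, 61.2, 72.3, 69.4, 77.6, 70.8]
points = normal_quantile_points(sample)

for quantile, observation in points:
    print(round(quantile, 2), observation)
```

Plotting the observations against the quantiles gives the normal quantile plot. If the data are close to normal the points fall near a straight line, while skewness or outliers show up as clear bends or stray points.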