Quantitative Methods 2013
Continuous Distributions

The most important probability distribution in statistics is the normal distribution.
[Portrait: Carl Friedrich Gauss (1777-1855)]

A normal distribution is a continuous probability distribution for a random variable, x. The graph of a normal distribution is called the normal curve.
[Figure: normal curve]
Normal distributions have many convenient properties, so random values with unknown distributions are often assumed to be normal. Many common quantities, such as test scores and heights, follow roughly normal distributions, with few members at the high and low ends and many in the middle.

Properties of a Normal Distribution
1. The mean and median are equal.
2. The normal curve is bell shaped and symmetric about the mean.
[Figure: normal curve, with mean = median at the center]
Properties of a Normal Distribution
3. The total area under the curve is equal to one.
4. The normal curve approaches, but never touches, the x-axis as it extends farther and farther away from the mean.
[Figure: normal curve with total area = 1; x-axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]

Properties of a Normal Distribution
If x is a continuous random variable having a normal distribution with mean μ and standard deviation σ, you can graph a normal curve with the equation

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$$

e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
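The normal-curve equation and the "total area = 1" property can be checked numerically. The following is a minimal sketch in Python, not part of the original slides: the function names, the integration range of ±8σ, and the parameters μ = 10, σ = 5 are illustrative assumptions.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height y of the normal curve at x, for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under_curve(mu: float, sigma: float, width: float = 8.0, steps: int = 20_000) -> float:
    """Midpoint-rule area between mu - width*sigma and mu + width*sigma."""
    lo = mu - width * sigma
    dx = 2.0 * width * sigma / steps
    return sum(normal_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx


# Illustrative parameters (an assumption, chosen only for the demonstration).
mu, sigma = 10.0, 5.0
total_area = area_under_curve(mu, sigma)
left_height = normal_pdf(mu - 3.0, mu, sigma)
right_height = normal_pdf(mu + 3.0, mu, sigma)

print(f"total area = {total_area:.6f}")
print(f"heights at mu-3 and mu+3: {left_height:.6f}, {right_height:.6f}")
```

The two printed heights agree because the curve is symmetric about the mean, and the area stays close to one for any choice of μ and positive σ.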
Means and Standard Deviations
A normal distribution can have any mean and any positive standard deviation.
[Figure: normal curve over x = 1 to 6. Mean: μ = 3.5; standard deviation: σ ≈ 1.3]
The mean gives the location of the line of symmetry.
[Figure: normal curve over x = 1 to 11. Mean: μ = 6; standard deviation: σ ≈ 1.9]
The standard deviation describes the spread of the data.

Means and Standard Deviations
Example:
1. Which curve has the greater mean?
2. Which curve has the greater standard deviation?
[Figure: normal curves A and B over x = 1 to 13]
The line of symmetry of curve A occurs at x = 5. The line of symmetry of curve B occurs at x = 9. Curve B has the greater mean.
[Figure: normal curves A and B over x = 1 to 13]
Curve B is more spread out than curve A, so curve B has the greater standard deviation.

Economic risk = the risk of losing money
One possible risk measure is the standard deviation of potential results.
[Figure: two normal curves, one with μ1 = 2500, σ1 = 400 and one with μ2 = 2500, σ2 = 1000]
Assume a normal distribution of the repair price.*
[Figure: normal densities 1 and 2 of the repair price, with right-tail areas labeled 11% and 31%]
=1-NORMDIST(3000,mean,stdev,1)
* We will discuss later whether this assumption is sensible.

As with other continuous random variables, probability calculations with any normal distribution are made by computing areas under the graph of the probability density function. Thus, to find the probability that a normal random variable is within any specific interval, we must compute the area under the normal curve over that interval.
[Figure: standard normal curve with the area between z = −0.75 and z = 1.23 shaded]
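The two tail percentages on the repair-price slide can be reproduced outside Excel. This is a minimal sketch using Python's standard library. It assumes the two curves are the ones from the economic-risk slide (mean 2500 with standard deviations 400 and 1000) and that the threshold is the 3000 appearing in the NORMDIST formula; the variable names are made up for illustration.

```python
from statistics import NormalDist

# Assumed inputs, read off the slides: same mean, different spread.
threshold = 3000
option_1 = NormalDist(mu=2500, sigma=400)
option_2 = NormalDist(mu=2500, sigma=1000)

# Right-tail area, the counterpart of =1-NORMDIST(3000,mean,stdev,1).
p_exceed_1 = 1 - option_1.cdf(threshold)
p_exceed_2 = 1 - option_2.cdf(threshold)

print(f"P(price > {threshold}) with sigma = 400:  {p_exceed_1:.1%}")
print(f"P(price > {threshold}) with sigma = 1000: {p_exceed_2:.1%}")
```

Under these assumptions the two probabilities round to 11% and 31%, which illustrates why the standard deviation can serve as a risk measure: the means are equal, but the wider distribution puts far more probability above 3000.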
The Standard Normal Distribution
The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. The horizontal scale corresponds to z-scores.
[Figure: standard normal curve; z-axis from −3 to 3]
Any normal distribution (with any combination of mean and standard deviation) can be transformed into the standard normal distribution (Z).

The Standard Normal Distribution
Transformation of X units into Z units
Any value can be transformed into a z-score by using the formula

$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$
Example: If X is normally distributed with a mean of 100 and a standard deviation of 50, the Z value for X = 200 is

$$Z = \frac{X - \mu}{\sigma} = \frac{200 - 100}{50} = 2.0$$

This says that X = 200 is two standard deviations (2 increments of 50 units) above the mean of 100.

If each data value of a normally distributed random variable X is transformed into a z-score, the result will be the standard normal distribution.
[Figure: standard normal curve; z-axis from −3 to 3]
The area that falls in an interval under the nonstandard normal curve (the x-values) is the same as the area under the standard normal curve (within the corresponding z-boundaries).
Probability and Normal Distributions
Normal distribution: μ = 10, σ = 5, P(x < 15)
Standard normal distribution: μ = 0, σ = 1, P(z < 1)
[Figure: the area to the left of x = 15 under the normal curve is the same as the area to the left of z = 1 under the standard normal curve]
P(x < 15) = P(z < 1) = shaded area under the curve = 0.8413
=NORMDIST(15,10,5,1)
=NORMSDIST(1)

Excel's NORMSDIST(z) function provides cumulative probabilities for a standard normal distribution.
Excel's NORMDIST(x,μ,σ,cumulative) function provides cumulative probabilities for a (general) normal distribution. For the fourth argument, TRUE is specified if a cumulative probability is desired.
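The "same area" statement can be verified directly. The sketch below is a Python counterpart to the two Excel calls, assuming only the standard-library class `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, x = 10, 5, 15

# Transform the x-value into a z-score.
z = (x - mu) / sigma

# Counterpart of =NORMDIST(15,10,5,1): area to the left of x on the original scale.
p_general = NormalDist(mu, sigma).cdf(x)

# Counterpart of =NORMSDIST(1): area to the left of z on the standard scale.
p_standard = NormalDist().cdf(z)

print(f"z = {z}")
print(f"P(x < 15) = {p_general:.4f}, P(z < 1) = {p_standard:.4f}")
```

Both calls give 0.8413 to four decimals, so a probability can be computed on either scale.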
Guidelines for Finding Areas
Finding Areas Under the Standard Normal Curve
1. Sketch the standard normal curve and shade the appropriate area under the curve.
2. Find the area by following the directions for each case shown.
a. To find the area to the left of z, find the area that corresponds to z.
   The area to the left of z = 1.23 is 0.8907.
   =NORMSDIST(1.23)
   [Figure: standard normal curve with the area to the left of z = 1.23 shaded]
b. To find the area to the right of z, find the area that corresponds to z. Then subtract the area from 1.
   1. The area to the left of z = 1.23 is 0.8907.
   2. Subtract to find the area to the right of z = 1.23: 1 − 0.8907 = 0.1093.
   [Figure: standard normal curve with the area to the right of z = 1.23 shaded]
c. To find the area between two z-scores, find the area corresponding to each z-score. Then subtract the smaller area from the larger area.
   1. The area to the left of z = 1.23 is 0.8907.
   2. The area to the left of z = −0.75 is 0.2266.
   3. Subtract to find the area of the region between the two z-scores: 0.8907 − 0.2266 = 0.6641.
   [Figure: standard normal curve with the area between z = −0.75 and z = 1.23 shaded]

Example: Find the area under the standard normal curve to the left of z = −2.33.
[Figure: standard normal curve with the area to the left of z = −2.33 shaded]
The area to the left of z = −2.33 is 0.0099.
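Example: Find the area under the standard normal curve to the right of z = 0.94.
The area to the left of z = 0.94 is 0.8264, so the area to the right is 1 − 0.8264 = 0.1736.
[Figure: standard normal curve with the area to the right of z = 0.94 shaded]

Example: Find the area under the standard normal curve between z = −1.98 and z = 1.07.
The area to the left of z = 1.07 is 0.8577 and the area to the left of z = −1.98 is 0.0239, so the area between them is 0.8577 − 0.0239 = 0.8338.
[Figure: standard normal curve with the area between z = −1.98 and z = 1.07 shaded]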
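The three cases (left of z, right of z, between two z-scores) translate into three small helpers. This is a sketch, with function names invented for illustration, built on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def area_left(z: float) -> float:
    """Case a: cumulative area to the left of z."""
    return STANDARD_NORMAL.cdf(z)


def area_right(z: float) -> float:
    """Case b: one minus the area to the left of z."""
    return 1 - STANDARD_NORMAL.cdf(z)


def area_between(z_low: float, z_high: float) -> float:
    """Case c: larger cumulative area minus the smaller one."""
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)


print(f"left of 1.23:           {area_left(1.23):.4f}")
print(f"right of 1.23:          {area_right(1.23):.4f}")
print(f"between -0.75 and 1.23: {area_between(-0.75, 1.23):.4f}")
print(f"right of 0.94:          {area_right(0.94):.4f}")
print(f"between -1.98 and 1.07: {area_between(-1.98, 1.07):.4f}")
```

Because the helpers subtract unrounded areas, a result can differ in the fourth decimal from a table-based subtraction (case c gives 0.6641 when the two table values are rounded first).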
If a random variable, X, is normally distributed, you can find the probability that X will fall in a given interval by calculating the area under the normal curve for that interval.
[Figure: normal curve with μ = 10, σ = 5; the area P(x < 15) is shaded]

Probability and Normal Distributions
Example: The average on a statistics test was 78 with a standard deviation of 8. If the test scores are normally distributed, find the probability that a student receives a test score less than 90.
μ = 78, σ = 8
z = (90 − 78)/8 = 1.5
P(x < 90) = P(z < 1.5) = 0.9332
[Figure: normal curve with the area to the left of x = 90 (z = 1.5) shaded]
Example: The average on a statistics test was 78 with a standard deviation of 8. If the test scores are normally distributed, find the probability that a student receives a test score greater than 85.
μ = 78, σ = 8
z = (85 − 78)/8 = 0.875 ≈ 0.88
P(x > 85) = P(z > 0.88) = 1 − 0.8106 = 0.1894
[Figure: normal curve with the area to the right of x = 85 (z = 0.88) shaded]

Example: The average on a statistics test was 78 with a standard deviation of 8. If the test scores are normally distributed, find the probability that a student receives a test score between 60 and 80.
μ = 78, σ = 8
z = (60 − 78)/8 = −2.25 and z = (80 − 78)/8 = 0.25
P(60 < x < 80) = P(−2.25 < z < 0.25) = 0.5987 − 0.0122 = 0.5865
[Figure: normal curve with the area between x = 60 and x = 80 shaded]
The probability that a student receives a test score between 60 and 80 is 0.5865.
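The three test-score questions can be worked without converting to z-scores by hand. The following sketch uses the mean 78 and standard deviation 8 from the examples; the variable names are illustrative, and the library works with the unrounded z-score, so the "greater than 85" result differs slightly from a calculation that rounds z to 0.88 first.

```python
from statistics import NormalDist

scores = NormalDist(mu=78, sigma=8)

# P(x < 90): area to the left of 90.
p_below_90 = scores.cdf(90)

# P(x > 85): one minus the area to the left of 85 (z = 0.875 before rounding).
p_above_85 = 1 - scores.cdf(85)

# P(60 < x < 80): difference of the two cumulative areas.
p_60_to_80 = scores.cdf(80) - scores.cdf(60)

print(f"P(x < 90)      = {p_below_90:.4f}")
print(f"P(x > 85)      = {p_above_85:.4f}")
print(f"P(60 < x < 80) = {p_60_to_80:.4f}")
```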
Question: Which of the following statements about a normal distribution is least accurate?
A) The mean, median, and mode are equal.
B) A normal distribution is positively skewed.
C) The mean and variance completely define a normal distribution.
D) Approximately 68% of the observations lie within ±1 standard deviation of the mean.

Question: Assume that the annual earnings per share (EPS) for a large sample of firms are normally distributed with a mean of $6.00 and a standard deviation of $2.00. What is the approximate probability of an observed EPS value falling between $3.50 and $9.34?
[Figure: normal curve with μ = 6, σ = 2; x-axis marked at 3.5, 6, and 9.34]

Example: The probability that a normally distributed random variable will be more than two standard deviations above its mean is:
A. 0.0217.
B. 0.0228.
C. 0.4772.
D. 0.9772.
Question: A client will move his investment account unless the portfolio manager earns at least a 10% rate of return on his account. The rate of return for the portfolio that the portfolio manager has chosen has a normal probability distribution with an expected return of 19% and a standard deviation of 4.5%. What is the probability that the portfolio manager will keep this account?
A) 0.750
B) 0.950
C) 0.977
D) 0.905
Question: Assume X is a standard normal random variable. Given the probabilities P(X < −0.5) = 0.3085, P(X < 0.75) = 0.7734, and P(X < 1.50) = 0.9332, the probability of 0.2266 corresponds to:
A) P(X < 0.25).
B) P(X < 0.25).
C) P(X < 0.50).
D) P(X < −0.75).

The area of a normal distribution is 1, with two symmetric halves that equal 0.5 each. P(X < 0.75) means that the area to the left of 0.75 on the positive portion of the curve is 0.7734. This means that the area to the right of 0.75 is (1.0 − 0.7734) = 0.2266. Since the halves of a normal distribution are symmetrical, the area to the left of (−0.75) is also 0.2266.
Finding percentiles of the Normal probability distribution
=NORMINV(probability, mean, standard deviation) returns the value of a normal variable at the specified cumulative probability, that is, the inverse of the cumulative distribution. Thus, if you want the value of a standard normal (mean = 0, standard deviation = 1) variable at its 75th percentile, use =NORMINV(.75, 0, 1).
[Figure: standard normal curve with area = 0.75 to the left of z = 0.67]
The z-score is 0.67.

Example: Consider the final exam results in quantitative methods, which follow a normal distribution with a mean of 55 and a standard deviation of 10. If the results follow this distribution in the next examination, what mark would you expect to divide the class into two groups, one consisting of 95% of the students and the other of the 5% of students with the highest marks?
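Percentiles are the inverse of the cumulative calculation. The sketch below mirrors =NORMINV using `NormalDist.inv_cdf` from the Python standard library. It covers the 75th-percentile case and the exam question, reading the dividing mark as the 95th percentile of a normal distribution with mean 55 and standard deviation 10; the variable names are illustrative.

```python
from statistics import NormalDist

# Counterpart of =NORMINV(.75, 0, 1): z-score with 75% of the area to its left.
z_75 = NormalDist().inv_cdf(0.75)

# Exam example: the mark below which 95% of students fall.
exam = NormalDist(mu=55, sigma=10)
cutoff_mark = exam.inv_cdf(0.95)

# The same cutoff via the standard normal and x = mu + z * sigma.
z_95 = NormalDist().inv_cdf(0.95)
cutoff_from_z = 55 + z_95 * 10

print(f"75th percentile z-score: {z_75:.2f}")
print(f"mark separating the top 5%: {cutoff_mark:.2f}")
```

The two routes to the cutoff agree, which shows that a percentile of any normal distribution is the matching standard normal percentile rescaled by σ and shifted by μ.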
Question: A normal distribution can be completely described by its:
A) skewness.
B) mean and variance.
C) mean and mode.
D) standard deviation.

Question: Let Z be a standard normal random variable. An event X is defined to happen if either Z takes a value between −1 and 1 or Z takes any value greater than 1.5. What is the probability of event X happening?
A. 0.083
B. 0.2166
C. 0.6826
D. 0.7494
Examples of other Continuous Probability Distributions

Exponential
A random variable X is said to have an exponential distribution with parameter λ > 0 if its probability density function is given by

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

Exponential random variables are often used to model arrival times, waiting times, and equipment failure times.
EXPONDIST(x,lambda,cumulative) returns the exponential distribution.
Moments of the Exponential distribution:

$$E[X] = \frac{1}{\lambda}, \qquad \mathrm{Var}[X] = \frac{1}{\lambda^2}$$

[Figure: density of Expon(10). Minimum 0.00, Maximum +∞, Mean 10.00, Std Dev 10.00; 5% of the probability lies below 0.51, 90% between 0.51 and 29.96, and 5% above 29.96]

Example: The time between machine failures at an industrial plant has an exponential distribution with an average of 2 days between failures. Suppose a failure has just occurred at the plant. Find the probability that the next failure won't happen in the next 5 days.
Solution: Let X denote the time between failures. The mean time between failures is 2 days, so λ = 1/2 = 0.5. Now,
P(X > 5) = 1 − P(X < 5) = 1 − EXPONDIST(5,0.5,1) = 0.082085

The most interesting property of the exponential distribution is known as the memoryless property:

$$P(X > s + t \mid X > s) = P(X > t)$$

That is, the probability that we have to wait for an additional time t (and therefore a total time of s + t), given that we have already waited for time s, is the same as the probability at the start that we would have had to wait for time t. So the exponential distribution forgets that it is larger than s.
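Both the machine-failure probability and the memoryless property follow from the survival function P(X > x) = e^(−λx). This is a minimal sketch: the function name `expon_survival` and the waiting times s = 3 and t = 5 are assumptions chosen for illustration.

```python
import math


def expon_survival(x: float, lam: float) -> float:
    """P(X > x) for an exponential variable with rate lam; equals 1 - EXPONDIST(x, lam, 1)."""
    return math.exp(-lam * x) if x >= 0 else 1.0


# Machine-failure example: mean of 2 days between failures, so lam = 1/2.
lam = 0.5
p_no_failure_5_days = expon_survival(5, lam)

# Memoryless check with hypothetical times s (already waited) and t (additional wait).
s, t = 3.0, 5.0
p_conditional = expon_survival(s + t, lam) / expon_survival(s, lam)  # P(X > s+t | X > s)
p_fresh_start = expon_survival(t, lam)                               # P(X > t)

print(f"P(X > 5) = {p_no_failure_5_days:.6f}")
print(f"P(X > s+t | X > s) = {p_conditional:.6f}, P(X > t) = {p_fresh_start:.6f}")
```

The conditional probability equals the fresh-start probability for any s, because e^(−λ(s+t)) / e^(−λs) = e^(−λt).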
Example: The distance between major cracks in a highway follows an exponential distribution with a mean of 5 km.
(a) What is the probability that there are no major cracks in a 20 km stretch of the highway?
(b) What is the probability that the first major crack occurs between 15 and 20 km from the start of inspection?
(c) Given that there are no cracks in the first 5 km inspected, what is the probability that there are no major cracks in the next 15 km?

Solution: Let X denote the distance between major cracks. Then X is an exponential random variable with λ = 1/5 = 0.2.
a) P(X > 20) = 1 − EXPONDIST(20,0.2,1) = 0.018315639
b) P(15 < X < 20) = EXPONDIST(20,0.2,1) − EXPONDIST(15,0.2,1) = 0.031471429
c) By the memoryless property, we have
P(X > 15 + 5 | X > 5) = P(X > 15) = 1 − EXPONDIST(15,0.2,1) = 0.049787068

Lognormal
A random variable X is said to have a lognormal distribution with parameters μ and σ if its natural logarithm has a normal distribution with mean μ and standard deviation σ.
[Figure: probability density of a normal variable X (normal counterpart) next to the density of Y = e^X (lognormal)]
If X is normally distributed, then exp(X) is lognormally distributed. The lognormal distribution is never below zero!
A lognormal distribution is used as the standard model for stock prices in financial economics. This distribution does not take values less than 0 and is positively skewed.
[Figure: lognormal density]
The rate of return follows a normal distribution.
LOGNORMDIST(x,mean,standard_dev)

A random variable X is said to have a lognormal distribution with parameters μ and σ if its probability density function is given by

$$f_X(x) = \begin{cases} \dfrac{1}{x\,\sigma\sqrt{2\pi}}\, e^{-(\ln x - \mu)^2 / (2\sigma^2)}, & x > 0 \\ 0, & x \le 0 \end{cases}$$
Moments of the LN distribution:

$$E[X] = e^{\mu + \sigma^2/2}, \qquad \mathrm{Var}[X] = \left(E[X]\right)^2 \left(e^{\sigma^2} - 1\right)$$

[Figure: density of Lognorm(35,35). Minimum 0.00, Maximum +∞, Mean 35.00, Std Dev 35.00; 5% of the probability lies below 6.3, 90% between 6.3 and 97.3, and 5% above 97.3]
The lognormal distribution exhibits skew and has a fairly long tail (allowing it to model large claims).

Question: For a lognormal distribution, the:
A. Mean equals the median.
B. Standard deviation equals 1.
C. Probability of a negative value outcome is zero.
D. Probability of a positive outcome is 50%.
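The moment formulas can be checked against the Lognorm(35,35) figure. The sketch below assumes the figure's two arguments are the mean and standard deviation of X itself (as its "Mean 35.00, Std Dev 35.00" labels suggest), inverts the moment formulas to recover μ and σ, and then computes percentiles through the normal distribution of ln X. All function names are invented for illustration.

```python
import math
from statistics import NormalDist


def lognormal_params(mean: float, sd: float) -> tuple[float, float]:
    """Invert the moment formulas: return (mu, sigma) of ln X from the mean and sd of X."""
    sigma_sq = math.log(1 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2
    return mu, math.sqrt(sigma_sq)


def lognormal_moments(mu: float, sigma: float) -> tuple[float, float]:
    """Return (E[X], Var[X]) from the slide's moment formulas."""
    mean = math.exp(mu + sigma ** 2 / 2)
    variance = mean ** 2 * (math.exp(sigma ** 2) - 1)
    return mean, variance


def lognormal_quantile(p: float, mu: float, sigma: float) -> float:
    """Percentile of X: exponentiate the matching percentile of ln X ~ Normal(mu, sigma)."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(p))


mu, sigma = lognormal_params(35, 35)
mean, variance = lognormal_moments(mu, sigma)
q05 = lognormal_quantile(0.05, mu, sigma)
q95 = lognormal_quantile(0.95, mu, sigma)

print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"mean = {mean:.2f}, sd = {math.sqrt(variance):.2f}")
print(f"5th percentile = {q05:.1f}, 95th percentile = {q95:.1f}")
```

Under that reading of the figure, the recovered percentiles land close to the 6.3 and 97.3 marked on it, and the gap between the mean (35) and the much smaller median e^μ reflects the positive skew.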
Example: Losses from large fires can often be modeled using a lognormal distribution. Suppose that the loss severity due to fire for buildings of a particular type has a lognormal distribution with μ = 16.960176 and σ = 0.385253164. Determine the probability that a large fire results in losses exceeding 40 million dollars.

APPENDIX
Standard Normal Table
The Standard Normal table gives the probability of a value less than a desired Z (i.e., from negative infinity to Z).

Example: Find the cumulative area that corresponds to a z-score of 2.71.
Find the area by finding 2.7 in the left-hand column, and then moving across the row to the column under 0.01.

  z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 0.0  .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
 0.1  .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
 0.2  .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
 ...
 2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
 2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
 2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981

The area to the left of z = 2.71 is 0.9966.
Example: Find the cumulative area that corresponds to a z-score of −0.25.

   z    .09    .08    .07    .06    .05    .04    .03    .02    .01    .00
−3.4  .0002  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003
−3.3  .0003  .0004  .0004  .0004  .0004  .0004  .0004  .0005  .0005  .0005
 ...
−0.3  .3483  .3520  .3557  .3594  .3632  .3669  .3707  .3745  .3783  .3821
−0.2  .3859  .3897  .3936  .3974  .4013  .4052  .4090  .4129  .4168  .4207
−0.1  .4247  .4286  .4325  .4364  .4404  .4443  .4483  .4522  .4562  .4602
−0.0  .4641  .4681  .4721  .4761  .4801  .4840  .4880  .4920  .4960  .5000

Find the area by finding −0.2 in the left-hand column, and then moving across the row to the column under 0.05.
The area to the left of z = −0.25 is 0.4013.

Normal Distributions: Finding Values
Finding z-Scores
Example: Find the z-score that corresponds to a cumulative area of 0.9973.

  z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 0.0  .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
 0.1  .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
 0.2  .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
 ...
 2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
 2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
 2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981

Find the z-score by locating 0.9973 in the body of the Standard Normal Table. The values at the beginning of the corresponding row and at the top of the column give the z-score.
The z-score is 2.78.
Finding z-Scores
Example: Find the z-score that corresponds to a cumulative area of 0.4170.

   z    .09    .08    .07    .06    .05    .04    .03    .02    .01    .00
−3.4  .0002  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003
−3.3  .0003  .0004  .0004  .0004  .0004  .0004  .0004  .0005  .0005  .0005
 ...
−0.3  .3483  .3520  .3557  .3594  .3632  .3669  .3707  .3745  .3783  .3821
−0.2  .3859  .3897  .3936  .3974  .4013  .4052  .4090  .4129  .4168  .4207
−0.1  .4247  .4286  .4325  .4364  .4404  .4443  .4483  .4522  .4562  .4602
−0.0  .4641  .4681  .4721  .4761  .4801  .4840  .4880  .4920  .4960  .5000

Find the z-score by locating 0.4170 in the body of the Standard Normal Table. Use the value closest to 0.4170, which is 0.4168 in the −0.2 row under the .01 column.
The z-score is −0.21.
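The table lookups in this appendix can be cross-checked by computing the same quantities directly. In the sketch below, the helper names are made up, and rounding to the table's precision (four decimals for areas, two for z-scores) is an approximation of the "use the closest area" rule rather than an exact replica of it.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def table_area(z: float) -> float:
    """Cumulative area to the left of z, rounded to four decimals like a table entry."""
    return round(STANDARD_NORMAL.cdf(z), 4)


def table_z(area: float) -> float:
    """z-score for a cumulative area, rounded to two decimals like a table margin."""
    return round(STANDARD_NORMAL.inv_cdf(area), 2)


print(f"area left of z = 2.71:  {table_area(2.71)}")
print(f"area left of z = -0.25: {table_area(-0.25)}")
print(f"z for area 0.9973:      {table_z(0.9973)}")
print(f"z for area 0.4170:      {table_z(0.4170)}")
```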
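[Figure: normal density f(x), with the x-axis marked at μ − σ, μ, and μ + σ]

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / (2\sigma^2)} \quad \text{for } -\infty < x < \infty$$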
Transforming a z-Score to an x-Score
To transform a standard z-score to a data value, x, in a given population, use the formula x = μ + zσ.

Example: The monthly electric bills in a city are normally distributed with a mean of $120 and a standard deviation of $16. Find the x-value corresponding to a z-score of 1.60.
x = μ + zσ = 120 + 1.60(16) = 145.6
We can conclude that an electric bill of $145.60 is 1.6 standard deviations above the mean.
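Pareto
A random variable X is said to have a Pareto distribution with parameters α > 0 and θ > 0 if its probability density function is given by

$$f_X(x) = \frac{\theta\,\alpha^{\theta}}{x^{\theta+1}} \quad \text{where } \alpha \le x < +\infty$$

Moments of the Pareto distribution:

$$E[X] = \frac{\alpha\theta}{\theta - 1}, \quad \theta > 1 \qquad\qquad \mathrm{Var}[X] = \frac{\theta\alpha^2}{(\theta-1)^2(\theta-2)}, \quad \theta > 2$$

[Figure: density of Pareto(10,1), starting at x = α. Minimum 1.000, Maximum +∞, Mean 1.111, Std Dev 0.124; 5% of the probability lies below 1.005, 90% between 1.005 and 1.349, and 5% above 1.349]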
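The Pareto moment formulas can be checked against the Pareto(10,1) figure. This sketch assumes the figure's arguments mean θ = 10 and α = 1, which is consistent with its minimum of 1.000. The quantile helper is not on the slides: it comes from integrating the density to get the cumulative distribution F(x) = 1 − (α/x)^θ and solving for x. All function names are illustrative.

```python
import math


def pareto_mean(alpha: float, theta: float) -> float:
    """E[X] = alpha * theta / (theta - 1), defined only for theta > 1."""
    if theta <= 1:
        raise ValueError("the mean requires theta > 1")
    return alpha * theta / (theta - 1)


def pareto_variance(alpha: float, theta: float) -> float:
    """Var[X] = theta * alpha^2 / ((theta - 1)^2 (theta - 2)), defined only for theta > 2."""
    if theta <= 2:
        raise ValueError("the variance requires theta > 2")
    return theta * alpha ** 2 / ((theta - 1) ** 2 * (theta - 2))


def pareto_quantile(p: float, alpha: float, theta: float) -> float:
    """Solve 1 - (alpha / x)^theta = p for x."""
    return alpha * (1 - p) ** (-1 / theta)


alpha, theta = 1.0, 10.0  # assumed reading of Pareto(10,1)
mean = pareto_mean(alpha, theta)
std_dev = math.sqrt(pareto_variance(alpha, theta))
q05 = pareto_quantile(0.05, alpha, theta)
q95 = pareto_quantile(0.95, alpha, theta)

print(f"mean = {mean:.3f}, std dev = {std_dev:.3f}")
print(f"5th percentile = {q05:.3f}, 95th percentile = {q95:.3f}")
```

The guards on θ reflect the conditions attached to the formulas: for small θ the tail is so heavy that the mean or variance does not exist.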
Gamma distribution
A random variable X is said to have a gamma distribution with parameters α > 0 and β > 0 if its probability density function is given by

$$f_X(x) = \begin{cases} \dfrac{1}{\beta\,\Gamma(\alpha)} \left(\dfrac{x}{\beta}\right)^{\alpha-1} e^{-x/\beta}, & x > 0 \\ 0, & x \le 0 \end{cases}$$

where Γ denotes the gamma function,

$$\Gamma(r) = \int_0^{\infty} z^{r-1} e^{-z}\, dz, \quad r \ne 0, -1, -2, \ldots$$

Γ(n) = (n − 1)! for all positive integers n.

It is a flexible distribution that can fit a wide variety of random patterns. The gamma distribution is related to the exponential distribution in that the sum of independent exponentially distributed random variables with a common rate is a gamma random variable.
Moments of the Gamma distribution:

$$E[X] = \beta\alpha, \qquad \mathrm{Var}[X] = \beta^2\alpha$$

[Figure: density of Gamma(2,10). Minimum 0.00, Maximum +∞, Mean 20.00, Std Dev 14.14; 5% of the probability lies below 3.6, 90% between 3.6 and 47.4, and 5% above 47.4]
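The gamma moments can be compared with the Gamma(2,10) figure. The sketch below assumes the figure's arguments are α = 2 (shape) and β = 10 (scale), which matches its mean of 20. For α = 2 the variable is a sum of two independent exponentials with mean β, and its cumulative distribution has the closed form 1 − e^(−x/β)(1 + x/β); that closed form is used here instead of a general-purpose gamma CDF. All function names are invented for illustration.

```python
import math


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density with shape alpha and scale beta."""
    if x <= 0:
        return 0.0
    return (x / beta) ** (alpha - 1) * math.exp(-x / beta) / (beta * math.gamma(alpha))


def gamma_mean_sd(alpha: float, beta: float) -> tuple[float, float]:
    """E[X] = beta * alpha and sqrt(Var[X]) = beta * sqrt(alpha)."""
    return alpha * beta, beta * math.sqrt(alpha)


def gamma_shape2_cdf(x: float, beta: float) -> float:
    """P(X <= x) when alpha = 2, i.e. for the sum of two exponentials with mean beta."""
    if x <= 0:
        return 0.0
    return 1 - math.exp(-x / beta) * (1 + x / beta)


alpha, beta = 2.0, 10.0  # assumed reading of Gamma(2,10)
mean, std_dev = gamma_mean_sd(alpha, beta)
p_below_low = gamma_shape2_cdf(3.6, beta)
p_below_high = gamma_shape2_cdf(47.4, beta)

print(f"mean = {mean:.2f}, std dev = {std_dev:.2f}")
print(f"P(X < 3.6) = {p_below_low:.3f}, P(X < 47.4) = {p_below_high:.3f}")
print(f"density at x = 10: {gamma_pdf(10, alpha, beta):.4f}")
```

With these assumed parameters, the probabilities below 3.6 and 47.4 come out near 5% and 95%, in line with the band marked on the figure.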