Commonly Used Distributions

Chapter 4: Commonly Used Distributions

Introduction

Statistical inference involves drawing a sample from a population and analyzing the sample data to learn about the population. We often have some knowledge about the probability mass function or probability density function of the population. In this chapter, we describe some of the standard families of curves.

Section 4.1: The Binomial Distribution

We use the Bernoulli distribution when an experiment can result in one of two outcomes. One outcome is labeled "success" and the other is labeled "failure." The probability of a success is denoted by p. The probability of a failure is then 1 − p. Such a trial is called a Bernoulli trial with success probability p.

Examples of Bernoulli Trials

1. The simplest Bernoulli trial is the toss of a coin. The two outcomes are heads and tails. If we define heads to be the success outcome, then p is the probability that the coin comes up heads. For a fair coin, p = 1/2.
2. Another Bernoulli trial is the selection of a component from a population of components, some of which are defective. If we define success to be a defective component, then p is the proportion of defective components in the population.

Binomial Distribution

If a total of n Bernoulli trials are conducted, and

1. the trials are independent,
2. each trial has the same success probability p,
3. X is the number of successes in the n trials,

then X has the binomial distribution with parameters n and p, denoted X ~ Bin(n, p).

Example 4.1

A fair coin is tossed 10 times. Let X be the number of heads that appear. What is the distribution of X?

Probability Mass Function of a Binomial Random Variable

If X ~ Bin(n, p), the probability mass function of X is

$$p(x) = P(X = x) = \begin{cases} \dfrac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases} \tag{4.2}$$

Binomial Probability Histogram

Figure 4.1: Binomial probability histograms for (a) Bin(10, 0.4) and (b) Bin(20, 0.1).

Example 4.2

The probability that a newborn baby is a girl is approximately 0.49. Find the probability that of the next five single births in a certain hospital, no more than two are girls.

Another Use of the Binomial

Assume that a finite population contains items of two types, successes and failures, and that a simple random sample is drawn from the population. Then if the sample size is no more than 5% of the population, the binomial distribution may be used to model the number of successes.

Example 4.3

A lot contains several thousand components, 10% of which are defective. Nine components are sampled from the lot. Let X represent the number of defective components in the sample. Find the probability that exactly two are defective.

Tables for Binomial Probabilities

Table A.1 (in Appendix A) presents binomial probabilities of the form P(X ≤ x) for n ≤ 20 and selected values of p.

Excel: BINOM.DIST(number_s, trials, probability_s, cumulative)
Minitab: Calc > Probability Distributions > Binomial
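For readers without the table or the packages above, the same probabilities can be computed directly from equation (4.2). The sketch below uses only the Python standard library. The function names binom_pmf and binom_cdf are illustrative choices, not part of the original slides.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p), following equation (4.2)."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x), the cumulative form that Table A.1 tabulates."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

# Example 4.2: no more than two girls among five births, p = 0.49
p_ex42 = binom_cdf(2, 5, 0.49)

# Example 4.3: exactly two defectives among nine components, p = 0.10
p_ex43 = binom_pmf(2, 9, 0.10)

print(f"Example 4.2: {p_ex42:.4f}")
print(f"Example 4.3: {p_ex43:.4f}")
```

The cumulative function is a plain sum of mass-function values, so it is only suitable for the small n that appear in these examples.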

Example 4.4

Of all the new vehicles of a certain model that are sold, 20% require repairs to be done under warranty during the first year of service. A particular dealership sells 14 such vehicles. What is the probability that fewer than five of them require warranty repairs?

Mean and Variance of a Binomial Random Variable

Mean: $\mu_X = np$ (4.3)
Variance: $\sigma_X^2 = np(1-p)$ (4.4)
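As a worked illustration, Example 4.4 can be set up as follows, assuming the 14 vehicles behave as independent Bernoulli trials with p = 0.2. "Fewer than five" means at most four, so the cumulative probability P(X ≤ 4) is the one needed. The same setup gives the mean and variance from (4.3) and (4.4).

```latex
\begin{aligned}
X &\sim \mathrm{Bin}(14,\ 0.2) \\
P(X < 5) &= P(X \le 4) = \sum_{x=0}^{4} \binom{14}{x} (0.2)^{x} (0.8)^{14-x} \approx 0.870 \\
\mu_X &= np = 14(0.2) = 2.8 \\
\sigma_X^2 &= np(1-p) = 14(0.2)(0.8) = 2.24
\end{aligned}
```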

Section 4.2: The Poisson Distribution

One way to think of the Poisson distribution is as an approximation to the binomial distribution when n is large and p is small. In that case, the binomial mass function depends almost entirely on the mean np, and very little on the specific values of n and p. We can therefore approximate the binomial mass function with a function that depends only on the quantity λ = np; this λ is the parameter of the Poisson distribution.

Probability Mass Function, Mean, and Variance of the Poisson Distribution

If X ~ Poisson(λ), the probability mass function of X is

$$p(x) = P(X = x) = \begin{cases} e^{-\lambda}\, \dfrac{\lambda^{x}}{x!}, & x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases} \tag{4.6}$$

Mean and variance:

$$\mu_X = \lambda \tag{4.7}$$
$$\sigma_X^2 = \lambda \tag{4.8}$$

Note: X must be a discrete random variable and λ must be a positive constant.
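The claim that the binomial mass function depends mostly on np when n is large and p is small can be checked numerically. The sketch below compares two binomial distributions sharing the mean np = 3 with the Poisson(3) mass function from (4.6). The particular values of n and p are hypothetical and chosen only for illustration.

```python
from math import comb, exp, factorial

def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial mass function, equation (4.2)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson mass function, equation (4.6)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 3.0
cases = [(1_000, 0.003), (10_000, 0.0003)]  # hypothetical (n, p) pairs with np = 3

print(" x  Bin(1000,.003)  Bin(10000,.0003)  Poisson(3)")
for x in range(0, 8):
    b1, b2 = (binom_pmf(x, n, p) for n, p in cases)
    print(f"{x:2d}  {b1:14.5f}  {b2:16.5f}  {poisson_pmf(x, lam):10.5f}")

# Largest pointwise gap between each binomial and the Poisson mass function
gaps = [
    max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(0, 20))
    for n, p in cases
]
print("largest gaps:", gaps)
```

In this comparison the gap shrinks as n grows with np held fixed, which is the sense in which the Poisson distribution approximates the binomial.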

Poisson Probability Histogram

Figure 4.2: Poisson probability histograms for (a) Poisson(1) and (b) Poisson(10).

Poisson Probabilities

Excel: POISSON.DIST(x, mean, cumulative)
Minitab: Calc > Probability Distributions > Poisson

Example 4.9

Particles are suspended in a liquid medium at a concentration of 6 particles per mL. A large volume of the suspension is thoroughly agitated, and then 3 mL are withdrawn. What is the probability that exactly 15 particles are withdrawn?

Section 4.3: The Normal Distribution

The normal distribution (also called the Gaussian distribution) is by far the most commonly used distribution in statistics. This distribution provides a good model for many, although not all, continuous populations. The normal distribution is continuous rather than discrete. The mean of a normal population may have any value, and the variance may have any positive value.

Probability Density Function, Mean, and Variance of the Normal Distribution

The probability density function of a normal population with mean μ and variance σ² is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty \tag{4.9}$$

If X ~ N(μ, σ²), then the mean and variance of X are given by

$$\mu_X = \mu, \qquad \sigma_X^2 = \sigma^2$$

68-95-99.7% Rule

The accompanying figure is a plot of the normal probability density function with mean μ and standard deviation σ. Note that the curve is symmetric about μ, so that μ is the median as well as the mean. It is also the case that, for any normal population:

1. About 68% of the population is in the interval μ ± σ.
2. About 95% of the population is in the interval μ ± 2σ.
3. About 99.7% of the population is in the interval μ ± 3σ.
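The three percentages in the rule can be verified from the normal cumulative distribution function. The sketch below uses NormalDist from Python's standard statistics module. The helper name within_k_sd and the second parameter pair (100, 15) are hypothetical, included only to show that the proportions do not depend on μ and σ.

```python
from statistics import NormalDist

def within_k_sd(mu: float, sigma: float, k: float) -> float:
    """Proportion of a N(mu, sigma^2) population lying in mu ± k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for mu, sigma in [(0.0, 1.0), (100.0, 15.0)]:
    proportions = [within_k_sd(mu, sigma, k) for k in (1, 2, 3)]
    print(f"mu={mu}, sigma={sigma}:", [f"{p:.4f}" for p in proportions])
```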

Standard Units

The proportion of a normal population that is within a given number of standard deviations of the mean is the same for any normal population. For this reason, when dealing with normal populations, we often convert from the units in which the population items were originally measured to standard units. Standard units tell how many standard deviations an observation is from the population mean.

Standard Normal Distribution

In general, we convert to standard units by subtracting the mean and dividing by the standard deviation. Thus, if x is an item sampled from a normal population with mean μ and variance σ², the standard unit equivalent of x is the number z, where

$$z = \frac{x - \mu}{\sigma} \tag{4.10}$$

The number z is sometimes called the z-score of x. The z-score is an item sampled from a normal population with mean 0 and standard deviation 1. This normal distribution is called the standard normal distribution.

Example 4.12

Resistances in a population of wires are normally distributed with mean 20 mΩ and standard deviation 3 mΩ. The resistances of two randomly chosen wires are 23 mΩ and 16 mΩ. Convert these amounts to standard units.

Example 4.12 (Cont.)

The resistance of a wire has a z-score of 1.7. Find the resistance of the wire in the original units of mΩ.
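A worked version of Example 4.12, applying equation (4.10) with μ = 20 mΩ and σ = 3 mΩ, and then solving the same equation for x in the continuation:

```latex
\begin{aligned}
z_{23} &= \frac{23 - 20}{3} = 1.00 \\
z_{16} &= \frac{16 - 20}{3} \approx -1.33 \\
x &= \mu + z\sigma = 20 + (1.7)(3) = 25.1\ \text{m}\Omega
\end{aligned}
```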

Finding Areas Under the Normal Curve

The proportion of a normal population that lies within a given interval is equal to the area under the normal probability density above that interval. This would suggest integrating the normal pdf, but this integral does not have a closed-form solution. So the areas under the curve are approximated numerically and are available in Table A.2. This table provides areas under the curve for the standard normal density. We can convert any normal distribution into a standard normal so that we can compute areas under the curve. The table gives the area in the left-hand tail of the curve. Other areas can be calculated by subtraction or by using the fact that the total area under the curve is 1.

Normal Probabilities

Excel:
NORMDIST(x, mean, standard_dev, cumulative)
NORMINV(probability, mean, standard_dev)
NORMSDIST(z)
NORMSINV(probability)
Minitab: Calc > Probability Distributions > Normal

Example 4.15

Find the area under the normal curve to the left of z = 0.47.

Example 4.16

Find the area under the normal curve to the right of z = 1.38.

Example 4.17

Find the area under the normal curve between z = 0.71 and z = 1.28.

Example 4.18

What z-score corresponds to the 75th percentile of a normal curve?
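Examples 4.15 through 4.18 can be checked without Table A.2 by using the standard normal cumulative distribution function and its inverse. The sketch below uses NormalDist from Python's standard statistics module and takes the z values exactly as printed in the examples. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Example 4.15: area to the left of z = 0.47 (a left-tail area, as in the table)
left_of_047 = Z.cdf(0.47)

# Example 4.16: area to the right of z = 1.38, using total area = 1
right_of_138 = 1 - Z.cdf(1.38)

# Example 4.17: area between z = 0.71 and z = 1.28, by subtraction
between_071_128 = Z.cdf(1.28) - Z.cdf(0.71)

# Example 4.18: z-score with 75% of the area to its left
z_75 = Z.inv_cdf(0.75)

print(f"{left_of_047:.4f} {right_of_138:.4f} {between_071_128:.4f} {z_75:.4f}")
```

A printed table is indexed to two decimal places in z, so a table lookup for Example 4.18 gives a rounded value of the z-score computed here.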

Linear Functions of Normal Random Variables

Let X ~ N(μ, σ²) and let a ≠ 0 and b be constants. Then

$$aX + b \sim N(a\mu + b,\ a^2\sigma^2) \tag{4.11}$$

Let X_1, X_2, ..., X_n be independent and normally distributed with means μ_1, μ_2, ..., μ_n and variances σ_1², σ_2², ..., σ_n². Let c_1, c_2, ..., c_n be constants, and let c_1 X_1 + c_2 X_2 + ... + c_n X_n be a linear combination. Then

$$c_1X_1 + c_2X_2 + \cdots + c_nX_n \sim N\!\left(c_1\mu_1 + c_2\mu_2 + \cdots + c_n\mu_n,\ c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_n^2\sigma_n^2\right) \tag{4.12}$$

Example 4.22

A chemist measures the temperature of a solution in °C. The measurement is denoted C, and is normally distributed with mean 40 °C and standard deviation 1 °C. The measurement is converted to °F by the equation F = 1.8C + 32. What is the distribution of F?
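Example 4.22 is a direct application of (4.11) with a = 1.8 and b = 32. A worked version, with C ~ N(40, 1²):

```latex
\begin{aligned}
\mu_F &= a\mu_C + b = 1.8(40) + 32 = 104 \\
\sigma_F^2 &= a^2\sigma_C^2 = (1.8)^2 (1)^2 = 3.24 \\
F &\sim N(104,\ 3.24), \qquad \sigma_F = 1.8\ {}^{\circ}\mathrm{F}
\end{aligned}
```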

Distributions of Functions of Normal Random Variables

Let X_1, X_2, ..., X_n be independent and normally distributed with mean μ and variance σ². Then the sample mean satisfies

$$\bar{X} \sim N\!\left(\mu,\ \frac{\sigma^2}{n}\right) \tag{4.13}$$

Let X and Y be independent, with X ~ N(μ_X, σ_X²) and Y ~ N(μ_Y, σ_Y²). Then

$$X + Y \sim N(\mu_X + \mu_Y,\ \sigma_X^2 + \sigma_Y^2) \tag{4.14}$$
$$X - Y \sim N(\mu_X - \mu_Y,\ \sigma_X^2 + \sigma_Y^2) \tag{4.15}$$

Section 4.4: The Lognormal Distribution

For data that contain outliers, the normal distribution is generally not appropriate. The lognormal distribution, which is related to the normal distribution, is often a good choice for these data sets.

If X ~ N(μ, σ²), then the random variable Y = e^X has the lognormal distribution with parameters μ and σ². If Y has the lognormal distribution with parameters μ and σ², then the random variable X = ln Y has the N(μ, σ²) distribution.

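Lognormal pdf, Mean, and Variance

The pdf of a lognormal random variable Y with parameters μ and σ² is

$$f(x) = \begin{cases} \dfrac{1}{\sigma x \sqrt{2\pi}} \exp\!\left(-\dfrac{[\ln(x) - \mu]^2}{2\sigma^2}\right), & x > 0 \\ 0, & \text{otherwise} \end{cases} \tag{4.16}$$

Mean:

$$E(Y) = \exp\!\left(\mu + \frac{\sigma^2}{2}\right) \tag{4.17}$$

Variance:

$$V(Y) = \exp(2\mu + 2\sigma^2) - \exp(2\mu + \sigma^2)$$

Lognormal Probability Density Function

Figure: lognormal probability density function with μ = 0 and σ = 1.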

Lognormal Probabilities

Excel: LOGNORMDIST(x, mean, standard_dev)
Minitab: Calc > Probability Distributions > Lognormal

Example 4.24

When a pesticide comes into contact with the skin, a certain percentage of it is absorbed. The percentage that is absorbed during a given time period is often modeled with a lognormal distribution. Assume that for a given pesticide, the amount that is absorbed (in percent) within two hours is lognormally distributed with μ = 1.5 and σ = 0.5. Find the probability that more than 5% of the pesticide is absorbed within two hours.
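Example 4.24 can be handled by moving to the log scale. If Y is lognormal with parameters μ and σ², then ln Y ~ N(μ, σ²), so P(Y > 5) = P(ln Y > ln 5). The sketch below assumes the parameters μ = 1.5 and σ = 0.5 stated in the example and uses only the Python standard library. The variable names are illustrative.

```python
from math import log
from statistics import NormalDist

mu, sigma = 1.5, 0.5            # parameters of ln Y, as given in Example 4.24
log_absorbed = NormalDist(mu, sigma)

threshold_percent = 5.0
z = (log(threshold_percent) - mu) / sigma        # z-score of ln 5
p_more_than_5 = 1 - log_absorbed.cdf(log(threshold_percent))

print(f"z = {z:.4f}, P(Y > 5) = {p_more_than_5:.4f}")
```

A hand calculation with a z table rounds z to two decimal places first, so it can differ from this value in the third or fourth decimal place.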

Section 4.5: The Exponential Distribution

The exponential distribution is a continuous distribution that is sometimes used to model the time that elapses before an event occurs. Such a time is often called a waiting time. The probability density of the exponential distribution involves a parameter λ, which is a positive constant whose value determines the density function's location and shape. We write X ~ Exp(λ).

Exponential R.V.: pdf, cdf, Mean, and Variance

The pdf of an exponential random variable X is

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{4.18}$$

The cdf of an exponential random variable is

$$F(x) = \begin{cases} 1 - e^{-\lambda x}, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{4.19}$$

The mean of an exponential random variable is

$$\mu_X = \frac{1}{\lambda} \tag{4.20}$$

The variance of an exponential random variable is

$$\sigma_X^2 = \frac{1}{\lambda^2} \tag{4.21}$$

Figure: exponential probability density function.

Exponential Probabilities

Excel: EXPONDIST(x, lambda, cumulative)
Minitab: Calc > Probability Distributions > Exponential

Example 4.26

A radioactive mass emits particles according to a Poisson process at a mean rate of 15 particles per minute. At some point, a clock is started.

1. What is the probability that more than 5 seconds will elapse before the next emission?
2. What is the mean waiting time until the next particle is emitted?

Lack of Memory Property

The exponential distribution has a property known as the lack of memory property: if T ~ Exp(λ), and t and s are positive numbers, then

$$P(T > t + s \mid T > s) = P(T > t)$$
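Both parts of Example 4.26 follow from the cdf (4.19) and the mean (4.20) once the rate is expressed per second, and the lack of memory property can be checked numerically from the same cdf. In the sketch below, the helper name exp_survival and the values of s and t used for the memoryless check are hypothetical choices.

```python
from math import exp, isclose

def exp_survival(t: float, lam: float) -> float:
    """P(T > t) = 1 - F(t) for T ~ Exp(lam), using the cdf in (4.19)."""
    return exp(-lam * t) if t > 0 else 1.0

# Example 4.26: 15 particles per minute is 0.25 particles per second
lam_per_second = 15 / 60
p_wait_over_5s = exp_survival(5, lam_per_second)   # part 1
mean_wait_seconds = 1 / lam_per_second             # part 2, equation (4.20)

# Lack of memory: P(T > t + s | T > s) should equal P(T > t)
s, t = 3.0, 5.0  # arbitrary positive values chosen for illustration
conditional = exp_survival(t + s, lam_per_second) / exp_survival(s, lam_per_second)
memoryless_holds = isclose(conditional, exp_survival(t, lam_per_second))

print(f"P(T > 5 s) = {p_wait_over_5s:.4f}")
print(f"mean waiting time = {mean_wait_seconds} s")
print("lack of memory check:", memoryless_holds)
```

The conditional probability is computed as P(T > t + s) / P(T > s), which is valid because the event T > t + s is contained in the event T > s.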

Example 4.27 / 4.28

The lifetime of a transistor in a particular circuit has an exponential distribution with mean 1.25 years.

1. Find the probability that the circuit lasts longer than 2 years.
2. Assume the transistor is now three years old and is still functioning. Find the probability that it functions for more than two additional years.
3. Compare the probabilities computed in parts 1 and 2.

Section 4.6: Some Other Continuous Distributions

The uniform distribution has two parameters, a and b, with a < b. If X is a random variable with the continuous uniform distribution, then it is uniformly distributed on the interval (a, b). We write X ~ U(a, b). The pdf is

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a < x < b \\ 0, & \text{otherwise} \end{cases} \tag{4.22}$$

Uniform Distribution: Mean and Variance

If X ~ U(a, b), then the mean is

$$\mu_X = \frac{a + b}{2} \tag{4.23}$$

and the variance is

$$\sigma_X^2 = \frac{(b - a)^2}{12} \tag{4.24}$$

The Gamma Distribution

First, let's consider the gamma function. For r > 0, the gamma function is defined by

$$\Gamma(r) = \int_0^{\infty} t^{r-1} e^{-t}\, dt \tag{4.25}$$

The gamma function has the following properties:

1. If r is a positive integer, then Γ(r) = (r − 1)!
2. For any r > 0, Γ(r + 1) = rΓ(r)
3. Γ(1/2) = √π
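The three properties of the gamma function listed above can be spot-checked numerically with math.gamma from the Python standard library. This is a check at a few values, not a proof. The test points and variable names are arbitrary choices.

```python
from math import gamma, factorial, sqrt, pi, isclose

# Property 1: Gamma(r) = (r - 1)! for positive integers r
integer_property = all(isclose(gamma(r), factorial(r - 1)) for r in range(1, 11))

# Property 2: Gamma(r + 1) = r * Gamma(r), checked at a few non-integer r > 0
recurrence_property = all(
    isclose(gamma(r + 1), r * gamma(r)) for r in (0.5, 1.7, 4.2, 9.9)
)

# Property 3: Gamma(1/2) = sqrt(pi)
half_property = isclose(gamma(0.5), sqrt(pi))

print(integer_property, recurrence_property, half_property)
```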

Gamma R.V.

The pdf of the gamma distribution with parameters r > 0 and λ > 0 is

$$f(x) = \begin{cases} \dfrac{\lambda^{r} x^{r-1} e^{-\lambda x}}{\Gamma(r)}, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{4.26}$$

The mean and variance are given by

$$\mu_X = \frac{r}{\lambda} \tag{4.27}$$
$$\sigma_X^2 = \frac{r}{\lambda^2} \tag{4.28}$$

If r = 1, the gamma distribution is the same as the exponential distribution. If r = k/2, where k is a positive integer, and λ = 1/2, the distribution is called a chi-square distribution with k degrees of freedom.

Figure: gamma probability density function.

Gamma Probabilities

Excel:
GAMMADIST(x, alpha, beta, cumulative)
GAMMAINV(probability, alpha, beta)
GAMMALN(x)
Minitab: Calc > Probability Distributions > Gamma

Special Cases of the Gamma Distribution

If r is a positive integer, the gamma distribution is called an Erlang distribution. If r = k/2, where k is a positive integer, the Γ(r, 1/2) distribution is called a chi-square distribution with k degrees of freedom.

The Weibull Distribution

The Weibull distribution is a continuous distribution that is used in a variety of situations. A common application of the Weibull distribution is to model the lifetimes of components. The Weibull probability density function has two parameters, both positive constants, that determine the location and shape. We denote these parameters α and β. If α = 1, the Weibull distribution is the same as the exponential distribution with parameter λ = β.

Weibull R.V.

The pdf of the Weibull distribution is

$$f(x) = \begin{cases} \alpha \beta^{\alpha} x^{\alpha - 1} e^{-(\beta x)^{\alpha}}, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{4.29}$$

The mean of the Weibull is

$$\mu_X = \frac{1}{\beta}\, \Gamma\!\left(1 + \frac{1}{\alpha}\right) \tag{4.31}$$

The variance of the Weibull is

$$\sigma_X^2 = \frac{1}{\beta^2} \left\{ \Gamma\!\left(1 + \frac{2}{\alpha}\right) - \left[\Gamma\!\left(1 + \frac{1}{\alpha}\right)\right]^2 \right\} \tag{4.32}$$
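The mean and variance formulas (4.31) and (4.32) are straightforward to evaluate with math.gamma. The sketch below also checks the special case stated above: with α = 1 the results should match the exponential mean 1/β and variance 1/β². The function names and the parameter values are hypothetical, chosen for illustration.

```python
from math import gamma, isclose

def weibull_mean(alpha: float, beta: float) -> float:
    """Mean of the Weibull distribution, equation (4.31)."""
    return gamma(1 + 1 / alpha) / beta

def weibull_variance(alpha: float, beta: float) -> float:
    """Variance of the Weibull distribution, equation (4.32)."""
    return (gamma(1 + 2 / alpha) - gamma(1 + 1 / alpha) ** 2) / beta**2

# Special case alpha = 1: exponential with lambda = beta
beta = 0.8  # hypothetical rate
matches_exponential = isclose(weibull_mean(1, beta), 1 / beta) and isclose(
    weibull_variance(1, beta), 1 / beta**2
)

# A hypothetical lifetime model with alpha = 2, beta = 0.5
print(weibull_mean(2, 0.5), weibull_variance(2, 0.5))
print("alpha = 1 matches exponential:", matches_exponential)
```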

Figure: Weibull probability density function.

Weibull Probabilities

Excel: WEIBULL(x, alpha, beta, cumulative)
Minitab: Calc > Probability Distributions > Weibull

Section 4.7: Probability Plots

Scientists and engineers often work with data that can be thought of as a random sample from some population. In many cases, it is important to determine the probability distribution that approximately describes the population. More often than not, the only way to determine an appropriate distribution is to examine the sample to find a probability distribution that fits.

Finding a Distribution

Probability plots are a good way to determine an appropriate distribution. Here is the idea. Suppose we have a random sample X_1, ..., X_n. We first arrange the data in ascending order. We then assign evenly spaced values between 0 and 1 to each X_i. There are several acceptable ways to do this; the simplest is to assign the value (i − 0.5)/n to X_i. The distribution that we are comparing the X's to should have a mean and variance that match the sample mean and variance. We then plot the points (X_i, F(X_i)), where F(X_i) is the value assigned to X_i. If this plot resembles the cdf of the distribution that we are interested in, we conclude that the data came from that distribution.

Probability Plot: Example

i    X_i     (i − 0.5)/n    Q_i
1    3.01    0.1            2.4369
2    3.35    0.3            3.9512
3    4.79    0.5            5.0000
4    5.96    0.7            6.0488
5    7.89    0.9            7.5631

Figure: probability plot of Q_i (vertical axis) against X_i (horizontal axis).
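The Q_i column in the table appears consistent with quantiles of a normal distribution whose mean and standard deviation match the sample. That reading is an assumption, since the slide does not name the distribution. The sketch below recomputes the table under that assumption using NormalDist.inv_cdf from the Python standard library.

```python
from statistics import NormalDist, mean, stdev

data = sorted([3.01, 3.35, 4.79, 5.96, 7.89])   # X_i from the table, ascending
n = len(data)

# Assumed reference distribution: normal, matched to the sample mean and sd
reference = NormalDist(mean(data), stdev(data))

rows = []
for i, x in enumerate(data, start=1):
    prob = (i - 0.5) / n                 # evenly spaced value assigned to X_i
    q = reference.inv_cdf(prob)          # quantile Q_i of the reference distribution
    rows.append((i, x, prob, q))
    print(f"{i}  {x:5.2f}  {prob:.1f}  {q:.4f}")
```

Plotting the pairs (X_i, Q_i) from rows and judging whether they fall near a straight line is the probability plot itself. The plotting step is left out here to keep the sketch free of third-party libraries.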

Software

Many software packages take the value (i − 0.5)/n assigned to each X_i and calculate the quantile Q_i corresponding to that value from the distribution of interest. They then plot each point (X_i, Q_i). If this plot is a reasonably straight line, you may conclude that the sample came from the distribution that was used to find the quantiles.

Normal Probability Plots

The sample plotted on the left comes from a population that is not close to normal. The sample plotted on the right comes from a population that is close to normal.

Section 4.8: The Central Limit Theorem

The Central Limit Theorem

Let X_1, ..., X_n be a random sample from a population with mean μ and variance σ².

Let $\bar{X} = \dfrac{X_1 + \cdots + X_n}{n}$ be the sample mean.

Let $S_n = X_1 + \cdots + X_n$ be the sum of the sample observations.

Then if n is sufficiently large,

$$\bar{X} \sim N\!\left(\mu,\ \frac{\sigma^2}{n}\right) \quad \text{approximately} \tag{4.33}$$

and

$$S_n \sim N(n\mu,\ n\sigma^2) \quad \text{approximately} \tag{4.34}$$

Rule of Thumb

For most populations, if the sample size is greater than 30, the Central Limit Theorem approximation is good.

Normal approximation to the binomial: if X ~ Bin(n, p) and if np > 5 and n(1 − p) > 5, then

$$X \sim N(np,\ np(1-p)) \quad \text{approximately} \tag{4.35}$$

Normal approximation to the Poisson: if X ~ Poisson(λ), where λ > 10, then

$$X \sim N(\lambda,\ \lambda) \quad \text{approximately} \tag{4.36}$$

Figure: normal approximation to the binomial, showing Bin(100, 0.2) and N(20, 16).

Continuity Correction

The binomial distribution is discrete, while the normal distribution is continuous. The continuity correction is an adjustment, made when approximating a discrete distribution with a continuous one, that can improve the accuracy of the approximation.

If you want to include the endpoints in your probability calculation, extend the interval by 0.5 at each endpoint, then proceed with the calculation. If you want to exclude the endpoints, shrink the interval by 0.5 at each endpoint. For example, P(45 ≤ X ≤ 55) is approximated by the area under the normal curve between 44.5 and 55.5, while P(45 < X < 55) is approximated by the area between 45.5 and 54.5.

Figure: continuity correction for P(45 ≤ X ≤ 55) and P(45 < X < 55).

Example 4.32

If a fair coin is tossed 100 times, use the normal curve to approximate the probability that the number of heads is between 45 and 55 inclusive.

Example 4.34

The number of hits on a website follows a Poisson distribution, with a mean of 27 hits per hour. Find the probability that there will be 90 or more hits in three hours.
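The two examples above can be sketched with the normal approximations (4.35) and (4.36). For Example 4.32 the code applies the continuity correction and, as a sanity check, also sums the exact binomial probabilities. For Example 4.34 it reports the approximation both with and without a continuity correction, since the slides do not say which is intended. The variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

# Example 4.32: X ~ Bin(100, 0.5), approximated by N(50, 25)
n, p = 100, 0.5
heads_approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
# Endpoints 45 and 55 are included, so widen the interval to (44.5, 55.5)
p_heads_normal = heads_approx.cdf(55.5) - heads_approx.cdf(44.5)
# Exact value for comparison: for a fair coin every outcome has weight 1 / 2^n
p_heads_exact = sum(comb(n, x) for x in range(45, 56)) / 2**n

# Example 4.34: hits in three hours ~ Poisson(81), approximated by N(81, 81)
lam = 27 * 3
hits_approx = NormalDist(lam, sqrt(lam))
p_hits_plain = 1 - hits_approx.cdf(90)        # no continuity correction
p_hits_corrected = 1 - hits_approx.cdf(89.5)  # 90 is included, so start at 89.5

print(f"Example 4.32: normal {p_heads_normal:.4f}, exact {p_heads_exact:.4f}")
print(f"Example 4.34: plain {p_hits_plain:.4f}, corrected {p_hits_corrected:.4f}")
```

Comparing the normal and exact values for Example 4.32 gives a direct sense of how well the continuity-corrected approximation performs at n = 100.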

Summary

1. Discrete distributions: Bernoulli, Binomial, Poisson.
2. Continuous distributions: Normal, Exponential, Uniform, Gamma, Weibull.
3. Central Limit Theorem.
4. Normal approximations to the binomial and Poisson distributions.