ME3620 Theory of Engineering Experimentation
Chapter III. Random Variables and Probability Distributions
3.2 Random Variables
In an experiment, a measurement is usually denoted by a variable such as X. In a random experiment, a variable whose measured value can change (from one replicate of the experiment to another) is referred to as a random variable.
3.3 Probability
Probability is used to quantify likelihood or chance, and to represent risk or uncertainty in engineering applications. Probability statements describe the likelihood that particular values occur. The likelihood is quantified by assigning a number from the interval [0, 1] to the set of values (or a percentage from 0 to 100%). Higher numbers indicate that the set of values is more likely. A probability is usually expressed in terms of a random variable.
Complement of an Event: Given a set E, the complement of E is the set of elements that are not in E. The complement is denoted E′.
Mutually Exclusive Events: The sets E1, E2, ..., Ek are mutually exclusive if the intersection of any pair is empty; that is, each element is in one and only one of the sets E1, E2, ..., Ek.
Probability Properties: In the properties that follow, X represents the measured value of a variable.
Probability Properties:
1. P(X ∈ E) ≤ 1: the maximum value of a probability is 1.
2. P(X ∈ E) ≥ 0: the probability of any event cannot be negative.
3. Whenever the sets E1, E2, ..., Ek are mutually exclusive, P(X ∈ E1 ∪ E2 ∪ ... ∪ Ek) = P(X ∈ E1) + P(X ∈ E2) + ... + P(X ∈ Ek): the proportion of measurements that fall in E1 ∪ E2 ∪ ... ∪ Ek is the sum of the proportions that fall in E1, E2, ..., and Ek. For example, P(X ≤ 10) = P(X ≤ 0) + P(0 < X ≤ 5) + P(5 < X ≤ 10).
Also, P(X ∈ E′) = 1 − P(X ∈ E). For example, P(X ≤ 2) = 1 − P(X > 2). In general, for any fixed number x: P(X ≤ x) = 1 − P(X > x).
Example: The homework scores of a given assignment are listed (ranked, in two columns) in the data set below:
 1  33.5625     18  82.375
 2  54.21875    19  82.875
 3  61.8125     20  83.375
 4  63.6875     21  84.3125
 5  65.1875     22  84.8125
 6  66.8125     23  85.4375
 7  67.6875     24  85.6875
 8  71.84375    25  86.5
 9  73.65625    26  87.4375
10  74.75       27  87.75
11  76.75       28  88.70625
12  78.6875     29  89.375
13  78.75       30  90.4375
14  80.875      31  92.25
15  81.3125     32  92.4375
16  81.9375     33  96.0625
17  82.25       34  96.4375
a) What is the probability of a student getting 80% or below?
b) What is the probability of a student getting 86% or below?
c) What is the probability of a student getting between 80% and 86%?
d) What is the probability of a student getting a score larger than 86%?
Example: Problem 3-13. Example: Problem 3-17.
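The empirical probabilities asked for above are simply counts of scores in each range divided by the sample size. A minimal sketch:

```python
# Empirical probabilities for the homework-score example (34 ranked scores).
scores = [33.5625, 54.21875, 61.8125, 63.6875, 65.1875, 66.8125, 67.6875,
          71.84375, 73.65625, 74.75, 76.75, 78.6875, 78.75, 80.875, 81.3125,
          81.9375, 82.25, 82.375, 82.875, 83.375, 84.3125, 84.8125, 85.4375,
          85.6875, 86.5, 87.4375, 87.75, 88.70625, 89.375, 90.4375, 92.25,
          92.4375, 96.0625, 96.4375]
n = len(scores)

p_le_80 = sum(x <= 80 for x in scores) / n   # a) P(X <= 80): 13 scores qualify
p_le_86 = sum(x <= 86 for x in scores) / n   # b) P(X <= 86): 24 scores qualify
p_80_86 = p_le_86 - p_le_80                  # c) P(80 < X <= 86)
p_gt_86 = 1 - p_le_86                        # d) P(X > 86), via the complement rule

print(p_le_80, p_le_86, p_80_86, p_gt_86)
```

Note how parts c) and d) reuse the mutually-exclusive and complement properties rather than recounting the data.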
3-4 Continuous Random Variables
3-4.1 Probability Density Function (pdf)
The probability distribution, or simply distribution, f(x), of a random variable X is a description of the set of probabilities associated with the possible values of X. The probability density function (pdf) is used to describe the probability distribution of a continuous random variable X. The probability that X is between a and b is determined as the integral of f(x) from a to b:
P(a ≤ X ≤ b) = ∫[a to b] f(x) dx
3-4.2 Cumulative Distribution Function
Another way to describe the probability distribution of a random variable is to define a function that provides the probability that X is less than or equal to x:
F(x) = P(X ≤ x) = ∫[−∞ to x] f(u) du
Example 3-3.
Example 3-4
a) Determine the cumulative distribution function (cdf).
b) Determine the probability that the distance to the first surface flaw is less than 1000 μm.
c) Determine the probability that the distance to the first surface flaw exceeds 2000 μm.
d) Determine the probability that the distance to the first surface flaw is between 1000 μm and 2000 μm.
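The pdf for Example 3-4 is not reproduced on this slide. As an illustrative assumption only, suppose the flaw distance has pdf f(x) = 0.001·e^(−0.001x) for x ≥ 0 (x in μm); integrating f from 0 to x gives the cdf F(x) = 1 − e^(−0.001x), and the remaining parts follow from it:

```python
import math

# Assumed (hypothetical) pdf: f(x) = 0.001 * exp(-0.001 x), x >= 0, x in micrometers.
# a) Integrating f from 0 to x gives the cdf:
def F(x):
    return 1 - math.exp(-0.001 * x)

p_b = F(1000)            # b) P(X < 1000)
p_c = 1 - F(2000)        # c) P(X > 2000), complement of the cdf
p_d = F(2000) - F(1000)  # d) P(1000 < X < 2000), difference of cdf values
print(p_b, p_c, p_d)
```

The same three moves (evaluate F, take a complement, take a difference) apply to whatever pdf the textbook example actually uses.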
3-4.3 Mean and Variance
For sample data x1, x2, ..., xn the sample mean (x̄) is determined by:
x̄ = (x1 + x2 + ... + xn)/n = (1/n) Σ xi
For sample data x1, x2, ..., xn the sample variance (s²) is a measure of the dispersion or scatter in the data:
s² = [(x1 − x̄)² + (x2 − x̄)² + ... + (xn − x̄)²]/(n − 1) = Σ (xi − x̄)²/(n − 1)
An equivalent computational form is:
s² = [Σ xi² − (Σ xi)²/n]/(n − 1)
The population variance [σ², or V(X)] of a continuous random variable X is:
σ² = V(X) = ∫ (x − μ)² f(x) dx
where μ = E(X) = ∫ x f(x) dx is the population mean and each integral is taken over all x.
Example: The homework scores of a given assignment are listed (ranked, in two columns) in the data set below:
 1  33.5625     18  82.375
 2  54.21875    19  82.875
 3  61.8125     20  83.375
 4  63.6875     21  84.3125
 5  65.1875     22  84.8125
 6  66.8125     23  85.4375
 7  67.6875     24  85.6875
 8  71.84375    25  86.5
 9  73.65625    26  87.4375
10  74.75       27  87.75
11  76.75       28  88.70625
12  78.6875     29  89.375
13  78.75       30  90.4375
14  80.875      31  92.25
15  81.3125     32  92.4375
16  81.9375     33  96.0625
17  82.25       34  96.4375
a) What is the mean?
b) What is the variance?
c) What is the standard deviation?
Example: Problem 3-24. Example: Problem 3-27.
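The sample statistics requested above can be computed directly from the formulas of Section 3-4.3; a minimal sketch:

```python
import statistics

scores = [33.5625, 54.21875, 61.8125, 63.6875, 65.1875, 66.8125, 67.6875,
          71.84375, 73.65625, 74.75, 76.75, 78.6875, 78.75, 80.875, 81.3125,
          81.9375, 82.25, 82.375, 82.875, 83.375, 84.3125, 84.8125, 85.4375,
          85.6875, 86.5, 87.4375, 87.75, 88.70625, 89.375, 90.4375, 92.25,
          92.4375, 96.0625, 96.4375]
n = len(scores)

mean = sum(scores) / n                                 # a) sample mean x-bar
var = sum((x - mean) ** 2 for x in scores) / (n - 1)   # b) sample variance s^2
std = var ** 0.5                                       # c) sample standard deviation s

print(mean, var, std)
```

The `statistics` module provides the same quantities (`statistics.mean`, `statistics.variance`, `statistics.stdev`) and is a handy cross-check on the hand formulas.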
3-5 Important Continuous Distribution Functions
3-5.1 Normal Distribution
The normal distribution is also known as the Gaussian distribution, bell curve, or natural distribution. It is the most widely used model for the distribution of a random variable. Whenever a random experiment is replicated, the random variable that equals the average (or total) result over the replicates tends to have a normal distribution as the number of replicates becomes larger.
Random variables with different mean (μ) and variance (σ²) can be modeled by normal probability density functions with appropriate choices of the center and width of the curve: μ determines the center and σ² determines the width. Thus, a random variable X with probability density function
f(x) = [1/(σ√(2π))] e^(−(x − μ)²/(2σ²))
has a normal distribution (and is called a normal random variable) with parameters μ and σ, where −∞ < μ < ∞ and σ > 0. The notation N(μ, σ²) is often used to denote a normal distribution with mean μ and variance σ².
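As a quick check of the formula, a short sketch that evaluates the normal pdf directly:

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal pdf with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# At the center (x = mu) the exponential term is 1, so the peak height is
# 1/(sigma*sqrt(2*pi)); doubling sigma halves the peak, widening the curve.
print(normal_pdf(0.0, 0.0, 1.0))   # standard normal peak, 1/sqrt(2*pi)
print(normal_pdf(5.0, 5.0, 2.0))   # same shape shifted to mu = 5, sigma = 2
```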
For example, the figure shows the plot of three random variables that follow a normal distribution: two of them have the same mean (μ = 5) but different variances (σ²), and the third variable has a mean of μ = 15. As mentioned previously, the variance is a measure of the scatter of the data; therefore, variables with a larger variance have a flatter Gaussian curve.
The figure (created in Excel) shows the plot of two random variables that follow a normal distribution; the variables have the same mean μ = 1 but different standard deviations, σ1 = 4 and σ2 = 7.
[Figure: two normal pdfs with μ = 1; the STD = 4 curve is taller and narrower than the STD = 7 curve.]
The following figure summarizes some important characteristics of a normal distribution. The probabilities of a variable X that follows a normal distribution are:
P(μ − σ < X < μ + σ) = 0.6827
P(μ − 2σ < X < μ + 2σ) = 0.9545
P(μ − 3σ < X < μ + 3σ) = 0.9973
μ ± 2σ covers about 0.95 of the probability, the 95% confidence interval (C.I.); μ ± 3σ covers about 0.99, the 99% C.I.
Since 0.9973 of the probability of a normal distribution is within the interval (μ − 3σ, μ + 3σ), 6σ is called the width of a normal distribution. The area under the curve of a normal pdf over −∞ < x < ∞ is 1.
Standard Normal Random Variable
Table 1 provides cumulative probabilities for a standard normal random variable, i.e., values of the standard normal cumulative distribution function Φ(z) = P(Z ≤ z).
Example: Using the standard normal cumulative distribution function, determine the following probabilities:
a) P(Z ≤ 1.12)
b) P(Z > 1.12)
c) P(Z ≤ 0.43)
d) P(Z > 0.43)
e) P(0.06 ≤ Z ≤ 1.18)
f) Find the value of z such that P(Z ≤ z) = 0.33
g) Find the value of z such that P(Z > z) = 0.22
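These table look-ups can be reproduced numerically with Python's standard library; a sketch (the inequality directions follow the reading of parts a–g above):

```python
from statistics import NormalDist

Z = NormalDist()                  # standard normal: mu = 0, sigma = 1

p_a = Z.cdf(1.12)                 # a) P(Z <= 1.12), the Table 1 entry
p_b = 1 - Z.cdf(1.12)             # b) P(Z > 1.12), complement of a)
p_c = Z.cdf(0.43)                 # c) P(Z <= 0.43)
p_d = 1 - Z.cdf(0.43)             # d) P(Z > 0.43)
p_e = Z.cdf(1.18) - Z.cdf(0.06)   # e) P(0.06 <= Z <= 1.18)
z_f = Z.inv_cdf(0.33)             # f) z with P(Z <= z) = 0.33 (reverse look-up)
z_g = Z.inv_cdf(1 - 0.22)         # g) P(Z > z) = 0.22 means P(Z <= z) = 0.78
print(p_a, p_b, p_c, p_d, p_e, z_f, z_g)
```

`NormalDist.inv_cdf` plays the same role as the reverse table look-up in parts f) and g).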
The variable Z, defined as
Z = (X − μ)/σ
represents the distance of X from its mean in terms of standard deviations. It is important to notice that Z is a dimensionless parameter.
Example: Problem 3-55.
3-6 Probability Plots
3-6.1 Normal Probability Plots
How do we know whether a normal distribution is a reasonable model for data? Probability plotting is a graphical method for determining whether sample data conform to a hypothesized distribution, based on a subjective visual examination of the data. Probability plotting typically uses special graph paper, known as probability paper, that has been designed for the hypothesized distribution. Probability paper is widely available for the normal, lognormal, Weibull, and various chi-square and gamma distributions.
To construct a probability plot:
a) Rank the data in ascending order, that is, from smallest to largest: x1, x2, ..., xn, where x1 is the smallest and xn the largest.
b) Using the probability paper of the hypothesized distribution, plot the ordered observations xj on the abscissa (horizontal axis) and the observed cumulative frequency, (j − 0.5)/n, on the ordinate (vertical axis).
c) Add a trend line. If the hypothesized distribution adequately describes the data, the plotted points will fall along a straight line. If the plotted points deviate significantly and systematically from the straight line, the hypothesized model is not appropriate.
To construct a normal probability plot using ordinary graph paper:
a) Determine a set of standardized normal scores zj from the cumulative frequency using
P(Z ≤ zj) = Φ(zj) = (j − 0.5)/n
For example, if (j − 0.5)/n = 0.026, then Φ(zj) = 0.026 implies that zj = −1.94313. Excel has a function to determine zj: NORM.S.INV(argument).
b) Using a scatter plot (in Excel), plot the ordered observations xj on the abscissa and the standardized normal scores on the ordinate (vertical axis).
c) Add a trend line.
If the hypothesized distribution adequately describes the data, the plotted points will fall along a straight line. If the plotted points deviate significantly and systematically from the straight line, the hypothesized model is not appropriate.
Example 3-18. Example: Problem 3-83.
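The standardized normal scores of step a) can be computed without Excel; a sketch using Python's standard library (the sample data below are made up for illustration):

```python
from statistics import NormalDist

data = [176, 183, 185, 190, 191, 192, 201, 205, 214, 220]   # hypothetical sample
data.sort()                                                  # step a): rank ascending
n = len(data)

# Observed cumulative frequency (j - 0.5)/n and the matching normal score z_j,
# i.e., the z satisfying Phi(z_j) = (j - 0.5)/n (what Excel's NORM.S.INV returns).
freqs = [(j - 0.5) / n for j in range(1, n + 1)]
z_scores = [NormalDist().inv_cdf(p) for p in freqs]

for x, z in zip(data, z_scores):
    print(x, round(z, 4))
# Plot x (abscissa) against z (ordinate); near-linearity supports normality.
```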
3-8 Binomial Distribution
A trial with only two possible outcomes is frequently used as a starting point of a random experiment. Experiments of this type, with only two possible outcomes, are called Bernoulli trials; for example, a coin toss, or a roll of a die in which rolling a 4 counts as success. The trials that constitute the random experiment are independent, meaning that the outcome of one trial has no effect on the outcome of any other trial. Additionally, the probability of a success on each trial is constant and known.
Thus, in a binomial experiment:
1. The number of observations n is fixed.
2. Each observation is independent.
3. Each observation represents one of two outcomes ("success" or "failure").
4. The probability of "success," p, is the same for each trial.
The probability distribution that describes these types of experiments is the binomial distribution.
The binomial probability mass function is
P(X = x) = nCx · p^x · (1 − p)^(n − x),  x = 0, 1, ..., n
where
nCx = n!/[x!(n − x)!]
n is the total number of samples, x is the number of successful events, p is the probability of a successful event in a single trial, and (1 − p) is the probability of failure of the event in a single trial.
The mean and variance for a binomial distribution are defined as
μ = E(X) = np  and  σ² = V(X) = np(1 − p)
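The pmf, its normalization, and the mean μ = np can all be checked numerically; a minimal sketch:

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(binomial_pmf(3, n, p))   # P(X = 3) = 10C3 / 2^10 = 120/1024
print(sum(pmf))                # total probability over x = 0..n is 1
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
print(mean)                    # agrees with mu = np = 5
```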
Example: Problem 3-107. Example: Problem 3-108.
3-8 Poisson Process
The Poisson process is one of the most widely used counting processes. It is usually used when it is necessary to count occurrences of events that appear to happen at a certain rate, but completely at random (without a definite structure). For example, suppose it is known from historical data that a certain region has 19 days of rain during the three summer months; other than this information, the timing of the days with rain appears to be totally random. This process falls within the category of a Poisson process.
3-8.1 Poisson Distribution
The probability distribution that models a Poisson process is called the Poisson distribution:
P(X = x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, ...
Mean = λ = np, the average number of expected occurrences.
Variance = σ² = λ = np
p is the probability of occurrence in a single trial; n is the total number of events.
Example: A taxi cab company owns 100 taxis; each car has a probability of breaking down on a given day of p = 0.05.
a) Find the probability that three of the cars will break down today.
b) Find the probability that at most three of the cars will break down today.
c) Find the probability that at least three of the cars will break down today.
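A sketch of the taxi example using the Poisson pmf with λ = np = 100 × 0.05 = 5:

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 100 * 0.05   # lambda = np = 5 expected breakdowns per day

p_a = poisson_pmf(3, lam)                             # a) exactly 3
p_b = sum(poisson_pmf(x, lam) for x in range(4))      # b) at most 3: x = 0..3
p_c = 1 - sum(poisson_pmf(x, lam) for x in range(3))  # c) at least 3: 1 - P(X <= 2)
print(p_a, p_b, p_c)
```

Part c) again uses the complement rule rather than an infinite sum over x ≥ 3.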
3-10 Normal Approximation to the Binomial and Poisson Distributions
Binomial Distribution Approximation
A binomial random variable is the total count of successes from repeated independent trials. When the number of trials, n, is large, the binomial random variable can be approximated by a normal random variable; consequently, the normal distribution can be used to approximate binomial probabilities when n is large. Since for a normal distribution the variable Z is defined as
Z = (X − μ)/σ
then, when modeling a binomial variable using a normal distribution approximation,
μ = E(X) = np  and  σ² = V(X) = np(1 − p)
Thus,
Z = (X − np)/√(np(1 − p))
is approximately a standard normal random variable. Since expressing a binomial variable in terms of a normal distribution is an approximation, an additional correction (known as the continuity correction) can be introduced to further improve the approximation. In general, ±0.5 is added to the binomial values to improve the approximation; the ±0.5 correction is applied so that it increases the binomial probability that is being approximated.
Example: Problem 3-148.
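The quality of the approximation is easy to verify; a sketch comparing the exact binomial cdf against the normal model with continuity correction (n, p, and k below are arbitrary illustration values):

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a binomial random variable."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

n, p, k = 100, 0.5, 55
mu = n * p                          # np
sigma = math.sqrt(n * p * (1 - p))  # sqrt(np(1-p))

exact = binomial_cdf(k, n, p)
# Continuity correction: P(X <= 55) ~= P(Z <= (55.5 - mu)/sigma)
approx = NormalDist().cdf((k + 0.5 - mu) / sigma)
print(exact, approx)
```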
Poisson Distribution Approximation
Just as a binomial random variable can be modeled using a normal distribution, a Poisson distribution can be approximated by a normal distribution. In order for a Poisson probability distribution to be approximated by a normal probability distribution, λ = np > 5 is required. Thus, if X is a Poisson random variable with E(X) = λ and V(X) = λ,
Z = (X − λ)/√λ
is approximately a standard normal random variable.
Example: Problem 3-154.
Weibull Distribution
The Weibull distribution is used to model the time until failure of many different physical systems. The parameters involved in the description of this distribution allow it to be adapted to systems in which the rate of failure increases with time (bearing wear), decreases with time (some semiconductors), or remains constant (failures produced by factors external to the system).
β is the shape parameter; it is equal to the slope of the regression line on Weibull plot paper.
δ is the scale parameter, or characteristic life (time): the life at which 63.2% of the units have failed.
x is the life of the product.
The cumulative Weibull distribution function is
F(x) = 1 − e^(−(x/δ)^β),  x > 0
The mean (life) and variance of the Weibull distribution are
μ = δ Γ(1 + 1/β)
σ² = δ² Γ(1 + 2/β) − δ² [Γ(1 + 1/β)]²
where the Gamma function, Γ, is defined as
Γ(r) = ∫[0 to ∞] x^(r−1) e^(−x) dx
Γ is tabulated in tables or can be readily determined in software such as Excel using the function GAMMA(argument).
Example: Bearings are tested to the failure cycle. According to the data gathered and plotted on probability paper, the slope of the regression line is β = 1.5, and 63.2% of the bearings will fail after δ = 5.7 × 10⁵ cycles.
a) Find the probability that a bearing will fail before 4 × 10⁵ cycles.
b) Find the mean life (in cycles) of this bearing.
c) Find the number of cycles at which 10% of the bearings will fail.
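A sketch of the bearing example using the Weibull cdf F(x) = 1 − exp(−(x/δ)^β) and mean μ = δΓ(1 + 1/β) from above; part c) inverts the cdf algebraically:

```python
import math

beta = 1.5      # shape parameter (slope on Weibull plot paper)
delta = 5.7e5   # scale parameter / characteristic life, cycles

def weibull_cdf(x):
    """Fraction of bearings failed by x cycles: F(x) = 1 - exp(-(x/delta)^beta)."""
    return 1 - math.exp(-((x / delta) ** beta))

p_a = weibull_cdf(4e5)                       # a) P(X < 4e5 cycles)
mean = delta * math.gamma(1 + 1 / beta)      # b) mean life, delta * Gamma(1 + 1/beta)
# c) solve F(x) = 0.10 for x:  x = delta * (-ln(0.90))^(1/beta)
x_10 = delta * (-math.log(0.90)) ** (1 / beta)
print(p_a, mean, x_10)
```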
3-13 Random Samples, Statistics, and the Central Limit Theorem
In statistics, data are defined as the observed values of random variables obtained from replicates of a random experiment. If the random variables that represent the observations of n replicates are X1, X2, ..., Xn, then, since the replicates are identical, each random variable follows the same distribution; additionally, the random variables are independent of each other. Consider now a large population of objects from which a subset of n items is randomly selected. If the total population has a given distribution, it follows that the randomly sampled items will also have that same distribution.
Thus, statistical analysis can be performed using information
a) measured directly from the entire population (true μ and σ), or
b) measured indirectly from sampling (X̄ and σ_X̄).
σ_X̄ is known as the standard error of the mean (S.E.M.) and is defined as
σ_X̄ = σ/√n
with σ being the standard deviation of the entire population and n the number of items sampled. Thus, for a sample of a population with a normal distribution,
Z = (X̄ − μ)/(σ/√n)
is a standard normal random variable.
Example: Assume that the weight of medium-size propane tanks follows a normal distribution. It is known that the mean weight of the entire population is μ = 35 lb with a standard deviation of σ = 3.6 lb. For a random sample of n = 34 tanks, determine:
a) What is the probability that the mean, X̄, of the sample is less than 33 lb?
b) What is the probability that the mean, X̄, of the sample is between 34 lb and 36.5 lb?
Example: Problem 207. For part b), use t = 1970 min instead of 2200 min.
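A sketch of the propane-tank example: standardize the sample mean with the standard error σ/√n, then read the standard normal cdf:

```python
import math
from statistics import NormalDist

mu, sigma, n = 35.0, 3.6, 34
sem = sigma / math.sqrt(n)   # standard error of the mean, sigma/sqrt(n)

Z = NormalDist()
p_a = Z.cdf((33.0 - mu) / sem)                             # a) P(Xbar < 33)
p_b = Z.cdf((36.5 - mu) / sem) - Z.cdf((34.0 - mu) / sem)  # b) P(34 < Xbar < 36.5)
print(round(sem, 4), p_a, p_b)
```

Note how small the standard error is compared with σ = 3.6 lb: averaging 34 tanks shrinks the spread of the mean by a factor of √34.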