Probability Distributions


Probability Distributions
CEE 201L. Uncertainty, Design, and Optimization
Department of Civil and Environmental Engineering, Duke University
Philip Scott Harvey, Henri P. Gavin and Jeffrey T. Scruggs
Spring 2016

Consider a continuous random variable (rv) X with support over the domain X. The probability density function (PDF) of X is the function f_X(x) such that for any two numbers a and b in the domain X, with a < b,

  P[a < X ≤ b] = ∫_a^b f_X(x) dx

For f_X(x) to be a proper distribution, it must satisfy the following two conditions:

1. The PDF f_X(x) is positive-valued; f_X(x) ≥ 0 for all values of x.
2. The rule of total probability holds; the total area under f_X(x) is 1: ∫_-∞^∞ f_X(x) dx = 1.

Alternately, X may be described by its cumulative distribution function (CDF). The CDF of X is the function F_X(x) that gives, for any specified number x, the probability that the random variable X is less than or equal to x, written as P[X ≤ x]. For real values of x, the CDF is defined by

  F_X(x) = P[X ≤ x] = ∫_-∞^x f_X(u) du

so,

  P[a < X ≤ b] = F_X(b) − F_X(a)

By the first fundamental theorem of calculus, the functions f_X(x) and F_X(x) are related as

  f_X(x) = (d/dx) F_X(x)
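These defining relations can be verified numerically. The sketch below (Python rather than the document's Matlab, assuming an exponential distribution with mean 2.0 purely for illustration) checks that P[a < X ≤ b] from the CDF matches the area under the PDF, and that f_X(x) is the derivative of F_X(x):

```python
import math

# Check of the PDF/CDF relations, assuming an exponential distribution
mu = 2.0
f = lambda x: math.exp(-x / mu) / mu      # PDF  f_X(x)
F = lambda x: 1.0 - math.exp(-x / mu)     # CDF  F_X(x)

# P[a < X <= b] from the CDF ...
a, b = 0.5, 3.0
p_cdf = F(b) - F(a)

# ... equals the area under the PDF (trapezoid rule)
n = 10000
h = (b - a) / n
p_int = sum(0.5 * (f(a + i*h) + f(a + (i+1)*h)) * h for i in range(n))

# and f_X(x) = dF_X/dx (central difference at x = 1)
dFdx = (F(1.0 + 1e-6) - F(1.0 - 1e-6)) / 2e-6

print(round(p_cdf, 6), round(p_int, 6), round(dFdx, 6))
```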

A few important characteristics of CDFs of X are:

1. CDFs, F_X(x), are monotonic non-decreasing functions of x.
2. For any number a, P[X > a] = 1 − P[X ≤ a] = 1 − F_X(a).
3. For any two numbers a and b with a < b, P[a < X ≤ b] = F_X(b) − F_X(a) = ∫_a^b f_X(x) dx.

2 Descriptors of random variables

The expected or mean value of a continuous random variable X with PDF f_X(x) is the centroid of the probability density,

  µ_X = E[X] = ∫_-∞^∞ x f_X(x) dx

The expected value of an arbitrary function of X, g(X), with respect to the PDF f_X(x) is

  µ_g(X) = E[g(X)] = ∫_-∞^∞ g(x) f_X(x) dx

The variance of a continuous rv X with PDF f_X(x) and mean µ_X gives a quantitative measure of how much spread or dispersion there is in the distribution of x values. The variance is calculated as

  σ_X² = V[X] = ∫_-∞^∞ (x − µ_X)² f_X(x) dx

The standard deviation (s.d.) of X is σ_X = √V[X]. The coefficient of variation (c.o.v.) of X is defined as the ratio of the standard deviation σ_X to the mean µ_X:

  c_X = σ_X / µ_X

for non-zero mean. The c.o.v. is a normalized measure of dispersion (dimensionless).

A mode of a probability density function, f_X(x), is a value of x such that the PDF is maximized;

  (d/dx) f_X(x) |_(x = x_mode) = 0.

The median value, x_m, is the value of x such that

  P[X ≤ x_m] = P[X > x_m] = F_X(x_m) = 1 − F_X(x_m) = 0.5.
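The descriptor integrals above can be evaluated numerically. This Python sketch assumes an exponential PDF with µ = 3 (for which the closed forms are µ_X = µ, σ_X² = µ², c_X = 1, and median x_m = µ ln 2):

```python
import math

# Descriptors of an assumed exponential rv, computed from the PDF integrals
mu = 3.0
f = lambda x: math.exp(-x / mu) / mu

n, hi = 200000, 40.0 * mu      # truncate the upper integration limit at 40*mu
h = hi / n
mean = 0.0
for i in range(1, n):
    x = i * h
    mean += h * x * f(x)       # mu_X = integral of x f(x)
var = 0.0
for i in range(1, n):
    x = i * h
    var += h * (x - mean)**2 * f(x)   # sigma_X^2 = integral of (x-mu)^2 f(x)

cov = math.sqrt(var) / mean    # coefficient of variation
x_m = mu * math.log(2.0)       # median, from F_X(x_m) = 0.5

print(round(mean, 3), round(var, 3), round(cov, 3), round(x_m, 3))
```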

3 Some common distributions

A few commonly-used probability distributions are described at the end of this document: the uniform, triangular, exponential, normal, and log-normal distributions. For each of these distributions, this document provides figures and equations for the PDF and CDF, equations for the mean and variance, the names of Matlab functions to generate samples, and empirical distributions of such samples.

3.1 The Normal distribution

The Normal (or Gaussian) distribution is perhaps the most commonly used distribution function. The notation X ∼ N(µ_X, σ_X²) denotes that X is a normal random variable with mean µ_X and variance σ_X². The standard normal random variable, Z, or z-statistic, is distributed as N(0, 1). The probability density function of a standard normal random variable is so widely used it has its own special symbol, φ(z),

  φ(z) = (1/√(2π)) exp(−z²/2)

Any normally distributed random variable can be defined in terms of the standard normal random variable, through the change of variables X = µ_X + σ_X Z. If X is normally distributed, it has the PDF

  f_X(x) = (1/σ_X) φ((x − µ_X)/σ_X) = (1/√(2πσ_X²)) exp(−(x − µ_X)²/(2σ_X²))

There is no closed-form equation for the CDF of a normal random variable. Solving the integral

  Φ(z) = (1/√(2π)) ∫_-∞^z e^(−u²/2) du

would make you famous. Try it. The CDF of a normal random variable is expressed in terms of the error function, erf(z). If X is normally distributed, P[X ≤ x] can be found from the standard normal CDF

  P[X ≤ x] = F_X(x) = Φ((x − µ_X)/σ_X).

Values for Φ(z) are tabulated and can be computed, e.g., with the Matlab command Prob_le_x = normcdf(x,muX,sigX). The standard normal PDF is symmetric about z = 0, so φ(−z) = φ(z), Φ(−z) = 1 − Φ(z), and

  P[X > x] = 1 − F_X(x) = 1 − Φ((x − µ_X)/σ_X) = Φ((µ_X − x)/σ_X).

The linear combination of two independent normal rv's X_1 and X_2 (with means µ_1 and µ_2 and variances σ_1² and σ_2²) is also normally distributed,

  a X_1 + b X_2 ∼ N(a µ_1 + b µ_2, a²σ_1² + b²σ_2²),

and more specifically,

  a X_1 − b ∼ N(a µ_1 − b, a²σ_1²).
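The erf-based evaluation of Φ(z) can be sketched in a few lines of Python; the pair below mirrors the normcdf/norminv idea (the bisection inverse is for illustration only, not Matlab's actual norminv algorithm, and the numbers are assumed for the example):

```python
import math

# Standard normal CDF via the error function
def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative inverse: bisection on the monotonic CDF
def Phi_inv(p):
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

muX, sigX = 6.0, 2.0
p = Phi((7.0 - muX) / sigX)        # P[X <= 7] for X ~ N(6, 2^2)
z = Phi_inv(p)                     # recovers z = (7 - 6)/2 = 0.5

print(round(p, 4), round(z, 6), round(Phi(-0.5) + Phi(0.5), 6))
```

The last printed value checks the symmetry property Φ(−z) = 1 − Φ(z).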

Given the probability of a normal rv, i.e., given P[X ≤ x], the associated value of x can be found from the inverse standard normal CDF,

  (x − µ_X)/σ_X = z = Φ⁻¹(P[X ≤ x]).

Values of the inverse standard normal CDF are tabulated, and can be computed, e.g., with the Matlab command x = norminv(Prob_le_x,muX,sigX).

3.2 The Log-Normal distribution

The Normal distribution is symmetric and can be used to describe random variables that can take positive as well as negative values, regardless of the value of the mean and standard deviation. For many random quantities a negative value makes no sense (e.g., modulus of elasticity, air pressure, and distance). Using a distribution which admits only positive values for such quantities eliminates any possibility of non-sensical negative values. The log-normal distribution is such a distribution.

If ln X is normally distributed (i.e., ln X ∼ N(µ_lnX, σ_lnX²)) then X is called a log-normal random variable. In other words, if Y (= ln X) is normally distributed, e^Y (= X) is log-normally distributed. With µ_Y = µ_lnX and σ_Y² = σ_lnX²,

  P[Y ≤ y] = P[ln X ≤ ln x] = P[X ≤ x]
  F_Y(y) = F_lnX(ln x) = F_X(x) = Φ((y − µ_Y)/σ_Y) = Φ((ln x − µ_lnX)/σ_lnX)

The mean and standard deviation of a log-normal variable X are related to the mean and standard deviation of ln X:

  µ_lnX = ln µ_X − (1/2) σ_lnX²
  σ_lnX² = ln(1 + (σ_X/µ_X)²)

If (σ_X/µ_X) < 0.3, σ_lnX ≈ (σ_X/µ_X) = c_X.

The median, x_m, is a useful parameter of log-normal rv's. By definition of the median value, half of the population lies above the median, and half lies below, so

  Φ((ln x_m − µ_lnX)/σ_lnX) = 0.5
  (ln x_m − µ_lnX)/σ_lnX = Φ⁻¹(0.5) = 0

and,

  ln x_m = µ_lnX,  x_m = exp(µ_lnX),  µ_X = x_m √(1 + c_X²)

For the log-normal distribution mode < median < mean. If c_X < 0.15, median ≈ mean.

If ln X is normally distributed (X is log-normal) then (for c_X < 0.3)

  P[X ≤ x] ≈ Φ((ln x − ln x_m)/c_X)

If ln X ∼ N(µ_lnX, σ_lnX²), and ln Y ∼ N(µ_lnY, σ_lnY²), and Z = a Xⁿ / Yᵐ, then

  ln Z = ln a + n ln X − m ln Y ∼ N(µ_lnZ, σ_lnZ²)

where

  µ_lnZ = ln a + n µ_lnX − m µ_lnY = ln a + n ln x_m − m ln y_m

and

  σ_lnZ² = (n σ_lnX)² + (m σ_lnY)² = n² ln(1 + c_X²) + m² ln(1 + c_Y²) = ln(1 + c_Z²)
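The parameter relations above form a consistent round trip, which can be checked numerically. This Python sketch assumes a mean of 36.4 and standard deviation of 5.4 (roughly the steel-strength example of Section 6):

```python
import math

# Log-normal parameter relations: (mu_X, sigma_X) <-> (mu_lnX, sigma_lnX, x_m)
muX, sigX = 36.4, 5.4
cX = sigX / muX                              # c.o.v.

sig2_lnX = math.log(1.0 + cX**2)             # sigma_lnX^2 = ln(1 + c_X^2)
sig_lnX = math.sqrt(sig2_lnX)
mu_lnX = math.log(muX) - 0.5 * sig2_lnX      # mu_lnX = ln(mu_X) - sigma_lnX^2 / 2
x_m = math.exp(mu_lnX)                       # median x_m = exp(mu_lnX)

# round trip: mu_X = x_m * sqrt(1 + c_X^2)
muX_back = x_m * math.sqrt(1.0 + cX**2)

# for c.o.v. < 0.3, sigma_lnX is approximately c_X
print(round(x_m, 3), round(muX_back, 3), round(sig_lnX, 4), round(cX, 4))
```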

Uniform U[a, b]: support x ∈ [a, b]

  f_X(x) = 1/(b − a) for x ∈ [a, b]; 0 otherwise
  F_X(x) = 0 for x < a; (x − a)/(b − a) for x ∈ [a, b]; 1 for x > b
  µ_X = (a + b)/2
  σ_X² = (b − a)²/12
  Matlab: x = a + (b-a).*rand(1,N);

Triangular T(a, b, c): support x ∈ [a, b], mode c, a ≤ c ≤ b

  f_X(x) = 2(x − a)/((b − a)(c − a)) for x ∈ [a, c]; 2(b − x)/((b − a)(b − c)) for x ∈ [c, b]; 0 otherwise
  F_X(x) = (x − a)²/((b − a)(c − a)) for x ∈ [a, c]; 1 − (b − x)²/((b − a)(b − c)) for x ∈ [c, b]
  µ_X = (a + b + c)/3
  σ_X² = (a² + b² + c² − ab − ac − bc)/18
  Matlab: x = triangular_rnd(a,b,c,1,N);

(Figures on this page plot the PDF and CDF of each distribution, and the empirical PDF and CDF of a random sample from each.)
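The triangular moments tabulated above can be confirmed by integrating the PDF directly (a Python sketch with assumed parameters a = 1, b = 5, c = 2):

```python
import math

# Numerical check of the triangular-distribution mean and variance
a, b, c = 1.0, 5.0, 2.0

def f(x):  # triangular PDF
    if a <= x <= c:
        return 2.0*(x - a)/((b - a)*(c - a))
    if c < x <= b:
        return 2.0*(b - x)/((b - a)*(b - c))
    return 0.0

n = 200000
h = (b - a)/n
mean = sum(h * (a + i*h) * f(a + i*h) for i in range(1, n))
ex2 = sum(h * (a + i*h)**2 * f(a + i*h) for i in range(1, n))
var = ex2 - mean**2

print(round(mean, 4), round((a + b + c)/3.0, 4))
print(round(var, 4), round((a**2 + b**2 + c**2 - a*b - a*c - b*c)/18.0, 4))
```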

Normal N(µ_X, σ_X²): support x ∈ R, µ_X ∈ R, σ_X > 0

  f_X(x) = (1/√(2πσ_X²)) exp(−(x − µ_X)²/(2σ_X²))
  F_X(x) = (1/2)[1 + erf((x − µ_X)/√(2σ_X²))]
  mean µ_X, variance σ_X²
  Matlab: x = muX + sigX*randn(1,N);

Log-Normal, ln X ∼ N(µ_lnX, σ_lnX²): support x > 0

  f_X(x) = (1/(x√(2πσ_lnX²))) exp(−(ln x − µ_lnX)²/(2σ_lnX²))
  F_X(x) = (1/2)[1 + erf((ln x − µ_lnX)/√(2σ_lnX²))]
  µ_X = x_m √(1 + c_X²)
  σ_X² = x_m² c_X² (1 + c_X²)
  Matlab: x = logn_rnd(medX,cvX,1,N);

(Figures on this page plot the PDF and CDF of each distribution, with the normal CDF values 0.023, 0.159, 0.5, 0.841, and 0.977 marked at µ−2σ, µ−σ, µ, µ+σ, and µ+2σ, and the empirical PDF and CDF of a random sample from each.)

Exponential E(µ): support x > 0, µ > 0

  f_X(x) = (1/µ) exp(−x/µ)
  F_X(x) = 1 − exp(−x/µ)
  µ_X = µ
  σ_X² = µ²
  Matlab: x = exp_rnd(mu,1,N);

Rayleigh R(m): support x > 0, mode m > 0

  f_X(x) = (x/m²) exp(−(1/2)(x/m)²)
  F_X(x) = 1 − exp(−(1/2)(x/m)²)
  µ_X = m √(π/2)
  σ_X² = m² (4 − π)/2
  Matlab: x = rayleigh_rnd(mode,1,N);

Laplace L(µ, σ²): support x ∈ R, µ ∈ R, σ > 0

  f_X(x) = (1/(σ√2)) exp(−√2 |x − µ|/σ)
  F_X(x) = (1/2) exp(√2 (x − µ)/σ) for x < µ; 1 − (1/2) exp(−√2 (x − µ)/σ) for x ≥ µ
  µ_X = µ
  σ_X² = σ²
  Matlab: x = laplace_rnd(mu,sigma,1,N);

(Figures on this page plot the PDF and CDF of each distribution, and the empirical PDF and CDF of a random sample from each.)
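The Rayleigh moments listed above are less familiar than the others and are easy to confirm by integrating the PDF (a Python sketch, assuming mode m = 2):

```python
import math

# Numerical check: for Rayleigh mode m, mu_X = m*sqrt(pi/2),
# sigma_X^2 = m^2*(4 - pi)/2
m = 2.0
f = lambda x: (x / m**2) * math.exp(-0.5 * (x / m)**2)

n, hi = 100000, 12.0 * m       # truncate the upper limit at 12*m
h = hi / n
mean = 0.0
ex2 = 0.0
for i in range(1, n):
    x = i * h
    mean += h * x * f(x)
    ex2 += h * x * x * f(x)
var = ex2 - mean**2

print(round(mean, 4), round(m * math.sqrt(math.pi / 2.0), 4))
print(round(var, 4), round(m**2 * (4.0 - math.pi) / 2.0, 4))
```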

4 Sums and Differences of Independent Normal Random Variables

Consider two normally-distributed random variables, X ∼ N(µ_X, σ_X²) and Y ∼ N(µ_Y, σ_Y²). Any weighted sum of normal random variables is also normally-distributed. For

  Z = aX − bY,
  Z ∼ N(aµ_X − bµ_Y, (aσ_X)² + (bσ_Y)²)
  µ_Z = aµ_X − bµ_Y
  σ_Z² = (aσ_X)² + (bσ_Y)²

5 Products and Quotients of Independent Log-Normal Random Variables

Consider two log-normally-distributed random variables, ln X ∼ N(µ_lnX, σ_lnX²) and ln Y ∼ N(µ_lnY, σ_lnY²). Any product or quotient of log-normal random variables is also log-normally distributed. For

  Z = X/Y,
  ln Z ∼ N(µ_lnX − µ_lnY, σ_lnX² + σ_lnY²)
  µ_lnZ = µ_lnX − µ_lnY
  σ_lnZ² = σ_lnX² + σ_lnY²
  c_Z² = c_X² + c_Y² + c_X² c_Y²

6 Examples

1. The strength, S, of a particular grade of steel is log-normally distributed with median 36 ksi and c.o.v. of 0.15. What is the probability that the strength of a particular sample is greater than 40 ksi?

  P[S > 40] = 1 − P[S ≤ 40] = 1 − Φ((ln 40 − ln 36)/0.15) = 1 − Φ((3.689 − 3.583)/0.15)
            = 1 − Φ(0.702) = 1 − 0.759 = 0.241

(The figure for this example plots the PDF of the steel strength, with s_mode = 35.2 ksi, s_median = 36.0 ksi, s_mean = 36.4 ksi, and the shaded area P[S > 40].)
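Example 1 can be re-computed in a few lines (Python rather than a Φ table; σ_lnS ≈ c_S is used, as in the worked solution):

```python
import math

# Example 1: S log-normal, median 36 ksi, c.o.v. 0.15; find P[S > 40]
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

s_m, c_S = 36.0, 0.15
z = (math.log(40.0) - math.log(s_m)) / c_S
P_gt_40 = 1.0 - Phi(z)

print(round(z, 3), round(P_gt_40, 3))   # 0.702 and 0.241
```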

2. Highway truck weights in Michigan, W, are assumed to be normally distributed with mean 100 k and standard deviation 40 k. The load capacity of bridges in Michigan, R, is also assumed to be normally distributed with mean 200 k and standard deviation 30 k. What is the probability of a truck exceeding a bridge load rating?

Let E = W − R. If E > 0 the truck weight exceeds the bridge capacity.

  µ_E = µ_W − µ_R = 100 − 200 = −100 k
  σ_E = √(40² + 30²) = 50 k
  P[E > 0] = 1 − P[E ≤ 0] = 1 − Φ((0 + 100)/50) = 1 − Φ(2) = 1 − 0.977 = 0.023

3. Windows in the Cape Hatteras Lighthouse can withstand wind pressures of R. R is log-normal with median of 40 psf and coefficient of variation of 0.25. The peak wind pressure during a hurricane, P, in psf, is given by the equation P = 1.165×10⁻³ C V², where C is a log-normal coefficient with median of 1.8 and coefficient of variation of 0.20 and V is the wind speed with median 100 fps and coefficient of variation of 0.30. What is the probability of the wind pressure exceeding the strength of the window?

The peak wind pressure is also log-normal:

  ln P = ln(1.165×10⁻³) + ln C + 2 ln V
  µ_lnP = ln(1.165×10⁻³) + ln(1.8) + 2 ln(100) = 3.043
  σ_lnP² = ln(1 + 0.2²) + 2² ln(1 + 0.3²) = 0.384 ... σ_lnP = 0.620

The wind pressure exceeds the resistance if P/R > 1 (that is, if ln P − ln R > 0). With E = P/R,

  µ_lnE = µ_lnP − µ_lnR = 3.043 − ln(40) = −0.646
  σ_lnE² = 0.384 + ln(1 + 0.25²) = 0.445 ... σ_lnE = 0.667

The probability of the wind load exceeding the resistance of the glass is

  P[E > 1] = 1 − P[ln E ≤ 0] = 1 − Φ((0 + 0.646)/0.667) = 1 − Φ(0.969) = 0.166.

4. Earthquakes with M > 6 shake the ground at a building site randomly. The peak ground acceleration (PGA) is log-normally distributed with median of 0.2 g and a coefficient of variation of 0.25. Assume that the building will sustain no damage for ground motion shaking up to 0.3 g. What is the probability of damage from an earthquake of M > 6?

  P[D | M > 6] = P[PGA > 0.3] = 1 − P[PGA ≤ 0.3] = 1 − Φ((ln 0.3 − ln 0.2)/0.25) = 1 − Φ(1.62) = 1 − 0.947 = 0.053.

There have been two earthquakes with M > 6 in the last 50 years. What is the probability of no damage from earthquakes with M > 6 in the next 20 years?

From the law of total probability, with P[D | M > 6] = 0.053 and P[no D | M > 6] = 0.947,

  P[no D in 20 yr] = P[no D | 0 EQ M>6 in 20 yr] P[0 EQ M>6 in 20 yr]
                   + P[no D | 1 EQ M>6 in 20 yr] P[1 EQ M>6 in 20 yr]
                   + P[no D | 2 EQ M>6 in 20 yr] P[2 EQ M>6 in 20 yr]
                   + P[no D | 3 EQ M>6 in 20 yr] P[3 EQ M>6 in 20 yr] + ...

where P[no D | n EQ M>6] = (P[no D | 1 EQ M>6])ⁿ (assuming damage from an earthquake does not weaken the building ...). Modeling the number of earthquakes in 20 years as Poisson with expected value 20 × (2/50) = 0.8,

  P[no D in 20 yr] = Σ_(n=0)^∞ (0.947)ⁿ (0.8)ⁿ exp(−0.8)/n!
                   = exp(−0.8) [1 + 0.947(0.8) + (0.947)²(0.8)²/2! + (0.947)³(0.8)³/3! + ...]
                   = exp(−0.8) exp(0.947 × 0.8) = 0.958

The probability of damage from earthquakes in the next 20 years (given the assumptions in this example) is close to 4%. Would that be an acceptable level of risk for you?

7 Empirical PDFs, CDFs, and exceedance rates (nonparametric statistics)

The PDF and CDF of a sample of random data can be computed directly from the sample, without assuming any particular probability distribution (such as a normal, exponential, or other kind of distribution). A random sample of N data points can be sorted into increasing numerical order, so that

  x_1 ≤ x_2 ≤ ... ≤ x_(i−1) ≤ x_i ≤ x_(i+1) ≤ ... ≤ x_(N−1) ≤ x_N.

In the sorted sample there are i data points less than or equal to x_i. So, if the sample is representative of the population, and the sample is big enough, the probability that a random X is less than or equal to the i-th sorted value is i/N; in other words, P[X ≤ x_i] = i/N. Unless we know that no value of X can exceed x_N, we must accept some probability that X > x_N. So P[X ≤ x_N] should be less than 1, and in such cases we can write P[X ≤ x_i] = (i − 1/2)/N. The empirical CDF computed from a sorted sample of N values is

  F̂_X(x_i) = i/N ... or ... F̂_X(x_i) = (i − 1/2)/N

The empirical PDF is basically a histogram of the data. The following Matlab lines plot empirical CDFs and PDFs from a vector of random data, x.

  N = length(x);                       % number of values in the sample
  nBins = floor(N/50);                 % number of bins in the histogram
  [fx,xx] = hist(x,nBins);             % compute the histogram
  fx = fx/N * nBins/(max(x)-min(x));   % scale the histogram to a PDF
  F_x = ([1:N]-0.5)/N;                 % empirical CDF
  subplot(211); bar(xx,fx);            % plot empirical PDF
  subplot(212); plot(sort(x),F_x);     % plot empirical CDF
  probability_of_failure = sum(x>0)/N  % probability that x > 0

The number of values in the sample greater than x_i is (N − i). If the sample is representative, the probability of a value exceeding x_i is P[X > x_i] = 1 − F_X(x_i) ≈ 1 − i/N. If the N samples were collected over a period of time T, the average exceedance rate (number of events greater than x_i per unit time) is

  ν(x_i) = N(1 − F_X(x_i))/T ≈ N(1 − i/N)/T = (N − i)/T.

8 Random variable generation using the Inverse CDF method

A sample of a random variable X having virtually any type of CDF, P[X ≤ x] = P = F_X(x), can be generated from a sample of a uniformly-distributed random variable, U (0 < U < 1), as long as the inverse CDF, x = F_X⁻¹(P), can be computed. There are many numerical methods for generating a sample of uniformly-distributed random numbers. It is important to be aware that samples from some methods are more random than samples from others. The Matlab command u = rand(1,N) computes a (row) vector sample of N uniformly-distributed random numbers with 0 < u < 1.

If X is a continuous rv with CDF F_X(x) and U has a uniform distribution on (0, 1), then the random variable F_X⁻¹(U) has the distribution F_X. Thus, in order to generate a sample of data distributed according to the CDF F_X, it suffices to generate a sample, u, of the rv U ∼ U[0, 1] and then make the transformation x = F_X⁻¹(u). For example, if X is exponentially-distributed, the CDF of X is given by F_X(x) = 1 − e^(−x/µ), so F_X⁻¹(u) = −µ ln(1 − u).
Therefore if u is a value from a uniformly-distributed rv in [0, 1], then

  x = −µ ln(u)

is a value from an exponentially distributed random variable. (If U is uniformly distributed in [0,1] then so is 1 − U.) As another example, if X is log-normally distributed, the CDF of X is

  F_X(x) = Φ((ln x − ln x_m)/σ_lnX).

If u is a sample from a standard uniform distribution, then

  x = exp[ln x_m + Φ⁻¹(u) σ_lnX]

is a sample from a log-normal distribution. Note that since closed-form expressions for Φ(z) and Φ⁻¹(P) do not exist, the generation of normally-distributed random variables requires other numerical methods; x = muX + sigX*randn(1,N) computes a (row) vector sample of N normally-distributed random numbers.
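The exponential case of the inverse-CDF method can be sketched directly (Python rather than Matlab, with an assumed mean µ = 2.0); the sample mean and the empirical fraction below µ are then compared with their exact values µ and 1 − 1/e ≈ 0.632:

```python
import math
import random

# Inverse-CDF sampling of an exponential rv: x = -mu * ln(u)
random.seed(1)
mu, N = 2.0, 200000

xs = [-mu * math.log(1.0 - random.random()) for _ in range(N)]  # u in (0, 1]

sample_mean = sum(xs) / N                    # should be close to mu
frac_le_mu = sum(x <= mu for x in xs) / N    # should be close to 1 - 1/e

print(round(sample_mean, 2), round(frac_le_mu, 2))
```

Using 1.0 - random.random() keeps u in (0, 1], avoiding ln(0).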

Figure 1. Examples of the generation of uniform random variables from the inverse CDF method.

Figure 2. Examples of the generation of random variables from the inverse CDF method. The density of the horizontal arrows, u, is uniform, whereas the density of the vertical arrows, x = F_X⁻¹(u), is proportional to (d/dx) F_X(x), that is, proportional to f_X(x).

9 Functions of Random Variables and Monte Carlo Simulation

The probability distribution of virtually any function of random variables can be computed using the powerful method of Monte Carlo Simulation (MCS). MCS involves computing values of functions with large samples of random variables. For example, consider a function of three random variables, X_1, X_2, and X_3, where X_1 is normally distributed with mean of 6 and standard deviation of 2, X_2 is log-normally distributed with median of 2 and coefficient of variation of 0.3, and X_3 is Rayleigh distributed with mode of 2. The function

  Y = sin(X_1) + √X_2 − exp(−X_3) − 2

is a function of these three random variables and is therefore also random. The distribution function and statistics of Y may be difficult to derive analytically, especially if the function Y = g(X) is complicated. This is where MCS is powerful. Given samples of N values of X_1, X_2 and X_3, a sample of N values of Y can also be computed. The statistics of Y (mean, variance, PDF, and CDF) can be estimated by computing the average value, sample variance, histogram, and empirical CDF of the sample of Y. The probability P[Y > 0] can be estimated by counting the number of positive values in the sample and dividing by N. The Matlab command P_Y_gt_0 = sum(Y>0)/N may be used to estimate this probability.

Monte Carlo Simulation in Matlab

  % MCS_intro.m
  % Monte Carlo Simulation ... an introductory example
  %
  % Y = g(X1,X2,X3) = sin(X1) + sqrt(X2) - exp(-X3) - 2;
  %
  % H.P. Gavin, Dept. Civil and Environmental Engineering, Duke Univ, Jan. 2012

  %  X1          X2           X3
  %  normal      lognormal    Rayleigh
  mu1 = 6;      med2 = 2;    mod3 = 2;
  sd1 = 2;      cv2 = 0.3;

  N_MCS = 10000;                 % number of random values in the sample

  % (1) generate a large sample for each random variable in the problem ...
  X1 = mu1 + sd1*randn(1,N_MCS);
  X2 = logn_rnd(med2,cv2,1,N_MCS);
  X3 = Rayleigh_rnd(mod3,1,N_MCS);

  % (2) evaluate the function for each random variable to compute a new sample
  Y = sin(X1) + sqrt(X2) - exp(-X3) - 2;

  % suppose probability of failure is Prob[ g(X1,X2,X3) > 0 ] ...
  Probability_of_failure = sum(Y>0)/N_MCS

  % (3) plot histograms of the random variables
  sort_X1 = sort(X1);
  sort_X2 = sort(X2);
  sort_X3 = sort(X3);
  CDF = ([1:N_MCS]-0.5)/N_MCS;   % empirical CDF of all quantities

Figure 3. Analytical and empirical PDFs and CDFs for X_1, X_2, and X_3, and the empirical PDF and CDF for Y = g(X_1, X_2, X_3).

  nbins = floor(N_MCS/20);
  figure(1)
  clf
  subplot(231)
  [fx,xx] = hist(X1,nbins);                  % histogram of X1
  fx = fx/N_MCS * nbins/(max(X1)-min(X1));   % scale histogram to PDF
  hold on
  plot(sort_X1, normpdf(sort_X1,mu1,sd1), '-k', 'LineWidth',3);
  stairs(xx,fx, 'LineWidth',1)
  hold off
  ylabel('P.D.F.')
  subplot(234)
  hold on
  plot(sort_X1, normcdf(sort_X1,mu1,sd1), '-k', 'LineWidth',3);
  stairs(sort_X1,CDF, 'LineWidth',1)
  hold off
  ylabel('C.D.F.')
  xlabel('X_1 : normal')
  subplot(232)
  [fx,xx] = hist(X2,nbins);                  % histogram of X2
  fx = fx/N_MCS * nbins/(max(X2)-min(X2));   % scale histogram to PDF
  hold on
  plot(sort_X2, logn_pdf(sort_X2,med2,cv2), '-k', 'LineWidth',3);
  stairs(xx,fx, 'LineWidth',1)
  hold off
  subplot(235)
  hold on
  plot(sort_X2, logn_cdf(sort_X2,[med2,cv2]), '-k', 'LineWidth',3);
  stairs(sort_X2,CDF, 'LineWidth',1)
  hold off
  xlabel('X_2 : log-normal')
  subplot(233)
  [fx,xx] = hist(X3,nbins);                  % histogram of X3
  fx = fx/N_MCS * nbins/(max(X3)-min(X3));   % scale histogram to PDF
  hold on
  plot(sort_X3, Rayleigh_pdf(sort_X3,mod3), '-k', 'LineWidth',3);
  stairs(xx,fx, 'LineWidth',1)
  hold off
  subplot(236)
  hold on
  plot(sort_X3, Rayleigh_cdf(sort_X3,mod3), '-k', 'LineWidth',2);
  stairs(sort_X3,CDF, 'LineWidth',1)
  hold off
  xlabel('X_3 : Rayleigh')
  figure(2)
  clf
  subplot(211)
  [fY,yy] = hist(Y,nbins);                   % histogram of Y
  fY = fY/N_MCS * nbins/(max(Y)-min(Y));     % scale histogram to PDF
  hold on
  stairs(yy,fY, 'LineWidth',2)
  plot([0 0],[0 0.5], '-k');
  hold off
  text(0.1,0.1,'P_F')
  ylabel('P.D.F.')
  subplot(212)
  hold on
  stairs(sort(Y),CDF, 'LineWidth',2)
  plot([0 0],[0 1], '-k');
  hold off
  text(0.5,0.5,'Y > 0')
  ylabel('C.D.F.')
  xlabel('Y = g(X_1,X_2,X_3)');
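The simulation part of the listing above can also be sketched compactly in Python; here the helper samplers logn_rnd and Rayleigh_rnd (assumed by the Matlab listing) are replaced with a change of variables for the log-normal and the inverse-CDF relation for the Rayleigh:

```python
import math
import random

# Monte Carlo estimate of P[Y > 0] for Y = sin(X1) + sqrt(X2) - exp(-X3) - 2
random.seed(2)
N_MCS = 200000
mu1, sd1 = 6.0, 2.0        # X1 ~ normal (mean, s.d.)
med2, cv2 = 2.0, 0.3       # X2 ~ log-normal (median, c.o.v.)
mod3 = 2.0                 # X3 ~ Rayleigh (mode)

sig_ln2 = math.sqrt(math.log(1.0 + cv2**2))   # sigma_lnX2 from the c.o.v.

Y = []
for _ in range(N_MCS):
    x1 = random.gauss(mu1, sd1)
    x2 = math.exp(math.log(med2) + sig_ln2 * random.gauss(0.0, 1.0))
    u = 1.0 - random.random()                  # u in (0, 1]
    x3 = mod3 * math.sqrt(-2.0 * math.log(u))  # Rayleigh via inverse CDF
    Y.append(math.sin(x1) + math.sqrt(x2) - math.exp(-x3) - 2.0)

probability_of_failure = sum(y > 0.0 for y in Y) / N_MCS
print(round(probability_of_failure, 3))
```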