continuous rv
Let $X$ be a continuous rv. Then a probability distribution or probability density function (pdf) of $X$ is a function $f(x)$ such that for any two numbers $a$ and $b$ with $a \le b$,
$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$
Note that for a legitimate pdf, we have $f(x) \ge 0$ for all $x$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$. For a continuous rv, $P(X = c) = \int_c^c f(x)\,dx = 0$, hence $P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b)$.
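example: The waiting time $X$ (in minutes) for bus route 4 has the pdf given by
$$f(x) = \begin{cases} \dfrac{1}{10}, & 0 \le x \le 10 \\ 0, & \text{otherwise.} \end{cases}$$
Then the probability of waiting between 2 and 5 minutes is
$$P(2 \le X \le 5) = \int_2^5 \frac{1}{10}\,dx = \left.\frac{x}{10}\right|_{x=2}^{x=5} = 0.3.$$
A continuous rv $X$ is said to have a uniform distribution on the interval $[A, B]$ if the pdf of $X$ is
$$f(x; A, B) = \begin{cases} \dfrac{1}{B - A}, & A \le x \le B \\ 0, & \text{otherwise.} \end{cases}$$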
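The bus example can be checked numerically. The sketch below uses base R's `dunif`, `punif`, and `integrate`; the variable names are illustrative only.

```r
# Waiting time X ~ Uniform(0, 10); compute P(2 <= X <= 5) two ways.
f <- function(x) dunif(x, min = 0, max = 10)   # pdf, equal to 1/10 on [0, 10]

# 1) Integrate the pdf over [2, 5].
p_integral <- integrate(f, lower = 2, upper = 5)$value

# 2) Difference of cdf values.
p_cdf <- punif(5, min = 0, max = 10) - punif(2, min = 0, max = 10)

print(c(p_integral, p_cdf))   # both equal 0.3
```

Both routes give the same answer because the cdf is the integral of the pdf.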
The cdf $F(x)$ of a continuous rv $X$ is defined for any number $x$ by
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy.$$
Use the cdf to compute probabilities: $P(X < a) = F(a)$, $P(X > a) = 1 - F(a)$, $P(a \le X \le b) = F(b) - F(a)$.
Obtain $f(x)$ from $F(x)$:
proposition: If $X$ is a continuous rv with pdf $f(x)$ and cdf $F(x)$, then at every $x$ at which the derivative $F'(x)$ exists, $F'(x) = f(x)$.
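example
For $X$ with a uniform density $f(x) = \dfrac{1}{B - A}$, $A \le x \le B$, we have
$$F(x) = \int_{-\infty}^{x} f(y)\,dy = \int_A^x \frac{1}{B - A}\,dy = \left.\frac{y}{B - A}\right|_{y=A}^{y=x} = \frac{x - A}{B - A}, \quad A \le x \le B.$$
Hence
$$F(x) = \begin{cases} 0, & x < A \\ \dfrac{x - A}{B - A}, & A \le x < B \\ 1, & x \ge B. \end{cases}$$
Obtain $f(x)$ from $F(x)$:
$$F'(x) = \frac{d}{dx}\left(\frac{x - A}{B - A}\right) = \frac{1}{B - A} = f(x), \quad \text{for } A < x < B,$$
and $f(x) = 0$ otherwise.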
exercise
Suppose a continuous rv $X$ has pdf $f(x) = 2x$, $0 \le x \le 1$. Find $F(x)$, $P(X \le 0.8)$, and $P(0.5 \le X \le 1)$.
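One way to check an answer to this exercise is sketched below in base R. It reads the first probability as $P(X \le 0.8)$, which is an assumption about the inequality direction. Integrating $2y$ from 0 to $x$ gives $F(x) = x^2$ on $[0, 1]$. The function names are illustrative.

```r
f   <- function(x) 2 * x    # pdf on [0, 1]
cdf <- function(x) x^2      # cdf on [0, 1], from integrating 2y over [0, x]

p1 <- cdf(0.8)              # P(X <= 0.8)      = 0.64
p2 <- cdf(1) - cdf(0.5)     # P(0.5 <= X <= 1) = 0.75

# Cross-check the second probability by integrating the pdf directly.
p2_check <- integrate(f, lower = 0.5, upper = 1)$value

print(c(p1, p2, p2_check))
```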
percentiles
Let $p$ be a number between 0 and 1. The $(100p)$th percentile of the distribution of a continuous rv, denoted by $\eta(p)$, is defined by
$$p = F[\eta(p)] = \int_{-\infty}^{\eta(p)} f(y)\,dy.$$
The median of a continuous distribution, denoted by $\tilde{\mu}$, is the 50th percentile.
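When the cdf cannot be inverted by hand, the defining equation $p = F[\eta(p)]$ can be solved numerically. The sketch below uses base R's `uniroot`. For illustration it assumes the pdf $f(x) = 2x$ on $[0, 1]$, whose cdf is $F(x) = x^2$, so the exact percentile is $\sqrt{p}$. The helper name `percentile` is hypothetical.

```r
# Illustrative cdf (an assumption for this sketch): F(x) = x^2 on [0, 1].
cdf <- function(x) x^2

# (100p)th percentile: solve cdf(eta) = p for eta in [0, 1].
percentile <- function(p) {
  uniroot(function(x) cdf(x) - p, lower = 0, upper = 1, tol = 1e-10)$root
}

median_x <- percentile(0.5)   # analytically sqrt(0.5)
p90      <- percentile(0.9)   # analytically sqrt(0.9)
print(c(median_x, p90))
```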
expected value
The expected value or the mean of a continuous rv $X$ with pdf $f(x)$ is
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$
Expected value of $h(X)$:
$$E(h(X)) = \int_{-\infty}^{\infty} h(x) f(x)\,dx.$$
The variance of a continuous rv $X$ with pdf $f(x)$ and mean $\mu$ is
$$\sigma_X^2 = V(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E(X^2) - \mu^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2.$$
The standard deviation of $X$ is $\sigma_X = \sqrt{V(X)}$.
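example
The pdf of weekly gravel sales $X$ was
$$f(x) = \begin{cases} \dfrac{3}{2}(1 - x^2), & 0 \le x \le 1 \\ 0, & \text{otherwise.} \end{cases}$$
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 x \cdot \frac{3}{2}(1 - x^2)\,dx = \frac{3}{2}\int_0^1 (x - x^3)\,dx = \frac{3}{2}\left.\left(\frac{x^2}{2} - \frac{x^4}{4}\right)\right|_{x=0}^{x=1} = \frac{3}{8}.$$
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^1 x^2 \cdot \frac{3}{2}(1 - x^2)\,dx = \int_0^1 \frac{3}{2}(x^2 - x^4)\,dx = \frac{1}{5}.$$
So $V(X) = \dfrac{1}{5} - \left(\dfrac{3}{8}\right)^2 = 0.059$ and $\sigma_X = \sqrt{0.059} = 0.244$.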
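The gravel-sales moments can be reproduced by numerical integration. This is a minimal sketch using base R's `integrate`; the variable names are illustrative.

```r
# pdf of weekly gravel sales on [0, 1]
f <- function(x) 1.5 * (1 - x^2)

# E(X) and E(X^2) as integrals of x * f(x) and x^2 * f(x) over [0, 1].
EX  <- integrate(function(x) x * f(x),   lower = 0, upper = 1)$value   # 3/8
EX2 <- integrate(function(x) x^2 * f(x), lower = 0, upper = 1)$value   # 1/5

VX <- EX2 - EX^2    # shortcut formula V(X) = E(X^2) - mu^2
SD <- sqrt(VX)

print(round(c(EX = EX, EX2 = EX2, VX = VX, SD = SD), 3))
```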
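The normal distribution
A continuous rv $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma$, where $-\infty < \mu < \infty$ and $\sigma > 0$, if the pdf of $X$ is
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty.$$
Often abbreviated as $X \sim N(\mu, \sigma^2)$.
$X$ has a standard normal distribution if $\mu = 0$ and $\sigma = 1$, often denoted $Z \sim N(0, 1)$. Its pdf is
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2},$$
and its cdf is
$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} f(t)\,dt.$$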
Find probabilities
How to find $P(a \le X \le b)$ if $X \sim N(\mu, \sigma^2)$. First, find $P(a \le Z \le b)$ for the standard normal $Z$.
examples:
$P(Z \le 1.25) = \Phi(1.25) = 0.8944$.
$P(Z > 1.25) = 1 - P(Z \le 1.25) = 1 - 0.8944 = 0.1056$.
$P(-0.38 \le Z \le 1.25) = \Phi(1.25) - \Phi(-0.38) = 0.8944 - 0.3520 = 0.5424$.
percentiles of the standard normal distribution:
99th percentile, $z = 2.33$
95th percentile, $z = 1.64$ or $1.65$ or $1.645$
90th percentile, $z = 1.28$
$z_\alpha$ notation: $z_{0.01} = 2.33$, $z_{0.05} = 1.645$, $z_{0.025} = 1.96$.
$\alpha$: right-tail probability
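The table lookups above can be reproduced in R, where `pnorm` plays the role of $\Phi$ and `qnorm` is its inverse. This is a sketch with illustrative variable names; the results agree with the table values up to rounding.

```r
# Standard normal probabilities: pnorm(z) is Phi(z).
p_le      <- pnorm(1.25)                  # P(Z <= 1.25)
p_gt      <- 1 - pnorm(1.25)              # P(Z > 1.25)
p_between <- pnorm(1.25) - pnorm(-0.38)   # P(-0.38 <= Z <= 1.25)

# Percentiles: qnorm(p) returns z with Phi(z) = p.
# z_alpha has right-tail area alpha, so z_alpha = qnorm(1 - alpha).
z99  <- qnorm(0.99)    # z_0.01
z95  <- qnorm(0.95)    # z_0.05
z90  <- qnorm(0.90)    # z_0.10
z975 <- qnorm(0.975)   # z_0.025

print(round(c(p_le, p_gt, p_between), 4))
print(round(c(z99, z95, z90, z975), 3))
```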
Nonstandard normal distribution
If $X \sim N(\mu, \sigma^2)$, then $Z = \dfrac{X - \mu}{\sigma}$ is standard normal, and
$$P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right).$$
e.g. adult female heights in North America have approximately a normal distribution with mean $\mu = 65$ inches and $\sigma = 3.5$ inches. The probability that $X$ falls below 70 inches:
$$P(X < 70) = P\left(Z < \frac{70 - 65}{3.5}\right) = P(Z < 1.43) = 0.9236.$$
The probability that $X$ falls between 60 and 70 inches:
$$P(60 < X < 70) = P\left(\frac{60 - 65}{3.5} < Z < \frac{70 - 65}{3.5}\right) = P(-1.43 < Z < 1.43) = P(Z < 1.43) - P(Z < -1.43) = 0.9236 - 0.0764 = 0.8472,$$
i.e., about 85% of the heights are between 60 and 70 inches.
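The standardization step can be checked in R. `pnorm` accepts `mean` and `sd` arguments, so standardizing by hand and passing the parameters directly should agree. This is a sketch with illustrative variable names. The exact values differ slightly from the table-based ones because the table rounds $z$ to 1.43.

```r
mu    <- 65    # mean height (inches)
sigma <- 3.5   # standard deviation (inches)

# Standardize by hand, then use the standard normal cdf.
z70      <- (70 - mu) / sigma
p_below  <- pnorm(z70)                    # P(X < 70)

# Equivalent: let pnorm standardize internally.
p_below2 <- pnorm(70, mean = mu, sd = sigma)

# P(60 < X < 70) as a difference of cdf values.
p_between <- pnorm(70, mean = mu, sd = sigma) - pnorm(60, mean = mu, sd = sigma)

print(round(c(p_below, p_below2, p_between), 4))
```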
exercise
Suppose the test scores follow a normal distribution with $\mu = 82$ and $\sigma = 4$. Find the probabilities that a randomly selected test score a) falls below 88, b) falls below 75, c) falls between 75 and 88.
answer: $P(X < 88) = P\left(Z < \dfrac{88 - 82}{4}\right) = P(Z < 1.50) = 0.9332$, $P(X < 75) = P\left(Z < \dfrac{75 - 82}{4}\right) = P(Z < -1.75) = 0.0401$, and $P(75 < X < 88) = 0.9332 - 0.0401 = 0.8931$.
empirical rule
If $X \sim N(\mu, \sigma^2)$, then
$$P(X \text{ is within one standard deviation of its mean}) = P(\mu - \sigma \le X \le \mu + \sigma) = P\left(\frac{\mu - \sigma - \mu}{\sigma} \le Z \le \frac{\mu + \sigma - \mu}{\sigma}\right) = P(-1.00 \le Z \le 1.00) = \Phi(1.00) - \Phi(-1.00) = 0.6826.$$
Similarly, we can find
$P(X \text{ is within 2 standard deviations of its mean}) = 0.9544$,
$P(X \text{ is within 3 standard deviations of its mean}) = 0.9974$.
Empirical rule: If the population distribution of a random variable is (approximately) normal, then
1. Roughly 68% of the values are within 1 SD of its mean.
2. Roughly 95% of the values are within 2 SDs of its mean.
3. Roughly 99.7% of the values are within 3 SDs of its mean.
example
Adult female heights in North America have approximately a normal distribution with mean $\mu = 65$ inches and $\sigma = 3.5$ inches. Then:
About 68% of the heights fall in $[65 - 1 \times 3.5,\ 65 + 1 \times 3.5] = [61.5, 68.5]$ inches.
About 95% of the heights fall in $[65 - 2 \times 3.5,\ 65 + 2 \times 3.5] = [58, 72]$ inches.
Almost all the heights fall in $[65 - 3 \times 3.5,\ 65 + 3 \times 3.5] = [54.5, 75.5]$ inches.
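The empirical-rule percentages and the height intervals can be reproduced in a few lines of R. This is a sketch with illustrative variable names. The exact coverage values differ in the fourth decimal from the table-based 0.6826, 0.9544 and 0.9974 because of table rounding.

```r
k <- 1:3                            # number of SDs from the mean
coverage <- pnorm(k) - pnorm(-k)    # P(-k <= Z <= k)

mu    <- 65                         # mean height (inches)
sigma <- 3.5                        # SD of height (inches)
lower <- mu - k * sigma
upper <- mu + k * sigma

print(data.frame(k, coverage = round(coverage, 4), lower, upper))
```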
percentiles of normal distribution
The 2.5th percentile of $Z$ is $z = -1.96$, i.e., 2.5% of the values are below $z = -1.96$.
Example continued: find a height such that 2.5% of the heights are below this value, i.e., find the 2.5th percentile.
$$x = \mu + z\sigma = 65 - 1.96 \times 3.5 = 58.1 \text{ inches.}$$
Final exam scores have approximately a normal distribution with mean 76 and standard deviation 8. Find the 75th percentile of test scores, i.e., the value below which 75% of the test scores fall.
note $P(Z < 0.67) \approx 0.75$, so $x = \mu + z\sigma = 76 + 0.67 \times 8 = 81.36$.
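Both percentile calculations follow the same recipe, $x = \mu + z\sigma$, which `qnorm` applies directly when given `mean` and `sd`. This is a sketch with illustrative variable names. The exam-score result differs slightly from 81.36 because the table rounds $z$ to 0.67.

```r
# 2.5th percentile of heights, X ~ N(65, 3.5^2)
h_025 <- qnorm(0.025, mean = 65, sd = 3.5)

# Same value via x = mu + z * sigma with the standard normal percentile.
h_025_manual <- 65 + qnorm(0.025) * 3.5

# 75th percentile of exam scores, X ~ N(76, 8^2)
s_75 <- qnorm(0.75, mean = 76, sd = 8)

print(round(c(h_025, h_025_manual, s_75), 2))
```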
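Approximate the binomial distribution
If $X \sim \text{binomial}(n, p)$ with $np \ge 10$ and $n(1 - p) \ge 10$, then approximately $X \sim N(np, np(1 - p))$. So
$$P(X \le a) \approx P\left(Z \le \frac{a - np}{\sqrt{np(1 - p)}}\right),$$
which is better approximated, with a continuity correction, by
$$P\left(Z \le \frac{a + 0.5 - np}{\sqrt{np(1 - p)}}\right).$$
Example: $X \sim \text{binomial}(50, 0.25)$.
$$P(X \le 10) \approx P\left(Z \le \frac{10 + 0.5 - 50 \times 0.25}{\sqrt{50 \times 0.25 \times 0.75}}\right) = P(Z \le -0.65) = 0.2578.$$
The exact probability is 0.2622.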
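The effect of the $+0.5$ term is easy to see numerically. The sketch below compares the exact binomial probability from `pbinom` with the normal approximation from `pnorm`, with and without the correction. Variable names are illustrative.

```r
n <- 50
p <- 0.25
a <- 10

mu    <- n * p                   # np
sigma <- sqrt(n * p * (1 - p))   # sqrt(np(1 - p))

exact       <- pbinom(a, size = n, prob = p)           # exact P(X <= 10)
approx_raw  <- pnorm(a,       mean = mu, sd = sigma)   # no correction
approx_corr <- pnorm(a + 0.5, mean = mu, sd = sigma)   # with +0.5 correction

print(round(c(exact = exact, raw = approx_raw, corrected = approx_corr), 4))
```

The corrected value lands much closer to the exact probability than the uncorrected one.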
The exponential distribution
$X$ is said to have an exponential distribution with parameter $\lambda > 0$ if the pdf of $X$ is
$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$
$$E(X) = \frac{1}{\lambda}, \quad V(X) = \frac{1}{\lambda^2}, \quad F(x) = 1 - e^{-\lambda x}, \ x \ge 0.$$
example: The response time $X$ (in seconds) at a computer terminal has an exponential distribution with $\lambda = 0.2$. The probability that the response time is at most 10 seconds is
$$P(X \le 10) = F(10) = 1 - e^{-0.2 \times 10} = 0.865.$$
The probability that the response time is between 5 and 10 seconds is
$$P(5 \le X \le 10) = \int_5^{10} 0.2\, e^{-0.2x}\,dx = \left. -e^{-0.2x} \right|_{x=5}^{x=10} = 0.233.$$
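The response-time probabilities can be checked with `pexp`, R's exponential cdf, whose `rate` argument corresponds to $\lambda$. This is a sketch with illustrative variable names.

```r
lambda <- 0.2   # rate parameter (per second)

# P(X <= 10) from the built-in cdf and from the closed form 1 - exp(-lambda * x).
p_at_most_10        <- pexp(10, rate = lambda)
p_at_most_10_manual <- 1 - exp(-lambda * 10)

# P(5 <= X <= 10) as a difference of cdf values.
p_5_to_10 <- pexp(10, rate = lambda) - pexp(5, rate = lambda)

print(round(c(p_at_most_10, p_at_most_10_manual, p_5_to_10), 3))
```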
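gamma distribution
A continuous rv $X$ is said to have a gamma distribution if its pdf is given by
$$f(x; \alpha, \beta) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}, \quad x \ge 0, \ \alpha > 0, \ \beta > 0.$$
Gamma function: For $\alpha > 0$,
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx.$$
Note that for $\alpha > 1$, $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$. For any positive integer $n$, $\Gamma(n) = (n - 1)!$.
$$E(X) = \alpha\beta, \quad V(X) = \alpha\beta^2.$$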
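The gamma-function identities and the moment formulas can be spot-checked numerically. The sketch below uses base R's `gamma`, `factorial`, `dgamma` (with `shape` $= \alpha$ and `scale` $= \beta$) and `integrate`. The values $\alpha = 3$, $\beta = 2$ are arbitrary choices for illustration.

```r
# Gamma function identities
g_int     <- gamma(5)            # Gamma(n) with n = 5
g_fact    <- factorial(4)        # (n - 1)!
g_rec_lhs <- gamma(4.5)          # Gamma(alpha) with alpha = 4.5
g_rec_rhs <- 3.5 * gamma(3.5)    # (alpha - 1) * Gamma(alpha - 1)

# Moments of a gamma distribution (illustrative parameter values)
alpha <- 3   # shape
beta  <- 2   # scale
dens  <- function(x) dgamma(x, shape = alpha, scale = beta)

mean_x <- integrate(function(x) x * dens(x),   lower = 0, upper = Inf)$value
ex2    <- integrate(function(x) x^2 * dens(x), lower = 0, upper = Inf)$value
var_x  <- ex2 - mean_x^2

print(c(mean = mean_x, var = var_x))   # compare with alpha*beta and alpha*beta^2
```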
the chi-squared distribution
The chi-squared distribution is a gamma distribution with $\alpha = \nu/2$ and $\beta = 2$:
$$f(x; \nu) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, \quad x \ge 0.$$
$\nu$: degrees of freedom. Notation: $X \sim \chi^2(\nu)$.
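The relationship between the chi-squared and gamma densities can be confirmed by evaluating both on a grid. The sketch below uses base R's `dchisq` and `dgamma`. The choice $\nu = 5$ and the grid of $x$ values are arbitrary.

```r
nu <- 5                           # degrees of freedom (illustrative)
x  <- seq(0.5, 20, by = 0.5)      # grid of evaluation points

d_chisq <- dchisq(x, df = nu)
d_gamma <- dgamma(x, shape = nu / 2, scale = 2)   # alpha = nu/2, beta = 2

# Density written out from the formula, for comparison.
d_formula <- x^(nu / 2 - 1) * exp(-x / 2) / (2^(nu / 2) * gamma(nu / 2))

print(all.equal(d_chisq, d_gamma))
print(all.equal(d_chisq, d_formula))
```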
1. Suppose the proportion of gas sold at a gas station in a week, $X$, has pdf given by $f(x) = 6x(1 - x)$, $0 \le x \le 1$.
a. Find $P(X \ge 0.5)$, the probability that it sells at least half of its gas in a week.
b. Find $E(X^2)$.
2. If adult female heights are normally distributed, find the probability that the height of a randomly selected woman is within 0.67 SDs of the mean.
3. It is known that 30% of vehicles on I-81 are trucks. If you take a random sample of 50 vehicles, use the normal approximation to find the probability that at most 15 of them are trucks.
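Solutions
1. a. $P(X > 0.5) = \int_{0.5}^{1} 6x(1 - x)\,dx = \int_{0.5}^{1} (6x - 6x^2)\,dx = \left.(3x^2 - 2x^3)\right|_{0.5}^{1} = 0.5$.
b. $E(X^2) = \int_0^1 x^2 \cdot 6x(1 - x)\,dx = \left.\left(\dfrac{6x^4}{4} - \dfrac{6x^5}{5}\right)\right|_0^1 = 0.3$.
2. $P(\mu - 0.67\sigma \le X \le \mu + 0.67\sigma) = P(-0.67 \le Z \le 0.67) = 0.7486 - 0.2514 = 0.4972 \approx 0.50$.
3. $P(X \le 15) \approx P\left(Z \le \dfrac{15 + 0.5 - 50 \times 0.3}{\sqrt{50 \times 0.3 \times 0.7}}\right) = P(Z < 0.1543) = 0.56$. The exact binomial probability is 0.569.
R code to get the cumulative binomial probability $P(X \le 15)$:
> pbinom(15, 50, 0.3)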