Probability and Sampling Distributions
Random variables
Section 4.3 (Continued)
The mean of a random variable
The mean (or expected value) of a random variable $X$ is an idealization of the mean, $\bar{x}$, of quantitative data recorded after many repetitions of a chance happening.
It is denoted by $\mu_X$, or just $\mu$.
If $X$ is discrete, with $S = \{x_1, \ldots, x_k\}$ and $P(x_i) = p_i$, then $\mu_X = \sum_{i=1}^{k} x_i p_i$.
If $X$ is continuous, $\mu_X$ is the point at which its density curve would balance.
Example: # girls among three children
A couple wants to have three children. $X$ = # girls.
$x_i$    0     1     2     3
$p_i$    1/8   3/8   3/8   1/8
$\mu_X = \sum x_i p_i = (0)(1/8) + (1)(3/8) + (2)(3/8) + (3)(1/8) = 1.5$
Note that $X$ cannot take the value $\mu_X = 1.5$.
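The weighted sum in this example can be checked with a few lines of Python. This is a minimal sketch: the function name `mean` and the use of exact fractions are choices made here for illustration, not part of the slides.

```python
from fractions import Fraction

# Distribution of X = number of girls among three children (values from the slide).
values = [0, 1, 2, 3]
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

def mean(values, probs):
    """Mean of a discrete random variable: the sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))

mu_x = mean(values, probs)
print(mu_x, float(mu_x))
```

Exact fractions avoid rounding noise, so the result can be compared directly with the hand computation.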
Random variables from other random variables
New random variables can be defined from other random variables. Examples:
Test scores: $X$ = verbal score, $Y$ = quantitative score, $Z = X + Y$ = total score
$X$ = temperature in °C, $Y$ = temperature in °F = $\frac{9}{5}X + 32$
$Y = (X - \mu_X)^2$ = squared deviation of $X$ from its mean
The standard deviation of a random variable
The variance, $\sigma_X^2$, of a random variable $X$ is the mean of $Y = (X - \mu_X)^2$.
$\sigma_X^2$ is an idealization of the variance, $s^2$, of quantitative data recorded after many repetitions of a chance happening.
The standard deviation of a random variable $X$ is $\sigma_X = \sqrt{\sigma_X^2}$.
$\sigma_X$ is an idealization of the standard deviation, $s$, of quantitative data recorded after many repetitions of a chance happening.
Example: # girls among three children
A couple wants to have three children. $X$ = # girls.
$x_i$    0     1     2     3
$p_i$    1/8   3/8   3/8   1/8
$\mu_X = \sum x_i p_i = (0)(1/8) + (1)(3/8) + (2)(3/8) + (3)(1/8) = 1.5$
$\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i = (-1.5)^2(1/8) + (-0.5)^2(3/8) + (0.5)^2(3/8) + (1.5)^2(1/8) = 0.75$
$\sigma_X = \sqrt{\sigma_X^2} = \sqrt{0.75} \approx 0.87$
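The variance and standard deviation above follow the same pattern as the mean: a probability-weighted sum, here of squared deviations. A minimal sketch, with variable names chosen here for illustration:

```python
import math
from fractions import Fraction

# Distribution of X = number of girls among three children (values from the slide).
values = [0, 1, 2, 3]
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# Mean: sum of x_i * p_i.
mu_x = sum(x * p for x, p in zip(values, probs))

# Variance: mean of the squared deviation (X - mu_X)^2.
var_x = sum((x - mu_x) ** 2 * p for x, p in zip(values, probs))

# Standard deviation: square root of the variance.
sd_x = math.sqrt(var_x)

print(var_x, round(sd_x, 2))
```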
Relationships between random variables
The correlation, $\rho_{XY}$, between random variables $X$ and $Y$ is the mean of $Z = \{(X - \mu_X)/\sigma_X\}\{(Y - \mu_Y)/\sigma_Y\}$.
$\rho_{XY}$ is an idealization of the correlation, $r$, of two-variable quantitative data recorded after many repetitions of a chance happening.
$-1 \le \rho_{XY} \le 1$
Random variables $X$ and $Y$ are independent if knowing that an event defined through $X$ alone (e.g., $A = \{a \le X \le b\}$) has occurred never changes the probability distribution of $Y$.
If $X$ and $Y$ are independent then $\rho_{XY} = 0$.
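Some rules
Suppose $X$ and $Y$ are random variables:
$\mu_{a+bX} = a + b\mu_X$ and $\sigma_{a+bX} = |b|\,\sigma_X$
The correlation between $X$ and $Y$ is the same as that between $a + bX$ and $c + dY$ when $b$ and $d$ have the same sign (it changes sign when they have opposite signs).
Addition rule for means: $\mu_{X+Y} = \mu_X + \mu_Y$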
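The linear-transformation rules can be checked numerically on a small distribution. The sketch below reuses the # girls distribution from the earlier example; the constants `a` and `b` are hypothetical values picked only for illustration, with `b` negative to show that the standard deviation scales by the absolute value of `b`.

```python
import math

# Distribution of X = number of girls among three children.
values = [0, 1, 2, 3]
probs = [1 / 8, 3 / 8, 3 / 8, 1 / 8]

def mean_sd(values, probs):
    """Return (mean, standard deviation) of a discrete random variable."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, math.sqrt(var)

a, b = 32.0, -1.8  # hypothetical constants, not from the slides

mu_x, sd_x = mean_sd(values, probs)
# a + bX takes the value a + b*x_i with the same probability p_i.
mu_t, sd_t = mean_sd([a + b * x for x in values], probs)

print(mu_t, a + b * mu_x)    # the two means agree
print(sd_t, abs(b) * sd_x)   # the two standard deviations agree
```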
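Some rules (continued)
Addition rule for variances of independent random variables: if $X$ and $Y$ are independent then
$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$ and $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$
General addition rule for variances:
$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\sigma_X\sigma_Y$ and $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2 - 2\rho_{XY}\sigma_X\sigma_Y$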
Example: Investments
Setup: rates of return
$X$ = T-bills
$Y$ = a certain index fund
$Z$ = my portfolio = $0.2X + 0.8Y$
The 1950-2003 history: $\mu_X = 5.0\%$, $\sigma_X = 2.9\%$; $\mu_Y = 13.2\%$, $\sigma_Y = 17.6\%$; $\rho_{XY} = -0.11$
Example: Investments (continued)
Summary of my portfolio: $Z = 0.2X + 0.8Y$
$\mu_Z = 0.2\mu_X + 0.8\mu_Y = (0.2)(5.0) + (0.8)(13.2) = 11.56\%$
$\sigma_{0.2X}^2 = (0.2)^2\sigma_X^2 = (0.2)^2(2.9)^2 = 0.34$
$\sigma_{0.8Y}^2 = (0.8)^2\sigma_Y^2 = (0.8)^2(17.6)^2 = 198.25$
$\sigma_Z^2 = \sigma_{0.2X}^2 + \sigma_{0.8Y}^2 + 2\rho_{XY}\,\sigma_{0.2X}\,\sigma_{0.8Y} = 0.34 + 198.25 + 2(-0.11)(\sqrt{0.34})(\sqrt{198.25}) = 196.79$
$\sigma_Z = \sqrt{\sigma_Z^2} = \sqrt{196.79} = 14.03\%$
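The portfolio arithmetic combines the addition rule for means with the general addition rule for variances. A minimal sketch follows; the function name `portfolio_mean_sd` and its parameter names are hypothetical, and the inputs are the 20%/80% weights and historical figures of this example.

```python
import math

def portfolio_mean_sd(w_x, w_y, mu_x, mu_y, sd_x, sd_y, rho):
    """Mean and standard deviation of Z = w_x*X + w_y*Y.

    Uses mu_Z = w_x*mu_X + w_y*mu_Y and
    var_Z = (w_x*sd_X)^2 + (w_y*sd_Y)^2 + 2*rho*(w_x*sd_X)*(w_y*sd_Y).
    """
    mu_z = w_x * mu_x + w_y * mu_y
    var_z = (w_x * sd_x) ** 2 + (w_y * sd_y) ** 2 + 2 * rho * (w_x * sd_x) * (w_y * sd_y)
    return mu_z, math.sqrt(var_z)

# Inputs in percent: T-bills (X) and the index fund (Y).
mu_z, sd_z = portfolio_mean_sd(0.2, 0.8, 5.0, 13.2, 2.9, 17.6, -0.11)
print(round(mu_z, 2), round(sd_z, 2))
```

Because the correlation is slightly negative, the portfolio standard deviation comes out a little below 0.8 times the index fund's standard deviation.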
Probability and Sampling Distributions
The Sampling Distribution of a Sample Mean
Section 4.4
© 2009 W.H. Freeman and Company
Long-run averages
The law of large numbers states that the cumulative average of numbers drawn from a population with mean $\mu$ will stabilize at $\mu$ in the long run.
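A short simulation can make the law of large numbers concrete. This sketch assumes the population is the # girls distribution (mean 1.5) used earlier in these slides; the seed and the number of draws are arbitrary choices made here.

```python
import random
from itertools import accumulate

rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw from the # girls distribution: values 0..3 with weights 1:3:3:1, mean 1.5.
draws = rng.choices([0, 1, 2, 3], weights=[1, 3, 3, 1], k=100_000)

# Cumulative average after each draw.
running = [total / (i + 1) for i, total in enumerate(accumulate(draws))]

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(running[n - 1], 3))
```

The early averages wander, while the later ones settle close to 1.5.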
Statistics as random variables
A statistic is a random variable. Its probability distribution is an idealization of its sampling distribution.
Important example: the mean, $\bar{x}$, of a sample of size $n$ drawn by SRS from a population with mean $\mu$ and standard deviation $\sigma$.
The mean of $\bar{x}$ is $\mu$.
The sample mean is unbiased for the population mean.
The standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$.
The sample mean is less variable than an individual measurement.
The sample mean is less variable when $n$ is larger.
Role of Normal distributions
If the population distribution is $N(\mu, \sigma)$, then the sampling distribution of the sample mean is $N(\mu, \sigma/\sqrt{n})$.
The central limit theorem
The central limit theorem states that, whatever the shape of the population distribution, when $n$ is large the sampling distribution of the sample mean is approximately $N(\mu, \sigma/\sqrt{n})$.
A larger $n$ is required when the population distribution is less like a Normal distribution.
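The central limit theorem can be illustrated by simulating sample means from a population that is clearly not Normal. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1), a sample size of 30, and 20,000 repetitions; all three are choices made here for illustration, not values from the slides.

```python
import random
import statistics
from math import sqrt

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 30                  # sample size (assumed for illustration)
reps = 20_000           # number of simulated samples

# Population: exponential with rate 1, so mu = 1 and sigma = 1 (right-skewed).
mu, sigma = 1.0, 1.0

# One sample mean per simulated sample of size n.
sample_means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

center = statistics.fmean(sample_means)   # should be near mu
spread = statistics.stdev(sample_means)   # should be near sigma / sqrt(n)

# Share of sample means within one standard deviation (sigma / sqrt(n)) of mu;
# for a Normal distribution this share is about 0.68.
within = sum(abs(m - mu) <= sigma / sqrt(n) for m in sample_means) / reps

print(round(center, 3), round(spread, 3), round(sigma / sqrt(n), 3), round(within, 3))
```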
Example: Bottling operation
The fill level of a bottling machine has $\mu = 298$ and $\sigma = 3$ milliliters.
What is the probability that the fill level of a randomly selected bottle is less than 295 ml?
I don't know: the shape of the fill-level distribution is not given.
Example: Bottling operation (continued)
The fill level of a bottling machine has $\mu = 298$ and $\sigma = 3$ milliliters.
What is the probability that the average fill level in a randomly selected six-pack is less than 295 ml?
Example: Bottling operation (continued)
The fill level of a bottling machine is $N(\mu = 298, \sigma = 3)$ milliliters.
What is the probability that the fill level of a randomly selected bottle is less than 295 ml?
Example: Bottling operation (continued)
The fill level of a bottling machine is $N(\mu = 298, \sigma = 3)$ milliliters.
What is the probability that the average fill level in a randomly selected six-pack is less than 295 ml?
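When the fill level is Normal, both questions reduce to evaluating a Normal cumulative probability: with standard deviation $\sigma$ for a single bottle and $\sigma/\sqrt{n}$ for the mean of $n$ bottles. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+). The values 298, 3 and 295 ml are taken as assumptions here; the answers depend only on how many standard deviations the cutoff lies below the mean, which is one $\sigma$ for a single bottle.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 298.0, 3.0   # assumed mean and standard deviation of fill level (ml)
cutoff = 295.0           # assumed cutoff (ml), one sigma below the mean
n = 6                    # bottles in a six-pack

# Single bottle: X ~ N(mu, sigma).
p_single = NormalDist(mu, sigma).cdf(cutoff)

# Six-pack average: x-bar ~ N(mu, sigma / sqrt(n)).
p_pack = NormalDist(mu, sigma / sqrt(n)).cdf(cutoff)

print(round(p_single, 4), round(p_pack, 4))
```

The six-pack average falls below the cutoff far less often than a single bottle does, because the sample mean is less variable than an individual measurement.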
Probability Theory
General Probability Rules
Section 5.1
Probability rules
$0 \le P(A) \le 1$
$P(S) = 1$
Venn diagrams
Complement rule: $P(A^c) = 1 - P(A)$
Addition rule for disjoint events: if $A$ and $B$ are disjoint then $P(A \text{ or } B) = P(A) + P(B)$
General addition rule: for any events $A$ and $B$, $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Example: Pick a card
Pick a card from a standard 52-card deck.
$S$ = { A, 2, ..., 10, J, Q, K in each of the four suits }
Each card is equally likely.
With "suit" standing for one particular suit:
$P(\text{Ace or suit}) = P(\text{Ace}) + P(\text{suit}) - P(\text{Ace and suit}) = 4/52 + 13/52 - 1/52 = 16/52 \approx 0.31$
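Because every card is equally likely, the general addition rule can be verified by counting cards directly. In this sketch the suit is arbitrarily taken to be hearts (an assumption made here; any single suit gives the same count), and the helper names `prob`, `is_ace` and `in_suit` are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) pairs

def prob(event):
    """Probability of an event as (# cards in the event) / (# cards in the deck)."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_ace(card):
    return card[0] == "A"

def in_suit(card):
    return card[1] == "hearts"  # arbitrary choice of suit for illustration

# Direct count of "Ace or suit" versus the general addition rule.
p_union = prob(lambda c: is_ace(c) or in_suit(c))
p_rule = prob(is_ace) + prob(in_suit) - prob(lambda c: is_ace(c) and in_suit(c))

print(p_union, p_rule, round(float(p_union), 2))
```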
Independent events
Events $A$ and $B$ are independent if knowledge of the occurrence of one event does not change the probability of occurrence of the other event.
Disjoint events with positive probability are not independent (if one occurs the other cannot).
Independence is not easily visualized in a Venn diagram.
Multiplication rule for independent events: if $A$ and $B$ are independent then $P(A \text{ and } B) = P(A)P(B)$
Example: Birth genders
Births are independent: $P(B) = 1/2$, $P(G) = 1/2$
A couple wants to have two children:
$P(GG) = P(G)P(G) = 1/4$
$P(GB) = P(G)P(B) = 1/4$
$P(BG) = P(B)P(G) = 1/4$
$P(BB) = P(B)P(B) = 1/4$
A couple wants to have three children:
$P(GGG) = P(G)P(G)P(G) = 1/8$
$P(GGB) = P(G)P(G)P(B) = 1/8$, etc.
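The multiplication rule also explains where the distribution of the number of girls among three children comes from: multiply the per-birth probabilities for each of the eight birth orders, then add up the orders with the same number of girls. A minimal sketch, with names chosen here for illustration:

```python
from fractions import Fraction
from itertools import product

# Per-birth probabilities (births assumed independent, as in the example).
p = {"B": Fraction(1, 2), "G": Fraction(1, 2)}

dist = {}  # maps number of girls -> probability
for outcome in product("BG", repeat=3):  # the 8 birth orders: BBB, BBG, ...
    # Multiplication rule for independent events.
    prob = Fraction(1)
    for child in outcome:
        prob *= p[child]
    girls = outcome.count("G")
    # Different birth orders are disjoint, so their probabilities add.
    dist[girls] = dist.get(girls, Fraction(0)) + prob

for girls in sorted(dist):
    print(girls, dist[girls])
```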
Example: Reliability
A transatlantic cable has 662 repeaters.
Per repeater, $P(\text{Fail}) = 0.001$ and $P(\text{Function}) = 0.999$.
Repeaters fail independently of one another.
$P(\text{All function}) = P(\text{1st functions})\,P(\text{2nd functions}) \cdots P(\text{661st functions})\,P(\text{662nd functions}) = 0.999^{662} \approx 0.52$
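The reliability figure is a single application of the multiplication rule for independent events, followed by the complement rule for the chance that at least one repeater fails. This sketch takes the repeater count to be 662, as the 661st and 662nd factors in the product indicate; the variable names are chosen here for illustration.

```python
n_repeaters = 662     # repeater count, assumed from the 661st/662nd factors
p_function = 0.999    # probability that a single repeater functions

# Multiplication rule: all repeaters function independently.
p_all_function = p_function ** n_repeaters

# Complement rule: at least one repeater fails.
p_any_failure = 1 - p_all_function

print(round(p_all_function, 2), round(p_any_failure, 2))
```

Even with very reliable parts, a long chain of independent components fails surprisingly often.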