Chapter 6: Random Variables and Distributions

These notes reflect material from our text, Statistics: Learning from Data, First Edition, by Roxy Peck, published by Cengage Learning, 2015.

Random variables

A random variable is a function from the sample space, $S$, of an experiment to the real numbers, $X : S \to \mathbb{R}$, so we might characterize a random variable as a function which assigns a numerical value to each outcome of an experiment. Random variables can be defined on discrete and on continuous sample spaces.

discrete: flip a coin, flip three coins, roll a die, roll two dice, roulette wheel, spinner
continuous: random number generators: $U[0, 1]$, $N(0, 1)$, $N(\mu, \sigma)$

Expected value of a random variable, $E(X) = \mu_X$
Variance of a random variable, $\mathrm{Var}(X) = \sigma_X^2$
Linear combinations of random variables, $Y = aX_1 + bX_2$

Expected value and variance of a linear combination of random variables. If $Y = aX_1 + bX_2$, then $E(Y) = aE(X_1) + bE(X_2)$, and, provided $X_1$ and $X_2$ are independent, $\mathrm{Var}(Y) = a^2 \mathrm{Var}(X_1) + b^2 \mathrm{Var}(X_2)$.

Distributions of random variables

Calculation of probability using a continuous distribution, $P(X \le x)$. The area of the blue region in the following figure is the probability that the random variable $X \sim N(\mu, \sigma)$ takes on a value less than or equal to 5. That probability is denoted $P(X \le 5)$.

[Figure: density curve of $X \sim N(\mu, \sigma)$, horizontal axis $x$, vertical axis $y$, with the region $x \le 5$ shaded blue.]

Spring 2016 Page 1 of 6
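The rules for the expected value and variance of a linear combination can be checked exactly on a small discrete example. The sketch below uses two independent fair dice (one of the discrete examples listed above). The coefficients a = 2 and b = 3 are illustrative assumptions, not values from the text.

```r
# Exact check of E(Y) and Var(Y) for Y = a*X1 + b*X2 with two independent fair dice.
faces <- 1:6
prob  <- rep(1/6, 6)
a <- 2; b <- 3                      # illustrative coefficients (assumed)

# Mean and variance of a single die, from the definition
mean_x <- sum(faces * prob)                   # 3.5
var_x  <- sum((faces - mean_x)^2 * prob)      # 35/12

# All 36 equally likely outcomes of rolling two independent dice
grid <- expand.grid(x1 = faces, x2 = faces)
y    <- a * grid$x1 + b * grid$x2

# Mean and variance of Y computed directly from its distribution
mean_y <- mean(y)
var_y  <- mean((y - mean_y)^2)

# Compare with the linear-combination rules
c(direct = mean_y, rule = a * mean_x + b * mean_x)
c(direct = var_y,  rule = a^2 * var_x + b^2 * var_x)
```

Both comparisons should agree. Note that the variance rule relies on the two dice being independent.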
Normal distributions

Normal random variable, $X \sim N(\mu, \sigma)$.
z-score, $z = (x - \mu)/\sigma$. If $z = (x - \mu)/\sigma$, then $x = \mu + z\sigma$.
Standardized normal random variable, $Z \sim N(0, 1)$. If $X \sim N(\mu, \sigma)$ and $Z = (X - \mu)/\sigma$, then $Z \sim N(0, 1)$. This is why our textbook need only contain a table of values for the standard normal distribution.
Areas of regions under a normal distribution curve.
Percentiles.
The 68-95-99.7% rule.
Q-Q plots.

Calculations with $X \sim N(0, 1)$

Suppose that the random variable $X$ has a standard normal distribution, $X \sim N(0, 1)$.

[Figure: density curve of $X \sim N(0, 1)$, plotted from -3 to 3.]

There are four useful functions in R for working with normal distributions: dnorm, pnorm, qnorm, rnorm.

a. pnorm(2): $P(X \le 2)$
b. pnorm(2) - pnorm(-2): $P(-2 \le X \le 2)$
c. 1 - pnorm(2): $P(X \ge 2)$
d. qnorm(0.60): $q_{60}$ such that $P(X \le q_{60}) = 0.60$, the 60th percentile
e. rnorm(3): three random numbers from the standard normal distribution, for instance 0.3612443 0.1075216 1.0473477
f. dnorm(): used for drawing the graph of the bell curve
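The calculations in the list above can be collected into one short R session. This is a minimal sketch. The values mu = 10, sigma = 2, and x = 13 in the last step are illustrative assumptions, chosen only to show that standardizing gives the same probability as working with $N(\mu, \sigma)$ directly.

```r
# Standard normal calculations, X ~ N(0, 1)
p_le_2 <- pnorm(2)                 # P(X <= 2)
p_mid  <- pnorm(2) - pnorm(-2)     # P(-2 <= X <= 2)
p_ge_2 <- 1 - pnorm(2)             # P(X >= 2)
q60    <- qnorm(0.60)              # 60th percentile
peak   <- dnorm(0)                 # height of the bell curve at its center

# The 68-95-99.7% rule: probability of falling within 1, 2, 3 standard deviations
within_k_sd <- sapply(1:3, function(k) pnorm(k) - pnorm(-k))
round(within_k_sd, 4)              # 0.6827 0.9545 0.9973

# Standardizing: the same probability from N(mu, sigma) and from N(0, 1)
mu <- 10; sigma <- 2; x <- 13      # illustrative values (assumed)
z  <- (x - mu) / sigma
all.equal(pnorm(x, mean = mu, sd = sigma), pnorm(z))   # TRUE
```

The last line illustrates why a single table for the standard normal distribution is enough: any $P(X \le x)$ can be rewritten as $P(Z \le z)$.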
Calculations with $X \sim N(\mu, \sigma)$

Agresti and Franklin report that female students at the University of Georgia have an approximately normal height distribution, with mean $\mu_W = 65$ inches and standard deviation $\sigma_W = 3.5$ inches. Male students have an approximately normal height distribution, with mean $\mu_M = 70$ inches and standard deviation $\sigma_M = 4.0$ inches.

Let $W \sim N(\mu_W, \sigma_W)$ and $M \sim N(\mu_M, \sigma_M)$, and calculate the following (using R and using Agresti and Franklin, Appendix A, pp. A-1 and A-2):

$P(W \le 66)$, $P(M \ge 72)$, $q$ such that $P(W \le q) = 0.30$, $q$ such that $P(M \ge q) = 0.25$

Calculate the z-score of a person with $W = 63$, and of a person with $M = 67$.
How tall is a woman with z-score 0.6? How tall is a man with z-score -0.7?

See the Answers section on page 5 of these notes for R expressions which will calculate the answers to these questions.

[Figure: Men's and Women's Heights. Normal density curves for men and women; horizontal axis: height (in).]

Student's t, Chi-Square, F

Student's t, Chi-Square, and F distributions play key roles in the sequel. All of them are families of continuous distributions. Student's t distributions resemble Normal distributions but they have fatter tails. Chi-Square and F distributions have domains the half line $[0, \infty)$, so neither one is symmetric.

Discrete distributions

For $X$ to be a Bernoulli random variable, and hence have a Bernoulli distribution, $X \sim \mathrm{Bernoulli}(p)$, we require
i. a binary outcome for a single event (generally coded as success, 1, or failure, 0)
ii. a fixed probability of success, $P(X = 1) = p$, and failure, $P(X = 0) = 1 - p$, for that event
iii. exactly one event

Examples of Bernoulli random variables include the outcome of a coin flip (h or t), or driver was wearing a seat belt (yes or no), or basketball player made a basket (1 or 0).
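The remark above that Student's t distributions have fatter tails than Normal distributions can be checked numerically. The sketch below compares the upper-tail probability beyond 3 for the standard normal and for t distributions. The cutoff 3 and the degrees of freedom 3, 10, and 30 are illustrative assumptions.

```r
# Upper-tail probabilities P(T > 3) for Student's t versus P(Z > 3) for N(0, 1)
cutoff <- 3                         # illustrative cutoff (assumed)
dfs    <- c(3, 10, 30)              # illustrative degrees of freedom (assumed)

tail_z <- pnorm(cutoff, lower.tail = FALSE)
tail_t <- pt(cutoff, df = dfs, lower.tail = FALSE)
names(tail_t) <- paste0("df=", dfs)

# Each t tail is heavier than the normal tail, and the gap shrinks as df grows
tail_z
tail_t
tail_t / tail_z
```

As the degrees of freedom increase, the t tail probabilities move toward the normal value, which is one way to see that the t family "resembles" the normal.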
Expected value and variance of a Bernoulli random variable, $X \sim \mathrm{Bernoulli}(p)$:
Expected value, $\mu_X = p$.
Variance, $\sigma_X^2 = p(1 - p)$.

[Figure: Bernoulli distribution, p = 1/6; bars at 0 and 1.]

Binomial random variable, $X \sim \mathrm{Binomial}(n, p)$. $X$ counts the number of successes in $n$ trials, and its distribution gives the probability of each possible count.
Expected value, $\mu_X = np$.
Variance, $\sigma_X^2 = np(1 - p)$.
Normal approximation to a binomial distribution.

[Figure: binomial distribution, p = 1/6, n = 10.]
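The formulas for the expected value and variance above can be verified directly from the probability mass function. The sketch below uses a helper, binom_moments (a hypothetical name introduced here, not from the text), and treats a Bernoulli random variable as the special case n = 1. The values p = 1/6 and n = 10 match the figures.

```r
# Mean and variance of Binomial(n, p) computed from its probability mass function.
# binom_moments is a hypothetical helper written for this illustration.
binom_moments <- function(n, p) {
  k   <- 0:n                               # possible numbers of successes
  pmf <- dbinom(k, size = n, prob = p)     # P(X = k)
  m   <- sum(k * pmf)                      # E(X)
  v   <- sum((k - m)^2 * pmf)              # Var(X)
  c(mean = m, variance = v)
}

p <- 1/6
bern  <- binom_moments(1, p)     # Bernoulli(p) is Binomial(1, p)
binom <- binom_moments(10, p)

# Compare with the formulas p, p(1 - p) and np, np(1 - p)
rbind(computed = bern,  formula = c(p, p * (1 - p)))
rbind(computed = binom, formula = c(10 * p, 10 * p * (1 - p)))
```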
Conditions for a binomial distribution

For $X$ to be a binomial random variable, and hence have a binomial distribution, $X \sim \mathrm{Binomial}(n, p)$, we require
i. a binary outcome for each event (coin flip produces h or t)
ii. a single fixed probability of success for each event ($p = 0.5$)
iii. a fixed number of events ($n = 10$ coin flips)
The events are also assumed to be independent of one another.

Normal approximations to binomial distributions

The distribution of a binomial random variable, $X \sim \mathrm{Binomial}(n, p)$, has mean $np$ and standard deviation $\sqrt{np(1 - p)}$. It can be approximated by a normal probability distribution with the same mean and standard deviation, $Y \sim N(\mu = np, \sigma = \sqrt{np(1 - p)})$. The fit improves as $n$ gets larger.

[Figure: four panels showing the binomial distribution with p = 1/6 for n = 10, n = 30, n = 50, and n = 100.]

Answers

The following R expressions calculate the answers to the questions about heights of men and women at the University of Georgia posed above. For each calculation, draw a corresponding normal curve and shade the area or mark the measurement in question.

    pnorm(66, mean = 65, sd = 3.5)
    1 - pnorm(72, mean = 70, sd = 4.0)
    qnorm(0.30, mean = 65, sd = 3.5)
    qnorm(1 - 0.25, mean = 70, sd = 4.0)
    z = (63 - 65) / 3.5,  z = (67 - 70) / 4.0
    x = 65 + 0.6 * 3.5,  x = 70 - 0.7 * 4.0

Page 5 of 6
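Returning to the normal approximation described on this page, the claim that the fit improves as n gets larger can be checked numerically. The sketch below compares exact binomial cumulative probabilities with the normal approximation for p = 1/6 and the four values of n shown in the figure. It adds 0.5 to each count before applying pnorm (a continuity correction), which is an assumption of this sketch and is not discussed in the notes.

```r
# Largest gap between the exact binomial CDF and its normal approximation, p = 1/6
p     <- 1/6
sizes <- c(10, 30, 50, 100)          # the n values shown in the figure panels

max_error <- sapply(sizes, function(n) {
  k      <- 0:n
  mu     <- n * p                    # mean np
  sigma  <- sqrt(n * p * (1 - p))    # standard deviation sqrt(np(1 - p))
  exact  <- pbinom(k, size = n, prob = p)
  # k + 0.5 is a continuity correction (assumed here, not from the notes)
  approx <- pnorm(k + 0.5, mean = mu, sd = sigma)
  max(abs(exact - approx))
})
names(max_error) <- paste0("n=", sizes)

round(max_error, 4)
```

Smaller values indicate a closer fit. With p = 1/6 the binomial distribution is noticeably skewed for small n, which is why the approximation is roughest at n = 10.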
Exercises

We will attempt to solve some of the following exercises as a community project in class today. Finish these solutions as homework exercises, write them up carefully and clearly, and hand them in at the beginning of class next Friday.

Homework 6a: distributions
Exercises from Chapter 6: 6.9 (dice), 6.20 (batteries), 6.23 (reaction time), 6.33 (freezers), 6.41 (standard normal)

Homework 6b: distributions
Exercises from Chapter 6: 6.43 (paint), 6.45 (corks), 6.73 (disks), 6.84 (Sophie), 6.114 (testing normality)