Chapter 5: Probability

These notes reflect material from our text, Exploring the Practice of Statistics, by Moore, McCabe, and Craig, published by Freeman, 2014.

Spring 2017

Probability quantifies randomness. It is a formal framework with a very specific vocabulary and notation.

Imagine an experiment with a specific set of outcomes (say, flipping a fair coin twice). $S$ is the sample space of all possible outcomes. Subsets of $S$ are called events and are denoted with letters like $A$ and $B$. The empty set, $\emptyset$, is the event that contains no outcomes. Two events are disjoint if their intersection is empty.

The Russian mathematician Kolmogorov helped to clarify the essential properties of a probability function, $P$:

  $P(S) = 1$ for the entire sample space $S$
  $0 \le P(A) \le 1$ for any event $A \subseteq S$
  $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$ for disjoint events $A_i$

First examples: flip a coin, flip three coins, roll a die, roll two dice.

If you roll a die once, the result is uncertain: the six outcomes are equally likely. But now roll the die repeatedly and, after each roll, divide the total number of 6s observed so far by the total number of rolls so far. Call this a cumulative proportion, and graph the cumulative proportions for a large number of rolls, say 100,000. A computer did this and displayed the following graph.

In this particular simulation, the first ten rolls of the die produced the sequence 0001010010, where 1 means a 6 was rolled and 0 means something else appeared. Calculate the first ten cumulative sums for this short sequence, convert them to cumulative proportions, and compare your results to the following chart. What is the height of the dotted red line?

[Figure: cumulative proportion $\hat{p}_n$ (vertical axis, 0.00 to 0.30) against $n$, the number of rolls (horizontal axis, logarithmic scale from 1 to 100,000), with a dotted red horizontal line.]

Fig. Cumulative proportions of a 6 in 100,000 rolls of a fair die, from OpenIntro Statistics, chapter 2

Display discrete probabilities in a table

Flip a fair coin

  outcome        h      t
  probability    0.5    0.5
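The cumulative-proportion experiment described above can be sketched in R. This is a minimal illustration, not the code that produced the OpenIntro figure: the seed and the variable names (`first_ten`, `p_hat`, and so on) are assumptions made here, so the simulated rolls will differ from the ones in the chart.

```r
# First ten rolls reported in the notes: 1 = a 6 was rolled, 0 = anything else.
first_ten <- c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0)

# Cumulative sums, then cumulative proportions (sum so far / rolls so far).
cum_sums  <- cumsum(first_ten)
cum_props <- cum_sums / seq_along(first_ten)

# A fresh simulation of 100,000 rolls of a fair die.
set.seed(1)                      # arbitrary seed, chosen only for reproducibility
n_rolls <- 100000
rolls   <- sample(1:6, size = n_rolls, replace = TRUE)
p_hat   <- cumsum(rolls == 6) / seq_len(n_rolls)

# Plot on a logarithmic horizontal axis, as in the figure.
plot(seq_len(n_rolls), p_hat, type = "l", log = "x",
     xlab = "n (number of rolls)", ylab = "cumulative proportion of 6s")
abline(h = 1/6, lty = 3, col = "red")   # theoretical probability of a 6
```

Printing `cum_props` gives the ten values to compare against the left end of the chart; the tail of `p_hat` shows how the proportion settles as n grows.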

Venn diagram

[Figure: Venn diagram of events $A$ and $B$.]

Rules of Probability

Mutually exclusive events. $A \cap B = \emptyset$.

Unions. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Complements. $P(A^c) = 1 - P(A)$.

Independent events. $P(A \cap B) = P(A)\,P(B)$ when $A$ and $B$ are independent.

Conditional probability. $P(A \mid B) = P(A \cap B)/P(B)$ when $P(B) \neq 0$.

Intersections. $P(A \cap B) = P(A \mid B)\,P(B)$.
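The rules above can be checked by brute-force enumeration on a small sample space. The sketch below uses two rolls of a fair die; the events `A` (first die shows 6) and `B` (the two dice sum to 7) and the helper `prob()` are hypothetical choices made for illustration and do not come from the notes.

```r
# Sample space for rolling two dice: 36 equally likely outcomes.
S <- expand.grid(die1 = 1:6, die2 = 1:6)

# With equally likely outcomes, P(event) is the fraction of outcomes in the event.
prob <- function(event) mean(event)

A <- S$die1 == 6               # hypothetical event: first die shows a 6
B <- S$die1 + S$die2 == 7      # hypothetical event: the dice sum to 7

p_union      <- prob(A | B)             # P(A union B)
p_union_rule <- prob(A) + prob(B) - prob(A & B)
p_complement <- prob(!A)                # P(A^c)
p_A_given_B  <- prob(A & B) / prob(B)   # P(A | B)

# These two events happen to be independent: P(A and B) = P(A) P(B).
independent <- isTRUE(all.equal(prob(A & B), prob(A) * prob(B)))
```

Swapping in other events (for example, `B <- S$die1 + S$die2 == 12`) shows a pair for which the independence check fails while the union and complement rules still hold.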

Contingency tables and conditional probabilities

Vocabulary for diagnostic testing, with $S$ = medical state present and $\mathrm{POS}$ = test positive: sensitivity $P(\mathrm{POS} \mid S)$, specificity $P(\mathrm{NEG} \mid S^c)$, incidence $P(S)$.

Consider the Triple Blood Test for Down Syndrome (Agresti and Franklin, chapter 5, pp. 232-233).

                        Blood Test
  Status                POS     NEG     Total
  D (Down)               48       6        54
  D^c (unaffected)     1307    3921      5228
  Total                1355    3927      5282

Calculate the following probabilities based on the figures in this study:

  sensitivity $P(\mathrm{POS} \mid D)$, specificity $P(\mathrm{NEG} \mid D^c)$, incidence $P(D)$
  false positives $P(\mathrm{POS} \mid D^c)$, false negatives $P(\mathrm{NEG} \mid D)$

An individual being tested would be most concerned about $P(D \mid \mathrm{POS})$. What is this probability? Why is it so small? Hint: Calculate $P(D^c \mid \mathrm{POS})$.

Again, an individual being tested would want to know $P(D \mid \mathrm{NEG})$. How would that probability compare to the a priori $P(D)$?

[Figure: "Triple Blood Test", showing status (unaffected, Down) against blood test (POS, NEG).]

Using R to Compute Conditional Probabilities

Construct a matrix named down to represent the Down Syndrome contingency table, and then use addmargins(down) to compute its row and column totals.

    # Fill the 2 x 2 table column by column: pos column first, then neg.
    down <- c(48, 1307, 6, 3921)
    dim(down) <- c(2, 2)
    dimnames(down) <- list(status = c("down", "unaffected"),
                           "blood test" = c("pos", "neg"))
    down
    #             blood test
    # status        pos  neg
    #   down         48    6
    #   unaffected 1307 3921

    addmargins(down)
    #             blood test
    # status        pos  neg  Sum
    #   down         48    6   54
    #   unaffected 1307 3921 5228
    #   Sum        1355 3927 5282

Then prop.table(down, 1) will divide each row by its row sum. The numbers in each row are conditional probabilities. And prop.table(down, 2) will divide each column by its column sum. The numbers in each column are conditional probabilities. Therefore, each of the eight numbers shown below is a conditional probability of the form $P(A \mid B)$ for some $A$ and $B$. Identify the correct $A$ and $B$ for each number.

    prop.table(down, 1)
    #             blood test
    # status             pos       neg
    #   down       0.8888889 0.1111111
    #   unaffected 0.2500000 0.7500000

    prop.table(down, 2)
    #             blood test
    # status              pos         neg
    #   down       0.03542435 0.001527884
    #   unaffected 0.96457565 0.998472116

What values do these tables indicate for $P(\mathrm{pos} \mid \mathrm{down})$ and $P(\mathrm{down} \mid \mathrm{pos})$?
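As a cross-check on the column-wise table, $P(\mathrm{down} \mid \mathrm{pos})$ can also be rebuilt from sensitivity, specificity, and incidence using the conditional probability and intersection rules (Bayes' rule, which these notes do not name explicitly). The sketch below is one way to do that; the variable names are assumptions made here.

```r
# Rebuild the contingency table so this block runs on its own.
down <- matrix(c(48, 1307, 6, 3921), nrow = 2,
               dimnames = list(status = c("down", "unaffected"),
                               "blood test" = c("pos", "neg")))

sensitivity <- down["down", "pos"] / sum(down["down", ])              # P(pos | down)
specificity <- down["unaffected", "neg"] / sum(down["unaffected", ])  # P(neg | unaffected)
incidence   <- sum(down["down", ]) / sum(down)                        # P(down)

# P(pos) = P(pos | down) P(down) + P(pos | unaffected) P(unaffected)
p_pos <- sensitivity * incidence + (1 - specificity) * (1 - incidence)

# P(down | pos) = P(pos | down) P(down) / P(pos)
p_down_given_pos <- sensitivity * incidence / p_pos

# Same quantity read directly from the column-wise table.
from_table <- unname(prop.table(down, 2)["down", "pos"])
```

The two routes agree because both reduce to 48/1355. The formula makes visible why the value is small: the low incidence multiplies the sensitivity, while the much larger unaffected group contributes most of the positives.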

Boston Smallpox Epidemic of 1721

The following contingency table (OpenIntro Statistics, pp. 83-87) refers to the Boston smallpox epidemic of 1721. A total of 6224 residents of Boston contracted smallpox in this epidemic and 850 of them died. The epidemic was marked by vigorous public debate over the value (or lack thereof) of a type of inoculation known as variolation, which was itself dangerous. The Reverend Cotton Mather advocated inoculation, but the physician William Douglass was firmly against it. See the article in Harvard's Contagion for more details.

An effective smallpox vaccination procedure was eventually demonstrated by Edward Jenner in England in 1796, and subsequent efforts to eradicate smallpox from the world were finally declared successful in 1980 by the World Health Organization. Cotton Mather, on the other hand, lives on in infamy for his role in the Salem witch trials.

              Inoculated
  Result      yes      no     Total
  lived       238    5136      5374
  died          6     844       850
  Total       244    5980      6224

[Figure: "Smallpox Epidemic, Boston, 1721", showing result (lived, died) against inoculated (yes, no).]

Tree Diagrams

The following tree diagram, generated by OpenIntro software, summarizes the relevant statistics for the Boston smallpox epidemic of 1721. Here Inoculated is a categorical explanatory variable with levels yes and no. In the Inoculated column of the tree diagram are the probabilities $P(\mathrm{yes})$ and $P(\mathrm{no})$. The categorical response variable Result has levels lived and died. The conditional probabilities in the Result column are $P(\mathrm{lived} \mid \mathrm{yes})$, $P(\mathrm{died} \mid \mathrm{yes})$, $P(\mathrm{lived} \mid \mathrm{no})$, $P(\mathrm{died} \mid \mathrm{no})$. The probabilities calculated by the software in the third column are $P(\mathrm{lived\ and\ yes})$, $P(\mathrm{died\ and\ yes})$, $P(\mathrm{lived\ and\ no})$, $P(\mathrm{died\ and\ no})$, because $P(A)\,P(B \mid A) = P(A \cap B)$.

  Inoculated       Result
  yes, 0.0392      lived, 0.9754     0.0392*0.9754 = 0.03824
                   died,  0.0246     0.0392*0.0246 = 0.00096
  no,  0.9608      lived, 0.8589     0.9608*0.8589 = 0.82523
                   died,  0.1411     0.9608*0.1411 = 0.13557

Fig. Smallpox in Boston, 1721, from OpenIntro Statistics, chapter 2, pp. 83-87
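The three columns of the tree diagram can be recomputed from the smallpox contingency table. The following is a sketch in base R, not the OpenIntro software itself; the object names (`smallpox`, `p_inoc`, `joint`) are assumptions made for illustration.

```r
# Contingency table from the notes: rows = result, columns = inoculated.
smallpox <- matrix(c(238, 6, 5136, 844), nrow = 2,
                   dimnames = list(result = c("lived", "died"),
                                   inoculated = c("yes", "no")))

# First column of the tree: P(yes) and P(no).
p_inoc <- colSums(smallpox) / sum(smallpox)

# Second column: P(result | inoculated), each column divided by its total.
p_result_given_inoc <- prop.table(smallpox, 2)

# Third column: P(A) * P(B | A) = P(A and B), one product per branch.
joint <- sweep(p_result_given_inoc, 2, p_inoc, `*`)

round(p_inoc, 4)
round(p_result_given_inoc, 4)
round(joint, 5)
```

Because the products are formed from unrounded probabilities here, the last digit can differ slightly from the tree, which multiplies values already rounded to four places. The matrix `joint` is the same as `smallpox / sum(smallpox)`, which is one way to confirm the multiplication rule.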

Random variables

A random variable is a function from the sample space, $S$, of an experiment to the real numbers, $X : S \to \mathbb{R}$, so we might characterize a random variable as a function which assigns a numerical value to each outcome of an experiment. Random variables can be defined on discrete and on continuous sample spaces.

  discrete: flip a coin, flip three coins, roll a die, roll two dice, roulette wheel, spinner
  continuous: random number generators: $U[0, 1]$, $N(0, 1)$, $N(\mu, \sigma)$

Expected value of a random variable, $E(X) = \mu_X$

Variance of a random variable, $\mathrm{Var}(X) = \sigma_X^2$

Linear combinations of random variables, $Y = aX_1 + bX_2$

Expected value and variance of a linear combination of random variables. If $Y = aX_1 + bX_2$, then $E(Y) = aE(X_1) + bE(X_2)$, and, when $X_1$ and $X_2$ are independent, $\mathrm{Var}(Y) = a^2\,\mathrm{Var}(X_1) + b^2\,\mathrm{Var}(X_2)$.

Distributions of random variables

Calculation of probability using a continuous distribution, $P(X \le x)$. The area of the blue region in the following figure is the probability that the random variable $X \sim N(\mu, \sigma)$ takes on a value less than or equal to 5. That probability is denoted $P(X \le 5)$.

[Figure: density curve of $X \sim N(\mu, \sigma)$ for $x$ from -3 to 9, with the region to the left of $x = 5$ shaded blue.]

Normal distributions

Normal random variable, $X \sim N(\mu, \sigma)$.

z-score, $z = (x - \mu)/\sigma$. If $z = (x - \mu)/\sigma$, then $x = \mu + z\sigma$.
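The expected value and variance formulas above can be verified exactly on a small discrete example. The sketch below takes $X_1$ and $X_2$ to be two independent rolls of a fair die; the coefficients `a = 2` and `b = 3` are arbitrary values assumed here for illustration.

```r
# One roll of a fair die: values and their probabilities.
faces <- 1:6
p     <- rep(1/6, 6)

mu_X  <- sum(faces * p)               # E(X)
var_X <- sum((faces - mu_X)^2 * p)    # Var(X)

# Linear combination Y = a*X1 + b*X2 of two independent rolls.
a <- 2   # arbitrary coefficient, assumed for illustration
b <- 3   # arbitrary coefficient, assumed for illustration

# All 36 equally likely (x1, x2) pairs; independence is built into the grid.
outcomes <- expand.grid(x1 = faces, x2 = faces)
y <- a * outcomes$x1 + b * outcomes$x2

mu_Y  <- mean(y)                      # E(Y), exact because outcomes are equally likely
var_Y <- mean((y - mu_Y)^2)           # Var(Y), population variance (not var(), which divides by n - 1)

c(mu_Y, a * mu_X + b * mu_X)          # the two entries should agree
c(var_Y, a^2 * var_X + b^2 * var_X)   # the two entries should agree
```

Note that `var(y)` would not reproduce the formula, since R's `var()` computes a sample variance; the enumeration needs the population version used above.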

Standardized normal random variable, $Z \sim N(0, 1)$. If $X \sim N(\mu, \sigma)$ and $Z = (X - \mu)/\sigma$, then $Z \sim N(0, 1)$. This is why our textbook need only contain a table of values for the standard normal distribution.

Areas of regions under a normal distribution curve. Percentiles. The 68-95-99.7% rule. Q-Q plots.

Calculations with $X \sim N(0, 1)$

Suppose that the random variable $X$ has a standard normal distribution, $X \sim N(0, 1)$.

[Figure: density curve of $X \sim N(0, 1)$ for $x$ from -3 to 3, with heights from 0.0 to 0.4.]

There are four useful functions in R for working with normal distributions: dnorm, pnorm, qnorm, rnorm.

  a. pnorm(2) gives $P(X \le 2)$
  b. pnorm(2) - pnorm(-2) gives $P(-2 \le X \le 2)$
  c. 1 - pnorm(2) gives $P(X \ge 2)$
  d. qnorm(0.60) gives $q_{60}$ such that $P(X \le q_{60}) = 0.60$, the 60th percentile
  e. rnorm(3) gives three random numbers from the standard normal distribution, for instance 0.3612443 0.1075216 1.0473477
  f. dnorm() is used for drawing the graph of the bell curve
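The 68-95-99.7% rule and the standardization result mentioned above can both be checked with pnorm and qnorm. This is a small sketch; the values `mu = 10`, `sigma = 2`, and `x = 13` are hypothetical numbers chosen only to illustrate the z-score relationship.

```r
# 68-95-99.7 rule: probability of falling within 1, 2, 3 standard deviations of the mean.
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 3)

# Standardization: for X ~ N(mu, sigma), P(X <= x) equals P(Z <= z) with z = (x - mu)/sigma.
mu    <- 10   # hypothetical mean
sigma <- 2    # hypothetical standard deviation
x     <- 13   # hypothetical measurement
z     <- (x - mu) / sigma

p_direct       <- pnorm(x, mean = mu, sd = sigma)
p_standardized <- pnorm(z)

# qnorm undoes pnorm: the percentile whose cumulative probability is pnorm(2) is 2.
round_trip <- qnorm(pnorm(2))

# dnorm draws the bell curve.
curve(dnorm(x), from = -3, to = 3, xlab = "x", ylab = "density")
```

`p_direct` and `p_standardized` agree, which is the practical content of the remark that a single standard normal table suffices.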

Calculations with $X \sim N(\mu, \sigma)$

Agresti and Franklin report that female students at the University of Georgia have an approximately normal height distribution, with mean $\mu_W = 65$ inches and standard deviation $\sigma_W = 3.5$ inches. Male students have an approximately normal height distribution, with mean $\mu_M = 70$ inches and standard deviation $\sigma_M = 4.0$ inches.

Let $W \sim N(\mu_W, \sigma_W)$ and $M \sim N(\mu_M, \sigma_M)$, and calculate the following (using R and using Agresti and Franklin, Appendix A, pp. A-1 and A-2):

  $P(W \le 66)$, $P(M \ge 72)$, $q$ such that $P(W \le q) = 0.30$, $q$ such that $P(M \ge q) = 0.25$

Calculate the z-score of a person with $W = 63$, and of a person with $M = 67$.

How tall is a woman with z-score 0.6? How tall is a man with z-score -0.7?

See the Answers section of these notes (page 11) for R expressions which will calculate the answers to these questions.

[Figure: "Men's and Women's Heights", two normal density curves (men, women) against height (in) from 55 to 85, with densities from 0.00 to 0.10.]

Student's t, Chi-Square, F

Student's t, Chi-Square, and F distributions play key roles in the sequel. All of them are families of continuous distributions. Student's t distributions resemble Normal distributions but they have fatter tails. Chi-Square and F distributions are defined on the half line $[0, \infty)$, so neither one is symmetric.

Discrete distributions

For $X$ to be a Bernoulli random variable, and hence have a Bernoulli distribution, $X \sim \mathrm{Bernoulli}(p)$, we require

  i. a binary outcome for a single trial (generally coded as success, 1, or failure, 0)
  ii. a fixed probability of success, $P(X = 1) = p$, and failure, $P(X = 0) = 1 - p$, for that trial
  iii. exactly one trial

Examples of Bernoulli random variables include the outcome of a coin flip (h or t), whether a driver was wearing a seat belt (yes or no), and whether a basketball player made a basket (1 or 0).
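The remark above that Student's t distributions have fatter tails than Normal distributions, and that Chi-Square and F distributions live on the half line, can be checked numerically. The sketch below compares upper-tail probabilities beyond 3; the cutoff and the degrees of freedom are arbitrary values assumed for illustration.

```r
# Upper-tail probability P(X > 3) for the standard normal and for several t distributions.
cutoff <- 3                 # arbitrary cutoff, assumed for illustration
dfs    <- c(2, 5, 30)       # arbitrary degrees of freedom

tail_normal <- pnorm(cutoff, lower.tail = FALSE)
tail_t      <- sapply(dfs, function(df) pt(cutoff, df = df, lower.tail = FALSE))
names(tail_t) <- paste0("df=", dfs)

tail_normal
tail_t

# Chi-Square and F place no probability below zero.
p_chisq_below_zero <- pchisq(0, df = 3)
p_f_below_zero     <- pf(0, df1 = 2, df2 = 5)
```

Every entry of `tail_t` exceeds `tail_normal`, and the gap narrows as the degrees of freedom grow, which matches the picture of t curves approaching the normal curve.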

Expected value and variance of a Bernoulli random variable, $X \sim \mathrm{Bernoulli}(p)$: Expected value, $\mu_X = p$. Variance, $\sigma_X^2 = p(1 - p)$.

[Figure: "Bernoulli distribution, p=1/6", probability (vertical axis labeled probability density, 0.0 to 1.0) against $k = 0, 1$.]

Binomial random variable, $X \sim \mathrm{Binomial}(n, p)$. The probability of $k$ successes in $n$ trials. Expected value, $\mu_X = np$. Variance, $\sigma_X^2 = np(1 - p)$. Normal approximation to a binomial distribution.

[Figure: "binomial distribution, p=1/6, n=10", probability (vertical axis labeled probability density, 0.00 to 0.30) against $k$ from 0 to 10.]
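The binomial probabilities, mean, and variance quoted above can be reproduced with dbinom. The sketch below uses the figure's values n = 10 and p = 1/6; the hand-written formula $\binom{n}{k} p^k (1-p)^{n-k}$ is the standard binomial probability, which these notes do not write out, so it is included here as an assumption about what the notes intend.

```r
n <- 10
p <- 1/6
k <- 0:n

# Probability of exactly k successes in n trials, for every possible k.
pmf <- dbinom(k, size = n, prob = p)

# The same probabilities from the standard formula choose(n, k) p^k (1 - p)^(n - k).
pmf_by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)

# Mean and variance computed directly from the probabilities.
mean_from_pmf <- sum(k * pmf)
var_from_pmf  <- sum((k - mean_from_pmf)^2 * pmf)

# A Bernoulli(p) variable is the special case n = 1.
bernoulli_pmf <- dbinom(0:1, size = 1, prob = p)

# Bar chart in the style of the figure.
barplot(pmf, names.arg = k, xlab = "k", ylab = "probability")
```

`mean_from_pmf` and `var_from_pmf` agree with `n * p` and `n * p * (1 - p)`, so the shortcut formulas save a summation rather than introduce anything new.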

Conditions for a binomial distribution

For $X$ to be a binomial random variable, and hence have a binomial distribution, $X \sim \mathrm{Binomial}(n, p)$, we require

  i. a binary outcome for each trial (coin flip produces h or t)
  ii. a single fixed probability of success for each trial ($p = 0.5$)
  iii. a fixed number of trials ($n = 10$ coin flips)

The trials must also be independent of one another.

Normal approximations to binomial distributions

The distribution of a binomial random variable, $X \sim \mathrm{Binomial}(n, p)$, has mean $np$ and standard deviation $\sqrt{np(1 - p)}$. It can be approximated by a normal probability distribution with the same mean and standard deviation, $Y \sim N(\mu = np, \sigma = \sqrt{np(1 - p)})$. The fit improves as $n$ gets larger.

[Figure: four panels, "binomial distribution, p=1/6, n=10", "binomial distribution, p=1/6, n=30", "binomial distribution, p=1/6, n=50", and "binomial distribution, p=1/6, n=100", each showing probability (vertical axis labeled probability density) against $k$.]

Answers

The following R expressions calculate the answers to the questions about heights of men and women at the University of Georgia posed above. For each calculation, draw a corresponding normal curve and shade the area or mark the measurement in question.

    pnorm(66, mean = 65, sd = 3.5)
    1 - pnorm(72, mean = 70, sd = 4.0)
    qnorm(0.30, mean = 65, sd = 3.5)
    qnorm(1 - 0.25, mean = 70, sd = 4.0)

  $z = (63 - 65)/3.5$, $z = (67 - 70)/4.0$

  $x = 65 + 0.6 \times 3.5$, $x = 70 - 0.7 \times 4.0$
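The claim in the normal approximation section that the fit improves as n gets larger can be examined by comparing exact binomial probabilities with their normal approximations for the four sample sizes shown in the figure. The sketch below measures the largest gap between the two cumulative distribution functions; the half-unit continuity correction (`k + 0.5`) is an addition made here and is not mentioned in the notes.

```r
p        <- 1/6
n_values <- c(10, 30, 50, 100)   # the sample sizes shown in the four panels

# For each n, the largest absolute difference between P(X <= k) and its normal approximation.
max_abs_error <- sapply(n_values, function(n) {
  k      <- 0:n
  exact  <- pbinom(k, size = n, prob = p)
  # Normal with the same mean and standard deviation; k + 0.5 is a continuity
  # correction (an assumption of this sketch, not part of the notes).
  approx <- pnorm(k + 0.5, mean = n * p, sd = sqrt(n * p * (1 - p)))
  max(abs(exact - approx))
})
names(max_abs_error) <- paste0("n=", n_values)

max_abs_error
```

Printing `max_abs_error` shows the worst-case discrepancy for each panel; comparing the entry for n = 100 with the entry for n = 10 gives a numerical counterpart to what the figure shows visually.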

Exercises

We will attempt to solve some of the following exercises as a community project in class today. Finish these solutions as homework exercises, write them up carefully and clearly, and hand them in at the beginning of class next Friday.

Homework 5a: probability models
Exercises from Sections 5.1, 5.2: 5.2 (graduation rates), 5.3 (free throws), 5.24 (blood types), 5.26 (Canada)

Homework 5b: random variables
Exercises from Sections 5.3, 5.4: 5.46 (households), 5.54 (foreign-born), 5.65 (fruits and veggies), 5.75 (sums)

Homework 5c: binomial distributions and probability rules
Exercises from Sections 5.5, 5.6 and Chapter 5 exercises: 5.94 (music), 5.102 (die), 5.118 (tree diagram), 5.142 (SAT scores)