Unit 4 The Bernoulli and Binomial Distributions


"If you believe in miracles, head for the Keno lounge." - Jimmy the Greek

The Amherst Regional High School provides flu vaccinations to a random sample of 200 students. How many will develop the flu? A new treatment for stage IV melanoma is given to 75 cases. How many will survive two or more years? In a sample of 300 cases of uterine cancer, how many have a history of IUD use?

The number of events in each of these scenarios is a random variable that is well modeled by a Binomial probability distribution. When the number of trials is just one, the probability model is called a Bernoulli trial. The Bernoulli and Binomial probability distributions are used to describe the chance occurrence of success/failure outcomes. They are also the basis of logistic regression, which is used to identify the possibly multiple predictors of success/failure outcomes.

Table of Contents

1. Unit Roadmap
2. Learning Objectives
3. Introduction to Discrete Probability Distributions
4. Statistical Expectation
5. The Population Variance is a Statistical Expectation
6. The Bernoulli Distribution
7. Introduction to Factorials and Combinatorials
8. The Binomial Distribution
9. Calculation of Binomial Probabilities
10. Resources for the Binomial Distribution

1. Unit Roadmap

Unit 4. Bernoulli and Binomial. Populations are the measurements of our observations of nature. This unit focuses on nominal data that are binary.

Previously, we learned that data can be of several types: nominal, ordinal, quantitative, etc. A nominal variable that is binary or dichotomous has exactly two possible values. Examples are vital status (alive/dead), exposure (yes/no), tumor remission (yes/no), etc.

The frequentist view of probability says that probability is the relative frequency in an indefinitely large number of trials. In this framework, a probability distribution model is a model of chance. It describes the way that probability is distributed among the possible values that a random variable can take on. The Bernoulli and Binomial probability distribution models are often very good descriptions of patterns of occurrence of events that are of interest in public health, e.g. mortality, disease, and exposure.

2. Learning Objectives

When you have finished this unit, you should be able to:

- Explain the frequentist approach to probability.
- Define a discrete probability distribution.
- Explain statistical expectation for a discrete random variable.
- Define the Bernoulli probability distribution model.
- Explain how to count the number of ways using the tools of factorials and combinatorials.
- Define the Binomial probability distribution model.
- Calculate binomial probabilities.

3. Introduction to Discrete Probability Distributions

A discrete probability distribution is defined by (i) a listing of all the possible random variable values, together with (ii) their associated probabilities of occurrence.

- The listing of possible random variable outcomes must comprise ALL the possibilities (be exhaustive).
- Each possibility has a likelihood of occurrence ("chances of occurrence") that is a number somewhere between 0 and 1.

Looking ahead: We'll have to refine these notions when we come to speaking about continuous distributions because, in those situations, the number of possible outcomes is infinite!

Example: Gender of a randomly selected student

We'll use capital X as our placeholder for the random variable name:

X = gender of a randomly selected student from the population of students at a University

We'll use small x as our placeholder for a value of the random variable X:

x = 0 if gender of the selected student is male
x = 1 if gender of the selected student is female

Value of the Random Variable X, x      Probability that X has value x, Pr[X = x]
x = 0 (male)                           0.53
x = 1 (female)                         0.47

Note that this roster exhausts all possibilities. Note also that the sum of these individual probabilities, because the sum is taken over all possibilities, is 100% or 1.00.

Some useful terminology:

1. For discrete random variables, a probability model is the set of assumptions used to assign probabilities to each outcome in the sample space. The sample space is the universe, or collection, of all possible outcomes.
2. A probability distribution defines the relationship between the outcomes and their likelihood of occurrence.
3. To define a probability distribution, we make an assumption (the probability model) and use this to assign likelihoods.
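
As a small illustration (a Python sketch of my own, not part of the original notes; the variable names are hypothetical), the gender distribution above can be stored and checked against the two defining requirements of a discrete probability distribution:

import math

# Discrete probability distribution for X = gender: x = 0 (male), x = 1 (female)
gender_dist = {0: 0.53, 1: 0.47}

# (i) every probability lies between 0 and 1
assert all(0.0 <= p <= 1.0 for p in gender_dist.values())

# (ii) the probabilities, summed over all possibilities, equal 1.00
assert math.isclose(sum(gender_dist.values()), 1.0)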

4. Statistical Expectation

Statistical expectation was introduced in Appendix 2 of Unit 2, Introduction to Probability, pp 54-55. A variety of wordings might provide a clearer feel for statistical expectation.

Statistical expectation is the long range average. Think of this as what you can expect in the long run. The statistical expectation of what the state of Massachusetts will pay out is the long range average of the payouts taken over all possible individual payouts. Statistical expectation represents an "on balance" amount, even if the "on balance" amount is not actually a possible value.

IF   $1 has a probability of occurrence = 0.50
     $5 has a probability of occurrence = 0.25
     $10 has a probability of occurrence = 0.15
and  $25 has a probability of occurrence = 0.10

THEN in the long run, or on balance, the expected winning is $5.75 because

$5.75 = [$1](0.50) + [$5](0.25) + [$10](0.15) + [$25](0.10)

Notice that the "on balance" dollar amount of $5.75 is not an actual possible winning.

In the long run, what can the State of Massachusetts expect to pay out on average? The answer is a weighted sum of the possible winnings, where the weights are the associated chances of occurrence.

[ Statistical expectation = $5.75 ]
= [$1 winning] (percent of the time this winning occurs = 0.50)
+ [$5 winning] (percent of the time this winning occurs = 0.25)
+ [$10 winning] (percent of the time this winning occurs = 0.15)
+ [$25 winning] (percent of the time this winning occurs = 0.10)

You can replace the words "statistical expectation" with "net result", "long range average", "in the long run", or "on balance".

Statistical Expectation of a Discrete Random Variable X

For a discrete random variable X (e.g. winnings in the lottery) having probability distribution as follows:

Value of X, x      P[X = x]
$ 1                0.50
$ 5                0.25
$10                0.15
$25                0.10

The statistical expectation of the random variable X is written as E[X] = μ. When X is discrete, it is calculated as the weighted sum of all the possible values x, using weights equal to the associated probabilities of occurrence Pr[X = x]:

E[X] = μ = Σ (over all possible X = x) of [x] P(X = x)

In the lottery winnings example, μ = $5.75.
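
For readers who like to verify the arithmetic, here is a brief Python sketch (mine, not the original author's) that computes E[X] as the weighted sum above:

# Lottery winnings distribution: value -> probability
winnings = {1: 0.50, 5: 0.25, 10: 0.15, 25: 0.10}

# E[X] = sum over all x of x * P(X = x)
mu = sum(x * p for x, p in winnings.items())
print(mu)  # 5.75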

We can calculate the statistical expectation of other things, too.

Example - Suppose we want to know how much we can expect to win or lose, taking into account the cost of purchasing the lottery ticket. Suppose a lottery ticket costs $15 to purchase. We can expect to "win" -$9.25. Put another way, we can expect to lose $9.25. Here's how it works.

[ Statistical expectation of amount won = -$9.25 ]
= [$1 winning - $15 cost] (percent of the time this winning occurs = 0.50)
+ [$5 winning - $15 cost] (percent of the time this winning occurs = 0.25)
+ [$10 winning - $15 cost] (percent of the time this winning occurs = 0.15)
+ [$25 winning - $15 cost] (percent of the time this winning occurs = 0.10)

Statistical Expectation of the Discrete Random Variable Y = [X - 15]

Value of Y, y            P[Y = y]
$ 1 - $15 = -$14         0.50
$ 5 - $15 = -$10         0.25
$10 - $15 = -$ 5         0.15
$25 - $15 = +$10         0.10

The loss random variable Y has statistical expectation E[Y] = μ_Y:

μ_Y = Σ (over all possible Y = y) of [y] P(Y = y) = -$9.25
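
The same weighted-sum calculation, applied to Y = X - 15, reproduces the expected loss (again a sketch of mine, with hypothetical variable names):

winnings = {1: 0.50, 5: 0.25, 10: 0.15, 25: 0.10}
ticket_cost = 15

# E[Y] = E[X - 15] = sum over all x of (x - 15) * P(X = x)
mu_Y = sum((x - ticket_cost) * p for x, p in winnings.items())
print(mu_Y)  # -9.25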

5. The Population Variance is a Statistical Expectation

Example, continued - One play of the Massachusetts State Lottery. The random variable X is the winnings. The possible values of X are $1, $5, $10, and $25. The statistical expectation of X is μ = $5.75. Recall that this figure is what the state of Massachusetts can expect to pay out, on average, in the long run.

What about the variability in X? In learning about the population variance σ² for the first time, we understood it to be a measure of the variability of individual values in a population. The population variance σ² of a random variable X is the statistical expectation of the quantity [X - μ]².

Discrete Random Variables: Variance σ² = Statistical Expectation of [X - μ]² = E[X - μ]²

For a discrete random variable X (e.g. winnings in the lottery) having probability distribution as follows:

Value of [X - μ]², (x - μ)²      P[X = x]
[1 - 5.75]²  =  22.56            0.50
[5 - 5.75]²  =   0.56            0.25
[10 - 5.75]² =  18.06            0.15
[25 - 5.75]² = 370.56            0.10

The variance of a random variable X is the statistical expectation of the random variable [X - μ]² and is written as Var[X] = σ². When X is discrete, it is calculated as the weighted sum of all the possible values [x - μ]², using weights equal to the associated probabilities of occurrence Pr[X = x]:

σ² = E[(X - μ)²] = Σ (over all possible X = x) of [x - μ]² P(X = x)

In the lottery winnings example, σ² = 51.19 dollars squared.
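
The variance can be verified with the same pattern (a minimal Python sketch, not part of the original notes):

winnings = {1: 0.50, 5: 0.25, 10: 0.15, 25: 0.10}

# mu = E[X]
mu = sum(x * p for x, p in winnings.items())                    # 5.75

# sigma^2 = E[(X - mu)^2] = sum over all x of (x - mu)^2 * P(X = x)
sigma_sq = sum((x - mu) ** 2 * p for x, p in winnings.items())
print(round(sigma_sq, 2))  # 51.19 (dollars squared)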

6. The Bernoulli Distribution

The Bernoulli Distribution is an example of a discrete probability distribution. It is often the probability model used for the analysis of proportions and rates.

Example - The fair coin toss.

We'll use capital Z as our placeholder for the random variable name here:

Z = face of coin toss

We'll use small z as our placeholder for a value of the random variable Z:

z = 1 if heads
z = 0 if tails

We'll use π and (1 - π) as our placeholders for the associated probabilities:

π = Pr[Z = 1], e.g. this is the probability of heads and is equal to 0.5 when the coin is fair
(1 - π) = Pr[Z = 0]

Bernoulli Distribution (π) ("Bernoulli Trial")

A random variable Z is said to have a Bernoulli Distribution if it takes on the value 1 with probability π and takes on the value 0 with probability (1 - π).

Value of Z, z      P[Z = z]
1                  π
0                  (1 - π)

(1) μ = Mean = E[Z] = statistical expectation of Z:   μ = π
(2) σ² = Variance = Var[Z] = E[(Z - μ)²] = statistical expectation of (Z - μ)²:   σ² = π(1 - π)

A Bernoulli Distribution is used to model the outcome of a SINGLE event/non-event trial, e.g. mortality, MI, etc.

Mean (μ) and Variance (σ²) of a Bernoulli Distribution

Mean of Z = μ = π

The mean of Z is represented as E[Z]. E[Z] = π because the following is true:

E[Z] = Σ (over all possible z) of [z] Pr[Z = z]
     = [0] Pr[Z = 0] + [1] Pr[Z = 1]
     = [0](1 - π) + [1](π)
     = π

Variance of Z = σ² = (π)(1 - π)

The variance of Z is Var[Z] = E[(Z - E[Z])²]. Var[Z] = π(1 - π) because the following is true:

Var[Z] = E[(Z - π)²] = Σ (over all possible z) of [(z - π)²] Pr[Z = z]
       = [(0 - π)²] Pr[Z = 0] + [(1 - π)²] Pr[Z = 1]
       = [π²](1 - π) + [(1 - π)²](π)
       = π(1 - π)[π + (1 - π)]
       = π(1 - π)
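
The same expectation machinery confirms the Bernoulli mean and variance for any value of π; the sketch below (my own, not from the notes) checks the fair-coin case π = 0.5:

pi = 0.5                                   # probability of "success" (heads)
bernoulli = {1: pi, 0: 1 - pi}             # distribution of Z

mu = sum(z * p for z, p in bernoulli.items())                    # equals pi
sigma_sq = sum((z - mu) ** 2 * p for z, p in bernoulli.items())  # equals pi * (1 - pi)

print(mu, sigma_sq)  # 0.5 0.25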

7. Introduction to Factorials and Combinatorials

From 1 Trial to Many Trials - When we do a SINGLE trial of event/non-event occurrence, this is a Bernoulli trial. When we do SEVERAL trials of event/non-event occurrence, the count of events is a Binomial random variable. We need to understand factorials and combinatorials in order to understand the Binomial distribution.

Preliminary - Introduction to the factorial

The factorial is just a shorthand that saves us from having to write out in longhand multiplications of the form (3)(2)(1) or (5)(4)(3)(2)(1) or (10)(9)(8)(7)(6)(5)(4)(3)(2)(1) ... well, you get the idea.

Notation: n factorial is written n!
Definition: n! = (n)(n-1)(n-2)...(3)(2)(1)

Example - 3! = (3)(2)(1) = 6
Example - 8! = (8)(7)(6)(5)(4)(3)(2)(1) = 40,320

Definition: 0! = 1

Factorial
n! = (n)(n-1)(n-2)...(2)(1)
0! = 1 by convention
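
Python's standard library already provides the factorial, so the examples above can be checked directly (a quick sketch, not part of the original notes):

from math import factorial

print(factorial(3))   # 6
print(factorial(8))   # 40320
print(factorial(0))   # 1, by convention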

Motivating the Combinatorial

Example - A box contains 1 red marble and nine green marbles. Five draws are made at random with replacement. In how many ways can you get 2 reds and 3 greens?

One outcome that satisfies "2 reds and 3 greens" occurs when the first 2 draws are both red and the remaining three draws are all green: R R G G G

Another outcome that satisfies "2 reds and 3 greens" occurs when the first and last draws are red and the middle draws are green: R G G G R

So now - what is the total number of outcomes that satisfy "2 reds and 3 greens"? The answer is a combinatorial. Here, it is solved as "5 choose 2" and is equal to:

"5 choose 2" ways = (5 choose 2) = 5! / (2! 3!) = (5)(4)(3)(2)(1) / [(2)(1)(3)(2)(1)] = 10

Check: If you wanted to, you could check this result for yourself by listing all the ways to obtain 2 red and 3 green. I've given you 2 (RRGGG and RGGGR). There are 8 more.

The Combinatorial

Question: How many ways can we choose x from n? Another wording of this question is: what is the number of combinations of n items that can be formed by taking them x at a time?

Notation: One notation for this is nCx. Another is (n choose x).

Combinatorial - The number of ways to take x items from n without regard to order is:

nCx = (n choose x) = n! / [x! (n - x)!]

Note:
(n choose x) = (n choose n-x)
(n choose n) = (n choose 0) = 1
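
In Python 3.8+, math.comb gives the combinatorial directly; the sketch below (mine, not the author's) reproduces the "5 choose 2" count from the marble example and the symmetry noted above:

from math import comb, factorial

# "5 choose 2": number of ways to place 2 reds among 5 draws
print(comb(5, 2))                                     # 10
print(factorial(5) // (factorial(2) * factorial(3)))  # 10, same thing from the definition

# Symmetry: (n choose x) = (n choose n - x)
print(comb(5, 2) == comb(5, 3))                       # True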

8. The Binomial Distribution

What is the probability that n independent event/non-event trials, each with probability of event equal to π, will yield x events? This is answered using a binomial distribution model.

Binomial Distribution (n, π) (n independent Bernoulli Trials)

A random variable X is said to have a Binomial(n, π) distribution if it is the sum of n independent Bernoulli(π) trials.

Value of X, x      P[X = x]
0                  (1 - π)^n
1                  n π (1 - π)^(n-1)
...                ...
x                  (n choose x) π^x (1 - π)^(n-x)
...                ...
n                  π^n

(1) μ = Mean = E[X] = statistical expectation of X:   μ = nπ
(2) σ² = Variance = Var[X] = E[(X - μ)²] = statistical expectation of (X - μ)²:   σ² = nπ(1 - π)

Examples:
- What is the probability that 5 draws, with replacement, from an urn with 10 marbles (1 red, 9 green) will yield exactly 2 red? Answer: Pr[X = 2] for X ~ Binomial(n = 5, π = 1/10).
- What is the probability that among 100 vaccinated for flu, with subsequent probability of flu equal to 0.04, 13 will suffer flu? Answer: Pr[X = 13] for X ~ Binomial(n = 100, π = 0.04).

Binomial Formula

The binomial formula is the binomial distribution probability that you use to calculate binomial probabilities of the form: What is the probability that n independent Bernoulli trials, each with probability of success = π, yield a total of x events of success?

The probability of obtaining exactly x events of success in n independent trials, each with the same probability of event success equal to π, is:

Pr[X = x] = (n choose x) π^x (1 - π)^(n-x) = [ n! / (x! (n - x)!) ] π^x (1 - π)^(n-x)
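
A direct translation of the binomial formula into Python (a sketch with a function name of my own choosing, not code from the notes) reproduces, for example, the urn answer Pr[X = 2] for X ~ Binomial(n = 5, π = 1/10):

from math import comb

def binomial_pmf(x, n, pi):
    # Pr[X = x] for X ~ Binomial(n, pi), using the binomial formula
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

# Urn example: 5 draws with replacement, P(red) = 1/10, exactly 2 red
print(round(binomial_pmf(2, 5, 0.10), 4))  # 0.0729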

Another Look - Binomial Distribution (n, π)

X = sum of n independent Bernoulli(π) trials Z_i

The n Bernoulli trials are Z_1, Z_2, ..., Z_n.
- Each Z_i has possible values of 1 ("success") or 0 ("failure").
- Pr[Z_i = 1] = π and Pr[Z_i = 0] = (1 - π) for i = 1, 2, ..., n.

The Binomial random variable is X = Z_1 + Z_2 + ... + Z_n. X is distributed Binomial(n, π):

X = Σ (i = 1 to n) Z_i

For X ~ Binomial(n, π), the probability that X = x is given by the binomial formula:

Probability[X = x] = [ n! / (x! (n - x)!) ] π^x (1 - π)^(n-x), where X has possible values x = 0, 1, 2, ..., n

The mean E[X] and the variance Var[X] are obtained by working with Z_1, Z_2, ..., Z_n:

E[X] = E[ Σ (i = 1 to n) Z_i ] = nπ
Var[X] = Var[ Σ (i = 1 to n) Z_i ] = nπ(1 - π)

9. Calculation of Binomial Probabilities

A roulette wheel lands on each of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 with probability = 0.10. Write down the expression for the calculation of the following.

#1. The probability of "5 or 6" exactly 3 times in 20 spins.
#2. The probability of a digit greater than 6 at most 3 times in 20 spins.

Solution for #1.

The event is an outcome of either 5 or 6. Thus, Probability[event] = π = 0.20.
"20 spins" says that the number of trials is n = 20.
Thus, X is distributed Binomial(n = 20, π = 0.20).

Pr[X = 3] = (20 choose 3) [0.20]^3 [1 - 0.20]^(20-3) = (20 choose 3) [0.20]^3 [0.80]^17 = 0.2054

Solution for #2.

The event is an outcome of either 7 or 8 or 9. Thus, Pr[event] = π = 0.30.
As before, n = 20.
Thus, X is distributed Binomial(n = 20, π = 0.30).

Translation: "At most 3 times" is the same as saying "3 times or 2 times or 1 time or 0 times", which is the same as saying "less than or equal to 3 times".

Pr[X ≤ 3] = Pr[X = 0] + Pr[X = 1] + Pr[X = 2] + Pr[X = 3]
          = Σ (x = 0 to 3) (20 choose x) [0.30]^x [0.70]^(20-x)
          = (20 choose 0)[0.30]^0[0.70]^20 + (20 choose 1)[0.30]^1[0.70]^19 + (20 choose 2)[0.30]^2[0.70]^18 + (20 choose 3)[0.30]^3[0.70]^17
          = 0.10709
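
Both worked answers can be checked with the same binomial formula (again a sketch of mine, not part of the original solutions):

from math import comb

def binomial_pmf(x, n, pi):
    # Pr[X = x] for X ~ Binomial(n, pi)
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

# #1: Pr[X = 3] for X ~ Binomial(n = 20, pi = 0.20)
print(round(binomial_pmf(3, 20, 0.20), 4))                         # 0.2054

# #2: Pr[X <= 3] for X ~ Binomial(n = 20, pi = 0.30)
print(round(sum(binomial_pmf(x, 20, 0.30) for x in range(4)), 5))  # 0.10709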

10. Resources for the Binomial Distribution

Note - To link directly to these resources, visit the PubHlth540 course web site (wwwuni.oit.umass.edu/~biep540w). From the welcome page, click on BERNOULLI AND BINOMIAL DISTRIBUTIONS at left.

Additional Reading
A 2-page lecture on the Binomial Distribution from the University of North Carolina.
http://www.unc.edu/~knhighto/econ70/lec7/lec7.htm

Calculation of Binomial Probabilities
Vassar Stats Exact Probability Calculator
http://faculty.vassar.edu/lowry/binomialx.html