PHP 2510
Random variables; some discrete distributions

Random variables - what are they?
Probability mass function; cumulative distribution function
Some discrete random variable models:
    Bernoulli
    Binomial
    Geometric
    Negative binomial

PHP 2510 Sept 18, 2008

Random variables

A random variable is essentially a random number. Formally, a random variable is a function that maps each element of a sample space to a real number.

Example. Toss a fair coin 3 times. The sample space of all possible sequences is
    Ω = {hhh, hht, hth, thh, htt, tht, tth, ttt}

Examples of random variables:
    X = number of heads
    Y = length of the longest run of consecutive heads
    Z = 1 if three heads, 0 if not

We denote random variables by italic uppercase letters.

A discrete random variable takes on a finite (or countably infinite) number of distinct values, such as the number of illnesses in a year.

A continuous random variable takes on values along a continuum, such as the time until an event or the height of a randomly selected person.

Our focus today is on discrete random variables.

Random variables and probability mass functions

A probability mass function (PMF) describes the frequency or probability of each value of a random variable.

Example. Let X be the number of heads in three tosses of a fair coin. The PMF of X is
    P(X = 0) = 1/8
    P(X = 1) = 3/8
    P(X = 2) = 3/8
    P(X = 3) = 1/8

Example. Let Y be the length of the longest run of consecutive heads in three tosses of a fair coin. The PMF of Y is
    P(Y = 0) = 1/8
    P(Y = 1) = 4/8
    P(Y = 2) = 2/8
    P(Y = 3) = 1/8

Example. Let Z = 1 if 3 heads are tossed, and Z = 0 otherwise. The PMF of Z is
    P(Z = 0) = 7/8
    P(Z = 1) = 1/8

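The PMFs of X, Y and Z can be checked by brute-force enumeration of the eight equally likely outcomes. The following is a minimal Python sketch, not part of the course material: the helper names `pmf` and `longest_head_run` are illustrative, and Y is assumed to mean the length of the longest run of consecutive heads, which is the reading consistent with the probabilities listed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 2^3 sequences of h/t; each has probability 1/8 for a fair coin.
omega = ["".join(seq) for seq in product("ht", repeat=3)]

def longest_head_run(seq):
    """Length of the longest block of consecutive h's (assumed meaning of Y)."""
    return max(len(run) for run in seq.split("t"))

def pmf(random_variable):
    """Map each value of the random variable to its exact probability."""
    counts = Counter(random_variable(outcome) for outcome in omega)
    return {value: Fraction(n, len(omega)) for value, n in sorted(counts.items())}

pmf_x = pmf(lambda s: s.count("h"))            # X = number of heads
pmf_y = pmf(longest_head_run)                  # Y = longest run of heads
pmf_z = pmf(lambda s: 1 if s == "hhh" else 0)  # Z = indicator of three heads

for name, table in [("X", pmf_x), ("Y", pmf_y), ("Z", pmf_z)]:
    print(name, {value: str(p) for value, p in table.items()})
```

Fractions are printed in lowest terms, so 4/8 appears as 1/2 and 2/8 as 1/4.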
PMF and CDF of a random variable

The probability mass function (PMF) is usually denoted by $p(x) = P(X = x)$.

For a discrete variable having outcomes $x_1, x_2, \ldots$, the PMF sums to one:
    $\sum_i p(x_i) = 1$

The cumulative distribution function (CDF) is defined as
    $F(x) = P(X \le x)$.

Example. Let X denote the number of heads in three tosses of a coin. This table shows the PMF and CDF of X:

    x    p(x)   F(x)
    0    1/8    1/8
    1    3/8    4/8
    2    3/8    7/8
    3    1/8    1

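The F(x) column is the running total of the p(x) column. A short Python sketch of that relationship follows; the variable names are illustrative, and the PMF values are copied from the table.

```python
from fractions import Fraction
from itertools import accumulate

# PMF of X = number of heads in three tosses of a fair coin (values from the table).
values = [0, 1, 2, 3]
pmf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# F(x) = P(X <= x): add up p(x_i) over all x_i up to and including x.
cdf = list(accumulate(pmf))

for x, p, big_f in zip(values, pmf, cdf):
    print(x, p, big_f)
```

The last CDF entry equals 1 because the PMF sums to one.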
Bernoulli distribution

A Bernoulli random variable takes on only two values: 0 (failure) and 1 (success). If the probability of success is $\pi$, then the probability of failure is $1 - \pi$:
    $p(1) = \pi$, $p(0) = 1 - \pi$,
or, in a single formula,
    $p(x) = \pi^x (1 - \pi)^{1 - x}$ for $x = 0$ or $1$.

Example: The prevalence of HIV infection is 11%. Let X be the HIV status of a randomly chosen person: X = 1 if HIV+; X = 0 if HIV-. Then X has a Bernoulli distribution with
    $p(1) = 0.11$, $p(0) = 0.89$.

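As a quick check that the single-formula form p(x) = π^x (1 − π)^(1 − x) reproduces both cases, here is a minimal Python sketch. The function name `bernoulli_pmf` is an illustrative choice, not something defined in the course.

```python
def bernoulli_pmf(x, pi):
    """p(x) = pi^x * (1 - pi)^(1 - x), defined only for x = 0 or x = 1."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable takes only the values 0 and 1")
    return pi**x * (1 - pi) ** (1 - x)

prevalence = 0.11  # HIV prevalence from the example

print(bernoulli_pmf(1, prevalence))  # probability the chosen person is HIV+
print(bernoulli_pmf(0, prevalence))  # probability the chosen person is HIV-
```

When x = 1 the second factor has exponent 0 and drops out, leaving π; when x = 0 the first factor drops out, leaving 1 − π.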
Binomial distribution

The binomial model for a random variable X characterizes the number of successes in n repeated trials of an experiment, each of which can result in either success or failure.

Example 1. X = number of heads in 10 tosses of a fair coin
Example 2. Y = number of winning lottery tickets out of 10 million purchased
Example 3. Z = number of patients, out of 100 in a clinical trial, who have cancer remission following an experimental treatment
Example 4. W = number of the 3 transferred embryos that implant in a woman's uterus following in-vitro fertilization

Mass function for binomial distribution

When trials are independent, the probability of having x successes in n trials is the same, regardless of the ordering of successes and failures.

First, any particular sequence of x successes and n - x failures occurs with probability
    $\underbrace{\pi \cdot \pi \cdots \pi}_{x \text{ successes}} \; \underbrace{(1 - \pi)(1 - \pi) \cdots (1 - \pi)}_{n - x \text{ failures}} = \pi^x (1 - \pi)^{n - x}$

There are $\binom{n}{x}$ ways of assigning x successes in a sequence of n trials. Then
    $P(X = x) = (\text{number of ways to have } x \text{ successes}) \times \pi^x (1 - \pi)^{n - x} = \binom{n}{x} \pi^x (1 - \pi)^{n - x}$.

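The binomial mass function translates directly into code. Below is a minimal Python sketch, assuming Python 3.8 or later for `math.comb`; the function name `binomial_pmf` is illustrative, and the coin example (n = 10, π = 0.5) is chosen only to exercise it.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) = C(n, x) * pi^x * (1 - pi)^(n - x) for x = 0, 1, ..., n."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

# Number of heads in 10 tosses of a fair coin.
n, pi = 10, 0.5
pmf = [binomial_pmf(x, n, pi) for x in range(n + 1)]

print(pmf[5])    # probability of exactly 5 heads
print(sum(pmf))  # the mass function should sum to 1 (up to rounding error)
```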
Example: Number of smokers in a sample of size n

29% of Americans are smokers. Suppose you select 3 people at random from the population (i.e. n = 3). Let X denote the number of smokers in the sample.

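Each row is one possible sample (1 = smoker, 0 = non-smoker), with the resulting value x of X and the probability of that particular sequence:

    1st person   2nd person   3rd person   x   P(sequence)
    1            1            1            3   0.02
    0            1            1            2   0.06
    1            1            0            2   0.06
    1            0            1            2   0.06
    1            0            0            1   0.15
    0            1            0            1   0.15
    0            0            1            1   0.15
    0            0            0            0   0.36
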
Construct mass function for X

    $P(X = 0) = \binom{3}{0} (.29)^0 (.71)^3 = .36$
    $P(X = 1) = \binom{3}{1} (.29)^1 (.71)^2 = .44$
    $P(X = 2) =$
    $P(X = 3) =$

Quick review

If the sample contains at least one smoker, what is the probability it contains exactly one smoker?

    Ans = P(X = 1 | X ≥ 1) = P(X = 1) / (1 - P(X = 0)) = .4386 / .6421 ≈ .68

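The review question is a conditional probability, P(X = 1 | X ≥ 1) = P(X = 1) / P(X ≥ 1). A minimal Python sketch of the calculation follows, using the binomial mass function with n = 3 and π = 0.29 from the smokers example; the variable names are illustrative. Carrying full precision gives roughly 0.68, while rounding the intermediate probabilities to two decimals first can move the result to about 0.70.

```python
from math import comb

n, pi = 3, 0.29  # sample size and smoking prevalence from the example

# Binomial mass function for X = number of smokers among the 3 people.
pmf = {x: comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)}

# "At least one smoker" is the complement of "no smokers".
p_at_least_one = 1 - pmf[0]

# Condition on X >= 1: only the outcomes with at least one smoker remain.
p_exactly_one_given_some = pmf[1] / p_at_least_one

for x, p in pmf.items():
    print(x, round(p, 4))
print(round(p_exactly_one_given_some, 3))
```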
Example calculations with the binomial distribution

Example 1: Roll 5 fair dice. Let X = number of sixes. Find:
    1. P(X = 0)
    2. P(X > 0)
    3. P(X = 2 | X > 0)
    4. E(X)

Example 2: Testing whether a die is fair.
    1. A die is rolled 5 times, and a six does not come up. Is the die fair? (p(0) = .40)
    2. A die is rolled 10 times, and a six does not come up. Is it fair? (p(0) = .16)
    3. A die is rolled 50 times, and a six comes up only twice. Is it fair? (p(2) = .005, p(1) = .001, p(0) = .0001)

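The probabilities quoted in parentheses are binomial probabilities computed under the assumption that the die is fair (π = 1/6). The Python sketch below reproduces them; the function and variable names are illustrative. Adding p(0) + p(1) + p(2) for the 50-roll case, to get the chance of two or fewer sixes, is one way to read the third item and is an assumption here rather than something the slide states.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) for X = number of successes in n independent trials."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

p_six = 1 / 6  # probability of a six on one roll of a fair die

p0_in_5 = binomial_pmf(0, 5, p_six)    # no sixes in 5 rolls
p0_in_10 = binomial_pmf(0, 10, p_six)  # no sixes in 10 rolls

# 50 rolls: probability of 0, 1, or 2 sixes, and their total.
p_in_50 = [binomial_pmf(x, 50, p_six) for x in range(3)]
p_le2_in_50 = sum(p_in_50)

print(round(p0_in_5, 2), round(p0_in_10, 2))
print([round(p, 4) for p in p_in_50], round(p_le2_in_50, 4))
```

A result that is this unlikely under a fair die is evidence against fairness; how small is small enough is a judgment call.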
Geometric distribution

The geometric distribution is useful for modeling waiting times on a discrete scale.

Assume independent trials where the success probability is $\pi$. A geometric variable X characterizes the number of trials until the first success.

To have the first success occur on trial k, we need $k - 1$ failures before the first success. The probability mass function is
    $P(X = k) = (1 - \pi)^{k - 1} \pi$

Example. The probability of contracting HIV in a single sexual encounter is 1 in 500. Let X denote the encounter during which a person gets infected for the first time. Assume each encounter is independent and carries the same risk. The mass function is
    $P(X = k) = \left(\frac{499}{500}\right)^{k - 1} \left(\frac{1}{500}\right)$

Example. What is the probability of contracting HIV within the first 3 encounters?
    $P(X = 1) = \frac{1}{500} = .002$
    $P(X = 2) = \left(\frac{499}{500}\right) \left(\frac{1}{500}\right) = .001996$
    $P(X = 3) = \left(\frac{499}{500}\right)^2 \left(\frac{1}{500}\right) = .001992$
    $P(X \le 3) = .002 + .001996 + .001992 \approx .006$

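The three-encounter calculation can be reproduced in a few lines of Python. This is a sketch with illustrative function names. The closed form P(X ≤ k) = 1 − (1 − π)^k used as a cross-check is not stated on the slide; it follows because "first success after trial k" means k failures in a row.

```python
def geometric_pmf(k, pi):
    """P(X = k) = (1 - pi)^(k - 1) * pi for k = 1, 2, 3, ..."""
    if k < 1:
        raise ValueError("k counts trials, so it must be at least 1")
    return (1 - pi) ** (k - 1) * pi

def geometric_cdf(k, pi):
    """P(X <= k) = 1 - (1 - pi)^k, the complement of k straight failures."""
    return 1 - (1 - pi) ** k

pi = 1 / 500  # per-encounter risk from the example

first_three = [geometric_pmf(k, pi) for k in (1, 2, 3)]
p_within_3 = sum(first_three)

print([round(p, 6) for p in first_three])
print(round(p_within_3, 6), round(geometric_cdf(3, pi), 6))
```

Summing the mass function term by term and using the closed form give the same value, which is a useful sanity check for larger k where summing by hand is impractical.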