Binomial distribution
Jon Michael Gran, Department of Biostatistics, UiO
MF9130 Introductory course in statistics
Tuesday 24.05.2010
Overview
Binomial distribution (Aalen chapter 4, Kirkwood and Sterne chapter 15.1-15.2)
- Probability distributions for counting variables
- Binomial distribution
- Introduction to hypothesis testing
Probability distributions for counting variables
Concepts
- Random (stochastic) trial: we don't know the outcome in advance, but we know the set of possible outcomes.
- Random variable: a number linked to the outcome. We don't know this value before the trial is carried out.
- Probability distribution: the set of probabilities for each of the possible values.
Successes and Failures
Often, a process has only two outcomes. A few examples:
- When tossing a coin, you get either heads or tails.
- An industrial process produces a product that is either usable or defective.
- An HIV test looks for the presence or absence of antibodies in the blood.
Or, there are just two outcomes of interest:
- When throwing a die, you may only be interested in whether you get a six or not.
Binomial trials
A series of random trials satisfying the following requirements:
- In each trial, one registers whether an event A occurs or not.
- The probability of A is the same in each trial, and is denoted by p.
- All trials are independent of each other.
Examples: Binomial trials
- Tossing a coin: A: Tails, p = 1/2
- Throwing a die: A: Six, p = 1/6
- Child births:
  - A: Girl, p = 0.5
  - A: Spina bifida (ryggmargsbrokk), p = 0.001
  - A: Birth weight < 2500 g, p = ?
- Epidemiology: A: Myocardial infarction, p = ?
- Genetics: Mother and father are both carriers of the gene for cystic fibrosis: A: Child ill, p = 1/4
Concepts
- Counting variables: How many times does event A occur in a series of trials? These are discrete variables (taking only whole-number values).
- Probability distributions for counting variables: What is the probability of each possible number of occurrences of A?
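To make the counting variable concrete, here is a minimal Python sketch that simulates a series of binomial trials and counts how often A occurs. The function name `count_events`, the seed and the die example (A = six, p = 1/6) are illustrative assumptions; each call produces one realisation of the counting variable, which is always a whole number between 0 and n.

```python
import random

def count_events(n: int, p: float, rng: random.Random) -> int:
    """Hypothetical helper: run n independent trials in which A occurs with
    probability p, and return how many times A occurred."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(2010)  # fixed seed so the run is reproducible
# Number of sixes in 10 throws of a fair die, repeated five times
counts = [count_events(10, 1 / 6, rng) for _ in range(5)]
print(counts)  # five realisations of the counting variable, each in 0..10
```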
The probability of a certain sequence
- Say you do n trials, looking for an event A that occurs with probability p in each trial.
- The result is a sequence like A, Ā, Ā, A, Ā, A, A, Ā, ..., A.
- Now say that A takes place x times, meaning n − x occurrences of Ā.
What is the probability of such a sequence?
Recall that probabilities of independent events may be multiplied:
$P(\text{sequence above}) = p(1-p)(1-p)p(1-p)pp(1-p)\cdots p$
With x factors of $p$ and $n-x$ factors of $(1-p)$:
$P(\text{given sequence}) = p^x (1-p)^{n-x}$
But x occurrences of A can appear in many different orders among the n trials. What, then, is the probability of getting exactly x occurrences?
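The multiplication argument can be checked with a short Python sketch. It assumes an illustrative value p = 0.3 and encodes the first eight outcomes of the sequence above as a string, with `A` for A and `-` for Ā; the helper name `sequence_probability` is hypothetical. It confirms that one sequence has probability $p^x(1-p)^{n-x}$ and that every reordering of the same outcomes has that same probability.

```python
from itertools import permutations

def sequence_probability(seq: str, p: float) -> float:
    """Probability of one particular ordered sequence of independent trials.
    'A' marks an occurrence of A (probability p), '-' marks Ā (probability 1 - p)."""
    prob = 1.0
    for outcome in seq:
        prob *= p if outcome == "A" else 1 - p
    return prob

p = 0.3              # illustrative value of p
seq = "A--A-AA-"     # A, Ā, Ā, A, Ā, A, A, Ā: n = 8 trials, x = 4 occurrences of A
n, x = len(seq), seq.count("A")
print(sequence_probability(seq, p), p**x * (1 - p) ** (n - x))  # same value up to rounding

# All distinct orderings of the same x A's and n - x Ā's
orderings = {"".join(o) for o in permutations(seq)}
print(len(orderings))  # 70
print({round(sequence_probability(o, p), 12) for o in orderings})  # a single probability
```

The number of distinct orderings (70 here) is exactly what the binomial coefficient counts.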
The number of non-ordered selections
We want to find the number of ways that x objects can be chosen from a total of n objects, regardless of order. This number is given by the binomial coefficient $\binom{n}{x}$:
$$\binom{n}{x} = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!} = \frac{n!}{x!\,(n-x)!},$$
where
- $n(n-1)(n-2)\cdots(n-x+1)$ is the number of ordered selections when picking x objects out of a total of n objects;
- $x! = x(x-1)(x-2)\cdots 2\cdot 1$ is the number of ways to order x objects (the number of permutations of x objects).
For example: $\binom{4}{3} = \frac{4\cdot 3\cdot 2}{1\cdot 2\cdot 3} = 4$, and $\binom{10}{4} = \frac{10\cdot 9\cdot 8\cdot 7}{1\cdot 2\cdot 3\cdot 4} = \frac{5040}{24} = 210$.
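The reasoning "ordered selections divided by the number of orderings" can be sketched in Python and cross-checked by brute-force enumeration with `itertools`. The function names `ordered_selections` and `binomial_coefficient` are illustrative, not standard library functions; `math.comb` is the standard library's own binomial coefficient.

```python
from itertools import combinations, permutations
from math import comb, factorial

def ordered_selections(n: int, x: int) -> int:
    """n(n-1)...(n-x+1): ways to pick x of n objects when order matters."""
    result = 1
    for k in range(n, n - x, -1):
        result *= k
    return result

def binomial_coefficient(n: int, x: int) -> int:
    """Divide the ordered count by x!, the number of orderings of x objects."""
    return ordered_selections(n, x) // factorial(x)

# Cross-check against brute-force enumeration and the standard library
for n, x in [(4, 3), (10, 4)]:
    objects = range(n)
    assert ordered_selections(n, x) == len(list(permutations(objects, x)))
    assert binomial_coefficient(n, x) == len(list(combinations(objects, x)))
    assert binomial_coefficient(n, x) == comb(n, x)
    print(n, x, binomial_coefficient(n, x))  # prints "4 3 4", then "10 4 210"
```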
Three ways to calculate the binomial coefficient $\binom{n}{x}$
- Use the formula $\binom{n}{x} = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!} = \frac{n!}{x!\,(n-x)!}$, where $n! = 1\cdot 2\cdot 3\cdots(n-1)\cdot n$ and $0! = 1$.
- Use a calculator or computer. On calculators the function is usually labelled nCr, so $\binom{n}{x}$ = nCr(n, x).
- Use Pascal's triangle (see next slide).
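The first two routes can be compared directly in Python: the factorial formula (with $0! = 1$) and the built-in `math.comb` (available from Python 3.8), which plays the role of the calculator's nCr key. The helper name `comb_factorial` is an illustrative assumption.

```python
from math import comb, factorial

def comb_factorial(n: int, x: int) -> int:
    """Binomial coefficient from n! / (x! (n - x)!); factorial(0) == 1."""
    return factorial(n) // (factorial(x) * factorial(n - x))

# The "calculator or computer" route: Python's math.comb
for n in range(0, 11):
    for x in range(0, n + 1):
        assert comb_factorial(n, x) == comb(n, x)

print(comb_factorial(10, 4), comb(10, 4))
```

For large n the product formula is cheaper than computing full factorials, but Python's integers are exact either way.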
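Pascal's triangle
- $\binom{n}{x}$ is found in row n, place number x + 1 from the left (counting the top row, which contains a single 1, as row 0).
- It can be shown directly from the definition (formula) of the binomial coefficient that
$$\binom{n}{x} + \binom{n}{x+1} = \binom{n+1}{x+1},$$
i.e. each entry is the sum of the two entries directly above it.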
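A minimal Python sketch of building Pascal's triangle with the addition rule above, assuming rows are numbered from 0 so that row n, place x + 1 from the left, is list index `x` in `tri[n]`. The function name `pascal_triangle` is illustrative.

```python
from math import comb

def pascal_triangle(rows: int) -> list[list[int]]:
    """Build rows 0..rows-1 of Pascal's triangle using
    C(n, x) + C(n, x+1) = C(n+1, x+1); every row starts and ends with 1."""
    triangle = [[1]]
    for _ in range(rows - 1):
        prev = triangle[-1]
        middle = [prev[x] + prev[x + 1] for x in range(len(prev) - 1)]
        triangle.append([1] + middle + [1])
    return triangle

tri = pascal_triangle(11)
for row in tri[:6]:
    print(row)

# Row n, place x + 1 from the left (index x here) equals C(n, x)
assert all(tri[n][x] == comb(n, x) for n in range(11) for x in range(n + 1))
```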
Summary of the last four slides...
- We looked for an event A with probability p in n trials and got a sequence A, Ā, Ā, A, Ā, A, A, Ā, ..., A. In this sequence, A took place x times and Ā took place n − x times.
- The probability of that particular sequence was $p^x(1-p)^{n-x}$.
- The x occurrences of A can be placed in the sequence in many different orders, all with the same probability.
- The number of such orderings is given by $\binom{n}{x}$.
- We can now derive the binomial distribution, that is, the probability P(X = x) for every possible x.
Binomial distribution
Binomial probability distribution: we observe n trials. The probability that A occurs exactly x times is given by
$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x},$$
or
P(X = x) = (the number of ways to distribute x events A in a sequence of length n) × (the probability of one particular sequence with x events A).
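The formula translates directly into a probability function. This Python sketch (the name `binom_pmf` is an illustrative choice) tabulates the Bin(8, 0.15) example and checks that the probabilities sum to 1.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 8, 0.15
probs = [binom_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(probs):
    print(f"P(X = {x}) = {prob:.4f}")
print("sum:", sum(probs))  # 1 up to floating-point rounding
```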
Example: The binomial distribution for n = 8 and p = 0.15
Often written Binomial(n = 8, p = 0.15), or Bin(8, 0.15).
Properties of probability distributions in general
- If you sum (or, for continuous distributions, integrate) the probabilities over all possible outcomes of a probability distribution, you get 1. This should not be surprising, given the probability theory lecture!
- Most probability distributions have an expected value (corresponding to the mean) and a variance (or standard deviation).
- For the binomial distribution: E(X) = np and Var(X) = np(1 − p).
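The formulas E(X) = np and Var(X) = np(1 − p) can be checked numerically by computing the mean and variance straight from the definitions, $E(X) = \sum_x x\,P(X = x)$ and $\mathrm{Var}(X) = \sum_x (x - E(X))^2 P(X = x)$. The (n, p) pairs below are arbitrary illustrative choices, and `mean_and_variance` is a hypothetical helper name.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def mean_and_variance(n: int, p: float) -> tuple[float, float]:
    """E(X) and Var(X) computed directly from the probability distribution."""
    xs = range(n + 1)
    mean = sum(x * binom_pmf(x, n, p) for x in xs)
    var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in xs)
    return mean, var

for n, p in [(8, 0.15), (12, 1 / 3), (20, 0.5)]:
    mean, var = mean_and_variance(n, p)
    # direct computation next to the closed-form np and np(1 - p)
    print(n, p, mean, n * p, var, n * p * (1 - p))
```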
Example: binomial distribution for n = 8 and p = 0.15 (cont.)
Let's say p = 0.15 is the probability that a person who signs up to be tested for a certain disease gets a positive result. On a certain day you are going to do n = 8 such tests.
- What is the expected number of positive tests (the mean)?
- What are the variance and the standard deviation?
- What is the probability that you get 2 or more positives?
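One way to answer the three questions is sketched below in Python. It uses E(X) = np, Var(X) = np(1 − p), and the complement rule P(X ≥ 2) = 1 − P(X = 0) − P(X = 1). By hand this gives E(X) = 1.2, Var(X) = 1.02, SD ≈ 1.01 and P(X ≥ 2) ≈ 0.343.

```python
from math import comb, sqrt

n, p = 8, 0.15
mean = n * p                  # E(X) = np
variance = n * p * (1 - p)    # Var(X) = np(1 - p)
sd = sqrt(variance)

def binom_pmf(x: int) -> float:
    """P(X = x) for the Bin(8, 0.15) example."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# P(X >= 2) via the complement: 1 - P(X = 0) - P(X = 1)
p_two_or_more = 1 - binom_pmf(0) - binom_pmf(1)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")  # 1.20, 1.02, 1.01
print(f"P(X >= 2) = {p_two_or_more:.3f}")                              # 0.343
```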
[Figure: Histograms of the binomial distribution with n = 8 trials, for four different values of p]
Example: Blood type (without using the formula)
Consider three randomly sampled individuals. How many have blood type A?
- Independent individuals (random sampling)
- The only outcomes are blood type A or not blood type A
- Constant probabilities: P(A) = 0.4 and P(not A) = 1 − 0.4 = 0.6
All possible combinations for the three persons:

Person 1   Person 2   Person 3   Probability
A          A          A          0.4 × 0.4 × 0.4 = 0.064
A          A          Not A      0.4 × 0.4 × 0.6 = 0.096
A          Not A      A          0.4 × 0.6 × 0.4 = 0.096
Not A      A          A          0.6 × 0.4 × 0.4 = 0.096
A          Not A      Not A      0.4 × 0.6 × 0.6 = 0.144
Not A      A          Not A      0.6 × 0.4 × 0.6 = 0.144
Not A      Not A      A          0.6 × 0.6 × 0.4 = 0.144
Not A      Not A      Not A      0.6 × 0.6 × 0.6 = 0.216
Total                            1
The binomial distribution for the number of people with blood type A is then:

Number with blood type A   Probability
0                          0.216
1                          3 × 0.144 = 0.432
2                          3 × 0.096 = 0.288
3                          0.064
Total                      1
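The enumeration in the two tables can be automated: this Python sketch lists all 2³ = 8 combinations for the three persons, multiplies the individual probabilities (using P(A) = 0.4 as in the example), and adds up the probabilities by the number of persons with blood type A. It reproduces the distribution 0.216, 0.432, 0.288, 0.064.

```python
from collections import defaultdict
from itertools import product

p_a = 0.4  # P(blood type A)
dist = defaultdict(float)
for persons in product([True, False], repeat=3):  # True = blood type A
    prob = 1.0
    for has_a in persons:
        prob *= p_a if has_a else 1 - p_a  # independence: multiply
    dist[sum(persons)] += prob             # group by number with type A

for k in range(4):
    print(k, round(dist[k], 3))
```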
Example: Family with four kids
We're looking at a family with four kids, with no monozygotic (identical) twins, so the sexes of the kids are (approximately) independent. The probability of a newborn being a boy in Norway is 0.514. What is the probability that two of the kids are boys?
$$P(X = 2) = \binom{4}{2}\, 0.514^2\, 0.486^2 = \frac{4\cdot 3}{2\cdot 1}\, 0.514^2\, 0.486^2 = 0.374$$
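The arithmetic can be checked in a few lines of Python, and the same expression gives the whole distribution of the number of boys among four children (assuming, as on the slide, independence and P(boy) = 0.514).

```python
from math import comb

p_boy = 0.514
prob_two_boys = comb(4, 2) * p_boy**2 * (1 - p_boy) ** 2
print(round(prob_two_boys, 3))  # 0.374

# Full distribution of the number of boys among four children
for boys in range(5):
    print(boys, round(comb(4, boys) * p_boy**boys * (1 - p_boy) ** (4 - boys), 3))
```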
When the observed distribution of boys and girls in 7745 American families is given together with the probabilities from the binomial distribution, we get this table:
[Table: observed distribution vs. binomial distribution for 7745 American families]
We observe good agreement!
Example: Multiple choice exam
Suppose a person knows absolutely nothing and simply guesses on a 12-item test with three alternatives per item, so the probability of a right answer is p = 1/3 for each item. What is their probability of passing the test, if 65% is the passing mark?
You need 8 right to pass: 0.65 × 12 = 7.8, which rounds up to 8.
$P(X = 8) = \binom{12}{8}(1/3)^8(2/3)^4 = 0.015$
$P(X = 9) = \binom{12}{9}(1/3)^9(2/3)^3 = 0.0033$
$P(X = 10) = \binom{12}{10}(1/3)^{10}(2/3)^2 = 0.00050$
$P(X = 11) = \binom{12}{11}(1/3)^{11}(2/3)^1 = 0.000045$
$P(X = 12) = \binom{12}{12}(1/3)^{12}(2/3)^0 = 0.0000019$
The probability of passing is $P(X \geq 8) \approx 0.019$.
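The tail sum can be computed exactly with Python's `fractions.Fraction`, assuming p = 1/3 per item (three alternatives, pure guessing). The exact pass probability is 9969/531441 ≈ 0.0188, with P(X = 8) ≈ 0.0149 as the dominant term.

```python
from fractions import Fraction
from math import ceil, comb

n_items, p_guess = 12, Fraction(1, 3)   # 12 questions, 3 alternatives each
needed = ceil(0.65 * n_items)           # 7.8 rounded up -> 8 correct answers

def pmf(x: int) -> Fraction:
    """Exact P(X = x) for X ~ Bin(12, 1/3)."""
    return comb(n_items, x) * p_guess**x * (1 - p_guess) ** (n_items - x)

p_pass = sum(pmf(x) for x in range(needed, n_items + 1))
for x in range(needed, n_items + 1):
    print(x, float(pmf(x)))
print("P(pass) =", p_pass, "≈", round(float(p_pass), 4))
```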
Introduction to hypothesis testing
Statistical hypothesis testing
- A method for drawing conclusions from uncertain data, while quantifying the uncertainty in the conclusion.
- Based on the computation of a particular probability, called the P value (the probability that the observed result, or a more extreme one, occurs if the null hypothesis is true).
- We introduce hypothesis testing through an example related to the binomial distribution (the binomial test). More about hypothesis testing in general tomorrow.
Example: Clinical trial
We want to try out a new medicine against migraine. Let's denote the new medicine N and the traditional one T.
- Each patient gets N for one month and T for another month.
- We want to find out in which month the patient feels better / has less migraine.
- The trial is randomized and blinded to make it fair.
- Crossover study: eight patients try both medications in randomized order.
Suppose 7 of the 8 patients preferred N. Is N better than T?
- Let p be the probability that a patient prefers N.
- Null hypothesis: $H_0: p = 1/2$ (both treatments equally good)
- Alternative hypothesis: $H_A: p > 1/2$ (N is better)
If the null hypothesis holds, X is binomially distributed with n = 8 and p = 1/2.
P value:
$$P(X \geq 7 \mid H_0) = \binom{8}{7}(1/2)^7(1/2)^1 + \binom{8}{8}(1/2)^8(1/2)^0 = \frac{9}{256} \approx 0.035 = 3.5\%$$
- Mindset: if P is small, the observed result (or a more extreme one) would be unlikely if $H_0$ were true, which is evidence against $H_0$.
- We reject $H_0$ and accept $H_A$ if P is smaller than the level of significance. This is often set to 5%, or sometimes 1% or less.
- With a 5% level of significance we would reject $H_0$ and say that N is a better drug than T.
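The one-sided binomial test can be sketched in Python with exact fractions. The helper name `binomial_test_greater` is hypothetical; it sums the binomial probabilities of the observed count and everything more extreme under $H_0: p = 1/2$, then compares the result with a 5% significance level.

```python
from fractions import Fraction
from math import comb

def binomial_test_greater(k: int, n: int, p0: Fraction = Fraction(1, 2)) -> Fraction:
    """One-sided exact binomial test: P(X >= k | H0), where X ~ Bin(n, p0)."""
    return sum(comb(n, x) * p0**x * (1 - p0) ** (n - x) for x in range(k, n + 1))

p_value = binomial_test_greater(7, 8)   # 7 of 8 patients preferred N
print(p_value, float(p_value))          # 9/256 0.03515625
alpha = 0.05
print("reject H0" if p_value < alpha else "do not reject H0")
```

If SciPy is available, `scipy.stats.binomtest(7, 8, 0.5, alternative='greater').pvalue` should give the same P value.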
Summary
Key words:
- (Discrete) probability distributions
- Binomial trials
- Counting variables
- The binomial coefficient
- The binomial distribution
Notation: $x!$, $\binom{n}{x}$, $P(X = x)$