Probability Distributions: Discrete


Probability Distributions: Discrete
INFO-2301: Quantitative Reasoning 2
Michael Paul and Jordan Boyd-Graber
February 19, 2017

Refresher: Random variables. Random variables take on values in a sample space. This week we will focus on discrete random variables: a coin flip: {H, T}; the number of times a coin lands heads after N flips: {0, 1, 2, ..., N}; the number of words in a document: positive integers {1, 2, ...}. Reminder: we denote a random variable with a capital letter and an outcome with a lowercase letter. E.g., X is a coin flip, x is the value (H or T) of that coin flip.

Refresher: Discrete distributions. A discrete distribution assigns a probability to every possible outcome in the sample space. For example, if X is a coin flip, then P(X = H) = 0.5 and P(X = T) = 0.5. Probabilities must be greater than or equal to 0, and the probabilities over the entire sample space must sum to one: ∑_x P(X = x) = 1.

Mathematical Conventions. 0!: if n! = n·(n−1)! for n > 0, then 0! = 1 is required for the definition to hold. n^0: example for 3: 3^2 = 9, 3^1 = 3, 3^0 = 1, 3^(−1) = 1/3.

Today: Types of discrete distributions. There are many different types of discrete distributions, with different definitions. Today we'll look at the most common discrete distributions, and we'll introduce the concept of parameters. These discrete distributions (along with the continuous distributions next) are fundamental.

Bernoulli distribution. A distribution over a sample space with two values: {0, 1}. Interpretation: 1 is "success"; 0 is "failure". Example: coin flip (we let 1 be "heads" and 0 be "tails"). A Bernoulli distribution can be defined with a table of the two probabilities. X denotes the outcome of a coin flip: P(X = 0) = 0.5, P(X = 1) = 0.5. X denotes whether or not a TV is defective: P(X = 0) = 0.995, P(X = 1) = 0.005.

Bernoulli distribution. Do we need to write out both probabilities? P(X = 0) = 0.995, P(X = 1) = 0.005. What if I only told you P(X = 1)? Or P(X = 0)? P(X = 0) = 1 − P(X = 1) and P(X = 1) = 1 − P(X = 0). We only need one probability to define a Bernoulli distribution, usually the probability of success, P(X = 1).

Bernoulli distribution. Another way of writing the Bernoulli distribution: let θ denote the probability of success (0 ≤ θ ≤ 1). Then P(X = 0) = 1 − θ and P(X = 1) = θ. An even more compact way to write this: P(X = x) = θ^x (1 − θ)^(1−x). This is called a probability mass function.

Probability mass functions. A probability mass function (PMF) is a function that assigns a probability to every outcome of a discrete random variable X. Notation: f(x) = P(X = x), a compact definition. Example: the PMF for a Bernoulli random variable X ∈ {0, 1} is f(x) = θ^x (1 − θ)^(1−x). In this example, θ is called a parameter.

Parameters. Parameters define the probability mass function. Free parameters are those not constrained by the PMF. For example, the Bernoulli PMF could be written with two parameters: f(x) = θ_1^x θ_2^(1−x). But θ_2 = 1 − θ_1, so there is only 1 free parameter. The complexity of a model is the number of free parameters; simpler models have fewer parameters.

Sampling from a Bernoulli distribution. How do we randomly generate a value distributed according to a Bernoulli distribution? Algorithm: (1) randomly generate a number between 0 and 1: r = random(0, 1); (2) if r < θ, return "success"; else, return "failure".
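The two-step algorithm above can be sketched in Python (a minimal illustration; the function name is our own):

```python
import random

def sample_bernoulli(theta):
    """Return 1 (success) with probability theta, else 0 (failure)."""
    r = random.random()  # uniform draw in [0, 1)
    return 1 if r < theta else 0

# e.g. a fair coin flip: sample_bernoulli(0.5)
```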

Binomial distribution. Bernoulli: a distribution over two values (success or failure) from a single event. Binomial: the number of successes from multiple Bernoulli events. Examples: the number of times heads comes up after flipping a coin 10 times; the number of defective TVs in a line of 10,000 TVs. Important: each Bernoulli event is assumed to be independent. Notation: let X be a random variable that describes the number of successes out of N trials. The possible values of X are the integers from 0 to N: {0, 1, 2, ..., N}.

Binomial distribution. Suppose we flip a coin 3 times. There are 8 possible outcomes: P(HHH) = P(H)P(H)P(H) = 0.125; P(HHT) = P(H)P(H)P(T) = 0.125; P(HTH) = P(H)P(T)P(H) = 0.125; P(HTT) = P(H)P(T)P(T) = 0.125; P(THH) = P(T)P(H)P(H) = 0.125; P(THT) = P(T)P(H)P(T) = 0.125; P(TTH) = P(T)P(T)P(H) = 0.125; P(TTT) = P(T)P(T)P(T) = 0.125. What is the probability of landing heads x times during these 3 flips?

Binomial distribution. What is the probability of landing heads x times during these 3 flips? 0 times: P(TTT) = 0.125. 1 time: P(HTT) + P(THT) + P(TTH) = 0.375. 2 times: P(HHT) + P(HTH) + P(THH) = 0.375. 3 times: P(HHH) = 0.125.

Binomial distribution. The probability mass function for the binomial distribution is: f(x) = (N choose x) θ^x (1 − θ)^(N−x). Like the Bernoulli, the binomial parameter θ is the probability of success in one event. The binomial has a second parameter N: the number of trials. The PMF is important: it is difficult to figure out the entire distribution by hand.

Aside: Binomial coefficients. The expression (n choose k) is called a binomial coefficient, also called a combination in combinatorics. (n choose k) is the number of ways to choose k elements from a set of n elements. For example, the number of ways to choose 2 heads from 3 coin flips: HHT, HTH, THH, so (3 choose 2) = 3. Formula: (n choose k) = n! / (k!(n − k)!). Pascal's triangle depicts the values of (n choose k).
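The binomial PMF and the coefficient formula can be checked with a short Python sketch (the function name is our own; Python's standard-library math.comb computes the binomial coefficient):

```python
from math import comb

def binomial_pmf(x, n, theta):
    """P(X = x): probability of exactly x successes in n independent trials,
    each succeeding with probability theta."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Coin-flip example from the slides: N = 3, theta = 0.5
# binomial_pmf(0, 3, 0.5) -> 0.125, binomial_pmf(1, 3, 0.5) -> 0.375
```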

Bernoulli vs Binomial. A Bernoulli distribution is a special case of the binomial distribution with N = 1. For this reason, the term "binomial" is sometimes used to refer to a Bernoulli random variable.

Example. Probability that a coin lands heads at least once during 3 flips? P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) = 0.375 + 0.375 + 0.125 = 0.875.
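This example can be checked numerically, either by summing the PMF or via the complement 1 − P(X = 0) (a sketch using our own binomial_pmf helper):

```python
from math import comb

def binomial_pmf(x, n, theta):
    """Binomial PMF: P(X = x) for n trials with success probability theta."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# P(X >= 1) for 3 fair coin flips, two equivalent ways:
p_sum = sum(binomial_pmf(x, 3, 0.5) for x in (1, 2, 3))
p_complement = 1 - binomial_pmf(0, 3, 0.5)
# both equal 0.875
```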

Categorical distribution. Recall: the Bernoulli distribution is a distribution over two values (success or failure). The categorical distribution generalizes the Bernoulli to a distribution over any number of values. Examples: rolling a die; selecting a card from a deck. Also known as the "discrete distribution". It is the most general type of discrete distribution: you specify all (but one) of the probabilities in the distribution, rather than the probabilities being determined by the probability mass function.

Categorical distribution. If the categorical distribution is over K possible outcomes, then the distribution has K parameters. We denote the parameters with a K-dimensional vector θ. The probability mass function can be written as: f(x) = ∏_{k=1}^{K} θ_k^{[x=k]}, where the expression [x = k] evaluates to 1 if the statement is true and 0 otherwise. All this really says is that the probability of outcome x is equal to θ_x. The number of free parameters is K − 1: if you know K − 1 of the parameters, the Kth parameter is determined, since the parameters must sum to 1.

Categorical distribution. Example: the roll of an unweighted die. P(X = 1) = 1/6, P(X = 2) = 1/6, P(X = 3) = 1/6, P(X = 4) = 1/6, P(X = 5) = 1/6, P(X = 6) = 1/6. If all outcomes have equal probability, this is called the uniform distribution. General notation: P(X = x) = θ_x.

Sampling from a categorical distribution. How do we randomly select a value distributed according to a categorical distribution? The idea is similar to sampling a Bernoulli-distributed value. Algorithm: (1) randomly generate a number between 0 and 1: r = random(0, 1); (2) for k = 1, ..., K: return the smallest k such that r < ∑_{i=1}^{k} θ_i.
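The cumulative-sum algorithm above can be sketched in Python (a minimal illustration; function and variable names are our own):

```python
import random

def sample_categorical(thetas):
    """Return outcome k (1-indexed) with probability thetas[k-1]:
    the smallest k whose cumulative probability exceeds a uniform draw."""
    r = random.random()  # uniform draw in [0, 1)
    cumulative = 0.0
    for k, theta in enumerate(thetas, start=1):
        cumulative += theta
        if r < cumulative:
            return k
    return len(thetas)  # guard against floating-point round-off

# Fair die: six outcomes, each with probability 1/6
roll = sample_categorical([1/6] * 6)  # integer in 1..6
```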

Sampling from a categorical distribution. Example: simulating the roll of a die, with P(X = k) = θ_k = 0.166667 for k = 1, ..., 6. Random number in (0, 1): r = 0.452383. Is r < θ_1? No. Is r < θ_1 + θ_2? No. Is r < θ_1 + θ_2 + θ_3 = 0.5? Yes, so return X = 3.

Sampling from a categorical distribution. Example: simulating the roll of a die, with θ_k = 0.166667 for all k. Random number in (0, 1): r = 0.117544. Is r < θ_1 = 0.166667? Yes, so return X = 1.

Sampling from a categorical distribution. Example 2: rolling a biased die, with θ_1 = θ_2 = θ_3 = θ_4 = θ_5 = 0.01 and θ_6 = 0.95. Random number in (0, 1): r = 0.209581. Is r < θ_1? No. r < θ_1 + θ_2? No. r < θ_1 + θ_2 + θ_3? No. r < θ_1 + ... + θ_4? No. r < θ_1 + ... + θ_5 = 0.05? No. r < θ_1 + ... + θ_6 = 1? Yes, so return X = 6. We will always return X = 6 unless our random number r < 0.05; 6 is the most probable outcome.

Multinomial distribution. Recall: the binomial distribution is the number of successes from multiple Bernoulli success/fail events. The multinomial distribution is the number of each of the different outcomes from multiple categorical events. It is a generalization of the binomial distribution to more than two possible outcomes. As with the binomial distribution, each categorical event is assumed to be independent. Bernoulli : binomial :: categorical : multinomial. Examples: the number of times each face of a die turned up after 50 rolls; the number of times each suit is drawn from a deck of cards after 10 draws.

Multinomial distribution. Notation: let X be a vector of length K, where X_k is a random variable that describes the number of times the kth value was the outcome out of N categorical trials. The possible values of each X_k are the integers from 0 to N, and the X_k values must sum to N: ∑_{k=1}^{K} X_k = N. Example: if we roll a die 10 times, suppose it comes up with the following counts: X = <1, 0, 3, 2, 1, 3>, i.e., X_1 = 1, X_2 = 0, X_3 = 3, X_4 = 2, X_5 = 1, X_6 = 3. The multinomial distribution is a joint distribution over multiple random variables: P(X_1, X_2, ..., X_K).

Multinomial distribution. Suppose we roll a die 3 times. There are 216 (6^3) possible outcomes: P(111) = P(1)P(1)P(1) = 0.00463, P(112) = P(1)P(1)P(2) = 0.00463, P(113) = 0.00463, P(114) = 0.00463, P(115) = 0.00463, P(116) = 0.00463, ..., P(665) = P(6)P(6)P(5) = 0.00463, P(666) = P(6)P(6)P(6) = 0.00463. What is the probability of a particular vector of counts after 3 rolls?

Multinomial distribution. What is the probability of a particular vector of counts after 3 rolls? Example 1: X = <0, 1, 0, 0, 2, 0>. P(X) = P(255) + P(525) + P(552) = 0.01389. Example 2: X = <0, 0, 1, 1, 1, 0>. P(X) = P(345) + P(354) + P(435) + P(453) + P(534) + P(543) = 0.02778.

Multinomial distribution. The probability mass function for the multinomial distribution is: f(x) = (N! / ∏_{k=1}^{K} x_k!) ∏_{k=1}^{K} θ_k^{x_k}, where N! / ∏_k x_k! is the generalization of the binomial coefficient. Like the categorical distribution, the multinomial has a K-length parameter vector θ encoding the probability of each outcome. Like the binomial, the multinomial distribution has an additional parameter N, the number of events.
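The multinomial PMF can be written directly from this formula (a sketch with our own function name, checked against the die examples above):

```python
from math import factorial

def multinomial_pmf(counts, thetas):
    """P(X = counts): probability of this vector of outcome counts
    after N = sum(counts) independent categorical trials."""
    n = sum(counts)
    coefficient = factorial(n)  # N! / (x_1! x_2! ... x_K!)
    for x_k in counts:
        coefficient //= factorial(x_k)
    prob = float(coefficient)
    for x_k, theta_k in zip(counts, thetas):
        prob *= theta_k ** x_k
    return prob

# Die example from the slides: X = <0,1,0,0,2,0> after 3 rolls
# multinomial_pmf([0,1,0,0,2,0], [1/6]*6) -> 3/216, about 0.01389
```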

Multinomial distribution: summary. The multinomial distribution with N = 1 is the categorical distribution. Sampling from a multinomial: the same categorical-sampling code repeated N times. Remember that each categorical trial is independent. Question: does this mean the count values (i.e., each X_1, X_2, etc.) are independent? No! If N = 3 and X_1 = 2, then X_2 can be no larger than 1 (the counts must sum to N). Remember this analogy: Bernoulli : binomial :: categorical : multinomial.
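"The same code repeated N times" can be made concrete with a small sketch (function names are our own; the categorical sampler follows the cumulative-sum algorithm from earlier in the slides):

```python
import random

def sample_categorical(thetas):
    """Return outcome k (1-indexed) with probability thetas[k-1]."""
    r = random.random()  # uniform draw in [0, 1)
    cumulative = 0.0
    for k, theta in enumerate(thetas, start=1):
        cumulative += theta
        if r < cumulative:
            return k
    return len(thetas)  # guard against floating-point round-off

def sample_multinomial(n, thetas):
    """Run n independent categorical trials and tally each outcome's count."""
    counts = [0] * len(thetas)
    for _ in range(n):
        counts[sample_categorical(thetas) - 1] += 1
    return counts

# e.g. 10 rolls of a fair die: sample_multinomial(10, [1/6]*6)
# returns a length-6 count vector summing to 10
```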