Binomial Approximation and Joint Distributions Chris Piech CS109, Stanford University


Review

The Normal Distribution. X is a Normal Random Variable: $X \sim N(\mu, \sigma^2)$. Also called Gaussian.
Probability Density Function (PDF): $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$, where $-\infty < x < \infty$
$E[X] = \mu$, $\mathrm{Var}(X) = \sigma^2$
Note: f(x) is symmetric about $\mu$.

Simplicity is Humble. [Figure: probability versus value, annotated with $\mu$ and $\sigma^2$.] Simple. Will generalize. * A Gaussian maximizes entropy for a given mean and variance.

Density vs Cumulative. [Figure: CDF of a Normal, F(x), and PDF of a Normal, f(x), plotted for x from -5 to 5.] f(x) = derivative of probability. F(x) = P(X < x).

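Probability Density Function, $N(\mu, \sigma^2)$: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Annotations: f(x) is the probability density at x; $\frac{1}{\sigma\sqrt{2\pi}}$ is a constant; the exponential contains the distance to the mean; sigma shows up twice.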

Cumulative Density Function, $N(\mu, \sigma^2)$: $F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right)$
F(x) is the cumulative density function (CDF) of any normal. $\Phi$ is the CDF of the Standard Normal: a function that has been solved for numerically. Table of $\Phi(z)$ values in textbook, p. 201 and handout.
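The standardization step above can be sketched in Python. This is a minimal illustration, not the course's code: it assumes the standard identity between the Standard Normal CDF and the error function `math.erf`, and the function names are made up for this example.

```python
import math

def standard_normal_cdf(z):
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """F(x) = Phi((x - mu) / sigma) for X ~ N(mu, sigma^2)."""
    return standard_normal_cdf((x - mu) / sigma)

# Phi(1) should match the table value used in the 68% rule slide.
print(round(standard_normal_cdf(1.0), 4))  # 0.8413
```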

Great questions!

68% rule only for Gaussians?

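68% Rule? What is the probability that a normal variable $X \sim N(\mu, \sigma^2)$ has a value within one standard deviation of its mean?
$P(\mu - \sigma < X < \mu + \sigma) = P\left(\frac{\mu - \sigma - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{\mu + \sigma - \mu}{\sigma}\right) = P(-1 < Z < 1) = \Phi(1) - \Phi(-1) = \Phi(1) - [1 - \Phi(1)] = 2\Phi(1) - 1 = 2[0.8413] - 1 = 0.683$
Only applies to normal.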

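68% Rule? Counterexample: Uniform, $X \sim \mathrm{Uni}(\alpha, \beta)$
$\mathrm{Var}(X) = \frac{(\beta - \alpha)^2}{12}$, so $\sigma = \sqrt{\mathrm{Var}(X)} = \frac{\beta - \alpha}{\sqrt{12}}$
$P(\mu - \sigma < X < \mu + \sigma) = \frac{1}{\beta - \alpha}\left[\frac{2(\beta - \alpha)}{\sqrt{12}}\right] = \frac{2}{\sqrt{12}} = 0.58$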
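A quick numerical check of both cases, as a sketch. It uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the endpoints `alpha` and `beta` are arbitrary values chosen for illustration, since the uniform result does not depend on them.

```python
import math
from statistics import NormalDist

# Normal case: P(mu - sigma < X < mu + sigma) = Phi(1) - Phi(-1).
Z = NormalDist()  # Standard Normal
normal_within_one_sd = Z.cdf(1) - Z.cdf(-1)

# Uniform case: density is 1 / (beta - alpha) over an interval of width 2 * sigma.
alpha, beta = 0.0, 1.0  # hypothetical endpoints
sigma = (beta - alpha) / math.sqrt(12)
uniform_within_one_sd = (2 * sigma) / (beta - alpha)

print(normal_within_one_sd, uniform_within_one_sd)
```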

How does Python sample from a Gaussian?

from random import gauss

mean = 5
std = 1
for i in range(10):
    # draw one sample from N(mean, std^2)
    sample = gauss(mean, std)
    print(sample)

How does this work? Example output:
3.79317794179 5.19104589315 4.209360629 5.39633891584 7.10044176511 6.72655475942 5.51485158841 4.94570606131 6.14724644482 4.73774184354

How Does a Computer Sample Normal? Inverse Transform Sampling. [Figure: CDF of the Standard Normal, $\Phi(x)$, plotted for x from -5 to 5.]

How Does a Computer Sample Normal? Inverse Transform Sampling.
Step 1: pick a uniform number y between 0 and 1.
Step 2: find the x such that $\Phi(x) = y$, that is, $x = \Phi^{-1}(y)$.
[Figure: CDF of the Standard Normal, $\Phi(x)$, plotted for x from -5 to 5.]
Further reading: Box-Muller transform.
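The two steps can be sketched as follows. This is an illustration of inverse transform sampling, not a claim about how `random.gauss` is implemented internally. It relies on `statistics.NormalDist.inv_cdf` (Python 3.8+) for the inverse CDF; `sample_normal` is a name invented for this example.

```python
import random
from statistics import NormalDist

def sample_normal(mean, std, rng=random):
    """Draw one sample from N(mean, std^2) by inverse transform sampling."""
    # Step 1: pick a uniform number y in (0, 1); inv_cdf is undefined at 0.
    y = rng.random()
    while y == 0.0:
        y = rng.random()
    # Step 2: find the z such that Phi(z) = y, then shift and scale.
    z = NormalDist().inv_cdf(y)
    return mean + std * z

rng = random.Random(109)  # seeded only so the sketch is reproducible
samples = [sample_normal(5, 1, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)
```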

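Continuous RV Relative Probability. X = time to finish pset 3, $X \sim N(10, 2)$. [Figure: f(x) versus x, time to finish pset 3.]
How much more likely are you to complete in 10 hours than in 5?
$\frac{P(X = 10)}{P(X = 5)} = \frac{\varepsilon f(X = 10)}{\varepsilon f(X = 5)} = \frac{f(X = 10)}{f(X = 5)} = \frac{\frac{1}{\sqrt{2\sigma^2\pi}} e^{-\frac{(10-\mu)^2}{2\sigma^2}}}{\frac{1}{\sqrt{2\sigma^2\pi}} e^{-\frac{(5-\mu)^2}{2\sigma^2}}} = \frac{\frac{1}{\sqrt{4\pi}} e^{-\frac{(10-10)^2}{4}}}{\frac{1}{\sqrt{4\pi}} e^{-\frac{(5-10)^2}{4}}} = \frac{e^{0}}{e^{-25/4}} = 518$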
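The ratio can be checked directly from the PDF. A minimal sketch, where `normal_pdf` is a helper written for this example and takes the variance (here 2) rather than the standard deviation:

```python
import math

def normal_pdf(x, mu, variance):
    """Density of N(mu, variance) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# The normalizing constants cancel, leaving e^0 / e^(-25/4).
ratio = normal_pdf(10, 10, 2) / normal_pdf(5, 10, 2)
print(round(ratio))  # 518
```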

Imagine you are sitting a test

Website Testing. 100 people are given a new website design. X = # people whose time on site increases. CEO will endorse new design if $X \ge 65$. What is P(CEO endorses change | it has no effect)? $X \sim \mathrm{Bin}(100, 0.5)$. Want to calculate $P(X \ge 65)$. Give a numerical answer.
$P(X \ge 65) = \sum_{i=65}^{100} \binom{100}{i} (0.5)^i (1 - 0.5)^{100-i}$
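The sum is tedious by hand but short in code. A sketch using `math.comb` (Python 3.8+); `binomial_tail` is a name chosen for this example:

```python
from math import comb

def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Bin(n, p), summing the PMF term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

p_endorse = binomial_tail(100, 0.5, 65)
print(p_endorse)  # roughly 0.0018
```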

Normal Approximates Binomial. There is a deep reason for the Binomial/Normal similarity.

Normal Approximates Binomial

Let's invent an approximation!

Website Testing. 100 people are given a new website design. X = # people whose time on site increases. CEO will endorse new design if $X \ge 65$. What is P(CEO endorses change | it has no effect)? $X \sim \mathrm{Bin}(100, 0.5)$. Want to calculate $P(X \ge 65)$. [Figure: P(X = x) versus x, with E[X] = 50 marked.] What is the variance?

Website Testing. 100 people are given a new website design. X = # people whose time on site increases. CEO will endorse new design if $X \ge 65$. What is P(CEO endorses change | it has no effect)? $X \sim \mathrm{Bin}(100, 0.5)$. Want to calculate $P(X \ge 65)$.
$np = 50$, $np(1-p) = 25$, $\sqrt{np(1-p)} = 5$
Use Normal approximation: $Y \sim N(50, 25)$
$P(Y \ge 65) = P\left(\frac{Y - 50}{5} > \frac{65 - 50}{5}\right) = P(Z > 3) = 1 - \Phi(3) \approx 0.0013$
Using Binomial: $P(X \ge 65) \approx 0.0018$

Website Testing. [Figure: P(X = x) versus x, with E[X] = 50 marked.]

Continuity Correction. If Y (normal) approximates X (binomial): $P(X \ge 65) \approx P(Y \ge 64.5) \approx 0.0018$. What about 64.9? [Figure: p(x) or f(x) versus x around x = 64, 65, 66, for Bin(100, 0.5) and Normal(50, 25).]

Continuity Correction. If Y (normal) approximates X (binomial):
Discrete (e.g. Binomial) probability question    Continuous (Normal) probability question
X = 6                                            5.5 < Y < 6.5
X >= 6                                           Y > 5.5
X > 6                                            Y > 6.5
X < 6                                            Y < 5.5
X <= 6                                           Y < 6.5
* Note: Binomial is always defined in units of 1
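To see what the correction buys, the website-testing numbers can be compared three ways: the exact Binomial tail, the Normal approximation without correction, and the Normal approximation with the 0.5 shift. This is a sketch using `math.comb` and `statistics.NormalDist` from the standard library; the variable names are chosen for this example.

```python
from math import comb
from statistics import NormalDist

n, p, k = 100, 0.5, 65

# Exact: P(X >= 65) for X ~ Bin(100, 0.5).
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Normal approximation Y ~ N(np, np(1 - p)).
Y = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)
uncorrected = 1 - Y.cdf(k)        # P(Y > 65)
corrected = 1 - Y.cdf(k - 0.5)    # P(Y > 64.5), since X >= 65 maps to Y > 64.5

print(exact, uncorrected, corrected)
```

With the correction, the Normal answer lands noticeably closer to the exact Binomial value.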

Comparison when n = 100, p = 0.5. [Figure: P(X = k) versus k.]

Who Gets to Approximate? $X \sim \mathrm{Bin}(n, p)$
Poisson approx.: n large (> 20), p small (< 0.05)
Normal approx.: n large (> 20), p is mid-ranged, np(1-p) > 10
If there is a choice, go with the normal approximation.

Stanford Admissions. Stanford accepts 2050 students this year. Each accepted student has 84% chance of attending. X = # students who will attend. $X \sim \mathrm{Bin}(2050, 0.84)$. What is P(X > 1745)?
$np = 1722$, $np(1-p) = 276$, $\sqrt{np(1-p)} = 16.6$
Use Normal approximation: $Y \sim N(1722, 276)$
$P(X > 1745) \approx P(Y > 1745.5) = P\left(\frac{Y - 1722}{16.6} > \frac{1745.5 - 1722}{16.6}\right) = P(Z > 1.4) \approx 0.08$
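The same calculation without rounding the intermediate values, as a sketch. It uses `statistics.NormalDist` and applies the continuity correction (X > 1745 becomes Y > 1745.5); the variable names are chosen for this example.

```python
from statistics import NormalDist

n, p = 2050, 0.84
mu = n * p                 # expected number of attendees
variance = n * p * (1 - p)

Y = NormalDist(mu=mu, sigma=variance ** 0.5)
p_over_1745 = 1 - Y.cdf(1745.5)
print(mu, variance, p_over_1745)
```

The unrounded z-score is slightly above 1.4, so the result comes out a little under the slide's rounded 0.08.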

Changes in Stanford Admissions. "Class of 2021 Admit Rates Lowest in University History": Fewer students were admitted to the Class of 2021 than the Class of 2019, due to the increase in Stanford's yield rate, which has increased over 5 percent in the past four years, according to Colleen Lim M.A. '80, Director of Undergraduate Admission. 68% 10 years ago; 84% last year.

Continuous Random Variables
Uniform Random Variable, $X \sim \mathrm{Uni}(\alpha, \beta)$: All values of x between alpha and beta are equally likely.
Normal Random Variable, $X \sim N(\mu, \sigma^2)$: Aka Gaussian. Defined by mean and variance. Goldilocks distribution.
Exponential Random Variable, $X \sim \mathrm{Exp}(\lambda)$: Time until an event happens. Parameterized by lambda (same as Poisson).
Beta Random Variable: How mysterious and curious. You must wait a few classes :)

Joint Distributions

CS109 Joint Go to this URL: https://goo.gl/jh3eu4

Events occur with other events

Probability Table for Discrete. States all possible outcomes with several discrete variables. A probability table is not parametric. If #variables is > 2, you can have a probability table, but you can't draw it on a slide. [Table: all values of A (0, 1, 2) against all values of B (0, 1, 2); each cell is a joint probability such as P(A = 1, B = 1). Here, "," means "and".] Every outcome falls into a bucket.

Discrete Joint Mass Function. For two discrete random variables X and Y, the Joint Probability Mass Function is: $p_{X,Y}(a, b) = P(X = a, Y = b)$
Marginal distributions:
$p_X(a) = P(X = a) = \sum_y p_{X,Y}(a, y)$
$p_Y(b) = P(Y = b) = \sum_x p_{X,Y}(x, b)$
Example: X = value of die $D_1$, Y = value of die $D_2$
$P(X = 1) = \sum_{y=1}^{6} p_{X,Y}(1, y) = \sum_{y=1}^{6} \frac{1}{36} = \frac{1}{6}$

A Computer (or Three) In Every House. Consider households in Silicon Valley. A household has X Macs and Y PCs. Can't have more than 3 Macs or 3 PCs.
          X = 0   X = 1   X = 2   X = 3   p_Y(y)
Y = 0     0.16    0.12    ?       0.04
Y = 1     0.12    0.14    0.12    0
Y = 2     0.07    0.12    0       0
Y = 3     0.04    0       0       0
p_X(x)

A Computer (or Three) In Every House. Consider households in Silicon Valley. A household has X Macs and Y PCs. Can't have more than 3 Macs or 3 PCs.
          X = 0   X = 1   X = 2   X = 3   p_Y(y)
Y = 0     0.16    0.12    0.07    0.04
Y = 1     0.12    0.14    0.12    0
Y = 2     0.07    0.12    0       0
Y = 3     0.04    0       0       0
p_X(x)

A Computer (or Three) In Every House. Consider households in Silicon Valley. A household has X Macs and Y PCs. Can't have more than 3 Macs or 3 PCs.
          X = 0   X = 1   X = 2   X = 3   p_Y(y)
Y = 0     0.16    0.12    0.07    0.04    0.39
Y = 1     0.12    0.14    0.12    0       0.38
Y = 2     0.07    0.12    0       0       0.19
Y = 3     0.04    0       0       0       0.04
p_X(x)    0.39    0.38    0.19    0.04    1.00
The row and column totals are the marginal distributions.
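The marginals are just row and column sums of the joint table. A small sketch, storing the table as a nested list indexed `joint[y][x]` (a layout chosen for this example):

```python
# Joint PMF of X (Macs) and Y (PCs); rows are y = 0..3, columns are x = 0..3.
joint = [
    [0.16, 0.12, 0.07, 0.04],  # y = 0
    [0.12, 0.14, 0.12, 0.00],  # y = 1
    [0.07, 0.12, 0.00, 0.00],  # y = 2
    [0.04, 0.00, 0.00, 0.00],  # y = 3
]

# p_Y(y): sum each row over x.  p_X(x): sum each column over y.
p_Y = [sum(row) for row in joint]
p_X = [sum(col) for col in zip(*joint)]

print([round(v, 2) for v in p_X])
print([round(v, 2) for v in p_Y])
```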

CS109 Joint Results Go to this URL: https://goo.gl/jh3eu4

Way Back

Permutations How many ways are there to order n distinct objects? n!

Binomial. How many ways are there to make an unordered selection of r objects from n objects? How many ways are there to order n objects such that: r are the same (indistinguishable) and (n - r) are the same (indistinguishable)?
$\frac{n!}{r!(n-r)!} = \binom{n}{r}$
Called the binomial because of something from Algebra.

Multinomial. How many ways are there to order n objects such that: $n_1$ are the same (indistinguishable), $n_2$ are the same (indistinguishable), ..., $n_r$ are the same (indistinguishable)?
$\frac{n!}{n_1!\, n_2! \cdots n_r!} = \binom{n}{n_1, n_2, \ldots, n_r}$
Note: Multinomial > Binomial

Binomial Distribution. Consider n independent trials of a Ber(p) random variable. X is the number of successes in n trials. X is a Binomial Random Variable: $X \sim \mathrm{Bin}(n, p)$
$P(X = i) = p(i) = \binom{n}{i} p^i (1-p)^{n-i}$, $i = 0, 1, \ldots, n$
Annotations: $P(X = i)$ is the probability of exactly i successes; $\binom{n}{i}$ is the binomial, the # ways of ordering the successes; $p^i (1-p)^{n-i}$ is the probability of each ordering of i successes, which is equal for every ordering, and the orderings are mutually exclusive.

End Way Back

The Multinomial. Multinomial distribution: n independent trials of experiment performed. Each trial results in one of m outcomes, with respective probabilities $p_1, p_2, \ldots, p_m$, where $\sum_{i=1}^{m} p_i = 1$. $X_i$ = number of trials with outcome i.
Joint distribution: $P(X_1 = c_1, X_2 = c_2, \ldots, X_m = c_m) = \binom{n}{c_1, c_2, \ldots, c_m} p_1^{c_1} p_2^{c_2} \cdots p_m^{c_m}$
where $\sum_{i=1}^{m} c_i = n$ and $\binom{n}{c_1, c_2, \ldots, c_m} = \frac{n!}{c_1!\, c_2! \cdots c_m!}$
Annotations: the multinomial coefficient is the # ways of ordering the successes; the probabilities of each ordering are equal, and the orderings are mutually exclusive.

Hello Die Rolls, My Old Friends. 6-sided die is rolled 7 times. Roll results: 1 one, 1 two, 0 three, 2 four, 0 five, 3 six. This is a generalization of the Binomial distribution. Binomial: each trial had 2 possible outcomes. Multinomial: each trial has m possible outcomes.
$P(X_1 = 1, X_2 = 1, X_3 = 0, X_4 = 2, X_5 = 0, X_6 = 3) = \frac{7!}{1!\,1!\,0!\,2!\,0!\,3!} \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^0 \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^0 \left(\frac{1}{6}\right)^3 = 420 \left(\frac{1}{6}\right)^7$
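The die-roll probability can be evaluated with a short multinomial PMF. This is a sketch; `multinomial_pmf` is a helper written for this example, and it uses `math.prod` (Python 3.8+).

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = c_1, ..., X_m = c_m) for n = sum(counts) independent trials."""
    n = sum(counts)
    # Multinomial coefficient n! / (c_1! c_2! ... c_m!), kept as an exact integer.
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p**c for p, c in zip(probs, counts))

# 7 rolls of a fair die: 1 one, 1 two, 0 three, 2 four, 0 five, 3 six.
p_rolls = multinomial_pmf([1, 1, 0, 2, 0, 3], [1 / 6] * 6)
print(p_rolls)  # equals 420 * (1/6)**7
```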

Probabilistic Text Analysis. Ignoring order of words, what is the probability of any given word you write in English?
P(word = "the") > P(word = "transatlantic")
P(word = "Stanford") > P(word = "Cal")
Probability of each word is just a multinomial distribution. What about the probability of those same words in someone else's writing?
P(word = "probability" | writer = you) > P(word = "probability" | writer = non-CS109 student)
After estimating P(word | writer) from known writings, use Bayes' Theorem to determine P(writer | word) for new writings!

A Document is a Large Multinomial. According to the Global Language Monitor there are 988,968 words in the English language used on the internet. The

Text is a Multinomial. Example document: "Pay for Viagra with a credit-card. Viagra is great. So are credit-cards. Risk free Viagra. Click for free."
n = 18; Viagra = 2, Free = 2, Risk = 1, Credit-card = 2, For = 2
Probability of seeing this document | spam: $P(\text{document} \mid \text{spam}) = \frac{n!}{2!\,2! \cdots 2!}\, p_{\text{viagra}}^2\, p_{\text{free}}^2 \cdots p_{\text{for}}^2$
It's a Multinomial! $p_{\text{viagra}}$ is the probability of a word in spam email being "viagra".

Who wrote the Federalist Papers?

Old and New Analysis. Authorship of Federalist Papers: 85 essays advocating ratification of the US Constitution. Written under the pseudonym "Publius" (really, Alexander Hamilton, James Madison and John Jay). Who wrote which essays? Analyzed probability of words in each essay versus word distributions from known writings of the three authors.
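One way such a comparison could be sketched: treat an essay as a bag of words and compare its multinomial log-likelihood under each candidate author's word distribution. Everything specific here is an assumption for illustration only: the word probabilities are made-up toy numbers, `author_a` and `author_b` are hypothetical, and the `unseen` fallback probability is an arbitrary choice. The multinomial coefficient is the same for every author, so it is omitted from the comparison.

```python
import math
from collections import Counter

def log_likelihood(words, word_probs, unseen=1e-6):
    """Sum of count * log P(word | writer); unseen words get a small fallback probability."""
    counts = Counter(words)
    return sum(c * math.log(word_probs.get(w, unseen)) for w, c in counts.items())

# Hypothetical toy word distributions (NOT estimated from any real writings).
author_a = {"upon": 0.02, "whilst": 0.001, "the": 0.07}
author_b = {"upon": 0.001, "whilst": 0.02, "the": 0.07}

essay = ["the", "upon", "the", "upon", "whilst"]  # toy bag of words
ll_a = log_likelihood(essay, author_a)
ll_b = log_likelihood(essay, author_b)

# A positive difference favors author_a for this toy essay.
print(ll_a - ll_b)
```

Working in log space avoids underflow when the product runs over thousands of words; with equal priors on the writers, the larger log-likelihood corresponds to the larger P(writer | words) under Bayes' Theorem.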

Let's write a program!