Distributions and Intro to Likelihood


Gov 2001 Section, February 4, 2010

Outline Meet the Distributions! Discrete Distributions Continuous Distributions Basic Likelihood

Why should we become familiar with these distributions? Part of the point of this class is to get you to fit models to a variety of data. But the first step is recognizing what kind of data you are working with. If you see that your data are Poisson, Binomial, Normal, etc., then you can analyze the data using a model (likelihood or Bayesian) appropriate for that data.

So learning about the distributions is a bit like eating your spinach! It's not pleasant, but it's really useful. It's a lot better for you than forcing onto the data a distributional assumption that doesn't make sense (e.g., assuming the data are normal and then using OLS). What's the best way to learn the distributions? Learn the stories behind them. Remember that you can always look up the specs of the distributions later; just focus on trying to identify them for now.

Outline Meet the Distributions! Discrete Distributions Continuous Distributions Basic Likelihood

The Bernoulli Distribution Takes value 1 with success probability π and value 0 with failure probability 1 − π. Ideal for modelling one-time yes/no (or success/failure) events. The best example is one coin flip: if your data resemble a single coin flip, then you have a Bernoulli distribution.
ex) one voter voting yes/no
ex) one person being either a man/woman
ex) the New Orleans Saints winning/losing the Super Bowl

The Bernoulli Distribution
Y ~ Bernoulli(π)
y = 0, 1
probability of success: π ∈ [0, 1]
p(y | π) = π^y (1 − π)^(1−y)
E(Y) = π
Var(Y) = π(1 − π)

[Figure: the Bernoulli PMF p(y | π) for π = 0.3, 0.5, and 0.7.]
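To build intuition, here is a minimal R sketch; a Bernoulli draw is just a binomial draw with size = 1, and the value π = 0.3 is an arbitrary choice for illustration.

> set.seed(2001)                           ## arbitrary seed, for reproducibility
> y <- rbinom(1000, size = 1, prob = 0.3)  ## 1000 Bernoulli(0.3) draws of 0 or 1
> mean(y)                                  ## close to E(Y) = pi = 0.3
> var(y)                                   ## close to Var(Y) = pi(1 - pi) = 0.21
> dbinom(1, size = 1, prob = 0.3)          ## p(y = 1 | pi) = 0.3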

The Binomial Distribution Let's say you run a bunch of Bernoulli trials and, instead of seeing the result of each trial separately, you just see the grand total. So, for example, you flip a coin three times and count the total number of heads you got. (The order doesn't matter.) This is the Binomial. It's ideal for modelling repeated yes/no (or success/failure) events.
ex) the number of women in a group of 10 Harvard students
ex) the number of rainy days in a seven-day week

The Binomial Distribution
Y ~ Binomial(n, π)
y = 0, 1, ..., n
number of trials: n ∈ {1, 2, ...}
probability of success: π ∈ [0, 1]
p(y | π) = (n choose y) π^y (1 − π)^(n−y)
E(Y) = nπ
Var(Y) = nπ(1 − π)

[Figure: the Binomial(20, π) PMF p(y | n, π) for π = 0.3, 0.5, and 0.9.]
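The corresponding R functions work the same way; a minimal sketch, with n = 20 and π = 0.5 as arbitrary illustrative values:

> set.seed(2001)
> y <- rbinom(1000, size = 20, prob = 0.5)  ## 1000 Binomial(20, 0.5) draws
> mean(y)                                   ## close to E(Y) = n * pi = 10
> dbinom(10, size = 20, prob = 0.5)         ## p(y = 10 | n, pi)
> pbinom(10, size = 20, prob = 0.5)         ## P(Y <= 10), the CDF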

The Multinomial Distribution Suppose you had more than just two outcomes, e.g., vote for Republican, Democrat, or Independent. Can you use a binomial? We can't, because a binomial requires exactly two outcomes (yes/no, 1/0, etc.). Instead, we use the multinomial, which lets you work with several mutually exclusive outcomes.
ex) you toss a die 15 times and get outcomes 1-6
ex) ten undergraduate students are classified as freshmen, sophomores, juniors, or seniors
ex) Gov graduate students divided into American, Comparative, Theory, or IR

The Multinomial Distribution
Y ~ Multinomial(n, π_1, ..., π_k)
y_j = 0, 1, ..., n; y_1 + ... + y_k = n
number of trials: n ∈ {1, 2, ...}
probability of success for outcome j: π_j ∈ [0, 1]; π_1 + ... + π_k = 1
p(y | n, π) = (n! / (y_1! y_2! ... y_k!)) π_1^(y_1) π_2^(y_2) ... π_k^(y_k)
E(Y_j) = nπ_j
Var(Y_j) = nπ_j(1 − π_j)
Cov(Y_i, Y_j) = −nπ_i π_j
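A minimal R sketch of the die example above (15 tosses of a fair die):

> set.seed(2001)
> rmultinom(1, size = 15, prob = rep(1/6, 6))  ## one draw: counts for outcomes 1-6
> ## density of one particular count vector that sums to 15
> dmultinom(c(3, 2, 2, 3, 2, 3), size = 15, prob = rep(1/6, 6))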

The Poisson Distribution Represents the number of events occurring in a fixed period of time. Can also be used for the number of events in other specified intervals such as distance, area, or volume. Counts can never be negative, so it's good for modeling events.
ex) the number of Prussian soldiers who died each year from being kicked in the head by a horse (Bortkiewicz, 1898)
ex) the number of shark attacks in Australia per month
ex) the number of search warrant requests a federal judge hears in one year

The Poisson Distribution
Y ~ Poisson(λ)
y = 0, 1, ...
expected number of occurrences: λ > 0
p(y | λ) = e^(−λ) λ^y / y!
E(Y) = λ
Var(Y) = λ

[Figure: the Poisson PMF p(y | λ) for λ = 2, 10, and 20.]
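A minimal R sketch, using the arbitrary rate λ = 2:

> set.seed(2001)
> y <- rpois(1000, lambda = 2)  ## 1000 Poisson(2) draws
> mean(y)                       ## close to E(Y) = lambda = 2
> var(y)                        ## close to Var(Y) = lambda = 2
> dpois(0, lambda = 2)          ## p(y = 0 | lambda) = e^(-2), about 0.135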

Outline Meet the Distributions! Discrete Distributions Continuous Distributions Basic Likelihood

The Univariate Normal Distribution Probably the one distribution you are already familiar with: it describes data that cluster in a bell curve around the mean. A lot of naturally occurring processes are normally distributed.
ex) the weights of male students in our class
ex) high school students' SAT scores

The Univariate Normal Distribution
Y ~ Normal(µ, σ^2)
y ∈ R
mean: µ ∈ R
variance: σ^2 > 0
p(y | µ, σ^2) = exp(−(y − µ)^2 / (2σ^2)) / (σ √(2π))
E(Y) = µ
Var(Y) = σ^2

[Figure: the Normal density p(y | µ, σ^2) for Normal(0, 1), Normal(2, 1), and Normal(0, .25).]
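A minimal R sketch; note that R's normal functions are parameterized by the standard deviation σ, not the variance σ^2:

> set.seed(2001)
> y <- rnorm(1000, mean = 0, sd = 1)  ## 1000 Normal(0, 1) draws
> mean(y)                             ## close to mu = 0
> var(y)                              ## close to sigma^2 = 1
> dnorm(0, mean = 0, sd = 1)          ## density at the mean: 1/sqrt(2*pi), about 0.399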

The Uniform Distribution Any number in the interval you choose is equally probable. Intuitively easy to understand, but hard to come up with examples. (It is easier to think of discrete uniform examples.)
ex) the numbers that come out of random number generators
ex) rolling 1-6 in a die roll (discrete)
ex) the lottery tumbler out of which a person draws one ball with a number on it (also discrete)

The Uniform Distribution
Y ~ Uniform(α, β)
y ∈ [α, β]
interval: [α, β]; β > α
p(y | α, β) = 1 / (β − α)
E(Y) = (α + β) / 2
Var(Y) = (β − α)^2 / 12
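A minimal R sketch of both the continuous and the discrete cases:

> set.seed(2001)
> y <- runif(1000, min = 0, max = 1)  ## 1000 Uniform(0, 1) draws
> mean(y)                             ## close to E(Y) = (alpha + beta)/2 = 0.5
> var(y)                              ## close to (beta - alpha)^2/12, about 0.083
> dunif(0.3, min = 0, max = 1)        ## the density is flat: 1/(beta - alpha) = 1
> sample(1:6, 1)                      ## a discrete uniform draw: one roll of a die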

Quiz: Test Your Knowledge of the Distributions Are the following Bernoulli (coin flip), Binomial (several coin flips), Multinomial (Rep, Dem, Indep), Poisson (Prussian soldier deaths), Normal (SAT scores), or Uniform (die)?
The heights of trees on campus?
The number of airplane crashes in one year?
A yes or no vote cast by Senator Brown?
The number of parking tickets Cambridge PD gives out in one month?
The poll your Facebook friends took to choose their favorite sport out of football, basketball, and soccer?

Outline Meet the Distributions! Discrete Distributions Continuous Distributions Basic Likelihood

Likelihood The whole point of likelihood is to leverage information about the data generating process into our inferences. Here are the basic steps:
Think about your data generating process. (What do the data look like? Use your substantive knowledge.)
Find a distribution that you think explains the data. (Poisson, Binomial, Normal? Something else?)
Derive the likelihood.
If you want, maximize the likelihood to get the MLE.
Note: This is the case in the univariate context. We'll be introducing covariates later on in the term.

Likelihood: An Example Let's walk through an example. Suppose I am a lawyer and I want to study the rate of convictions in Massachusetts. Here are my data:
There are 100 cases in my data set. In each, a defendant is found either innocent or guilty.
I observe that defendants are found innocent in 65 of them.
And they are found guilty in 35 of them.
These data follow what distribution?

Likelihood: An Example (ctd) So our data are binomial. Now what do we do? Look up the appropriate PDF. (If you are unsure, talk to your friends, look at Wikipedia, look at a probability textbook.) We know from lecture that the binomial PDF is
p(y | π) = (n choose y) π^y (1 − π)^(n−y)
Here, n = 100 and y = 35. Note that y is the number of successes, here the number of guilty defendants. Plugging in this info gives us
p(y | π) = (100 choose 35) π^35 (1 − π)^(100−35)

Likelihood: An Example (ctd) Next, let's calculate the likelihood function. Where does the likelihood come from? From Bayes' Rule, we get
p(π | y) = p(y | π) p(π) / p(y)
Let k(y) = p(π) / p(y). Note that the π in k(y) is the true π, a constant that doesn't vary, so k(y) is just a function of y. Define
L(π | y) = p(y | π) k(y)
so that
L(π | y) ∝ p(y | π)

Likelihood: An Example (ctd) So here are our steps:
First, we got p(y | π) from the Binomial PDF: (100 choose 35) π^35 (1 − π)^(100−35)
Second, we derived the likelihood: L(π | y) ∝ p(y | π)
Third, we can pull this all together: L(π | y) ∝ (100 choose 35) π^35 (1 − π)^(100−35)
That's it!

Likelihood: An Example (ctd) So now we have our likelihood function, L(π | y) ∝ (100 choose 35) π^35 (1 − π)^(100−35). The interpretation is that it is the likelihood of our model having generated the data. The likelihood doesn't make much sense in the abstract. How do we make sense of it? (1) Plot it to get a sense of what's going on; (2) derive (analytically or via simulation) the maximum of the likelihood, which is the maximum likelihood estimate (MLE).

Plotting the example First, note that we can take advantage of a lot of pre-packaged R functions:
rbinom, rpois, rnorm, runif give random draws from the distribution
pbinom, ppois, pnorm, punif give the cumulative distribution function (the probability of that value or less)
dbinom, dpois, dnorm, dunif give the density (i.e., the height of the PDF; useful for drawing)
qbinom, qpois, qnorm, qunif give the quantile function (given a quantile, it tells you the value)
We can also write our own function and plot it using the plot or curve commands.

Plotting the example We want to plot L(π | y) ∝ (100 choose 35) π^35 (1 − π)^(100−35)

> ## example using dbinom
> dbinom(35, size = 100, prob = .35)
[1] 0.0834047
> ## the probability of getting 35 successes given that prob = .35
> ## it's actually kind of low
> curve(dbinom(35, size = 100, prob = x), xlim = c(0, .8),
+       xlab = "pi", ylab = "likelihood")

[Figure: the likelihood from the curve call above, plotted against pi.]
Can we eyeball what the maximum likelihood estimate will be?
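Instead of eyeballing, we can also ask R for the maximum directly. This sketch (not from the slides) uses optimize, and the answer agrees with the analytical MLE y/n = 35/100:

> lik <- function(p) dbinom(35, size = 100, prob = p)  ## likelihood as a function of pi
> opt <- optimize(lik, interval = c(0, 1), maximum = TRUE)
> opt$maximum  ## approximately 0.35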

Other things to keep in mind What if we have two or more data points that we believe come from the same model? We can derive a likelihood for the combined data by multiplying the independent likelihoods together. Taking the log of the likelihood (the "log-likelihood") sometimes makes this easier. But we will address this, as well as finding the MLE, in the weeks to come.
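As a preview, here is a minimal sketch of that idea in R; the data vector y is hypothetical, and taking logs turns the product of independent likelihoods into a sum:

> ## hypothetical data: 10 Bernoulli observations from the same model
> y <- c(1, 0, 0, 1, 1, 0, 1, 1, 0, 1)
> ## log-likelihood: the sum of the independent log-densities
> loglik <- function(p) sum(dbinom(y, size = 1, prob = p, log = TRUE))
> optimize(loglik, interval = c(0, 1), maximum = TRUE)  ## maximum near mean(y) = 0.6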