Examples of Maximum Likelihood Estimation (MLE)

Part A: Let's play a game. In this bag I have two coins: one is painted green, the other purple, and both are weighted funny. The green coin is biased heavily to land heads up, and will do so about 90% of the time. The purple coin is slightly weighted to land tails up, about 60% of flips. Both coins are otherwise identical. In this game, I'll pull a coin out of the bag without looking, flip it in secret, and tell you what landed up, either heads or tails. To win this game, you have to guess which color of coin I picked out of the bag.

At first glance this game may seem to be a coin toss (pun!), with a chance of guessing the color right 50% of the time. But it turns out to be easier than that, since we know that each coin behaves differently. Suppose I tell you that I flipped a tails. Well, we know the green coin hits tails on only 10% of flips, while the purple coin hits tails on 60% of flips. That is:

Probability of tails given coin is green is 10%,
Probability of tails given coin is purple is 60%.

Now since we know the probabilities of tails conditioned on which coin is drawn, we know the likelihoods:

Likelihood of having a green coin given tails was flipped is 10%,
Likelihood of having a purple coin given tails was flipped is 60%.

When I tell you that a tails was flipped, you can infer that it is 6 times more likely that the coin I drew was a purple coin. So if you want to maximize your chances of winning the game, you should choose the coin color which maximizes your likelihood of winning, given the information you have about what was flipped. That is, if you see tails, choose purple; if you see heads, choose green. THIS IS EXACTLY MAXIMUM LIKELIHOOD DECODING! It is nothing more than: you choose the option that is most likely to be true given your observation. Here, the observation is a heads or a tails (you could say, a 1 or a 0) and you are choosing a color. In neuroscience, it is a spike count (a number) and you are choosing a stimulus value.

Now, what's the fraction of correct responses you will deliver when given an observed response, using maximum likelihood decoding? To figure this out, consider the two possibilities. Say the truth is that a green coin was flipped. How often will you get it right? 90% of the time a heads will occur, and according to your maximum likelihood decoding rule you will say green, so be correct. So 90% of the time when green is flipped, you're correct. Say the truth is that a purple coin was flipped. How often will you get it right? 60% of the time you'll get a tails, and say purple, so be correct. So 60% of the time when purple is flipped, you're correct. What's your fraction of correct responses on balance? We are assuming each coin in truth is flipped an equal number of times, so your fraction of correct responses is the average of these numbers: (90% + 60%)/2 = 75%.
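Here is a minimal MATLAB sketch that simulates the game and checks the 75% figure (the trial count nTrials is an arbitrary choice, not part of the game itself):

nTrials = 100000;                      % arbitrary number of simulated games
isPurple = rand(1,nTrials) < 0.5;      % each coin is drawn with equal probability
pTails = 0.10 + 0.50*isPurple;         % P(tails) is 10% for green, 60% for purple
sawTails = rand(1,nTrials) < pTails;   % flip the drawn coin
guessPurple = sawTails;                % the ML rule: tails -> purple, heads -> green
fractionCorrect = mean(guessPurple == isPurple)  % should come out near 0.75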

Cool! Maximum likelihood decoding sure beats guessing. THIS IS EXACTLY THE PROCEDURE YOU WILL FOLLOW IN GENERAL to figure out your fraction of correct responses. The only difference in the neural case is that there are more than two possible observations: instead of heads and tails, you have integer-valued spike counts. But you do the same thing. You have your maximum likelihood decoding rule in hand. You assume one of the options (i.e., one stimulus value) is true, and see what percentage of the time your rule gives you the right answer. Then you assume the other option is true, and see what percentage of the time your rule gives you the right answer. You average these percentages, and that is your fraction of correct responses.
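To make that recipe concrete, here is a small sketch with made-up likelihood tables over five possible spike counts (the numbers are hypothetical, chosen only for illustration):

% Hypothetical likelihoods P(count | stimulus) over spike counts 0..4.
pGivenS1 = [0.40 0.30 0.15 0.10 0.05];  % made-up numbers for stimulus 1
pGivenS2 = [0.05 0.15 0.20 0.25 0.35];  % made-up numbers for stimulus 2
chooseS2 = pGivenS2 > pGivenS1;         % ML rule: which stimulus to report for each count
% fraction correct, assuming each stimulus is presented equally often:
fc = 0.5*sum(pGivenS1(~chooseS2)) + 0.5*sum(pGivenS2(chooseS2))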

Part B: Let's play another game. In this game we have two world-class sprinters running the 150m dash: Donovan Bailey and Michael Johnson. Each runner has a normal (Gaussian) distribution for his finishing times: Donovan has a mean of 15 seconds with a standard deviation of 1 second; Michael has a mean of 17 seconds with a standard deviation of 1.5 seconds. In this game, I'll tell you the finishing time of one of the runners, and you win if you correctly guess who ran that time.

To get started, you could use MATLAB to plot the finishing time distributions of the two runners, using the Gaussian formula from class together with the linspace, xlabel, ylabel, and legend commands:

xlist = linspace(10,25,1000);
m1 = 15; s1 = 1;    % Donovan's mean and standard deviation
m2 = 17; s2 = 1.5;  % Michael's mean and standard deviation
gsn1 = 1/(sqrt(2*pi*s1^2))*exp(-(xlist-m1).^2/(2*s1^2));
gsn2 = 1/(sqrt(2*pi*s2^2))*exp(-(xlist-m2).^2/(2*s2^2));
% makes text bigger in upcoming plot labels
set(0,'defaultaxesfontsize',20);
set(0,'defaulttextfontsize',20);
figure
plot(xlist,gsn1,'r','linewidth',4)
hold on
plot(xlist,gsn2,'b','linewidth',4)
legend('Donny','Mike')
xlabel('time'); ylabel('probability')

Suppose I give you a running time of 21.5 seconds. While it's a slow time for the American Johnson, the probability of the Canadian Bailey running over 20 seconds is extremely small. Thus if you want to win, you're better off saying that Michael Johnson ran the 21.5. ONCE AGAIN, THIS IS MAXIMUM LIKELIHOOD DECODING! You are given an observation (here, a time), and you choose the option that was most likely to produce that time (here, Johnson). You can then apply this method to determine the best guess given any running time: it's the guess corresponding to the higher likelihood (i.e., maximum likelihood). To compute the fraction of correct responses, you'd follow the procedure above. That would involve summing (integrating) the area under the curves that correspond to correct choices. For histograms, as on the assignment, it's easier: you sum up the probabilities in the corresponding bins.
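One way you could carry out that integration numerically, as a sketch reusing xlist, gsn1, and gsn2 from the plotting code above (a Riemann sum over the plotted grid, assuming each runner's time is reported equally often):

% ML rule: guess Donovan wherever his density is higher.
chooseDonny = gsn1 > gsn2;
dx = xlist(2) - xlist(1);               % grid spacing for the Riemann sum
% integrate each runner's density over the region where he is the guess:
fractionCorrect = 0.5*sum(gsn1(chooseDonny))*dx + 0.5*sum(gsn2(~chooseDonny))*dx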

Part C: A big part of maximum likelihood estimation involves working with data and probability distributions. In practice, you will not have smooth curves like the blue and red ones above, but will need to build your own histograms. Use the following code to generate and then visualize some random data:

D = 2*randn(1,10)+6;
hist(D);

You'll see that there are only a few samples in the data, and your histogram is pretty sparse. You might not be able to estimate the overlap points between this and another histogram very well, i.e., to see above and below what value you should choose an option based on this histogram. Generate 1000 samples from D using the same formula, instead of 10, and plot the histogram. You should get a better idea now. To visualize the 1000 samples more finely, increase the number of histogram bins to 100 with:

hist(D,100);

To get a good smooth histogram now, you might need to draw even more samples. YOU COULD RUN INTO THIS TYPE OF ISSUE ON YOUR ASSIGNMENT. MAKE SURE YOU ARE DRAWING ENOUGH SAMPLES TO GET SMOOTH HISTOGRAMS, BY JUST INCREASING THE NUMBER OF TRIALS.

Part D: OPTIONAL. In our HW problem, we can get around the issue of not having enough samples by just increasing the trial count, i.e., asking the computer for more! In real life, we will often be given a small set of data and will need to use another strategy to make informed guesses about where other samples came from. We'll discuss how to do this using mean and variance statistics (or the square root of variance, the standard deviation, often called std dev or std). You do not need to do this for the assignment, but it's worth knowing about for sure.

Suppose I have a machine that spits out two different kinds of white powder. One powder is the stuff that covers Sour Patch Kids candy, is extremely delicious, and I want to eat it. The other powder is ricin, a powerful toxin. Based on experiments I have conducted with a population of undergrads, I know that the machine dispenses both powders with equal probability, and that the ricin samples are less massive, having a lower weight. I have been able to find the weights in grams of 20 powder samples, below as the matrix PD1:

PD1 = [2.7800 3.2000 3.0400 2.6900 2.9600 3.1300 2.7600 2.9800 2.4900 2.9200;
       1.8600 2.6900 2.3200 2.9100 2.0500 3.0700 2.2600 2.1000 2.1000 2.1200]

The rows represent the different classes of powder, determined by how the undergrads responded to the samples. The columns are different sample numbers. First, determine which row corresponds to the ricin, by finding which class has the lower weight. To do this, take the mean across the second (column) dimension:

mean(PD1,2)

Assign each row as being either candy or poison based on its average weight. Now get an idea of the distribution of mass within each group:

std(PD1,0,2)

Note that the 0 argument is necessary due to an optional flag used in the definition of std, which is not important here (see doc std for details if you're interested). We see that each row has a different spread, but it is difficult to see how these relate to the means. First we can plot the means:

plot([1 2], mean(PD1,2), 'b-');

which is sort of hard to view, so plot it again, replacing the default 'b-' with 'r.' or 'go' or 'mx', and view the effects (google matlab linespec for further details). We can then incorporate our knowledge of the variance within each row by overlaying the standard deviation about each mean. First, run doc errorbar and read the options, then try:

errorbar([1 2], mean(PD1,2), std(PD1,0,2), 'ro', 'markersize', 12)

The x-axis corresponds to the row (powder class) number, and the y-axis to the mass in grams. The vertical lines at each datapoint correspond to the standard deviation about the mean of each data row. You can then imagine plotting two normal distributions, as in Part B, with the mean/std you've calculated, and doing MLE as before (a sketch of this is below). Here, we'll just use the errorbar plot to make some educated guesses. Ask yourself: say I have four samples, represented as ordered pairs of ID letters and weights: (A, 2.28), (B, 3.25), (C, 3.22), (D, 2.65). Which sample would you say was likely poison? Which sample would you say is safe for human consumption? Between the two remaining samples, which would you wish to sample, and which would you offer to your friend/enemy, and why?
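For completeness, a minimal sketch of the "plot two normal distributions" step, using PD1's row means and stds from above (the weight range for w is an arbitrary choice):

% Overlay the two normal distributions implied by PD1's rows, as in Part B.
mu = mean(PD1,2); sd = std(PD1,0,2);
w = linspace(1.5,3.5,500);              % weight axis in grams (arbitrary range)
g1 = 1/sqrt(2*pi*sd(1)^2)*exp(-(w-mu(1)).^2/(2*sd(1)^2));
g2 = 1/sqrt(2*pi*sd(2)^2)*exp(-(w-mu(2)).^2/(2*sd(2)^2));
figure
plot(w,g1,'b','linewidth',4)
hold on
plot(w,g2,'r','linewidth',4)
legend('row 1','row 2'); xlabel('weight (g)'); ylabel('probability')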