Statistical Methods for NLP LT 2202


LT 2202 Lecture 3 Random variables January 26, 2012

Recap of lecture 2 Basic laws of probability: 0 ≤ P(A) ≤ 1 for every event A. P(Ω) = 1. P(A ∪ B) = P(A) + P(B) if A and B are disjoint. Conditional probability: P(A|B) = P(A ∩ B)/P(B). Bayes: P(B|A) = P(A|B)P(B)/P(A). Independence: P(A ∩ B) = P(A) · P(B)

Today's lecture Random variables. Probability distributions. Expectations and variances. The zoo of probability distributions

Random variables, informally A variable that takes different values with different probabilities. Examples: the amount I win when buying a lottery ticket; the number of heads when throwing a coin n times; the gender of a newborn baby; the number of words in a random English sentence; the initial word in a random English sentence

Random variables, informally The outcomes: if I win, I'll get 1,000,000 SEK; otherwise, I get nothing. P(nothing) = 0.99999, P(1,000,000 SEK) = 0.00001, P(something else) = 0

Throwing a coin twice The number of heads when throwing a coin twice The outcomes: (H,H), (H,T), (T,H), (T,T) P(none) = 1/4 P(one) = 2/4 P(two) = 1/4

Random variables, formally Definition: A random variable is a function from the sample space Ω to some set of values Also stochastic variable (στοχαστικός) Example: the number of heads

Describing a random variable When discussing a random variable, we need to describe which values it takes and with which probabilities: the distribution

The probability mass function For all values x that a random variable X may take, we define the function p_X(x) = P(X takes the value x). This is called the probability mass function (pmf) of X

Coins again X = the number of heads when throwing a coin twice. p_X(0) = P(X = 0) = 1/4, p_X(1) = P(X = 1) = 2/4, p_X(2) = P(X = 2) = 1/4
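
The pmf above can be recovered by brute-force enumeration of the sample space. A minimal Python sketch (the code and names are illustrative, not part of the course material):

```python
from itertools import product
from collections import Counter

# Enumerate the sample space of two fair coin tosses and count heads per outcome
outcomes = list(product("HT", repeat=2))          # (H,H), (H,T), (T,H), (T,T)
counts = Counter(o.count("H") for o in outcomes)  # how many outcomes give k heads
pmf = {k: counts[k] / len(outcomes) for k in sorted(counts)}
print(pmf)  # {0: 0.25, 1: 0.5, 2: 0.25}
```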

Exams X = the number of times I have to go to the exam. p_X(1) = P(X = 1) = 0.6, p_X(2) = P(X = 2) = 0.4 · 0.6, ..., p_X(k) = P(X = k) = 0.4^(k−1) · 0.6
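
A quick numerical sanity check of this pmf, assuming a 0.6 pass probability on each attempt (an illustrative sketch, not course code): the probabilities over all k should sum to 1.

```python
# pmf of the number of exam attempts: pass with probability 0.6 on each try
def p_exam(k, p=0.6):
    return (1 - p) ** (k - 1) * p

print(p_exam(1))  # 0.6
# the probabilities sum to 1 over all k (numerically, truncated at k = 100)
print(sum(p_exam(k) for k in range(1, 101)))
```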

Independent random variables We saw independent events previously. We may introduce a similar concept for random variables: X and Y are independent random variables if P(X = x, Y = y) = P(X = x) · P(Y = y) = p_X(x) · p_Y(y) for all possible values x, y of X and Y.

Expectations: averages We are rolling a die repeatedly. What will be the average result? We are throwing a coin. If heads, we pay $10, if tails, we get $5. What is the average gain?

Expectations: averages We are throwing a coin. If heads, we pay $10; if tails, we get $5. What is the average gain? P($5) = 0.5, P(−$10) = 0.5. Average gain: 0.5 · $5 + 0.5 · (−$10) = −$2.50

The expected value If X is a random variable taking numeric values, then E[X] = Σ_k k · p_X(k) is called the expected value, mean value, or expectation of X.

Dice rolling X is the result of a die roll: E[X] = Σ_k k · p_X(k) = (1 + ... + 6)/6 = 21/6 = 3.5

Coins again X = the number of heads when throwing a coin twice. p_X(0) = P(X = 0) = 1/4, p_X(1) = P(X = 1) = 2/4, p_X(2) = P(X = 2) = 1/4. E[X] = Σ_k k · p_X(k) = 0 · 1/4 + 1 · 2/4 + 2 · 1/4 = 1
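
The expectation formula is easy to compute for any discrete pmf stored as a dict of value → probability. A minimal Python sketch (the function name `expectation` is my own, not from the course):

```python
# E[X] = sum over k of k * p_X(k), for a pmf given as {value: probability}
def expectation(pmf):
    return sum(k * p for k, p in pmf.items())

coins = {0: 1/4, 1: 2/4, 2: 1/4}   # heads in two fair coin tosses
print(expectation(coins))          # 1.0
print(expectation({k: 1/6 for k in range(1, 7)}))  # ≈ 3.5 (die roll)
```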

Interpretation

Some interesting facts about expectations The expectation is linear: E[aX + b] = a · E[X] + b. We can sum expectations directly: E[X + Y] = E[X] + E[Y]

Expectations of functions If Y is a function of X, e.g. Y = g(X), then E[Y] = Σ_k g(k) · p_X(k)

Example X = the result of a die roll Y = Amount won or lost depending on X: 1: Lose $20 2: Lose $5 3: Nothing 4: Win $1 5: Win $2 6: Win $20

Example E[Y] = Σ_k g(k) · p_X(k) = g(1) · p_X(1) + ... + g(6) · p_X(6) = (1/6)(−20) + (1/6)(−5) + (1/6)(0) + (1/6)(1) + (1/6)(2) + (1/6)(20) = −2/6
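
The same computation in a small illustrative Python sketch (variable names are my own):

```python
# E[g(X)] for the die-roll payout example: sum over k of g(k) * p_X(k)
payout = {1: -20, 2: -5, 3: 0, 4: 1, 5: 2, 6: 20}
e_y = sum(payout[k] * (1 / 6) for k in range(1, 7))
print(e_y)  # ≈ -1/3: on average we lose about 33 cents per game
```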

Variance If m = E[X], then V(X) = E[(X − m)²] is called the variance of X, and D(X) = √V(X) is called the standard deviation of X.
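
A sketch of the two definitions in Python for the die-roll pmf (function names are illustrative, not from the course):

```python
from math import sqrt

# V(X) = E[(X - m)^2] with m = E[X]; D(X) = sqrt(V(X))
def variance(pmf):
    m = sum(k * p for k, p in pmf.items())          # the mean E[X]
    return sum((k - m) ** 2 * p for k, p in pmf.items())

die = {k: 1/6 for k in range(1, 7)}
print(variance(die))        # ≈ 2.917 (= 35/12)
print(sqrt(variance(die)))  # ≈ 1.708, the standard deviation
```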

Example (standard deviation)

The zoo of probability distributions We'll exemplify a few. Try to remember the type cases! See the table on the course page. Examples: coin tossing with an uneven coin: Bernoulli. Die rolling: Uniform. Counting errors/successes: Binomial. Trying until success: Geometric. Word frequencies: Zipf

The Bernoulli distribution Assume that we throw an unbalanced coin giving heads (1) with probability p and tails (0) with probability 1 − p: p_X(0) = 1 − p, p_X(1) = p

Properties of the Bernoulli distribution Expectation and variance (see exercise): E[X] = p, V[X] = p(1 − p)

The uniform distribution: die rolling p_X(k) = 1/n, E[X] = (n + 1)/2

Counting: the binomial distribution The number of heads when throwing a coin (heads prob p) n times n=1: H, T n=2: (H,H), (H,T), (T,H), (T,T) n=3: (H,H,H), (H,H,T), (H,T,H), (H,T,T), (T,H,H), (T,H,T), (T,T,H), (T,T,T)

Counting: the binomial distribution The probability of getting k heads? n = 3, k = 1: (H,T,T), (T,H,T), (T,T,H). Each of these outcomes has the probability p^k (1 − p)^(n−k). How many such outcomes? The number of ways to pick k items from a set of n

Picking k items out of n The number of ways to pick k items from a set of n is called the binomial coefficient: (n choose k) = n! / (k! (n − k)!)

The binomial distribution We get the pmf: p_X(k) = (n choose k) p^k (1 − p)^(n−k). The expectation (think about it!): E[X] = np
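
A minimal Python sketch of the binomial pmf (illustrative code; `math.comb` gives the binomial coefficient), with a numerical check of E[X] = np:

```python
from math import comb

# Binomial pmf: probability of exactly k heads in n tosses, heads probability p
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(1, 3, 0.5))  # 0.375, i.e. 3/8 as in the n = 3, k = 1 example
# E[X] = n*p, checked numerically for n = 10, p = 0.3:
print(sum(k * binom_pmf(k, 10, 0.3) for k in range(11)))  # ≈ 3.0
```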

The geometric distribution Trying until success: p_X(1) = P(X = 1) = p, p_X(2) = P(X = 2) = (1 − p) · p, ..., p_X(k) = P(X = k) = (1 − p)^(k−1) · p. E[X] = 1/p
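
The claim E[X] = 1/p can be checked numerically; an illustrative sketch with p = 0.6 (an assumed value, matching the exam example earlier):

```python
# Geometric pmf and a numerical check that E[X] = 1/p
def geom_pmf(k, p):
    return (1 - p) ** (k - 1) * p

p = 0.6
# truncate the infinite sum at k = 200; the remaining tail is negligible
e_x = sum(k * geom_pmf(k, p) for k in range(1, 200))
print(e_x)  # ≈ 1/0.6 ≈ 1.667
```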

The Zipf distribution For many phenomena in language, frequency tends to be approximately proportional to inverse frequency rank. Assuming such a distribution, the probability of drawing the kth most common item is p_X(k) = C/k (C is a normalizer so that the probabilities sum to 1). Must assume a finite vocabulary!
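
The normalizer C is one over the harmonic sum 1/1 + 1/2 + ... + 1/V for a vocabulary of V items. A minimal sketch (the vocabulary size 25,000 is taken from the Codex Argenteus example below):

```python
# Zipf pmf over a finite vocabulary of V items: p_X(k) = C / k
V = 25000
C = 1 / sum(1 / r for r in range(1, V + 1))  # normalizer from the harmonic sum

def zipf_pmf(k):
    return C / k

print(zipf_pmf(1))  # probability of the most frequent word
print(sum(zipf_pmf(k) for k in range(1, V + 1)))  # 1.0 up to rounding
```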

Example: the Codex Argenteus Corpus size about 60,000 words. How many hapax legomena? About 9%. Zipf with vocabulary size 25,000 would give us around 10%; with 50,000: 15%

Continuous distributions So far, our random variables have taken discrete values (1, 2, 3, ... or a, b, ...). We may also define random variables that take continuous values. This requires slightly different machinery

Probability density function For a continuous random variable X, we don't use a pmf. Instead we use a probability density function (pdf) f_X(x) such that P(a < X ≤ b) = ∫_a^b f_X(x) dx. (Note that f_X(x) is not a probability by itself!)

Expectations and such The expectation (and most other stuff as well) becomes similar: we just need to replace sums with integrals. Discrete: E[X] = Σ_k k · p_X(k). Continuous: E[X] = ∫ x · f_X(x) dx
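
The integral can be checked numerically with a crude Riemann sum. An illustrative sketch for X uniform on [0, 1] (so f_X(x) = 1 there, and the true expectation is 1/2):

```python
# Numerical check of E[X] = integral of x * f_X(x) dx for X uniform on [0, 1]
n = 100_000
dx = 1 / n
# midpoint rule: evaluate x * f_X(x) at the center of each small interval
e_x = sum(((i + 0.5) * dx) * 1.0 * dx for i in range(n))
print(e_x)  # ≈ 0.5
```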

The normal distribution (Gaussian) The normal distribution has the following pdf: f_X(x) = (1/(σ√(2π))) · e^(−(x − m)²/(2σ²)). E[X] = m, V[X] = σ²
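
The density formula translated directly into Python (an illustrative sketch; the function name is my own):

```python
from math import exp, pi, sqrt

# Normal (Gaussian) density with mean m and standard deviation sigma
def normal_pdf(x, m, sigma):
    return 1 / (sigma * sqrt(2 * pi)) * exp(-(x - m) ** 2 / (2 * sigma ** 2))

print(normal_pdf(0, 0, 1))  # ≈ 0.399, the peak of the standard normal
```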

Some nice facts about the normal distribution The normal distribution is very common in statistics because it is often seen in nature (height, test scores, ...) and has nice mathematical properties: it is closed under scaling, translation, and summation. Central limit theorem: if we sum (or average) a large number of i.i.d. variables, the sum is approximately normal
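
The central limit theorem is easy to see in a small simulation; a sketch using die rolls (the sample sizes 100 and 2000 are arbitrary choices):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Average of 100 die rolls, repeated 2000 times: by the CLT these averages
# cluster around the die's mean 3.5 in a roughly normal (bell-shaped) way
averages = [sum(random.randint(1, 6) for _ in range(100)) / 100
            for _ in range(2000)]
print(sum(averages) / len(averages))  # ≈ 3.5
```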

Central limit theorem

Normal approximation of binomial For large values of n, the binomial distribution can be approximated with a normal: (why???)
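
A hint at the "why": a binomial variable is a sum of n i.i.d. Bernoulli variables, so the central limit theorem applies. A minimal numerical comparison (the values n = 100, p = 0.5 are assumed for illustration), matching Binomial(n, p) against N(np, np(1 − p)) at k = np:

```python
from math import comb, exp, pi, sqrt

# Binomial(n, p) vs its normal approximation N(np, np(1-p)), compared at k = np
n, p, k = 100, 0.5, 50
binom = comb(n, k) * p**k * (1 - p)**(n - k)
mu, var = n * p, n * p * (1 - p)
normal = 1 / sqrt(2 * pi * var) * exp(-(k - mu) ** 2 / (2 * var))
print(binom, normal)  # ≈ 0.0796 vs ≈ 0.0798: very close for large n
```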

Recap Random variables: variables taking values with some probability. Probability distributions; the probability mass function. Expectations (means) and variances. Distributions: try to remember the type cases; see the table on the course page