INF FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning, Lecture 3, 1.9

INF5830 2015 FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning, Lecture 3, 1.9

Today: More statistics
- Binomial distribution
- Continuous random variables/distributions
- Normal distribution
- Sampling and sampling distribution
- Statistics
- Hypothesis testing
- Estimation
- Known and unknown standard deviation

Last week
- Probability theory
- Probability space
- Random experiment (or trial) (no: forsøk)
- Outcomes (no: utfallene)
- Sample space (no: utfallsrommet)
- An event (no: begivenhet)
- Bayes' theorem
- Discrete random variable
- The probability mass function, pmf
- The cumulative distribution function, cdf
- The mean (or expectation) (no: forventningsverdi)
- The variance of a discrete random variable X
- The standard deviation of the random variable

Discrete random variables

Mean of a discrete random variable
The mean (or expectation) (no: forventningsverdi) of a discrete random variable X:
$$\mu_X = E(X) = \sum_x p(x)\, x$$
Useful to remember:
$$\mu_{X+Y} = \mu_X + \mu_Y, \qquad \mu_{a+bX} = a + b\,\mu_X$$
Examples:
- One die: 3.5
- Two dice: 7
- Ten dice: 35
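The mean formula and the dice examples can be checked with a few lines of Python. This is a minimal sketch: the helper name `mean` and the dictionary representation of a pmf are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

def mean(pmf):
    """Expectation of a discrete random variable given as {value: probability}."""
    return sum(p * x for x, p in pmf.items())

# One fair die: each face 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Sum of two independent dice, built by enumerating all 36 outcomes.
two_dice = {}
for a in die:
    for b in die:
        two_dice[a + b] = two_dice.get(a + b, 0) + die[a] * die[b]

print(mean(die))       # 7/2
print(mean(two_dice))  # 7
print(10 * mean(die))  # 35, the mean of the sum of ten dice by additivity
```

Exact fractions are used so that the results match the slide values without rounding.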

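Example: throwing a die until you get a 6
- P(odd) = ? (the first 6 comes on an odd-numbered throw)
- P(even) = P(odd) · 5/6
- P(even) + P(odd) = 1
- $p(n) = \frac{1}{6}\left(\frac{5}{6}\right)^{n-1}$ for $n \ge 1$
- $\mu = 6$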
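Solving the two equations for P(odd) gives P(odd) = 6/11 and P(even) = 5/11. The sketch below checks this, and the mean of 6, by summing the pmf numerically. The truncation point `N` is an assumption made for illustration; the tail beyond it is negligible.

```python
def geom_pmf(n, p=1/6):
    """P(first six occurs on throw n) = p * (1 - p)**(n - 1), for n >= 1."""
    return p * (1 - p) ** (n - 1)

N = 2000  # truncation point (assumption): the remaining tail is negligible

p_odd = sum(geom_pmf(n) for n in range(1, N, 2))
p_even = sum(geom_pmf(n) for n in range(2, N, 2))
mu = sum(n * geom_pmf(n) for n in range(1, N))

print(round(p_odd, 4), round(p_even, 4))  # 0.5455 0.4545
print(round(mu, 4))                       # 6.0
```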

More than mean
The mean doesn't say everything. Examples:
- (1.3) The sum of the two dice, Z, i.e. $p_Z(2) = 1/36, \ldots, p_Z(7) = 6/36$, etc.
- (3.2) $p_2$ given by: $p_2(7) = 1$, $p_2(x) = 0$ for $x \ne 7$
- (3.3) $p_3$ given by: $p_3(x) = 1/11$ for $x = 2, 3, \ldots, 12$
These have the same mean but are very different.

Variance
The variance of a discrete random variable X:
$$\mathrm{Var}(X) = \sigma^2 = \sum_x p(x)\,(x - \mu)^2$$
Observe that $\mathrm{Var}(X) = E\big((X - E(X))^2\big)$.
It may be shown that this equals $E(X^2) - (E(X))^2$.
The standard deviation of the random variable:
$$\sigma = \sqrt{\mathrm{Var}(X)}$$

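Examples of variance
- Throwing one die: $\mu = (1 + 2 + \cdots + 6)/6 = 7/2$, and
  $\sigma^2 = \big((1 - 7/2)^2 + (2 - 7/2)^2 + \cdots + (6 - 7/2)^2\big)/6 = (25 + 9 + 1)/(4 \cdot 3) = 35/12$
- (Ex 1.3) Throwing two dice: $\sigma^2 = 35/6$
- (Ex 3.2) $p_2$, where $p_2(7) = 1$, has variance 0
- (Ex 3.3) $p_3$, the uniform distribution, has variance
  $\big((2 - 7)^2 + \cdots + (12 - 7)^2\big)/11 = (25 + 16 + 9 + 4 + 1 + 0) \cdot 2/11 = 10$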
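The four variances can be reproduced directly from the definition. This is a small sketch; the helper names `mean` and `variance` and the dictionary form of the pmfs are assumptions made for illustration.

```python
from fractions import Fraction

def mean(pmf):
    return sum(p * x for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum over x of p(x) * (x - mu)**2."""
    mu = mean(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}

p_Z = {}  # sum of two dice
for a in range(1, 7):
    for b in range(1, 7):
        p_Z[a + b] = p_Z.get(a + b, 0) + Fraction(1, 36)

p_2 = {7: Fraction(1)}                            # all mass on 7
p_3 = {x: Fraction(1, 11) for x in range(2, 13)}  # uniform on 2..12

for name, pmf in [("die", die), ("p_Z", p_Z), ("p_2", p_2), ("p_3", p_3)]:
    print(name, mean(pmf), variance(pmf))
```

The last three distributions all have mean 7, while their variances are 35/6, 0 and 10.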

Probability distributions (no: sannsynlighetsfordelinger)

Examples of distributions
- (1.3) The sum of the two dice, Z, i.e. $p_Z(2) = 1/36, \ldots, p_Z(7) = 6/36$, etc.
- (3.2) $p_2$ given by: $p_2(7) = 1$, $p_2(x) = 0$ for $x \ne 7$
- (3.3) $p_3$ given by: $p_3(x) = 1/11$ for $x = 2, 3, \ldots, 12$

Bernoulli trial
- One experiment, two outcomes: $\Omega_X = \{0, 1\}$
- Write $p$ for $p(1)$. Then $p(0) = 1 - p$.
- The mean/expectation: $0 \cdot p(0) + 1 \cdot p(1) = 0 + p = p$
- Variance:
  $$\mathrm{Var}(X) = \sigma^2 = \sum_x p(x)\,(x - \mu)^2 = (1 - p)(0 - p)^2 + p\,(1 - p)^2 = p(1 - p)$$
Examples:
- Flipping a fair coin, $p = 1/2$
- Rolling a die, getting a 6, $p = 1/6$

Binomial distribution
Binomial distribution (no: binomisk fordeling): conducting n Bernoulli trials with the same probability and counting the number of successes.
Example: flipping a fair coin n times, p(k):
- n=2: p(0)=1/4, p(1)=1/2, p(2)=1/4
- n=3: p(0)=1/8, p(1)=3/8, p(2)=3/8, p(3)=1/8
- n=4: (1, 4, 6, 4, 1)/16
- n=5: (1, 5, 10, 10, 5, 1)/32
- general n: $p(k) = \binom{n}{k} \frac{1}{2^n}$, where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$

Binomial distribution
Binomial distribution (no: binomisk fordeling), general form:
- $0 < p < 1$
- $n$ a natural number
- $B(n, p)$ is given by
  $$b(k; n, p) = \binom{n}{k}\, p^k (1 - p)^{n-k} \quad \text{for } k = 0, 1, \ldots, n, \qquad \text{where } \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
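The general form translates almost literally into code. The sketch below uses `math.comb` for the binomial coefficient; the function name `binom_pmf` is an illustrative choice. With p = 1/2 it reproduces the fair-coin values for n = 2, ..., 5.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """b(k; n, p) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

half = Fraction(1, 2)
for n in (2, 3, 4, 5):
    # Fractions are printed in lowest terms, e.g. 10/32 appears as 5/16.
    print(n, [str(binom_pmf(k, n, half)) for k in range(n + 1)])
```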

Binomial distribution
[Figure: pmf of the binomial distribution for n = 20, with p = 0.1 (blue), p = 0.5 (green) and p = 0.8 (red)]

Binomial distribution
- The mean/expectation, μ, of B(n,p) is np:
  - n Bernoulli trials
  - Each Bernoulli trial has mean p
- The variance is np(1-p), because:
  - The Bernoulli trials are independent
  - Each Bernoulli trial has variance p(1-p)
  - The variance of the sum of two independent random variables is the sum of their variances

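p = 0.5
[Figure: pmf graphs of B(N, 0.5) for N = 4, N = 16 and N = 64]

| N | 1 | 4 | 16 | 64 | 256 |
|---|---|---|---|---|---|
| $\sigma^2$ | 0.25 | 1 | 4 | 16 | 64 |
| $\sigma$ | 0.5 | 1 | 2 | 4 | 8 |

- The relative variation gets smaller with growing N.
- The pmf graph approaches a bell shape.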
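The values of $\sigma^2$ and $\sigma$ for p = 0.5 can be recomputed from the pmf itself rather than from the formulas np and np(1-p). This is a rough sketch; the column `sd/N` is one way (an assumption here) to express the "relative variation".

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """b(k; n, p) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 0.5
rows = []
for n in (1, 4, 16, 64, 256):
    mu = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mu) ** 2 * binom_pmf(k, n, p) for k in range(n + 1))
    rows.append((n, mu, var, sqrt(var), sqrt(var) / n))

for n, mu, var, sd, rel in rows:
    print(f"N={n:3d}  mean={mu:7.2f}  var={var:6.2f}  sd={sd:5.2f}  sd/N={rel:.4f}")
```

The standard deviation doubles each time N is multiplied by 4, so sd/N halves.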

Think about
- Flip a coin 10 times, count the number of heads.
- You expect 5 heads, but not exactly 5; 6 is OK.
- When do you start to worry whether the coin is unfair? 8 heads? 9 heads?
- This is the task for inferential statistics.

Tossing a fair(?) coin
The cumulative distribution function: "How likely is it to get N or fewer tails?"
n = 10:

| N | pmf(N) | cdf(N) |
|---|---|---|
| 0 | 0.001 | 0.001 |
| 1 | 0.010 | 0.011 |
| 2 | 0.044 | 0.055 |
| 3 | 0.117 | 0.172 |
| 4 | 0.205 | 0.377 |
| 5 | 0.246 | 0.623 |
| 6 | 0.205 | 0.828 |
| 7 | 0.117 | 0.945 |
| 8 | 0.044 | 0.989 |
| 9 | 0.010 | 0.999 |
| 10 | 0.001 | 1.000 |
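The table can be regenerated with the standard library alone, and the cdf then answers the "when do you start to worry" question. This is a sketch; the variable names are illustrative.

```python
from math import comb

n = 10
pmf = [comb(n, k) / 2**n for k in range(n + 1)]   # fair coin: p = 1/2
cdf = [sum(pmf[: k + 1]) for k in range(n + 1)]

for k in range(n + 1):
    print(f"{k:2d}  {pmf[k]:.3f}  {cdf[k]:.3f}")

# Upper tails: probability of at least 8, or at least 9, of one side in 10 flips.
p_at_least_8 = 1 - cdf[7]   # 56/1024
p_at_least_9 = 1 - cdf[8]   # 11/1024
print(round(p_at_least_8, 4), round(p_at_least_9, 4))
```

With a fair coin, 8 or more of one side turns up in roughly 5.5% of all series of 10 flips, and 9 or more in roughly 1.1%.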

SciPy

    import scipy
    from scipy import stats

    bin10 = stats.binom(10, 0.5)  # N=10, p=0.5
    bin10.pmf(3)   # probability mass of 3
    bin10.cdf(3)   # cumulative distribution function at 3
    bin10.var()    # variance
    bin10.std()    # standard deviation

Continuous random variables

Continuous random variables
- $P(X = a) = 0$ for all values $a$
- The probability mass function does not make sense
- The cumulative distribution function, cdf, given by $F(a) = P(X < a)$, makes sense
- $P(a < X < b) = F(b) - F(a)$
- To calculate expectation and variance we must use integration instead of (infinite) sums. We skip the details!

Probability density function
- The derivative of the cdf, F, is called the probability density function, pdf (no: sannsynlighetstetthet)
- We draw curves for pdf-s
- The pdf has a similar relationship to the cdf in the continuous case as the pmf has in the discrete case

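The normal distribution
The z-score relates the general case to the standard case:
$$z = \frac{x - \mu}{\sigma}$$

| | Standard normal distribution (red curve) | General normal distribution $N(\mu, \sigma)$ |
|---|---|---|
| Scary formula (don't have to remember) | $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$ | $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ |
| Mean (important) | 0 | $\mu$ |
| Standard deviation (important) | 1 | $\sigma$ |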
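The density formula can be written out and compared with Python's built-in `statistics.NormalDist`. This is a sketch only: the function name `normal_pdf` and the test point 186 are illustrative, and the parameters 180 and 6 are borrowed from the height example used later in the lecture. The last lines check numerically that the pdf is the derivative of the cdf.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma), where sigma is the standard deviation."""
    z = (x - mu) / sigma  # z-score: reduces the general case to the standard one
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

heights = NormalDist(mu=180, sigma=6)
x = 186
print(normal_pdf(x, 180, 6), heights.pdf(x))  # the two values should agree

# The pdf is the derivative of the cdf: approximate it by a central difference.
h = 1e-4
slope = (heights.cdf(x + h) - heights.cdf(x - h)) / (2 * h)
print(slope)
```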

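68% - 95% - 99.7%
For a normal distribution, about 68% of the values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.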
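The three percentages follow from the standard normal cdf. A minimal check, assuming Python 3.8 or later for `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, prob in within.items():
    print(f"within {k} sd: {prob:.4f}")
```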

Example
$$z = \frac{x - \mu}{\sigma}$$
Tallness of Norwegian young men (rough numbers): µ = 180 cm, σ = 6 cm
- z = (186 - 180)/6 = 1 (standard deviation)
- (100 - 68)/2 % = 16% are taller than 186 cm
- How many are taller than 190 cm?
  - z = (190 - 180)/6 = 1.67
  - Prob. = 0.0475 (from table or software)
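The "software" route for this example might look like the sketch below, using `statistics.NormalDist` (Python 3.8+). The variable names are illustrative. The results differ slightly from the slide because the slide rounds 68.27% to 68% and z to 1.67.

```python
from statistics import NormalDist

heights = NormalDist(mu=180, sigma=6)  # rough numbers from the example

z_186 = (186 - heights.mean) / heights.stdev
z_190 = (190 - heights.mean) / heights.stdev

p_taller_186 = 1 - heights.cdf(186)
p_taller_190 = 1 - heights.cdf(190)

print(round(z_186, 2), round(p_taller_186, 4))  # 1.0 0.1587
print(round(z_190, 2), round(p_taller_190, 4))  # 1.67 0.0478
```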

Sampling distribution (no: utvalgsfordeling)

Sampling - empirically
Goal: make assertions about a whole population from observations of a sample (no: utvalg).
A simple random sample (SRS) (no: tilfeldig utvalg):
1. Each individual has an equal chance of being chosen (unbiased / no: forventningsrett)
2. The selections of the various individuals are independent
Not as simple as it sounds (cf. the current election polls):
- Various methods can help
- E.g. choose from known groups and weight by group size (gender, age, home town, etc.)

Sampling in Language Technology
- You want to take a simple random sample of words from a corpus?
  - Can you use the n first sentences?
  - Can you use a random sample of n sentences?
- How can you build a corpus (sample) which gives a random sample of Norwegian texts?

Sampling distributions
Example, height: assume X is N(180, 6) (Var = 36).
Randomly choose 100 individuals and add their heights: $S = X_1 + X_2 + \cdots + X_n$
- S is a new random variable (over all such samples)
- Exp(S) = n·µ = 18000 (cm)
- Var(S) = 100·Var(X) = 3600
- $\sigma_S = 10\,\sigma_X = 60$ (cm)
The mean of the sample: $\bar{X} = S/n$
- $\bar{X}$ is a new random variable (over all such means of samples of 100)
- Exp($\bar{X}$) = µ = 180 (cm)
- $\sigma_{\bar{X}} = \frac{1}{100}\,\sigma_S = 0.6$ (cm)

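Sampling distributions
Let X be a random variable for a population with expectation µ and standard deviation σ.
Let $S = X_1 + X_2 + \cdots + X_n$, where each $X_i$ has the same distribution as X and the $X_i$ are independent.
Let $\bar{X} = S/n$.
Then:
- $\mathrm{Exp}(S) = n\mu$ and $\mathrm{Exp}(\bar{X}) = \mu$
- $\mathrm{Var}(S) = n\,\mathrm{Var}(X) = n\sigma^2$, so $\sigma_S = \sqrt{n}\,\sigma$
- $\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\,\mathrm{Var}(S) = \frac{\sigma^2}{n}$, so $\sigma_{\bar{X}} = \frac{1}{n}\,\sigma_S = \frac{\sigma}{\sqrt{n}}$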

Effect of sample size

| Sample size | 1 | 4 | 16 | 100 | 400 | 1600 |
|---|---|---|---|---|---|---|
| Standard dev. | 6 | 3 | 1.5 | 0.6 | 0.3 | 0.15 |
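The standard deviations for the different sample sizes follow from $\sigma/\sqrt{n}$ with σ = 6. The sketch below recomputes them and then checks the n = 100 case by simulation. The number of simulated samples (2000) and the random seed are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import stdev

sigma = 6.0
for n in (1, 4, 16, 100, 400, 1600):
    print(n, sigma / sqrt(n))  # standard deviation of the sample mean

# Simulation for n = 100: draw many samples from N(180, 6), keep each sample mean.
rng = random.Random(0)  # fixed seed so that reruns give the same numbers
sample_means = []
for _ in range(2000):
    sample = [rng.gauss(180, sigma) for _ in range(100)]
    sample_means.append(sum(sample) / len(sample))

mean_of_means = sum(sample_means) / len(sample_means)
sd_of_means = stdev(sample_means)
print(round(mean_of_means, 2), round(sd_of_means, 2))  # close to 180 and 0.6
```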

The form of the distribution
- If the $X_i$-s are independent and normally distributed, then $\bar{X}$ is normally distributed (as expected).
- (More surprisingly) Even when the $X_i$-s are not normally distributed, the sampling distribution of $\bar{X}$ is approximately normal for large n. This is the Central Limit Theorem.

Example: throwing the die until a 6
[Figure: histograms of sample means. Number of samples: 1000. Sample sizes shown: 1, 10, 4, 100]
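The experiment behind this example can be imitated with a short simulation. This is a sketch under stated assumptions: the sample sizes 1, 10 and 100, the seed, and the "fraction within one standard error" summary are illustrative choices, not taken from the slide. The waiting time for a 6 has mean 6 and variance (5/6)/(1/6)² = 30.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def throws_until_six(rng):
    """Number of throws of a fair die up to and including the first 6."""
    n = 1
    while rng.randint(1, 6) != 6:
        n += 1
    return n

rng = random.Random(1)       # fixed seed (arbitrary)
MU, SIGMA = 6.0, sqrt(30.0)  # mean and sd of a single waiting time

results = {}
for size in (1, 10, 100):
    means = [fmean(throws_until_six(rng) for _ in range(size)) for _ in range(1000)]
    se = SIGMA / sqrt(size)  # theoretical sd of the sample mean
    inside = sum(abs(m - MU) <= se for m in means) / len(means)
    results[size] = (fmean(means), pstdev(means), se, inside)
    print(size, [round(v, 3) for v in results[size]])
```

For size 1 the distribution is strongly skewed and well over 68% of the values fall within one standard deviation of 6. For size 100 the fraction is close to the 68% expected of a normal distribution.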

Binomial distribution
$$b(k; n, p) = \binom{n}{k}\, p^k (1 - p)^{n-k}$$
- Population: all Bernoulli trials with probability p. Sample: n such trials.
- Example: throwing a die n times, counting the number of 6-s (successes).

Number of successes: X
- Random variable over all series of n trials
- Binomial distribution (no: binomisk fordeling): B(n,p)
- $E(X) = np$
- $\mathrm{Var}(X) = np(1-p)$
- $\sigma_X = \sqrt{np(1-p)}$
- Approximated by $N\big(np, \sqrt{np(1-p)}\big)$ for large n
- Rule of thumb: np > 10 and n(1-p) > 10

Proportion of successes: $\hat{p} = X/n$
- $E(\hat{p}) = E(X/n) = np/n = p$
- $\mathrm{Var}(\hat{p}) = \frac{\sigma_X^2}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$
- $\sigma_{\hat{p}} = \frac{\sigma_X}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$
- Approximated by $N\big(p, \sqrt{p(1-p)/n}\big)$ for large n
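To see how good the normal approximation is, one can compare an exact binomial probability with the corresponding normal one. This is a sketch: the choice n = 100, p = 0.8 and the interval 72..88 are illustrative, and the half-unit continuity correction is a standard refinement that the slide does not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """b(k; n, p) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.8
print(n * p > 10 and n * (1 - p) > 10)  # rule of thumb holds for these values

# Exact probability that the number of successes lies in 72..88.
exact = sum(binom_pmf(k, n, p) for k in range(72, 89))

approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
plain = approx.cdf(88) - approx.cdf(72)          # direct use of N(np, sqrt(np(1-p)))
corrected = approx.cdf(88.5) - approx.cdf(71.5)  # with continuity correction (assumption)

print(round(exact, 4), round(plain, 4), round(corrected, 4))
```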

Example: p = 0.8
You have a classifier which you think is 80% correct. What can you expect of this classifier from samples of various sizes?

| N | E(X) | Var(X) | SD(X) | μ ± 2σ | E($\hat{p}$) = E(X/n) | Var($\hat{p}$) | SD($\hat{p}$) | μ ± 2σ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.8 | 0.16 | 0.4 | | 0.8 | 0.16 | 0.4 | |
| 25 | 20 | 4 | 2 | | 0.8 | 0.0064 | 0.08 | |
| 100 | 80 | 16 | 4 | [72, 88] | 0.8 | 0.0016 | 0.04 | [.72, .88] |
| 2500 | 2000 | 400 | 20 | [1960, 2040] | 0.8 | 0.000064 | 0.008 | |
| 10000 | 8000 | 1600 | 40 | [7920, 8080] | 0.8 | 0.000016 | 0.004 | [.792, .808] |
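The classifier figures can be recomputed from the formulas for the count X and the proportion $\hat{p}$. This is a sketch; the sample sizes 1, 25, 100, 2500 and 10000 are an assumption, chosen so that N·p(1-p) matches the variances in the example.

```python
from math import sqrt

p = 0.8  # assumed accuracy of the classifier
table = []
for n in (1, 25, 100, 2500, 10000):
    mean_x = n * p
    sd_x = sqrt(n * p * (1 - p))     # SD of the number of correct answers
    sd_phat = sqrt(p * (1 - p) / n)  # SD of the proportion of correct answers
    table.append({
        "n": n,
        "mean_x": mean_x,
        "sd_x": sd_x,
        "x_range": (mean_x - 2 * sd_x, mean_x + 2 * sd_x),
        "sd_phat": sd_phat,
        "phat_range": (p - 2 * sd_phat, p + 2 * sd_phat),
    })

for row in table:
    lo, hi = row["phat_range"]
    print(f"N={row['n']:5d}  SD(X)={row['sd_x']:6.2f}  "
          f"SD(p^)={row['sd_phat']:.4f}  p^ in [{lo:.3f}, {hi:.3f}]")
```

On a test set of 100 items, an observed accuracy anywhere between about 72% and 88% is therefore quite compatible with a true accuracy of 80%.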