Lecture 2. Probability Distributions Theophanis Tsandilas


Comment on measures of dispersion Why do common measures of dispersion (variance and standard deviation) use sums of squares, Σ_{i=1}^{n} (x_i − μ̂)², instead of sums of absolute residuals, Σ_{i=1}^{n} |x_i − μ̂|?

Comment on measures of dispersion Working with absolute values can be difficult, but this is not the main reason. The measure of central tendency that minimizes the sums of absolute differences is the median, not the mean. And since the mean is the prevalent measure of central tendency, we commonly use sums of squares. However, for statistical methods that rely on medians, sums of absolute differences can be more appropriate.

R code
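The slide's R code was not transcribed; the following is a hedged reconstruction of what it might demonstrate: numerically, the mean minimizes the sum of squared residuals, while the median minimizes the sum of absolute residuals.

```r
# Hypothetical reconstruction (the original slide code was not transcribed)
x <- c(2, 3, 5, 8, 21)                 # a small, skewed sample

ss  <- function(c) sum((x - c)^2)      # sum of squared residuals around c
sad <- function(c) sum(abs(x - c))     # sum of absolute residuals around c

# The mean minimizes the sum of squares...
argmin_ss  <- optimize(ss,  range(x), tol = 1e-8)$minimum
# ...while the median minimizes the sum of absolute residuals
argmin_sad <- optimize(sad, range(x), tol = 1e-8)$minimum

c(mean(x), argmin_ss)     # both ~7.8
c(median(x), argmin_sad)  # both ~5
```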

What is a probability distribution? Consider the population of all possible outcomes when throwing two dice. How probable is each sum S of counts from the two dice? The probability distribution provides the probability of occurrence of all possible outcomes in an experiment

Probability distribution of a population It is generally not known. However, we may have sufficient information about this distribution to meet the goals of our research. Statistical modeling: It is possible to rely on a small set of probability distributions that capture the key characteristics of a population. We can then infer the parameters of these model probability distributions from our sample. Why? We expect that the sample contains some information about the population from which it was sampled.

Example distributions Height: Women vs. Men

Example distributions US Income distribution (older than 2013)

Example distributions Half Marathon and Marathon race finish time Source: https://medium.com/runkeeper-everyone-every-run/how-long-till-the-finish-line-494361cc901b

Example distributions Distribution of most frequent English words http://norvig.com/mayzner.html

Discrete probability distributions The population (and hence its samples) contains discrete values, either finite or infinite in number, e.g., {−3, −1, 0, 1, 2}, {blue, brown, green}, or {1, 2, 3, ...}. The probability of a value x in a population can be expressed as a function: f(x) = Pr(X = x) (the probability of the random variable X taking the value x). The function f(x) is called a probability mass function (pmf).

Discrete probability distributions This is a discrete probability distribution

Discrete probability distributions Probabilities should sum to 1: Σ_{x∈S} f(x) = 1. The expected value (or mean): E(X) = Σ_{x∈S} x·f(x) = 2·(1/36) + 3·(1/18) + 4·(1/12) + 5·(1/9) + 6·(5/36) + 7·(1/6) + 8·(5/36) + 9·(1/9) + 10·(1/12) + 11·(1/18) + 12·(1/36) = 7. The mode is the most frequent value: which one? The median is the middle value: which one?
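These quantities are easy to verify numerically; a small R sketch of my own (not from the slides), enumerating all 36 equally likely outcomes of two fair dice:

```r
# Probability distribution of the sum of two fair dice
sums <- outer(1:6, 1:6, `+`)                 # all 36 equally likely outcomes
pmf  <- table(sums) / length(sums)           # f(2) = 1/36, ..., f(7) = 6/36, ...

total    <- sum(pmf)                                # probabilities sum to 1
expected <- sum(as.numeric(names(pmf)) * pmf)       # expected value E(X) = 7
mode_val <- as.numeric(names(pmf)[which.max(pmf)])  # mode = 7

c(total, expected, mode_val)
```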

Symmetrical probability distributions When the mean coincides with the median. The above is a symmetrical, unimodal distribution.

The binomial distribution Consider a population containing two values: 1 and 0. Let's set the probability of 1 to Pr(1) = .75 and the probability of 0 to Pr(0) = .25. A single sample (n = 1) from such a population is known as a Bernoulli trial. A coin flip with a fair coin is a Bernoulli trial with Pr(Head) = .5 and Pr(Tail) = .5. If we perform n independent Bernoulli trials, their outcomes will follow a binomial distribution.

The binomial distribution If a random variable X has a binomial distribution, we write: X ~ B(n, P), where n and P are the parameters of the distribution: n is the number of Bernoulli trials, and P is the chance (probability) of success. If we know n and P, we can fully describe the distribution.

The binomial distribution Probability Mass Function (pmf): f(x; n, P) = C(n, x) · P^x · (1 − P)^(n−x), where P^x is the probability of exactly x successes, (1 − P)^(n−x) is the probability of exactly n − x failures, and C(n, x) (the binomial coefficient) is the number of possible ways that n Bernoulli trials lead to x successes.

The binomial distribution Consider the distribution of errors for 10 trials, when for each trial, errors occur with a probability of 40%

R code
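The original slide code wasn't captured; a plausible sketch using dbinom for the n = 10, P = .4 error distribution described above:

```r
# pmf of the number of errors in 10 trials, each failing with probability 0.4
x <- 0:10
probs <- dbinom(x, size = 10, prob = 0.4)

barplot(probs, names.arg = x,
        xlab = "Number of errors", ylab = "Probability")

round(probs, 3)          # individual probabilities
x[which.max(probs)]      # the most likely number of errors: 4
```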

The binomial distribution Cumulative Distribution Function (cdf)

R code
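A sketch of the corresponding cdf code, using pbinom (assumed, not the original slide's code):

```r
# cdf of the same distribution: probability of at most x errors
x <- 0:10
cdf <- pbinom(x, size = 10, prob = 0.4)

plot(x, cdf, type = "s",
     xlab = "Number of errors", ylab = "Cumulative probability")

pbinom(4, 10, 0.4)           # P(X <= 4)
sum(dbinom(0:4, 10, 0.4))    # same value obtained by summing the pmf
```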

Continuous distributions Not restricted to specific values. They can take any value between the lower and upper bound of a population (of course, populations can be unbounded).

Continuous distributions The probability of any particular value is zero. Probabilities can only be obtained for intervals (i.e., a range of values): Pr(a ≤ X ≤ b) = ∫_a^b f(x) dx, where f(x) is the probability density function (pdf). It provides the relative (rather than absolute) likelihood that a random variable X takes the value x. Pr(−∞ ≤ X ≤ ∞) = ∫_{−∞}^{∞} f(x) dx = 1

Continuous distributions Mode: value of the highest peak. Median: value that divides the area exactly in half. Mean: μ = ∫_{−∞}^{∞} x f(x) dx

The normal distribution Also known as the Gaussian distribution

The normal distribution Symmetrical, unimodal and continuous Can be derived as a sum of an infinite number of independent random variables Thus, it is appropriate when data arise from a process that involves adding together the contributions from a large number of independent, random events.

Example The human height can be considered as the outcome of many independent random genetic and environmental influences

Normal distribution parameters A normal distribution can be fully described by only two parameters: its mean μ and its variance σ². A normally distributed variable X can be written as X ~ N(μ, σ²). Its probability density function (pdf) is as follows: f(x; μ, σ²) = (1 / (σ √(2π))) · e^(−(x − μ)² / (2σ²))

Example The following normal distribution has mean μ = 100 and a standard deviation σ = 15

Standard normal distribution It is the normal distribution with a mean equal to 0 and a standard deviation (also variance) equal to 1: Z ~ N(0, 1). The standard normal distribution is often abbreviated to z. It is frequently used to simplify working with normal distributions.

Standard normal distribution

Reading a normal distribution
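One common way to read a normal distribution is via the areas under its density, e.g., the 68-95-99.7 rule. A small illustration with pnorm (my example, not from the slides), using the μ = 100, σ = 15 distribution shown earlier:

```r
# Proportion of a N(100, 15^2) population within k SDs of the mean
mu <- 100; sigma <- 15

within <- function(k) pnorm(mu + k * sigma, mu, sigma) -
                      pnorm(mu - k * sigma, mu, sigma)

within(1)  # ~0.68
within(2)  # ~0.95
within(3)  # ~0.997
```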

Sampling distribution of a statistic It is the distribution obtained by calculating the statistic (e.g. the mean) from an infinite number of independent samples of size n

Example An experiment measures the time it takes n = 10 people to visually locate a target on a computer screen. The same experiment is repeated a large (or infinite) number of times, where each time, we draw a new sample of size n. For each experiment, we compute the mean time: Experiment 1: M = 11.4 s, Experiment 2: M = 12.6 s, Experiment 3: M = 10.2 s, ... What's the distribution of these mean values?

Sampling distribution of a statistic Such distributions are interesting as they determine the probability of observing a particular value of the statistic, e.g., the mean. It is often very different from the distribution of the data used to calculate the statistic. [Plots: the distribution of the data vs. the sampling distribution of their means]

Sampling distribution of the mean Its mean value is also the mean (expected value) of the original population the samples were drawn from Its standard deviation (SD) is known as the standard error of the mean (SEM)

The central limit theorem (CLT) States that the sampling distribution of a statistic approaches the normal distribution as n approaches infinity. It applies to statistics computed by summing or averaging quantities (means, variances) but not to standard deviations (the square root of an average). central = fundamental to probability and statistics; limit = refers to a limiting condition as n → ∞

Practical importance of the CLT If the size of the sample is sufficiently large, then the sampling distribution of the statistic will be approximately normal (no matter what the distribution of the original population was) But which sample size is sufficiently large?

Sampling from normal distributions If the original population is normal, then the CLT will always hold, even if the sample size is as low as n = 1. The further the original population departs from a normal distribution, the larger the sample size n needs to be.

Sampling from binomial distributions Statistic of interest: count of successes from n Bernoulli trials [Histograms of simulated sampling distributions of the count for n = 10, 30, 100 (columns) and P = .15, P = .35 (rows): the distributions become more symmetric and bell-shaped as n increases]

R code
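A guess at how one of the panels above could be simulated with rbinom (the original code was not transcribed):

```r
# Simulated sampling distribution of the success count (panel: n = 10, P = .15)
set.seed(42)
counts <- rbinom(100000, size = 10, prob = 0.15)

hist(counts, breaks = seq(-0.5, 10.5, 1), main = "n = 10, P = .15")

mean(counts)   # close to the theoretical mean n * P = 1.5
```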

Sampling from exponential distributions Statistic of interest: mean of a sample of n drawn from an exponential distribution [Left: density of the source exponential population (dexp). Right: histograms of the sampling distribution of the mean for n = 10, 30, 100, which become narrower and closer to normal as n increases]

R code
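A hedged sketch of the simulation behind these histograms, drawing repeated samples from an exponential population and recording each sample's mean:

```r
# Sampling distribution of the mean of n exponential draws (rate = 1)
set.seed(1)
reps <- 20000

sample_means <- function(n) replicate(reps, mean(rexp(n, rate = 1)))

m10  <- sample_means(10)
m100 <- sample_means(100)

hist(m10,  main = "n = 10")    # still visibly right-skewed
hist(m100, main = "n = 100")   # much narrower and close to normal

c(sd(m10), sd(m100))           # the SEM shrinks roughly as 1 / sqrt(n)
```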

What n is sufficiently large? Several textbooks claim that n = 30 is enough to assume that a sampling distribution is normal, irrespective of the shape of the source distribution. «This is untrue» [Baguley] There is no magic number to guarantee that.

Log-normal distribution A random variable X is log-normally distributed if the logarithm of X is normally distributed: X ~ LogN(μ, σ²) ⟺ ln(X) ~ N(μ, σ²)

Simple math with logarithms log_b(x) = a ⟺ b^a = x; log_b(1) = 0 ⟺ b^0 = 1; log_b(b) = 1 ⟺ b^1 = b. If the base of the logarithm is the Euler number e = 2.7182..., we write: ln(x) = log_e(x). Which base we use is not important, but it is common to use e as the base.

Log-normal distribution A common choice for real-world data bounded by zero e.g., response time or task-completion time «The reasons governing frequency distributions in nature usually favor the log-normal, whereas people are in favor of the normal» «For small coefficients of variation, normal and log-normal distribution both fit well.» [ Limpert et al. 2001 ] https://stat.ethz.ch/~stahel/lognormal/bioscience.pdf

Sampling from lognormal distributions [Left: density of the source log-normal population with μ = 0, σ = 1 (dlnorm). Right: histograms of the sampling distribution of the mean for n = 10, 30, 100]

Sampling from lognormal distributions [Top: sampling distributions of the mean for n = 10, 30, 100 before a transformation; right-skewed. Bottom: the same distributions after applying a log transformation on the data; approximately symmetric]

R code
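A sketch of how the before/after comparison could be generated (the original slide code was not transcribed):

```r
# Sampling distribution of the mean for log-normal data, before and after
# a log transformation
set.seed(7)
reps <- 20000
n <- 30

raw_means <- replicate(reps, mean(rlnorm(n, meanlog = 0, sdlog = 1)))
log_means <- replicate(reps, mean(log(rlnorm(n, meanlog = 0, sdlog = 1))))

hist(raw_means, main = "before")   # right-skewed
hist(log_means, main = "after")    # approximately normal, centered at 0
```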

Skewed distributions Asymmetrical distributions are said to be skewed

The chi-square (χ²) distribution Consider a squared observation z² drawn at random from the standard normal (z) distribution. The distribution of z² will follow a χ² distribution with 1 degree of freedom (df) [Plots: the probability density of Z over −3 to 3, and the density of Z² over 0 to 10]

The chi-square (χ²) distribution A χ² distribution with k degrees of freedom is the distribution of a sum of squares of k independent variables that follow a standard normal distribution: Q = Σ_{i=1}^{k} Z_i² ⟹ Q ~ χ²(k)

The chi-square (χ 2 ) distribution Given the link between variances and sums of squares, the chi-square distribution is useful for modeling variances of samples from normal (or approximately normal) distributions.

The t distribution (Student's t) The sampling distribution of standardized means for a normally distributed population. Useful when: the sample size is small, and the population standard deviation is unknown. Published by William Gosset (1908) under the pseudonym «Student»

The t distribution (Student's t) When the population standard deviation σ is unknown and is estimated from the unbiased variance estimate: σ̂² = Σ_{i=1}^{n} (x_i − μ̂)² / (n − 1), then the resulting standardized sample mean has a t distribution with ν = n − 1 degrees of freedom. Note: We'll further explain this later.

The t distribution (Student's t) A random variable X following a t distribution is denoted as: X ~ t(ν) [Density plots comparing t(1) with z, and t(29) with z]

R code
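An assumed reconstruction of the comparison plots, overlaying t densities on the standard normal with dt and dnorm:

```r
# Compare t densities with the standard normal z
curve(dnorm(x), -4, 4, lty = 2, ylab = "Probability density")   # z
curve(dt(x, df = 1),  -4, 4, add = TRUE)    # t(1): heavier tails than z
curve(dt(x, df = 29), -4, 4, add = TRUE)    # t(29): nearly identical to z

dt(0, df = 1); dnorm(0)    # t(1) has a lower peak than z
pt(2, df = 1); pnorm(2)    # ...and more probability in the tails
```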

R distribution functions Binomial distribution dbinom(x, n, P) Provides the probability mass function for the binomial distribution B(n,P) Examples: dbinom(4, 20,.2): It will return the probability of x = 4 successes for n = 20 Bernoulli trials with a P=.2 probability of success. dbinom(c(1,2,3,4), 10,.2): It will return a vector with the probabilities of x = {1, 2, 3, 4} successes for n =10 Bernoulli trials with a P =.2 probability of success.

R distribution functions Binomial distribution pbinom(x, n, P) Provides the cumulative probability mass function for the binomial distribution B(n,P) Example: pbinom(4, 20,.2): It will return the cumulative probability up to x = 4 successes for n = 20 Bernoulli trials with a P =.2 probability of success.

R distribution functions Binomial distribution rbinom(size, n, P) It will generate a random sample of size size from the binomial distribution B(n,P) Example: rbinom(10, 20,.2): It will return a random sample of size = 10 from the binomial distribution B(n = 20, P =.2)

R distribution functions Normal distribution dnorm(x, mean, sd) Provides the probability density function for the normal distribution with a mean value equal to mean and a standard deviation equal to sd. Examples: dnorm(.2, 0, 1): It will return the relative likelihood of the value x =.2, for the standard normal distribution. curve(dnorm(x, 100, 15), xlim = c(60, 140)): It will plot the probability density function from x = 60 to x = 140 for the normal distribution with mean = 100 and sd = 15.

R distribution functions Normal distribution pnorm(x, mean, sd) Provides the cumulative probability density function for the normal distribution with a mean value equal to mean and a standard deviation equal to sd. Example: pnorm(100, 100, 15): It will return the cumulative probability up to x = 100 for the normal distribution with mean = 100 and sd = 15. (What do you expect it to be?)

R distribution functions Normal distribution rnorm(size, mean, sd) It will generate a random sample of size size from the normal distribution with a mean value equal to mean and a standard deviation equal to sd. Example: rnorm(10, 0, 1): It will return a random sample of size = 10 from the standard normal distribution.

R distribution functions

Distribution   pmf/pdf                Cumulative distr. function   Random sampling
Binomial       dbinom(x, n, P)        pbinom(x, n, P)              rbinom(size, n, P)
Normal         dnorm(x, mean, sd)     pnorm(x, mean, sd)           rnorm(size, mean, sd)
Log-normal     dlnorm(x, mean, sd)    plnorm(x, mean, sd)          rlnorm(size, mean, sd)
chi-squared    dchisq(x, k)           pchisq(x, k)                 rchisq(size, k)
Student's t    dt(x, ν)               pt(x, ν)                     rt(size, ν)

Intro to Confidence Intervals

Statistical inference The process of deducing the parameters of an underlying probability distribution from a sample. Four broad types: point estimation, interval estimation, hypothesis testing, and prediction.

Point estimates How informative is the following graph? [Bar chart of mean Time (s) for techniques T1, T2, and T3]

Point estimates A point estimate can be thought of as a «best guess» of the true population parameter. Descriptive statistics such as the sample mean or the median are examples of point estimates. Question: What are the point estimates of a population's variance and standard deviation?

Point estimates How informative is the following graph? [Bar chart of mean Time (s) for techniques T1, T2, and T3] A point estimate communicates no information about the uncertainty or quality of the estimate it provides

Interval estimate An interval estimate does not provide an exact value, but rather a range of values that the parameter might plausibly take. Most common method: constructing a confidence interval (CI)

Confidence interval (CI) It specifies a range of values that is expected to contain the true parameter value (but it may not) [Illustration: an interval with the true value marked] It is associated with a confidence level, usually expressed as a percentage, e.g., 95% CI or 99% CI

Formal interpretation of CIs Classical frequentist statistics views a probability as a statement about the frequency with which events occur in the long run. Of the many 95% CIs that might be constructed, 95% are expected to contain the true population parameter. The other 5% may completely fail!

Informal interpretation of CIs Formally speaking, a CI does not specify a probability range! A 95% CI does not necessarily contain the true population parameter with a 95% probability (or 95% confidence). However, it is often reasonable to treat a CI as an expression of confidence or belief that it does contain the true value. See [Baguley] and [Cumming and Finch, 2005]. Attention: This view is not shared by everyone! It has received a lot of criticism from Bayesian statisticians.

Confidence level A 100% CI will include the whole range of possible values A 0% CI reduces to a point estimate A 95% CI is the most common choice (by tradition)

alpha If C is the confidence level of a CI, then: C = 100 × (1 − α), where α (alpha) represents the proportion of C% CIs that are expected to fail: if C = 95, then α = .05

Structure of a confidence interval It is defined by two points that form its limits, i.e., its lower and upper bounds It can be symmetrical, where the point estimate lies in the center of the CI...or asymmetrical, where the point estimate is not at the center of the CI

Symmetrical CIs The intervals can be described by the point estimate plus or minus half of the interval, e.g., 165 ± 6 cm This half width of the interval is known as the margin of error (MOE)

Width of a CI Depends on the confidence level e.g., 99% CIs are wider than 95% CIs Also depends on the sampling distribution of the statistic The smaller the sample size, the wider the sampling distribution Small samples produce wide CIs
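Under a normal model, both effects follow from the textbook margin-of-error formula MOE = z* · σ / √n. A sketch under that assumption (my example, not from the slides):

```r
# Margin of error of a CI for a mean under a normal model: MOE = z * sigma / sqrt(n)
sigma <- 10

moe <- function(n, level) qnorm(1 - (1 - level) / 2) * sigma / sqrt(n)

moe(30, 0.95)    # baseline
moe(30, 0.99)    # higher confidence level -> wider interval
moe(120, 0.95)   # 4x larger sample        -> interval half as wide
```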

Example Consider the sampling distribution of the mean for a normally distributed population (M = 100, SD = 10) [Histograms of the sampling distribution for n = 10, 30, 100, all plotted over the range 85 to 120] The sampling distribution becomes narrower as the sample size n increases.

In the following lecture, we will revisit confidence intervals and explain how one can construct them.