Favorite Distributions

Binomial, Poisson and Normal

Here we consider 3 favorite distributions in statistics:

   Binomial, discovered by James Bernoulli in 1700
   Poisson, a limiting form of the binomial, found by Poisson in 1837
   Normal, discovered by DeMoivre in 1733; often called the Gaussian because Gauss showed (around 1809) that errors in astronomical observations follow it with much accuracy

Binomial Distribution. Review

Toss a coin n times with probability p of heads (or do any experiment involving n successive trials with probability p of success). The number of successes X has a discrete distribution with pdf, for k = 0, 1, 2, 3, ..., n,

$$p_X(k) = \frac{n!}{k!\,(n-k)!}\, p^k q^{n-k}, \quad \text{where } q = 1 - p.$$

   mean = $E(X) = np$
   variance = $\sigma^2 = V(X) = npq$
   skewness = $\dfrac{q-p}{\sqrt{npq}}$
   kurtosis (excess) = $\dfrac{1-6pq}{npq}$
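The closed-form moments above can be checked numerically. The following is a minimal Python sketch (not part of the original slides, which use Mathematica); the function names and the values n = 50, p = 0.3 are illustrative assumptions. It computes the mean, variance, skewness and excess kurtosis directly from the pdf and prints them next to the formulas.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_moments(n, p):
    """Mean, variance, skewness and excess kurtosis summed directly from the pmf."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    skew = sum((k - mean) ** 3 * w for k, w in enumerate(pmf)) / var**1.5
    kurt = sum((k - mean) ** 4 * w for k, w in enumerate(pmf)) / var**2 - 3
    return mean, var, skew, kurt


n, p = 50, 0.3  # illustrative values, not taken from the slide
q = 1 - p
numeric = binomial_moments(n, p)
closed_form = (n * p, n * p * q, (q - p) / sqrt(n * p * q), (1 - 6 * p * q) / (n * p * q))
print("from the pmf:     ", numeric)
print("from the formulas:", closed_form)
```

The two printed tuples should agree up to floating-point rounding.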

The histogram for the binomial distribution with n=6, p=.2. We plotted this using Mathematica:

    <<Statistics`DiscreteDistributions`
    <<"BarCharts`"; <<"Histograms`"; <<"PieCharts`";
    bdist = BinomialDistribution[6, .2]
    v = Table[PDF[bdist, x], {x, 0, 6}]
    BarChart[v]

The histogram for the binomial distribution with n=50, p=.3. We plotted this using Mathematica:

    ListPlot[Table[{k, PDF[BinomialDistribution[50, 0.3], k]}, {k, 0, 50}], Filling -> Axis]

Bernoulli Trials Lead to the Binomial Distribution

1) The result of each trial may be either success or failure.
2) The probability p of success is the same in each trial.
3) The trials are independent; i.e., the outcome of one trial does not affect later outcomes.

It is rather difficult to find frequency distributions which follow the binomial with great accuracy, since most large bodies of data would not have constant probabilities.

Poisson Distribution

Limiting case of the binomial as $n \to \infty$ with $np = \lambda$ held constant. Using the fact that

$$\lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda},$$

we see

$$\lim_{\substack{n\to\infty \\ np=\lambda}} \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k} = \frac{\lambda^k}{k!}\, e^{-\lambda}.$$
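The limit can be watched numerically. Here is a small Python sketch (an illustration added here, not from the slides); the choice λ = 3.9 and k = 4 is an assumption made only for the demonstration. It holds np = λ fixed and lets n grow.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


lam, k = 3.9, 4  # illustrative values
errors = []
for n in (10, 100, 1000, 10000):
    # keep np = lam fixed while n grows
    err = abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
    errors.append(err)
    print(f"n = {n:6d}   |binomial - Poisson| = {err:.6f}")
```

The gap shrinks roughly like 1/n, which is what the limit statement suggests.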

The binomial pdf with n=30 and p=.13 (so np = 30*.13 = 3.9). Mathematica command to do this:

    ListPlot[Table[{k, PDF[BinomialDistribution[30, .13], k]}, {k, 0, 30}], PlotStyle -> {Thick, PointSize[Large]}]

Poisson pdf with λ=3.9. Mathematica command:

    ListPlot[Table[{k, PDF[PoissonDistribution[3.9], k]}, {k, 0, 30}], PlotStyle -> {Thick, PointSize[Large]}]

The distributions look alike even though n is not very large.

Properties of the Poisson Distribution

It is a discrete distribution; i.e.,

$$\sum_{k=0}^{\infty} p_X(k) = 1, \quad \text{if } p_X(k) = \frac{\lambda^k}{k!}\, e^{-\lambda}.$$

Mean = E(X) = λ and Variance = Var(X) = λ.

Poisson Model

Count the number of occurrences of something in time $0 \le t \le T$. Divide up the interval [0,T] into n subintervals of length T/n. Poisson means:

1) the probability of 2 or more events in one subinterval is 0
2) events are independent
3) the probability that an event occurs in a subinterval is p = constant

X = total number of occurrences in time T, λ = rate of occurrence. Then E(X) = np = λT, and so p = λT/n. The pdf is

$$p_X(k) = \frac{(\lambda T)^k}{k!}\, e^{-\lambda T}.$$
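To put a number on "the distributions look alike", the two pdfs can be tabulated side by side. This is a minimal Python sketch under the slide's values n = 30, p = .13; the variable names are illustrative.

```python
from math import comb, exp, factorial

n, p = 30, 0.13
lam = n * p  # Poisson parameter matched to the binomial mean, 3.9

binom = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
poisson = [lam**k * exp(-lam) / factorial(k) for k in range(n + 1)]

# Largest pointwise difference between the two pdfs over k = 0..n
max_gap = max(abs(b - q) for b, q in zip(binom, poisson))

for k in range(9):
    print(f"k = {k}   binomial = {binom[k]:.4f}   Poisson = {poisson[k]:.4f}")
print(f"largest gap over k = 0..{n}: {max_gap:.4f}")
```

The largest gap sits near the mode; it reflects the binomial variance npq being somewhat smaller than the Poisson variance λ.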

Example 4.2.13. New Zealand was the first country to let women vote. Over 113 years, we get the table below, where k = the yearly number of countries letting women vote and p(k) = Poisson pdf with λ = .362831858, the mean of the frequency distribution. The agreement is pretty good once the last 3 rows are combined.

   k       frequency   proportion   p(k)     proportion with     p(k) with
                                             k=2,3,4 added       k=2,3,4 added
   0          82         0.726      0.696
   1          25         0.221      0.252
   2           4         0.035      0.046        0.053               0.052
   3           0         0.000      0.006
   4           2         0.018      0.001

   total of frequency   113
   mean of frequency    0.362831858

Intervals Between Poisson Distributed Events are Exponentially Distributed

Why? We follow the discussion in Bulmer, Principles of Statistics, p. 99. Consider radioactive disintegration, for example. If, on average, there are λ disintegrations per second, then the average number of disintegrations in t seconds is λt. Thus, if disintegrations are Poisson, the number in t seconds is Poisson with mean λt. So the probability of no disintegrations in t seconds is $e^{-\lambda t}$. This is the same as the probability that we must wait more than t seconds before the 1st disintegration occurs. Write Y for the waiting time before the occurrence of the 1st disintegration; then $\mathrm{Prob}(Y > t) = e^{-\lambda t}$. Thus $\mathrm{Prob}(Y \le t) = 1 - e^{-\lambda t}$. Taking the derivative of this last function with respect to t gives the pdf of Y, which is $\lambda e^{-\lambda t}$ = the pdf of the exponential distribution! QED
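The Poisson fit in the women's suffrage table can be recomputed from the frequency column alone. The following Python sketch assumes the yearly counts 82, 25, 4, 0, 2 for k = 0, ..., 4 (they sum to the 113 years); the variable names are illustrative.

```python
from math import exp, factorial

# frequency[k] = number of years in which exactly k countries gave women the vote
frequency = {0: 82, 1: 25, 2: 4, 3: 0, 4: 2}

years = sum(frequency.values())
lam = sum(k * f for k, f in frequency.items()) / years  # sample mean used as lambda


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


for k, f in frequency.items():
    print(f"k = {k}   proportion = {f / years:.3f}   p(k) = {poisson_pmf(k, lam):.3f}")

# Combine the sparse rows k = 2, 3, 4 before comparing
observed_tail = sum(frequency[k] for k in (2, 3, 4)) / years
fitted_tail = sum(poisson_pmf(k, lam) for k in (2, 3, 4))
print(f"k = 2,3,4 combined: proportion = {observed_tail:.3f}, p(k) = {fitted_tail:.3f}")
```

Pooling the rows with very small counts before comparing is a common step, since single-year fluctuations dominate those rows.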

Example from our text, page 91. The eruptions of Mauna Loa, the 14,000 ft volcano in Hawaii with a famous telescope, have been recorded since 1832. During the time from 1832 to 1950, the rate of occurrence of eruptions was λ ≈ .027 per month. So our exponential pdf is $p(y) = .027\, e^{-.027y}$. The agreement with the table is pretty good.

Distribution of time intervals between eruptions of Mauna Loa (1832 to 1950)

        interval y (months)   frequency   density   p(y)
   1       [0,20)                13        0.018    0.027
   2       [20,40)                9        0.013    0.016
   3       [40,60)                5        0.007    0.009
   4       [60,80)                6        0.008    0.005
   5       [80,100)               0        0.000    0.003
   6       [100,120)              1        0.001    0.002

Normal Distribution

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

(Plot of f(z) for z between -4 and 4.)

DeMoivre discovered that when p=1/2, the binomial distribution is closely approximated by the normal distribution for large n. In fact, this works for any value of p and in even greater generality (the central limit theorem). It is somewhat surprising, as the binomial is skewed when p is not 1/2. The skewness of the binomial pdf disappears when n gets large.
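The p(y) column of the Mauna Loa table can be regenerated in a few lines. This Python sketch takes λ = 0.027 per month and assumes that the column evaluates the exponential pdf at the left endpoint of each interval (that reading reproduces the tabulated values); the helper `interval_probability` is a hypothetical addition showing the probability of a whole interval.

```python
from math import exp

lam = 0.027  # eruptions per month
left_endpoints = [0, 20, 40, 60, 80, 100]

# Exponential pdf lam * exp(-lam * y) at the left endpoint of each 20-month interval
density = [lam * exp(-lam * y) for y in left_endpoints]


def interval_probability(a, b, lam):
    """P(a <= Y < b) for an exponential waiting time Y with rate lam."""
    return exp(-lam * a) - exp(-lam * b)


for y, d in zip(left_endpoints, density):
    prob = interval_probability(y, y + 20, lam)
    print(f"[{y},{y + 20})   p(y) = {d:.3f}   P(interval) = {prob:.3f}")
```

Dividing each interval probability by the interval width 20 gives an average density over the interval, which is the fairer quantity to compare with the empirical density column.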

Plots of Normal Densities with Mean μ=0, Standard Deviations σ=1, 1/2, and 2

Mathematica command:

    Plot[{PDF[NormalDistribution[0, 1], x], PDF[NormalDistribution[0, .5], x], PDF[NormalDistribution[0, 2], x]}, {x, -5, 5}, Filling -> Axis, PlotRange -> Full]

(In the plot, the tall narrow curve has the small standard deviation and the low wide curve has the large standard deviation.)

Properties of the Normal Distribution

The normal density with mean μ and standard deviation σ is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}.$$

1) The normal density is a pdf, meaning that the integral over the real line is 1.
2) The mean (also median and mode) = μ and the standard deviation = σ.
3) If X is normal with mean μ and standard deviation σ, then Y=a+bX is normal with mean a+bμ and standard deviation |b|σ.

Proofs

1) Square the integral and change to polar coordinates.
2) For the mean use substitution; for the variance, use integration by parts to reduce to 1).
3) Use theorems from chapter 3, pages 187 and 197.
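For reference, proof 1) can be written out as follows. This is the standard polar-coordinates argument, sketched here for the standard normal case; the symbols I, r, φ and u are notation introduced for this sketch.

```latex
I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx
\quad\Longrightarrow\quad
I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy
    = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2/2}\, r\,dr\,d\phi
    = 2\pi \int_{0}^{\infty} e^{-u}\,du
    = 2\pi ,
\qquad u = r^2/2 .
```

Hence $I = \sqrt{2\pi}$, so the standard normal density integrates to 1. The substitution $z = (x-\mu)/\sigma$ carries this over to general μ and σ.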

Example. The IQ measured by the Stanford-Binet test is approximately normally distributed with mean 100 and standard deviation 16. Find the probability of having an IQ ≤ 120.

Let X be the r.v. for IQ. To use tables in our book, we must change to the standard normal distribution using the Z transform Z=(X−μ)/σ. Then the cumulative distribution function is

$$F(x) = \mathrm{Prob}(X \le x) = \Pr\left(\frac{X-\mu}{\sigma} \le \frac{x-\mu}{\sigma}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right).$$

Here Φ(x) = cumulative distribution function for the standard normal distribution:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt.$$

There are tables for this function in the back of our text. Of course programs like Mathematica will compute it for you. One finds using the table, p. 85:

$$\mathrm{Prob}(X \le 120) = \Pr\left(\frac{X-100}{16} \le \frac{120-100}{16}\right) = \Phi(1.25) \approx .8944.$$

In Mathematica you don't need to normalize. Just say: CDF[NormalDistribution[μ,σ], x]. Mathematica computes it in terms of the error function erf(x). The cdf for a normal rv X with mean μ and standard deviation σ is:

$$F(x) = \frac{1}{2}\left[\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right) + 1\right], \quad \text{where } \operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} e^{-t^2}\,dt.$$

So to do our IQ test problem in Mathematica, we input:

    CDF[NormalDistribution[100, 16], 120]

and Mathematica responds:

    1/2 (1 + Erf[5/(4 Sqrt[2])])

We want a number, so we write:

    CDF[NormalDistribution[100, 16], 120] // N

and Mathematica gives us: 0.89435
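The same erf formula is available outside Mathematica. As a cross-check, here is a short Python sketch using the standard library's `math.erf`; the function name `normal_cdf` is an illustrative choice, not a library function.

```python
from math import erf, sqrt


def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = (1/2) * [erf((x - mu) / (sigma * sqrt(2))) + 1]."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


# IQ example: X normal with mean 100 and standard deviation 16
p_direct = normal_cdf(120, mu=100, sigma=16)   # no standardizing needed
p_standardized = normal_cdf((120 - 100) / 16)  # Z transform first, z = 1.25
print(round(p_direct, 5), round(p_standardized, 5))
```

Both routes give the same number, which is the point of the Z transform: standardizing only matters when the tool at hand (a printed table) covers the standard normal alone.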

The Central Limit Theorem. Suppose $X_1, X_2, \ldots, X_n, \ldots$ is an infinite sequence of independent random variables, each having the same pdf. Suppose that the mean of the pdf is μ and the standard deviation is σ (both finite). Then for any real numbers a < b, we have

$$\lim_{n\to\infty} \mathrm{Prob}\left(a \le \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le b\right) = \frac{1}{\sqrt{2\pi}} \int_{a}^{b} e^{-z^2/2}\,dz.$$

This theorem is due to Lindeberg. Others had proved it under more restrictive conditions. There is a proof using moment generating functions. See our text p. 341 or see Bulmer, Principles of Statistics, p. 116. For a proof using Fourier analysis, see my book, Harmonic Analysis on Symmetric Spaces and Applications, I, pp. 6 7. See Feller, An Introduction to Probability Theory & its Applications, I, II.

The Continuity Correction

Let's take another look at the binomial distribution and its approximation by the normal distribution. Let the binomial random variable have n=25 and p=1/2; i.e., 25 flips of a fair coin. We can compute the following probability using Mathematica: Pr(X ≤ 14) = .7878.

    CDF[BinomialDistribution[25, 1/2], 14] // N

To draw the picture:

    <<Statistics`DiscreteDistributions`
    <<"BarCharts`"; <<"Histograms`"; <<"PieCharts`"
    bdist = BinomialDistribution[25, 0.5]
    v = Table[PDF[bdist, x], {x, 0, 25}];
    BarChart[v]

The bars of the histogram are centered at the integers, so we'll get a better estimate if we go up to 14.5 rather than 14 on the normal curve.
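The limit in the theorem can be checked exactly in one special case. The sketch below is an illustration, not a proof; it takes the $X_i$ to be Bernoulli(p) variables, so that the sum is binomial and the probability on the left can be summed exactly. The values p = 0.3, a = −1, b = 1 and all function names are assumptions made for the example.

```python
from math import ceil, erf, exp, floor, lgamma, log, sqrt


def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binomial_pmf(k, n, p):
    """Binomial pdf computed through logarithms so that large n does not overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def standardized_sum_probability(n, p, a, b):
    """P(a <= (S_n - n*mu)/(sigma*sqrt(n)) <= b) for S_n a sum of n Bernoulli(p) variables."""
    mean, sd = n * p, sqrt(n * p * (1 - p))  # n*mu and sigma*sqrt(n)
    k_low = max(0, ceil(mean + a * sd))
    k_high = min(n, floor(mean + b * sd))
    return sum(binomial_pmf(k, n, p) for k in range(k_low, k_high + 1))


p, a, b = 0.3, -1.0, 1.0  # illustrative choices; p != 1/2 gives a skewed summand
limit = normal_cdf(b) - normal_cdf(a)
errors = {}
for n in (10, 100, 1000, 10000):
    errors[n] = abs(standardized_sum_probability(n, p, a, b) - limit)
    print(f"n = {n:6d}   |exact - normal limit| = {errors[n]:.5f}")
```

The error does not have to fall at every step, because the sum lives on a lattice of integers while the limit is continuous; it is the overall trend toward 0 that the theorem describes.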

Compare the binomial with a normal random variable Y with the same mean μ=np=25/2=12.5 and standard deviation σ=(np(1−p))^{1/2}=(25/4)^{1/2}=5/2=2.5.

    Plot[PDF[NormalDistribution[12.5, 2.5], x], {x, 0, 25}, Filling -> Axis, PlotRange -> Full]

Now to compute the probabilities. If you want to use the table in the back of our text, p. 85, use Z=(Y−12.5)/2.5, noting that (14−12.5)/2.5=.6 and the continuity correction version (14.5−12.5)/2.5=.8:

    Pr(Z ≤ .6) ≈ .7257
    Pr(Z ≤ .8) ≈ .7881

or use Mathematica without going to the standard normal:

    CDF[NormalDistribution[12.5, 2.5], 14] // N
    0.725747
    CDF[NormalDistribution[12.5, 2.5], 14.5] // N
    0.788145

The exact answer for the binomial distribution was Pr(X ≤ 14) = .7878. So the continuity correction was much closer.

(Figure: binomial n=25, p=.5 and the normal with the same mean and standard deviation.)
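The three numbers being compared can be reproduced with the Python standard library. This is a minimal sketch for n = 25, p = 1/2 and the cutoff 14; the variable names are illustrative.

```python
from math import comb, erf, sqrt


def normal_cdf(x, mu, sigma):
    """Normal cdf via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


n, p, k = 25, 0.5, 14
mu, sigma = n * p, sqrt(n * p * (1 - p))  # 12.5 and 2.5

exact = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))
plain = normal_cdf(k, mu, sigma)            # stop at 14
corrected = normal_cdf(k + 0.5, mu, sigma)  # continuity correction: stop at 14.5

print(f"exact binomial       {exact:.6f}")
print(f"normal, no correction {plain:.6f}   error {abs(plain - exact):.6f}")
print(f"normal, corrected     {corrected:.6f}   error {abs(corrected - exact):.6f}")
```

The half-unit shift accounts for the bar at 14 extending from 13.5 to 14.5, so the corrected value includes the whole bar rather than half of it.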

Claim: The normal is only a good approximation to the binomial distribution when np ≥ 5 and n(1−p) ≥ 5. In our case above, 25(.5)=12.5.

More examples.

   Example 1. n=3, p=.1
   Example 2. n=10, p=.1
   Example 3. n=50, p=.1

What's so great about the normal distribution?

It is easy to compute, by table with the Z transform or by computer with Mathematica or whatever.

Any sum of a large number of independent identically distributed random variables (with finite mean and standard deviation) is approximately normal once it is standardized; this is the central limit theorem.
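The rule of thumb can be probed numerically. The sketch below uses p = .1 and the illustrative sample sizes n = 10, 50, 500 (chosen for this sketch, not the slide's own list); the error measure, the largest gap between the binomial cdf and the continuity-corrected normal cdf, is likewise an assumption about what "good approximation" should mean.

```python
from math import comb, erf, sqrt


def normal_cdf(x, mu, sigma):
    """Normal cdf via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def max_cdf_error(n, p):
    """Largest gap between the binomial cdf and the continuity-corrected normal cdf."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    cumulative, worst = 0.0, 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * p**k * (1 - p) ** (n - k)
        worst = max(worst, abs(cumulative - normal_cdf(k + 0.5, mu, sigma)))
    return worst


p = 0.1
results = {}
for n in (10, 50, 500):  # illustrative sample sizes
    rule_ok = n * p >= 5 and n * (1 - p) >= 5
    results[n] = (rule_ok, max_cdf_error(n, p))
    print(f"n = {n:3d}   np = {n * p:5.1f}   rule satisfied: {rule_ok}   "
          f"max cdf error = {results[n][1]:.4f}")
```

With p this far from 1/2 the binomial is noticeably skewed, so the error at the np = 5 boundary is still visible and keeps shrinking as n grows.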