EAS31116/B9036: Statistics in Earth & Atmospheric Sciences
Lecture 3: Probability Distributions (cont'd)
Instructor: Prof. Johnny Luo
www.sci.ccny.cuny.edu/~luo
Course schedule (readings are based on the 2nd edition of Wilks' book)

Dates | Topic | Reading | Other Activity

Univariate Statistics
Aug 31 | Introduction; Review of probability | Wilks, Chap 2 | Pre-test
Sep 7 | Matlab tutorial (optional) | |
Sep 14 | Review of probability; Probability Distribution 1 | Wilks, Chap 2, 3 |
Sep 21 | Probability Distribution 2 | Wilks, Chap 3, 4 |
Sep 28 | Hypothesis testing | Wilks, Chap 5 |
Oct 5 | Linear regression I | Wilks Chap 6; von Storch 8-9 |
Oct 12 | Linear regression II | Wilks Chap 6; von Storch 8-9 |
Oct 19 | Time series analysis I | Wilks 8; von Storch 10-12 |
Oct 26 | Midterm; discussion of final project | |
Nov 2 | Time series analysis II | Wilks 8; von Storch 10-12 | Project 1-page abstract due

Multivariate Statistics
Nov 9, Nov 16 | Principal Component Analysis & Empirical Orthogonal Functions I, II | Wilks 11; von Storch 13 | Project progress report due
Nov 30 | Cluster analysis | Wilks 14 |
Dec 7 | Final project presentation | |
Parametric Probability Distribution: summarize the observed probability distribution using particular mathematical forms.
- Binomial Distribution
- Poisson Distribution
- Gaussian Distribution
- Gamma Distribution
Binomial Distribution with N = 20, p = 0.5:
>> plot(0:20,binopdf(0:20,20,0.5),'o-')
The Cayuga Lake freezing problem (check it for 10 years): N = 10, p = 0.045
>> plot(0:10,binopdf(0:10,10,0.045),'o-')
Pr{X=0} = 0.63
Pr{X=1} = 0.30
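The Cayuga Lake freezing problem (check it for 220 years): N = 220, p = 0.045
>> plot(0:220,binopdf(0:220,220,0.045),'o-')
The distribution peaks near X = 10 (the mean is Np = 9.9; strictly, X = 9 is the most probable value, with X = 10 almost as likely).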
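The numbers quoted for the Cayuga Lake example can be checked directly. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available (the slides already rely on it for `binopdf`); the variable names are illustrative only.

```matlab
% Cayuga Lake freezing: annual freezing probability p = 0.045 (from the slides)
p = 0.045;

% 10-year window: probability of 0 or 1 freezing events
pr10 = binopdf(0:1, 10, p);
fprintf('N = 10:  Pr{X=0} = %.2f, Pr{X=1} = %.2f\n', pr10(1), pr10(2));

% 220-year window: locate the most probable count and compare with the mean
x = 0:220;
pr220 = binopdf(x, 220, p);
[~, idx] = max(pr220);
fprintf('N = 220: most probable X = %d, mean N*p = %.1f\n', x(idx), 220*p);
```

The most probable count in the 220-year case comes out at X = 9, with X = 10 nearly as likely, which is consistent with a mean of 9.9.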
A variant of the Binomial Distribution: Geometric Distribution
Random variable (X) for the Binomial Distribution: number of successes ("yes" or heads) in a sequence of n trials.
Random variable (X) for the Geometric Distribution: number of trials required to obtain the next success.
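Independence & Multiplicative Law of Probability
Two events are independent if the occurrence or nonoccurrence of one does not affect the probability of the other, so their probabilities multiply. If the next success arrives on trial x, it is preceded by x-1 failures:
$\Pr\{X = x\} = \underbrace{\Pr\{\text{failure}\} \cdots \Pr\{\text{failure}\}}_{x-1 \text{ times}} \times \Pr\{\text{1 success}\}$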
Geometric Distribution
Random variable (X): number of trials required to obtain the next success. With success probability p on each independent trial,
$\Pr\{X = x\} = p\,(1-p)^{x-1}, \quad x = 1, 2, \ldots$
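The geometric probabilities can be computed either from the product of independent trials or with the toolbox function. This is a sketch under two assumptions: the Statistics and Machine Learning Toolbox is installed, and p = 0.045 is borrowed from the Cayuga Lake example purely for illustration. Note that `geopdf` counts the failures before the first success, so its argument is x-1 rather than x.

```matlab
% Geometric distribution: X = trial on which the first success occurs
p = 0.045;           % success probability per trial (illustrative value)
x = 1:10;            % trial number of the first success

% Direct formula: (x-1) failures followed by one success
prDirect = p .* (1 - p).^(x - 1);

% geopdf counts failures before the first success, hence the shift by 1
prToolbox = geopdf(x - 1, p);

% The two results should agree to round-off
disp(max(abs(prDirect - prToolbox)));
```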
Discrete Distribution II: Poisson Distribution
The Poisson Distribution describes the probability of a given number of events occurring in a fixed interval of time. Examples: the number of emails you receive each day, or the number of tornadoes reported in New York State each year.
The Poisson Distribution has only one parameter, μ, which is also the mean of the distribution:
$\Pr\{X = x\} = \dfrac{\mu^{x} e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots$
(Figure: Poisson distributions for different values of μ. Source: Wikipedia.org)
Consider the annual tornado counts in NYS for 1959-1988, in Table 4.3. During the 30 years covered by these data, 138 tornadoes were reported in New York State. The average, or mean, rate of tornado occurrence is 138/30 = 4.6 per year.
The Poisson distribution fits the data fairly well (we will learn how to do the fitting later in class).
>> plot(0:12,poisspdf(0:12,4.6),'o-')
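Beyond plotting, the fitted Poisson distribution can answer questions such as how likely a tornado-free year is. A minimal sketch, assuming the Statistics and Machine Learning Toolbox; the threshold of 9 tornadoes is an arbitrary choice for illustration, not a value from the lecture.

```matlab
mu = 138/30;                        % mean tornado rate: 4.6 per year

% Probability of a year with no tornadoes, from the toolbox and by hand
pr0 = poisspdf(0, mu);
pr0Manual = mu^0 * exp(-mu) / factorial(0);   % same thing: exp(-4.6)

% Probability of 9 or more tornadoes in a year (illustrative threshold)
prAtLeast9 = 1 - poisscdf(8, mu);

fprintf('Pr{X = 0}  = %.4f\n', pr0);
fprintf('Pr{X >= 9} = %.4f\n', prAtLeast9);
```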
Difference between the Binomial & Poisson distributions
- Binomial predicts the number of successes (X) within a set number of trials (N).
- Poisson predicts the number of occurrences (X) for a period of time.
An exercise: check your email inbox and record the number of emails you received each day over the past, say, 50 days (better to limit it to weekdays only).
Expected Value of a Random Variable
The expected value of a random variable, or of a function of a random variable, is the probability-weighted average of that variable or function.
For example, flip a coin 3 times: N = 3, p = 0.5, so E[X] = Np = 1.5 (in between one head and two heads).
Expected value: $E[X] = \sum_{x} x \, \Pr\{X = x\}$
Variance: $\mathrm{Var}[X] = \sum_{x} (x - E[X])^{2} \, \Pr\{X = x\}$
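For the three-coin-flip example, the probability-weighted average can be carried out explicitly. A short sketch, assuming the Statistics and Machine Learning Toolbox for `binopdf`; the variable names are illustrative.

```matlab
N = 3;  p = 0.5;                 % three flips of a fair coin
x  = 0:N;                        % possible numbers of heads
pr = binopdf(x, N, p);           % Pr{X = x} for each outcome

EX   = sum(x .* pr);             % probability-weighted average of X
VarX = sum((x - EX).^2 .* pr);   % probability-weighted squared deviation

fprintf('E[X] = %.2f, Var[X] = %.2f\n', EX, VarX);   % prints E[X] = 1.50, Var[X] = 0.75
```

The results match the binomial shortcuts Np = 1.5 and Np(1-p) = 0.75.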
Outline
1. Definition of Terms
2. Some Empirical & Exploratory Data Analysis
3. Parametric Distribution I: Discrete Distributions
4. Parametric Distribution II: Continuous Distributions
5. Assessments of the Goodness of Fit
Probability Density Function (PDF): f(x)
Analogous to a histogram. Probability is represented by the area under the curve.
Cumulative Distribution Function (CDF): F(x) = Pr{X ≤ x}, the area under f to the left of x.
Continuous Distribution I: Gaussian Distribution (a.k.a. Normal distribution)
Two parameters: μ and σ
Why is the Gaussian distribution so popular?
Central Limit Theorem: as the sample size gets large, the sum (or average) of a set of independent observations will follow a Gaussian distribution, regardless of the distribution of the original variable.
Many quantities in natural science are the result of many superimposed factors (resembling the sum or average of these factors).
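The Central Limit Theorem can be illustrated with a quick simulation. The sketch below is an assumption-laden toy, not course data: it draws strongly skewed exponential values using base MATLAB only, averages them in groups of n = 50, and compares the spread of the averages with the value the theorem predicts.

```matlab
rng(1);                          % fixed seed so the run is reproducible
n = 50;                          % observations averaged per sample
m = 10000;                       % number of sample averages
raw  = -log(rand(n, m));         % exponential(1) draws: strongly right-skewed
avgs = mean(raw, 1);             % one average per column

% CLT prediction: averages are roughly Gaussian, mean 1, std 1/sqrt(n)
fprintf('mean of averages = %.3f (theory 1)\n', mean(avgs));
fprintf('std of averages  = %.3f (theory %.3f)\n', std(avgs), 1/sqrt(n));

histogram(avgs, 40);             % roughly bell-shaped despite the skewed input
xlabel('Sample average'); ylabel('Count');
```

Increasing n makes the histogram narrower and more symmetric, while n = 1 reproduces the skewed shape of the raw variable.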
Histograms of the Jan max temp. in Ithaca. They already look somewhat Gaussian-like, although not exactly. If you plot the distribution of the mean max temp. in Jan (i.e., use multiple years of data), it will become more Gaussian.
Standard Normal Distribution
Mean: 0, standard deviation: 1
Z-score (random variable): $z = (x - \mu)/\sigma$
Quantiles (or CDF)
(Figure: PDF and CDF of a Normal Distribution)
Q1: The mean Jan temperature in Ithaca is 22.2 °F and σ is 4.4 °F. In Jan 1987, the mean Jan temp. was 21.4 °F. Assume it follows a Gaussian distribution. What is the probability that the mean Jan temp. is as cold as or colder than Jan 1987?
z = (21.4 - 22.2)/4.4 = -0.18
From the standard normal table, Pr{Z ≤ -0.18} = 0.429.
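The table lookup in Q1 can be replaced by a direct evaluation of the Gaussian CDF. A minimal sketch, assuming the Statistics and Machine Learning Toolbox for `normcdf`; the variable names are illustrative.

```matlab
mu = 22.2;  sigma = 4.4;         % Ithaca mean Jan temperature and its std (deg F)
x  = 21.4;                       % Jan 1987 value (deg F)

z  = (x - mu) / sigma;           % standardize: about -0.18
pr = normcdf(z);                 % Pr{Z <= z} for the standard normal

% Equivalent call without standardizing by hand
prDirect = normcdf(x, mu, sigma);

fprintf('z = %.2f, Pr{T <= %.1f} = %.3f\n', z, x, pr);
```

The result is about 0.43, so a January as cold as 1987 or colder is expected in a little over four years out of ten.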
What about z in the positive range? Use the symmetry of the standard normal distribution: Pr{Z ≤ z} = 1 - Pr{Z ≤ -z}.
Q2: The mean Jan temperature in Ithaca is 22.2 °F and σ is 4.4 °F. Assume it follows a Gaussian distribution. What is the probability that 20 °F ≤ mean temp. ≤ 25 °F?
z_20 = (20 - 22.2)/4.4 = -0.50
z_25 = (25 - 22.2)/4.4 = 0.64
Pr{Z ≤ 0.64} = 1 - Pr{Z ≤ -0.64} = 1 - 0.261 = 0.739
Pr{20 °F ≤ mean temp. ≤ 25 °F} = Pr{Z ≤ 0.64} - Pr{Z ≤ -0.50} = 0.739 - 0.309 = 0.430
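The interval probability in Q2 is the difference of two CDF values, and the symmetry rule for positive z can be checked at the same time. A sketch assuming the Statistics and Machine Learning Toolbox; variable names are illustrative.

```matlab
mu = 22.2;  sigma = 4.4;                 % Ithaca Jan mean temperature and std (deg F)

% Probability that the mean temperature lies between 20 and 25 deg F
prInterval = normcdf(25, mu, sigma) - normcdf(20, mu, sigma);

% Symmetry check for a positive z: Pr{Z <= z} = 1 - Pr{Z <= -z}
z25 = (25 - mu) / sigma;                 % about 0.64
symGap = normcdf(z25) - (1 - normcdf(-z25));   % should be zero to round-off

fprintf('Pr{20 <= T <= 25} = %.3f\n', prInterval);
```

The result is about 0.43, in line with the table-based calculation using rounded z values.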
Continuous Distribution II: Gamma Distribution
Sometimes a variable is constrained by a physical limit on the left. For example, precipitation can't be lower than zero, but it can go to infinity (in theory). So the distribution is not Gaussian, but skewed to the right.
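Continuous Distribution II: Gamma Distribution
- Random variable: x
- Two parameters: 1) α: the shape parameter, 2) β: the scale parameter.
$f(x) = \dfrac{(x/\beta)^{\alpha-1} \exp(-x/\beta)}{\beta \, \Gamma(\alpha)}, \quad x, \alpha, \beta > 0$
Γ(α) is the gamma function.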
Standard gamma distribution: the gamma distribution of the standardized variable ξ = x/β (i.e., with β = 1).
Q1: Suppose Ithaca Jan precip follows the Gamma distribution with α ≈ 4 and β = 0.52 inches. For Jan 1987, the precip in Ithaca is 3.15 inches. Use the standard gamma distribution table to find the percentile value for Jan 1987 precip.
Step 1: standardize. ξ = 3.15/0.52 = 6.06
Step 2: for α ≈ 4, a standard variable of 6.06 falls in between the cumulative probabilities of 0.80 and 0.90. So, it's about 0.85.
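The interpolation in the table can be cross-checked by evaluating the gamma CDF directly. A minimal sketch, assuming the Statistics and Machine Learning Toolbox for `gamcdf` and taking α = 4 exactly; `gammainc` is the base-MATLAB regularized incomplete gamma function and serves as an independent check.

```matlab
alpha = 4;  beta = 0.52;         % shape and scale (inches), alpha taken as exactly 4
x = 3.15;                        % Jan 1987 precipitation (inches)

xi = x / beta;                   % Step 1: standardize, about 6.06

% Step 2: cumulative probability, three equivalent ways
F1 = gamcdf(x, alpha, beta);     % general gamma CDF with scale beta
F2 = gamcdf(xi, alpha, 1);       % standard gamma CDF of the standardized value
F3 = gammainc(xi, alpha);        % regularized lower incomplete gamma function

fprintf('xi = %.2f, cumulative probability = %.3f\n', xi, F1);
```

The value comes out close to 0.85, matching the estimate read from the table.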
How do we estimate the shape parameter (α) and the scale parameter (β)?
A: shape parameter (α)
B: scale parameter (β)
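Two common ways to estimate α and β are sketched below; the slides do not say which method the course uses, so both are shown as assumptions. The data are synthetic draws from a gamma distribution with known parameters (not the Ithaca record), and the Statistics and Machine Learning Toolbox is assumed for `gamrnd` and `gamfit`.

```matlab
rng(0);                                  % reproducible synthetic sample
x = gamrnd(4, 0.52, 5000, 1);            % stand-in data; NOT the Ithaca record

% Method of moments: mean = alpha*beta, variance = alpha*beta^2
betaMom  = var(x) / mean(x);
alphaMom = mean(x) / betaMom;

% Maximum likelihood via gamfit, which returns [shape, scale]
phat = gamfit(x);

fprintf('moments: alpha = %.2f, beta = %.2f\n', alphaMom, betaMom);
fprintf('gamfit : alpha = %.2f, beta = %.2f\n', phat(1), phat(2));
```

With a sample this large, both estimates should land close to the values used to generate the data (α = 4, β = 0.52).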
Superimpose the fitted Gaussian and Gamma distribution curves on the raw histogram (Jan 1987 Ithaca precip).
More will be covered later in class.
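One way to produce such an overlay is sketched below. It is only an illustration: the sample is synthetic (the Ithaca data are not reproduced in these slides), the fits use `normfit` and `gamfit` from the Statistics and Machine Learning Toolbox, and the histogram is normalized so that its area is 1 and can be compared with the fitted PDFs.

```matlab
rng(0);                                   % reproducible synthetic sample
x = gamrnd(4, 0.52, 1000, 1);             % stand-in data; NOT the Ithaca record

[muHat, sigmaHat] = normfit(x);           % Gaussian fit: mean and std
phat = gamfit(x);                         % Gamma fit: [shape, scale]

xx = linspace(0, max(x), 200);            % grid for the fitted curves
histogram(x, 'Normalization', 'pdf');     % area-normalized histogram
hold on
plot(xx, normpdf(xx, muHat, sigmaHat), 'r-',  'LineWidth', 1.5);
plot(xx, gampdf(xx, phat(1), phat(2)), 'k--', 'LineWidth', 1.5);
hold off
legend('Data', 'Fitted Gaussian', 'Fitted Gamma');
xlabel('Precipitation (inches)'); ylabel('Probability density');
```

For right-skewed data like these, the Gaussian curve assigns visible probability to negative values, while the Gamma curve starts at zero and follows the long right tail.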