Lecture 3: Probability Distributions (cont'd)

EAS31116/B9036: Statistics in Earth & Atmospheric Sciences
Instructor: Prof. Johnny Luo
www.sci.ccny.cuny.edu/~luo

Course schedule (readings based on the 2nd edition of Wilks' book)

Univariate Statistics
Aug 31: Introduction; review of probability. Reading: Wilks, Chap 2. Other activity: pre-test.
Sep 7: Matlab tutorial (optional).
Sep 14: Review of probability; Probability Distribution 1. Reading: Wilks, Chap 2, 3.
Sep 21: Probability Distribution 2. Reading: Wilks, Chap 3, 4.
Sep 28: Hypothesis testing. Reading: Wilks, Chap 5.
Oct 5: Linear regression I. Reading: Wilks Chap 6; von Storch 8-9.
Oct 12: Linear regression II. Reading: Wilks Chap 6; von Storch 8-9.
Oct 19: Time series analysis I. Reading: Wilks 8; von Storch 10-12.
Oct 26: Midterm; discussion of final project.
Nov 2: Time series analysis II. Reading: Wilks 8; von Storch 10-12. Other activity: project 1-page abstract due.

Multivariate Statistics
Nov 9: Principal Component Analysis & Empirical Orthogonal Functions I. Reading: Wilks 11; von Storch 13.
Nov 16: Principal Component Analysis & Empirical Orthogonal Functions II. Reading: Wilks 11; von Storch 13.
(Project progress report due during the Nov 9-16 block.)
Nov 30: Cluster analysis. Reading: Wilks 14.
Dec 7: Final project presentation.

Parametric Probability Distribution: summarize the observed probability distribution using particular mathematical forms.
- Binomial Distribution
- Poisson Distribution
- Gaussian Distribution
- Gamma Distribution

Binomial distribution with N = 20, p = 0.5:
>> plot(0:20,binopdf(0:20,20,0.5),'o-')

The Cayuga Lake freezing problem (check it for 10 years): N = 10, p = 0.045
>> plot(0:10,binopdf(0:10,10,0.045),'o-')
Pr{X=0} = 0.63, Pr{X=1} = 0.30

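The Cayuga Lake freezing problem (check it for 220 years): N = 220, p = 0.045
>> plot(0:220,binopdf(0:220,220,0.045),'o-')
Peak probability occurs near X = 10: the mean is Np = 9.9, and the single most probable count is X = 9, with X = 10 almost equally likely.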
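The numbers quoted for the Cayuga Lake problem can be checked numerically. The following is a minimal sketch in base MATLAB; the helper name `binom_pmf` is ours (not from the lecture) and evaluates the standard binomial formula in log space through `gammaln`, so that N = 220 does not overflow. With the Statistics Toolbox, `binopdf` as used on the slides should give the same values.

```matlab
% Binomial pmf Pr{X = x} = C(N,x) p^x (1-p)^(N-x), evaluated in log space.
% binom_pmf is a hypothetical helper name used only for this sketch.
binom_pmf = @(x, N, p) exp(gammaln(N+1) - gammaln(x+1) - gammaln(N-x+1) ...
                           + x.*log(p) + (N-x).*log(1-p));

p = 0.045;                        % probability that the lake freezes in a given year

% 10-year window: probability of no freeze and of exactly one freeze
pr10 = binom_pmf(0:10, 10, p);
fprintf('Pr{X=0} = %.2f, Pr{X=1} = %.2f\n', pr10(1), pr10(2));

% 220-year window: locate the most probable count
pr220 = binom_pmf(0:220, 220, p);
[~, idx] = max(pr220);
x_mode = idx - 1;                 % MATLAB indices start at 1, counts start at 0
fprintf('Mean N*p = %.1f, most probable count = %d\n', 220*p, x_mode);
```

The index shift in `x_mode` is needed because the first element of `pr220` corresponds to X = 0.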

A variant of the Binomial Distribution: Geometric Distribution
Random variable (X) for the Binomial Distribution: number of yeses (or heads) in a sequence of N trials.
Random variable (X) for the Geometric Distribution: number of trials required to obtain the next success.

Independence & Multiplicative Law of Probability
Two events are independent if the occurrence or nonoccurrence of one does not affect the probability of the other, so the probabilities of independent events multiply. The next success arrives on trial x when x-1 failures are followed by 1 success:
Pr{X = x} = Pr{failure} × ... × Pr{failure} (x-1 times) × Pr{1 success}
With success probability p on each trial, this gives $\Pr\{X = x\} = p\,(1-p)^{x-1}$ for x = 1, 2, ...
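As an illustration of the multiplicative law, the sketch below evaluates the geometric probabilities directly as a product of x-1 failures and one success. Reusing p = 0.045 from the Cayuga Lake example is our assumption, chosen only for continuity. Note that, as far as we know, the Statistics Toolbox function `geopdf` counts failures before the first success, so it would be called as `geopdf(x-1,p)` to match the definition used here.

```matlab
% Geometric pmf: Pr{X = x} = p*(1-p)^(x-1), x = 1, 2, ...
% X is the number of trials needed to obtain the next success.
p = 0.045;                        % assumed per-trial success probability (borrowed from the binomial example)
x = 1:200;                        % trial on which the next success occurs
pr = p .* (1 - p).^(x - 1);       % (x-1) failures followed by one success

fprintf('Pr{X=1} = %.3f, Pr{X=2} = %.3f\n', pr(1), pr(2));

% Probability that the next success arrives within the first 10 trials;
% this is the complement of "no success in 10 trials"
p_within_10 = sum(pr(1:10));
fprintf('Pr{X<=10} = %.3f, 1-(1-p)^10 = %.3f\n', p_within_10, 1 - (1 - p)^10);
```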

Discrete Distribution II: Poisson Distribution
The Poisson Distribution describes the probability of a given number of events occurring in a fixed interval of time. For example, the number of emails you receive each day, or the number of tornadoes reported in New York State each year.
The Poisson Distribution has only one parameter: μ (which happens to be the mean of the distribution).

Consider the annual tornado counts in NYS for 1959-1988, in Table 4.3. During the 30 years covered by these data, 138 tornadoes were reported in New York State. The average, or mean, rate of tornado occurrence is 138/30 = 4.6 per year. The Poisson distribution with μ = 4.6 fits the data fairly well (we will learn how to do the fitting later in class).

>> plot(0:12,poisspdf(0:12,4.6),'o-')
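The curve plotted above can also be evaluated by hand from the standard Poisson formula, Pr{X = x} = μ^x e^(-μ) / x!. The sketch below does this in base MATLAB for the tornado rate; with the Statistics Toolbox, `poisspdf(x,mu)` should return the same vector.

```matlab
% Poisson pmf: Pr{X = x} = mu^x * exp(-mu) / x!
mu = 138/30;                      % mean tornado rate per year (4.6)
x  = 0:12;                        % annual tornado counts to evaluate
pr = mu.^x .* exp(-mu) ./ factorial(x);

fprintf('Pr{X=0} = %.3f\n', pr(1));      % a year with no tornado
fprintf('Pr{X=4} = %.3f\n', pr(5));      % pr(k+1) holds Pr{X = k}
fprintf('Pr{X<=12} = %.4f\n', sum(pr));  % probability mass covered by the plotted range
```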

Difference between Binomial and Poisson distributions
- Binomial predicts the number of successes (X) within a set number of trials (N).
- Poisson predicts the number of occurrences (X) within a period of time.

An exercise: check your email box and record the number of emails you received each day for the past, say, 50 days (better to limit it to weekdays only).

Expected Value of a Random Variable
The expected value of a random variable, or of a function of a random variable, is simply the probability-weighted average of that variable or function. For example, flip a coin 3 times and count the heads: N = 3, p = 0.5, E[X] = 1.5 (in between one head and two heads).

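Expected value: $E[X] = \sum_x x \Pr\{X = x\}$
Variance: $\mathrm{Var}[X] = E\left[(X - E[X])^2\right] = \sum_x (x - E[X])^2 \Pr\{X = x\}$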
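The coin-flip example can be worked through as a probability-weighted average. This is a minimal sketch using the general definitions of expected value and variance for a discrete variable; the variable names are ours.

```matlab
% Expected value and variance as probability-weighted averages
N = 3;  p = 0.5;                  % three flips of a fair coin
x = 0:N;                          % possible numbers of heads
pr = arrayfun(@(k) nchoosek(N, k), x) .* p.^x .* (1-p).^(N-x);   % binomial pmf

EX   = sum(x .* pr);              % E[X]   = sum of x * Pr{X = x}
VarX = sum((x - EX).^2 .* pr);    % Var[X] = sum of (x - E[X])^2 * Pr{X = x}
fprintf('E[X] = %.2f, Var[X] = %.2f\n', EX, VarX);
```

For a binomial variable these sums agree with the closed forms Np and Np(1-p).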

Outline
1. Definition of Terms
2. Some Empirical & Exploratory Data Analysis
3. Parametric Distribution I: Discrete Distributions
4. Parametric Distribution II: Continuous Distributions
5. Assessments of the Goodness of Fit

Probability Density Function (PDF): f(x). Analogous to a histogram. Probability is represented by the area under the curve.
Cumulative Distribution Function (CDF): F(x)
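The relation between the two functions can be made concrete numerically: the CDF is the running area under the PDF, and the probability of an interval is the area between its end points. The sketch below uses a standard Gaussian density purely as a convenient example; the grid limits and spacing are arbitrary choices of ours.

```matlab
% Example density: standard Gaussian pdf (chosen only for illustration)
f = @(x) exp(-x.^2 / 2) / sqrt(2*pi);

x = linspace(-8, 8, 16001);       % fine grid; the tails beyond +/-8 are negligible
F = cumtrapz(x, f(x));            % CDF as the running area under the pdf

a = -1;  b = 1;
xab  = linspace(a, b, 2001);
area = trapz(xab, f(xab));        % Pr{a <= X <= b} as an area under f(x)

Fa = interp1(x, F, a);            % F(a) and F(b) read off the numerical CDF
Fb = interp1(x, F, b);
fprintf('total area = %.4f\n', F(end));
fprintf('area on [a,b] = %.4f, F(b) - F(a) = %.4f\n', area, Fb - Fa);
```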

Continuous Distribution I: Gaussian Distribution (aka Normal Distribution)
Two parameters: μ and σ
Why is the Gaussian distribution so popular?
- Central Limit Theorem: as the sample size gets large, the sum (or average) of a set of independent observations will follow a Gaussian distribution, regardless of the distribution of the original variable.
- A lot of quantities in natural science are the result of many factors superimposed (resembling the sum or average of these factors).
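The Central Limit Theorem can be illustrated with a small simulation. The sketch below is only a demonstration under assumptions of ours: the original variable is exponential (strongly skewed), and the sample size `n` and number of repetitions `ntrial` are arbitrary. The averages should show much weaker skewness than the raw data, and a spread close to 1/sqrt(n).

```matlab
% Central Limit Theorem sketch: averages of a strongly skewed variable
n      = 50;                      % observations per average (assumed value)
ntrial = 20000;                   % number of averages to generate (assumed value)

x    = -log(rand(n, ntrial));     % exponential draws with mean 1 and std 1
xbar = mean(x, 1);                % one average per column

% Sample skewness: zero for a symmetric (e.g. Gaussian) distribution
skew = @(v) mean((v - mean(v)).^3) / std(v, 1)^3;

fprintf('skewness of raw data : %.2f\n', skew(x(:)));
fprintf('skewness of averages : %.2f\n', skew(xbar));
fprintf('std of averages      : %.3f (theory: %.3f)\n', std(xbar), 1/sqrt(n));
```

Because the draws are random, the printed values change slightly from run to run.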

Histograms of the Jan max temp. in Ithaca. They already look somewhat Gaussian-like, although not exactly. If you plot the distribution of the mean Jan max temp. (i.e., one monthly mean per year, using multiple years of data), it will become more Gaussian.

Standard Normal Distribution: mean 0, standard deviation 1
Z-score (standardized random variable): z = (x - μ)/σ
Quantiles (or CDF) of z are read from the standard normal table.

Figure: PDF and CDF of a Normal Distribution

Q1: The mean Jan temperature in Ithaca is 22.2 °F and σ is 4.4 °F. In Jan 1987, the mean Jan temp. was 21.4 °F. Assume it follows a Gaussian distribution. What is the probability that the mean Jan temp. is as cold as or colder than Jan 1987?
z = (21.4 - 22.2)/4.4 = -0.18
From the standard normal table, Pr{Z ≤ -0.18} ≈ 0.43.
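The table lookup in Q1 can be cross-checked numerically. The sketch below writes the standard normal CDF with the base-MATLAB complementary error function; the name `Phi` is ours. With the Statistics Toolbox, `normcdf(z)` should give the same number.

```matlab
% Standard normal CDF via the complementary error function
Phi = @(z) 0.5 * erfc(-z / sqrt(2));

mu = 22.2;  sigma = 4.4;          % mean and std of Ithaca mean Jan temperature (deg F)
x  = 21.4;                        % Jan 1987 value (deg F)

z = (x - mu) / sigma;             % z-score
p_colder = Phi(z);                % Pr{mean Jan temp <= 21.4 deg F}
fprintf('z = %.2f, Pr{Z <= z} = %.3f\n', z, p_colder);
```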

What about z in the positive range? The standard normal distribution is symmetric about zero, so Pr{Z ≤ z} = 1 - Pr{Z ≤ -z}.

Q2: The mean Jan temperature in Ithaca is 22.2 °F and σ is 4.4 °F. Assume it follows a Gaussian distribution. What is the probability that 20 °F ≤ mean temp. ≤ 25 °F?
z_20 = (20 - 22.2)/4.4 = -0.50
z_25 = (25 - 22.2)/4.4 = 0.64
Pr{Z ≤ -0.50} = 0.309 and Pr{Z ≤ 0.64} = 1 - Pr{Z ≤ -0.64} = 1 - 0.261 = 0.739, so the probability is 0.739 - 0.309 = 0.430.
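Q2 can be checked the same way: the probability of an interval is the difference of two CDF values. This is a sketch in base MATLAB; `Phi` is our name for the standard normal CDF, and small differences from the table-based answer come from rounding z to two decimals.

```matlab
% Standard normal CDF via the complementary error function
Phi = @(z) 0.5 * erfc(-z / sqrt(2));

mu = 22.2;  sigma = 4.4;          % mean and std of Ithaca mean Jan temperature (deg F)
z20 = (20 - mu) / sigma;          % z-score of the lower bound
z25 = (25 - mu) / sigma;          % z-score of the upper bound

p_between = Phi(z25) - Phi(z20);  % Pr{20 <= mean temp <= 25}
fprintf('z20 = %.2f, z25 = %.2f, probability = %.3f\n', z20, z25, p_between);

% Symmetry used for positive z: Pr{Z <= z} = 1 - Pr{Z <= -z}
sym_gap = Phi(0.64) - (1 - Phi(-0.64));
fprintf('symmetry check (should be ~0): %.1e\n', sym_gap);
```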

Continuous Distribution II: Gamma Distribution
Sometimes a variable is constrained by a physical limit on the left. For example, precipitation: it can't be lower than zero, but it can go to infinity (in theory). So, the distribution is not Gaussian, but skewed to the right.

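Continuous Distribution II: Gamma Distribution
- Random variable: x
- Two parameters: 1) α: the shape parameter, 2) β: the scale parameter.
- PDF: $f(x) = \dfrac{(x/\beta)^{\alpha-1}\exp(-x/\beta)}{\beta\,\Gamma(\alpha)}$ for x, α, β > 0, where Γ(α) is the gamma function.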

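Standard gamma distribution: the distribution of the standardized variable ξ = x/β, whose PDF $f(\xi) = \xi^{\alpha-1}e^{-\xi}/\Gamma(\alpha)$ depends only on the shape parameter α.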

Q1: Suppose Ithaca Jan precip follows the Gamma distribution with α ≈ 4 and β = 0.52 inches. For Jan 1987, the mean precip in Ithaca is 3.15 inches. Use the table of the standard gamma distribution to find the percentile value for Jan 1987 precip.
Step 1: standardize: ξ = 3.15/0.52 = 6.06
Step 2: for α ≈ 4, the standard variable 6.06 falls between the cumulative probabilities of 0.80 and 0.90, so the percentile is about 0.85.
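The two steps can be reproduced without the table. The sketch below relies on the fact that the CDF of the standard gamma distribution is the regularized lower incomplete gamma function, which base MATLAB provides as `gammainc(x,a)`. Taking α exactly equal to 4 is our simplification of the slide's rounded value. With the Statistics Toolbox, `gamcdf(3.15,4,0.52)` should agree.

```matlab
alpha = 4;                        % shape parameter (rounded value, taken as exact here)
beta  = 0.52;                     % scale parameter (inches)
x     = 3.15;                     % Jan 1987 precipitation (inches)

xi = x / beta;                    % Step 1: standardize
F  = gammainc(xi, alpha);         % Step 2: standard gamma CDF evaluated at xi
fprintf('xi = %.2f, cumulative probability = %.3f\n', xi, F);
```

The result should fall between the tabulated 0.80 and 0.90 levels, consistent with the interpolated value of about 0.85.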

How do we estimate the shape parameter (α) and the scale parameter (β)?

A: shape parameter (α)
B: scale parameter (β)
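One simple way to get rough estimates is the method of moments, which matches the sample mean and variance to the gamma mean αβ and variance αβ². This is a stand-in sketch and not necessarily the estimator used in the lecture; the data vector below is made up purely for illustration. With the Statistics Toolbox, `gamfit` performs a maximum-likelihood fit instead and, to our knowledge, returns the shape and scale in that order.

```matlab
% Hypothetical monthly precipitation totals (inches), invented for this sketch
precip = [1.2 0.8 2.5 3.1 1.9 0.6 2.2 4.0 1.5 2.7];

m = mean(precip);                 % sample mean
v = var(precip);                  % sample variance

% Method of moments for the gamma distribution:
%   mean = alpha*beta, variance = alpha*beta^2
alpha_hat = m^2 / v;              % shape estimate
beta_hat  = v / m;                % scale estimate (same units as the data)
fprintf('alpha = %.2f, beta = %.2f inches\n', alpha_hat, beta_hat);
```

Moment estimates are easy to compute but can be poor for small shape parameters, so they are best treated as a first guess.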

Superimpose the fitted Gaussian and Gamma distribution curves on the raw histogram (Jan 1987 Ithaca precip). More will be covered later in class.