Lecture 34. Summarizing Data

Math 408 - Mathematical Statistics
Lecture 34. Summarizing Data
Konstantin Zuev (USC), April 24, 2013

Agenda

- Methods Based on the CDF
- The Empirical CDF
- Example: Data from Uniform Distribution
- Example: Data from Normal Distribution
- Statistical Properties of the ecdf
- The Survival Function
- Example: Data from Exponential Distribution
- The Hazard Function
- Example: The Hazard Function for the Exponential Distribution
- Summary

Describing Data

In the next few lectures we will discuss methods for describing and summarizing data that are in the form of one or more samples. These methods are useful for revealing the structure of data that are initially in the form of numbers.

Example: the arithmetic mean $\bar{x} = (x_1 + \dots + x_n)/n$ is often used as a summary of a collection of numbers $x_1, \dots, x_n$: it indicates a typical value.

Example: $x = (1.5147, 1.7223, 1.063, 1.4916, \dots)$, $y = (0.7353, 0.0781, 0.276, 1.5666, \dots)$

[Figure: plot of y against x]

Empirical CDF

Suppose that $x_1, \dots, x_n$ is a batch of numbers.

Remark: We use the word
- sample when $X_1, \dots, X_n$ is a collection of random variables;
- batch when $x_1, \dots, x_n$ are fixed numbers (a realization of the sample).

Definition: The empirical cumulative distribution function (ecdf) is defined as
$$F_n(x) = \frac{1}{n} \#\{x_i \le x\},$$
where $\#\{x_i \le x\}$ is the number of $x_i$ that are less than or equal to $x$.

Denote the ordered batch of numbers by $x_{(1)}, \dots, x_{(n)}$.
- If $x < x_{(1)}$, then $F_n(x) = 0$.
- If $x_{(1)} \le x < x_{(2)}$, then $F_n(x) = 1/n$.
- If $x_{(k)} \le x < x_{(k+1)}$, then $F_n(x) = k/n$.

The ecdf is the data analogue of the CDF of a random variable.
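The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `ecdf` and the small batch are hypothetical, and counting the $x_i \le x$ with `bisect_right` on the sorted batch is just one convenient choice.

```python
from bisect import bisect_right


def ecdf(batch):
    """Return the function F_n for a batch of fixed numbers."""
    ordered = sorted(batch)  # x_(1) <= ... <= x_(n)
    n = len(ordered)

    def F_n(x):
        # bisect_right returns how many ordered values are <= x
        return bisect_right(ordered, x) / n

    return F_n


# Hypothetical batch (not data from the lecture); note the repeated value 2.0
F_n = ecdf([3.0, 1.0, 2.0, 2.0])
for x in (0.5, 1.0, 2.0, 2.5, 3.0):
    print(x, F_n(x))
```

With a repeated value the ecdf jumps by more than $1/n$ at that point, so the case-by-case description in terms of $x_{(k)}$ is cleanest when all values are distinct.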

Example: Data from Uniform Distribution

Let $(X_1, \dots, X_n) \sim U[0, 1]$.
Let $(x_1, \dots, x_n)$ be a particular realization of $(X_1, \dots, X_n)$, with $n = 50$:
$(x_1, \dots, x_n) = (0.24733, 0.3527, 0.18786, 0.49064, \dots)$

[Figure: empirical CDF $F_n(x)$ plotted against $x$]

Example: Data from Normal Distribution

Let $(X_1, \dots, X_n) \sim N(0, 1)$.
Let $(x_1, \dots, x_n)$ be a particular realization of $(X_1, \dots, X_n)$, with $n = 50$:
$(x_1, \dots, x_n) = (0.23573, 0.45952, 0.93808, 0.62162, \dots)$

[Figure: empirical CDF $F_n(x)$ plotted against $x$]

Statistical Properties of the ecdf

Let $X_1, \dots, X_n$ be a random sample from a continuous distribution $F$. Then the ecdf can be written as follows:
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{(-\infty, x]}(X_i), \quad \text{where } I_{(-\infty, x]}(X_i) = \begin{cases} 1, & \text{if } X_i \le x \\ 0, & \text{if } X_i > x \end{cases}$$

The random variables $I_{(-\infty, x]}(X_1), \dots, I_{(-\infty, x]}(X_n)$ are independent Bernoulli random variables:
$$I_{(-\infty, x]}(X_i) = \begin{cases} 1, & \text{with probability } F(x) \\ 0, & \text{with probability } 1 - F(x) \end{cases}$$

Thus, $nF_n(x)$ is a binomial random variable: $nF_n(x) \sim \mathrm{Bin}(n, F(x))$. Consequently,
- $E[F_n(x)] = F(x)$
- $V[F_n(x)] = \frac{1}{n} F(x)(1 - F(x))$
- $V[F_n(x)] \to 0$ as $n \to \infty$
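The mean and variance formulas can be checked by simulation. The sketch below is an illustration under assumed settings that are not from the lecture: $U[0,1]$ data (so that $F(x) = x$), the point $x = 0.3$, $n = 50$, 5000 replications, and a fixed seed. The helper `ecdf_at` is a hypothetical name.

```python
import random


def ecdf_at(sample, x):
    """Value of F_n(x): the fraction of sample points <= x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)


rng = random.Random(0)          # fixed seed, chosen arbitrarily
n, x, reps = 50, 0.3, 5000      # for U[0,1] data, F(x) = x = 0.3

# One value of F_n(x) per simulated sample of size n
values = [ecdf_at([rng.random() for _ in range(n)], x) for _ in range(reps)]

mean = sum(values) / reps
var = sum((v - mean) ** 2 for v in values) / (reps - 1)

print("simulated mean:", mean, " theory F(x):", x)
print("simulated var: ", var, " theory F(x)(1-F(x))/n:", x * (1 - x) / n)
```

The simulated values should be close to the theoretical ones, with the gap shrinking as the number of replications grows.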

Example: Convergence of the ecdf to the CDF

Let $(X_1, \dots, X_n) \sim N(0, 1)$.
Let $(x_1, \dots, x_n)$ be a particular realization of $(X_1, \dots, X_n)$, with $n = 20$.

[Figure: empirical CDF and the normal CDF N(0,1), n = 20]

Example: Convergence of the ecdf to the CDF

Let $(X_1, \dots, X_n) \sim N(0, 1)$.
Let $(x_1, \dots, x_n)$ be a particular realization of $(X_1, \dots, X_n)$, with $n = 100$.

[Figure: empirical CDF and the normal CDF N(0,1), n = 100]

Example: Convergence of the ecdf to the CDF

Let $(X_1, \dots, X_n) \sim N(0, 1)$.
Let $(x_1, \dots, x_n)$ be a particular realization of $(X_1, \dots, X_n)$, with $n = 1000$.

[Figure: empirical CDF and the normal CDF N(0,1), n = 1000]
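The convergence shown in these examples can also be seen numerically. The sketch below is an assumption-laden illustration: it measures the gap between the ecdf and the $N(0,1)$ CDF by the largest vertical distance between them, which is a choice made here and not a quantity used in the lecture. The helper names and the seed are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """CDF of N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_distance(sample, cdf):
    """Largest vertical gap between the ecdf of `sample` and `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    # F_n jumps from k/n to (k+1)/n at the (k+1)-th order statistic,
    # so the largest gap occurs on one side of some jump.
    return max(
        max((k + 1) / n - cdf(x), cdf(x) - k / n)
        for k, x in enumerate(ordered)
    )


rng = random.Random(1)  # fixed seed, chosen arbitrarily
distances = {
    n: sup_distance([rng.gauss(0.0, 1.0) for _ in range(n)], normal_cdf)
    for n in (20, 100, 1000)
}
for n, d in distances.items():
    print(n, d)
```

For a single realization the distance need not decrease at every step, but it is typically much smaller for $n = 1000$ than for $n = 20$.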

The Survival Function

The survival function carries the same information as the CDF and is defined as
$$S(t) = P(T > t) = 1 - F(t)$$

In applications where the data consist of times until failure or death (and are thus nonnegative), it is customary to work with the survival function rather than the CDF, although the two give equivalent information. Data of this type occur in
- medical studies
- reliability studies

$S(t)$ is the probability that the lifetime will be longer than $t$.

The data analogue of $S(t)$ is the empirical survival function:
$$S_n(t) = 1 - F_n(t)$$
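As a small illustration of the empirical survival function, the sketch below computes $S_n(t)$ directly as the fraction of lifetimes strictly greater than $t$, which equals $1 - F_n(t)$. The function name `empirical_survival` and the lifetimes are hypothetical, not data from the lecture.

```python
def empirical_survival(batch):
    """Return S_n, where S_n(t) is the fraction of values strictly greater than t."""
    n = len(batch)

    def S_n(t):
        # Counting x_i > t is the same as 1 - (fraction of x_i <= t)
        return sum(1 for x in batch if x > t) / n

    return S_n


# Hypothetical lifetimes (not data from the lecture)
S_n = empirical_survival([2.0, 5.0, 5.0, 9.0])
for t in (0.0, 2.0, 5.0, 9.0):
    print(t, S_n(t))
```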

Example: Data from Exponential Distribution

Let $(X_1, \dots, X_n) \sim \mathrm{Exp}(\beta)$, $\beta = 5$.
Let $(x_1, \dots, x_n)$ be a particular realization of $(X_1, \dots, X_n)$, with $n = 50$:
$(x_1, \dots, x_n) = (4.4356, 1.684, 11.376, 4.8357, \dots)$

[Figure: empirical survival function $S_n(t)$ plotted against $t$]
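An experiment of this kind can be reproduced roughly as follows. This is a sketch under assumptions: it takes $\mathrm{Exp}(\beta)$ to have mean $\beta$ (density $\frac{1}{\beta} e^{-t/\beta}$), uses an arbitrary seed, and so will not reproduce the specific numbers on the slide. Python's `random.expovariate` expects the rate, which is $1/\beta$ in this parameterization.

```python
import math
import random

beta = 5.0                    # assumed to be the mean of Exp(beta)
rng = random.Random(2)        # fixed seed, chosen arbitrarily

# expovariate takes the rate parameter, i.e. 1 / beta
lifetimes = [rng.expovariate(1.0 / beta) for _ in range(50)]


def S_n(t):
    """Empirical survival function of the simulated lifetimes."""
    return sum(1 for x in lifetimes if x > t) / len(lifetimes)


# Compare S_n(t) with the true survival function exp(-t / beta)
for t in (0.0, 2.0, 5.0, 10.0):
    print(t, S_n(t), math.exp(-t / beta))
```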

The Hazard Function

Let $T$ be a random variable (time) with CDF $F$ and PDF $f$.

Definition: The hazard function is defined as
$$h(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)}$$

The hazard function may be interpreted as the instantaneous death rate for individuals who have survived up to a given time: if an individual is alive at time $t$, the probability that the individual will die in the time interval $(t, t + \epsilon)$ is
$$P(t \le T \le t + \epsilon \mid T \ge t) \approx \frac{\epsilon f(t)}{1 - F(t)}$$

If $T$ is the lifetime of a manufactured component, it may be natural to think of $h(t)$ as the age-specific failure rate.

The hazard function may also be expressed as
$$h(t) = -\frac{d}{dt} \log S(t)$$
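The two facts on this slide follow from short calculations, sketched here for a continuous $T$ with $S(t) > 0$ and small $\epsilon > 0$. The steps use only the definitions $S(t) = 1 - F(t)$ and $F'(t) = f(t)$.

```latex
% Log-derivative form: S'(t) = -f(t), so
-\frac{d}{dt}\log S(t) = -\frac{S'(t)}{S(t)} = \frac{f(t)}{S(t)} = h(t).

% Conditional probability of failing in a short interval:
P(t \le T \le t+\epsilon \mid T \ge t)
  = \frac{P(t \le T \le t+\epsilon)}{P(T \ge t)}
  = \frac{F(t+\epsilon) - F(t)}{1 - F(t)}
  \approx \frac{\epsilon f(t)}{1 - F(t)}
  = \epsilon\, h(t).
```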

Example: Hazard Function for the Exponential Distribution

Let $T \sim \mathrm{Exp}(\beta)$. Then
- $f(t) = \frac{1}{\beta} e^{-t/\beta}$
- $F(t) = 1 - e^{-t/\beta}$
- $S(t) = e^{-t/\beta}$
- $h(t) = \frac{1}{\beta}$

The instantaneous death rate is constant. If the exponential distribution were used as a model for the lifetime of a component, it would imply that the probability of the component failing did not depend on its age.

Typically, a hazard function is U-shaped:
- the rate of failure is high for very new components because of flaws in the manufacturing process that show up very quickly,
- the rate of failure is relatively low for components of intermediate age,
- the rate of failure increases for older components as they wear out.
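The constant hazard of the exponential distribution can be confirmed numerically. The sketch below is illustrative only: the function names, the value $\beta = 5$, the time points and the finite-difference step are all assumptions made here. It evaluates $h(t)$ both as $f(t)/S(t)$ and as the negative derivative of $\log S(t)$.

```python
import math


def exp_pdf(t, beta):
    """Density of Exp(beta) with mean beta."""
    return math.exp(-t / beta) / beta


def exp_survival(t, beta):
    """Survival function S(t) = exp(-t / beta)."""
    return math.exp(-t / beta)


def hazard(t, beta):
    """Hazard as the ratio f(t) / S(t)."""
    return exp_pdf(t, beta) / exp_survival(t, beta)


def hazard_from_log_survival(t, beta, step=1e-4):
    """Hazard as -d/dt log S(t), using a central finite difference."""
    upper = math.log(exp_survival(t + step, beta))
    lower = math.log(exp_survival(t - step, beta))
    return -(upper - lower) / (2.0 * step)


beta = 5.0
times = (1.0, 5.0, 20.0)
hazards = [hazard(t, beta) for t in times]
hazards_fd = [hazard_from_log_survival(t, beta) for t in times]
print(hazards)      # each value is 1 / beta up to rounding
print(hazards_fd)   # the finite-difference version agrees closely
```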

Summary

- The empirical cumulative distribution function (ecdf) is $F_n(x) = \frac{1}{n} \#\{x_i \le x\}$.
- The survival function carries the same information as the CDF and is defined as $S(t) = P(T > t) = 1 - F(t)$.
- The data analogue of $S(t)$ is the empirical survival function: $S_n(t) = 1 - F_n(t)$.
- The hazard function is $h(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)}$. It may be interpreted as the instantaneous death rate for individuals who have survived up to a given time.