Chapter 2 (2.1-2.6), Fall 2012


Bios 323: Applied Survival Analysis
Qingxia (Cindy) Chen
Chapter 2 (2.1-2.6), Fall 2012

Definitions and Notation

There are several equivalent ways to characterize the probability distribution of a survival random variable. Some of these are familiar; others are special to survival analysis. We will focus on the following terms:

- The density function $f(t)$
- The survivor function $S(t)$
- The hazard function $h(t)$
- The cumulative hazard function $H(t)$

Density function f(t)

For discrete r.v.'s (probability mass function): suppose that $T$ takes values in $a_1, a_2, \dots, a_n$. Then

$$f(t) = \Pr(T = t) = \begin{cases} f_j & \text{if } t = a_j,\ j = 1, 2, \dots, n \\ 0 & \text{if } t \neq a_j,\ j = 1, 2, \dots, n \end{cases}$$

Density function for continuous r.v.'s:

$$f(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Pr(t \le T < t + \Delta t)$$

Survivorship function: $S(t) = P(T \ge t)$.

In other settings, the cumulative distribution function, $F(t) = P(T \le t)$, is of interest. In survival analysis, our interest tends to focus on the survival function, $S(t)$.

For a continuous random variable: $S(t) = \int_t^\infty f(u)\,du$.

[Figure 1: Plot of probability density function (Exponential(0.5) density $f(x)$ against time, with the interval from $x$ to $x + dx$ marked)]

The survival function $S(x)$ corresponds to the area under the curve to the right of $x$, and

$$f(x)\,dx \approx P(x \le X < x + dx) = S(x) - S(x + dx).$$

$f(x)\,dx$ is the infinitesimal probability of failure at $x$, unconditional on whether the individual is alive just prior to $x$.

For a discrete random variable:

$$S(t) = P(T > t) = \sum_{u > t} f(u) = \sum_{a_j > t} f(a_j) = \sum_{a_j > t} f_j$$
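A small numerical check of the two continuous-case facts above, assuming the Exponential(0.5) density drawn in Figure 1: integrating $f$ from $t$ to a large cutoff recovers $S(t) = e^{-0.5t}$, and $f(x)\,dx$ is close to $S(x) - S(x + dx)$ for small $dx$. The cutoff `upper` standing in for $\infty$ and the midpoint rule are illustrative choices, not part of the notes.

```python
import math

RATE = 0.5  # Exponential(0.5), the density shown in Figure 1


def f(t: float) -> float:
    """Exponential density f(t) = RATE * exp(-RATE * t) for t >= 0."""
    return RATE * math.exp(-RATE * t) if t >= 0 else 0.0


def survival_by_integration(t: float, upper: float = 80.0, n: int = 200_000) -> float:
    """Approximate S(t) = integral of f over [t, infinity) with the midpoint rule.

    The upper limit is truncated at `upper` (an assumption); exp(-RATE * 80)
    is about 4e-18, so the neglected tail is negligible here.
    """
    width = (upper - t) / n
    return sum(f(t + (i + 0.5) * width) for i in range(n)) * width


x, dx = 2.0, 1e-3
s_numeric = survival_by_integration(x)
s_exact = math.exp(-RATE * x)                          # closed form S(x)
interval_prob = s_exact - math.exp(-RATE * (x + dx))   # S(x) - S(x + dx)
print(s_numeric, s_exact)
print(f(x) * dx, interval_prob)  # nearly equal for small dx
```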

Notes:
1. From the definition of $S(t)$ for a continuous variable, $S(t) = 1 - F(t)$ as long as $F(t)$ is absolutely continuous w.r.t. the Lebesgue measure. [That is, $F(t)$ has a density function.]
2. For a discrete variable, we have to decide what to do if an event occurs exactly at time $t$; i.e., does that become part of $F(t)$ or $S(t)$?
3. To get around this problem, several books define $S(t) = P(T > t)$, or else define $F(t) = P(T < t)$ (e.g., Collett). K&M used $S(t) = P(T > t)$.

Hazard function h(t)

Sometimes called an instantaneous failure rate, the force of mortality, or the age-specific failure rate.

1. Continuous random variables:

$$h(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\Pr(t \le T < t + \Delta t \mid T \ge t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\frac{\Pr([t \le T < t + \Delta t] \cap [T \ge t])}{\Pr(T \ge t)} = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\frac{\Pr(t \le T < t + \Delta t)}{\Pr(T \ge t)} = \frac{f(t)}{S(t)}$$

$h(t)\,dt$ is the infinitesimal probability of failure at the next instant after $t$, given that one is alive at $t$.

2. Discrete random variables:

$$h(a_j) \equiv h_j = \Pr(T = a_j \mid T \ge a_j) = \frac{\Pr(T = a_j)}{\Pr(T \ge a_j)} = \frac{f(a_j)}{S(a_{j-1})} = \frac{f(a_j)}{\sum_{k: a_k > a_{j-1}} f(a_k)}$$

Cumulative hazard function H(t)

Continuous random variables: $H(t) = \int_0^t h(u)\,du$

Discrete random variables: $H(t) = \sum_{k: a_k \le t} h_k$

Relationship between S(t) and h(t)

We've already shown that, for a continuous r.v., $h(t) = f(t)/S(t)$. For a left-continuous survivor function $S(t)$, we can show that $f(t) = -S'(t)$. We can use this relationship to show that

$$-\frac{d}{dt}\left[\log S(t)\right] = -\frac{S'(t)}{S(t)} = \frac{f(t)}{S(t)}.$$

So another way to write $h(t)$ is as follows:

$$h(t) = -\frac{d}{dt}\left[\log S(t)\right]$$
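The characterizations of the hazard can be compared numerically. The sketch below assumes a hypothetical survivor function $S(t) = e^{-t^2}$ (any smooth survivor function would do) and computes $h(t)$ three ways: as $f(t)/S(t)$, as the conditional failure probability over a short interval divided by its length, and as $-\frac{d}{dt}\log S(t)$ by a central difference. It also checks $S(t) = e^{-H(t)}$ with $H$ obtained by integrating $h$ numerically.

```python
import math

# Hypothetical example: S(t) = exp(-LAM * t**GAM) with LAM = 1, GAM = 2
LAM, GAM = 1.0, 2.0


def S(t: float) -> float:
    return math.exp(-LAM * t ** GAM)


def f(t: float) -> float:
    # f(t) = -S'(t)
    return GAM * LAM * t ** (GAM - 1) * math.exp(-LAM * t ** GAM)


t, dt = 0.8, 1e-6
h_ratio = f(t) / S(t)                                    # h(t) = f(t) / S(t)
h_limit = (S(t) - S(t + dt)) / S(t) / dt                 # (1/dt) Pr(t <= T < t + dt | T >= t)
h_logderiv = -(math.log(S(t + dt)) - math.log(S(t - dt))) / (2 * dt)  # -d/dt log S(t)

# H(t) = integral of h over [0, t] (midpoint rule), then compare exp(-H(t)) with S(t)
n = 10_000
step = t / n
H = sum(f((i + 0.5) * step) / S((i + 0.5) * step) for i in range(n)) * step
print(h_ratio, h_limit, h_logderiv)
print(S(t), math.exp(-H))
```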

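Relationship between S(t) and H(t)

Continuous case:

$$H(t) = \int_0^t h(u)\,du = \int_0^t \frac{f(u)}{S(u)}\,du = \int_0^t -\frac{d}{du}\log S(u)\,du = -\log S(t) + \log S(0)$$

Since $S(0) = 1$, this gives $S(t) = e^{-H(t)}$.

Discrete case: suppose that $a_1 < a_2 < \dots < a_K$, and $a_j \le t < a_{j+1}$.

1st way to derive it:

$$\begin{aligned} S(t) &= P(T > t) = P(T \ge a_{j+1}) = P(T \ge a_1, T \ge a_2, \dots, T \ge a_{j+1}) \\ &= P(T \ge a_1)\,P(T \ge a_2 \mid T \ge a_1)\cdots P(T \ge a_{j+1} \mid T \ge a_j) \\ &= P(T \ge a_1)\,[1 - P(T = a_1 \mid T \ge a_1)]\cdots[1 - P(T = a_j \mid T \ge a_j)] \\ &= 1 \cdot (1 - h(a_1))\cdots(1 - h(a_j)) = \prod_{k: a_k \le t}(1 - h(a_k)). \end{aligned}$$

2nd way to derive it: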

Since

$$h(a_j) = \frac{f(a_j)}{S(a_{j-1})} = \frac{S(a_{j-1}) - S(a_j)}{S(a_{j-1})} = 1 - \frac{S(a_j)}{S(a_{j-1})}, \qquad j = 1, \dots, K,$$

we have

$$S(a_j) = (1 - h(a_j))\,S(a_{j-1}) = \dots = (1 - h(a_j))\cdots(1 - h(a_1))\,S(a_0) = (1 - h(a_j))\cdots(1 - h(a_1)).$$

The last equality holds because $S(a_0) = 1$. Now we have $S(a_j) = \prod_{a_k \le a_j}\{1 - h(a_k)\}$. Since $h(x) = 0$ for $x \notin \{a_1, \dots, a_K\}$, we have, for $a_j \le t < a_{j+1}$,

$$S(t) = S(a_j) = \prod_{k: a_k \le t}\{1 - h(a_k)\}.$$

Cox defines

$$H(t) = -\sum_{k: a_k \le t}\log(1 - h_k) \qquad (1)$$

so that $S(t) = e^{-H(t)}$ in the discrete case as well. K&M used

$$H(t) = \sum_{k: a_k \le t} h_k. \qquad (2)$$

Equation (2) is an approximation of (1) when the $h_k$ are small (try $-\log(1 - h_k) \approx h_k$ when $h_k \to 0$).

Example (discrete): $f_j = P(X = j) = 1/3$, $j = 1, 2, 3$. $S(x) = ?$ (see Figure 2) $h(x) = ?$

[Figure 2: Survival function for a discrete random lifetime (survival probability, 0 to 1, against time, 0 to 4)]
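The discrete example can be worked exactly with rational arithmetic. The sketch below uses Python's `fractions` module (the helper names are hypothetical) to compute $S(x) = P(X > x)$ and $h(a_j) = f(a_j)/P(X \ge a_j)$, confirm the product formula $S(t) = \prod_{k: a_k \le t}\{1 - h(a_k)\}$, and compare Cox's cumulative hazard (1) with the K&M version (2) at $t = 2.5$. Running it gives $S(x) = 1, 2/3, 1/3, 0$ on $x < 1$, $[1, 2)$, $[2, 3)$, $x \ge 3$, and $h(1) = 1/3$, $h(2) = 1/2$, $h(3) = 1$.

```python
import math
from fractions import Fraction

# f_j = P(X = j) = 1/3 for j = 1, 2, 3
pmf = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}


def survival(t):
    """S(t) = P(X > t): sum of f(a_j) over support points a_j > t."""
    return sum((p for a, p in pmf.items() if a > t), Fraction(0))


def hazard(a):
    """h(a_j) = P(X = a_j | X >= a_j) = f(a_j) / S(a_{j-1})."""
    at_risk = sum((p for b, p in pmf.items() if b >= a), Fraction(0))
    return pmf[a] / at_risk


def survival_product(t):
    """S(t) = product over a_k <= t of (1 - h(a_k))."""
    result = Fraction(1)
    for a in sorted(pmf):
        if a <= t:
            result *= 1 - hazard(a)
    return result


times = [0.5, 1, 1.5, 2, 2.5, 3, 4]
print([survival(t) for t in times])
print({a: hazard(a) for a in pmf})

# Cox's H(t), equation (1), versus K&M's H(t), equation (2), at t = 2.5
t = 2.5
h_up_to_t = [hazard(a) for a in sorted(pmf) if a <= t]
H_cox = -sum(math.log(float(1 - h)) for h in h_up_to_t)
H_km = float(sum(h_up_to_t))
print(math.exp(-H_cox), float(survival(t)))  # agree up to rounding: S(t) = exp(-H(t))
print(H_cox, H_km)  # log(3) ~ 1.10 versus 5/6 ~ 0.83: these hazards are not small
```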

Measuring central tendency in survival

Mean survival (call this $\mu$):

$$\mu = \begin{cases} \sum_{j=1}^{n} a_j f_j & \text{for discrete } T \\ \int_0^\infty u f(u)\,du = \int_0^\infty S(u)\,du & \text{for continuous } T \end{cases}$$

Mean survival is the area under the survival curve.

Mean residual life: $\text{mrl}(x) = E(X - x \mid X > x)$. For a continuous variable $X$,

$$\text{mrl}(x) = \frac{\int_x^\infty (t - x) f(t)\,dt}{S(x)} = \frac{\int_x^\infty S(t)\,dt}{S(x)} \quad \text{(integration by parts)}.$$

For example, cancer survivors might want to know how long, on average, they can expect to live after 5 years of relapse-free survival. The Census has been reporting remaining life expectancy in years stratified by gender and race. According to the 2005 data, for women of all races, mrl(0) = 80.4, mrl(65) = 20, and mrl(75) = 12.8.

Median survival (call this $\tau$) is defined by $S(\tau) = 0.5$. In practice, we don't usually hit the median survival at exactly one of the failure times. In this case, the estimated median survival is the smallest time $\tau$ such that $\hat{S}(\tau) \le 0.5$.

The $p$th quantile (also referred to as the $100p$th percentile) of the distribution of $X$, $x_p$, satisfies $S(x_p) \le 1 - p$, i.e., $x_p = \inf\{t : S(t) \le 1 - p\}$.

Example: $X \sim$ exponential($\lambda$). What are the mean, mrl($x$), and median?
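For the exponential example, the integral formulas above can be evaluated numerically and compared with the closed forms $\mu = 1/\lambda$, $\text{mrl}(x) = 1/\lambda$, and $\tau = \log 2/\lambda$. The rate $\lambda = 0.5$, the evaluation point $x = 3$, and the finite cutoff standing in for $\infty$ are assumptions for illustration.

```python
import math

lam = 0.5  # assumed rate of the exponential distribution


def S(t: float) -> float:
    return math.exp(-lam * t)


def integrate(g, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / n
    return sum(g(a + (i + 0.5) * width) for i in range(n)) * width


UPPER = 100.0  # cutoff standing in for infinity; S(100) = exp(-50) is negligible

mean = integrate(S, 0.0, UPPER)          # mu = integral of S(u) over [0, infinity)
x = 3.0
mrl = integrate(S, x, UPPER) / S(x)      # mrl(x) = integral of S over [x, infinity) / S(x)
median = -math.log(0.5) / lam            # solves S(tau) = 0.5
print(mean, mrl, median)                 # mean and mrl(3) both close to 1 / lam = 2
```

The mean residual life does not depend on $x$ here, which is one way of seeing the lack of memory of the exponential distribution.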

Example: $X \sim$ Log-normal($\mu, \sigma^2$). What is $x_p$?
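Since $\log X \sim N(\mu, \sigma^2)$, $S(t) = 1 - \Phi((\log t - \mu)/\sigma)$ is continuous and strictly decreasing, so solving $S(x_p) = 1 - p$ gives $x_p = \exp\{\mu + \sigma\,\Phi^{-1}(p)\}$. A short check of this, using Python's `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$; the parameter values are assumptions for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf (Phi) and inv_cdf (Phi^{-1})


def lognormal_survival(t: float, mu: float, sigma: float) -> float:
    """S(t) = 1 - Phi((log t - mu) / sigma)."""
    return 1.0 - STD_NORMAL.cdf((math.log(t) - mu) / sigma)


def lognormal_quantile(p: float, mu: float, sigma: float) -> float:
    """x_p = exp(mu + sigma * Phi^{-1}(p)), the solution of S(x_p) = 1 - p."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))


mu, sigma = 1.0, 0.5  # assumed example parameters
for p in (0.1, 0.5, 0.9):
    x_p = lognormal_quantile(p, mu, sigma)
    print(p, x_p, lognormal_survival(x_p, mu, sigma))  # last column is close to 1 - p
```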

Hazard functions can have different shapes, as shown in Figure 3.

[Figure 3: Hazard functions of different shapes, $h(x)$ against $x$, with curves labelled (i) to (v)]

(i) constant: e.g., survival of patients with advanced chronic disease
(ii) increasing: e.g., aging after 65
(iii) decreasing: e.g., survival after surgery
(iv) bathtub-shaped: e.g., age-specific mortality
(v) hump-shaped: e.g., tuberculosis

Estimating the survival or hazard function

We can estimate the survival (or hazard) function in two ways:
- by specifying a parametric model for $h(t)$ based on a particular density function $f(t)$;
- by developing an empirical estimate of the survival function (i.e., non-parametric estimation).

If there is no censoring: the empirical estimate of the survival function, $\hat{S}(t)$, is the proportion of individuals with event times greater than $t$. Ex. 1, 2, 3

With censoring: if there are censored observations, then this simple proportion is not a good estimate of the true $S(t)$, so other non-parametric methods that account for censoring must be used (life-table methods, Kaplan-Meier estimator). Ex. 1, 2+, 3 (the observation at 2 is censored)
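A sketch of the two estimates for the toy examples, with hypothetical helper names. Without censoring the empirical proportion is used directly. With the observation at 2 censored, the Kaplan-Meier product $\prod_{t_i \le t}(1 - d_i/n_i)$ over distinct failure times is used instead, where $n_i$ is the number at risk and $d_i$ the number of failures at $t_i$. The Kaplan-Meier estimator is named in the notes; this particular implementation, including counting a time censored at $t_i$ as still at risk at $t_i$, is an illustrative assumption.

```python
def empirical_survival(times, t):
    """Proportion of event times greater than t (appropriate only without censoring)."""
    return sum(1 for x in times if x > t) / len(times)


def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t).

    events[k] is 1 if times[k] is an observed failure and 0 if it is right-censored.
    The estimate is the product over distinct failure times t_i <= t of (1 - d_i / n_i).
    """
    s_hat = 1.0
    failure_times = sorted({x for x, e in zip(times, events) if e == 1})
    for u in failure_times:
        if u > t:
            break
        n_at_risk = sum(1 for x in times if x >= u)  # censored at u still counts as at risk
        deaths = sum(1 for x, e in zip(times, events) if x == u and e == 1)
        s_hat *= 1.0 - deaths / n_at_risk
    return s_hat


grid = (0.5, 1.5, 2.5, 3.5)
# Ex. 1, 2, 3: no censoring
print([empirical_survival([1, 2, 3], t) for t in grid])
# Ex. 1, 2+, 3: the observation at 2 is censored
print([kaplan_meier([1, 2, 3], [1, 0, 1], t) for t in grid])
```

With the censored observation, the estimate stays at 2/3 between times 1 and 3, because the individual censored at 2 is not counted as a failure.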

Some Parametric Survival Distributions

1. The exponential distribution (1 parameter, $\lambda > 0$)

$f(t) = \lambda e^{-\lambda t}$ for $t \ge 0$
$S(t) = \int_t^\infty f(u)\,du = e^{-\lambda t}$
$h(t) = f(t)/S(t) = \lambda$: constant hazard!
$H(t) = \int_0^t h(u)\,du = \int_0^t \lambda\,du = \lambda t$

Check: does $S(t) = e^{-H(t)}$?

- median: solve $0.5 = S(\tau) = e^{-\lambda\tau}$, so $\tau = -\log(0.5)/\lambda$
- mean: $\int_0^\infty u \lambda e^{-\lambda u}\,du = \lambda^{-1}$
- mrl: for $X \sim$ exponential($\lambda$), what is mrl($x$)?
- lack of memory: for $t_0 > 0$, $(T - t_0 \mid T > t_0) \sim T$ (reason? HW)
- coefficient of variation = s.d./mean = 1
- empirical check of the data: plot $\log \hat{S}(t)$ vs. $t$ (should approximate a straight line through the origin); what's the slope? (reason? HW)
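The exponential properties can be checked by simulation. The sketch below (seed, rate, and sample size are arbitrary assumptions) draws from exponential($\lambda$) with `random.expovariate`, fits a line through the origin to $\log \hat{S}(t)$ against $t$, and checks that the coefficient of variation is near 1 and that the mean excess life beyond $t_0$ is near $1/\lambda$, as lack of memory implies. Since $\log S(t) = -\lambda t$, the fitted slope should come out near $-\lambda$.

```python
import math
import random
import statistics

random.seed(2012)
lam = 0.5        # assumed true rate
n = 20_000
sample = sorted(random.expovariate(lam) for _ in range(n))

# Empirical survival just after the i-th ordered time (0-based i): S_hat = (n - i - 1) / n.
# Drop the top 5% of times, where S_hat is tiny and log S_hat is noisy.
kept = sample[: n - n // 20]
log_s_hat = [math.log((n - i - 1) / n) for i in range(len(kept))]

# Least-squares slope of a line through the origin: sum(t * y) / sum(t * t)
slope = sum(t * y for t, y in zip(kept, log_s_hat)) / sum(t * t for t in kept)

cv = statistics.stdev(sample) / statistics.fmean(sample)   # coefficient of variation

t0 = 2.0
excess = [t - t0 for t in sample if t > t0]                 # T - t0 given T > t0
mean_excess = statistics.fmean(excess)

print(slope, cv, mean_excess)  # roughly -lam, 1, and 1 / lam
```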

If $T$ has an arbitrary continuous distribution, then $H(T)$ has an exponential distribution with unit parameter (reason? HW. Hint: $S(T) \sim \text{Unif}(0, 1)$ for any continuous r.v. $T$.)
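A simulation sketch of this fact (not a proof), assuming a lifetime with $S(t) = e^{-\lambda t^\gamma}$, so $H(t) = \lambda t^\gamma$, as the "arbitrary" continuous distribution: the transformed values $H(T_i)$ should have mean and variance near 1 and $P\{H(T) > 1\} \approx e^{-1}$. Python's `random.weibullvariate(alpha, beta)` draws from $S(t) = \exp\{-(t/\alpha)^\beta\}$, so $\alpha = \lambda^{-1/\gamma}$ and $\beta = \gamma$ match this parameterization. Seed and parameter values are arbitrary.

```python
import math
import random
import statistics

random.seed(14)
lam, gam = 2.0, 1.5  # assumed parameters: S(t) = exp(-lam * t**gam), H(t) = lam * t**gam

# random.weibullvariate(alpha, beta) has S(t) = exp(-(t / alpha)**beta);
# alpha = lam**(-1/gam), beta = gam gives S(t) = exp(-lam * t**gam).
alpha = lam ** (-1.0 / gam)
T = [random.weibullvariate(alpha, gam) for _ in range(50_000)]

E = [lam * t ** gam for t in T]       # E_i = H(T_i), which should be Exp(1)
mean_E = statistics.fmean(E)
var_E = statistics.variance(E)
frac_above_1 = sum(e > 1.0 for e in E) / len(E)
print(mean_E, var_E, frac_above_1, math.exp(-1))
```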

2. The Weibull distribution (2 parameters), Weibull($\gamma$, $\lambda$)

Generalizes the exponential:
$S(t) = e^{-\lambda t^\gamma}$
$f(t) = -\frac{d}{dt}S(t) = \gamma\lambda t^{\gamma - 1} e^{-\lambda t^\gamma}$
$h(t) = \gamma\lambda t^{\gamma - 1}$
$H(t) = \int_0^t h(u)\,du = \lambda t^\gamma$

$\lambda$: the scale parameter; $\gamma$: the shape parameter.

The Weibull distribution is convenient because of its simple form. It allows several hazard shapes:
- $\gamma = 1$: constant hazard
- $0 < \gamma < 1$: decreasing hazard
- $\gamma > 1$: increasing hazard

It is an important generalization of the exponential distribution that allows for a power dependence of the hazard on time.

Empirical check of the data: plot $\log(-\log \hat{S}(t))$ vs. $\log t$; the plot should give approximately a straight line with slope $\gamma$ and intercept $\log\lambda$ (reason?).

Figure 4: Hazard functions of the Weibull distribution

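3. The log-normal distribution (with parameters $\mu$ and $\sigma$): $\log(T) \sim N(\mu, \sigma^2)$

$$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma t}\exp\left\{-\frac{(\log t - \mu)^2}{2\sigma^2}\right\}$$

$$S(t) = 1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right), \qquad F(t) = \Phi\left(\frac{\log t - \mu}{\sigma}\right)$$

$$h(t) = \frac{f(t)}{S(t)} = \frac{\phi\left(\frac{\log t - \mu}{\sigma}\right)/(t\sigma)}{1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right)},$$

where the denominator is an incomplete normal integral, $\phi$ is the density function of the standard normal distribution, and $\Phi$ is the cumulative distribution function of the standard normal distribution.

- simple to apply if there is no censoring
- sensitive to small failure times
- the log-logistic distribution provides a good approximation to the log-normal distribution (and may frequently be a preferable survival time model)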

Figure 5: Hazard functions of the log-normal distribution
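A sketch of the log-normal hazard formula, using `statistics.NormalDist` for $\phi$ and $\Phi$ and assuming $\mu = 0$, $\sigma = 1$: evaluating $h(t)$ on a grid shows the hump shape, rising from near 0, peaking at an interior time, and then decreasing.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # phi = STD_NORMAL.pdf, Phi = STD_NORMAL.cdf


def lognormal_hazard(t: float, mu: float, sigma: float) -> float:
    """h(t) = [phi(z) / (t * sigma)] / [1 - Phi(z)] with z = (log t - mu) / sigma."""
    z = (math.log(t) - mu) / sigma
    return (STD_NORMAL.pdf(z) / (t * sigma)) / (1.0 - STD_NORMAL.cdf(z))


mu, sigma = 0.0, 1.0                         # assumed example parameters
grid = [0.05 * k for k in range(1, 201)]      # t from 0.05 to 10
hazards = [lognormal_hazard(t, mu, sigma) for t in grid]
peak_index = hazards.index(max(hazards))
print(grid[peak_index], max(hazards))         # interior peak: the hazard is hump-shaped
print(hazards[0], hazards[-1])                # smaller at both ends of the grid
```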

4. The log-logistic distribution: $X \sim$ Log-logistic($\mu, \sigma^2$) if $Y = \ln X \sim$ logistic($\mu, \sigma^2$).

If $W$ is standardized logistic, then $f_W(w) = e^w/\{1 + e^w\}^2$ and $S_W(w) = 1/\{1 + e^w\}$. $Y = \mu + \sigma W \sim$ logistic($\mu, \sigma^2$), with pdf

$$f_Y(y) = \frac{e^{(y - \mu)/\sigma}}{\sigma\left(1 + e^{(y - \mu)/\sigma}\right)^2}.$$

Then

$$S_X(x) = S_W\left(\frac{\ln x - \mu}{\sigma}\right) = \frac{1}{1 + \exp\left\{\frac{\ln x - \mu}{\sigma}\right\}} = \frac{1}{1 + \lambda x^\alpha}, \qquad h_X(x) = \frac{\lambda\alpha x^{\alpha - 1}}{1 + \lambda x^\alpha},$$

where $\alpha = 1/\sigma$ and $\lambda = e^{-\mu/\sigma}$.

- $h_X(x)$ is monotone decreasing when $\alpha \le 1$: it decreases from $\infty$ when $\alpha < 1$ and from $\lambda$ when $\alpha = 1$.
- For $\alpha > 1$, $h_X(x)$ increases initially to a maximum value at time $\{(\alpha - 1)/\lambda\}^{1/\alpha}$, and then decreases to 0 as time approaches infinity.
- relatively simple explicit forms for $S(t)$, $f(t)$, and $h(t)$ (vs. the log-normal)
- more convenient than the log-normal distribution for handling censored data
- provides a good approximation to the log-normal distribution except in the extreme tails
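The location of the log-logistic hazard maximum can be confirmed numerically. The sketch assumes $\mu = 0.5$ and $\sigma = 0.5$ (so $\alpha = 2 > 1$ and $\lambda = e^{-1}$), evaluates $h_X$ on a fine grid around $\{(\alpha - 1)/\lambda\}^{1/\alpha}$, and checks that the grid maximizer lands there. It also shows the $\alpha = 1$ case starting at $h_X(0) = \lambda$ and decreasing.

```python
import math


def loglogistic_hazard(x: float, mu: float, sigma: float) -> float:
    """h_X(x) = lam * alpha * x**(alpha - 1) / (1 + lam * x**alpha), alpha = 1/sigma, lam = exp(-mu/sigma)."""
    alpha = 1.0 / sigma
    lam = math.exp(-mu / sigma)
    return lam * alpha * x ** (alpha - 1) / (1.0 + lam * x ** alpha)


mu, sigma = 0.5, 0.5                            # assumed example parameters (alpha = 2)
alpha, lam = 1.0 / sigma, math.exp(-mu / sigma)
x_star = ((alpha - 1) / lam) ** (1.0 / alpha)   # stated location of the maximum

grid = [x_star * (0.5 + 0.001 * k) for k in range(1001)]   # x_star / 2 to 1.5 * x_star
x_best = max(grid, key=lambda x: loglogistic_hazard(x, mu, sigma))
print(x_star, x_best)

# alpha = 1 (sigma = 1): the hazard starts at lam = exp(-mu) and decreases
print(loglogistic_hazard(0.0, 0.5, 1.0), math.exp(-0.5))
print(loglogistic_hazard(1.0, 0.5, 1.0) < loglogistic_hazard(0.5, 0.5, 1.0))
```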

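5. The gamma distribution: another extension of the exponential distribution.

$X \sim$ gamma($\lambda, \gamma$), $\lambda, \gamma > 0$:

$$f(x) = \frac{\lambda^\gamma x^{\gamma - 1} e^{-\lambda x}}{\Gamma(\gamma)}$$

- No closed form for $h(\cdot)$ and $S(\cdot)$ in general.
- $\gamma = 1$: exponential($\lambda$).
- As $\gamma \to \infty$: approximately a normal distribution.
- $\lambda = 1/2$, $\gamma$ an integer: $\chi^2_{2\gamma}$.
- When $\gamma > 1$, $h(x)$ is monotone increasing, with $h(0) = 0$ and $h(x) \to \lambda$ as $x \to \infty$.
- When $\gamma < 1$, $h(x)$ is monotone decreasing, with $h(0) = \infty$ and $h(x) \to \lambda$ as $x \to \infty$.
- Not widely used; the Weibull is more popular.

Figure 6: Hazard functions of the gamma distribution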
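The gamma hazard has no closed form in general, but when the shape $\gamma$ is a positive integer (the Erlang case, an assumption made only for this sketch) the survivor function is the finite sum $S(x) = e^{-\lambda x}\sum_{j=0}^{\gamma - 1}(\lambda x)^j/j!$, so $h(x) = f(x)/S(x)$ can be computed directly. The sketch shows $\gamma = 1$ reproducing the constant exponential hazard and $\gamma = 3$ giving a hazard that starts at 0 and increases toward $\lambda$. The $\gamma < 1$ case needs the incomplete gamma function and is not covered here.

```python
import math


def gamma_hazard_integer_shape(x: float, lam: float, k: int) -> float:
    """Hazard of gamma(lam, k) for a positive integer shape k.

    f(x) = lam**k * x**(k-1) * exp(-lam*x) / (k-1)!
    S(x) = exp(-lam*x) * sum_{j=0}^{k-1} (lam*x)**j / j!
    The common factor exp(-lam*x) cancels in f / S, which avoids underflow for large x.
    """
    numerator = lam * (lam * x) ** (k - 1) / math.factorial(k - 1)
    denominator = sum((lam * x) ** j / math.factorial(j) for j in range(k))
    return numerator / denominator


lam = 1.0                                   # assumed rate
xs = [0.0, 0.5, 1.0, 5.0, 20.0, 200.0]
hazards_k1 = [gamma_hazard_integer_shape(x, lam, 1) for x in xs]
hazards_k3 = [gamma_hazard_integer_shape(x, lam, 3) for x in xs]
print(hazards_k1)  # constant lam (exponential case)
print(hazards_k3)  # 0 at x = 0, increasing toward lam
```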

[Figure 7: Hazard functions of different shapes, $h(x)$ against $x$, with curves labelled (i) to (v)]

(i) constant: e.g., exponential
(ii) increasing: e.g., Weibull ($\gamma > 1$)
(iii) decreasing: e.g., Weibull ($0 < \gamma < 1$)
(iv) bathtub-shaped: e.g., a 3-parameter lifetime distribution (see Dimitrakopoulou et al., IEEE Transactions on Reliability, 2007), or the exponential power distribution with $\alpha < 1$ (Smith and Bain, 1975)
(v) hump-shaped: e.g., log-normal

Why use one versus another?
- technical convenience for estimation and inference
- explicit simple forms for $f(t)$, $S(t)$, and $h(t)$
- qualitative shape of the hazard function

One can usually distinguish between a one-parameter model (like the exponential) and a two-parameter model (like the Weibull or log-normal) in terms of the adequacy of fit to a dataset. Without a lot of data, it may be hard to distinguish between the fits of various 2-parameter models (e.g., Weibull vs. log-normal).

Choice of distributions
1. convenience for statistical inference
2. existence of explicit, simple forms for $S(t)$, $f(t)$, and $h(t)$
3. capability of representing both over- and under-dispersion relative to the exponential distribution (coefficient of variation = s.d./mean)
4. qualitative shape of the hazard (monotonicity)
5. behavior of $S(t)$ for small times (guarantee period)
6. behavior of $S(t)$ for large times (medical research)
7. any connection with a special stochastic model of failure

Ways to compare different distributions (to highlight the differences or as a basis for an empirical analysis)
1. It is not effective to consider the density function directly; concentrate on plotting and tabulating.
2. Plot $h(t)$ or $\log h(t)$ vs. $t$ or $\log t$.
3. Plot $H(t)$, $\log S(t)$, or other transforms vs. $t$ or $\log t$.

Discrete failure time models (e.g., continuous data grouped because the recorded times are imprecise): in many applications there is no theoretical justification for adopting a particular parametric model for discrete failure time data.

Some properties useful in assessing distributional form

| $\log h(t)$: is it ... | Distribution | $H(t)$ or $\log H(t)$: is it ... | Distribution |
|---|---|---|---|
| constant? | exponential | linear in $t$? | exponential |
| linear in $t$? | Gompertz ($\rho_0 = 0$) | linear in $t$? | Gompertz ($\rho_0 = 0$) |
| linear in $\log t$? | Weibull | linear in $\log t$? | Weibull |
| nonmonotonic? | log-normal, log-logistic | asymptotically linear in $t$? | distribution with exponential tail |

Regression models for survival data

A typical survival regression setting: let $X$ be the failure time and $Z^T = (Z_1, \dots, Z_p)$ be a $p$-dimensional vector of explanatory variables.

Q: What are we going to model?

Approach 1: log-linear

$$\ln X = \mu + \gamma^T Z + \sigma W,$$

where $W$ has some known distribution $F$, and $\mu$, $\sigma$, $\gamma$ are unknown. Three choices of $F$:
1. $F$ is normal, that is, $W \sim N(0, 1)$; then $\ln X \sim N(\mu + \gamma^T Z, \sigma^2)$.
2. $F$ is the standard extreme value distribution, that is, $f_W(w) = \exp\{w - e^w\}$, $-\infty < w < \infty$. Then $X \sim$ Weibull($\alpha, \lambda$), with $\alpha = 1/\sigma$ and $\lambda = \exp\{-(\mu/\sigma + \gamma^T Z/\sigma)\}$.
3. $F$ is the standardized logistic; then it is the log-logistic regression model.

Two interpretations of $\gamma$:

Let $Z_1$, $Z_2$ be different covariate values; then

$$\frac{E(X \mid Z_1)}{E(X \mid Z_2)} = \frac{e^{\mu + \gamma^T Z_1} E(e^{\sigma W})}{e^{\mu + \gamma^T Z_2} E(e^{\sigma W})} = e^{\gamma^T (Z_1 - Z_2)}, \qquad \ln\frac{E(X \mid Z_1)}{E(X \mid Z_2)} = \gamma^T (Z_1 - Z_2).$$

A unit increase in $Z$ leads to a $\gamma$ increase in the log ratio of the means.

Let $S_0(x) = P(X > x \mid Z = 0)$; then $P(X > x \mid Z) = S_0(x e^{-\gamma^T Z})$. Time is accelerated if $e^{-\gamma^T Z} > 1$ and decelerated if $e^{-\gamma^T Z} < 1$.
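A numerical check of the extreme-value case and of the accelerated-failure-time form, with a single scalar covariate and parameter values that are assumptions for illustration. The survivor function of $\ln X = \mu + \gamma z + \sigma W$, with $S_W(w) = \exp(-e^w)$, should agree with $S_0(x e^{-\gamma z})$ and with the Weibull form $\exp(-\lambda x^\alpha)$. The sketch also uses the fact that $e^W$ is then unit exponential, so $E(e^{\sigma W}) = \Gamma(1 + \sigma)$, to check the log ratio of means.

```python
import math

mu, sigma, gamma = 1.0, 0.7, 0.4  # assumed model parameters (single covariate)


def surv_given_z(x: float, z: float) -> float:
    """P(X > x | Z = z) when ln X = mu + gamma*z + sigma*W and S_W(w) = exp(-exp(w))."""
    return math.exp(-math.exp((math.log(x) - mu - gamma * z) / sigma))


def S0(x: float) -> float:
    """Baseline survivor function P(X > x | Z = 0)."""
    return surv_given_z(x, 0.0)


def mean_given_z(z: float) -> float:
    """E(X | Z = z) = exp(mu + gamma*z) * E(exp(sigma*W)) = exp(mu + gamma*z) * Gamma(1 + sigma)."""
    return math.exp(mu + gamma * z) * math.gamma(1.0 + sigma)


x, z = 2.3, 1.5
alpha = 1.0 / sigma
lam = math.exp(-(mu / sigma + gamma * z / sigma))
print(surv_given_z(x, z))
print(S0(x * math.exp(-gamma * z)))        # AFT form: S0(x e^{-gamma z})
print(math.exp(-lam * x ** alpha))         # Weibull(alpha, lambda) form
print(math.log(mean_given_z(2.0) / mean_given_z(1.0)), gamma)  # unit increase in z
```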

Approach 2: multiplicative or additive hazard models

Multiplicative models: $h(x \mid Z = z) = h_0(x)\,c(\beta^T z)$, where $c$ is a non-negative function of the covariates.

Cox proportional hazards: $h(x \mid Z) = h_0(x)\,e^{\beta^T Z}$
- $h_0(x)$ is the baseline hazard; it may be unspecified or parameterized, e.g., $h_0(x) = e^{\alpha_0 + \alpha_1 x + \alpha_2 x^2}$.
- $\beta$ is the log hazard ratio when $c(\cdot) = \exp(\cdot)$, and it does not depend on time. For each unit increase in $Z$, there is a $\beta$ increase in the log hazard ratio.
- $H(x \mid Z) = e^{\beta^T Z} H_0(x)$, and $S(x \mid Z) = (S_0(x))^{e^{\beta^T Z}}$.

Additive models: $h(x \mid Z) = h_0(x) + \sum_{j=1}^{p} z_j(x)\,\beta_j(x)$. The effects of covariates on survival are allowed to vary with time. Additive models are less frequently used than multiplicative models.
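A sketch of the Cox model identities with a single covariate, assuming (for illustration only) a parameterized Weibull baseline $h_0(x) = \gamma_0 \lambda_0 x^{\gamma_0 - 1}$ with hypothetical values for $\lambda_0$, $\gamma_0$, and $\beta$. The hazard ratio between $z = 1$ and $z = 0$ equals $e^\beta$ at every time, and $S(x \mid z) = S_0(x)^{e^{\beta z}}$ coincides with $\exp\{-e^{\beta z} H_0(x)\}$.

```python
import math

lam0, shape0, beta = 0.2, 1.5, 0.8  # assumed baseline parameters and log hazard ratio


def h0(x: float) -> float:
    """Baseline hazard (Weibull form, an assumption for this sketch)."""
    return shape0 * lam0 * x ** (shape0 - 1)


def H0(x: float) -> float:
    return lam0 * x ** shape0


def S0(x: float) -> float:
    return math.exp(-H0(x))


def hazard(x: float, z: float) -> float:
    """Cox PH: h(x | z) = h0(x) * exp(beta * z)."""
    return h0(x) * math.exp(beta * z)


def survival(x: float, z: float) -> float:
    """S(x | z) = S0(x) ** exp(beta * z)."""
    return S0(x) ** math.exp(beta * z)


ratios = [hazard(x, 1.0) / hazard(x, 0.0) for x in (0.5, 1.0, 5.0, 20.0)]
print(ratios, math.exp(beta))     # the same ratio at every time point
x, z = 3.0, 2.0
print(survival(x, z), math.exp(-math.exp(beta * z) * H0(x)))  # S = exp(-H(x | z))
```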