Bios 323: Applied Survival Analysis
Qingxia (Cindy) Chen
Chapter 2 (2.1-2.6), Fall 2012

Definitions and Notation

There are several equivalent ways to characterize the probability distribution of a survival random variable. Some of these are familiar; others are special to survival analysis. We will focus on the following terms:
- The density function $f(t)$
- The survivor function $S(t)$
- The hazard function $h(t)$
- The cumulative hazard function $H(t)$

Density function $f(t)$

For discrete r.v.s (probability mass function): suppose that $T$ takes values in $a_1, a_2, \ldots, a_n$. Then
\[ f(t) = \Pr(T = t) = \begin{cases} f_j & \text{if } t = a_j,\ j = 1, 2, \ldots, n \\ 0 & \text{if } t \neq a_j,\ j = 1, 2, \ldots, n \end{cases} \]

Density function for continuous r.v.s:
\[ f(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Pr(t \le T \le t + \Delta t) \]

Survivorship function: $S(t) = P(T \ge t)$.

In other settings, the cumulative distribution function, $F(t) = P(T \le t)$, is of interest. In survival analysis, our interest tends to focus on the survival function, $S(t)$.
For a continuous random variable:
\[ S(t) = \int_t^\infty f(\mu)\, d\mu \]

[Figure 1: Plot of probability density function, Exponential(0.5); horizontal axis: Time, with $x$ and $x + dx$ marked; vertical axis: Density $f(x)$]

The survival function $S(x)$ corresponds to the area under the curve to the right of $x$.
\[ f(x)\,dx \approx P(x \le X < x + dx) = S(x) - S(x + dx) \]
$f(x)\,dx$ is the infinitesimal probability of failure at $x$, unconditional on whether the individual is alive just prior to $x$.

For a discrete random variable:
\[ S(t) = P(T > t) = \sum_{\mu > t} f(\mu) = \sum_{a_j > t} f(a_j) = \sum_{a_j > t} f_j \]

Notes:
1. From the definition of $S(t)$ for a continuous variable, $S(t) = 1 - F(t)$ as long as $F(t)$ is absolutely continuous w.r.t. the Lebesgue measure. [That is, $F(t)$ has a density function.]
2. For a discrete variable, we have to decide what to do if an event occurs exactly at time $t$; i.e., does that become part of $F(t)$ or $S(t)$?
3. To get around this problem, several books define $S(t) = P(T > t)$, or else define $F(t) = P(T < t)$ (e.g., Collett). K&M used $S(t) = P(T > t)$.

Hazard Function $h(t)$

Sometimes called an instantaneous failure rate, the force of mortality, or the age-specific failure rate.

1. Continuous random variables:
\[ \begin{aligned}
h(t) &= \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Pr(t \le T \le t + \Delta t \mid T \ge t) \\
&= \lim_{\Delta t \to 0} \frac{1}{\Delta t} \frac{\Pr([t \le T \le t + \Delta t] \cap [T \ge t])}{\Pr(T \ge t)} \\
&= \lim_{\Delta t \to 0} \frac{1}{\Delta t} \frac{\Pr(t \le T \le t + \Delta t)}{\Pr(T \ge t)} \\
&= \frac{f(t)}{S(t)}
\end{aligned} \]
$h(t)\,dt$ is the infinitesimal probability of failure at the next instant after $t$, given that one is alive at $t$.

2. Discrete random variables:
\[ h(a_j) \equiv h_j = \Pr(T = a_j \mid T \ge a_j) = \frac{\Pr(T = a_j)}{\Pr(T \ge a_j)} = \frac{f(a_j)}{S(a_{j-1})} = \frac{f(a_j)}{\sum_{k: a_k > a_{j-1}} f(a_k)} \]

Cumulative Hazard Function $H(t)$

Continuous random variables:
\[ H(t) = \int_0^t h(\mu)\, d\mu \]

Discrete random variables:
\[ H(t) = \sum_{k: a_k \le t} h_k \]
Relationship between $S(t)$ and $h(t)$

We've already shown that, for a continuous r.v.,
\[ h(t) = \frac{f(t)}{S(t)} \]

For a left-continuous survivor function $S(t)$, we can show:
\[ f(t) = -S'(t) \]

We can use this relationship to show that:
\[ \frac{d}{dt} [\log S(t)] = \frac{S'(t)}{S(t)} = -\frac{f(t)}{S(t)} \]

So another way to write $h(t)$ is as follows:
\[ h(t) = -\frac{d}{dt} [\log S(t)] \]
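Relationship between $S(t)$ and $H(t)$

Continuous case:
\[ \begin{aligned}
H(t) &= \int_0^t h(\mu)\, d\mu \\
&= \int_0^t \frac{f(\mu)}{S(\mu)}\, d\mu \\
&= \int_0^t -\frac{d}{d\mu} \log S(\mu)\, d\mu \\
&= -\log S(t) + \log S(0)
\end{aligned} \]
Since $S(0) = 1$, this gives
\[ S(t) = e^{-H(t)} \]

Discrete case: Suppose that $a_1 < a_2 < \cdots < a_K$, and $a_j \le t < a_{j+1}$.

1st way to derive it:
\[ \begin{aligned}
S(t) &= P(T > t) = P(T \ge a_{j+1}) \\
&= P(T \ge a_1, T \ge a_2, \ldots, T \ge a_{j+1}) \\
&= P(T \ge a_1)\, P(T \ge a_2 \mid T \ge a_1) \cdots P(T \ge a_{j+1} \mid T \ge a_j) \\
&= P(T \ge a_1)\, [1 - P(T = a_1 \mid T \ge a_1)] \cdots [1 - P(T = a_j \mid T \ge a_j)] \\
&= 1 \cdot (1 - h(a_1)) \cdots (1 - h(a_j)) \\
&= \prod_{k: a_k \le t} (1 - h(a_k)).
\end{aligned} \]

2nd way to derive it: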
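Since we have
\[ h(a_j) = \frac{f(a_j)}{S(a_{j-1})} = \frac{S(a_{j-1}) - S(a_j)}{S(a_{j-1})} = 1 - \frac{S(a_j)}{S(a_{j-1})}, \quad \text{where } j = 1, \ldots, K, \]
it follows that
\[ \begin{aligned}
S(a_j) &= (1 - h(a_j))\, S(a_{j-1}) \\
&= \cdots \\
&= (1 - h(a_j)) \cdots (1 - h(a_1))\, S(a_0) \\
&= (1 - h(a_j)) \cdots (1 - h(a_1)).
\end{aligned} \]
The last equality holds because $S(a_0) = 1$. Now we have
\[ S(a_j) = \prod_{a_k \le a_j} \{1 - h(a_k)\}. \]
Since $h(x) = 0$ for $x \neq a_1, \ldots, a_K$, we have
\[ S(t) = S(a_j) = \prod_{k: a_k \le t} \{1 - h(a_k)\}. \]

Cox defines
\[ H(t) = -\sum_{k: a_k \le t} \log(1 - h_k) \qquad (1) \]
so that $S(t) = e^{-H(t)}$ in the discrete case as well.

K&M used
\[ H(t) = \sum_{k: a_k \le t} h_k. \qquad (2) \]

Equation (2) is an approximation of (1) when the $h_k$ are small. (Try: $\frac{\log(1 - h_k)}{-h_k} \to 1$ when $h_k \to 0$.)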
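To see how close definitions (1) and (2) are in practice, the following minimal Python sketch compares them on a small set of hazards. The support points and hazard values are made up for illustration and do not come from the notes.

```python
import math

# Hypothetical discrete hazards h_k at ordered support points a_1 < a_2 < ...
support = [1, 2, 3, 4, 5]
hazards = [0.02, 0.03, 0.05, 0.04, 0.06]

def survival_product(t):
    """S(t) = product over a_k <= t of (1 - h_k)."""
    s = 1.0
    for a, h in zip(support, hazards):
        if a <= t:
            s *= 1.0 - h
    return s

def cum_hazard_cox(t):
    """Definition (1): H(t) = -sum over a_k <= t of log(1 - h_k)."""
    return -sum(math.log(1.0 - h) for a, h in zip(support, hazards) if a <= t)

def cum_hazard_km(t):
    """Definition (2): H(t) = sum over a_k <= t of h_k."""
    return sum(h for a, h in zip(support, hazards) if a <= t)

t = 5
s_exact = survival_product(t)
s_from_cox = math.exp(-cum_hazard_cox(t))   # matches the product up to rounding
s_from_km = math.exp(-cum_hazard_km(t))     # close, but only approximate
print(s_exact, s_from_cox, s_from_km)
```

With hazards this small, the value based on (2) differs from the exact product only in the third decimal place; larger hazards would widen the gap.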
Example (discrete): $f_j = P(X = j) = 1/3$, $j = 1, 2, 3$.

$S(x) = ?$ (in Figure 2)
$h(x) = ?$

[Figure 2: Survival function for a discrete random lifetime; horizontal axis: Time (0 to 4); vertical axis: Survival Probability (0.0 to 1.0)]
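The example can be checked numerically. The sketch below applies the definitions from earlier in the chapter, $S(x) = P(X > x)$ and $h(a_j) = P(X = a_j \mid X \ge a_j)$, using exact fractions; the function names are illustrative choices.

```python
from fractions import Fraction

# Support points and probability masses from the example: P(X = j) = 1/3, j = 1, 2, 3.
support = [1, 2, 3]
pmf = {a: Fraction(1, 3) for a in support}

def survival(x):
    """S(x) = P(X > x), the K&M convention used in these notes."""
    return sum(p for a, p in pmf.items() if a > x)

def hazard(a_j):
    """h(a_j) = P(X = a_j | X >= a_j) = f(a_j) / P(X >= a_j)."""
    at_risk = sum(p for a, p in pmf.items() if a >= a_j)
    return pmf[a_j] / at_risk

for a in support:
    print(a, survival(a), hazard(a))
```

This gives $S(1) = 2/3$, $S(2) = 1/3$, $S(3) = 0$ and hazards $1/3$, $1/2$, $1$: the survival function is a step function that drops at each support point, and the hazard rises because fewer individuals remain at risk.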
Measuring Central Tendency in Survival

Mean survival: call this $\mu$.
\[ \mu = \sum_{j=1}^n a_j f_j \quad \text{for discrete } T \]
\[ \mu = \int_0^\infty t f(t)\, dt = \int_0^\infty S(t)\, dt \quad \text{for continuous } T \]
Mean survival is the area under the curve of the survival function.

Mean residual life: $\mathrm{mrl}(x) = E(X - x \mid X > x)$. For a continuous variable $X$,
\[ \mathrm{mrl}(x) = \frac{\int_x^\infty (t - x) f(t)\, dt}{S(x)} = \frac{\int_x^\infty S(t)\, dt}{S(x)} \quad \text{(integration by parts)}. \]
For example, cancer survivors might want to know how long they can expect to live, on average, after 5 years of relapse-free survival. The Census has been reporting remaining life expectancy in years stratified by gender and race. According to the 2005 data, for women of all races, $\mathrm{mrl}(0) = 80.4$, $\mathrm{mrl}(65) = 20$, and $\mathrm{mrl}(75) = 12.8$.

Median survival: call this $\tau$; it is defined by
\[ S(\tau) = 0.5 \]
In practice, we don't usually hit the median survival at exactly one of the failure times. In this case, the estimated median survival is the smallest time $\tau$ such that
\[ \hat{S}(\tau) \le 0.5 \]

The $p$th quantile (also referred to as the $100p$th percentile) of the distribution of $X$, $x_p$, satisfies $S(x_p) \le 1 - p$, i.e. $x_p = \inf\{t : S(t) \le 1 - p\}$.

Example: $X \sim$ exponential$(\lambda)$. What are the mean, $\mathrm{mrl}(x)$ and median?
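One way to explore the exponential example is to evaluate the formulas above numerically. The sketch below assumes $S(t) = e^{-\lambda t}$ with an illustrative rate $\lambda = 0.5$ (not a value from the notes), approximates the integrals with a trapezoidal rule, and truncates the infinite upper limit.

```python
import math

lam = 0.5  # illustrative rate parameter (an assumption, not from the notes)

def survival(t):
    """Exponential survival function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def integrate_survival(lower, upper, steps=100_000):
    """Trapezoidal approximation of the integral of S(t) over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (survival(lower) + survival(upper))
    for i in range(1, steps):
        total += survival(lower + i * width)
    return total * width

def mean_residual_life(x, tail=60.0 / lam):
    """mrl(x) = integral of S(t) from x to infinity, divided by S(x).
    The infinite upper limit is truncated at x + tail."""
    return integrate_survival(x, x + tail) / survival(x)

mean = mean_residual_life(0.0)   # the mean equals mrl(0)
mrl_at_3 = mean_residual_life(3.0)
median = math.log(2.0) / lam     # solves S(tau) = 0.5
print(mean, mrl_at_3, median)
```

Both the mean and the mean residual life at $x = 3$ come out at approximately $1/\lambda = 2$, which is consistent with the lack-of-memory property of the exponential distribution.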
Example: $X \sim$ Log-normal$(\mu, \sigma^2)$. What is $x_p$?
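A possible numerical check of this example, assuming the parameterization $\log X \sim N(\mu, \sigma^2)$: solving $S(x_p) = 1 - p$ gives $x_p = \exp(\mu + \sigma z_p)$, where $z_p$ is the standard normal $p$th quantile. The parameter values below are illustrative assumptions.

```python
import math
from statistics import NormalDist

mu, sigma = 1.0, 0.5       # illustrative parameters (assumptions)
std_normal = NormalDist()  # standard normal: mean 0, s.d. 1

def lognormal_survival(x):
    """S(x) = 1 - Phi((log x - mu) / sigma)."""
    return 1.0 - std_normal.cdf((math.log(x) - mu) / sigma)

def lognormal_quantile(p):
    """x_p = exp(mu + sigma * z_p), with z_p the standard normal p-th quantile."""
    return math.exp(mu + sigma * std_normal.inv_cdf(p))

for p in (0.25, 0.5, 0.9):
    x_p = lognormal_quantile(p)
    print(p, x_p, lognormal_survival(x_p))  # last column should be close to 1 - p
```

In particular, the median is $x_{0.5} = e^{\mu}$, since $z_{0.5} = 0$.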
Hazard functions can be of different shapes, as shown in Figure 3.

[Figure 3: Hazard functions of different shapes; curves (i) to (v) of $h(x)$]

(i) constant: e.g. survival of patients with advanced chronic disease
(ii) increasing: e.g. aging after 65
(iii) decreasing: e.g. survival after surgery
(iv) bathtub-shaped: e.g. age-specific mortality
(v) hump-shaped: e.g. tuberculosis
Estimating the survival or hazard function

We can estimate the survival (or hazard) function in two ways:
- by specifying a parametric model for $h(t)$ based on a particular density function $f(t)$
- by developing an empirical estimate of the survival function (i.e., non-parametric estimation)

If no censoring: the empirical estimate of the survival function, $\tilde{S}(t)$, is the proportion of individuals with event times greater than $t$.
Ex. 1, 2, 3

With censoring: if there are censored observations, then $\tilde{S}(t)$ is not a good estimate of the true $S(t)$, so other non-parametric methods must be used to account for censoring (life-table methods, Kaplan-Meier estimator).
Ex. 1, 2+, 3
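For the uncensored case, the empirical estimate is just a proportion. A minimal sketch on the example data 1, 2, 3 (the function name is an illustrative choice):

```python
def empirical_survival(event_times, t):
    """Proportion of observed event times strictly greater than t (no censoring)."""
    return sum(1 for time in event_times if time > t) / len(event_times)

event_times = [1, 2, 3]  # the uncensored example from the notes
for t in [0, 1, 2, 3]:
    print(t, empirical_survival(event_times, t))
```

This estimate has no sensible analogue for the censored example 1, 2+, 3: the individual censored at 2 is known only to have survived past 2, so it is neither an event at 2 nor an observation that can simply be dropped.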
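Some Parametric Survival Distributions

1. The Exponential distribution (1 parameter, $\lambda > 0$)
\[ f(t) = \lambda e^{-\lambda t} \quad \text{for } t \ge 0 \]
\[ S(t) = \int_t^\infty f(\mu)\, d\mu = e^{-\lambda t} \]
\[ h(t) = \frac{f(t)}{S(t)} = \lambda \quad \text{(constant hazard!)} \]
\[ H(t) = \int_0^t h(\mu)\, d\mu = \int_0^t \lambda\, d\mu = \lambda t \]
Check: does $S(t) = e^{-H(t)}$?

- median: solve $0.5 = S(\tau) = e^{-\lambda \tau}$ $\Rightarrow$ $\tau = \frac{-\log(0.5)}{\lambda}$
- mean: $\int_0^\infty \mu \lambda e^{-\lambda \mu}\, d\mu = \lambda^{-1}$
- mrl and median: $X \sim$ exponential$(\lambda)$. What are the mean, $\mathrm{mrl}(x)$ and median?
- lack of memory: for all $t_0 > 0$, $(T - t_0 \mid T > t_0)$ has the same distribution as $T$ (reason? HW)
- coef. of variation $= \frac{\text{s.d.}}{\text{mean}} = 1$
- empirical check of the data: plot $\log \hat{S}(t)$ vs. $t$ (should approximate a straight line through the origin). What's the slope? (reason? HW)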
- If $T$ has an arbitrary continuous distribution, then $H(T)$ has an exponential distribution with unit parameter (reason? HW. Hint: $S(T) \sim \text{Unif}(0, 1)$ for any arbitrary continuous r.v.)
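A simulation cannot replace the homework argument, but it can make the claim plausible. The sketch below draws log-normal lifetimes (an arbitrary choice of continuous distribution, with made-up parameters), applies $H(t) = -\log S(t)$, and compares two summaries of the result with those of a unit exponential.

```python
import math
import random

random.seed(0)
mu, sigma = 0.3, 1.2  # illustrative log-normal parameters (assumptions)

def cumulative_hazard(t):
    """H(t) = -log S(t) for the log-normal, S(t) = 1 - Phi((log t - mu) / sigma)."""
    z = (math.log(t) - mu) / sigma
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z), computed stably
    return -math.log(upper_tail)

n = 20_000
lifetimes = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]
transformed = [cumulative_hazard(t) for t in lifetimes]

sample_mean = sum(transformed) / n                          # Exp(1) has mean 1
frac_above_one = sum(1 for h in transformed if h > 1.0) / n  # Exp(1): P(H > 1) = e^{-1}
print(sample_mean, frac_above_one, math.exp(-1.0))
```

The sample mean should be close to 1 and the fraction exceeding 1 close to $e^{-1} \approx 0.368$, up to simulation error.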
2. The Weibull distribution (2 parameters): Weibull$(\gamma, \lambda)$

Generalizes the exponential:
\[ S(t) = e^{-\lambda t^\gamma} \]
\[ f(t) = -\frac{d}{dt} S(t) = \gamma \lambda t^{\gamma - 1} e^{-\lambda t^\gamma} \]
\[ h(t) = \gamma \lambda t^{\gamma - 1} \]
\[ H(t) = \int_0^t h(\mu)\, d\mu = \lambda t^\gamma \]
$\lambda$: the scale parameter
$\gamma$: the shape parameter

The Weibull distribution is convenient because of its simple form.

- several hazard shapes:
  $\gamma = 1$: constant hazard
  $0 < \gamma < 1$: decreasing hazard
  $\gamma > 1$: increasing hazard
- It includes an important generalization of the exponential distribution; it allows for a power dependence of the hazard on time.
- empirical check of the data: plot $\log(-\log \hat{S}(t))$ vs. $\log t$. The plot should give approximately a straight line, with slope $\gamma$ and intercept $\log \lambda$ (reason?)
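The empirical check can be illustrated with the exact survival function standing in for $\hat{S}(t)$. In the sketch below the parameter values and time grid are assumptions for illustration, and the line is fitted by ordinary least squares.

```python
import math

gamma_shape, lam = 1.7, 0.4  # illustrative Weibull parameters (assumptions)

def weibull_survival(t):
    """S(t) = exp(-lam * t**gamma_shape)."""
    return math.exp(-lam * t ** gamma_shape)

# Transformed coordinates for the diagnostic plot.
times = [0.25 * k for k in range(1, 21)]
xs = [math.log(t) for t in times]
ys = [math.log(-math.log(weibull_survival(t))) for t in times]

# Ordinary least-squares line through the transformed points.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
print(slope, intercept, math.log(lam))
```

Because $\log(-\log S(t)) = \log \lambda + \gamma \log t$ holds exactly for the Weibull, the fitted slope and intercept recover $\gamma$ and $\log \lambda$ up to rounding. With real data, $\hat{S}(t)$ would replace the exact values and the points would only be approximately linear.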
[Figure 4: Hazard functions of the Weibull distribution]
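3. log-normal: log-normal distribution (with parameters $\mu$ and $\sigma$)
\[ f(t) = \frac{1}{\sqrt{2\pi}\, \sigma t} \exp\left\{ -\frac{(\log(t) - \mu)^2}{2\sigma^2} \right\} \]
\[ S(t) = 1 - \Phi\left( \frac{\log(t) - \mu}{\sigma} \right) \]
\[ F(t) = \Phi\left( \frac{\log(t) - \mu}{\sigma} \right) \]
\[ \lambda(t) = \frac{f(t)}{S(t)} = \frac{\phi\left( \frac{\log(t) - \mu}{\sigma} \right) / (t\sigma)}{\text{incomplete normal integral}} \]
where $\lambda(t)$ denotes the hazard (written $h(t)$ elsewhere in these notes), the incomplete normal integral in the denominator is $S(t) = 1 - \Phi\left( \frac{\log(t) - \mu}{\sigma} \right)$, $\phi$ is the density function of the standard normal distribution, and $\Phi$ is the cumulative distribution function of the standard normal distribution.

- simple to apply if there is no censoring
- sensitive to the small failure times
- The log-logistic distribution provides a good approximation to the log-normal distribution (and may frequently be a preferable survival time model)
- $\log(T) \sim N(\mu, \sigma^2)$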
[Figure 5: Hazard functions of the log-normal distribution]
4. log-logistic: $X \sim$ log-logistic$(\mu, \sigma^2)$ if $Y = \ln X \sim$ logistic$(\mu, \sigma^2)$.

If $W$ is standardized logistic, then
\[ f_W(w) = \frac{e^w}{\{1 + e^w\}^2}, \qquad S_W(w) = \frac{1}{1 + e^w}. \]
$Y = \mu + \sigma W \sim$ logistic$(\mu, \sigma^2)$ with pdf
\[ f_Y(y) = \frac{e^{(y - \mu)/\sigma}}{\sigma \left(1 + e^{(y - \mu)/\sigma}\right)^2}. \]
\[ S_X(x) = S_W\left( \frac{\ln x - \mu}{\sigma} \right) = \frac{1}{1 + \exp\left\{ \frac{\ln x - \mu}{\sigma} \right\}} = \frac{1}{1 + \lambda x^\alpha}, \quad \text{where } \alpha = 1/\sigma \text{ and } \lambda = e^{-\mu/\sigma}. \]
\[ h_X(x) = \frac{\lambda \alpha x^{\alpha - 1}}{1 + \lambda x^\alpha}. \]

$h_X(x)$ is monotone decreasing when $\alpha \le 1$: it decreases from $\infty$ when $\alpha < 1$ and from $\lambda$ when $\alpha = 1$. For $\alpha > 1$, $h_X(x)$ increases initially to a maximum value at time $\{(\alpha - 1)/\lambda\}^{1/\alpha}$, and then decreases to 0 as time approaches infinity.

- relatively simple explicit forms for $S(t)$, $f(t)$ and $\lambda(t)$ (vs. log-normal)
- more convenient in handling censored data than the log-normal distribution
- provides a good approximation to the log-normal distribution except in the extreme tails
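The stated location of the hazard maximum for $\alpha > 1$ can be checked against a brute-force search. The sketch below uses illustrative parameter values (assumptions) and a simple grid; it is a sanity check, not an optimization routine.

```python
import math

alpha, lam = 2.5, 0.3  # illustrative parameters with alpha > 1 (assumptions)

def loglogistic_hazard(x):
    """h(x) = lam * alpha * x**(alpha - 1) / (1 + lam * x**alpha)."""
    return lam * alpha * x ** (alpha - 1) / (1.0 + lam * x ** alpha)

# Closed-form location of the hazard maximum for alpha > 1.
peak_time = ((alpha - 1.0) / lam) ** (1.0 / alpha)

# Crude grid search over (0, 10] as an independent check.
step = 0.001
grid = [step * k for k in range(1, 10_001)]
grid_peak = max(grid, key=loglogistic_hazard)
print(peak_time, grid_peak)
```

The grid maximizer should agree with the closed form to within the grid spacing.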
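5. Gamma distribution: another extension of the exponential distribution.

$X \sim$ gamma$(\lambda, \gamma)$, $\lambda, \gamma > 0$,
\[ f(x) = \frac{\lambda^\gamma x^{\gamma - 1} e^{-\lambda x}}{\Gamma(\gamma)} \]

- No closed form for $h(\cdot)$ and $S(\cdot)$
- $\gamma = 1$: exponential$(\lambda)$.
- $\gamma \to \infty$: a normal distribution.
- $\lambda = 1/2$, $\gamma$ an integer: $\chi^2_{2\gamma}$.
- When $\gamma > 1$, $h(x)$ is monotone increasing with $h(0) = 0$ and $h(x) \to \lambda$ as $x \to \infty$.
- When $\gamma < 1$, $h(x)$ is monotone decreasing with $h(0) = \infty$ and $h(x) \to \lambda$ as $x \to \infty$.
- Not widely used; the Weibull is more popular.

[Figure 6: Hazard functions of the gamma distribution]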
[Figure 7: Hazard functions of different shapes; curves (i) to (v) of $h(x)$]

(i) constant: e.g. exponential
(ii) increasing: e.g. Weibull ($\gamma > 1$)
(iii) decreasing: e.g. Weibull ($0 < \gamma < 1$)
(iv) bathtub-shaped: e.g. Lifetime Distribution (3 parameters) (see Dimitrakopoulou et al., IEEE Transactions on Reliability, 2007), exponential power distribution with $\alpha < 1$ (Smith-Bain, 1975)
(v) hump-shaped: e.g. log-normal
Why use one versus another?

- technical convenience for estimation and inference
- explicit simple forms for $f(t)$, $S(t)$, and $h(t)$
- qualitative shape of the hazard function

One can usually distinguish between a one-parameter model (like the exponential) and a two-parameter model (like the Weibull or log-normal) in terms of the adequacy of fit to a dataset.

Without a lot of data, it may be hard to distinguish between the fits of various 2-parameter models (e.g., Weibull vs. log-normal).
Choice of distributions

1. convenience for statistical inference
2. existence of explicit, simple forms for $S(t)$, $f(t)$ and $h(t)$
3. capability of representing both over- and under-dispersion relative to the exponential distribution (coef. of variation $= \frac{\text{s.d.}}{\text{mean}}$)
4. qualitative shape of the hazard (monotonicity)
5. behavior of $S(t)$ for small times (guarantee period)
6. behavior of $S(t)$ for large times (medical research)
7. any connection with a special stochastic model of failure
Ways to compare different distributions (to highlight the differences or as a basis for an empirical analysis)

1. It is not effective to consider the density function directly; concentrate on plotting and tabulating.
2. Plot $h(t)$ or $\log h(t)$ vs. $t$ or $\log(t)$.
3. Plot $H(t)$ or $\log S(t)$ or other transforms vs. $t$ or $\log(t)$.

Discrete failure time models (grouping continuous data because it is imprecise): in many applications there is no theoretical justification for adopting particular parametric models for discrete failure time data.
Some properties useful in assessing distributional form

$\log h(t)$                                      | $H(t)$
Is it constant? exponential                      | linear in $t$? exponential
Is it linear in $t$? Gompertz ($\rho_0 = 0$)     | linear in $t$? Gompertz ($\rho_0 = 0$)
Is it linear in $\log t$? Weibull                | linear in $\log t$? Weibull
Is it nonmonotonic? Log normal, Log logistic     | asymptotically linear in $t$? Distribution with exponential tail
Regression models for survival data

A typical survival regression setting: let $X$ be the failure time and $Z^t = (Z_1, \ldots, Z_p)$ be a $p$-dimensional vector of explanatory variables.

Q: What are we going to model?

Approach 1: log-linear
\[ \ln X = \mu + \gamma^t Z + \sigma W, \]
$W \sim$ some known distribution $F$; $\mu$, $\sigma$, $\gamma$ unknown.

Three choices of $F$:
1. $F$ is normal, that is $W \sim N(0, 1)$; then $\ln X \sim N(\mu + \gamma^t Z, \sigma^2)$.
2. $F$ is the standard extreme value distribution, that is $f_W(w) = \exp\{w - e^w\}$, $-\infty < w < \infty$. Then $X \sim$ Weibull$(\alpha, \lambda)$, with $\alpha = 1/\sigma$, $\lambda = e^{-\left( \frac{\mu}{\sigma} + \frac{\gamma^t Z}{\sigma} \right)}$.
3. $F$ is standardized logistic; then it is the log-logistic regression model.

Two interpretations of $\gamma$:

Let $Z_1$, $Z_2$ be different covariate values; then
\[ \frac{E(X \mid Z_1)}{E(X \mid Z_2)} = \frac{e^{\mu + \gamma^t Z_1} E(e^{\sigma W})}{e^{\mu + \gamma^t Z_2} E(e^{\sigma W})} = e^{\gamma^t (Z_1 - Z_2)}, \qquad \ln \frac{E(X \mid Z_1)}{E(X \mid Z_2)} = \gamma^t (Z_1 - Z_2). \]
A unit increase in $Z$ leads to a $\gamma$ increase in the log ratio of the means.

Let $S_0(x) = P(X > x \mid Z = 0)$; then $P(X > x \mid Z) = S_0(x e^{-\gamma^t Z})$. Time is accelerated if $e^{-\gamma^t Z} > 1$ and decelerated if $e^{-\gamma^t Z} < 1$.
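The time-scaling interpretation can be verified numerically for the extreme-value (Weibull) choice of $F$. The sketch below assumes a single scalar covariate and made-up values of $\mu$, $\sigma$ and $\gamma$; it compares the Weibull survival function at covariate value $z$ with the baseline survival function evaluated at the rescaled time $x e^{-\gamma z}$.

```python
import math

mu, sigma, gamma_coef = 1.0, 0.5, 0.8  # illustrative values (assumptions)
alpha = 1.0 / sigma                    # Weibull shape implied by the log-linear model

def weibull_rate(z):
    """lambda = exp(-(mu/sigma + gamma*z/sigma)) for a scalar covariate z."""
    return math.exp(-(mu / sigma + gamma_coef * z / sigma))

def survival_given_z(x, z):
    """S(x | Z = z) = exp(-lambda(z) * x**alpha)."""
    return math.exp(-weibull_rate(z) * x ** alpha)

def baseline_survival(x):
    """S_0(x) = P(X > x | Z = 0)."""
    return survival_given_z(x, 0.0)

z = 1.5
for x in (0.5, 2.0, 6.0):
    direct = survival_given_z(x, z)
    rescaled = baseline_survival(x * math.exp(-gamma_coef * z))
    print(x, direct, rescaled)  # the two columns should agree
```

With $\gamma z > 0$ the factor $e^{-\gamma z}$ is below 1, so each time point is mapped to an earlier point on the baseline curve and survival is prolonged relative to baseline.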
Approach 2: multiplicative or additive hazard models

Multiplicative models:
\[ h(x \mid Z = z) = h_0(x)\, c(\beta^t z), \]
where $c$ is a non-negative function of covariates.

Cox proportional hazards:
\[ h(x \mid Z) = h_0(x)\, e^{\beta^t z} \]
- $h_0(x)$ is the baseline hazard; it may be unspecified or parameterized, e.g., $h_0(x) = e^{\alpha_0 + \alpha_1 x + \alpha_2 x^2}$.
- $\beta$ is the log hazard ratio when $c(\cdot) = \exp(\cdot)$, and does not depend on time.
- For each unit increase in $Z$, there is a $\beta$ increase in the log hazard ratio.
- $H(x \mid Z) = e^{\beta^t Z} H_0(x)$, and $S(x \mid Z) = (S_0(x))^{e^{\beta^t Z}}$.

Additive models:
\[ h(x \mid Z) = h_0(x) + \sum_{j=1}^p z_j(x)\, \beta_j(x). \]
The effects of covariates on survival are allowed to vary with time. Additive models are less frequently used than multiplicative models.
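The proportional hazards relation between $S(x \mid Z)$ and $S_0(x)$ stated above can be checked numerically. The sketch below uses a hypothetical Weibull-type baseline hazard and made-up values of $\beta$ and $z$; the cumulative hazard is approximated by a trapezoidal rule rather than computed in closed form.

```python
import math

beta, z = 0.7, 1.0  # illustrative log hazard ratio and covariate value (assumptions)

def baseline_hazard(x):
    """Hypothetical Weibull-type baseline hazard h_0(x) = 1.5 * 0.2 * x**0.5."""
    return 1.5 * 0.2 * x ** 0.5

def cumulative_hazard(hazard, x, steps=20_000):
    """Trapezoidal approximation of the integral of hazard(u) over [0, x]."""
    width = x / steps
    total = 0.5 * (hazard(0.0) + hazard(x))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width

def covariate_hazard(x):
    """Proportional hazards: h(x | Z = z) = h_0(x) * exp(beta * z)."""
    return baseline_hazard(x) * math.exp(beta * z)

x = 2.0
s0 = math.exp(-cumulative_hazard(baseline_hazard, x))         # S_0(x)
s_direct = math.exp(-cumulative_hazard(covariate_hazard, x))  # from H(x | Z)
s_power = s0 ** math.exp(beta * z)                            # S_0(x)^{exp(beta z)}
print(s0, s_direct, s_power)
```

The last two values agree up to rounding, reflecting that multiplying the hazard by a constant multiplies the cumulative hazard by the same constant and raises the survival function to that power.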