Estimation Procedure for Parametric Survival Distribution Without Covariates

The maximum likelihood estimates of the parameters of commonly used survival distributions can be found with SAS. The following two procedures can be used to estimate the parameters:

1. PROC LIFEREG
2. PROC PHREG

PROC PHREG is more popular, but PROC LIFEREG is not obsolete. In fact, PROC LIFEREG does some things better than PROC PHREG, and it can do other things that PROC PHREG cannot do at all. The greatest limitation of PROC LIFEREG is that it does not handle time-dependent covariates, something at which PROC PHREG excels. It should be mentioned that:

- PROC PHREG only allows right censoring, while PROC LIFEREG handles right-, left-, and interval-censored data.
- PROC PHREG only gives nonparametric estimates of the survival function (which can be difficult to interpret).
- Certain hypotheses about the shape of the hazard function can be tested with PROC LIFEREG.
- PROC LIFEREG produces more efficient estimates (with smaller standard errors) than PROC PHREG if the shape of the survival function is known.
- In PROC PHREG, we have to create sets of dummy (indicator) variables in the DATA step to represent categorical data. PROC LIFEREG creates such variables automatically.

We discuss PROC LIFEREG in this chapter. Note that PROC PHREG performs semi-parametric regression analysis using a method known as partial likelihood. The reason for using this method (and hence PROC PHREG) becomes apparent in later chapters.

Example: The remission times of 42 patients with acute leukemia were recorded in a clinical trial to assess the ability of 6-mercaptopurine (6-MP) to maintain remission. Each patient was randomized to receive 6-MP or a placebo. The study was terminated after one year. The remission times, in weeks, for the 21 patients who received 6-MP are (a + denotes a censored time):

6, 6, 6, 7, 10, 13, 16, 22, 23, 6+, 9+, 10+, 11+, 17+, 19+, 20+, 25+, 32+, 32+, 34+, 35+

Let t denote the survival time (exact or censored) and let C be a dummy variable with C = 0 if t is censored and C = 1 otherwise. Assume that the data have been saved in C:\Example.dat as a text file containing two columns (t in the first column and C in the second), separated by space(s). The following SAS code uses PROC LIFEREG to obtain the maximum likelihood estimates of the parameters of the lognormal distribution for the observed survival data in C:\Example.dat.

    data B;
      infile 'C:\Example.dat';   /* two space-separated columns */
      input t c;                 /* t = time in weeks, c = 0 if censored, 1 otherwise */
    run;

    proc lifereg data=b;
      /* c(0) declares 0 as the censoring value; no covariates on the right-hand side */
      model t*c(0) = / covb d=lnormal;
    run;
    quit;

The class of regression models estimated by PROC LIFEREG is known as the accelerated failure time (AFT) model. What PROC LIFEREG actually estimates is a special case of the AFT model that is quite similar in form to an ordinary linear regression model. Let $T_i$ be a random variable denoting the event time for the $i$th individual in the sample. The model is then

$\log T_i = \beta_0 + \sigma \varepsilon_i$    (1)

where $\varepsilon_i$ is a random disturbance term and $\beta_0$ and $\sigma$ are parameters to be estimated. The estimated parameters of the lognormal distribution are $\hat{\mu} = \text{Intercept}$ and $\hat{\sigma} = \text{Scale}$.
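When the file C:\Example.dat is not available, the same analysis can be run with the data entered inline. The following is a minimal sketch rather than part of the original example: it types the 21 remission times listed above directly into the DATA step, using the same variable names t and c and the same coding (c = 0 for censored times).

```sas
/* Sketch: the 6-MP remission times entered inline instead of read from a file. */
data B;
  input t c @@;          /* @@ reads several (t, c) pairs from each data line */
  datalines;
6 1   6 1   6 1   7 1   10 1  13 1  16 1  22 1  23 1
6 0   9 0   10 0  11 0  17 0  19 0  20 0  25 0
32 0  32 0  34 0  35 0
;
run;

/* Same lognormal fit as in the example; COVB prints the covariance matrix. */
proc lifereg data=B;
  model t*c(0) = / covb dist=lnormal;
run;
```

The first data line holds the 9 exact times and the remaining two hold the 12 censored times, so the output should report 21 observations, 9 noncensored and 12 right censored.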

The SAS System                          10:17 Wednesday, January 25

The LIFEREG Procedure

Model Information

Data Set                      WORK.B
Dependent Variable            Log(t)
Censoring Variable            C
Censoring Value(s)            0
Number of Observations        21
Noncensored Values            9
Right Censored Values         12
Left Censored Values          0
Interval Censored Values      0
Name of Distribution          Lognormal
Log Likelihood

Number of Observations Read   21
Number of Observations Used   21

Parameter Information

Parameter     Effect
Intercept     Intercept

Algorithm converged.

Analysis of Parameter Estimates

Parameter   DF   Estimate   Standard Error   95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept                                                                         <.0001
Scale

Estimated Covariance Matrix

              Intercept   Scale
Intercept
Scale

In ordinary linear regression, the distribution of the disturbance term is normal. PROC LIFEREG allows other distributions for the disturbance term $\varepsilon$. For each of these distributions, there is a corresponding distribution for T:

Distribution of T      Distribution of $\varepsilon$
Exponential            Extreme value (one parameter)
Weibull                Extreme value (two parameters)
Log-normal             Normal
Log-logistic           Logistic
Gamma                  Log-gamma

Note that all AFT models are named for the distribution of T rather than the distribution of log T or $\varepsilon$.

Exponential Distribution: To fit the exponential distribution with PROC LIFEREG, we specify DIST=EXPONENTIAL as an option in the MODEL statement. As we saw in Chapter 6, an exponential distribution for T corresponds to a constant hazard function. That is,

$\log h(t) = \beta_0^*$

We added * to distinguish this coefficient from the coefficient in model (1). It can be shown that the two models are completely equivalent. In fact, we have $\beta_0^* = -\beta_0$. The estimated parameter of the exponential distribution can be obtained by

$\hat{\lambda} = \exp(-\text{INTERCEPT})$

The relationship between the parameters of the log-hazard model (log h(t)) and the log-survival-time model (log T) is more complicated for other distributions.
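For the exponential model the maximum likelihood estimate has a closed form under right censoring, which gives a hand check on the PROC LIFEREG output. The derivation below is a sketch: the symbol $d$ (the number of uncensored times) and the sum over all observed times $t_i$ are notation introduced here, and the totals are computed from the 21 remission times of the 6-MP group, taking the hazard as $\lambda = \exp(-\beta_0)$.

```latex
% d = number of uncensored times, t_i = observed time (exact or censored)
\ell(\lambda) = d \log \lambda - \lambda \sum_{i=1}^{n} t_i
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{d}{\sum_{i=1}^{n} t_i}

% 6-MP group: d = 9, sum of exact times = 109, sum of censored times = 250
\hat{\lambda} = \frac{9}{109 + 250} = \frac{9}{359} \approx 0.0251,
\qquad
\hat{\beta}_0 = -\log \hat{\lambda} = \log \frac{359}{9} \approx 3.686
```

The Intercept estimate reported with DIST=EXPONENTIAL should therefore be close to 3.686, corresponding to an estimated hazard of about 0.025 per week.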

The SAS System                          09:35 Wednesday, January 25

The LIFEREG Procedure

Model Information

Data Set                      WORK.B
Dependent Variable            Log(t)
Censoring Variable            C
Censoring Value(s)            0
Number of Observations        21
Noncensored Values            9
Right Censored Values         12
Left Censored Values          0
Interval Censored Values      0
Name of Distribution          Exponential
Log Likelihood

Number of Observations Read   21
Number of Observations Used   21

Algorithm converged.

Analysis of Parameter Estimates

Parameter       DF   Estimate   Standard Error   95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept                                                                             <.0001
Scale
Weibull Scale
Weibull Shape

Lagrange Multiplier Statistics

Parameter   Chi-Square   Pr > ChiSq
Scale

Weibull Distribution: To fit the Weibull distribution with PROC LIFEREG, we specify DIST=WEIBULL as an option in the MODEL statement. The estimated parameters of the Weibull distribution can be obtained by

$\hat{\lambda} = \exp(-\text{INTERCEPT})$ and $\hat{\gamma} = 1/\text{SCALE}$

Log-Logistic Distribution: To fit the log-logistic distribution with PROC LIFEREG, we specify DIST=LLOGISTIC as an option in the MODEL statement. The estimated parameters of the log-logistic distribution can be obtained by

$\hat{\lambda} = \exp(-\text{INTERCEPT}/\text{SCALE})$ and $\hat{\gamma} = 1/\text{SCALE}$

Gamma Distribution: We discussed two different gamma distributions: the standard (2-parameter) gamma distribution and the generalized (3-parameter) gamma distribution. PROC LIFEREG fits the generalized gamma distribution. Note that the exponential, Weibull, standard gamma, and log-normal distributions (but not the log-logistic) are all special cases of the generalized gamma distribution. To fit the generalized gamma distribution with PROC LIFEREG, we specify DIST=GAMMA as an option in the MODEL statement. The estimated parameters of the generalized gamma distribution can be obtained by

$\hat{\lambda} = \exp(-\text{INTERCEPT})$, $\hat{\alpha} = \text{SHAPE}/\text{SCALE}$ and $\hat{\gamma} = 1/\text{SHAPE}^2$

When the shape parameter is 0, we get the log-normal distribution. When it is 1.0, we have the Weibull distribution. When the shape parameter and the scale parameter are equal, we have the standard gamma distribution.
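The conversion from PROC LIFEREG output to distribution parameters can also be carried out in SAS. The sketch below does this for the Weibull fit and rests on several assumptions: the data set B from the example already exists, the Weibull is parameterized so that the hazard rate parameter is exp(-Intercept) and the shape is 1/Scale, and the ODS table of estimates is named ParameterEstimates with columns Parameter and Estimate. The data set names pe and weib and the variables lambda_hat and gamma_hat are hypothetical names chosen for illustration.

```sas
/* Sketch: capture the Weibull estimates and convert them to (lambda, gamma). */
ods output ParameterEstimates=pe;      /* assumed ODS table name */
proc lifereg data=B;
  model t*c(0) = / dist=weibull;
run;

data weib;
  set pe end=last;
  retain intercept scale;
  /* exact matches, so the 'Weibull Scale' and 'Weibull Shape' rows are ignored */
  if Parameter = 'Intercept' then intercept = Estimate;
  if Parameter = 'Scale'     then scale     = Estimate;
  if last then do;
    lambda_hat = exp(-intercept);      /* assumed parameterization */
    gamma_hat  = 1 / scale;
    output;
  end;
  keep lambda_hat gamma_hat;
run;

proc print data=weib noobs;
run;
```

The value of gamma_hat should agree with the "Weibull Shape" row that PROC LIFEREG prints for this distribution, which offers a quick consistency check.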

As for the standard gamma distribution, there is no direct way of fitting it in PROC LIFEREG. We cannot impose the constraint SCALE = SHAPE in PROC LIFEREG since it does not handle equality constraints. However, PROC LIFEREG allows fixing both the scale and shape parameters at specific values. For example, we can have

    proc lifereg data=b;
      /* NOSHAPE1 and NOSCALE hold the shape and scale fixed at the supplied values */
      model t*c(0) = / d=gamma noshape1 shape1=0.7 noscale scale=0.7;
    run;
    quit;

We can try a range of values to find the common value of the shape and scale parameters that maximizes the log-likelihood.
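Trying several common values by hand is tedious, so the search can be wrapped in a macro. The following is a minimal sketch, not a definitive implementation: the macro name gamma_grid and the list of candidate values are hypothetical, and the data set B from the example is assumed to exist. Each fit prints its own Log Likelihood line, and the candidate with the largest value is the approximate standard gamma fit.

```sas
/* Sketch: refit the gamma model with SHAPE = SCALE fixed at each candidate value. */
%macro gamma_grid(values);
  %local i v;
  %let i = 1;
  %let v = %scan(&values, &i, %str( ));      /* first candidate, space-delimited */
  %do %while (%length(&v) > 0);
    title "Gamma fit with SHAPE = SCALE = &v";
    proc lifereg data=B;
      model t*c(0) = / d=gamma noshape1 shape1=&v noscale scale=&v;
    run;
    %let i = %eval(&i + 1);
    %let v = %scan(&values, &i, %str( ));    /* empty when the list is exhausted */
  %end;
  title;                                     /* clear the title afterwards */
%mend gamma_grid;

/* Illustrative grid; refine it around the best value found. */
%gamma_grid(0.5 0.6 0.7 0.8 0.9 1.0)
```

Passing the candidates as an explicit list avoids floating-point step arithmetic in the macro language; a finer grid around the best value gives a more precise estimate.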