Implementing Personalized Medicine: Estimating Optimal Treatment Regimes


Baqun Zhang, Phillip Schulte, Anastasios Tsiatis, Eric Laber, and Marie Davidian
Department of Statistics, North Carolina State University
http://www4.stat.ncsu.edu/~davidian

Personalized medicine. Clinical practice: clinicians make (a series of) treatment decision(s) over the course of a patient's disease or disorder, on a fixed schedule, at a milestone in the disease process, or at an event necessitating a decision. Personalized medicine: make the best treatment decision(s) for a patient given all the available information on the patient up to the time of the decision: genetic/genomic, demographic, physiologic, clinical measures, medical history, and so on.

Dynamic treatment regime. Operationalizing personalized medicine: at any decision, we need a rule that takes as input the accrued information on the patient to that point and dictates the next treatment from among the possible options. Rule(s) must be developed based on evidence, i.e., data, and ideally should lead to the best clinical outcome. Dynamic treatment regime: a set of such formal rules, each corresponding to a decision point.

Single decision. Simple example: which treatment to give patients who present with primary operable breast cancer? Treatment options: L-phenylalanine mustard and 5-fluorouracil (c1), or c1 + tamoxifen (c2). Information: age and progesterone receptor level (PR). Example rule: if age < 50 years and PR < 10 fmol, give c1 (coded as 1); otherwise, give c2 (coded as 0). Mathematically, the rule d is d(age, PR) = I(age < 50 and PR < 10). Alternatively, rules of the form d(age, PR) = I{age > 60 − 8.7 log(PR)}.
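As a programmatic illustration, a minimal sketch of the example rule above (the function name and the two test calls are arbitrary; the cutoffs are those stated in the rule):

    def d(age, pr):
        # Example rule: give c1 (coded 1) if age < 50 years and PR < 10 fmol;
        # otherwise give c2 (coded 0).
        return int(age < 50 and pr < 10)

    d(45, 8)   # 1: give c1
    d(62, 25)  # 0: give c2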

Multiple decision points. Two decision points. Decision 1: induction chemotherapy (options c1, c2). Decision 2: maintenance treatment for patients who respond (options m1, m2); salvage chemotherapy for those who don't (options s1, s2).

Multiple decision points. At baseline: information x1 (accrued information h1 = x1). Decision 1: two options {c1, c2}; rule 1: d1(x1), mapping x1 to {c1, c2}. Between decisions 1 and 2: collect additional information x2, including responder status; accrued information h2 = {x1, chemotherapy at decision 1, x2}. Decision 2: four options {m1, m2, s1, s2}; rule 2: d2(h2), mapping h2 to {m1, m2} (responder) or to {s1, s2} (nonresponder). Regime: d = (d1, d2).

Summary. Single decision (1 decision point): baseline information x ∈ X; set of treatment options a ∈ A; decision rule d(x), d : X → A; treatment regime d. Multiple decisions (K decision points): baseline information x1, intermediate information xk between decisions k−1 and k, k = 2, ..., K; set of treatment options at each decision k: ak ∈ Ak; accrued information h1 = x1 ∈ H1, hk = {x1, a1, x2, a2, ..., x_{k−1}, a_{k−1}, xk} ∈ Hk, k = 2, ..., K; decision rules d1(h1), d2(h2), ..., dK(hK), dk : Hk → Ak; treatment regime d = (d1, d2, ..., dK).

Defining "best". Outcome: there is a clinical outcome by which treatment benefit can be assessed, e.g., survival time, CD4 count, or the indicator of no myocardial infarction within 30 days. Larger outcomes are better.

Optimal regime. Obviously, there is an infinitude of possible regimes d. An optimal regime d^opt should satisfy: if all patients in the population were to receive treatment according to d^opt, the expected (average) outcome for the population would be as large as possible; and if an individual patient were to receive treatment according to d^opt, his/her expected outcome would be as large as possible given the information available on him/her. Can we formalize this?

Optimal regime for single decision. For simplicity, consider regimes involving a single decision with two treatment options (0 and 1): A = {0, 1}; baseline covariate information x ∈ X; treatment regime a single rule d(x), d : X → {0, 1}; d ∈ D, the class of all regimes.

Potential outcomes. Formalize: we can hypothesize potential outcomes. Y(1) = outcome that would be achieved if a patient were to receive treatment 1; Y(0) defined similarly. Then E{Y(1)} is the average outcome if all patients in the population were to receive 1, and similarly for E{Y(0)}. Potential outcome for a regime: for any d ∈ D, define Y(d) to be the potential outcome for a patient with baseline covariate information X if s/he were to receive treatment in accordance with regime d; i.e., Y(d) = Y(1)d(X) + Y(0){1 − d(X)}.

Potential outcomes. Thus E{Y(d)} = E[E{Y(d) | X}] is the average outcome in the population if all patients in the population were assigned treatment according to d ∈ D, and E{Y(d) | X = x} is the expected outcome for a patient with baseline information x if s/he were to receive treatment according to regime d ∈ D. Optimal regime: d^opt is a regime in D such that E{Y(d)} ≤ E{Y(d^opt)} for all d ∈ D, and E{Y(d) | X = x} ≤ E{Y(d^opt) | X = x} for all d ∈ D and x ∈ X.

Important philosophical point. Distinguish between the best treatment for a patient and the best treatment decision for a patient given the information available on the patient. Best treatment for a patient: the option a^best ∈ A corresponding to the largest Y(a) for the patient. Best treatment given the information available: we cannot hope to determine a^best because we can never see all potential outcomes on a given patient. We can hope to make the optimal decision given the information available, i.e., find d^opt that makes E{Y(d)} and E{Y(d) | X = x} as large as possible.

Statistical framework. Goal: given data from a clinical trial or observational study, estimate the optimal regime d^opt satisfying this definition. Observed data: (Xi, Ai, Yi), i = 1, ..., n, iid; X = baseline covariate information, A ∈ {0, 1} = treatment received, Y = outcome observed under A. We observe Y = Y(1)A + Y(0)(1 − A).

Critical assumption. No unmeasured confounders: assume that Y(0), Y(1) ⊥ A | X, i.e., X contains all information used to assign treatments. Automatically satisfied for data from a randomized trial; standard but unverifiable assumption for observational studies. Implies that E{Y(1)} = E[E{Y(1) | X}] = E[E{Y(1) | X, A = 1}] = E[E(Y | X, A = 1)], and similarly for E{Y(0)}.

Optimal regime. Recall Y(d) = Y(1)d(X) + Y(0){1 − d(X)}. This implies (using no unmeasured confounders)

E{Y(d)} = E[E{Y(d) | X}]
        = E[E{Y(1) | X}d(X) + E{Y(0) | X}{1 − d(X)}]
        = E[E(Y | X, A = 1)d(X) + E(Y | X, A = 0){1 − d(X)}].

Thus it is clear that

d^opt(x) = I[E{Y(1) | X = x} > E{Y(0) | X = x}]
         = I{E(Y | X = x, A = 1) > E(Y | X = x, A = 0)}.

Result: if E(Y | X, A) were known, we could find d^opt.

Estimating an optimal regime. Problem: E(Y | X, A) is not known. Posit a model Q(X, A; β) for E(Y | X, A); estimate β from the observed data, yielding β̂ (e.g., by least squares); and estimate d^opt by the regression estimator

d̂^opt_reg(x) = I{Q(x, 1; β̂) > Q(x, 0; β̂)}.

Corresponding estimator for E{Y(d^opt)}:

REG(β̂) = n^{−1} Σ_{i=1}^n [Q(Xi, 1; β̂) d̂^opt_reg(Xi) + Q(Xi, 0; β̂){1 − d̂^opt_reg(Xi)}].

If the model is correct, E(Y | X, A) = Q(X, A; β0) for some β0. Concern: Q(X, A; β) may be misspecified, so d̂^opt_reg could be far from the true d^opt.
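A minimal sketch of this regression approach, under assumptions not in the slides: a linear working model with treatment-covariate interactions, hypothetical numpy arrays X (n x p covariates), A (treatment in {0, 1}), and Y (outcome), with scikit-learn used purely for convenience:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def fit_Q(X, A, Y):
        # Working model Q(X, A; beta): main effects plus treatment interactions.
        Z = np.column_stack([X, A, A[:, None] * X])
        return LinearRegression().fit(Z, Y)

    def Q_hat(model, X, a):
        # Predicted outcome under treatment a for every subject.
        a_col = np.full(len(X), a)
        Z = np.column_stack([X, a_col, a_col[:, None] * X])
        return model.predict(Z)

    def d_reg(model, X):
        # Estimated regime: treat (1) when predicted outcome under 1 exceeds that under 0.
        return (Q_hat(model, X, 1) > Q_hat(model, X, 0)).astype(int)

    def REG(model, X):
        # Plug-in estimator of E{Y(d^opt)} under the fitted regime.
        dx = d_reg(model, X)
        return np.mean(Q_hat(model, X, 1) * dx + Q_hat(model, X, 0) * (1 - dx))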

Estimating an optimal regime. Alternative perspective: Q(X, A; β) defines a class of regimes d(x, β) = I{Q(x, 1; β) > Q(x, 0; β)}, indexed by β, that may or may not contain d^opt. Posit Q(X, A; β) = β0 + β1 X1 + β2 X2 + A(β3 + β4 X1 + β5 X2). The regimes d(x, β) lead to a class D_η of rules I(x2 ≥ η1 x1 + η0) or I(x2 ≤ η1 x1 + η0), with η0 = −β3/β5 and η1 = −β4/β5, depending on the sign of β5. If in truth E(Y | X, A) = exp{1 + X1 + 2X2 + 3X1X2 + A(1 − 2X1 + X2)}, then d^opt(x) = I(x2 > 2x1 − 1), so d^opt ∈ D_η in this case.

Estimating an optimal regime. Result: the parameter η is defined as a function of β. If the posited model is correct, then the optimal regime is contained in D_η. However, if the posited model is incorrect, the estimated regime I{Q(x, 1; β̂) > Q(x, 0; β̂)} may or may not estimate the optimal regime within D_η. Even if the model is correct, if X is high-dimensional and/or Q(X, A; β) is complicated, the resulting regimes may be too complex for practice ("black box").

Optimal restricted regime. This suggests considering directly a restricted set of regimes D_η of the form d(x, η), indexed by η; write d_η(x) = d(x, η). Such regimes may be motivated by a regression model or based on cost, feasibility in practice, or interpretability; e.g., d(x, η) = I(x1 < η0, x2 < η1). D_η may or may not contain d^opt, but is still of interest. Optimal restricted regime: d^opt_η(x) = d(x, η^opt), where η^opt = arg max_η E{Y(d_η)}. Estimate the optimal restricted regime by estimating η^opt.

Estimating an optimal restricted regime. Approach: maximize a good estimator for E{Y(d_η)} in η. Missing data analogy: for fixed η, define C_η = A d(X, η) + (1 − A){1 − d(X, η)}, so that C_η = 1 if the treatment received is consistent with having followed d_η and C_η = 0 otherwise. The full data are {X, Y(d_η)}; the observed data are (X, C_η, C_η Y). Only subjects with C_η = 1 have observed outcomes consistent with following d_η; for the rest, such outcomes are missing.

Estimating an optimal restricted regime. Propensity score: the propensity for treatment 1 is π(X) = pr(A = 1 | X). Randomized trial: π(X) is known. Observational study: posit a model π(X; γ) (e.g., logistic regression) and obtain γ̂ using (Ai, Xi), i = 1, ..., n. Propensity of receiving treatment consistent with d_η:

π_c(X; η) = pr(C_η = 1 | X) = E[A d(X, η) + (1 − A){1 − d(X, η)} | X]
          = π(X)d(X, η) + {1 − π(X)}{1 − d(X, η)}.

Write π_c(X; η, γ̂) when π(X; γ̂) is substituted for π(X).

Estimating an optimal restricted regime. Inverse probability weighted estimator for E{Y(d_η)}:

IPWE(η) = n^{−1} Σ_{i=1}^n C_{η,i} Y_i / π_c(X_i; η, γ̂).

Consistent for E{Y(d_η)} if π(X; γ), and hence π_c(X; η, γ), is correctly specified, but it only uses data from subjects with C_η = 1.
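A minimal IPWE sketch under the same hypothetical setup as before (X, A, Y arrays; d is any vectorized candidate rule returning 0/1 per subject; logistic regression stands in for the posited propensity model):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_propensity(X, A):
        # Observational study: posit pi(x; gamma) via logistic regression for pr(A = 1 | X).
        return LogisticRegression().fit(X, A)

    def IPWE(eta, X, A, Y, ps_model, d):
        dx = d(X, eta)                             # d(X, eta) in {0, 1}
        C = A * dx + (1 - A) * (1 - dx)            # C_eta: received treatment consistent with d_eta
        pi = ps_model.predict_proba(X)[:, 1]       # pi(X; gamma-hat)
        pi_c = pi * dx + (1 - pi) * (1 - dx)       # pr(C_eta = 1 | X)
        return np.mean(C * Y / pi_c)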

Estimating an optimal restricted regime. Doubly robust augmented inverse probability weighted estimator:

AIPWE(η) = n^{−1} Σ_{i=1}^n [ C_{η,i} Y_i / π_c(X_i; η, γ̂) − {C_{η,i} − π_c(X_i; η, γ̂)} / π_c(X_i; η, γ̂) · m(X_i; η, β̂) ],

where m(X; η, β) = Q(X, 1; β)d(X, η) + Q(X, 0; β){1 − d(X, η)} is a model for E{Y(d_η) | X} and Q(X, A; β) is a model for E(Y | X, A). Consistent if either π(X; γ) or Q(X, A; β) is correct ("doubly robust"); attempts to gain efficiency by using data from all subjects.
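The augmentation term can be grafted onto the IPWE sketch above, reusing the hypothetical Q_hat helper from the regression sketch; again an illustration, not the authors' implementation:

    def AIPWE(eta, X, A, Y, ps_model, Q_model, d):
        dx = d(X, eta)
        C = A * dx + (1 - A) * (1 - dx)
        pi = ps_model.predict_proba(X)[:, 1]
        pi_c = pi * dx + (1 - pi) * (1 - dx)
        # m(X; eta, beta-hat) = Q(X,1)d(X,eta) + Q(X,0){1 - d(X,eta)}
        m = Q_hat(Q_model, X, 1) * dx + Q_hat(Q_model, X, 0) * (1 - dx)
        # IPW term minus the augmentation term
        return np.mean(C * Y / pi_c - (C - pi_c) / pi_c * m)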

Estimating an optimal restricted regime. Result: estimators η̂^opt for η^opt are obtained by maximizing IPWE(η) or AIPWE(η) in η; the estimated optimal restricted regime is d̂^opt_η(x) = d(x, η̂^opt). The objective is non-smooth in η, so suitable optimization techniques are needed. Estimators for E{Y(d_η)} at the optimum are IPWE(η̂^opt_IPWE) or AIPWE(η̂^opt_AIPWE), and standard errors can be calculated. Semiparametric theory: AIPWE(η) is more efficient than IPWE(η) for estimating E{Y(d_η)}, so estimating regimes based on AIPWE(η) should be better. Zhang et al. (2012), Biometrics.
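Since the objective is non-smooth, one crude option is a derivative-free random search; a sketch over a hypothetical linear-rule class, reusing AIPWE and the fitted models from the earlier sketches (a genetic algorithm or grid search would be the more careful choice):

    def d_linear(X, eta):
        # Restricted class: d(x, eta) = I(eta0 + eta1*x1 + eta2*x2 > 0);
        # assumes X has two columns (x1, x2).
        return (eta[0] + X @ eta[1:] > 0).astype(int)

    rng = np.random.default_rng(0)
    candidates = rng.uniform(-1.0, 1.0, size=(5000, 3))   # random eta draws (hypothetical range)
    values = [AIPWE(eta, X, A, Y, ps_model, Q_model, d_linear) for eta in candidates]
    eta_hat = candidates[int(np.argmax(values))]          # eta maximizing AIPWE over the draws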

Empirical studies. Extensive simulations; qualitative conclusions: the estimated optimal regime based on regression can achieve the true E{Y(d^opt)} if Q(X, A; β) is correctly specified, but performs poorly if Q(X, A; β) is misspecified. Estimated regimes based on IPWE(η) are so-so even if the propensity model is correct. Estimated regimes based on AIPWE(η) achieve the true E{Y(d^opt)} if Q(X, A; β) is correctly specified even if the propensity model is misspecified, and are much better than the regression estimator when Q(X, A; β) is misspecified.

Empirical studies. One representative scenario. True E(Y | X, A) of the form

Q_t(X, A; β) = exp{β0 + β1 X1² + β2 X2² + β3 X1 X2 + A(β4 + β5 X1 + β6 X2)}.

Misspecified model for E(Y | X, A):

Q_m(X, A; β) = β0 + β1 X1 + β2 X2 + A(β3 + β4 X1 + β5 X2).

True propensity score: logit{π_t(X; γ)} = γ0 + γ1 X1² + γ2 X2². Misspecified propensity score: logit{π_m(X; γ)} = γ0 + γ1 X1 + γ2 X2.

Empirical studies. Here, both the correct and misspecified outcome regression models define a class of regimes D_η = {I(η0 + η1 x1 + η2 x2 > 0)}, so that d^opt ∈ D_η. Other scenarios have d^opt ∉ D_η.

Empirical studies. Truth: E{Y(d^opt)} = 3.71. V(η) = E{Y(d_η)}, computed using 10^6 Monte Carlo simulations.

Method         Ê{Y(d^opt)}   SE     Cov    V(η̂^opt)
REG-Q_t        3.70 (0.14)                 3.71 (0.00)
REG-Q_m        3.44 (0.18)                 3.27 (0.19)
PS correct
  IPWE         4.01 (0.26)   0.28   86.1   3.63 (0.07)
  AIPWE-Q_t    3.72 (0.15)   0.15   94.7   3.70 (0.01)
  AIPWE-Q_m    3.85 (0.21)   0.23   91.8   3.66 (0.07)
PS incorrect
  IPWE         4.06 (0.22)   0.23   69.4   3.42 (0.20)
  AIPWE-Q_t    3.72 (0.15)   0.15   95.2   3.70 (0.01)
  AIPWE-Q_m    3.81 (0.18)   0.19   94.1   3.57 (0.20)

Multiple decisions. Same ideas, only more complicated. Find d^opt so that a patient with baseline information X1 = x1 who receives all K treatments according to d^opt has expected outcome as large as possible. Potential outcomes under a regime d ∈ D: initial information X1 and potential outcomes X2(d), ..., XK(d), Y(d). Optimal regime: d^opt satisfies E{Y(d) | X1 = x1} ≤ E{Y(d^opt) | X1 = x1} for all d ∈ D and all values of x1, and thus E{Y(d)} ≤ E{Y(d^opt)} for all d ∈ D.

Estimating an optimal regime. Complication: can't we just estimate the rules at each decision separately using the previous methods and piece together the estimated regime from separate studies? Unfortunately not, because of delayed effects: e.g., c1 may not appear best initially but may have enhanced effectiveness when followed by m1. Thus we must use data from a single study (the same patients) reflecting the entire sequence of decisions, and use methods that acknowledge this.

Observed data. Data required: (X_{1i}, A_{1i}, X_{2i}, A_{2i}, ..., X_{K−1,i}, A_{K−1,i}, X_{Ki}, A_{Ki}, Y_i), i = 1, ..., n, iid. X1 = baseline covariate information; Xk, k = 2, ..., K = intermediate information observed between decisions k−1 and k; Ak, k = 1, ..., K = treatment received at decision k; Y = observed outcome, which can be ascertained after decision K or can be a function of X2, ..., XK. Studies: longitudinal observational, or a Sequential, Multiple Assignment, Randomized Trial (SMART).

SMART. Cancer example: [schematic of a SMART: patients are randomized at baseline to induction chemotherapy C1 or C2; responders are then re-randomized to maintenance M1 or M2, and nonresponders to salvage S1 or S2].

Estimating an optimal regime. Sequential regression, using backward induction; illustrate for K = 2 and two treatment options {0, 1} at each decision. Posit and fit a model for Q2(X1, A1, X2, A2) = E(Y | X1, A1, X2, A2) and substitute in

d^opt_2(x1, a1, x2) = I{Q2(x1, a1, x2, 1) > Q2(x1, a1, x2, 0)}.

Move backward: posit and fit a model for the expected outcome assuming d^opt_2 is used to determine treatment at decision 2 in the future, i.e.,

Q1(X1, A1) = E[max{Q2(X1, A1, X2, 0), Q2(X1, A1, X2, 1)} | X1, A1],

and substitute in

d^opt_1(x1) = I{Q1(x1, 1) > Q1(x1, 0)}.

This is Q-learning, a form of dynamic programming ("reinforcement learning" in computer science). See Schulte et al. (2014), Statistical Science, for an overview.
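A minimal Q-learning sketch for K = 2 under assumptions not in the slides: hypothetical 2-D numpy covariate arrays X1 and X2, 1-D arrays A1, A2 (in {0, 1}) and Y, and linear working models with treatment interactions at each stage:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def design(H, a):
        # Working design matrix: history main effects, treatment, and interactions.
        a = np.full(len(H), a) if np.isscalar(a) else np.asarray(a)
        return np.column_stack([H, a, a[:, None] * H])

    def q_learning(X1, A1, X2, A2, Y):
        H2 = np.column_stack([X1, A1, X2])            # accrued history at decision 2
        q2 = LinearRegression().fit(design(H2, A2), Y)
        # Pseudo-outcome: value at decision 2 when acting optimally there.
        V2 = np.maximum(q2.predict(design(H2, 0)), q2.predict(design(H2, 1)))
        q1 = LinearRegression().fit(design(X1, A1), V2)
        d2 = lambda h: (q2.predict(design(h, 1)) > q2.predict(design(h, 0))).astype(int)
        d1 = lambda x: (q1.predict(design(x, 1)) > q1.predict(design(x, 0))).astype(int)
        return d1, d2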

Estimating an optimal regime. Inverse weighted methods: by analogy to methods for semiparametric estimation with monotone coarsening (like a dropout problem), consider a restricted class D_η and a generalization of IPWE(η) and AIPWE(η). Subjects are included as long as their observed sequential treatments are consistent with following d_η. See Zhang et al. (2013), Biometrika. In either case, a generalization of the no unmeasured confounders assumption is required.

Future challenges. Estimation of optimal treatment regimes is a wide-open area of research: high-dimensional covariate information? Regression model selection? Black box vs. a restricted class of regimes? Design considerations for SMARTs? Alternative formulation as an optimal classification problem, e.g., Zhang et al. (2012), Stat.

Recognition. Susan Murphy (2013 MacArthur Fellow) and Jamie Robins.

References

Schulte, P. J., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2014). Q- and A-learning methods for estimating optimal dynamic treatment regimes. Statistical Science, in press.

Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012). A robust method for estimating optimal treatment regimes. Biometrics 68, 1010-1018.

Zhang, B., Tsiatis, A. A., Davidian, M., Zhang, M., and Laber, E. B. (2012). Estimating optimal treatment regimes from a classification perspective. Stat 1, 103-114.

Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2013). Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. Biometrika 100, 681-694.

Augmented inverse propensity weighted estimator. Under MAR: Y(d_η) ⊥ C_η | X. If γ̂ →p γ* and β̂ →p β*, this estimator converges in probability to

E[ C_η Y / π_c(X; η, γ*) − {C_η − π_c(X; η, γ*)} / π_c(X; η, γ*) · m(X; η, β*) ]
  = E[ Y(d_η) + {C_η − π_c(X; η, γ*)} / π_c(X; η, γ*) · {Y(d_η) − m(X; η, β*)} ]
  = E{Y(d_η)} + E[ {C_η − π_c(X; η, γ*)} / π_c(X; η, γ*) · {Y(d_η) − m(X; η, β*)} ].

Hence the estimator is consistent for E{Y(d_η)} if either π(X; γ*) = π(X), so that π_c(X; η, γ*) = π_c(X; η) (propensity correct), or Q(X, A; β*) = Q(X, A), so that m(X; η, β*) = m(X; η) (regression correct): double robustness.