Implementing Personalized Medicine: Estimating Optimal Treatment Regimes


Baqun Zhang, Phillip Schulte, Anastasios Tsiatis, Eric Laber, and Marie Davidian
Department of Statistics, North Carolina State University
http://www4.stat.ncsu.edu/~davidian

Personalized medicine. Clinical practice: clinicians make (a series of) treatment decision(s) over the course of a patient's disease or disorder, on a fixed schedule, at a milestone in the disease process, or at an event necessitating a decision. Personalized medicine: make the best treatment decision(s) for a patient given all the available information on the patient up to the time of the decision: genetic/genomic, demographic, physiologic, clinical measures, medical history, and so on.

Dynamic treatment regime. Operationalizing personalized medicine: at any decision, we need a rule that takes as input the accrued information on the patient to that point and dictates the next treatment from among the possible options. Rule(s) must be developed based on evidence, i.e., data, and ideally should lead to the best clinical outcome. Dynamic treatment regime: a set of such formal rules, each corresponding to a decision point.

Single decision. Simple example: which treatment to give patients who present with primary operable breast cancer? Treatment options: L-phenylalanine mustard and 5-fluorouracil (c1), or c1 + tamoxifen (c2). Information: age and progesterone receptor level (PR). Example rule: if age < 50 years and PR < 10 fmol, give c1 (coded as 1); otherwise, give c2 (coded as 0). Mathematically, the rule d is d(age, PR) = I(age < 50 and PR < 10). Alternatively, rules of the form d(age, PR) = I{age > 60 − 8.7 log(PR)}.
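As a programmatic illustration, a minimal sketch of the example rule above (the function name and the two test calls are arbitrary; the cutoffs are those stated in the rule):

    def d(age, pr):
        # Example rule: give c1 (coded 1) if age < 50 years and PR < 10 fmol;
        # otherwise give c2 (coded 0).
        return int(age < 50 and pr < 10)

    d(45, 8)   # 1: give c1
    d(62, 25)  # 0: give c2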

Multiple decision points. Two decision points. Decision 1: induction chemotherapy (options c1, c2). Decision 2: maintenance treatment for patients who respond (options m1, m2); salvage chemotherapy for those who don't (options s1, s2).

Multiple decision points. At baseline: information x1 (accrued information h1 = x1). Decision 1: two options {c1, c2}; rule 1: d1(x1), mapping x1 to {c1, c2}. Between decisions 1 and 2: collect additional information x2, including responder status; accrued information h2 = {x1, chemotherapy at decision 1, x2}. Decision 2: four options {m1, m2, s1, s2}; rule 2: d2(h2), mapping h2 to {m1, m2} (responder) or to {s1, s2} (nonresponder). Regime: d = (d1, d2).

Summary. Single decision (1 decision point): baseline information x ∈ X; set of treatment options a ∈ A; decision rule d(x), d : X → A; treatment regime d. Multiple decisions (K decision points): baseline information x1, intermediate information xk between decisions k−1 and k, k = 2, ..., K; set of treatment options at each decision k: ak ∈ Ak; accrued information h1 = x1 ∈ H1, hk = {x1, a1, x2, a2, ..., x_{k−1}, a_{k−1}, xk} ∈ Hk, k = 2, ..., K; decision rules d1(h1), d2(h2), ..., dK(hK), dk : Hk → Ak; treatment regime d = (d1, d2, ..., dK).

Defining "best". Outcome: there is a clinical outcome by which treatment benefit can be assessed, e.g., survival time, CD4 count, or the indicator of no myocardial infarction within 30 days. Larger outcomes are better.

Optimal regime. Obviously, there is an infinitude of possible regimes d. An optimal regime d^opt should satisfy: if all patients in the population were to receive treatment according to d^opt, the expected (average) outcome for the population would be as large as possible; and if an individual patient were to receive treatment according to d^opt, his/her expected outcome would be as large as possible given the information available on him/her. Can we formalize this?

Optimal regime for single decision. For simplicity, consider regimes involving a single decision with two treatment options (0 and 1): A = {0, 1}; baseline covariate information x ∈ X; treatment regime a single rule d(x), d : X → {0, 1}; d ∈ D, the class of all regimes.

Potential outcomes. Formalize: we can hypothesize potential outcomes. Y(1) = outcome that would be achieved if a patient were to receive treatment 1; Y(0) defined similarly. Then E{Y(1)} is the average outcome if all patients in the population were to receive 1, and similarly for E{Y(0)}. Potential outcome for a regime: for any d ∈ D, define Y(d) to be the potential outcome for a patient with baseline covariate information X if s/he were to receive treatment in accordance with regime d; i.e., Y(d) = Y(1)d(X) + Y(0){1 − d(X)}.

Potential outcomes. Thus E{Y(d)} = E[E{Y(d) | X}] is the average outcome in the population if all patients in the population were assigned treatment according to d ∈ D, and E{Y(d) | X = x} is the expected outcome for a patient with baseline information x if s/he were to receive treatment according to regime d ∈ D. Optimal regime: d^opt is a regime in D such that E{Y(d)} ≤ E{Y(d^opt)} for all d ∈ D, and E{Y(d) | X = x} ≤ E{Y(d^opt) | X = x} for all d ∈ D and x ∈ X.

Important philosophical point. Distinguish between the best treatment for a patient and the best treatment decision for a patient given the information available on the patient. Best treatment for a patient: the option a^best ∈ A corresponding to the largest Y(a) for the patient. Best treatment given the information available: we cannot hope to determine a^best because we can never see all potential outcomes on a given patient. We can hope to make the optimal decision given the information available, i.e., find d^opt that makes E{Y(d)} and E{Y(d) | X = x} as large as possible.

Statistical framework. Goal: given data from a clinical trial or observational study, estimate the optimal regime d^opt satisfying this definition. Observed data: (Xi, Ai, Yi), i = 1, ..., n, iid; X = baseline covariate information, A ∈ {0, 1} = treatment received, Y = outcome observed under A. We observe Y = Y(1)A + Y(0)(1 − A).

Critical assumption. No unmeasured confounders: assume that Y(0), Y(1) ⊥ A | X, i.e., X contains all information used to assign treatments. Automatically satisfied for data from a randomized trial; standard but unverifiable assumption for observational studies. Implies that E{Y(1)} = E[E{Y(1) | X}] = E[E{Y(1) | X, A = 1}] = E[E(Y | X, A = 1)], and similarly for E{Y(0)}.

Optimal regime. Recall Y(d) = Y(1)d(X) + Y(0){1 − d(X)}. This implies (using no unmeasured confounders)

E{Y(d)} = E[E{Y(d) | X}]
        = E[E{Y(1) | X}d(X) + E{Y(0) | X}{1 − d(X)}]
        = E[E(Y | X, A = 1)d(X) + E(Y | X, A = 0){1 − d(X)}].

Thus it is clear that

d^opt(x) = I[E{Y(1) | X = x} > E{Y(0) | X = x}]
         = I{E(Y | X = x, A = 1) > E(Y | X = x, A = 0)}.

Result: if E(Y | X, A) were known, we could find d^opt.

Estimating an optimal regime. Problem: E(Y | X, A) is not known. Posit a model Q(X, A; β) for E(Y | X, A); estimate β from the observed data, yielding β̂ (e.g., by least squares); and estimate d^opt by the regression estimator

d̂^opt_reg(x) = I{Q(x, 1; β̂) > Q(x, 0; β̂)}.

Corresponding estimator for E{Y(d^opt)}:

REG(β̂) = n^{−1} Σ_{i=1}^n [Q(Xi, 1; β̂) d̂^opt_reg(Xi) + Q(Xi, 0; β̂){1 − d̂^opt_reg(Xi)}].

If the model is correct, E(Y | X, A) = Q(X, A; β0) for some β0. Concern: Q(X, A; β) may be misspecified, so d̂^opt_reg could be far from the true d^opt.
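A minimal sketch of this regression approach, under assumptions not in the slides: a linear working model with treatment-covariate interactions, hypothetical numpy arrays X (n x p covariates), A (treatment in {0, 1}), and Y (outcome), with scikit-learn used purely for convenience:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def fit_Q(X, A, Y):
        # Working model Q(X, A; beta): main effects plus treatment interactions.
        Z = np.column_stack([X, A, A[:, None] * X])
        return LinearRegression().fit(Z, Y)

    def Q_hat(model, X, a):
        # Predicted outcome under treatment a for every subject.
        a_col = np.full(len(X), a)
        Z = np.column_stack([X, a_col, a_col[:, None] * X])
        return model.predict(Z)

    def d_reg(model, X):
        # Estimated regime: treat (1) when predicted outcome under 1 exceeds that under 0.
        return (Q_hat(model, X, 1) > Q_hat(model, X, 0)).astype(int)

    def REG(model, X):
        # Plug-in estimator of E{Y(d^opt)} under the fitted regime.
        dx = d_reg(model, X)
        return np.mean(Q_hat(model, X, 1) * dx + Q_hat(model, X, 0) * (1 - dx))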

Estimating an optimal regime. Alternative perspective: Q(X, A; β) defines a class of regimes d(x, β) = I{Q(x, 1; β) > Q(x, 0; β)}, indexed by β, that may or may not contain d^opt. Posit Q(X, A; β) = β0 + β1 X1 + β2 X2 + A(β3 + β4 X1 + β5 X2). The regimes d(x, β) lead to a class D_η of rules I(x2 ≥ η1 x1 + η0) or I(x2 ≤ η1 x1 + η0), with η0 = −β3/β5 and η1 = −β4/β5, depending on the sign of β5. If in truth E(Y | X, A) = exp{1 + X1 + 2X2 + 3X1X2 + A(1 − 2X1 + X2)}, then d^opt(x) = I(x2 > 2x1 − 1), so d^opt ∈ D_η in this case.

Estimating an optimal regime. Result: the parameter η is defined as a function of β. If the posited model is correct, then the optimal regime is contained in D_η. However, if the posited model is incorrect, the estimated regime I{Q(x, 1; β̂) > Q(x, 0; β̂)} may or may not estimate the optimal regime within D_η. Even if the model is correct, if X is high-dimensional and/or Q(X, A; β) is complicated, the resulting regimes may be too complex for practice ("black box").

Optimal restricted regime. This suggests considering directly a restricted set of regimes D_η of the form d(x, η), indexed by η; write d_η(x) = d(x, η). Such regimes may be motivated by a regression model or based on cost, feasibility in practice, or interpretability; e.g., d(x, η) = I(x1 < η0, x2 < η1). D_η may or may not contain d^opt, but is still of interest. Optimal restricted regime: d^opt_η(x) = d(x, η^opt), where η^opt = arg max_η E{Y(d_η)}. Estimate the optimal restricted regime by estimating η^opt.

Estimating an optimal restricted regime. Approach: maximize a good estimator for E{Y(d_η)} in η. Missing data analogy: for fixed η, define C_η = A d(X, η) + (1 − A){1 − d(X, η)}, so that C_η = 1 if the treatment received is consistent with having followed d_η and C_η = 0 otherwise. The full data are {X, Y(d_η)}; the observed data are (X, C_η, C_η Y). Only subjects with C_η = 1 have observed outcomes consistent with following d_η; for the rest, such outcomes are missing.

Estimating an optimal restricted regime. Propensity score: the propensity for treatment 1 is π(X) = pr(A = 1 | X). Randomized trial: π(X) is known. Observational study: posit a model π(X; γ) (e.g., logistic regression) and obtain γ̂ using (Ai, Xi), i = 1, ..., n. Propensity of receiving treatment consistent with d_η:

π_c(X; η) = pr(C_η = 1 | X) = E[A d(X, η) + (1 − A){1 − d(X, η)} | X]
          = π(X)d(X, η) + {1 − π(X)}{1 − d(X, η)}.

Write π_c(X; η, γ̂) when π(X; γ̂) is substituted for π(X).

Estimating an optimal restricted regime. Inverse probability weighted estimator for E{Y(d_η)}:

IPWE(η) = n^{−1} Σ_{i=1}^n C_{η,i} Y_i / π_c(X_i; η, γ̂).

Consistent for E{Y(d_η)} if π(X; γ), and hence π_c(X; η, γ), is correctly specified, but it only uses data from subjects with C_η = 1.
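A minimal IPWE sketch under the same hypothetical setup as before (X, A, Y arrays; d is any vectorized candidate rule returning 0/1 per subject; logistic regression stands in for the posited propensity model):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_propensity(X, A):
        # Observational study: posit pi(x; gamma) via logistic regression for pr(A = 1 | X).
        return LogisticRegression().fit(X, A)

    def IPWE(eta, X, A, Y, ps_model, d):
        dx = d(X, eta)                             # d(X, eta) in {0, 1}
        C = A * dx + (1 - A) * (1 - dx)            # C_eta: received treatment consistent with d_eta
        pi = ps_model.predict_proba(X)[:, 1]       # pi(X; gamma-hat)
        pi_c = pi * dx + (1 - pi) * (1 - dx)       # pr(C_eta = 1 | X)
        return np.mean(C * Y / pi_c)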

Estimating an optimal restricted regime. Doubly robust augmented inverse probability weighted estimator:

AIPWE(η) = n^{−1} Σ_{i=1}^n [ C_{η,i} Y_i / π_c(X_i; η, γ̂) − {C_{η,i} − π_c(X_i; η, γ̂)} / π_c(X_i; η, γ̂) · m(X_i; η, β̂) ],

where m(X; η, β) = Q(X, 1; β)d(X, η) + Q(X, 0; β){1 − d(X, η)} is a model for E{Y(d_η) | X} and Q(X, A; β) is a model for E(Y | X, A). Consistent if either π(X; γ) or Q(X, A; β) is correct ("doubly robust"); attempts to gain efficiency by using data from all subjects.
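The augmentation term can be grafted onto the IPWE sketch above, reusing the hypothetical Q_hat helper from the regression sketch; again an illustration, not the authors' implementation:

    def AIPWE(eta, X, A, Y, ps_model, Q_model, d):
        dx = d(X, eta)
        C = A * dx + (1 - A) * (1 - dx)
        pi = ps_model.predict_proba(X)[:, 1]
        pi_c = pi * dx + (1 - pi) * (1 - dx)
        # m(X; eta, beta-hat) = Q(X,1)d(X,eta) + Q(X,0){1 - d(X,eta)}
        m = Q_hat(Q_model, X, 1) * dx + Q_hat(Q_model, X, 0) * (1 - dx)
        # IPW term minus the augmentation term
        return np.mean(C * Y / pi_c - (C - pi_c) / pi_c * m)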

Estimating an optimal restricted regime. Result: estimators η̂^opt for η^opt are obtained by maximizing IPWE(η) or AIPWE(η) in η; the estimated optimal restricted regime is d̂^opt_η(x) = d(x, η̂^opt). The objective is non-smooth in η, so suitable optimization techniques are needed. Estimators for E{Y(d_η)} at the optimum are IPWE(η̂^opt_IPWE) or AIPWE(η̂^opt_AIPWE), and standard errors can be calculated. Semiparametric theory: AIPWE(η) is more efficient than IPWE(η) for estimating E{Y(d_η)}, so estimating regimes based on AIPWE(η) should be better. Zhang et al. (2012), Biometrics.
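Since the objective is non-smooth, one crude option is a derivative-free random search; a sketch over a hypothetical linear-rule class, reusing AIPWE and the fitted models from the earlier sketches (a genetic algorithm or grid search would be the more careful choice):

    def d_linear(X, eta):
        # Restricted class: d(x, eta) = I(eta0 + eta1*x1 + eta2*x2 > 0);
        # assumes X has two columns (x1, x2).
        return (eta[0] + X @ eta[1:] > 0).astype(int)

    rng = np.random.default_rng(0)
    candidates = rng.uniform(-1.0, 1.0, size=(5000, 3))   # random eta draws (hypothetical range)
    values = [AIPWE(eta, X, A, Y, ps_model, Q_model, d_linear) for eta in candidates]
    eta_hat = candidates[int(np.argmax(values))]          # eta maximizing AIPWE over the draws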

Empirical studies. Extensive simulations; qualitative conclusions: the estimated optimal regime based on regression can achieve the true E{Y(d^opt)} if Q(X, A; β) is correctly specified, but performs poorly if Q(X, A; β) is misspecified. Estimated regimes based on IPWE(η) are so-so even if the propensity model is correct. Estimated regimes based on AIPWE(η) achieve the true E{Y(d^opt)} if Q(X, A; β) is correctly specified even if the propensity model is misspecified, and are much better than the regression estimator when Q(X, A; β) is misspecified.

Empirical studies. One representative scenario. True E(Y | X, A) of the form

Q_t(X, A; β) = exp{β0 + β1 X1² + β2 X2² + β3 X1 X2 + A(β4 + β5 X1 + β6 X2)}.

Misspecified model for E(Y | X, A):

Q_m(X, A; β) = β0 + β1 X1 + β2 X2 + A(β3 + β4 X1 + β5 X2).

True propensity score: logit{π_t(X; γ)} = γ0 + γ1 X1² + γ2 X2². Misspecified propensity score: logit{π_m(X; γ)} = γ0 + γ1 X1 + γ2 X2.

Empirical studies. Here, both the correct and misspecified outcome regression models define a class of regimes D_η = {I(η0 + η1 x1 + η2 x2 > 0)}, so that d^opt ∈ D_η. Other scenarios have d^opt ∉ D_η.

Empirical studies. Truth: E{Y(d^opt)} = 3.71. V(η) = E{Y(d_η)}, computed using 10^6 Monte Carlo simulations.

Method         Ê{Y(d^opt)}   SE     Cov    V(η̂^opt)
REG-Q_t        3.70 (0.14)                 3.71 (0.00)
REG-Q_m        3.44 (0.18)                 3.27 (0.19)
PS correct
  IPWE         4.01 (0.26)   0.28   86.1   3.63 (0.07)
  AIPWE-Q_t    3.72 (0.15)   0.15   94.7   3.70 (0.01)
  AIPWE-Q_m    3.85 (0.21)   0.23   91.8   3.66 (0.07)
PS incorrect
  IPWE         4.06 (0.22)   0.23   69.4   3.42 (0.20)
  AIPWE-Q_t    3.72 (0.15)   0.15   95.2   3.70 (0.01)
  AIPWE-Q_m    3.81 (0.18)   0.19   94.1   3.57 (0.20)

Multiple decisions. Same ideas, only more complicated. Find d^opt so that a patient with baseline information X1 = x1 who receives all K treatments according to d^opt has expected outcome as large as possible. Potential outcomes under a regime d ∈ D: initial information X1 and potential outcomes X2(d), ..., XK(d), Y(d). Optimal regime: d^opt satisfies E{Y(d) | X1 = x1} ≤ E{Y(d^opt) | X1 = x1} for all d ∈ D and all values of x1, and thus E{Y(d)} ≤ E{Y(d^opt)} for all d ∈ D.

Estimating an optimal regime. Complication: can't we just estimate the rules at each decision separately using the previous methods and piece together the estimated regime from separate studies? Unfortunately not, because of delayed effects: e.g., c1 may not appear best initially but may have enhanced effectiveness when followed by m1. Thus we must use data from a single study (the same patients) reflecting the entire sequence of decisions, and use methods that acknowledge this.

Observed data. Data required: (X_{1i}, A_{1i}, X_{2i}, A_{2i}, ..., X_{K−1,i}, A_{K−1,i}, X_{Ki}, A_{Ki}, Y_i), i = 1, ..., n, iid. X1 = baseline covariate information; Xk, k = 2, ..., K = intermediate information observed between decisions k−1 and k; Ak, k = 1, ..., K = treatment received at decision k; Y = observed outcome, which can be ascertained after decision K or can be a function of X2, ..., XK. Studies: longitudinal observational, or a Sequential, Multiple Assignment, Randomized Trial (SMART).

SMART. Cancer example: [schematic of a SMART: patients are randomized at baseline to induction chemotherapy C1 or C2; responders are then re-randomized to maintenance M1 or M2, and nonresponders to salvage S1 or S2].

Estimating an optimal regime. Sequential regression, using backward induction; illustrate for K = 2 and two treatment options {0, 1} at each decision. Posit and fit a model for Q2(X1, A1, X2, A2) = E(Y | X1, A1, X2, A2) and substitute in

d^opt_2(x1, a1, x2) = I{Q2(x1, a1, x2, 1) > Q2(x1, a1, x2, 0)}.

Move backward: posit and fit a model for the expected outcome assuming d^opt_2 is used to determine treatment at decision 2 in the future, i.e.,

Q1(X1, A1) = E[max{Q2(X1, A1, X2, 0), Q2(X1, A1, X2, 1)} | X1, A1],

and substitute in

d^opt_1(x1) = I{Q1(x1, 1) > Q1(x1, 0)}.

This is Q-learning, a form of dynamic programming ("reinforcement learning" in computer science). See Schulte et al. (2014), Statistical Science, for an overview.
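A minimal Q-learning sketch for K = 2 under assumptions not in the slides: hypothetical 2-D numpy covariate arrays X1 and X2, 1-D arrays A1, A2 (in {0, 1}) and Y, and linear working models with treatment interactions at each stage:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def design(H, a):
        # Working design matrix: history main effects, treatment, and interactions.
        a = np.full(len(H), a) if np.isscalar(a) else np.asarray(a)
        return np.column_stack([H, a, a[:, None] * H])

    def q_learning(X1, A1, X2, A2, Y):
        H2 = np.column_stack([X1, A1, X2])            # accrued history at decision 2
        q2 = LinearRegression().fit(design(H2, A2), Y)
        # Pseudo-outcome: value at decision 2 when acting optimally there.
        V2 = np.maximum(q2.predict(design(H2, 0)), q2.predict(design(H2, 1)))
        q1 = LinearRegression().fit(design(X1, A1), V2)
        d2 = lambda h: (q2.predict(design(h, 1)) > q2.predict(design(h, 0))).astype(int)
        d1 = lambda x: (q1.predict(design(x, 1)) > q1.predict(design(x, 0))).astype(int)
        return d1, d2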

Estimating an optimal regime. Inverse weighted methods: by analogy to methods for semiparametric estimation with monotone coarsening (like a dropout problem), consider a restricted class D_η and a generalization of IPWE(η) and AIPWE(η). Subjects are included as long as their observed sequential treatments are consistent with following d_η. See Zhang et al. (2013), Biometrika. In either case, a generalization of the no unmeasured confounders assumption is required.

Future challenges. Estimation of optimal treatment regimes is a wide-open area of research: high-dimensional covariate information? Regression model selection? Black box vs. a restricted class of regimes? Design considerations for SMARTs? Alternative formulation as an optimal classification problem, e.g., Zhang et al. (2012), Stat.

Recognition. Susan Murphy (2013 MacArthur Fellow) and Jamie Robins.

References

Schulte, P. J., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2014). Q- and A-learning methods for estimating optimal dynamic treatment regimes. Statistical Science, in press.

Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012). A robust method for estimating optimal treatment regimes. Biometrics 68, 1010-1018.

Zhang, B., Tsiatis, A. A., Davidian, M., Zhang, M., and Laber, E. B. (2012). Estimating optimal treatment regimes from a classification perspective. Stat 1, 103-114.

Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2013). Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. Biometrika 100, 681-694.

Augmented inverse propensity weighted estimator. Under MAR: Y(d_η) ⊥ C_η | X. If γ̂ →p γ* and β̂ →p β*, this estimator converges in probability to

E[ C_η Y / π_c(X; η, γ*) − {C_η − π_c(X; η, γ*)} / π_c(X; η, γ*) · m(X; η, β*) ]
  = E[ Y(d_η) + {C_η − π_c(X; η, γ*)} / π_c(X; η, γ*) · {Y(d_η) − m(X; η, β*)} ]
  = E{Y(d_η)} + E[ {C_η − π_c(X; η, γ*)} / π_c(X; η, γ*) · {Y(d_η) − m(X; η, β*)} ].

Hence the estimator is consistent for E{Y(d_η)} if either π(X; γ*) = π(X), so that π_c(X; η, γ*) = π_c(X; η) (propensity correct), or Q(X, A; β*) = Q(X, A), so that m(X; η, β*) = m(X; η) (regression correct): double robustness.