Bootstrap Inference for Multiple Imputation Under Uncongeniality

Jonathan Bartlett
www.thestatsgeek.com, www.missingdata.org.uk
Department of Mathematical Sciences, University of Bath, UK
Joint Statistical Meetings, 1st August 2018

Acknowledgement
This research made use of the Balena High Performance Computing (HPC) Service at the University of Bath.

Outline: Motivation; Rubin's rules; Impute then bootstrap; Bootstrap then impute; Control based imputation simulation example; Conclusions

Motivation
MI is very popular for many reasons, one of which is the simplicity of Rubin's rules.
If imputations are proper and the imputation and analysis models are "congenial", Rubin's variance estimator is asymptotically unbiased and confidence intervals attain nominal coverage.
Under uncongeniality, Rubin's variance estimator can be biased upwards or downwards, depending on the setting (Meng 1994 [2], Wang and Robins 1998 [6]).

Motivation
When the imputer and analyst are the same but we do not have congeniality, in some settings we may want to obtain the sharpest (valid) inference possible, e.g. when using control based MI for missing data in confirmatory phase 3 randomised clinical trials. Here the Rubin's rules variance estimator is biased upwards.
For particular settings, we may be able to derive valid analytical variance estimators.
For continuous endpoints analysed using mixed models, Tang 2017 [4] derived a delta method variance estimator.

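Tang 2017 [4]
[Slide image: the delta method variance estimator of Tang 2017 [4]; the formula is not preserved in this transcription.]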

Bootstrap alternatives
Deriving and implementing such variance estimators is hard, and model specific. What other options do we have?
Recently Schomaker and Heumann 2018 [3] investigated four combinations of bootstrap with MI. von Hippel 2018 [5] has also proposed a bootstrap MI combination approach.
We investigate which of these are valid under uncongeniality and, of those, which are computationally efficient.
We assume the sample size is sufficiently large that the MI estimator is normally distributed.

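Rubin's rules MI
Parameter of interest $\theta$. Impute $M$ times and estimate $\theta$, yielding $\hat\theta_m$, $m = 1, \dots, M$.
The MI point estimate is $\hat\theta_M = M^{-1} \sum_{m=1}^{M} \hat\theta_m$.
Imputation specific estimates follow $\hat\theta_m = \hat\theta_\infty + a_m$, where $\hat\theta_\infty = \lim_{M \to \infty} \hat\theta_M$, $\mathrm{Var}(\hat\theta_\infty) = \sigma^2_\infty$, $E(a_m) = 0$ and $\mathrm{Var}(a_m) = \sigma^2_{btw}$.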

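Rubin's rules MI
The variance of $\hat\theta_M$ is thus $\mathrm{Var}(\hat\theta_M) = \sigma^2_\infty + \sigma^2_{btw} / M$.
Under congeniality $\sigma^2_\infty = \sigma^2_{btw} + \sigma^2_{wtn}$, which leads to Rubin's variance estimator:
$(1 + M^{-1}) (M-1)^{-1} \sum_{m=1}^{M} (\hat\theta_m - \hat\theta_M)^2 + M^{-1} \sum_{m=1}^{M} \widehat{\mathrm{Var}}(\hat\theta_m)$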
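A minimal sketch of Rubin's variance estimator, to make the two terms concrete. The function name `rubin_pool` and the toy numbers are illustrative assumptions, not part of the talk.

```python
from statistics import mean, variance


def rubin_pool(estimates, within_variances):
    """Pool M imputation-specific estimates using Rubin's rules.

    estimates        : the M point estimates (theta_hat_m)
    within_variances : the M estimated variances of those estimates
    """
    m = len(estimates)
    theta_bar = mean(estimates)          # pooled MI point estimate
    between = variance(estimates)        # (M-1)^{-1} * sum (theta_m - theta_bar)^2
    within = mean(within_variances)      # M^{-1} * sum Var(theta_m)
    total = (1 + 1 / m) * between + within
    return theta_bar, total


# Toy inputs (hypothetical numbers, for illustration only).
pooled_estimate, pooled_variance = rubin_pool(
    [1.0, 1.2, 0.8, 1.1, 0.9],
    [0.04, 0.05, 0.04, 0.05, 0.04],
)
print(pooled_estimate, pooled_variance)
```

The first term estimates $(1 + M^{-1}) \sigma^2_{btw}$ and the second estimates $\sigma^2_{wtn}$, so the sum targets $\sigma^2_\infty + \sigma^2_{btw}/M$ only when the congeniality identity holds.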

MI boot Rubin
1. Impute $M$ times.
2. For $m = 1, \dots, M$, generate $B$ nonparametric bootstrap samples.
3. Let $\hat\theta_{m,b}$ be the estimate from imputation $m$, bootstrap $b$.
4. For imputation $m$, estimate $\sigma^2_{wtn}$ by $\mathrm{Var}_{bs}(\hat\theta_m) = (B-1)^{-1} \sum_{b=1}^{B} (\hat\theta_{m,b} - \bar\theta_m)^2$, where $\bar\theta_m = B^{-1} \sum_{b=1}^{B} \hat\theta_{m,b}$.
5. Apply Rubin's rules to $\hat\theta_m$ and $\mathrm{Var}_{bs}(\hat\theta_m)$, $m = 1, \dots, M$.
Inference is based on Rubin's rules, so we don't expect unbiased variance estimates under uncongeniality.
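The steps above can be sketched as follows, assuming the $M$ imputed datasets already exist as lists of observations and that the analysis is any function mapping a dataset to a scalar estimate. The names `bootstrap_variance` and `mi_boot_rubin`, and the toy datasets, are hypothetical.

```python
import random
from statistics import mean, variance


def bootstrap_variance(data, estimator, B, rng):
    """Step 4: nonparametric bootstrap variance of the estimator on one imputed dataset."""
    n = len(data)
    boot_estimates = [
        estimator([data[rng.randrange(n)] for _ in range(n)]) for _ in range(B)
    ]
    return variance(boot_estimates)  # divisor B - 1


def mi_boot_rubin(imputed_datasets, estimator, B=200, seed=0):
    """Steps 2 to 5: bootstrap within each imputed dataset, then apply Rubin's rules."""
    rng = random.Random(seed)
    estimates = [estimator(d) for d in imputed_datasets]
    within = [bootstrap_variance(d, estimator, B, rng) for d in imputed_datasets]
    M = len(estimates)
    total = (1 + 1 / M) * variance(estimates) + mean(within)
    return mean(estimates), total


# Three toy "imputed" datasets that differ only in the last (imputed) value.
imputed = [
    [1.0, 2.0, 3.0, 4.0, 2.5],
    [1.0, 2.0, 3.0, 4.0, 3.5],
    [1.0, 2.0, 3.0, 4.0, 1.5],
]
estimate, total_var = mi_boot_rubin(imputed, mean, B=200)
print(estimate, total_var)
```

Only the within-imputation variance is replaced by a bootstrap estimate here; the pooling step is still Rubin's, which is why the approach inherits its behaviour under uncongeniality.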

MI boot pooled
As per MI boot Rubin, except that at the final stage a $100(1 - 2\alpha)\%$ percentile confidence interval for $\theta$ is formed by taking the $\alpha$ and $1 - \alpha$ empirical quantiles of the pooled sample of $MB$ values $\hat\theta_{m,b}$.
Assuming the estimator is unbiased, point estimates follow $\hat\theta_{m,b} = \hat\theta_\infty + a_m + b_b$, where $\mathrm{Var}(a_m) = \sigma^2_{btw}$ and $\mathrm{Var}(b_b) = \sigma^2_{wtn}$.

MI boot pooled
For large $B$ the corresponding MI boot pooled variance estimator is approximately unbiased for $(1 - M^{-1}) \sigma^2_{btw} + \sigma^2_{wtn}$.
Thus for large $M$ and $B$ this will be close to Rubin's variance estimator, and hence be unbiased under congeniality.
However, for small $M$ it is biased downwards and intervals are expected to undercover (under congeniality), as Schomaker and Heumann found.
Inference is again based (essentially) on Rubin's rules, so we don't expect unbiased variance estimates under uncongeniality.
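A small numerical check of the $(1 - M^{-1}) \sigma^2_{btw} + \sigma^2_{wtn}$ expression. It assumes, purely for illustration, that the between and within components are independent normal draws with unit variance; the function names and settings are hypothetical.

```python
import random
from statistics import mean, variance


def pooled_variance_once(M, B, sd_btw, sd_wtn, rng):
    """Sample variance of one pooled set of M*B estimates a_m + b_{m,b}."""
    pooled = []
    for _ in range(M):
        a_m = rng.gauss(0.0, sd_btw)  # between-imputation component
        pooled.extend(a_m + rng.gauss(0.0, sd_wtn) for _ in range(B))
    return variance(pooled)


def average_pooled_variance(M=5, B=50, sd_btw=1.0, sd_wtn=1.0, reps=500, seed=1):
    """Average the pooled-sample variance over many independent repetitions."""
    rng = random.Random(seed)
    return mean(pooled_variance_once(M, B, sd_btw, sd_wtn, rng) for _ in range(reps))


M = 5
simulated = average_pooled_variance(M=M)
predicted = (1 - 1 / M) * 1.0 + 1.0  # (1 - 1/M) * sigma2_btw + sigma2_wtn
print(round(simulated, 2), round(predicted, 2))
```

With $M = 5$ the pooled variance targets about 1.8 rather than the $2.2 = (1 + M^{-1}) \sigma^2_{btw} + \sigma^2_{wtn}$ that Rubin's estimator targets, which illustrates the downward bias for small $M$.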

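Boot MI
1. Bootstrap $B$ times.
2. For $b = 1, \dots, B$, impute $M$ times.
3. Let $\hat\theta_b = M^{-1} \sum_{m=1}^{M} \hat\theta_{b,m}$.
4. Form percentile intervals based on the $\hat\theta_b$, or alternatively a Wald interval based on
$\mathrm{Var}_{BootMI} = (B-1)^{-1} \sum_{b=1}^{B} (\hat\theta_b - \hat\theta_{BM})^2$   (1)
where $\hat\theta_{BM} = B^{-1} \sum_{b=1}^{B} \hat\theta_b$.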
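The four steps can be sketched as below. This is an illustration under stated assumptions, not the implementation used for the talk: `hot_deck_impute` is a deliberately crude toy imputer, the data are simulated with hypothetical settings, and centring the Wald interval on the MI estimate from the original data is an assumption, since the slide only specifies the variance.

```python
import random
from statistics import NormalDist, mean, variance


def hot_deck_impute(sample, rng):
    """Toy imputer (assumption): replace each None by a random observed value."""
    observed = [y for y in sample if y is not None]
    return [y if y is not None else rng.choice(observed) for y in sample]


def boot_mi(data, impute, estimator, B=200, M=2, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = len(data)
    theta_b = []
    for _ in range(B):
        # Step 1: bootstrap the incomplete data.
        boot = [data[rng.randrange(n)] for _ in range(n)]
        # Steps 2-3: impute M times and average the M estimates.
        theta_b.append(mean(estimator(impute(boot, rng)) for _ in range(M)))
    var_boot_mi = variance(theta_b)  # equation (1)

    # Step 4a: percentile interval from the B bootstrap-level estimates.
    ordered = sorted(theta_b)
    lo_idx = int(B * alpha / 2)
    percentile_ci = (ordered[lo_idx], ordered[B - 1 - lo_idx])

    # Step 4b: Wald interval, centred on the MI estimate from the original data.
    theta_m = mean(estimator(impute(data, rng)) for _ in range(M))
    half = NormalDist().inv_cdf(1 - alpha / 2) * var_boot_mi ** 0.5
    return var_boot_mi, percentile_ci, (theta_m - half, theta_m + half)


# Toy data: 200 standard normal values, each missing with probability 0.5.
gen = random.Random(42)
toy = [gen.gauss(0.0, 1.0) if gen.random() < 0.5 else None for _ in range(200)]
var_boot_mi, percentile_ci, wald_ci = boot_mi(toy, hot_deck_impute, mean, B=100, M=2)
print(var_boot_mi, percentile_ci, wald_ci)
```

Because imputation is rerun inside every bootstrap sample, the spread of the $\hat\theta_b$ reflects both sampling and imputation variability without relying on Rubin's rules.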

Boot MI
The point estimates $\hat\theta_{b,m}$ now follow $\hat\theta_{b,m} = \hat\theta_\infty + c_b + a_m$, with $\mathrm{Var}(c_b) = \sigma^2_\infty$ and $\mathrm{Var}(a_m) = \sigma^2_{btw}$.
It follows that $\mathrm{Var}_{BootMI}$ is unbiased for $\sigma^2_\infty + \sigma^2_{btw} / M$.
We expect unbiased variance estimation under congeniality or uncongeniality.

Boot MI pooled
The same as Boot MI, but percentile intervals are formed from the pooled sample of $\hat\theta_{b,m}$ values.
Schomaker and Heumann found this overcovered in simulations (under congeniality).
For large $B$ and $M$, the variance of the pooled sample estimates $\sigma^2_\infty + \sigma^2_{btw}$, and hence is biased upwards, explaining the overcoverage.
We would not expect nominal coverage, under congeniality or uncongeniality.

Boot MI for inference under uncongeniality
Boot MI is the only approach we expect to give unbiased variance estimates under uncongeniality.
We need relatively large $B$ for reliable estimates of variance.
If we choose $M$ small, the point estimator is inefficient and Monte Carlo error may be larger than desired.
If we choose $M$ large, $BM$ is large, which is computationally costly!

von Hippel's boot MI proposal
von Hippel [5] proposed using boot MI, with $\hat\theta_{BM}$ as the point estimator.
Its variance is $\mathrm{Var}(\hat\theta_{BM}) = (1 + B^{-1}) \sigma^2_\infty + (BM)^{-1} \sigma^2_{btw}$.
We can fit a one-way random intercepts model to the estimates $\hat\theta_{b,m}$ to estimate $\sigma^2_\infty$ and $\sigma^2_{btw}$, and insert these into the preceding expression.
Since large $B$ is required for reliable variance estimates, von Hippel suggested using $M = 2$.
With $M = 2$, the approach becomes computationally much less costly.
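A sketch of the variance calculation, assuming the $B \times M$ estimates are held as a list of lists. Instead of fitting a random intercepts model with mixed-model software, it uses the one-way ANOVA (method of moments) estimators of the two variance components for balanced data; this swap, the function name `von_hippel_variance` and the toy numbers are assumptions made for illustration.

```python
from statistics import mean


def von_hippel_variance(theta):
    """theta[b][m]: estimate from bootstrap sample b, imputation m (balanced B x M)."""
    B, M = len(theta), len(theta[0])
    row_means = [mean(row) for row in theta]   # theta_hat_b
    grand = mean(row_means)                    # theta_hat_BM

    # Mean square within bootstraps: expectation sigma2_btw.
    msw = sum(
        (x - rm) ** 2 for row, rm in zip(theta, row_means) for x in row
    ) / (B * (M - 1))
    # Mean square between bootstraps: expectation M * sigma2_inf + sigma2_btw.
    msb = M * sum((rm - grand) ** 2 for rm in row_means) / (B - 1)

    sigma2_btw = msw
    sigma2_inf = max((msb - msw) / M, 0.0)     # truncate at zero if negative
    var_theta_bm = (1 + 1 / B) * sigma2_inf + sigma2_btw / (B * M)
    return grand, sigma2_inf, sigma2_btw, var_theta_bm


# Toy 3 x 2 array of estimates (hypothetical numbers).
theta_bm, s2_inf, s2_btw, var_bm = von_hippel_variance(
    [[1.0, 1.2], [0.8, 0.6], [1.1, 1.3]]
)
print(theta_bm, s2_inf, s2_btw, var_bm)
```

In practice $B$ would be in the hundreds; three bootstraps are used here only so the arithmetic can be followed by hand.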

Simulation setup
Sample size $n = 500$.
Binary treatment randomly assigned.
$Y_1$, $Y_2$ (baseline, follow-up) generated from a correlated bivariate normal distribution, with the mean of $Y_2$ dependent on treatment.
50% of $Y_2$ values made missing completely at random.
Analysis model is linear regression of $Y_2$ on treatment and $Y_1$, and interest focuses on the treatment coefficient.
10,000 simulations.
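One dataset from a setup of this kind could be generated as sketched below. The slide does not give the correlation or the treatment effect, so `rho = 0.5` and `effect = 0.5` are placeholder assumptions, as are the unit variances and the function name `simulate_trial`.

```python
import random


def simulate_trial(n=500, rho=0.5, effect=0.5, seed=0):
    """Return n rows (treatment, y1, y2), with y2 set to None when missing.

    rho and effect are hypothetical values; the talk does not state them.
    """
    rng = random.Random(seed)
    # Exactly 50% of follow-up values are missing completely at random.
    missing = set(rng.sample(range(n), n // 2))
    rows = []
    for i in range(n):
        trt = rng.randint(0, 1)       # randomly assigned binary treatment
        y1 = rng.gauss(0.0, 1.0)      # baseline
        # Follow-up: correlated with baseline, mean shifted by treatment.
        y2 = effect * trt + rho * y1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        rows.append((trt, y1, None if i in missing else y2))
    return rows


trial = simulate_trial()
print(len(trial), sum(row[2] is None for row in trial))
```

Each simulated dataset would then be passed through the imputation and bootstrap combinations being compared.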

Imputation methods
Each of the previously described combinations was used with $M = 10$ and $B = 200$, except Boot MI von Hippel, which used $B = 200$ and $M = 2$.
First we imputed $Y_2$ using normal linear regression under MAR.
Next we imputed $Y_2$ using the jump to reference (J2R) MNAR approach proposed by Carpenter et al [1]. This imputation model is uncongenial with the analysis model.

Results under congeniality (MAR imputation)

Method               Emp. SD   Est. SD   Med. CI width   CI coverage (%)
MI Rubin             0.082     0.082     0.327           95.2
MI boot Rubin        0.082     0.082     0.327           95.1
MI boot pooled       0.082     0.078     0.301           93.4
Boot MI              0.082     0.082     0.321           95.1
Boot MI pooled       0.082     0.098     0.383           98.0
Boot MI von Hippel   0.080     0.080     0.315           95.1

MI boot pooled is biased downwards slightly, as expected. Boot MI pooled is biased upwards, as expected.

Results under uncongeniality (J2R imputation)

Method               Emp. SD   Est. SD   Med. CI width   CI coverage (%)
MI Rubin             0.045     0.051     0.200           97.5
MI boot Rubin        0.045     0.051     0.200           97.5
MI boot pooled       0.045     0.050     0.197           97.3
Boot MI              0.045     0.044     0.175           94.8
Boot MI pooled       0.045     0.047     0.185           96.1
Boot MI von Hippel   0.044     0.044     0.174           94.9

Only Boot MI and Boot MI von Hippel are unbiased for the true repeated sampling variance. All the others overestimate the variance, and hence their CIs overcover.

Conclusions
Under uncongeniality, bootstrap followed by MI can provide unbiased variance estimation and intervals which attain nominal coverage.
von Hippel's version of this is attractive on computational efficiency grounds.
Importantly, its application requires no customisation to the particular imputation/analysis model, unlike analytic alternatives.
We have assumed that:
1. the estimator is normally distributed;
2. data are i.i.d. (c.f. stratified randomisation).
These slides are available at www.thestatsgeek.com

References
[1] J R Carpenter, J H Roger, and M G Kenward. Analysis of longitudinal trials with protocol deviations: a framework for relevant, accessible assumptions and inference via multiple imputation. Journal of Biopharmaceutical Statistics, 23:1352-1371, 2013.
[2] X L Meng. Multiple-imputation inferences with uncongenial sources of input (with discussion). Statistical Science, 9:538-573, 1994.
[3] M Schomaker and C Heumann. Bootstrap inference when using multiple imputation. Statistics in Medicine, 37(14):2252-2266, 2018.
[4] Y Tang. On the multiple imputation variance estimator for control-based and delta-adjusted pattern mixture models. Biometrics, 73(4):1379-1387, 2017.
[5] P T von Hippel. Maximum likelihood multiple imputation: Faster, more efficient imputation without posterior draws. arXiv e-prints, arXiv:1210.0870v9, 2018.
[6] N Wang and J M Robins. Large-sample theory for parametric multiple imputation procedures. Biometrika, 85:935-948, 1998.