Semiparametric Modeling, Penalized Splines, and Mixed Models
David Ruppert, Cornell University
http://www.orie.cornell.edu/~davidr
January 2004
Joint work with Babette Brumback, Ray Carroll, Brent Coull, Ciprian Crainiceanu, Matt Wand, Yan Yu, and others

Slide 2: Example (data from Hastie and James; this analysis in RWC)
[Figure: spinal bone mineral density versus age (years)]

Slide 3: Possible model
SBMD_{ij} is spinal bone mineral density on the ith subject at age age_{ij}.
SBMD_{ij} = U_i + m(age_{ij}) + ε_{ij},  i = 1, ..., m = 230,  j = 1, ..., n_i
U_i is the random intercept for subject i.
The {U_i} are assumed iid N(0, σ²_U).

Slide 4: Underlying philosophy
1. minimalist statistics: keep it as simple as possible
2. build on classical parametric statistics
3. modular methodology
Slide 5: Reference
Semiparametric Regression by Ruppert, Wand, and Carroll (2003)
Lots of examples from biostatistics

Slide 6: Recent example
April 17, 2003: Canfield et al. (2003), intellectual impairment and blood lead
- longitudinal (mixed model)
- nine covariates (modelled linearly)
- effect of lead modelled as a spline (semiparametric model)
- disturbing conclusion

Slide 7: [Figure: IQ versus blood lead (microgram/deciliter), with quadratic and spline fits]
Thanks to Rich Canfield for data and estimates

Slide 8: Semiparametric regression
Partial linear or partial spline model:
    Y_i = W_i^T β_W + m(X_i) + ε_i
E.g., m(x) = X_i^T β_X + B^T(x) b, where
    B^T(x) = (B_1(x), ..., B_K(x))
    X_i^T = (X_i, ..., X_i^p)
    B^T(x) = ((x − κ_1)^p_+, ..., (x − κ_K)^p_+)
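As a concrete illustration of the plus-function basis B^T(x) = ((x − κ_1)^p_+, ..., (x − κ_K)^p_+), here is a minimal numpy sketch (the function name and toy inputs are my own, not from the slides):

```python
import numpy as np

def truncated_power_basis(x, knots, degree=1):
    """Columns are the plus functions (x - kappa_k)_+^degree, one per knot."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.clip(x - k, 0.0, None) ** degree
                            for k in knots])

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
B = truncated_power_basis(x, knots=[1.0], degree=1)
# Each column is zero at and below its knot and rises linearly above it:
# here B has the single column (0, 0, 0, 0.5, 1.0).
```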
Slide 9: Example
m(x) = β_0 + β_1 x + b_1 (x − κ_1)_+ + ... + b_K (x − κ_K)_+
The slope jumps by b_k at κ_k.

Slide 10: Fitting LIDAR data with plus functions
[Figure: log ratio versus range, linear plus-function fit]

Slide 11: [Figure: linear plus function and its derivative]

Slide 12: Generalization
m(x) = β_0 + β_1 x + ... + β_p x^p + b_1 (x − κ_1)^p_+ + ... + b_K (x − κ_K)^p_+
The pth derivative jumps by p! b_k at κ_k.
The first p − 1 derivatives are continuous.
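The jump in the pth derivative can be checked numerically; a small sketch (my own toy check, using a central difference for the second derivative of a quadratic plus function):

```python
import numpy as np

# Numerical check that a degree-p plus function (x - kappa)_+^p has a jump of
# p! in its pth derivative at the knot, so that b_k (x - kappa_k)_+^p
# contributes a jump of p! * b_k.  Here p = 2, so the expected jump is 2! = 2.
p, kappa, h = 2, 1.0, 1e-3
f = lambda x: np.clip(x - kappa, 0.0, None) ** p

def second_deriv(x):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

# Evaluate just above and just below the knot, away from the kink itself.
jump = second_deriv(kappa + 10 * h) - second_deriv(kappa - 10 * h)
# jump is (numerically) p! = 2
```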
Slide 13: [Figure: quadratic plus function, its derivative, and its 2nd derivative]

Slide 14: Ordinary least squares
[Figure: OLS spline fits to the raw data with 2, 3, 5, 10, 20, 50, and 100 knots]

Slide 15: Penalized least squares
Minimize
    Σ_{i=1}^n { Y_i − (W_i^T β_W + X_i^T β_X + B^T(X_i) b) }² + λ b^T D b
E.g., D = I.

Slide 16: Penalized least squares
[Figure: penalized spline fits to the raw data with 2, 3, 5, 10, 20, 50, and 100 knots]
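The penalized criterion has a closed-form ridge-type solution; a self-contained numpy sketch (simulated data and variable names are my own, with D = blockdiag(0, I) so only the plus-function coefficients are penalized):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 20
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

# Linear (p = 1) truncated power basis with K interior knots at quantiles of x.
knots = np.quantile(x, np.linspace(0, 1, K + 2)[1:-1])
X = np.column_stack([np.ones(n), x])                    # unpenalized polynomial part
Z = np.clip(x[:, None] - knots[None, :], 0.0, None)     # plus-function columns
C = np.hstack([X, Z])

lam = 1e-3
D = np.diag([0.0, 0.0] + [1.0] * K)   # penalize only the spline coefficients b
coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
fit = C @ coef
```

The first two diagonal zeros in D leave the intercept and slope unshrunk, matching the criterion's penalty λ bᵀDb on b alone.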
Slide 17: Ridge regression
From the previous slide:
    Σ_{i=1}^n { Y_i − (W_i^T β_W + X_i^T β_X + B^T(X_i) b) }² + λ b^T D b
Let X have ith row ( W_i^T  X_i^T  B^T(X_i) ). Then
    (β̂_W, β̂_X, b̂)^T = { X^T X + λ blockdiag(0, 0, D) }^{-1} X^T Y

Slide 18: Linear mixed models
    Y = Xβ + Zb + ε,
where b is N(0, σ²_b Σ_b). Xβ are the fixed effects and Zb are the random effects.
Henderson's equations:
    [ X^T X    X^T Z              ] [ β̂ ]   [ X^T Y ]
    [ Z^T X    Z^T Z + λ Σ_b^{-1} ] [ b̂ ] = [ Z^T Y ]

Slide 19:
    (β̂; b̂) = [ X^T X, X^T Z; Z^T X, Z^T Z + λ Σ_b^{-1} ]^{-1} [ X^T Y; Z^T Y ]
            = { (X Z)^T (X Z) + λ blockdiag(0, Σ_b^{-1}) }^{-1} (X Z)^T Y
This is also a BLUP in a mixed model and an empirical Bayes estimator.

Slide 20: Selecting λ
1. cross-validation (CV)
2. generalized cross-validation (GCV)
3. ML or REML in the mixed model framework, where λ = σ²_ε / σ²_b
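Of the selection criteria listed above, GCV is the easiest to sketch directly. A minimal numpy illustration (simulated data and names are my own; GCV(λ) = n·RSS / (n − tr S)², where S is the smoother matrix):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 200, 20
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
knots = np.quantile(x, np.linspace(0, 1, K + 2)[1:-1])
C = np.column_stack([np.ones(n), x,
                     np.clip(x[:, None] - knots[None, :], 0.0, None)])
D = np.diag([0.0, 0.0] + [1.0] * K)

def gcv(lam):
    """Generalized cross-validation score for the penalized-spline smoother."""
    A = np.linalg.solve(C.T @ C + lam * D, C.T)
    S = C @ A                               # smoother ("hat") matrix
    resid = y - S @ y
    return n * float(resid @ resid) / (n - np.trace(S)) ** 2

# Minimize over a coarse log-spaced grid of candidate lambdas.
grid = 10.0 ** np.arange(-8, 3)
lam_hat = min(grid, key=gcv)
```

In practice one would minimize GCV more carefully (or use REML, which estimates σ²_ε and σ²_b directly); the grid search is only to keep the sketch short.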
Slide 21: Selecting the number of knots
[Figure: (a) SpaHet, j = 3, a typical data set with the true curve and the full-search fit; (b) MASE comparisons of fixed-number-of-knots, myopic, and full-search knot selection]

Slide 22: [Figure: (a) SpaHetLS, j = 3; (b) MASE comparisons; histograms of the selected number of knots; ASE comparisons]

Slide 23: [Figure: MSE decomposed into variance and squared bias as a function of df(λ), with the optimal fit marked; quadratic splines]

Slide 24: Return to the spinal bone mineral density study
[Figure: spinal bone mineral density versus age (years)]
SBMD_{ij} = U_i + m(age_{ij}) + ε_{ij},  i = 1, ..., m = 230,  j = 1, ..., n_i
Slide 25:
        [ 1   age_{11}   ]
        [ ...            ]
    X = [ 1   age_{1n_1} ]
        [ ...            ]
        [ 1   age_{m1}   ]
        [ ...            ]
        [ 1   age_{mn_m} ]

Slide 26:
        [ (age_{11} − κ_1)_+    ...  (age_{11} − κ_K)_+   ]
        [ ...                                             ]
    Z = [ (age_{1n_1} − κ_1)_+  ...  (age_{1n_1} − κ_K)_+ ]
        [ ...                                             ]
        [ (age_{m1} − κ_1)_+    ...  (age_{m1} − κ_K)_+   ]
        [ ...                                             ]
        [ (age_{mn_m} − κ_1)_+  ...  (age_{mn_m} − κ_K)_+ ]

Slide 27:
    u = (U_1, ..., U_m, b_1, ..., b_K)^T

Slide 28: Variability bars on m̂ and estimated density of the Û_i
[Figure: spinal bone mineral density versus age (years), with variability bars]
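Since the random-effects vector u stacks the subject intercepts U_i with the spline coefficients b_k, the random-effects design must carry both subject-indicator columns and plus-function columns. A small numpy sketch of building these matrices (my own helper name and toy data; I assume Z includes the subject indicators implied by u):

```python
import numpy as np

def longitudinal_design(age, subject, knots):
    """X holds the fixed-effect columns (1, age); Z stacks subject-intercept
    indicator columns (for the U_i) with plus-function columns (for the b_k)."""
    age = np.asarray(age, dtype=float)
    subjects = np.unique(subject)
    U_cols = (np.asarray(subject)[:, None] == subjects[None, :]).astype(float)
    spline_cols = np.clip(age[:, None] - np.asarray(knots)[None, :], 0.0, None)
    X = np.column_stack([np.ones(len(age)), age])
    Z = np.hstack([U_cols, spline_cols])
    return X, Z

# Toy data: two subjects, one knot at age 11.
age = [10.0, 11.0, 12.0, 10.5, 11.5]
subject = [1, 1, 1, 2, 2]
X, Z = longitudinal_design(age, subject, knots=[11.0])
# Z's first two columns are the indicators for U_1 and U_2;
# its last column is (age - 11)_+ for b_1.
```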
Slide 29: Broken down by ethnicity
[Figure: spinal bone mineral density versus age (years), panels for Asian, Black, Hispanic, and White subjects]

Slide 30:
Only requires an expansion of the fixed effects by adding the columns
    [ black_1   hispanic_1   white_1 ]
    [ ...                            ]
    [ black_m   hispanic_m   white_m ]

Slide 31: Model with ethnicity effects
SBMD_{ij} = U_i + m(age_{ij}) + β_1 black_i + β_2 hispanic_i + β_3 white_i + ε_{ij},
    1 ≤ j ≤ n_i, 1 ≤ i ≤ m
Asian is the reference group.

Slide 32: [Figure: estimated contrasts with Asian subjects, for Black, Hispanic, and White subjects]
Slide 33:
In this model, the age-effect curves for the four ethnic groups are parallel.
Could we model them as non-parallel?
That might be problematic in this example because of the small values of the n_i, but the methodology should be useful in other contexts.

Slide 34:
Add interactions between age and black, hispanic, and white. These are fixed effects.
Then add interactions between black, hispanic, white, and asian and the linear plus functions in age. These are mean-zero random effects with their own variance component.
This variance component controls the amount of shrinkage of the ethnicity-specific curves toward the overall effect.

Slide 35: Penalized splines and additive models
Additive model:
    Y_i = m_1(X_{1,i}) + ... + m_P(X_{P,i}) + ε_i

Slide 36: Bivariate additive spline model
Y_i = β_0 + β_{x,1} X_i + b_{x,1}(X_i − κ_{x,1})_+ + ... + b_{x,K_x}(X_i − κ_{x,K_x})_+
          + β_{z,1} Z_i + b_{z,1}(Z_i − κ_{z,1})_+ + ... + b_{z,K_z}(Z_i − κ_{z,K_z})_+ + ε_i
- no need for backfitting
- computation is very rapid
- no identifiability issues
- inference is simple
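The "no backfitting" point follows because both smooths sit in one design matrix and the fit is a single penalized least-squares solve. A numpy sketch under my own simulated data and names:

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 400, 15
x = rng.uniform(0, 1, n)
z = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + (z - 0.5) ** 2 + rng.normal(0, 0.2, n)

def linear_spline_cols(v, K):
    """Linear term plus K plus-function columns for one covariate."""
    knots = np.quantile(v, np.linspace(0, 1, K + 2)[1:-1])
    return v[:, None], np.clip(v[:, None] - knots[None, :], 0.0, None)

xl, xs = linear_spline_cols(x, K)
zl, zs = linear_spline_cols(z, K)

# One design matrix holds both smooths, so a single ridge-type solve fits the
# whole additive model -- no backfitting loop.
C = np.hstack([np.ones((n, 1)), xl, xs, zl, zs])
D = np.diag([0.0, 0.0] + [1.0] * K + [0.0] + [1.0] * K)  # penalize spline coefs only
lam = 1e-3
coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
fit = C @ coef
```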
Slide 37: The bias-variance trade-off and confidence bands
[Figure: LIDAR fits (log ratio versus range) for several values of λ]

Slide 38: Bayesian methods
- can be easily implemented in WinBUGS, or programmed in, say, MATLAB
- allow Bayes rather than empirical Bayes inference
- uncertainty due to smoothing parameter selection is taken into account

Slide 39: Bayesian methods
The linear mixed model is half-Bayesian: the random effects have a prior.
The parameters without a prior are:
- fixed effects: give them diffuse normal priors
- variance components: give them diffuse inverse gamma priors

Slide 40: How does one adjust confidence intervals for bias?
Undersmooth, so that variance dominates and bias can be safely ignored.
Slide 41: [Figure: MSE decomposed into variance and squared bias versus log(λ), with the optimal λ marked]

Slide 42: Adjustment for bias, continued
- estimate bias by a higher-order method and subtract off the bias (essentially the same as above)
- Wahba/Nychka Bayesian intervals: the bias is random, so it adds to the posterior variance; the interval is widened but there is no offset

Slide 43: Wahba/Nychka Bayesian intervals
    y = Xβ + Zu + ε,  Cov(ε) = σ²_ε I,  Cov(u) = σ²_u I
    C = ( X  Z )
β̂ and ũ are BLUPs.

Slide 44:
    Cov( (β̂, ũ) | u ) = σ²_ε (C^T C + (σ²_ε/σ²_u) D)^{-1} C^T C (C^T C + (σ²_ε/σ²_u) D)^{-1}
(frequentist variance; ignores bias)
    Cov( (β, u) | y ) = σ²_ε (C^T C + (σ²_ε/σ²_u) D)^{-1}
(Bayesian posterior variance; takes bias into account)
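The two covariance formulas can be compared numerically; since λD is positive semidefinite, the Bayesian posterior variance dominates the frequentist variance term by term on the diagonal, which is how the interval gets "widened". A numpy sketch (my own simulated design; D = blockdiag(0, I)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 100, 8
x = np.sort(rng.uniform(0, 1, n))
knots = np.quantile(x, np.linspace(0, 1, K + 2)[1:-1])
C = np.column_stack([np.ones(n), x,
                     np.clip(x[:, None] - knots[None, :], 0.0, None)])
D = np.diag([0.0, 0.0] + [1.0] * K)

sigma_eps, sigma_u = 0.3, 1.0
lam = sigma_eps ** 2 / sigma_u ** 2
M_inv = np.linalg.inv(C.T @ C + lam * D)

cov_freq = sigma_eps ** 2 * M_inv @ C.T @ C @ M_inv   # frequentist: ignores bias
cov_bayes = sigma_eps ** 2 * M_inv                    # Bayesian: adds the bias term
# Every diagonal entry of cov_bayes is at least the matching entry of cov_freq.
```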
Slide 45: Correction for measurement error
[Figure: strontium ratio versus age (million years)]

Slide 46: Effect of measurement error
[Figure: y versus x plus error]
W = X + error and Var(X) = Var(error)

Slide 47:
Relatively little research in this area.
Fan and Truong (1993): deconvolution kernels
- the first work
- inefficient in finite-sample studies
- no inference
- strictly for 1-dimensional smoothing
Carroll, Maca, and Ruppert: functional SIMEX methods and structural spline methods
- more efficient than Fan and Truong

Slide 48: Berry, Carroll, and Ruppert (JASA, 2002)
- fully Bayesian
- smoothing or penalized splines
- rather efficient in finite-sample studies
- inference available
- scales up: semiparametric inference is easy
- structural
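To convey the SIMEX idea referenced above, here is a toy linear-regression SIMEX in numpy: deliberately inflate the measurement error, watch the slope attenuate, and extrapolate back to zero error. This is my own minimal sketch of the generic SIMEX recipe, not the functional or structural spline methods of Carroll, Maca, and Ruppert:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
sigma_u = 0.5
w = x + rng.normal(0, sigma_u, n)         # observed W = X + error
y = 2.0 * x + rng.normal(0, 0.2, n)       # true slope is 2; naive slope attenuates

def slope(w, y):
    """Least-squares slope of y on w."""
    return float(np.cov(w, y)[0, 1] / np.var(w))

# SIMEX: add extra error with variance zeta * sigma_u^2, average over replicates,
# then extrapolate the slope-vs-zeta curve back to zeta = -1 (no error at all).
zetas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
slopes = []
for zeta in zetas:
    reps = [slope(w + np.sqrt(zeta) * rng.normal(0, sigma_u, n), y)
            for _ in range(50)]
    slopes.append(np.mean(reps))

coef = np.polyfit(zetas, slopes, 2)       # quadratic extrapolant
simex_slope = float(np.polyval(coef, -1.0))
naive_slope = slopes[0]
# naive_slope is attenuated toward zero; simex_slope moves back toward 2.
```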
Slide 49: Berry, Carroll, and Ruppert
- starts with the mixed-model spline formulation, but is fully Bayesian
- conjugate priors
- assumes the true covariates are iid normal, but is surprisingly robust
- normal measurement error
- in the Gibbs sampler, only sampling the true (unknown) covariates requires a Hastings-Metropolis step

Slide 50: Correction for measurement error
[Figure: solid: true; dotted: uncorrected; dashed: corrected]

Slide 51: Effect of measurement error
[Figure: y versus x plus error]
W = X + error and Var(X) = Var(error)

Slide 52: Measurement error, continued
Ganguli, Staudenmayer, and Wand:
- EM maximum likelihood estimation in the BCR model
- works about as well as the fully Bayesian approach
- extension to additive models
Slide 53: Generalized regression
Extension to non-Gaussian responses is conceptually easy: one gets a GLMM.
However, GLMMs are not trivial. Can use:
- Monte Carlo EM
- or MCMC

Slide 54: Single-index models
    Y_i = g(X_i^T θ) + Z_i^T β + ε_i
Yu and Ruppert (2002, JASA)
Let
    g(x) = γ_0 + γ_1 x + ... + γ_p x^p + c_1 (x − κ_1)^p_+ + ... + c_K (x − κ_K)^p_+
This becomes a nonlinear regression model:
    Y_i = m(X_i, Z_i; θ, β, γ, c) + ε_i
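One simple way to see the single-index idea at work: profile over the direction θ, fitting the penalized-spline link g for each candidate and keeping the direction with the smallest residual sum of squares. This grid-search sketch (my own simulated data and names, in two dimensions with ‖θ‖ = 1 for identifiability) is far cruder than the nonlinear least-squares approach of Yu and Ruppert, but shows the structure:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))
theta_true = np.array([0.6, 0.8])           # unit vector, angle atan2(0.8, 0.6)
y = np.sin(X @ theta_true) + rng.normal(0, 0.1, n)

def spline_rss(t, y, K=10, lam=1e-4):
    """Penalized linear-spline fit of y on the index t; returns the RSS."""
    knots = np.quantile(t, np.linspace(0, 1, K + 2)[1:-1])
    C = np.column_stack([np.ones_like(t), t,
                         np.clip(t[:, None] - knots[None, :], 0.0, None)])
    D = np.diag([0.0, 0.0] + [1.0] * K)
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    resid = y - C @ coef
    return float(resid @ resid)

# Profile over directions theta = (cos a, sin a): for each candidate angle,
# fit the spline link g and keep the angle with the smallest RSS.
angles = np.linspace(0, np.pi, 181)
rss = [spline_rss(X @ np.array([np.cos(a), np.sin(a)]), y) for a in angles]
a_hat = angles[int(np.argmin(rss))]
theta_hat = np.array([np.cos(a_hat), np.sin(a_hat)])
```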