Semiparametric Modeling, Penalized Splines, and Mixed Models. David Ruppert, Cornell University.

Slide 1: Semiparametric Modeling, Penalized Splines, and Mixed Models. David Ruppert, Cornell University. http://www.orie.cornell.edu/~davidr. January 2004. Joint work with Babette Brumback, Ray Carroll, Brent Coull, Ciprian Crainiceanu, Matt Wand, Yan Yu, and others.

Slide 2: Example (data from Hastie and James; this analysis in RWC). [Figure: spinal bone mineral density versus age (years).]

Slide 3: Possible model. SBMD_{i,j} is the spinal bone mineral density of the ith subject at age age_{i,j}:
SBMD_{i,j} = U_i + m(age_{i,j}) + ε_{i,j}, i = 1, ..., m = 230, j = 1, ..., n_i.
U_i is the random intercept for subject i; the {U_i} are assumed iid N(0, σ_U^2).

Slide 4: Underlying philosophy: minimalist statistics. 1. Keep it as simple as possible. 2. Build on classical parametric statistics. 3. Modular methodology.
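
A minimal simulation sketch (not from the slides) of data with this structure: a subject-specific random intercept plus a smooth age effect plus noise. The particular m(.), the variance values, the numbers of visits, and all variable names are illustrative assumptions.

import numpy as np
rng = np.random.default_rng(0)
n_subjects, sigma_u, sigma_eps = 230, 0.1, 0.03
def m(age):                                    # hypothetical smooth age effect
    return 1.0 / (1.0 + np.exp(-(age - 13.0)))
ages, sbmd, subject = [], [], []
for i in range(n_subjects):
    n_i = rng.integers(1, 5)                   # a few visits per subject
    u_i = rng.normal(0.0, sigma_u)             # random intercept U_i ~ N(0, sigma_u^2)
    age_i = np.sort(rng.uniform(9, 26, n_i))
    y_i = u_i + m(age_i) + rng.normal(0.0, sigma_eps, n_i)
    ages.extend(age_i); sbmd.extend(y_i); subject.extend([i] * n_i)
ages, sbmd, subject = map(np.array, (ages, sbmd, subject))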

Slide 5: Reference: Semiparametric Regression by Ruppert, Wand, and Carroll (2003), with lots of examples from biostatistics.

Slide 6: Recent example (April 17, 2003): Canfield et al. (2003), intellectual impairment and blood lead. Longitudinal data (mixed model); nine covariates modelled linearly; the effect of lead modelled as a spline (a semiparametric model); a disturbing conclusion.

Slide 7: [Figure: IQ versus blood lead (micrograms/deciliter), with quadratic and spline fits.] Thanks to Rich Canfield for the data and estimates.

Slide 8: Semiparametric regression. Partial linear or partial spline model:
Y_i = W_i^T β_W + m(X_i) + ε_i,
e.g., m(X_i) = X_i^T β_X + B^T(X_i) b, where X_i^T = (X_i, ..., X_i^p) holds the polynomial terms and
B^T(x) = (B_1(x), ..., B_K(x)) = ((x − κ_1)_+^p, ..., (x − κ_K)_+^p)
holds the truncated power (plus) functions at the knots κ_1, ..., κ_K.

Slide 9: Fitting LIDAR data with plus functions. [Figure: log ratio versus range, with a linear-spline fit.]

Slide 10: [Figure: the linear plus function (x − κ)_+ and its derivative.]

Slide 11: Example (linear plus functions):
m(x) = β_0 + β_1 x + b_1 (x − κ_1)_+ + ... + b_K (x − κ_K)_+;
the slope jumps by b_k at κ_k.

Slide 12: Generalization:
m(x) = β_0 + β_1 x + ... + β_p x^p + b_1 (x − κ_1)_+^p + ... + b_K (x − κ_K)_+^p;
the pth derivative jumps by p! b_k at κ_k, and the first p − 1 derivatives are continuous.
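
A small sketch of the truncated power ("plus function") basis just described, of general degree p: columns 1, x, ..., x^p followed by (x − κ_k)_+^p for each knot. Placing knots at sample quantiles is an assumption here, not something the slide specifies.

import numpy as np
def truncated_power_basis(x, knots, degree=1):
    """Columns: 1, x, ..., x^degree, then (x - kappa_k)_+^degree per knot."""
    x = np.asarray(x, dtype=float)
    poly = np.vander(x, degree + 1, increasing=True)
    plus = np.maximum(x[:, None] - np.asarray(knots)[None, :], 0.0) ** degree
    return np.hstack([poly, plus])
# Example: a quadratic spline basis with 20 interior knots at quantiles of x.
x = np.linspace(0, 10, 200)
knots = np.quantile(x, np.linspace(0, 1, 22)[1:-1])
B = truncated_power_basis(x, knots, degree=2)   # shape (200, 3 + 20)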

Slide 13: Quadratic plus function. [Figure: the quadratic plus function (x − κ)_+^2, its derivative, and its second derivative.]

Slide 14: Ordinary least squares. [Figure: raw data and unpenalized spline fits with increasing numbers of knots.]

Slide 15: Penalized least-squares. Minimize
Σ_{i=1}^n { Y_i − (W_i^T β_W + X_i^T β_X + B^T(X_i) b) }^2 + λ b^T D b,
e.g., D = I.

Slide 16: Penalized least squares. [Figure: the same data and the same numbers of knots, now fit by penalized least squares.]
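
A sketch of the penalized least-squares fit above: with C containing the polynomial and plus-function columns, the coefficients solve (C^T C + λP) δ = C^T y, where P penalizes only the plus-function coefficients (here D = I). It reuses the hypothetical truncated_power_basis helper from the earlier sketch.

import numpy as np
def fit_penalized_spline(x, y, knots, degree=2, lam=1.0):
    C = truncated_power_basis(x, knots, degree)
    P = np.diag(np.r_[np.zeros(degree + 1), np.ones(len(knots))])  # blockdiag(0, D), D = I
    coef = np.linalg.solve(C.T @ C + lam * P, C.T @ y)
    return coef, C @ coef    # coefficients and fitted values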

Slide 17: Ridge regression. From the previous slide, minimize
Σ_{i=1}^n { Y_i − (W_i^T β_W + X_i^T β_X + B^T(X_i) b) }^2 + λ b^T D b.
Let X have ith row (W_i^T, X_i^T, B^T(X_i)). Then
(β̂_W, β̂_X, b̂) = { X^T X + λ blockdiag(0, 0, D) }^{-1} X^T Y.

Slide 18: Linear mixed models. Y = Xβ + Zb + ε, where b is N(0, σ_b^2 Σ_b); Xβ are the fixed effects and Zb are the random effects. Henderson's equations:
[X^T X, X^T Z; Z^T X, Z^T Z + λ Σ_b^{-1}] (β̂; b̂) = (X^T Y; Z^T Y).

Slide 19: The penalized spline fit is also a BLUP in a mixed model and an empirical Bayes estimator. In the linear mixed model,
(β̂; b̂) = [X^T X, X^T Z; Z^T X, Z^T Z + λ Σ_b^{-1}]^{-1} (X^T Y; Z^T Y)
          = { (X Z)^T (X Z) + λ blockdiag(0, Σ_b^{-1}) }^{-1} (X Z)^T Y.

Slide 20: Selecting λ. 1. Cross-validation (CV). 2. Generalized cross-validation (GCV). 3. ML or REML in the mixed model framework, with λ = σ_ε^2 / σ_b^2.
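
A sketch of smoothing-parameter selection by GCV (item 2 above), using the ridge-type form of the estimator from these slides; the λ grid is an arbitrary assumption.

import numpy as np
def gcv_select_lambda(C, y, P, lambdas):
    """GCV(lambda) = n * RSS / (n - tr(S_lambda))^2 for the ridge-type smoother."""
    n, best = len(y), (np.inf, None)
    for lam in lambdas:
        A = np.linalg.solve(C.T @ C + lam * P, C.T)   # (C'C + lam P)^{-1} C'
        S = C @ A                                     # smoother ("hat") matrix
        resid = y - S @ y
        gcv = n * np.sum(resid ** 2) / (n - np.trace(S)) ** 2
        if gcv < best[0]:
            best = (gcv, lam)
    return best[1]
# Example: lam = gcv_select_lambda(C, y, P, np.logspace(-4, 4, 30))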

Slide 21: Selecting the number of knots. [Figure: (a) the SpaHet test function (j = 3) with a typical data set; (b) relative MASE comparisons of fixed-number-of-knots, myopic, and full-search knot selection, with a histogram of the selected number of knots and ASE comparisons for two values of K.]

Slide 22: [Figure: the corresponding data set and MASE, knot-frequency, and ASE comparisons for the SpaHetLS setting (j = 3).]

Slide 23: [Figure: MSE, variance, and bias of the fit as functions of df(λ), for quadratic splines with varying numbers of knots.]

Slide 24: Return to the spinal bone mineral density study. [Figure: spinal bone mineral density versus age (years).]
SBMD_{i,j} = U_i + m(age_{i,j}) + ε_{i,j}, i = 1, ..., m = 230, j = 1, ..., n_i.
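
A sketch of a simple "full search" over the number of knots in the spirit of these comparisons: for each candidate K, place knots at equally spaced sample quantiles and score each (K, λ) pair by GCV. The candidate grid is an illustrative assumption, and the helpers from the earlier sketches are assumed available.

import numpy as np
def choose_num_knots(x, y, candidates=(5, 10, 20, 40, 80, 120), degree=2):
    best = (np.inf, None)
    for K in candidates:
        knots = np.quantile(x, np.linspace(0, 1, K + 2)[1:-1])
        C = truncated_power_basis(x, knots, degree)
        P = np.diag(np.r_[np.zeros(degree + 1), np.ones(K)])
        for lam in np.logspace(-4, 4, 20):
            S = C @ np.linalg.solve(C.T @ C + lam * P, C.T)
            gcv = len(y) * np.sum((y - S @ y) ** 2) / (len(y) - np.trace(S)) ** 2
            if gcv < best[0]:
                best = (gcv, (K, lam))
    return best[1]     # (number of knots, lambda) with the smallest GCV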

Slide 25: Fixed-effects design matrix, with one row per observation (i, j):
X has rows (1, age_{i,j}), j = 1, ..., n_i, i = 1, ..., m.

Slide 26: Random-effects (spline) design matrix, with one row per observation:
Z has rows ((age_{i,j} − κ_1)_+, ..., (age_{i,j} − κ_K)_+).

Slide 27: Random-effects vector u = (U_1, ..., U_m, b_1, ..., b_K)^T.

Slide 28: Variability bars on the estimated m and the estimated density of the U_i. [Figure: spinal bone mineral density versus age (years), with the fitted curve and variability bars.]
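
A sketch of assembling these design matrices in code: fixed effects (1, age), and random effects consisting of subject indicators (for the U_i) followed by the plus functions in age (for the b_k), matching u = (U_1, ..., U_m, b_1, ..., b_K)^T. Variable names are assumptions.

import numpy as np
def longitudinal_design(age, subject, knots):
    age, subject = np.asarray(age, dtype=float), np.asarray(subject)
    X = np.column_stack([np.ones_like(age), age])                       # fixed effects
    subj_ind = (subject[:, None] == np.unique(subject)[None, :]).astype(float)
    plus = np.maximum(age[:, None] - np.asarray(knots)[None, :], 0.0)   # linear plus functions
    Z = np.hstack([subj_ind, plus])                                     # random effects
    return X, Z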

Slide 29: Broken down by ethnicity. [Figure: spinal bone mineral density versus age (years), separate panels for Asian, Black, Hispanic, and White subjects.]

Slide 30: Model with ethnicity effects:
SBMD_{i,j} = U_i + m(age_{i,j}) + β_1 black_i + β_2 hispanic_i + β_3 white_i + ε_{i,j}, 1 ≤ j ≤ n_i, 1 ≤ i ≤ m.
Asian is the reference group.

Slide 31: This only requires an expansion of the fixed effects, adding one row of indicator columns (black_i, hispanic_i, white_i) for each observation on subject i, i = 1, ..., m.

Slide 32: Contrast with Asian subjects. [Figure: estimated contrasts with the Asian group for Black, Hispanic, and White subjects.]
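
A small sketch of the fixed-effects expansion just described: append black/hispanic/white indicator columns, with Asian as the reference group (rows of all zeros). The string labels are an illustrative assumption.

import numpy as np
def add_ethnicity_columns(X, ethnicity):
    ethnicity = np.asarray(ethnicity)
    dummies = np.column_stack([(ethnicity == g).astype(float)
                               for g in ("Black", "Hispanic", "White")])
    return np.hstack([X, dummies])     # Asian observations get (0, 0, 0)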

Slide 33: In this model, the age-effect curves for the four ethnic groups are parallel. Could we model them as non-parallel? That might be problematic in this example because of the small values of the n_i, but the methodology should be useful in other contexts.

Slide 34: Add interactions between age and black, hispanic, and white; these are fixed effects. Then add interactions between black, hispanic, white, and asian and the linear plus functions in age; these are mean-zero random effects with their own variance component. This variance component controls the amount of shrinkage of the ethnicity-specific curves toward the overall effect.

Slide 35: Penalized splines and additive models. Additive model:
Y_i = m_1(X_{1,i}) + ... + m_P(X_{P,i}) + ε_i.

Slide 36: Bivariate additive spline model:
Y_i = β_0 + β_{x,1} X_i + b_{x,1} (X_i − κ_{x,1})_+ + ... + b_{x,K_x} (X_i − κ_{x,K_x})_+ + β_{z,1} Z_i + b_{z,1} (Z_i − κ_{z,1})_+ + ... + b_{z,K_z} (Z_i − κ_{z,K_z})_+ + ε_i.
No need for backfitting; computation is very rapid; no identifiability issues; inference is simple.
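
A sketch of the bivariate additive spline model above fit in one penalized least-squares solve, with separate smoothing parameters for the X and Z components; no backfitting is needed because both spline bases sit in a single design matrix. The helper layout and default λ values are assumptions.

import numpy as np
def fit_additive_spline(x, z, y, knots_x, knots_z, lam_x=1.0, lam_z=1.0):
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    plus_x = np.maximum(x[:, None] - np.asarray(knots_x)[None, :], 0.0)
    plus_z = np.maximum(z[:, None] - np.asarray(knots_z)[None, :], 0.0)
    C = np.hstack([np.column_stack([np.ones_like(x), x, z]), plus_x, plus_z])
    penalty = np.diag(np.r_[np.zeros(3),
                            lam_x * np.ones(len(knots_x)),
                            lam_z * np.ones(len(knots_z))])
    coef = np.linalg.solve(C.T @ C + penalty, C.T @ y)
    return coef, C @ coef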

Slide 37: The bias-variance trade-off and confidence bands. [Figure: LIDAR fits (log ratio versus range) at several values of λ, from undersmoothed to oversmoothed.]

Slide 38: Bayesian methods. The linear mixed model is half-Bayesian: the random effects have a prior. The parameters without a prior are the fixed effects (give them diffuse normal priors) and the variance components (give them diffuse inverse gamma priors).

Slide 39: Bayesian methods. Can be easily implemented in WinBUGS or programmed in, say, MATLAB. Allows Bayes rather than empirical Bayes inference. Uncertainty due to smoothing parameter selection is taken into account.

Slide 40: How does one adjust confidence intervals for bias? Undersmooth, so that variance dominates and bias can be safely ignored.
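
A minimal Gibbs-sampler sketch of the fully Bayesian version described here: diffuse normal priors on the fixed effects and inverse-gamma priors on sigma_b^2 and sigma_eps^2. The hyperparameter values and the function name are assumptions; the slides' point is only that such a sampler is easy to write (e.g., in WinBUGS or MATLAB).

import numpy as np
def gibbs_penalized_spline(X, Z, y, n_iter=2000, a=0.01, b=0.01, tau2_beta=1e8):
    rng = np.random.default_rng(0)
    n, p, K = len(y), X.shape[1], Z.shape[1]
    C = np.hstack([X, Z])
    sig2_eps, sig2_b, draws = 1.0, 1.0, []
    for _ in range(n_iter):
        # (beta, b) | rest ~ N(mu, Sigma)
        prior_prec = np.r_[np.full(p, 1.0 / tau2_beta), np.full(K, 1.0 / sig2_b)]
        Sigma = np.linalg.inv(C.T @ C / sig2_eps + np.diag(prior_prec))
        mu = Sigma @ (C.T @ y) / sig2_eps
        delta = rng.multivariate_normal(mu, Sigma)
        u = delta[p:]
        # sigma_b^2 | u ~ InvGamma(a + K/2, b + u'u/2)
        sig2_b = 1.0 / rng.gamma(a + K / 2, 1.0 / (b + u @ u / 2))
        # sigma_eps^2 | rest ~ InvGamma(a + n/2, b + RSS/2)
        resid = y - C @ delta
        sig2_eps = 1.0 / rng.gamma(a + n / 2, 1.0 / (b + resid @ resid / 2))
        draws.append((delta[:p].copy(), u.copy(), sig2_b, sig2_eps))
    return draws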

Slide 41: [Figure: MSE, variance, and squared bias as functions of log(λ), with the optimal λ indicated.]

Slide 42: Adjustment for bias, continued. Estimate the bias by a higher-order method and subtract it off (essentially the same as above). Wahba/Nychka Bayesian intervals: the bias is random, so it adds to the posterior variance; the interval is widened, but there is no offset.

Slide 43: Wahba/Nychka Bayesian intervals. y = Xβ + Zu + ε, where Cov(ε) = σ_ε^2 I and Cov(u) = σ_u^2 I. Write C = (X Z); β̂ and û denote the BLUPs.

Slide 44:
Cov((β̂, û) | u) = σ_ε^2 (C^T C + (σ_ε^2/σ_u^2) D)^{-1} C^T C (C^T C + (σ_ε^2/σ_u^2) D)^{-1}
(frequentist variance; ignores bias).
Cov((β̂, û) − (β, u)) = σ_ε^2 (C^T C + (σ_ε^2/σ_u^2) D)^{-1}
(Bayesian posterior variance; takes bias into account).
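
A sketch computing the two covariances above for given variance components, plus pointwise bands of roughly two standard errors; D is the penalty matrix acting only on the random-effect block, and the variance components are assumed already estimated.

import numpy as np
def blup_covariances(C, D, sig2_eps, sig2_u):
    lam = sig2_eps / sig2_u
    M = np.linalg.inv(C.T @ C + lam * D)
    cov_freq = sig2_eps * M @ (C.T @ C) @ M    # frequentist variance: ignores bias
    cov_bayes = sig2_eps * M                   # posterior variance: accounts for bias
    return cov_freq, cov_bayes
def pointwise_bands(C, cov, fit):
    se = np.sqrt(np.einsum("ij,jk,ik->i", C, cov, C))   # row-wise c' cov c
    return fit - 2 * se, fit + 2 * se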

Slide 45: Correction for measurement error. [Figure: strontium ratio versus age (million years).]

Slide 46: Effect of measurement error. [Figure: y versus x-plus-error.] Here W = X + error, and Var(X) = Var(error).

Slide 47: Relatively little research in this area. Fan and Truong (1993), deconvolution kernels: the first work; inefficient in finite-sample studies; no inference; strictly for 1-dimensional smoothing. Carroll, Maca, and Ruppert, functional SIMEX methods and structural spline methods: more efficient than Fan and Truong.

Slide 48: Berry, Carroll, and Ruppert (JASA, 2002): fully Bayesian; smoothing or penalized splines; rather efficient in finite-sample studies; inference available; scales up (semiparametric inference is easy); structural.

Slide 49: Berry, Carroll, and Ruppert: starts with the mixed-model spline formulation, but fully Bayesian; conjugate priors; true covariates are iid normal (but the method is surprisingly robust); normal measurement error; in the Gibbs sampler, only the sampling of the true (unknown) covariates requires a Hastings-Metropolis step.

Slide 50: Effect of measurement error. [Figure: y versus x-plus-error.] Here W = X + error, and Var(X) = Var(error).

Slide 51: Correction for measurement error. [Figure: solid: true curve; dotted: uncorrected fit; dashed: corrected fit.]

Slide 52: Measurement error, continued. Ganguli, Staudenmayer, and Wand: EM maximum likelihood estimation in the BCR model; works about as well as the fully Bayesian approach; extension to additive models.
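
A schematic sketch of the one Metropolis-Hastings step mentioned here: updating a single unobserved covariate X_i given its surrogate W_i, the response Y_i, the current spline fit m(.), and current variances, under the normality assumptions on the slide. The random-walk proposal and all argument names are assumptions; the surrounding Gibbs sampler is assumed to supply the inputs.

import numpy as np
def mh_update_true_covariate(x_i, w_i, y_i, m, sig2_eps, sig2_u, mu_x, sig2_x,
                             rng, step=0.5):
    def log_post(x):
        return (-(y_i - m(x)) ** 2 / (2 * sig2_eps)   # Y_i | X_i
                - (w_i - x) ** 2 / (2 * sig2_u)       # W_i = X_i + error
                - (x - mu_x) ** 2 / (2 * sig2_x))     # X_i ~ N(mu_x, sig2_x)
    proposal = x_i + step * rng.standard_normal()     # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_post(proposal) - log_post(x_i):
        return proposal
    return x_i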

Slide 53: Generalized regression. Extension to non-Gaussian responses is conceptually easy: one gets a GLMM. However, GLMMs are not trivial. One can use Monte Carlo EM or MCMC.

Slide 54: Single-index models: Y_i = g(X_i^T θ) + Z_i^T β + ε_i. Yu and Ruppert (2002, JASA). Let
g(x) = γ_0 + γ_1 x + ... + γ_p x^p + c_1 (x − κ_1)_+^p + ... + c_K (x − κ_K)_+^p.
The model becomes a nonlinear regression model: Y_i = m(X_i, Z_i, θ, β, γ, c) + ε_i.
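
A sketch of the "becomes a nonlinear regression model" idea: for a candidate index direction θ (normalized for identifiability), build a spline basis in X^T θ, profile out the spline and linear coefficients by penalized least squares, and minimize the profiled residual sum of squares over θ. The truncated_power_basis helper from the earlier sketch, the fixed λ, and the knot count are assumptions.

import numpy as np
from scipy.optimize import minimize
def profile_rss(theta, X, Z, y, degree=2, n_knots=15, lam=1.0):
    theta = theta / np.linalg.norm(theta)        # identifiability: ||theta|| = 1
    u = X @ theta
    knots = np.quantile(u, np.linspace(0, 1, n_knots + 2)[1:-1])
    B = truncated_power_basis(u, knots, degree)
    C = np.hstack([B, Z])                        # spline in the index, plus Z' beta
    P = np.diag(np.r_[np.zeros(degree + 1), np.ones(n_knots), np.zeros(Z.shape[1])])
    coef = np.linalg.solve(C.T @ C + lam * P, C.T @ y)
    return np.sum((y - C @ coef) ** 2)
# Example: theta_hat = minimize(profile_rss, x0=np.ones(X.shape[1]), args=(X, Z, y)).x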