Semiparametric Modeling, Penalized Splines, and Mixed Models


Semi 1 Semiparametric Modeling, Penalized Splines, and Mixed Models
David Ruppert, Cornell University
http://www.orie.cornell.edu/~davidr
January 2004
Joint work with Babette Brumback, Ray Carroll, Brent Coull, Ciprian Crainiceanu, Matt Wand, Yan Yu, and others

Semi 2 Example (data from Hastie and James; this analysis in RWC)
[Figure: spinal bone mineral density versus age (years)]

Semi 3 Possible Model
$\mathrm{SBMD}_{i,j}$ is spinal bone mineral density on the $i$th subject at age equal to $\mathrm{age}_{i,j}$.
$$\mathrm{SBMD}_{i,j} = U_i + m(\mathrm{age}_{i,j}) + \epsilon_{i,j}, \quad i = 1, \ldots, m = 230, \quad j = 1, \ldots, n_i$$
$U_i$ is the random intercept for subject $i$.
$\{U_i\}$ are assumed iid $N(0, \sigma_U^2)$.

Semi 4 Underlying philosophy
1. minimalist statistics: keep it as simple as possible
2. build on classical parametric statistics
3. modular methodology

Semi 5 Reference
Semiparametric Regression by Ruppert, Wand, and Carroll (2003)
Lots of examples from biostatistics

Semi 6 Recent Example
April 17, 2003: Canfield et al (2003), Intellectual impairment and blood lead
longitudinal (mixed model)
nine covariates (modelled linearly)
effect of lead modelled as a spline (semiparametric model)
disturbing conclusion

Semi 7
[Figure: IQ versus lead (microgram/deciliter), with quadratic and spline fits]
Thanks to Rich Canfield for data and estimates

Semi 8 Semiparametric regression
Partial linear or partial spline model:
$$Y_i = W_i^T \beta_W + m(X_i) + \epsilon_i$$
$$m(X_i) = \mathbf{X}_i^T \beta_X + B^T(X_i)\, b$$
$$B^T(x) = ( B_1(x) \;\cdots\; B_K(x) )$$
E.g.,
$$\mathbf{X}_i^T = ( X_i \;\cdots\; X_i^p ), \qquad B^T(x) = \{ (x - \kappa_1)_+^p \;\cdots\; (x - \kappa_K)_+^p \}$$

Semi 9 Example
$$m(x) = \beta_0 + \beta_1 x + b_1 (x - \kappa_1)_+ + \cdots + b_K (x - \kappa_K)_+$$
slope jumps by $b_k$ at $\kappa_k$

Semi 10 Linear plus function
[Figure: a linear plus function and its derivative]

Semi 11 Fitting LIDAR data with plus functions
[Figure: log ratio versus range]

Semi 12 Generalization
$$m(x) = \beta_0 + \beta_1 x + \cdots + \beta_p x^p + b_1 (x - \kappa_1)_+^p + \cdots + b_K (x - \kappa_K)_+^p$$
$p$th derivative jumps by $p!\, b_k$ at $\kappa_k$
first $p - 1$ derivatives are continuous
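
The degree-$p$ truncated power basis on this slide can be sketched in a few lines of NumPy. This is only an illustration: the function name `truncated_power_basis`, the grid, and the knot locations are assumptions chosen for the example and do not come from the slides.

```python
import numpy as np

def truncated_power_basis(x, knots, p=1):
    """Return (X, Z) for a degree-p spline with p >= 1 (hypothetical helper).

    X holds the polynomial columns 1, x, ..., x^p.
    Z holds the plus-function columns (x - kappa_k)_+^p, one per knot.
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    X = np.vander(x, p + 1, increasing=True)
    Z = np.maximum(x[:, None] - knots[None, :], 0.0) ** p
    return X, Z

# Illustrative grid and knots (assumed values).
x = np.linspace(0.0, 3.0, 7)
X, Z = truncated_power_basis(x, knots=[1.0, 2.0], p=2)
```

Each column of `Z` is zero to the left of its knot, which is why the coefficient $b_k$ only changes the fit (and its $p$th derivative) to the right of $\kappa_k$.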

Semi 13 Quadratic plus function
[Figure: a quadratic plus function, its derivative, and its 2nd derivative]

Semi 14 Ordinary Least Squares
[Figure: panels showing the raw data and ordinary least squares fits with 2, 3, 5, 10, 20, 50, and 100 knots]

Semi 15 Penalized least-squares
Minimize
$$\sum_{i=1}^n \{ Y_i - (W_i^T \beta_W + \mathbf{X}_i^T \beta_X + B^T(X_i)\, b) \}^2 + \lambda\, b^T D b$$
E.g., $D = I$

Semi 16 Penalized Least Squares
[Figure: panels showing the raw data and penalized least squares fits with 2, 3, 5, 10, 20, 50, and 100 knots]

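Semi 17 Ridge Regression
From previous slide:
$$\sum_{i=1}^n \{ Y_i - (W_i^T \beta_W + \mathbf{X}_i^T \beta_X + B^T(X_i)\, b) \}^2 + \lambda\, b^T D b$$
Let $\mathcal{X}$ have rows $( W_i^T \;\; \mathbf{X}_i^T \;\; B^T(X_i) )$. Then
$$\begin{pmatrix} \hat\beta_W \\ \hat\beta_X \\ \hat b \end{pmatrix} = \{ \mathcal{X}^T \mathcal{X} + \lambda\, \mathrm{blockdiag}(0, 0, D) \}^{-1} \mathcal{X}^T Y$$
Also a BLUP in a mixed model and an empirical Bayes estimator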
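
The ridge-type closed form on this slide can be tried out numerically. The following is a minimal sketch under stated assumptions: there are no extra linear covariates $W_i$, the spline is linear ($p = 1$), $D = I$ on the spline coefficients, and the data are simulated. The helper names `design` and `penalized_fit` are hypothetical.

```python
import numpy as np

def design(x, knots, p=1):
    """Stack polynomial and plus-function columns into one design matrix."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    X = np.vander(x, p + 1, increasing=True)
    Z = np.maximum(x[:, None] - knots[None, :], 0.0) ** p
    return np.hstack([X, Z])

def penalized_fit(C, y, lam, n_fixed):
    """Solve {C^T C + lam * blockdiag(0, I)} coef = C^T y."""
    K = C.shape[1] - n_fixed
    D = np.diag(np.concatenate([np.zeros(n_fixed), np.ones(K)]))
    return np.linalg.solve(C.T @ C + lam * D, C.T @ y)

# Simulated data (an assumption; not the data shown in the slides).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)

knots = np.linspace(0.0, 1.0, 22)[1:-1]   # 20 interior knots
C = design(x, knots, p=1)

coef_rough = penalized_fit(C, y, lam=1e-6, n_fixed=2)   # close to OLS
coef_smooth = penalized_fit(C, y, lam=1e6, n_fixed=2)   # close to a straight line
```

Only the plus-function coefficients are penalized, so a very large $\lambda$ shrinks the fit toward the unpenalized polynomial part rather than toward zero.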

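Semi 18 Linear Mixed Models
$$Y = X\beta + Zb + \varepsilon$$
where $b$ is $N(0, \sigma_b^2 \Sigma_b)$
$X\beta$ are the fixed effects and $Zb$ are the random effects
Henderson's equations:
$$\begin{pmatrix} \hat\beta \\ \hat b \end{pmatrix} = \begin{pmatrix} X^T X & X^T Z \\ Z^T X & Z^T Z + \lambda \Sigma_b^{-1} \end{pmatrix}^{-1} \begin{pmatrix} X^T Y \\ Z^T Y \end{pmatrix}, \qquad \lambda = \frac{\sigma_\epsilon^2}{\sigma_b^2}$$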

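Semi 19
From previous slides: let $\mathcal{X}$ have rows $( W_i^T \;\; \mathbf{X}_i^T \;\; B^T(X_i) )$. Then
$$\begin{pmatrix} \hat\beta_W \\ \hat\beta_X \\ \hat b \end{pmatrix} = \{ \mathcal{X}^T \mathcal{X} + \lambda\, \mathrm{blockdiag}(0, 0, D) \}^{-1} \mathcal{X}^T Y$$
Linear mixed model:
$$\begin{pmatrix} \hat\beta \\ \hat b \end{pmatrix} = \begin{pmatrix} X^T X & X^T Z \\ Z^T X & Z^T Z + \lambda \Sigma_b^{-1} \end{pmatrix}^{-1} \begin{pmatrix} X^T Y \\ Z^T Y \end{pmatrix} = \{ ( X \;\; Z )^T ( X \;\; Z ) + \lambda\, \mathrm{blockdiag}(0, \Sigma_b^{-1}) \}^{-1} ( X \;\; Z )^T Y$$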

Semi 20 Selecting λ
1. cross-validation (CV)
2. generalized cross-validation (GCV)
3. ML or REML in mixed model framework
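
As an illustration of option 2, generalized cross-validation can be sketched with the standard criterion $\mathrm{GCV}(\lambda) = n\,\mathrm{RSS}(\lambda) / \{n - \mathrm{tr}(S_\lambda)\}^2$, where $S_\lambda$ is the smoother matrix. The slides do not spell out this formula, so treat it, the simulated data, and the grid of candidate $\lambda$ values as assumptions of the example.

```python
import numpy as np

# Simulated data and a linear spline basis (assumed for illustration).
rng = np.random.default_rng(1)
n = 200
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

knots = np.linspace(0.0, 1.0, 22)[1:-1]
C = np.hstack([
    np.vander(x, 2, increasing=True),                  # columns 1, x
    np.maximum(x[:, None] - knots[None, :], 0.0),      # plus functions
])
D = np.diag(np.concatenate([np.zeros(2), np.ones(len(knots))]))

def gcv(lam):
    """Return (GCV score, degrees of freedom of the fit) for one lambda."""
    S = C @ np.linalg.solve(C.T @ C + lam * D, C.T)    # smoother matrix
    resid = y - S @ y
    df = np.trace(S)
    return n * np.sum(resid ** 2) / (n - df) ** 2, df

lams = 10.0 ** np.arange(-6, 5)                        # candidate grid (assumed)
scores = [gcv(lam) for lam in lams]
best = int(np.argmin([s[0] for s in scores]))
lam_gcv, df_gcv = lams[best], scores[best][1]
```

The trace of the smoother matrix plays the role of the fitted degrees of freedom, ranging from the number of unpenalized columns (large $\lambda$) up to the total number of columns (small $\lambda$).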

Semi 21 Selecting the Number of Knots
[Figure: (a) SpaHet, j = 3, typical data set, with true curve and full-search fit; (b) MASE comparisons, relative MASE versus K for fixed nknots, myopic, and full search; frequency versus number of knots (coded); ASE comparison for two values of K]

Semi 22
[Figure: (a) SpaHetLS, j = 3, with true curve and full-search fit; (b) MASE comparisons, relative MASE versus K for fixed nknots, myopic, and full search; frequency versus number of knots (coded); ASE comparison for two values of K]

Semi 23
[Figure: MSE, variance, and bias versus df fit(λ), with the optimal value marked; quadratic spline]

Semi 24 Return to spinal bone mineral density study
[Figure: spinal bone mineral density versus age (years)]
$$\mathrm{SBMD}_{i,j} = U_i + m(\mathrm{age}_{i,j}) + \epsilon_{i,j}, \quad i = 1, \ldots, m = 230, \quad j = 1, \ldots, n_i$$

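Semi 25
$$X = \begin{pmatrix} 1 & \mathrm{age}_{11} \\ \vdots & \vdots \\ 1 & \mathrm{age}_{1 n_1} \\ \vdots & \vdots \\ 1 & \mathrm{age}_{m1} \\ \vdots & \vdots \\ 1 & \mathrm{age}_{m n_m} \end{pmatrix}$$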

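Semi 26
$$Z = \begin{pmatrix}
1 & \cdots & 0 & (\mathrm{age}_{11} - \kappa_1)_+ & \cdots & (\mathrm{age}_{11} - \kappa_K)_+ \\
\vdots & & \vdots & \vdots & & \vdots \\
1 & \cdots & 0 & (\mathrm{age}_{1 n_1} - \kappa_1)_+ & \cdots & (\mathrm{age}_{1 n_1} - \kappa_K)_+ \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & (\mathrm{age}_{m1} - \kappa_1)_+ & \cdots & (\mathrm{age}_{m1} - \kappa_K)_+ \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & (\mathrm{age}_{m n_m} - \kappa_1)_+ & \cdots & (\mathrm{age}_{m n_m} - \kappa_K)_+
\end{pmatrix}$$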

Semi 27
$$u = ( U_1, \ldots, U_m, b_1, \ldots, b_K )^T$$
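
The matrices on the last three slides can be assembled mechanically from subject labels and ages. The sketch below uses a tiny made-up data set (three subjects, three knots); the variable names and the helper `penalty`, with separate ratios `lam_U` and `lam_b` for the two variance components, are assumptions for illustration only.

```python
import numpy as np

# Toy longitudinal data (made up): subject label and age per observation.
subject = np.array([0, 0, 0, 1, 1, 2, 2, 2])
age = np.array([10.0, 11.0, 12.0, 15.0, 16.0, 20.0, 21.0, 22.0])
knots = np.array([12.0, 16.0, 20.0])

m = subject.max() + 1                     # number of subjects

# Fixed effects: intercept and linear age term.
X = np.column_stack([np.ones_like(age), age])

# Random effects: subject indicators (for U_1..U_m), then plus functions (for b_1..b_K).
indicators = (subject[:, None] == np.arange(m)[None, :]).astype(float)
plus = np.maximum(age[:, None] - knots[None, :], 0.0)
Z = np.hstack([indicators, plus])

def penalty(lam_U, lam_b):
    """Diagonal penalty for [beta, U, b]: 0 on beta, lam_U on U, lam_b on b.

    lam_U and lam_b stand for sigma_eps^2 / sigma_U^2 and sigma_eps^2 / sigma_b^2.
    """
    return np.diag(np.concatenate([
        np.zeros(X.shape[1]), np.full(m, lam_U), np.full(len(knots), lam_b)
    ]))
```

With these pieces, the ridge-type solve from the earlier slides applies unchanged to the stacked design `np.hstack([X, Z])`.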

Semi 28
[Figure: spinal bone mineral density versus age (years)]
Variability bars on $m$ and estimated density of $U_i$

Semi 29 Broken down by ethnicity
[Figure: spinal bone mineral density versus age (years), in separate panels for Hispanic, White, Asian, and Black subjects]

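Semi 30 Model with ethnicity effects
$$\mathrm{SBMD}_{ij} = U_i + m(\mathrm{age}_{ij}) + \beta_1 \mathrm{black}_i + \beta_2 \mathrm{hispanic}_i + \beta_3 \mathrm{white}_i + \varepsilon_{ij}, \quad 1 \le j \le n_i, \quad 1 \le i \le m$$
Asian is the reference group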

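Semi 31
Only requires an expansion of the fixed effects by adding the columns
$$\begin{pmatrix}
\mathrm{black}_1 & \mathrm{hispanic}_1 & \mathrm{white}_1 \\
\vdots & \vdots & \vdots \\
\mathrm{black}_1 & \mathrm{hispanic}_1 & \mathrm{white}_1 \\
\vdots & \vdots & \vdots \\
\mathrm{black}_m & \mathrm{hispanic}_m & \mathrm{white}_m \\
\vdots & \vdots & \vdots \\
\mathrm{black}_m & \mathrm{hispanic}_m & \mathrm{white}_m
\end{pmatrix}$$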

Semi 32
[Figure: contrast with Asian subjects, for Black, Hispanic, and White]

Semi 33
In this model, the age-effect curves for the four ethnic groups are parallel.
Could we model them as non-parallel?
This might be problematic in this example because of the small values of the $n_i$.
But the methodology should be useful in other contexts.

Semi 34
Add interactions between age and black, hispanic, and white. These are fixed effects.
Then add interactions between black, hispanic, white, and asian and the linear plus functions in age. These are mean-zero random effects with their own variance component.
This variance component controls the amount of shrinkage of the ethnicity-specific curves to the overall effect.

Semi 35 Penalized Splines and Additive Models
Additive model:
$$Y_i = m_1(X_{1,i}) + \cdots + m_P(X_{P,i}) + \epsilon_i$$

Semi 36 Bivariate additive spline model
$$Y_i = \beta_0 + \beta_{x,1} X_i + b_{x,1} (X_i - \kappa_{x,1})_+ + \cdots + b_{x,K_x} (X_i - \kappa_{x,K_x})_+ + \beta_{z,1} Z_i + b_{z,1} (Z_i - \kappa_{z,1})_+ + \cdots + b_{z,K_z} (Z_i - \kappa_{z,K_z})_+ + \epsilon_i$$
no need for backfitting
computation very rapid
no identifiability issues
inference is simple
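
Because the bivariate additive spline model is linear in its coefficients, it can be fitted with a single penalized solve, which is why no backfitting is needed. The sketch below is illustrative only: the test functions, the noise-free simulated data, the knot grids, and the smoothing parameters `lam_x` and `lam_z` are all assumptions.

```python
import numpy as np

def plus_basis(v, knots):
    """Linear plus-function columns (v - kappa_k)_+ for one predictor."""
    return np.maximum(v[:, None] - knots[None, :], 0.0)

# Noise-free simulated data (assumed), so the quality of the fit is easy to check.
rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0.0, 1.0, n)
z = rng.uniform(0.0, 1.0, n)
y = x ** 2 + np.sin(2 * np.pi * z)

kx = np.linspace(0.0, 1.0, 12)[1:-1]      # 10 interior knots for x
kz = np.linspace(0.0, 1.0, 12)[1:-1]      # 10 interior knots for z

# One shared intercept; a linear term and plus functions for each predictor.
C = np.hstack([np.ones((n, 1)), x[:, None], plus_basis(x, kx),
               z[:, None], plus_basis(z, kz)])

lam_x, lam_z = 1e-4, 1e-4                 # assumed smoothing parameters
pen = np.concatenate([[0.0, 0.0], lam_x * np.ones(len(kx)),
                      [0.0], lam_z * np.ones(len(kz))])

coef = np.linalg.solve(C.T @ C + np.diag(pen), C.T @ y)
fitted = C @ coef
rmse = np.sqrt(np.mean((y - fitted) ** 2))
```

Using a single intercept column for both components is what avoids the identifiability problem, and each component gets its own smoothing parameter through the diagonal penalty.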

Semi 37 Bayesian methods
The linear mixed model is half-Bayesian
The random effects have a prior
The parameters without a prior are:
fixed effects: give them diffuse normal priors
variance components: give them diffuse inverse gamma priors

Semi 38 Bayesian methods
Can be easily implemented in WinBUGS or programmed in, say, MATLAB
Allows Bayes rather than empirical Bayes inference
Uncertainty due to smoothing parameter selection is taken into account

Semi 39 The Bias-Variance Trade-off and Confidence Bands
[Figure: four panels of log ratio versus range, one for each of four values of lambda]

Semi 40
How does one adjust confidence intervals for bias?
undersmooth so variance dominates and bias can be safely ignored

Semi 41
[Figure: MSE, variance, and bias versus log(λ), with the optimal value marked]

Semi 42 Adjustment for bias, continued
estimate bias by a higher order method and subtract off bias (essentially the same as above)
Wahba/Nychka Bayesian intervals:
bias is random so adds to posterior variance
interval is widened but there is no offset

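Semi 43 Wahba/Nychka Bayesian Intervals
$$y = X\beta + Zu + \varepsilon, \qquad \mathrm{Cov}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \sigma_u^2 I & 0 \\ 0 & \sigma_\varepsilon^2 I \end{bmatrix}, \qquad C = ( X \;\; Z )$$
$\tilde\beta$ and $\tilde u$ are BLUPs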

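Semi 44
$$\mathrm{Cov}\left( \begin{bmatrix} \tilde\beta \\ \tilde u \end{bmatrix} \,\Big|\, u \right) = \sigma_\varepsilon^2 \left( C^T C + \frac{\sigma_\varepsilon^2}{\sigma_u^2} D \right)^{-1} C^T C \left( C^T C + \frac{\sigma_\varepsilon^2}{\sigma_u^2} D \right)^{-1}$$
(Frequentist variance. Ignores bias.)
$$\mathrm{Cov}\left( \begin{bmatrix} \tilde\beta \\ \tilde u - u \end{bmatrix} \right) = \sigma_\varepsilon^2 \left( C^T C + \frac{\sigma_\varepsilon^2}{\sigma_u^2} D \right)^{-1}$$
(Bayesian posterior variance. Takes bias into account.)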
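
The two covariance formulas on this slide differ by a positive semidefinite term, so the Bayesian pointwise standard errors are never narrower than the frequentist ones. A small numerical sketch follows; the design points, knots, and the values of `sigma_eps` and `sigma_u` are assumptions picked for illustration, and $D$ is taken to be zero on the fixed effects and the identity on $u$.

```python
import numpy as np

# Assumed design: linear spline with 10 interior knots on simulated x values.
rng = np.random.default_rng(3)
n = 150
x = np.sort(rng.uniform(0.0, 1.0, n))
knots = np.linspace(0.0, 1.0, 12)[1:-1]

X = np.column_stack([np.ones(n), x])
Z = np.maximum(x[:, None] - knots[None, :], 0.0)
C = np.hstack([X, Z])
D = np.diag(np.concatenate([np.zeros(2), np.ones(len(knots))]))

sigma_eps, sigma_u = 0.3, 1.0             # assumed variance components
lam = sigma_eps ** 2 / sigma_u ** 2

M_inv = np.linalg.inv(C.T @ C + lam * D)
cov_freq = sigma_eps ** 2 * M_inv @ (C.T @ C) @ M_inv   # ignores bias
cov_bayes = sigma_eps ** 2 * M_inv                      # accounts for bias

# Pointwise standard errors of the fitted curve: sqrt of diag(C Cov C^T).
se_freq = np.sqrt(np.einsum("ij,jk,ik->i", C, cov_freq, C))
se_bayes = np.sqrt(np.einsum("ij,jk,ik->i", C, cov_bayes, C))
```

The gap between `se_bayes` and `se_freq` is the widening described on the earlier slide: the interval gets wider, but its centre is not shifted.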

Semi 45
[Figure: strontium ratio versus age (million years)]

Semi 46 Effect of measurement error
[Figure: y versus x plus error]
W = X + error and Var(X) = Var(error)

Semi 47 Correction for measurement error
Relatively little research in this area
Fan and Truong (1993):
deconvolution kernels
first work
inefficient in finite-sample studies
no inference
strictly for 1-dimensional smoothing
Carroll, Maca, Ruppert:
functional SIMEX methods and structural spline methods
more efficient than Fan and Truong

Semi 48 Berry, Carroll, and Ruppert (JASA, 2002)
fully Bayesian
smoothing or penalized splines
rather efficient in finite-sample studies
inference available
scales up
semiparametric inference is easy
structural

Semi 49 Berry, Carroll, and Ruppert
starts with mixed-model spline formulation but fully Bayesian
conjugate priors
true covariates are iid normal, but surprisingly robust
normal measurement error
in Gibbs, only sampling of true (unknown) covariates requires a Hastings-Metropolis step

Semi 50 Effect of measurement error
[Figure: y versus x plus error]
W = X + error and Var(X) = Var(error)

Semi 51 Correction for measurement error
[Figure: fitted curves. Solid: true. Dotted: uncorrected. Dashed: corrected.]

Semi 52 Measurement Error, continued
Ganguli, Staudenmayer, Wand:
EM maximum likelihood estimation in BCR model
Works about as well as the fully Bayesian approach
Extension to additive models

Semi 53 Generalized Regression
Extension to non-Gaussian responses is conceptually easy
Get a GLMM
However, GLMMs are not trivial
Can use:
Monte Carlo EM
or MCMC

Semi 54 Single-Index Models
$$Y_i = g(X_i^T \theta) + Z_i^T \beta + \epsilon_i$$
Yu and Ruppert (2002, JASA)
Let
$$g(x) = \gamma_0 + \gamma_1 x + \cdots + \gamma_p x^p + c_1 (x - \kappa_1)_+^p + \cdots + c_K (x - \kappa_K)_+^p$$
Becomes a nonlinear regression model
$$Y_i = m(X_i, Z_i, \theta, \beta, \gamma, c) + \epsilon_i$$