Comparing effects across nested logistic regression models


CAPS Methods Core Quantitative Working Group Seminar
September 3, 2011
Steve Gregorich


Comparing parameter estimates across two nested linear models
Covariate-adjusted (Full) model: $y_i = \dot{a}_F + x_i \dot{b}_{x.F} + c_i \dot{b}_{c.F} + \dot{e}_{i.F}$
Unadjusted (Restricted) model: $y_i = \dot{a}_R + x_i \dot{b}_{x.R} + \dot{e}_{i.R}$
- What is the effect of adjustment for c?
- Compare $\dot{b}_{x.F}$ to $\dot{b}_{x.R}$, either formally or just 'eyeball' the difference

Comparing parameter estimates across two nested logistic models
Covariate-adjusted (Full) model: $\operatorname{logit}(y_i = 1 \mid x_i, c_i) = a_F + x_i b_{x.F} + c_i b_{c.F}$
Unadjusted (Restricted) model: $\operatorname{logit}(y_i = 1 \mid x_i) = a_R + x_i b_{x.R}$
- Here, comparing $b_{x.F}$ to $b_{x.R}$ is more complex
- To understand why, we'll look at the binary outcome threshold model

Binary outcome regression represented as a threshold model
- $y^*$ is an unobserved (latent) continuous outcome variable representing the propensity of outcome occurrence:
  $y^*_i = \dot{a} + x_i \dot{b} + \dot{e}_i$, where $\dot{e}_i \sim \text{Logistic}(0, \pi^2/3)$ for logistic or $N(0, 1)$ for probit
- Usually, the relationship between continuous $y^*$ and binary $y$ is defined as: if $y^*_i > 0$ then $y_i = 1$; else $y_i = 0$
- Given $\dot{e}_i \sim \text{Logistic}(0, \pi^2/3)$, model parameters for correctly specified models are equivalent across the linear model of $y^*$ and the logistic model of $y$
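
A minimal Python sketch of the threshold representation, using only the standard library. The parameter values (a = -0.5, b = 1.0), the x values, and the function names are hypothetical choices for illustration. It draws standard logistic errors (mean 0, variance π²/3), thresholds y* at 0, and compares the simulated share of y = 1 with the logistic probability; the two should agree up to simulation noise.

```python
import math
import random

def simulate_threshold(a, b, x, n, rng):
    """Share of y = 1 when y* = a + b*x + e, e ~ standard logistic, and y = 1 if y* > 0."""
    hits = 0
    for _ in range(n):
        u = rng.random()
        while u == 0.0:                    # guard against log(0)
            u = rng.random()
        e = math.log(u / (1.0 - u))        # inverse-CDF draw: mean 0, variance pi^2/3
        if a + b * x + e > 0:
            hits += 1
    return hits / n

def logistic_prob(a, b, x):
    """Pr(y = 1 | x) implied by the logistic regression model."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

rng = random.Random(2011)
a, b = -0.5, 1.0                           # hypothetical parameter values
for x in (-1.0, 0.0, 2.0):
    sim = simulate_threshold(a, b, x, 100_000, rng)
    print(f"x={x:+.1f}  simulated={sim:.3f}  logistic={logistic_prob(a, b, x):.3f}")
```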

Three identifying assumptions of logistic regression model
- conditional mean of $\dot{e}_i$ = 0
- $\text{Var}(\dot{e}_i \mid x) = \pi^2/3$
- threshold value for $y^*$ is 0 (usually): if $y^* > 0$ then $y = 1$; else $y = 0$

Comparing linear and logistic regression
Basics of modeled variation
                             outcome variance                      residual variance
  linear regression (y)      $\sigma^2_y$ is observed              $\sigma^2_{\dot{e}}$ is model-dependent
  logistic regression (y*)   $\sigma^2_{y^*}$ is model-dependent   $\sigma^2_{\dot{e}}$ is fixed
Effects of added X variables on modeled variation
                             outcome variance                      residual variance
  linear regression (y)      $\sigma^2_y$ unchanged                $\sigma^2_{\dot{e}}$ decreased
  logistic regression (y*)   $\sigma^2_{y^*}$ increased            $\sigma^2_{\dot{e}}$ unchanged
- Adding explanatory variables to a logistic model increases the implied variance of $y^*$. Essentially, $y^*$ is rescaled.
- When $y^*$ is rescaled, model parameters are also rescaled. Same for models of $y$

Comparing parameters across nested logistic regression models
Full model: $\operatorname{logit}(y_i = 1 \mid x_i, c_i) = a_F + x_i b_{x.F} + c_i b_{c.F}$
Restricted model: $\operatorname{logit}(y_i = 1 \mid x_i) = a_R + x_i b_{x.R}$
- $b_{x.F}$ and $b_{x.R}$ may differ because of
  - confounding (expectation: $b_{x.F} < b_{x.R}$)
  - negative confounding (expectation: $b_{x.F} > b_{x.R}$)
  - rescaling (expectation: $b_{x.F} > b_{x.R}$)
  - a combination (expectation: ??)
- Parameter rescaling is almost universally unknown/ignored except in specific contexts
  - testing mediation
  - generalized linear mixed models

A simulated example of faux negative confounding
Simulated data
- A single sample with N=500,000
- x and c are bivariate normal with the following sample statistics (exactly)
  - $\bar{x} = \bar{c} = 0$
  - $\sigma^2_x = 1$, $\sigma^2_c = 4$
  - $r_{xc} = 0$
- Next, I used x and c values to generate a continuous $y^*$ variate as $y^*_i = x_i + c_i + \dot{e}_i$ (i.e., both regression parameters equaled unity), where $\dot{e}_i \sim \text{Logistic}(0, \pi^2/3)$
- Finally, I created a binary version of $y^*$ as $y = 1$ if $y^* > 0$; $y = 0$ otherwise

A simulated example of faux negative confounding
Results of linear models regressing $y^*$ onto x and c
                            Full model    Restricted model
                            Adjusted b    Unadjusted b
  x modeled, c excluded     1.00          1.00
  c modeled, x excluded     1.00          1.00
Results of logistic models regressing $y$ onto x and c
                            Full model    Restricted model
                            Adjusted b    Unadjusted b
  x modeled, c excluded     1.00          0.61
  c modeled, x excluded     1.00          0.85
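
A rough Python sketch of this kind of simulation, using only the standard library. It assumes a much smaller sample (n = 20,000, chosen for speed) in which x and c are only approximately orthogonal, and a hand-written Newton-Raphson logistic fitter; `solve`, `fit_logit`, and `rlogis` are illustrative helpers, not a library API. The Full-model coefficient for x should land near 1.0, while the Restricted-model coefficient should be clearly attenuated even though c does not confound x.

```python
import math
import random

def solve(A, v):
    """Solve A z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z

def fit_logit(X, y, iters=25, tol=1e-8):
    """Maximum-likelihood logistic regression by Newton-Raphson; X rows include the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            eta = sum(bj * xj for bj, xj in zip(beta, row))
            mu = 0.5 * (1.0 + math.tanh(eta / 2.0))   # numerically stable logistic function
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        step = solve(hess, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def rlogis(rng):
    """Standard logistic draw (mean 0, variance pi^2/3)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return math.log(u / (1.0 - u))

rng = random.Random(2011)
n = 20_000                                          # assumption: far smaller than the slide's N
x = [rng.gauss(0.0, 1.0) for _ in range(n)]         # variance 1
c = [rng.gauss(0.0, 2.0) for _ in range(n)]         # variance 4, independent of x
y = [1 if xi + ci + rlogis(rng) > 0 else 0 for xi, ci in zip(x, c)]

full = fit_logit([[1.0, xi, ci] for xi, ci in zip(x, c)], y)
restricted = fit_logit([[1.0, xi] for xi in x], y)
print("Full model b_x:      ", round(full[1], 2))
print("Restricted model b_x:", round(restricted[1], 2))
```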

A simulated example of faux negative confounding
Explanation for results on previous slide
In this simplified example, x and c are orthogonal, so the implied variance of $y^*$ equals
- Full model: $\sigma^2_{y^*.F} = \sigma^2_x b^2_{x.F} + \sigma^2_c b^2_{c.F} + \pi^2/3 = 8.29$
- Restricted model including x: $\sigma^2_{y^*.R} = \sigma^2_x b^2_{x.R} + \pi^2/3 = 3.66$
Scaling of the outcome and parameter estimates is not equivalent across models
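
The two implied variances can be checked with a few lines of Python. This sketch assumes the simulated setup described earlier (variance of x = 1, variance of c = 4, Full-model coefficients of 1.00, Restricted-model coefficient of 0.61); `implied_var` is an illustrative helper name.

```python
import math

ERR_VAR = math.pi ** 2 / 3          # fixed residual variance of the logistic model

def implied_var(terms):
    """Implied variance of y* for orthogonal predictors: sum of var * b^2, plus pi^2/3."""
    return sum(var * b ** 2 for var, b in terms) + ERR_VAR

full_var = implied_var([(1.0, 1.00), (4.0, 1.00)])   # x and c in the model
restricted_var = implied_var([(1.0, 0.61)])          # x only
print(round(full_var, 2), round(restricted_var, 2))
```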

One attempted solution in the literature
- In the context of testing mediation, Winship and Mare (1984) and MacKinnon & Dwyer (1993) suggested a rescaling of model parameters based upon $\sigma_{y^*.F}$ and $\sigma_{y^*.R}$ to allow comparison of, e.g., $b_{x.R}$ and $b_{x.F}$
- This is known as y-standardization. However, it does not work very well
- For the previous example, the rescaled value of $b_{x.R}$ equals
  rescaled $b_{x.R} = 0.61 \times \sigma_{y^*.F} / \sigma_{y^*.R} = 0.61 \times \sqrt{8.29} / \sqrt{3.66} = 0.61 \times 1.51 = 0.92$, not 1.00
- There have been other proposed solutions that I have not studied (reportedly they don't work well, either)

Karlson, Holm, & Breen (KHB) (in press)
- KHB argue that the scaling is a factor of the error standard deviation, $\sigma_{\dot{e}}$, not the standard deviation of $y^*$
- Of course $y^*$ and $\sigma_{\dot{e}}$ are unobserved in practice, but given our simulated data, we can take a look
- For the Full model, $\sigma^2_{\dot{e}.F} = \pi^2/3 = 3.29$
- For the Restricted model, $\sigma^2_{\dot{e}.R} = \sigma^2_c + \pi^2/3 = 7.29$
- Therefore, the KHB-suggested rescaled value equals
  rescaled $b_{x.R} = 0.61 \times \sigma_{\dot{e}.R} / \sigma_{\dot{e}.F} = 0.61 \times \sqrt{7.29} / \sqrt{\pi^2/3} = 0.61 \times 1.49 = 0.91$, not 1.00

Comparing parameter rescaling methods
From the earlier simulated example
       Full model   Restricted model   Restricted                  Restricted
                    $\hat{b}$          $\sigma_{y^*}$-rescaled     $\sigma_{\dot{e}}$-rescaled
  x    1.00         0.61               0.92                        0.91
Regardless of these results, KHB suggest a method to rescale parameter estimates from binary outcome models that appears to work.
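
Both rescaled values can be reproduced arithmetically. The sketch below assumes the simulated setup from the earlier slides (variance of x = 1, variance of c = 4, unit coefficients in the Full model, Restricted-model estimate of 0.61); the variable names are illustrative.

```python
import math

ERR_VAR = math.pi ** 2 / 3                      # logistic residual variance
b_restricted = 0.61                             # Restricted-model estimate for x

# y-standardization: ratio of the implied standard deviations of y*
var_y_full = 1.0 * 1.0 ** 2 + 4.0 * 1.0 ** 2 + ERR_VAR
var_y_restricted = 1.0 * b_restricted ** 2 + ERR_VAR
b_ystd = b_restricted * math.sqrt(var_y_full / var_y_restricted)

# Error-SD rescaling: the omitted c moves into the Restricted model's error term
var_e_full = ERR_VAR
var_e_restricted = 4.0 + ERR_VAR
b_esd = b_restricted * math.sqrt(var_e_restricted / var_e_full)

print(round(b_ystd, 2), round(b_esd, 2))        # both fall short of the Full-model 1.00
```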

KHB method
- Here, $C_i$ refers to the vector of covariates in the Full model
- Replace all covariates, $C_i$, in the Full logistic regression model with $R_i$, the residuals from the regression of $C_i$ on x
- Name this the KHB model
- The KHB model provides an estimate of the unadjusted effect of x on y that is on the same scale as parameters from the Full model
Clever
- The $R_i$ are uncorrelated with x
  - The KHB model obtains an unadjusted estimate of the x effect (the KHB model obtains Type 1 estimates of the x effect)
- Model-dependent $\sigma_{y^*}$ and $\sigma_{\dot{e}}$ are equivalent across the KHB and Full models
  - The KHB model obtains unadjusted parameter estimates for x that are on the scale of the Full model
- Method easily extends to accommodate any number of x and c variables
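
A minimal Python sketch of the residualization steps above, with one x and one continuous c. The data-generating values (c = 0.5x + noise, y* = 0.5x + c + e, n = 20,000) are hypothetical, and `solve`, `fit_logit`, `residualize`, and `rlogis` are illustrative helpers rather than the KHB Stata implementation. Because the residual is a linear function of x and c, the KHB coefficient for x equals the Full-model coefficient for x plus (slope of c on x) times the Full-model coefficient for c.

```python
import math
import random

def solve(A, v):
    """Solve A z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z

def fit_logit(X, y, iters=25, tol=1e-8):
    """Maximum-likelihood logistic regression by Newton-Raphson; X rows include the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            eta = sum(bj * xj for bj, xj in zip(beta, row))
            mu = 0.5 * (1.0 + math.tanh(eta / 2.0))   # numerically stable logistic function
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        step = solve(hess, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def residualize(c, x):
    """OLS regression of c on x (with intercept): returns residuals and the slope."""
    n = len(x)
    mx, mc = sum(x) / n, sum(c) / n
    slope = (sum((xi - mx) * (ci - mc) for xi, ci in zip(x, c))
             / sum((xi - mx) ** 2 for xi in x))
    return [ci - mc - slope * (xi - mx) for xi, ci in zip(x, c)], slope

def rlogis(rng):
    """Standard logistic draw (mean 0, variance pi^2/3)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return math.log(u / (1.0 - u))

rng = random.Random(2011)
n = 20_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
c = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]           # c is confounded with x
y = [1 if 0.5 * xi + ci + rlogis(rng) > 0 else 0 for xi, ci in zip(x, c)]

r, slope = residualize(c, x)
full = fit_logit([[1.0, xi, ci] for xi, ci in zip(x, c)], y)   # adjusted effect of x
khb = fit_logit([[1.0, xi, ri] for xi, ri in zip(x, r)], y)    # unadjusted, Full-model scale
naive = fit_logit([[1.0, xi] for xi in x], y)                  # unadjusted, Restricted scale
print("Full b_x:", round(full[1], 2), " KHB b_x:", round(khb[1], 2),
      " naive Restricted b_x:", round(naive[1], 2))
```

In this setup the population value of the unadjusted effect on the Full-model scale is 0.5 + 0.5 × 1.0 = 1.0, so the KHB estimate should sit near 1.0 and above the naive Restricted estimate.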

KHB method
What about binary covariates?
- KHB suggest using the linear probability model (LPM) to generate residuals of the $C_i$
- Then fit the KHB model in the usual way

KHB method: LPM
- Fit a linear regression model of the binary outcome
- Conditional expectation of y given x: $E(y_i \mid x_i) = \Pr(y_i = 1 \mid x_i) = a + x_i b_x + c_i b_c$
- Binary y does not affect interpretation of the parameters, compared to continuous y
- For a unit change in x, the expected change in the probability that y=1 is $b_x$, holding any control variables constant
- Because the model is linear, a unit change in x always results in the same change in probability: the model is linear in the probability
- In general practice, there are problems with the linear probability model:
  - heteroskedasticity (the variance of y | x depends on x)
  - residuals cannot be normally distributed
  - predicted probabilities outside [0,1]
  - functional form
- Even so, the LPM could meet the needs of the KHB model, i.e., to estimate the unadjusted effect of x on the scale of the Full model
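
A small Python sketch of the LPM residualization step for a binary covariate. The data-generating rule for c (probability rising with x through a logistic function) and the helper name `lpm_residuals` are hypothetical. It shows that the LPM residuals are uncorrelated with x by construction, even though some fitted probabilities may fall outside [0,1].

```python
import math
import random

def lpm_residuals(x, c):
    """Linear probability model of binary c on x (OLS with intercept): residuals, intercept, slope."""
    n = len(x)
    mx, mc = sum(x) / n, sum(c) / n
    slope = (sum((xi - mx) * (ci - mc) for xi, ci in zip(x, c))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = mc - slope * mx
    resid = [ci - (intercept + slope * xi) for xi, ci in zip(x, c)]
    return resid, intercept, slope

rng = random.Random(2011)
n = 5_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical binary covariate whose probability rises with x
c = [1 if rng.random() < 1.0 / (1.0 + math.exp(-xi)) else 0 for xi in x]

resid, intercept, slope = lpm_residuals(x, c)
mean_x = sum(x) / n
cov_rx = sum(ri * (xi - mean_x) for ri, xi in zip(resid, x)) / n
outside = sum(1 for xi in x if not 0.0 <= intercept + slope * xi <= 1.0)
print("covariance of residuals with x:", cov_rx)      # zero up to rounding error
print("fitted values outside [0,1]:", outside)
```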

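Simulation study: Population model
[Path diagram: x ($\sigma^2 = 1$) has an effect of 0.5 on y (y*); c1, c2, c3, and c4 (each $\sigma^2 = .25$) each have an effect of 1.0 on y (y*); x is correlated with the c variables ($r_{xc}$)]
Unadjusted effects of x ($\dot{b}_R$) as a function of $r_{xc}$:
- $r_{xc} = 0.250$: $\dot{b}_R = 0.5 + 0.250 \times 0.5 \times 4 = 1.00$
- $r_{xc} = 0$: $\dot{b}_R = 0.5 + 0 \times 0.5 \times 4 = 0.50$
- $r_{xc} = -0.125$: $\dot{b}_R = 0.5 + (-0.125) \times 0.5 \times 4 = 0.25$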
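
The unadjusted effects can be sketched in Python as the direct effect of x plus, for each omitted covariate, its coefficient times the slope of that covariate on x. This assumes the population values read as x variance 1, covariate variance 0.25 (standard deviation 0.5), four covariates with coefficient 1.0, and correlations of 0.25, 0, and -0.125; `unadjusted_effect` is an illustrative name.

```python
def unadjusted_effect(b_x, b_c, r_xc, sd_c, sd_x=1.0):
    """Direct effect of x plus each omitted covariate's effect times its slope on x."""
    slope_c_on_x = r_xc * sd_c / sd_x          # regression slope of each c on x
    return b_x + sum(bc * slope_c_on_x for bc in b_c)

effects = {r: unadjusted_effect(0.5, [1.0] * 4, r, sd_c=0.5)
           for r in (0.25, 0.0, -0.125)}
for r, b_r in effects.items():
    print(f"r_xc = {r:+.3f}  unadjusted effect = {b_r:.2f}")
```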

Simulation Details
- N = 125, 250, 500, 1000
- R = 1000
- x ~ N(0, 1)
- c1 - c4 ~ N(0, 0.25); or B(0.50)
  - 2 conditions: norm./bin. c; variance = 0.25
- $\dot{b}_x = b_x = 0.5$; $\dot{b}_c = b_c = 1.0$
- $r_{xc}$ = 0.25; 0; -0.125
  - 3 conditions: pos., no, and neg. confounding
- $r_{cc} = 0$
- $y^*_i = 0.5 x_i + c1_i + c2_i + c3_i + c4_i + \dot{e}_i$, where $\dot{e}_i \sim \text{Logistic}(0, \pi^2/3)$
- if $y^*_i > 0$ then $y_i = 1$; else $y_i = 0$
- $y^* \sim N(0, \sigma^2_{y^*})$, with $\sigma^2_{y^*}$ dependent on $r_{xc}$: ranges from approximately 4.0 to 5.0
- y ~ B(0.50)

Simulation results: N=1000, R=1000 replicate samples
Columns: linear reg of $y^*$: (a) $\hat{\dot{b}}_R$, (b) $\hat{\sigma}_{\dot{e}} / \sqrt{\pi^2/3}$; logistic reg of $y$: (c) $\hat{b}_R$, (d) (b) × (c); KHB model of $y$: (e) $\hat{\ddot{b}}_R$, (f) se, (g) $\sigma_{\ddot{b}_R}$, (h) covg.
Continuous x and c:
  r_xc      (a)    (b)    (c)    (d)    (e)    (f)    (g)    (h)
  +0.25     1.00   1.13   0.84   0.96   1.00   0.09   0.09   0.933
   0.0      0.50   1.14   0.41   0.47   0.50   0.07   0.08   0.99
  -0.125    0.25   1.14   0.20   0.23   0.25   0.07   0.08   0.93
Continuous x and binary c:
  r_xc      (a)    (b)    (c)    (d)    (e)    (f)    (g)    (h)
  +0.25     1.00   1.11   0.91   1.00   1.02   0.11   0.11   0.953
   0.0      0.50   1.14   0.43   0.49   0.51   0.09   0.09   0.95
  -0.125    0.25   1.13   0.21   0.24   0.25   0.08   0.09   0.98

Some implications about naïve point estimates of $b_R$
If you naïvely compare $b_F$ to $b_R$, you might draw incorrect conclusions
Results for continuous x and c
  r_xc      b_F    b_R naïve   b_R KHB   b_R - b_F naïve   b_R - b_F true
  +0.25     0.50   0.91        1.00      +0.41             +0.50             under-estimating the degree of positive confounding
   0        0.50   0.43        0.50      -0.07              0                suggesting negative confounding when none exists
  -0.125    0.50   0.21        0.25      -0.29*            -0.25             * over-estimating the degree of negative confounding
Simulations were simplistic
- models with multiple covariates may include those that are positively, negatively, and un-confounded with x

More
Tests of differences between adjusted and rescaled unadjusted effects
- Normally I don't care about this (except in the context of testing mediation)
- KHB present a test based upon Sobel
  - Can accommodate multiple x and multiple c variables
  - Known problems with Sobel, Aroian, etc.
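
For orientation, a sketch of the textbook first-order Sobel statistic for a product of two coefficients. This is the generic single-mediator form, not necessarily the exact multi-covariate test that KHB present, and all numbers below are hypothetical. Here `a` stands for the slope of c on x and `b` for the coefficient of c in the Full model, so `a * b` plays the role of the difference between the rescaled unadjusted and adjusted effects of x.

```python
import math

def sobel_z(a, se_a, b, se_b):
    """First-order (delta-method) Sobel z statistic for the product a * b."""
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)

# Hypothetical estimates and standard errors
z = sobel_z(a=0.5, se_a=0.05, b=1.0, se_b=0.10)
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
print(round(z, 3), p_two_sided)
```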

Conclusions
- KHB model is simple to implement
- Quality of KHB model point estimates
  - Seems to do a good job of obtaining rescaled unadjusted point estimates
  - Use of LPM for binary covariates seemed to work well
  - I considered other scenarios
    - Varied the distribution of binary c and y
    - Lognormal distribution of X
  - KHB (2011) report upon a fairly extensive simulation study
- Quality of KHB model standard errors/coverage
  - Coverage of rescaled unadjusted x effects was just OK in my limited simulation
  - If one wants to emphasize any tests of rescaled unadjusted effects, the bootstrap should be considered

Resources
KHB papers (contact Kristian Karlson: kbk@sfi.dk)
1. Karlson, K.B., Holm, A., and Breen, R. (March 09, 2011). Comparing Regression Coefficients Between Models using Logit and Probit: A New Method. Draft manuscript.
2. Kohler, U., Karlson, K.B., Holm, A. (in press). Comparing coefficients of nested nonlinear probability models. The Stata Journal.
3. Breen, R., Karlson, K.B., Holm, A. (April 11, 2011). Total, Direct, and Indirect Effects in Logit Models. Abstract available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1730065
4. Karlson, K.B. and Holm, A. (2011). Decomposing primary and secondary effects: A new decomposition method. Research in Social Stratification and Mobility, 29, 221-237. http://www.sciencedirect.com/science/article/pii/s07656410000697
KHB Stata ado
http://ideas.repec.org/c/boc/bocode/s45715.html