Modeling Binary outcome

Statistics, April 2, 2013
Debdeep Pati

1. The outcome variable can be binary rather than normally distributed. In biostatistics and epidemiology, we are often interested in the effect of risk factors (x) on a disease (y).
2. We are interested in the relationship between the risk factors x and the risk r (the proportion with the disease).


Problems with Linear Regression models

1. The r-x relationship may not be linear.
2. Proportions (including risks) must lie between 0 and 1.
3. When observed proportions span most of this allowable range, the pattern in the scatterplot is generally nonlinear.
4. The points tend to squash up as proportions approach the asymptotes at 0 or 1.
5. Predicted values of the risk may fall outside the valid range.
6. The fitted linear regression model for r regressed on x is $r = a + bx$.
7. This can lead to predicted risks that are negative or greater than unity, and thus impossible.
8. Fitting a linear regression line to the data in Table 4 gives $r = -25.394 + 0.645 \times \text{age}$.
9. If we use this model to predict the risk of death for someone aged 39, the prediction is $r = -25.394 + 0.645 \times 39 = -0.239$, a negative risk!
10. Similar problems are found with confidence limits for predicted risks within the range of the observed data.
11. The error distribution is not normal. In simple linear regression, we fit the model $r = \alpha + \beta x + \epsilon$, where $\epsilon$ arises from a normal distribution with mean zero and constant variance.
12. r models proportions: proportions are not likely to have a normal distribution; they are likely to be binomial.
13. The inferences drawn from the linear regression would therefore be inaccurate.

Logistic regression function

1. The logistic function has an S shape.
2. This solves the non-linearity problem.
3. There are asymptotes at y = 0 and y = 1.
4. This solves the out-of-bounds problem.
5. When using the logistic function, we assume the data have a binomial rather than a normal distribution.
6. This solves the problem of assuming normal errors.
7. The alternative form is $\log\left(\dfrac{\hat{r}}{1 - \hat{r}}\right) = b_0 + b_1 x$.
8. The left-hand side is called the logit (the log of the odds of disease).
9. The logistic regression model postulates a linear relationship between the log odds of disease and the risk factor.
10. The right-hand side is called the linear predictor.
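To make the contrast concrete, here is a minimal Python sketch that evaluates the fitted straight line from the age example (taking the intercept as -25.394, which is what the negative prediction at age 39 implies) alongside the logistic function and its inverse, the logit. The function names `linear_risk`, `logistic`, and `logit` are illustrative choices, not part of the lecture, and the grid of linear-predictor values is arbitrary.

```python
import math

def linear_risk(age, a=-25.394, b=0.645):
    """Fitted straight line r = a + b * age (coefficients quoted in the text)."""
    return a + b * age

def logistic(eta):
    """Logistic function: maps any linear predictor eta to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(r):
    """Log odds of a risk r with 0 < r < 1; the inverse of logistic()."""
    return math.log(r / (1.0 - r))

# The straight line gives an impossible (negative) risk at age 39.
print(round(linear_risk(39), 3))  # -0.239

# The logistic function stays strictly between 0 and 1,
# and applying the logit recovers the linear predictor.
for eta in (-10, -2, 0, 2, 10):
    r = logistic(eta)
    print(eta, round(r, 5), round(logit(r), 5))
```

Solving the logit equation for the risk gives $\hat{r} = [1 + \exp(-(b_0 + b_1 x))]^{-1}$, which is the form `logistic` implements.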


Interpretation of logistic regression coefficients

1. Smoking and cardiovascular disease:

                 Disease   No disease
   Smoker           31        1386
   Nonsmoker        15        1883

2. The fitted model is $\text{logit} = -4.8326 + 1.0324x$, where x = 1 for smokers and 0 for nonsmokers.
3. The odds ratio for disease, comparing smokers to nonsmokers, is $\exp[1.0324(1 - 0)] = \exp[1.0324] = 2.808$.
4. Observe that $\log\hat{\psi} = \log(\widehat{\text{odds}}_1 / \widehat{\text{odds}}_0) = \log\widehat{\text{odds}}_1 - \log\widehat{\text{odds}}_0 = \widehat{\text{logit}}_1 - \widehat{\text{logit}}_0 = b_0 + b_1 x_1 - (b_0 + b_1 x_0) = b_1(x_1 - x_0)$. Hence $\hat{\psi} = \exp\{b_1(x_1 - x_0)\}$.
5. The estimated standard error of the log odds ratio is 0.3165. An approximate 95% confidence interval for the odds ratio is $\exp[1.0324 \pm 1.96 \times 0.3165]$, that is, (1.510, 5.221).
6. Since we know the log odds, we can find the odds, and hence the risk, directly from the fitted logit function.
7. The risk of the disease for a smoker is $r = [1 + \exp(-\text{logit})]^{-1} = [1 + \exp(4.8326 - 1.0324 \times 1)]^{-1} = 0.0219$, where logit = -3.8002.
8. The risk of the disease for a nonsmoker is $r = [1 + \exp(4.8326)]^{-1} = 0.0079$.
9. The relative risk for smokers compared with nonsmokers is 0.0219/0.0079 = 2.77.

Case Study

Cedergren's 1974 study of final s-deletion in Panama City, Panama. Cedergren had noticed that speakers in Panama City, as in many dialects of Spanish, variably deleted the s at the end of words. She undertook a study to find out whether there was a change in progress: whether final s was systematically dropping out of Panamanian Spanish. She performed interviews across the city in several different social classes to see how the variation was structured in the community. She also investigated the linguistic constraints on deletion, so she coded for a phonetic constraint (whether the following segment was a consonant, a vowel, or a pause) and for the grammatical category of the word that the s is part of:

- monomorpheme, where the s is part of the free morpheme (e.g., menos)
- verb, where the s is the second singular inflection (e.g., tu tienes, el tienes)
- determiner, where the s is the plural marked on a determiner (e.g., los, las)
- adjective, where the s is a nominal plural agreeing with the noun (e.g., buenos)
- noun, where the s marks a plural noun (e.g., amigos)
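Returning to the smoking and cardiovascular disease example, the quoted coefficients, odds ratio, confidence interval, risks, and relative risk can all be reproduced from the four cell counts. The Python sketch below rests on two assumptions not spelled out in the lecture: that with a single binary predictor the fitted logits equal the observed log odds, and that the standard error is the usual large-sample one for a log odds ratio from a 2×2 table (the square root of the sum of reciprocal counts). Variable names are illustrative.

```python
import math

# 2x2 counts from the smoking and cardiovascular disease example.
smoker_disease, smoker_healthy = 31, 1386
nonsmoker_disease, nonsmoker_healthy = 15, 1883

# With one binary predictor, the fitted logits match the observed log odds.
b0 = math.log(nonsmoker_disease / nonsmoker_healthy)   # logit for x = 0
b1 = math.log(smoker_disease / smoker_healthy) - b0    # log odds ratio

odds_ratio = math.exp(b1)

# Large-sample standard error of the log odds ratio (assumed formula).
se = math.sqrt(1 / smoker_disease + 1 / smoker_healthy
               + 1 / nonsmoker_disease + 1 / nonsmoker_healthy)
ci = (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se))

def risk(x):
    """Risk implied by the fitted logit b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

relative_risk = risk(1) / risk(0)

print(round(b0, 4), round(b1, 4))             # coefficients
print(round(odds_ratio, 3), round(se, 4))     # odds ratio and its log-scale SE
print(tuple(round(v, 3) for v in ci))         # 95% confidence interval
print(round(risk(1), 4), round(risk(0), 4))   # risks for smokers, nonsmokers
print(round(relative_risk, 2))                # relative risk
```

The odds ratio (about 2.81) and the relative risk (about 2.77) are close here because the disease is rare in both groups.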