Determining Probability Estimates From Logistic Regression Results Vartanian: SW 541

When you run a logistic regression, the SPSS or SAS output will generally give you the odds ratio for each variable. It will not, however, give you probability estimates for particular values of your independent variables. For example, you may be examining the likelihood of being in poverty and would like to know the probability for single mothers, for those with 5 children, or for single mothers with 5 children. With the method shown below, you will be able to determine the probability estimates for any group you're interested in. I will show you how to determine these probability estimates in SAS.

Let's say we're examining the likelihood of having income below the poverty line (inpov) and are using 4 independent variables:

1. Whether a person lives in public housing (pubhouse -- dummy variable)
2. Whether a person lives in a big city (over 750,000 population) (bigcit -- dummy variable)
3. The wages of the head of household (wagehd -- continuous variable)
4. The level of education of the head of household (edhd -- continuous variable)

The first step is to use SAS to determine the logistic regression results. By default, SAS models the likelihood of the 0 condition instead of the likelihood of the 1 condition (for example, the likelihood of not being in poverty), unless you use the descending option, which we will do.

Remember that the logistic equation takes the following form:

$$ \text{Prob}(pov = 1) = \frac{e^{a + xb}}{1 + e^{a + xb}} $$

We will use this form to determine probability estimates for the different variables within the model. While some researchers set all other variables to their mean values to determine the probability estimates for a given variable, you will use the actual observed values to determine these probabilities. You will use SAS commands to determine the probabilities of having income below the poverty line for individuals with specific conditions.

Below, I have presented the first set of SAS commands, which determine the logistic regression results and then the probability estimates from these results.

    libname in 'P:\pubdata\gssw';   /* or wherever the data is located */

D:\WP60\LECT2.PHD\LOGIST\LOGIST.PROBAB.SAS.WPD Page 1

    data a; set in.psidphd;
      a=1;   /* constant merge key; the value 1 is an assumption */

You will use this variable a as a merge variable later. For some reason, SAS insists on having this variable to merge by. (The assignment itself does not appear in the listing as transcribed; any constant works, as long as it is the same in both data sets.)

    proc logist descending outest=dd maxiter=100;
      output out=cc xbeta=xb;
      model inpov=pubhouse bigcit wagehd edhd;

proc logist is the command for determining logistic regression results.

    data f; set dd;
      a=1;   /* same constant merge key as in data set a */
      rename pubhouse=cpubhous bigcit=cbigcit wagehd=cwagehd edhd=cedhd;
      drop _type_;

You're creating the merge variable in this second data set as well, so you can eventually merge the two data sets together.

What the different commands mean:

outest=dd. This option tells SAS to create a new data set that contains only the coefficient estimates from the logistic regression model. These newly created variables (the b coefficients) have the same names as the original variables (the X variables). We need to change these names because we will eventually multiply the Xs and the bs together to determine the probability of an event occurring. Notice that we SET the variables from data set DD into data set F, and we use a rename statement to give these coefficient estimates new names. I've changed the names of the variables from their regular names to names that begin with c (for coefficient estimate).

maxiter=100. Logistic regression uses an iterative process to determine the b coefficients. The process keeps iterating until stable b coefficients are found. The default number of iterations in SAS is 25. I have simply raised this number to 100 iterations.

out=cc. This creates a new data set, CC, that contains all the variables from the data that went into the logistic regression analysis. You will need these variables to determine the probability estimates -- these are your X variables.

xbeta=xb. This gives you, for each observation, the sum of the variable values times their b coefficients. This xb estimate includes the estimate for the intercept (that is, it equals a + xb). You'll be able to use this value for determining probability estimates. However, you will be adding to or subtracting from this value, depending on the probability you're examining.
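As a rough illustration of what xbeta=xb stores and how the logistic form turns it into a probability, here is a minimal Python sketch (Python rather than SAS, purely so the arithmetic can be checked by hand). The coefficients are the rounded estimates from the SAS output in this handout; the person's values and the function names are made up for illustration.

```python
import math

# Rounded coefficient estimates taken from the handout's SAS output.
INTERCEPT = 1.0972
COEF = {"pubhouse": 1.4513, "bigcit": 0.3778, "wagehd": -0.2556, "edhd": -0.0964}


def linear_predictor(obs):
    """Return a + xb for one observation (the quantity xbeta=xb saves)."""
    return INTERCEPT + sum(COEF[name] * obs[name] for name in COEF)


def logistic(xb):
    """Convert a + xb into a probability: e^(a+xb) / (1 + e^(a+xb))."""
    return math.exp(xb) / (1.0 + math.exp(xb))


# Hypothetical person (illustrative values, not from the data set):
# lives in public housing, not in a big city, wage of 5, 12 years of education.
person = {"pubhouse": 1, "bigcit": 0, "wagehd": 5.0, "edhd": 12.0}

xb = linear_predictor(person)
prob = logistic(xb)
print("a + xb =", round(xb, 4))
print("Prob(pov = 1) =", round(prob, 4))
```

The sketch only mirrors the formula; in the SAS program the same a + xb value arrives ready-made in the variable xb.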

SECOND SET OF SAS COMMANDS:

    data g; merge f cc; by a;

Now we merge the two created data sets, which puts the b coefficient estimates and the variable values into one data set. Each observation will have the same values for the b coefficients and different values for the variables. For example, each observation in the new merged data set will have the same value for cpubhous (1.4513), while the value for pubhouse will depend on whether or not the person lived in public housing (1=yes, 0=no).

    xb_nopub=xb-cpubhous*pubhouse;

This is the derivation of your xb for the likelihood of being in poverty for those who do not live in public housing. What I've done is take the overall xb (which includes the xb for public housing) and subtract off the coefficient*variable estimate for living in public housing. This is the same as setting pubhouse=0. In other words, we're asking for the likelihood of being in poverty, given that no individual lives in public housing. This is just what we did with the estimates in class. There, we wanted to determine the probability of being in poverty if someone had 5 kids, so we substituted the number 5 for the X and multiplied it by the b coefficient. What we will do here is determine this probability for all individuals (holding all else equal), and then take a mean for the sample. This will give us the overall likelihood of being in the condition.

    xb_pub=xb_nopub+cpubhous;

In this second estimate for public housing, I'm determining the xb for those who live in public housing. We again need to subtract off the coefficient*variable estimate for public housing (as we did above), and then add back in cpubhous*1 (or simply cpubhous). We're again forcing everyone into a particular state: they live in public housing. We'll then see how this affects their likelihood of being in poverty.
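The subtract-then-add-back step for a dummy variable can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SAS program itself: the (xb, pubhouse) pairs are invented, the helper names are hypothetical, and only the cpubhous value (1.4513) comes from the handout.

```python
import math

C_PUBHOUSE = 1.4513  # cpubhous, as reported in the handout


def logistic(xb):
    """e^xb / (1 + e^xb)."""
    return math.exp(xb) / (1.0 + math.exp(xb))


def xb_without_pubhouse(xb, pubhouse):
    """Remove the public-housing term, i.e. force pubhouse = 0."""
    return xb - C_PUBHOUSE * pubhouse


def xb_with_pubhouse(xb, pubhouse):
    """Remove the observed term, then add cpubhous * 1, i.e. force pubhouse = 1."""
    return xb_without_pubhouse(xb, pubhouse) + C_PUBHOUSE


# Made-up (xb, pubhouse) pairs standing in for rows of the merged data set.
rows = [(0.5, 1), (-1.2, 0), (-2.0, 0), (1.1, 1)]

pr_nopub = [logistic(xb_without_pubhouse(xb, p)) for xb, p in rows]
pr_pub = [logistic(xb_with_pubhouse(xb, p)) for xb, p in rows]

mean_nopub = sum(pr_nopub) / len(rows)
mean_pub = sum(pr_pub) / len(rows)
print("mean probability, nobody in public housing:  ", round(mean_nopub, 4))
print("mean probability, everybody in public housing:", round(mean_pub, 4))
```

Note that a person who already lives in public housing gets back exactly their original xb in the "everybody in public housing" scenario, and a person who does not keeps their original xb in the other one.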
    xb_nobig=xb-cbigcit*bigcit;
    xb_big=xb_nobig+cbigcit;
    xb_wag5=xb-cwagehd*wagehd+cwagehd*5;
    xb_wag10=xb-cwagehd*wagehd+cwagehd*10;
    xb_wag1=xb-cwagehd*wagehd+cwagehd*1;
    xb_ed10=xb-cedhd*edhd+cedhd*10;
    xb_ed12=xb-cedhd*edhd+cedhd*12;
    xb_ed16=xb-cedhd*edhd+cedhd*16;

With interval/ratio scale variables, we again need to subtract off the coefficient estimate of the variable times the actual value of the variable, since this product is already contained in xb. We want to determine estimates for this variable at specific levels -- not at the particular level of any individual. So we add back the coefficient estimate multiplied by 10 (to get an estimate for those who have a 10th grade education), or 12 (high school graduate), or 16 (college graduate).

Below, we take the xb estimates from above and put them into the logistic form (see the formula above). Each individual within the sample will have a value for each of the variables below. We will then take the mean, which will give us an overall average probability of being in poverty.

    pr_nopub=(exp(xb_nopub))/(1+exp(xb_nopub));
    pr_pub=(exp(xb_pub))/(1+exp(xb_pub));
    pr_nobig=(exp(xb_nobig))/(1+exp(xb_nobig));
    pr_big=(exp(xb_big))/(1+exp(xb_big));
    pr_wag5=(exp(xb_wag5))/(1+exp(xb_wag5));
    pr_wag10=(exp(xb_wag10))/(1+exp(xb_wag10));
    pr_wag1=(exp(xb_wag1))/(1+exp(xb_wag1));
    pr_ed10=(exp(xb_ed10))/(1+exp(xb_ed10));
    pr_ed12=(exp(xb_ed12))/(1+exp(xb_ed12));
    pr_ed16=(exp(xb_ed16))/(1+exp(xb_ed16));

    proc means;
      var pr_nopub pr_pub pr_nobig pr_big pr_wag5 pr_wag10 pr_wag1
          pr_ed10 pr_ed12 pr_ed16;
      weight weight;
    run;

Results

    The LOGISTIC Procedure

    Data Set: WORK.A
    Response Variable: INPOV
    Response Levels: 2
    Number of Observations: 15406
    Link Function: Logit

    Response Profile

    Ordered Value    INPOV    Count
          1            1       3208
          2            0      12198

(In this sample, 3,208 lived below the poverty line; 12,198 did not.)

    Model Fitting Information and Testing Global Null Hypothesis BETA=0

                 Intercept    Intercept and
    Criterion    Only         Covariates       Chi-Square for Covariates
    AIC          15765.507    10460.511        .
    SC           15773.149    10498.723        .
    -2 LOG L     15763.507    10450.511        5312.996 with 4 DF (p=0.0001)
    Score        .            .                3249.816 with 4 DF (p=0.0001)

    Analysis of Maximum Likelihood Estimates

                    Parameter   Standard   Wald          Pr >         Standardized   Odds
    Variable   DF   Estimate    Error      Chi-Square    Chi-Square   Estimate       Ratio
    INTERCPT   1     1.0972     0.1038      111.6740     0.0001        .             .
    PUBHOUSE   1     1.4513     0.0832      304.4061     0.0001        0.192948      4.269
    BIGCIT     1     0.3778     0.0568       44.2624     0.0001        0.086084      1.459
    WAGEHD     1    -0.2556     0.00592    1864.9660     0.0001       -1.628713      0.774
    EDHD       1    -0.0964     0.00908     112.8993     0.0001       -0.150797      0.908

Notice that all coefficient estimates are significantly related to being in poverty. Living in public housing and living in a big city are positively related to having income below the poverty line. Wages and education level of the head have a negative relationship with having income below the poverty line. Also note that the chi-square for covariates based on -2 Log L is significant (p=.0001). Thus, all of the variables taken together are related to the dependent variable.

The probability estimates, or the likelihood of being in poverty given particular conditions, come from the PROC MEANS output. The likelihood of being in poverty given that you don't live in public housing, controlling for all other variables in the model, is 17.13%. The probability for those living in public housing is 33.94%. For those with wages of $1/hour, the likelihood of being in poverty is 43.04%, while for those earning $10/hour, this likelihood is 7.65%. All of the probability estimates are determined holding all other variables within the model constant.
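The odds ratios in the parameter table are simply the exponentiated coefficient estimates, which can be confirmed with a short Python check. The estimates and printed odds ratios below are copied from the SAS output; the dictionary names are just illustrative.

```python
import math

# Parameter estimates and odds ratios as printed in the SAS output.
estimates = {"pubhouse": 1.4513, "bigcit": 0.3778, "wagehd": -0.2556, "edhd": -0.0964}
printed_odds_ratios = {"pubhouse": 4.269, "bigcit": 1.459, "wagehd": 0.774, "edhd": 0.908}

# Odds ratio for a one-unit increase in X is e^b.
odds_ratios = {name: math.exp(b) for name, b in estimates.items()}

for name, value in odds_ratios.items():
    print(f"{name:9s} exp(b) = {value:.3f}   printed = {printed_odds_ratios[name]:.3f}")
```

An odds ratio above 1 (pubhouse, bigcit) goes with a positive coefficient and higher odds of poverty; an odds ratio below 1 (wagehd, edhd) goes with a negative coefficient and lower odds.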
    The SAS System                         15:40 Thursday, January 28, 1999

    Variable       N        Mean         Std Dev      Minimum         Maximum
    --------------------------------------------------------------------------
    PR_NOPUB     15406    0.1712863    0.2055081    4.642831E-12    0.8138157
    PR_PUB       15406    0.3393527    0.3135689    1.98185E-11     0.9491308
    PR_NOBIG     15406    0.1738347    0.2131065    4.642831E-12    0.9274740
    PR_BIG       15406    0.2129280    0.2445817    6.774009E-12    0.9491308
    PR_WAG5      15406    0.2209343    0.0850559    0.1394452       0.8386856
    PR_WAG10     15406    0.0764884    0.0457277    0.0432017       0.5916215
    PR_WAG1      15406    0.4303992    0.0979813    0.3105265       0.9352771
    PR_ED10      15406    0.1962643    0.2239156    9.118824E-12    0.8767470
    PR_ED12      15406    0.1759773    0.2071500    7.519345E-12    0.8543479
    PR_ED16      15406    0.0174886    0.0295942    4.00783E-13     0.2381776
    --------------------------------------------------------------------------

A caution about the PR_ED16 row: its minimum and maximum are what you get when 16 is multiplied by the wage coefficient (-0.2556) instead of the education coefficient (-0.0964). With the education coefficient, moving from 12 to 16 years of education changes each person's xb by only 4*(-0.0964) = -0.3856, so the mean for PR_ED16 should fall only modestly below the PR_ED12 mean of 0.1759773. Do not rely on the printed PR_ED16 values.
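The PR_ED16 row can be checked against the PR_ED12 row by hand, because both are computed from the same per-person quantity (xb with the education term removed), so the person with the largest pr_ed12 also has the largest pr_ed16. The Python sketch below does that check using only numbers printed in this handout; the coefficients are rounded to four decimals, so agreement is only expected to about three decimal places, and the variable names are illustrative.

```python
import math

C_WAGEHD = -0.2556   # reported wage coefficient
C_EDHD = -0.0964     # reported education coefficient
PR_ED12_MAX = 0.8543479          # printed maximum of PR_ED12
PR_ED16_MAX_PRINTED = 0.2381776  # printed maximum of PR_ED16


def logit(p):
    """Inverse of the logistic form: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def logistic(xb):
    return math.exp(xb) / (1.0 + math.exp(xb))


# xb for that person with the education term stripped out.
base = logit(PR_ED12_MAX) - C_EDHD * 12

# Add back 16 times the wage coefficient versus 16 times the education coefficient.
max_with_wage_coef = logistic(base + C_WAGEHD * 16)
max_with_educ_coef = logistic(base + C_EDHD * 16)

print("max pr_ed16 using wage coefficient:     ", round(max_with_wage_coef, 4))
print("max pr_ed16 using education coefficient:", round(max_with_educ_coef, 4))
print("max pr_ed16 printed in the output:      ", PR_ED16_MAX_PRINTED)
```

The wage-coefficient version lands on the printed maximum, which is why the printed PR_ED16 row looks implausibly low next to PR_ED10 and PR_ED12.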

The SAS program:

    libname in 'P:\pubdata\gssw';   /* or wherever the data is located */

    data a; set in.psidphd;
      a=1;   /* constant merge key; the value 1 is an assumption */

    proc logist descending outest=dd maxiter=100;
      output out=cc xbeta=xb;
      model inpov=pubhouse bigcit wagehd edhd;

    data f; set dd;
      a=1;   /* same constant merge key as in data set a */
      rename pubhouse=cpubhous bigcit=cbigcit wagehd=cwagehd edhd=cedhd;
      drop _type_;

    data g; merge f cc; by a;
      /* xb with each variable forced to a chosen value */
      xb_nopub=xb-cpubhous*pubhouse;
      xb_pub=xb_nopub+cpubhous;
      xb_nobig=xb-cbigcit*bigcit;
      xb_big=xb_nobig+cbigcit;
      xb_wag5=xb-cwagehd*wagehd+cwagehd*5;
      xb_wag10=xb-cwagehd*wagehd+cwagehd*10;
      xb_wag1=xb-cwagehd*wagehd+cwagehd*1;
      xb_ed10=xb-cedhd*edhd+cedhd*10;
      xb_ed12=xb-cedhd*edhd+cedhd*12;
      xb_ed16=xb-cedhd*edhd+cedhd*16;
      /* logistic form: probability for each forced value */
      pr_nopub=(exp(xb_nopub))/(1+exp(xb_nopub));
      pr_pub=(exp(xb_pub))/(1+exp(xb_pub));
      pr_nobig=(exp(xb_nobig))/(1+exp(xb_nobig));
      pr_big=(exp(xb_big))/(1+exp(xb_big));
      pr_wag5=(exp(xb_wag5))/(1+exp(xb_wag5));
      pr_wag10=(exp(xb_wag10))/(1+exp(xb_wag10));
      pr_wag1=(exp(xb_wag1))/(1+exp(xb_wag1));
      pr_ed10=(exp(xb_ed10))/(1+exp(xb_ed10));
      pr_ed12=(exp(xb_ed12))/(1+exp(xb_ed12));
      pr_ed16=(exp(xb_ed16))/(1+exp(xb_ed16));

    /* weighted sample means of the probabilities */
    proc means;
      var pr_nopub pr_pub pr_nobig pr_big pr_wag5 pr_wag10 pr_wag1
          pr_ed10 pr_ed12 pr_ed16;
      weight weight;
    run;
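For readers who want to trace the whole procedure outside SAS, the following Python sketch mirrors the logic of the program on a tiny invented data set: compute each person's xb, force one variable to a chosen value, apply the logistic form, and take the weighted mean (as PROC MEANS does with a WEIGHT statement). The coefficients are the rounded estimates reported in this handout; the rows, the weights, and the function name average_probability are hypothetical, so the printed numbers will not match the handout's results.

```python
import math

# Rounded estimates from the handout's SAS output.
INTERCEPT = 1.0972
COEF = {"pubhouse": 1.4513, "bigcit": 0.3778, "wagehd": -0.2556, "edhd": -0.0964}


def logistic(xb):
    return math.exp(xb) / (1.0 + math.exp(xb))


def average_probability(rows, name, value):
    """Weighted mean probability with variable `name` forced to `value` for everyone."""
    weighted_total = 0.0
    weight_sum = 0.0
    for row in rows:
        xb = INTERCEPT + sum(COEF[k] * row[k] for k in COEF)
        # Subtract the observed term, add back the term at the chosen value.
        xb_forced = xb - COEF[name] * row[name] + COEF[name] * value
        weighted_total += row["weight"] * logistic(xb_forced)
        weight_sum += row["weight"]
    return weighted_total / weight_sum


# Invented observations; all other variables keep their observed values.
rows = [
    {"pubhouse": 1, "bigcit": 1, "wagehd": 4.0, "edhd": 10.0, "weight": 1.5},
    {"pubhouse": 0, "bigcit": 0, "wagehd": 12.0, "edhd": 16.0, "weight": 1.0},
    {"pubhouse": 0, "bigcit": 1, "wagehd": 7.5, "edhd": 12.0, "weight": 2.0},
    {"pubhouse": 1, "bigcit": 0, "wagehd": 2.0, "edhd": 9.0, "weight": 0.5},
]

scenarios = [("pubhouse", 0), ("pubhouse", 1), ("wagehd", 1), ("wagehd", 5),
             ("wagehd", 10), ("edhd", 10), ("edhd", 12), ("edhd", 16)]
for name, value in scenarios:
    print(f"{name}={value}: {average_probability(rows, name, value):.4f}")
```

Because every scenario reuses each person's observed values for the other variables, the averages answer the same question the SAS program asks: what would the sample's overall probability of poverty be if everyone had this one characteristic?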