WWS 508b Precept 10. John Palmer. April 27, 2010

Example: married women's labor force participation

The MROZ.dta data set has information on labor force participation and other characteristics of married women in 1975.

inlf = 1 if respondent reported working for a wage outside home at some point during the year (1975); zero otherwise.
nwifeinc = family income excluding respondent's income (in thousands of dollars).
city = 1 if respondent lived in standard metropolitan statistical area; zero otherwise.
educ = respondent's education (in years).
age = respondent's age.
kidslt6 = number of kids less than 6 years old.

One independent variable

How to regress inlf on city in Stata?

    LPM:           regress inlf city, r
    Logit model:   logit inlf city
    Probit model:  probit inlf city

Estimates:

------------------------------------------------------------
                      (1)             (2)             (3)
                      LPM           logit          probit
------------------------------------------------------------
city             -0.00638         -0.0260         -0.0162
                 (0.0377)         (0.154)        (0.0959)

_cons               0.572***        0.292*          0.183*
                 (0.0302)         (0.123)        (0.0769)
------------------------------------------------------------
N                     753             753             753
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

What is the probability of being in the workforce predicted by each model for city-dwellers? Non-city-dwellers?

LPM: $\Pr\{\text{inlf} = 1 \mid \text{city}\} = \hat\beta_0 + \hat\beta_1\,\text{city}$

Logit model: $\Pr\{\text{inlf} = 1 \mid \text{city}\} = \dfrac{e^{\hat\beta_0 + \hat\beta_1\,\text{city}}}{1 + e^{\hat\beta_0 + \hat\beta_1\,\text{city}}}$

Probit model: $\Pr\{\text{inlf} = 1 \mid \text{city}\} = \Phi(\hat\beta_0 + \hat\beta_1\,\text{city})$

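So for city-dwellers, we have:

LPM: $\Pr\{\text{inlf} = 1 \mid \text{city} = 1\} = 0.572 - 0.00638(1) = 0.5661157$

Logit model: $\Pr\{\text{inlf} = 1 \mid \text{city} = 1\} = \dfrac{e^{0.292 - 0.0260(1)}}{1 + e^{0.292 - 0.0260(1)}} = 0.5661157$

Probit model: $\Pr\{\text{inlf} = 1 \mid \text{city} = 1\} = \Phi(0.183 - 0.0162(1)) = 0.5661157$

(The seven-digit results come from Stata's unrounded coefficients; plugging in the rounded values shown here gives about 0.566 in each case.)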

To do these calculations in Stata:

    regress inlf city
    disp "LPM: Pr{inlf=1|city=1}=" _b[_cons] + _b[city]

    logit inlf city
    disp "Logit: Pr{inlf=1|city=1}=" exp(_b[_cons] + _b[city])/(1+exp(_b[_cons] + _b[city]))

    probit inlf city
    disp "Probit: Pr{inlf=1|city=1}=" normal(_b[_cons] + _b[city])

(Note that the stuff I place in quotation marks in these commands is optional; it's just so that I can keep track of what is being displayed.)
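As a cross-check outside Stata, the same three predictions can be reproduced from the rounded coefficients in the estimates table. This is a minimal Python sketch, not part of the original precept; the function and variable names are illustrative, and because the inputs are rounded the results agree with 0.5661157 only to about three decimal places.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, the counterpart of Stata's normal()."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic CDF, exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

# Rounded estimates (_cons, city) copied from the one-regressor table.
lpm = (0.572, -0.00638)
logit = (0.292, -0.0260)
probit = (0.183, -0.0162)

city = 1  # city-dwellers; set to 0 for non-city-dwellers
p_lpm = lpm[0] + lpm[1] * city
p_logit = logistic_cdf(logit[0] + logit[1] * city)
p_probit = normal_cdf(probit[0] + probit[1] * city)

for name, p in [("LPM", p_lpm), ("Logit", p_logit), ("Probit", p_probit)]:
    print(f"{name}: Pr(inlf=1 | city={city}) = {p:.4f}")
```

Changing `city` to 0 gives the prediction for non-city-dwellers, which is just the transformed constant in each model.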

Why are all three results the same? Because we have only one independent variable and it's dichotomous. With a single dummy regressor, each model has two parameters and only two predicted probabilities to fit, so all three reproduce the observed proportion working in each group. Note that we could get the same result simply with a two-by-two table:

. tab inlf city, col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

  =1 if in |
 lab frce, |  =1 if live in SMSA
      1975 |         0          1 |     Total
-----------+----------------------+----------
         0 |       115        210 |       325
           |     42.75      43.39 |     43.16
-----------+----------------------+----------
         1 |       154        274 |       428
           |     57.25      56.61 |     56.84
-----------+----------------------+----------
     Total |       269        484 |       753
           |    100.00     100.00 |    100.00
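To see the link between the table and the regressions more concretely, the sketch below recovers the LPM and Logit coefficients directly from the four cell counts. It is an illustrative Python check rather than anything from the precept's .do file, and the helper names are assumptions made for the example.

```python
import math

# Cell counts from `tab inlf city, col`, stored as counts[city][inlf].
counts = {0: {0: 115, 1: 154}, 1: {0: 210, 1: 274}}

def share_working(city):
    """Column proportion with inlf = 1."""
    cell = counts[city]
    return cell[1] / (cell[0] + cell[1])

def log_odds(p):
    return math.log(p / (1.0 - p))

p0, p1 = share_working(0), share_working(1)

# LPM: the intercept is the city=0 proportion, the slope is the difference.
lpm_cons, lpm_city = p0, p1 - p0

# Logit: the intercept is the city=0 log-odds, the slope is the log odds ratio.
logit_cons, logit_city = log_odds(p0), log_odds(p1) - log_odds(p0)

print(f"Pr(inlf=1 | city=0) = {p0:.4f}, Pr(inlf=1 | city=1) = {p1:.4f}")
print(f"LPM:   _cons = {lpm_cons:.3f}, city = {lpm_city:.5f}")
print(f"Logit: _cons = {logit_cons:.3f}, city = {logit_city:.4f}")
```

The values match the first two columns of the estimates table up to rounding, which is why the predicted probabilities coincide.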

Adding more variables

Now try this:

    regress inlf city nwifeinc educ age kidslt6
    estimates store LPM

    logit inlf city nwifeinc educ age kidslt6
    estimates store logit

    probit inlf city nwifeinc educ age kidslt6
    estimates store probit

    esttab LPM logit probit, se mtitles

. esttab LPM logit probit, se mtitles

------------------------------------------------------------
                      (1)             (2)             (3)
                      LPM           logit          probit
------------------------------------------------------------
city             -0.00325          0.0131          0.0101
                 (0.0364)         (0.175)         (0.106)

nwifeinc         -0.00694***      -0.0356***      -0.0214***
                (0.00154)       (0.00804)       (0.00466)

educ               0.0534***        0.262***        0.158***
                (0.00779)        (0.0407)        (0.0239)

age               -0.0108***      -0.0522***      -0.0315***
                (0.00233)        (0.0114)       (0.00685)

kidslt6            -0.294***       -1.458***       -0.881***
                 (0.0356)         (0.195)         (0.114)

_cons               0.583***        0.354           0.216
                  (0.143)         (0.691)         (0.417)
------------------------------------------------------------
N                     753             753             753
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

What is the partial effect of age in the LPM? Each additional year of age is associated with approximately a 1 percentage point decrease in the probability of participating in the labor force.

What is the partial effect of age in the Logit model? Each additional year of age decreases the odds of participating in the labor force by $100\,(1 - e^{-0.0522}) \approx 5\%$.

Huh?

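Understanding the Logit interpretation

To understand why we can interpret the Logit estimates this way, consider the model in terms of the predicted odds of labor force participation:

$$\ln(\widehat{\text{odds}}) = \hat\beta_0 + \hat\beta_1\,\text{city} + \hat\beta_2\,\text{nwifeinc} + \hat\beta_3\,\text{educ} + \hat\beta_4\,\text{age} + \hat\beta_5\,\text{kidslt6}$$

So that means:

$$\widehat{\text{odds}} = e^{\hat\beta_0 + \hat\beta_1\,\text{city} + \hat\beta_2\,\text{nwifeinc} + \hat\beta_3\,\text{educ} + \hat\beta_4\,\text{age} + \hat\beta_5\,\text{kidslt6}}$$

or

$$\widehat{\text{odds}} = e^{\hat\beta_0}\, e^{\hat\beta_1\,\text{city}}\, e^{\hat\beta_2\,\text{nwifeinc}}\, e^{\hat\beta_3\,\text{educ}}\, e^{\hat\beta_4\,\text{age}}\, e^{\hat\beta_5\,\text{kidslt6}}$$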

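Now compare the ratio of two predicted odds: $\widehat{\text{odds}}_0$ with all variables set to any given values, and $\widehat{\text{odds}}_1$ with age increased by 1:

$$\frac{\widehat{\text{odds}}_1}{\widehat{\text{odds}}_0} = \frac{e^{\hat\beta_0}\, e^{\hat\beta_1\,\text{city}}\, e^{\hat\beta_2\,\text{nwifeinc}}\, e^{\hat\beta_3\,\text{educ}}\, e^{\hat\beta_4(\text{age}+1)}\, e^{\hat\beta_5\,\text{kidslt6}}}{e^{\hat\beta_0}\, e^{\hat\beta_1\,\text{city}}\, e^{\hat\beta_2\,\text{nwifeinc}}\, e^{\hat\beta_3\,\text{educ}}\, e^{\hat\beta_4\,\text{age}}\, e^{\hat\beta_5\,\text{kidslt6}}}$$

Notice that everything except the age terms cancels out, so that we get:

$$\frac{\widehat{\text{odds}}_1}{\widehat{\text{odds}}_0} = \frac{e^{\hat\beta_4(\text{age}+1)}}{e^{\hat\beta_4\,\text{age}}} = e^{\hat\beta_4}$$

In other words, the odds ratio is equal to $e^{\hat\beta_4}$.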
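The cancellation can also be checked numerically. The sketch below uses the rounded Logit coefficients from the table and two hypothetical respondents (their covariate values are invented for illustration and are not drawn from MROZ.dta) to show that the odds ratio for one extra year of age does not depend on where the other variables are set.

```python
import math

# Rounded logit estimates copied from the six-variable table.
beta = {"_cons": 0.354, "city": 0.0131, "nwifeinc": -0.0356,
        "educ": 0.262, "age": -0.0522, "kidslt6": -1.458}

def predicted_odds(x):
    """exp(linear index) for a dict of covariate values."""
    index = beta["_cons"] + sum(beta[k] * v for k, v in x.items())
    return math.exp(index)

def odds_ratio_for_extra_year(x):
    """Predicted odds at age + 1 divided by predicted odds at age."""
    older = dict(x, age=x["age"] + 1)
    return predicted_odds(older) / predicted_odds(x)

# HYPOTHETICAL respondents, chosen only to be very different from each other.
a = {"city": 1, "nwifeinc": 10.0, "educ": 12, "age": 30, "kidslt6": 0}
b = {"city": 0, "nwifeinc": 35.0, "educ": 16, "age": 50, "kidslt6": 2}

ratio_a = odds_ratio_for_extra_year(a)
ratio_b = odds_ratio_for_extra_year(b)
print(ratio_a, ratio_b, math.exp(beta["age"]))
```

All three printed numbers agree up to floating-point error, even though the two respondents have very different predicted odds.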

How do we get from the odds ratio to talking about a percentage decrease or increase? The odds ratio tells us that when we increase age by 1, we get new predicted odds that are $e^{\hat\beta_4}$ times our initial predicted odds. If $\hat\beta_4$ is negative, $e^{\hat\beta_4}$ is less than one, so we can express the change as a decrease of $100\,(1 - e^{\hat\beta_4})$ percent. If $\hat\beta_4$ is positive, then $e^{\hat\beta_4}$ is greater than one, so we can express the change as an increase of $100\,(e^{\hat\beta_4} - 1)$ percent.

What is the partial effect of education in the Logit model? Each additional year of education increases the odds of participating in the labor force by $100\,(e^{0.262} - 1) \approx 30\%$.
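The percentage interpretations for age and education can be reproduced with one small helper. This is a hedged Python sketch using the rounded Logit coefficients; the function name is an assumption made for the example. A single signed formula, 100(e^β − 1), covers both cases, with a negative result read as a decrease.

```python
import math

def pct_change_in_odds(beta):
    """Percentage change in the odds for a one-unit increase in x."""
    return 100.0 * (math.exp(beta) - 1.0)

# Rounded logit coefficients from the six-variable table.
age_change = pct_change_in_odds(-0.0522)
educ_change = pct_change_in_odds(0.262)

print(f"age:  {age_change:+.1f}% change in odds per additional year")
print(f"educ: {educ_change:+.1f}% change in odds per additional year")
```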

Can we interpret the Probit model in terms of odds? Not easily. How about if we want to interpret the Logit or Probit models in terms of the partial effect on probability? Now we need to specify the values of all the variables at which we are interested in the effect. One simple approach is to set all variables to their average values in the sample.

In Stata, to calculate partial effects for each variable with all variables set to the average, use the following command after running the regression:

    mfx

Marginal effects after logit
      y  = Pr(inlf) (predict)
         =  .57420164
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
   city* |   .0032034      .0427     0.08   0.940  -.080481  .086888   .642762
nwifeinc |  -.0087062     .00197    -4.42   0.000  -.012564 -.004848    20.129
    educ |   .0640876     .00995     6.44   0.000   .044591  .083584   12.2869
     age |  -.0127634     .00279    -4.58   0.000   -.01823 -.007297   42.5378
 kidslt6 |  -.3565846     .04782    -7.46   0.000  -.450309  -.26286   .237716
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
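For the Logit model, the dy/dx entries for the non-dummy regressors follow from the derivative of the logistic CDF: the partial effect is the coefficient times p(1 − p), where p is the predicted probability at the evaluation point. The Python sketch below checks this against the mfx output using the rounded coefficients, so agreement is only approximate; the names are illustrative.

```python
# Predicted probability at the sample means, as reported by mfx.
p_at_means = 0.57420164

# Rounded logit coefficients for the regressors mfx treats as continuous.
logit_coef = {"nwifeinc": -0.0356, "educ": 0.262, "age": -0.0522, "kidslt6": -1.458}

def logit_partial_effect(beta, p):
    """Coefficient times the logistic density at the index: beta * p * (1 - p)."""
    return beta * p * (1.0 - p)

pea = {name: logit_partial_effect(b, p_at_means) for name, b in logit_coef.items()}
for name, effect in pea.items():
    print(f"{name:>8}: dy/dx = {effect:.5f}")
```

The city row is handled differently in the mfx output: as its footnote says, it reports the change in predicted probability when the dummy moves from 0 to 1.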

But often we would prefer to know the partial effects for specific values of certain variables. For instance, in our example, the partial effect with city set to its average isn't particularly useful: city is a dummy, and no respondent has city = 0.643.

We do this by adding the at option. Here we will specify that all partial effects are to be evaluated with city set to 1 and age set to 34. All other variables stay set to their average values. So we are looking at partial effects for 34-year-old city-dwellers with average family income, education, and number of kids under 6:

    mfx, at(city=1 age=34)

warning: no value assigned in at() for variables nwifeinc educ kidslt6;
   means used for nwifeinc educ kidslt6

Marginal effects after logit
      y  = Pr(inlf) (predict)
         =  .67904757
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
   city* |   .0028614     .03815     0.08   0.940  -.071903  .077626         1
nwifeinc |  -.0077607     .00177    -4.38   0.000  -.011234 -.004288    20.129
    educ |   .0571276     .00945     6.05   0.000   .038614  .075641   12.2869
     age |  -.0113773     .00209    -5.44   0.000  -.015479 -.007276        34
 kidslt6 |  -.3178594     .04039    -7.87   0.000  -.397024 -.238694   .237716
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
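The evaluation point can be rebuilt by hand from the X column of the output. The Python sketch below, which is illustrative and uses the rounded Logit coefficients, computes the predicted probability for a 34-year-old city-dweller with the other regressors at their means, then the partial effect of age and the discrete change for city. Results match the Stata output only up to rounding error.

```python
import math

# Rounded logit estimates copied from the six-variable table.
beta = {"_cons": 0.354, "city": 0.0131, "nwifeinc": -0.0356,
        "educ": 0.262, "age": -0.0522, "kidslt6": -1.458}

# Evaluation point: city=1 and age=34 as in at(); the rest are the sample
# means shown in the X column of the mfx output.
x = {"city": 1, "nwifeinc": 20.129, "educ": 12.2869, "age": 34, "kidslt6": 0.237716}

def predicted_prob(values):
    """Logistic CDF of the linear index."""
    index = beta["_cons"] + sum(beta[k] * v for k, v in values.items())
    return 1.0 / (1.0 + math.exp(-index))

p = predicted_prob(x)

# Continuous regressor: derivative of the predicted probability.
age_effect = beta["age"] * p * (1.0 - p)

# Dummy regressor: change in predicted probability from city=0 to city=1.
city_effect = predicted_prob(dict(x, city=1)) - predicted_prob(dict(x, city=0))

print(f"Pr(inlf) = {p:.4f}, age dy/dx = {age_effect:.5f}, city change = {city_effect:.5f}")
```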

Another useful approach is to calculate the average partial effects, meaning the partial effects computed at each observation's own values and then averaged over the sample. The questions in the problem set asking you to do this are optional. However, if you are curious, I have included in the .do file posted with these slides an example of how to do it in Stata, drawing on Wooldridge's equations 17.15 and 17.17.
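To make the distinction concrete, here is a small Python sketch contrasting the average partial effect with the partial effect at the average for a continuous regressor in the Logit model. It is not the code from the posted .do file: the four rows are hypothetical values invented for illustration (they are not MROZ.dta records), and the function names are assumptions.

```python
import math

# Rounded logit estimates copied from the six-variable table.
beta = {"_cons": 0.354, "city": 0.0131, "nwifeinc": -0.0356,
        "educ": 0.262, "age": -0.0522, "kidslt6": -1.458}

# HYPOTHETICAL rows, invented purely for illustration (not MROZ.dta records).
sample = [
    {"city": 1, "nwifeinc": 12.0, "educ": 12, "age": 32, "kidslt6": 1},
    {"city": 0, "nwifeinc": 25.0, "educ": 10, "age": 45, "kidslt6": 0},
    {"city": 1, "nwifeinc": 40.0, "educ": 16, "age": 38, "kidslt6": 2},
    {"city": 1, "nwifeinc": 18.5, "educ": 14, "age": 55, "kidslt6": 0},
]

def predicted_prob(row):
    index = beta["_cons"] + sum(beta[k] * v for k, v in row.items())
    return 1.0 / (1.0 + math.exp(-index))

def average_partial_effect(var):
    """Mean over observations of beta * p_i * (1 - p_i)."""
    effects = [beta[var] * predicted_prob(r) * (1.0 - predicted_prob(r)) for r in sample]
    return sum(effects) / len(effects)

def partial_effect_at_average(var):
    """beta * p * (1 - p), with p evaluated at the column means."""
    means = {k: sum(r[k] for r in sample) / len(sample) for k in sample[0]}
    p = predicted_prob(means)
    return beta[var] * p * (1.0 - p)

ape_age = average_partial_effect("age")
pea_age = partial_effect_at_average("age")
print(f"APE(age) = {ape_age:.5f}, PEA(age) = {pea_age:.5f}")
```

Because p(1 − p) never exceeds 0.25, both quantities are bounded in absolute value by a quarter of the Logit coefficient.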

Tobit basics

To fit a Tobit model in Stata, use the same basic syntax but add a comma and specify the lower limit (ll) or upper limit (ul) of the data, i.e., where it is censored. For example:

    tobit hours educ age, ll(0)

To test joint significance or linear hypotheses:

    test educ age
    test educ + age = 0

To calculate the average partial effect scale factor:

    gen effect = normal((_b[_cons] + _b[educ]*educ + _b[age]*age)/_b[/sigma])
    mean effect
    scalar APEscalar = _b[effect]
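In equation form, the scale factor computed by the last three commands is the sample average of the normal CDF evaluated at each observation's standardized index. A sketch of the notation, assuming n observations indexed by i and a continuous regressor j:

```latex
\widehat{\text{scale}}
  = \frac{1}{n}\sum_{i=1}^{n}
    \Phi\!\left(\frac{\hat\beta_0 + \hat\beta_1\,\text{educ}_i + \hat\beta_2\,\text{age}_i}{\hat\sigma}\right),
\qquad
\widehat{\text{APE}}_j = \hat\beta_j \cdot \widehat{\text{scale}}
```

Multiplying a Tobit coefficient by this factor gives the average partial effect of that regressor on E(hours | x).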

To obtain estimates of E(hours | x):

    tobit hours educ age, ll(0)
    gen hourshat = normal((_b[_cons] + _b[educ]*educ + _b[age]*age)/_b[/sigma]) ///
        *(_b[_cons] + _b[educ]*educ + _b[age]*age) ///
        + _b[/sigma]*normalden((_b[_cons] + _b[educ]*educ + _b[age]*age)/_b[/sigma])
    sum hourshat
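The expression inside gen hourshat is the Tobit conditional mean for data censored below at zero, Φ(xβ/σ)·xβ + σ·φ(xβ/σ). The Python sketch below implements that formula and checks numerically that its slope with respect to the index equals Φ(xβ/σ), which is where the scale factor comes from. The index and sigma values are hypothetical, since the slides do not report Tobit estimates.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, the counterpart of Stata's normal()."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    """Standard normal density, the counterpart of Stata's normalden()."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tobit_expected_value(xb, sigma):
    """E(y | x) for a Tobit model censored below at zero."""
    z = xb / sigma
    return normal_cdf(z) * xb + sigma * normal_pdf(z)

# HYPOTHETICAL linear index and sigma, for illustration only.
xb, sigma = 500.0, 1000.0

# Central-difference slope of E(y | x) with respect to the index.
step = 1e-3
slope = (tobit_expected_value(xb + step, sigma)
         - tobit_expected_value(xb - step, sigma)) / (2.0 * step)

print(tobit_expected_value(xb, sigma), slope, normal_cdf(xb / sigma))
```

The last two printed numbers agree, so a one-unit change in a regressor shifts E(y | x) by its coefficient times Φ(xβ/σ) at that point.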