
PS 4    Monday August 16 01:00:42 2010    Page 1

Statistics/Data Analysis
User: Klick
Project: Limited Dependent Variables

      log:  C:\web\PS4log.smcl
 log type:  smcl
opened on:  16 Aug 2010, 00:59:55

. do "C:\web\PS4dofile.txt"

. insheet using "C:\web\PS4.txt"
(19 vars, 1636 obs)

. *2 -- Estimate Linear Probability Model of diabetes on bmi and income*

. regress diabetes bmi income, robust

                                                       F(  2,  1364) =   15.19
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0335
                                                       Root MSE      =  .23504

    diabetes |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .0062544      .0015     4.17   0.000     .0033119    .0091969
      income |  -.0124606   .0034855    -3.58   0.000    -.0192981   -.0056231
       _cons |  -.0393018    .043229    -0.91   0.363    -.1241044    .0455008

. *3 -- Do same thing using bmierr which is bmi + a mean 0 variance 100 random error*

. regress diabetes bmierr income, robust

                                                       F(  2,  1364) =    7.60
                                                       Prob > F      =  0.0005
                                                       R-squared     =  0.0141
                                                       Root MSE      =  .23738

    diabetes |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      bmierr |   .0003839   .0005335     0.72   0.472    -.0006627    .0014304
      income |  -.0135402   .0035193    -3.85   0.000    -.0204441   -.0066363
       _cons |   .1227672   .0255838     4.80   0.000     .0725793    .1729551

. *the coefficient here is much closer to zero / smaller in magnitude*
. *measurement error in x variables leads to attenuation bias i.e., bias toward zero*
. *while measurement error in y simply adds to the noise of the model*
. *measurement error in x leads to a bias toward zero*
. *see Wooldridge pp. 318-320 for a formal presentation*
. *but one intuitive way to think about it is that*
. *as the noise in the x variable gets big relative to the signal*

. *it's like regressing y on an error term*
. *which is going to lead to an estimated effect that approaches zero*
. ******************************
. *what would happen if you used bmi measured with error but where the error is lower variance*
. *well then the signal gets stronger relative to the noise*
. *so the bias toward zero will be smaller*
. ******************************

. *4 -- redo #2 with logit*
. *first #2 again*

. regress diabetes bmi income, robust

                                                       F(  2,  1364) =   15.19
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0335
                                                       Root MSE      =  .23504

    diabetes |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .0062544      .0015     4.17   0.000     .0033119    .0091969
      income |  -.0124606   .0034855    -3.58   0.000    -.0192981   -.0056231
       _cons |  -.0393018    .043229    -0.91   0.363    -.1241044    .0455008

. *now with logit*

. logit diabetes bmi income, robust

Iteration 1:   log pseudolikelihood = -296.66849
Iteration 2:   log pseudolikelihood =  -292.7488
Iteration 3:   log pseudolikelihood =  -292.7108
Iteration 4:   log pseudolikelihood =  -292.7108

Logistic regression                               Number of obs   =       1367
                                                  Wald chi2(2)    =      44.96
Log pseudolikelihood =  -292.7108                 Pseudo R2       =     0.0647

    diabetes |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .0868625   .0168344     5.16   0.000     .0538678    .1198572
      income |  -.2083366    .054089    -3.85   0.000    -.3143492    -.102324
       _cons |  -4.142655   .5736033    -7.22   0.000    -5.266897   -3.018413

. *remember that we can only compare the sign and significance across*
. *the models; if we want to compare magnitude size we need to estimate the logit*
. *at a certain x vector, namely the mean*
. *to do that for the logit command, we type mfx*

. mfx

Marginal effects after logit
      y  = Pr(diabetes) (predict)
         =  .0501379

variable |      dy/dx   Std. Err.      z    P>|z|  [    95% C.I.   ]        X
---------+--------------------------------------------------------------------
     bmi |   .0041367      .00081    5.08   0.000    .00254   .005734   26.6061
  income |  -.0099218      .00247   -4.01   0.000  -.014766  -.005078   5.32772

. *we could have calculated the effects at any values of the X's that we wanted*
. *now the probit*

. probit diabetes bmi income, robust

Iteration 1:   log pseudolikelihood = -292.58751
Iteration 2:   log pseudolikelihood = -292.10174
Iteration 3:   log pseudolikelihood = -292.10142

Probit regression                                 Number of obs   =       1367
                                                  Wald chi2(2)    =      43.16
Log pseudolikelihood = -292.10142                 Pseudo R2       =     0.0666

    diabetes |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .0452375   .0089391     5.06   0.000     .0277172    .0627578
      income |  -.1023977   .0264308    -3.87   0.000    -.1542011   -.0505944
       _cons |  -2.296131   .2935368    -7.82   0.000    -2.871453    -1.72081

. *marginal effects from the probit can be calculated again using the mfx command*
. *or, dprobit does it directly (if we didn't really care about the underlying*
. *model parameters, which is usually the case)*

. dprobit diabetes bmi income, robust

Iteration 1:   log pseudolikelihood = -292.58751
Iteration 2:   log pseudolikelihood = -292.10174
Iteration 3:   log pseudolikelihood = -292.10142

Probit regression, reporting marginal effects     Number of obs   =       1367
                                                  Wald chi2(2)    =      43.16
Log pseudolikelihood = -292.10142                 Pseudo R2       =     0.0666

    diabetes |      df/dx  Std. Err.      z   P>|z|     x-bar  [    95% C.I.   ]
-------------+------------------------------------------------------------------
         bmi |   .0047177   .0009442   5.06   0.000   26.6061   .002867  .006568
      income |  -.0106789   .0027017  -3.87   0.000   5.32772  -.015974 -.005384
-------------+------------------------------------------------------------------
      obs. P |   .0607169
     pred. P |   .0507021  (at x-bar)

    z and P>|z| correspond to the test of the underlying coefficient being 0
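The marginal effects reported by mfx and dprobit can be reproduced by hand from the coefficient tables. The following is a minimal sketch in Python rather than Stata, offered only as a cross-check and not as part of the original do-file. It assumes the standard formulas for effects evaluated at the covariate means (coefficient times p(1 - p) for the logit, coefficient times the standard normal density for the probit). The numbers are copied from the output above; the function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Values copied from the logit, probit, mfx, and dprobit output in the log.
MEANS = {"bmi": 26.6061, "income": 5.32772}
LOGIT = {"bmi": 0.0868625, "income": -0.2083366, "_cons": -4.142655}
PROBIT = {"bmi": 0.0452375, "income": -0.1023977, "_cons": -2.296131}


def linear_index(coefs, means):
    """Return the linear index x'b evaluated at the covariate means."""
    return coefs["_cons"] + sum(coefs[name] * value for name, value in means.items())


def logit_effects_at_mean(coefs, means):
    """Logit: p = 1 / (1 + exp(-x'b)); marginal effect of x_k is b_k * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-linear_index(coefs, means)))
    return p, {name: coefs[name] * p * (1.0 - p) for name in means}


def probit_effects_at_mean(coefs, means):
    """Probit: p = Phi(x'b); marginal effect of x_k is b_k * phi(x'b)."""
    xb = linear_index(coefs, means)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(xb), {name: coefs[name] * std_normal.pdf(xb) for name in means}


logit_p, logit_me = logit_effects_at_mean(LOGIT, MEANS)
probit_p, probit_me = probit_effects_at_mean(PROBIT, MEANS)

print("logit : Pr(diabetes) at means =", round(logit_p, 7), logit_me)
print("probit: Pr(diabetes) at means =", round(probit_p, 7), probit_me)
```

Up to rounding of the inputs, the results agree with the dy/dx and df/dx columns above. Both calculations scale the raw coefficient by the slope of the link function at the mean, which is why the logit and probit effects land close to each other and to the linear probability model even though the raw coefficients differ.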

. *as you can see, all three models are producing comparable effect sizes*
. *graph predictions from linear probability model, logit, probit*

. regress diabetes bmi income, robust

                                                       F(  2,  1364) =   15.19
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0335
                                                       Root MSE      =  .23504

    diabetes |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .0062544      .0015     4.17   0.000     .0033119    .0091969
      income |  -.0124606   .0034855    -3.58   0.000    -.0192981   -.0056231
       _cons |  -.0393018    .043229    -0.91   0.363    -.1241044    .0455008

. predict LPM
(option xb assumed; fitted values)
(268 missing values generated)

. logit diabetes bmi income, robust

Iteration 1:   log pseudolikelihood = -296.66849
Iteration 2:   log pseudolikelihood =  -292.7488
Iteration 3:   log pseudolikelihood =  -292.7108
Iteration 4:   log pseudolikelihood =  -292.7108

Logistic regression                               Number of obs   =       1367
                                                  Wald chi2(2)    =      44.96
Log pseudolikelihood =  -292.7108                 Pseudo R2       =     0.0647

    diabetes |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .0868625   .0168344     5.16   0.000     .0538678    .1198572
      income |  -.2083366    .054089    -3.85   0.000    -.3143492    -.102324
       _cons |  -4.142655   .5736033    -7.22   0.000    -5.266897   -3.018413

. predict logit
(option pr assumed; Pr(diabetes))
(268 missing values generated)

. probit diabetes bmi income, robust

Iteration 1:   log pseudolikelihood = -292.58751
Iteration 2:   log pseudolikelihood = -292.10174
Iteration 3:   log pseudolikelihood = -292.10142

Probit regression                                 Number of obs   =       1367
                                                  Wald chi2(2)    =      43.16
Log pseudolikelihood = -292.10142                 Pseudo R2       =     0.0666

    diabetes |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .0452375   .0089391     5.06   0.000     .0277172    .0627578
      income |  -.1023977   .0264308    -3.87   0.000    -.1542011   -.0505944
       _cons |  -2.296131   .2935368    -7.82   0.000    -2.871453    -1.72081

. predict probit
(option pr assumed; Pr(diabetes))
(268 missing values generated)

. twoway (scatter LPM bmi, sort)

. twoway (scatter logit bmi, sort)

. twoway (scatter probit bmi, sort)

. *regress bmi private which top codes bmi above 35 as 35 on income and education*
. *we need a censored regression model for a y variable like this*
. *cnreg will work; check the help to see how it works*
. *we need to create a variable to tell the computer whether an observation is censored*
. *and we need to tell it whether it's censored above or below*
. *Stata uses 0 for uncensored, 1 for censored above, and -1 for censored below*

. generate cens = 0

. replace cens = 1 if bmi > 35
(177 real changes made)

. cnreg bmipriv income educa, robust cens(cens)

Censored-normal regression                        Number of obs   =       1414
                                                  F(   2,   1412) =       5.86
                                                  Prob > F        =     0.0029
Log pseudolikelihood =  -4074.435                 Pseudo R2       =     0.0015

     bmipriv |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.0684913   .0791855    -0.86   0.387    -.2238252    .0868427
       educa |  -.3909632   .1502624    -2.60   0.009    -.6857247   -.0962017
       _cons |   29.05579   .6871785    42.28   0.000     27.70779    30.40379
-------------+----------------------------------------------------------------
      /sigma |   5.180393   .1066678                      4.971149    5.389638

  Observation summary:         0  left-censored observations
                            1270     uncensored observations
                             144 right-censored observations

. regress bmi income educa, robust

                                                       F(  2,  1364) =    5.49
                                                       Prob > F      =  0.0042
                                                       R-squared     =  0.0078
                                                       Root MSE      =  5.3431

         bmi |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.1000753   .0831068    -1.20   0.229    -.2631063    .0629556
       educa |  -.3308629   .1473205    -2.25   0.025    -.6198622   -.0418636
       _cons |   28.73271   .6766832    42.46   0.000     27.40526    30.06016

. *intuition -- well, since you're not using all of the information in a subset of the data*
. *that might have an effect*
. *the censored regression is estimating a smaller in magnitude effect of income*
. *and a bigger in magnitude effect of educa, but the differential is*
. *proportionately smaller for education*
. *so one guess would be that while the > 35 bmi people are systematically lower*
. *in income than the rest*
. *they are not too much different than everyone else in education terms*
. *at least conditional on income*
. *so let's check that intuition*

. regress cens income educa

      Source |         SS     df          MS           Number of obs =    1414
-------------+--------------------------------         F(  2,  1411) =    4.56
       Model |  .830980721      2   .41549036          Prob > F      =  0.0106
    Residual |  128.504239   1411  .091073167          R-squared     =  0.0064
-------------+--------------------------------         Adj R-squared =  0.0050
       Total |  129.335219   1413  .091532356          Root MSE      =  .30178

        cens |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.0110143   .0043164    -2.55   0.011    -.0194815   -.0025472
       educa |  -.0028227    .008598    -0.33   0.743     -.019689    .0140436
       _cons |   .1741926   .0384623     4.53   0.000     .0987431     .249642

. *our intuition is validated*

. *7 -- run truncated regression model*

. truncreg bminormal income educa, ll(18.5) ul(25) robust
(note: 8 obs. truncated)

Fitting full model:

Iteration 0:   log pseudolikelihood = -1013.2253
Iteration 1:   log pseudolikelihood = -986.92835
Iteration 2:   log pseudolikelihood = -986.74233
Iteration 3:   log pseudolikelihood = -986.54979
Iteration 4:   log pseudolikelihood = -986.54833
Iteration 5:   log pseudolikelihood = -986.54833

Truncated regression
Limit:   lower =       18.5                       Number of obs   =        541
         upper =         25                       Wald chi2(2)    =       1.82
Log pseudolikelihood = -986.54833                 Prob > chi2     =     0.4027

   bminormal |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   -.092955   .1153835    -0.81   0.420    -.3191025    .1331924
       educa |   .3115204   .2322346     1.34   0.180     -.143651    .7666919
       _cons |   21.92178   .9864519    22.22   0.000     19.98837    23.85519
-------------+----------------------------------------------------------------
      /sigma |   2.789208   .3340342     8.35   0.000     2.134513    3.443903

. regress bmi income educa

      Source |         SS     df          MS           Number of obs =    1367
-------------+--------------------------------         F(  2,  1364) =    5.33
       Model |  304.479098      2  152.239549          Prob > F      =  0.0049
    Residual |  38939.7992   1364  28.5482398          R-squared     =  0.0078
-------------+--------------------------------         Adj R-squared =  0.0063
       Total |  39244.2783   1366  28.7293399          Root MSE      =  5.3431

         bmi |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.1000753   .0779809    -1.28   0.200    -.2530508    .0529001
       educa |  -.3308629   .1551339    -2.13   0.033    -.6351898    -.026536
       _cons |   28.73271   .6945347    41.37   0.000     27.37024    30.09518

. regress bmi income educa, robust

                                                       F(  2,  1364) =    5.49
                                                       Prob > F      =  0.0042
                                                       R-squared     =  0.0078
                                                       Root MSE      =  5.3431

         bmi |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.1000753   .0831068    -1.20   0.229    -.2631063    .0629556
       educa |  -.3308629   .1473205    -2.25   0.025    -.6198622   -.0418636
       _cons |   28.73271   .6766832    42.46   0.000     27.40526    30.06016

. *intuitively, we're getting comparable results for income, which makes sense since*
. *while we're cutting off the tails, bmi is basically symmetric with respect to income*
. *while the income effect is twice as big in the upper end of bmi*
. *it's close to zero in the lower end, so the differences cancel out*
. *when we chop off the tails*
. *but because we have less variation in X*
. *it makes sense that our SEs blow up*
. *education, however, is a different story*

. *while education is negatively related to bmi, conditional on income*
. *on average throughout the sample*
. *in the truncated regression, it comes in with a positive sign*
. *if you look at the relationship between education and bmi, conditional*
. *on income in the "normal" range, it is actually positive*
. *so the negative effect is driven by a larger in magnitude*
. *negative effect in the tails*
. *the truncated regression doesn't use the info from the tails*

end of do-file
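The attenuation argument from question 3 (noise in x pulls the estimated slope toward zero, and less noise means less bias) can be checked with a small simulation. This is a hedged sketch in Python, not part of the original do-file. It uses made-up data, a single regressor with no income control, and hypothetical parameter values (true slope 0.5, x standard deviation 5). In this simplified setting the textbook result is that the slope shrinks by the factor var(x) / (var(x) + var(error)).

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(error_sd, n=20000, beta=0.5, x_sd=5.0, seed=12345):
    """Return (slope using true x, slope using x measured with error).

    All parameter values are illustrative assumptions, not taken from PS4.txt.
    """
    rng = random.Random(seed)
    x = [rng.gauss(27.0, x_sd) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    # Classical measurement error: mean zero, independent of x and of y's noise.
    x_noisy = [xi + rng.gauss(0.0, error_sd) for xi in x]
    return ols_slope(x, y), ols_slope(x_noisy, y)


clean, noisy_sd10 = simulate(error_sd=10.0)  # error variance 100, as for bmierr
_, noisy_sd2 = simulate(error_sd=2.0)        # lower-variance error

print("slope with true x:           ", round(clean, 3))
print("slope with error sd = 10:    ", round(noisy_sd10, 3))
print("slope with error sd = 2:     ", round(noisy_sd2, 3))
print("predicted factor for sd = 10:", 25.0 / (25.0 + 100.0))
```

With these assumed values the predicted shrinkage factor for error variance 100 is 25 / 125 = 0.2, so the noisy slope should sit near 0.1 rather than 0.5, while the lower-variance error leaves the slope much closer to the truth. The factor in the actual problem set differs because bmi has a different variance and income is also in the model.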