Applied Econometrics for Health Economists

Exercise 0 Preliminaries

The data file hals1class.dta contains the following variables:

age        age in years
male       dummy for men
white      dummy for ethnic group=white
aglsch     age at which R left school (proxy for education)
rheuma     dummy whether R suffers from arthritis/rheuma
prheuma    dummy whether either of R's parents suffered from arthritis/rheuma
ownh       self-rated general health
breakhot   how often R eats cooked breakfast
tea        how many cups of tea R drinks per day
teasug     how many spoons of sugar R puts into tea
coffee     how many cups of coffee R drinks per day

Stata syntax needed: use, save
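As a starting point, the preliminaries might look like the minimal sketch below. It assumes that hals1class.dta sits in the current working directory; the name of the working copy, hals1work.dta, is hypothetical.

```stata
* Minimal sketch; assumes hals1class.dta is in the working directory.
clear all
use hals1class.dta, clear

* Inspect variable names, storage types and (possibly wrong) value labels.
describe
codebook, compact

* Save a working copy so that the original file stays untouched.
* The file name hals1work.dta is a hypothetical choice.
save hals1work.dta, replace
```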

Exercise 1 Is rheumatism inherited?

1. Load the data, restrict the sample to respondents aged 40 and over, and compute summary statistics for the first six variables. What percentage suffers from rheuma?
2. Compute a linear probability model (with heteroskedasticity-robust standard errors) for rheuma using age, age squared, sex, ethnic group, education, and parental rheuma as explanatory variables.
   - Interpret your results with respect to all explanatory variables.
   - Show the predicted values in a histogram. How do you interpret the predicted values? Which problem of the linear probability model do you see in your histogram?
   - Show the estimated residuals in a histogram. Which OLS assumptions are violated in the linear probability model?
3. Compute the above as a probit model. How do you interpret the estimated regression coefficients?
4. Test whether the deterministic part of your model is correctly specified.
5. Compute marginal/average effects at the median of the explanatory variables. Comment on the size of the average effect for prheuma.
6. Compute the marginal/average effects for aglsch and prheuma as the median of the individual marginal effects. Do they differ from the effects computed above?
7. Compute the above as a logit model and report odds ratios. How much does the risk of rheumatism change when people get one year older? How much does it change when people get two years older?
8. Assume that prheuma was a genetic marker of rheumatism and you wanted to study its effect.
   - Draw a random sample of the general population (aged 40+) of the same size as your sample of respondents who reported having rheuma.
   - Compute a probit model and report marginal effects at the median. Compare the results with those based on the full HALS sample.
   - Compute a logit model and report odds ratios. Compare the results with those based on the full HALS sample.

Stata syntax needed: summarize, regress, generate, histogram, predict, probit, fracgen, testparm, mfx, logistic, functions normalden, normal
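The core estimation steps of this exercise could be sketched as follows. This is an illustration under assumptions, not a model solution: the variable names agesq, phat_lpm, xbhat, me_aglsch and de_prheuma are hypothetical, and margins (available from Stata 11 onwards) is used in place of the older mfx command listed above.

```stata
use hals1class.dta, clear
keep if age >= 40 & !missing(age)
summarize age male white aglsch rheuma prheuma

* Linear probability model with robust standard errors.
generate agesq = age^2
regress rheuma age agesq male white aglsch prheuma, vce(robust)
predict phat_lpm, xb
histogram phat_lpm

* Probit and effects evaluated at the sample medians of the regressors.
* prheuma enters as a plain regressor here, so this reports a derivative.
probit rheuma age agesq male white aglsch prheuma
margins, dydx(aglsch prheuma) at((median) _all)

* Individual effects computed by hand: a derivative for the continuous
* variable aglsch, a discrete 0/1 change for the dummy prheuma.
predict xbhat, xb
generate me_aglsch  = normalden(xbhat) * _b[aglsch]
generate de_prheuma = normal(xbhat + _b[prheuma]*(1 - prheuma)) ///
                    - normal(xbhat - _b[prheuma]*prheuma)
summarize me_aglsch de_prheuma, detail   // the 50% row is the median

* Logit reported as odds ratios.
logistic rheuma age agesq male white aglsch prheuma
```

Because age enters together with its square, the odds ratio on age alone does not describe the full effect of getting one year older; that is worth keeping in mind when answering question 7.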

Exercise 2 Who eats a good breakfast?

The variable breakhot indicates how often R has a cooked (English) breakfast (first meal after getting up):

0  never
1  less than once a week
2  once or twice a week
3  most days (3-6)
4  every day

1. Show frequencies of breakhot. The value labels are wrong and there are missing values. Please correct both problems and show the table of frequencies again.
2. Estimate a regression model to explain breakfast eating habits by age, age squared, sex, education, and general health. Which method is appropriate?
3. Compute the full set of marginal effects for education, if possible in a forval loop. Where does education have its impact on eating habits?
4. Describe the relationship between age and eating habits. When do people have cooked breakfast most often?

Stata syntax needed: label define, recode, oprobit, forval
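A possible outline of the relabelling and the loop over outcomes is given below. The missing-value code 9 is a pure placeholder (the source does not say how missing values are stored), the names breakhot_lbl and agesq are hypothetical, and entering ownh as a factor variable is an assumption about its coding.

```stata
use hals1class.dta, clear
tabulate breakhot, missing

* PLACEHOLDER: replace 9 by whatever code the table above shows for
* missing values; the source does not state it.
recode breakhot (9 = .)

* Attach the labels documented in the exercise text.
label define breakhot_lbl 0 "never" 1 "less than once a week" ///
    2 "once or twice a week" 3 "most days (3-6)" 4 "every day"
label values breakhot breakhot_lbl
tabulate breakhot

* Ordered probit; ownh is treated as categorical (an assumption).
generate agesq = age^2
oprobit breakhot age agesq male aglsch i.ownh

* Average marginal effect of education on each outcome probability.
forvalues k = 0/4 {
    margins, dydx(aglsch) predict(outcome(`k'))
}
```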

Exercise 3 Tea with sugar?

The variable tea indicates how many cups of tea R has per day:

0  none
1  one or two
2  three or four
3  five or six
4  more than six

teasug indicates how many spoons of sugar R has in their tea:

0  none
1  1 or less
2  over 1, to 2
3  more than 2

Again, note that the value labels are wrong.

1. Using tea and teasug, generate a new variable that has the value 0 if R doesn't drink tea, 1 if R drinks tea without sugar, and 2 if R drinks tea with sugar. Show the distribution of this variable in a table.
2. Estimate a regression model to explain tea drinking habits by age, age squared, sex, education, and whether they have cooked breakfast. Which method is appropriate?
3. When using non-tea drinkers as baseline category, how do you interpret your regression parameters?
4. Compute the full set of marginal/average effects for all variables, again in a forval loop. What happens if you add up marginal effects for a variable across outcomes? How do you interpret the marginal/average effects?
5. In your own words, describe the meaning of the IIA assumption implied by the multinomial logit model for the present application. How do you test this assumption? Explain the logic of the test. Does the IIA assumption hold in the present example?

Stata syntax needed: mlogit, hausman, suest, estimates store, test
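The construction of the three-category outcome and a Hausman-type IIA check might be sketched like this. The names teatype, cooked, agesq, full and nosugar are hypothetical. The sketch assumes that missing values in tea, teasug and breakhot have already been set to Stata's missing value, and dropping the "tea with sugar" category for the restricted model is only one of several possible choices.

```stata
use hals1class.dta, clear
generate agesq  = age^2
generate cooked = breakhot > 0 if !missing(breakhot)

* 0 = no tea, 1 = tea without sugar, 2 = tea with sugar.
generate teatype = .
replace  teatype = 0 if tea == 0
replace  teatype = 1 if tea > 0 & tea < . & teasug == 0
replace  teatype = 2 if tea > 0 & tea < . & teasug > 0 & teasug < .
tabulate teatype, missing

* Multinomial logit with non-tea drinkers as the baseline.
mlogit teatype age agesq male aglsch cooked, baseoutcome(0)
estimates store full

* Average marginal effects on each outcome probability.
forvalues k = 0/2 {
    margins, dydx(*) predict(outcome(`k'))
}

* Hausman-type IIA check: drop one alternative and compare coefficients.
mlogit teatype age agesq male aglsch cooked if teatype != 2, baseoutcome(0)
estimates store nosugar
hausman nosugar full, alleqs constant
```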

Exercise 4 Tea or coffee?

The variable coffee indicates how many cups of coffee R has per day (0=none, 1=one or two, 2=three or four, 3=five or six, 4=more than six). Value labels are again wrong.

1. Generate dummy variables for tea and coffee drinkers and show a crosstabulation of the two variables. Based on this table, would you say that tea and coffee are complements or substitutes?
2. Estimate a regression model explaining the consumption of tea or coffee jointly (using the same explanatory variables as before). How does education affect preference for tea and coffee? How do you interpret the parameter rho?
3. Compute marginal effects of education and answer the following questions. How does education affect the probability of drinking ...
   - ... tea?
   - ... neither tea nor coffee?
   - ... tea but no coffee?
   - ... tea when R drinks coffee?

Stata syntax needed: biprobit
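One way to set up the bivariate probit and the four requested probabilities is sketched below. The names teadr, cofdr, cooked and agesq are hypothetical, and missing values are again assumed to be coded as Stata missing.

```stata
use hals1class.dta, clear
generate agesq  = age^2
generate cooked = breakhot > 0 if !missing(breakhot)
generate teadr  = tea > 0    if !missing(tea)
generate cofdr  = coffee > 0 if !missing(coffee)
tabulate teadr cofdr, row

* Joint model for tea and coffee drinking with correlated errors.
biprobit teadr cofdr age agesq male aglsch cooked

* Average marginal effects of education on different probabilities.
margins, dydx(aglsch) predict(pmarg1)   // Pr(tea)
margins, dydx(aglsch) predict(p00)      // Pr(neither tea nor coffee)
margins, dydx(aglsch) predict(p10)      // Pr(tea but no coffee)
margins, dydx(aglsch) predict(pcond1)   // Pr(tea | coffee)
```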

Exercise 5 Education and Health

Continue using hals1class.dta.

1. You want to estimate the causal effect of education on health. What are the problems with estimating a simple regression of health on education and how can you solve them?
2. Recode aglsch so that it becomes a binary treatment (low education = up to 14 years, high education = 15 or more years). Show a simple cross-tabulation of self-assessed health and education. What is the average effect of education on health? Confirm your guess with a simple probit regression of health on education.
3. Show a scatterplot of the proportion of respondents with high education by age. Which identification strategy does this graph suggest?
4. Compare the results of a simple probit regression of self-assessed health on age, age squared, sex, ethnicity and education with the results of a recursive bivariate probit model. What is the average treatment effect? What is the average treatment effect on the treated? How do you interpret the parameter ρ?

Stata syntax needed: tabulate, biprobit, egen, twoway scatter
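The following sketch shows how the pieces could fit together. Several parts are assumptions: the coding of ownh is not documented in the source, so the definition of the good-health dummy goodh is a placeholder. The names highed, p_highed and agesq are hypothetical. No excluded instrument is specified, so the recursive model below is identified by functional form alone; any variable suggested by the scatterplot would be added to the second equation.

```stata
use hals1class.dta, clear
generate agesq  = age^2
generate highed = aglsch >= 15 if !missing(aglsch)

* PLACEHOLDER coding: adjust to the actual categories of ownh.
generate goodh = inlist(ownh, 1, 2) if !missing(ownh)

tabulate goodh highed, column
probit goodh i.highed
margins, dydx(highed)

* Share of respondents with high education at each age.
egen p_highed = mean(highed), by(age)
twoway scatter p_highed age

* Simple probit versus recursive bivariate probit.
probit goodh i.highed age agesq male white
biprobit (goodh = i.highed age agesq male white) ///
         (highed = age agesq male white)

* Average treatment effect on the marginal probability of good health.
margins, dydx(highed) predict(pmarg1)
```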

Exercise 6 Tea and coffee: harmful to your health?

The data file hals2class.dta contains the following variables:

age2     age in years in wave 2 of HALS
visits   number of GP visits in last month
sah2     self-rated general health in wave 2
dis      number of conditions R ever had
sym      number of health symptoms in last month

1. Merge the HALS wave 2 data with the wave 1 data. How many respondents have been lost between the first and second wave of HALS?
2. Draw spikeplots of the count variables in your data.
3. Compute mean and variance for all count variables. Discuss the results.
4. You want to estimate the effect of tea and coffee consumption in wave 1 on number of symptoms in wave 2 using age (in wave 2), sex, education, and ethnicity as control variables. Choose the most appropriate estimator from the following list: OLS, poisson, negbin, zero-inflated negbin.
5. Report and interpret marginal/average effects for the most appropriate model.

Stata syntax needed: use, save, merge, spikeplot, summarize, poisson, nbreg, zinb
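A rough outline of the merge and the count models is shown below. It assumes that both files share the person identifier serno (which the source documents only for the later files) and that tea and coffee have been cleaned of missing-value codes. The file name hals12.dta is hypothetical, and reporting effects after nbreg is only an illustration, since choosing the model is part of the exercise.

```stata
use hals1class.dta, clear

* Assumes serno identifies respondents in both waves.
merge 1:1 serno using hals2class.dta
tabulate _merge          // _merge == 1: in wave 1 only, lost by wave 2
keep if _merge == 3
drop _merge
save hals12.dta, replace // hypothetical file name

* Distribution, mean and variance of the count variables.
spikeplot sym
tabstat visits dis sym, statistics(mean variance)

* Candidate count models for the number of symptoms.
poisson sym i.tea i.coffee age2 male aglsch white
estat gof                // goodness of fit of the Poisson model
nbreg sym i.tea i.coffee age2 male aglsch white
zinb sym i.tea i.coffee age2 male aglsch white, inflate(_cons)

* Average marginal effects on the expected count, here after nbreg.
nbreg sym i.tea i.coffee age2 male aglsch white
margins, dydx(*)
```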

Exercise 7 Is subjective health a good predictor of mortality?

The data file hals3class.dta contains the following variables:

serno      HALS person identifier
hals1age   age in years at HALS wave 1
dead       vital status (1=dead)
lifespan   survival until wave 3 in years

1. Merge HALS wave 1, 2 and 3 data. For how many wave 1 respondents is the vital status unknown? Drop all cases with unknown vital status.
2. Define the data set as survival data.
3. Compare Kaplan-Meier survival rates by self-rated health in wave 1 and sex. Which group has the highest life expectancy?
4. Compare hazard rates of dying for men and women. Are hazard rates dependent on duration? If so, how?
5. Only looking at respondents who took part in wave 2: estimate a parametric survival model using sex, education, ethnicity, self-rated health in wave 2, the number of diseases, symptoms, and doctor visits. Interpret the results. Comment on the value of subjective health as a predictor of mortality.
6. Re-estimate the previous model using a semi-parametric Cox model. Why is this model called semi-parametric?

Stata syntax needed: merge, stset, sts, streg, stcox
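The survival set-up could be sketched as follows. It assumes that serno is present in all three files and that unknown vital status shows up as a missing value of dead after the merge. The merge indicators m12 and m13 are hypothetical names, and the Weibull distribution is just one possible parametric choice.

```stata
use hals1class.dta, clear
merge 1:1 serno using hals2class.dta, generate(m12)
merge 1:1 serno using hals3class.dta, generate(m13)

* Wave 1 respondents with unknown vital status (assumed to be missing).
count if missing(dead)
drop if missing(dead)

* Declare survival data: time is lifespan, failure is death.
stset lifespan, failure(dead == 1) id(serno)

* Kaplan-Meier curves and smoothed hazards.
sts graph, by(ownh)
sts graph, by(male)
sts test male
sts graph, hazard by(male)

* Parametric (Weibull, an assumed choice) and Cox models,
* restricted to wave 2 participants.
streg male aglsch white i.sah2 dis sym visits if m12 == 3, ///
    distribution(weibull)
stcox male aglsch white i.sah2 dis sym visits if m12 == 3
```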

Exercise 8 The dynamics of weight and obesity

The data file halspanel.dta contains the following variables:

serno     HALS person identifier
wave      wave identifier: 0=wave 1, 1=wave 2
age       age
male      dummy for sex
weightm   measured weight in kg
heightm   measured height in cm (measured in wave 1)
aglsch    age left school
unempl    dummy for unemployment

1. Declare that this is panel data. Which variables are time-invariant? Drop all respondents who were younger than 65 in wave 1.
2. For each respondent and wave compute the body mass index (weight in kg/squared height in m). Show the distribution in wave 1 in a histogram.
3. In order to estimate regressions of bmi on age, sex, education, and unemployment, generate an indicator variable for balanced panel data.
4. Estimate a pooled OLS model (for the balanced panel) with clustered standard errors. Comment on the results.
5. Estimate random effects and fixed effects models. Use a Hausman test to determine the correct specification. Comment on the results.
6. Generate an indicator variable for obesity (bmi > 30). How has the prevalence of obesity increased over time? Show a transition matrix for obesity.
7. Re-estimate the previous panel model as a random effects probit and conditional logit model. Why does the conditional logit model have so few observations left?

Stata syntax needed: xtset, xtreg, xttest0, hausman, xttrans, xtprobit, clogit
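A compact sketch of the panel workflow follows. The names age_w1, height1, bmi, nobs, balanced, obese, re and fe are hypothetical. Spreading the wave 1 height to both waves is an assumption about how heightm is stored, and the conditional logit includes only the time-varying regressors because time-invariant ones drop out.

```stata
use halspanel.dta, clear
xtset serno wave

* Age at wave 1 (wave == 0), copied to all rows of a respondent.
* Respondents without a wave 1 record have age_w1 missing and are kept.
egen age_w1 = min(cond(wave == 0, age, .)), by(serno)
drop if age_w1 < 65

* BMI from weight in kg and the wave 1 height converted to metres.
egen height1 = max(heightm), by(serno)
generate bmi = weightm / (height1/100)^2
histogram bmi if wave == 0

* Balanced panel: complete data in both waves.
egen nobs = total(!missing(bmi, age, male, aglsch, unempl)), by(serno)
generate balanced = nobs == 2

* Pooled OLS, random effects and fixed effects.
regress bmi age male aglsch unempl if balanced, vce(cluster serno)
xtreg bmi age male aglsch unempl if balanced, re
xttest0
estimates store re
xtreg bmi age male aglsch unempl if balanced, fe
estimates store fe
hausman fe re

* Obesity indicator, prevalence by wave and transitions.
generate obese = bmi > 30 if !missing(bmi)
tabulate wave, summarize(obese)
xttrans obese

* Binary panel models.
xtprobit obese age male aglsch unempl if balanced, re
clogit obese age unempl if balanced, group(serno)
```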