Dummy variables (9/22/2015). Are wages different across union and nonunion jobs?


Dummy variables

Treatment and control
$Y_i$ is the outcome.
$X_i$ identifies treatment: $X_i = 1$ if in the treatment group, $X_i = 0$ if in the control group.

Are wages different across union/nonunion jobs?
$H_0: \mu_n = \mu_u$
Or alternatively, with $d = \mu_n - \mu_u$:
$H_0: d = 0$
$H_a: d \neq 0$

Data: cps.dta

. gen ln_weekly_earn=ln(weekly_earn)
. gen union=union_status==1
. gen nonwhite=((race==2)|(race==3))
. * test whether means are the same across two subsamples
. ttest weekly_earn, by(union)

Two-sample t test with equal variances
   Group |  Obs  Mean  Std. Err.  Std. Dev.  [95% Conf. Interval]
       0 |  0 0.3 2.0 2.32..3
       1 |  1.2 2.001.03 0. 20.
combined |  .2 1.0 23.. 1.2
    diff |  -3.23 3.33 -2.1 -2.3
diff = mean(0) - mean(1)    t = -.1
Ho: diff = 0    degrees of freedom =
Ha: diff < 0     Pr(T < t) = 0.0000
Ha: diff != 0    Pr(|T| > |t|) = 0.0000
Ha: diff > 0     Pr(T > t) = 1.0000

The t statistic is far out in the tail, so reject the null.

. reg weekly_earn union

      Source |  SS  df  MS
       Model |  3.22  1  3.22
    Residual |  1.e+0  02.2
       Total |  1.e+0  1.
Number of obs =    F( 1, ) = .3    Prob > F = 0.0000
R-squared = 0.003    Adj R-squared = 0.003    Root MSE = 23.01

 weekly_earn |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
       union |  3.23  3.33  .  0.000  2.3  2.1
       _cons |  0.3  1.03  21.2  0.000  .  3.1

Synthetic problem
X impacts Y.
But there are two groups of people in the population: 1 and 2.
The averages of X and Y are higher for group 2 than for group 1.
Should you add a dummy for group 2?
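The t test and the regression on the union dummy above are two views of the same comparison: in a regression on a single dummy, _cons is the mean of the group coded 0 and the dummy coefficient is the difference in means. The following is a minimal sketch on simulated data, not the CPS file; the sample size, the union share, and the earnings parameters are all illustrative assumptions.

```stata
* Simulated data: every number below is an assumption for illustration
clear
set seed 12345
set obs 1000
gen union = runiform() < 0.2
gen weekly_earn = 600 + 150*union + rnormal(0, 300)

* Two-sample t test with equal variances; diff = mean(0) - mean(1)
ttest weekly_earn, by(union)

* OLS on the dummy: _cons is the nonunion mean and the
* union coefficient is mean(1) - mean(0)
reg weekly_earn union

* Same magnitude as the ttest diff, opposite sign
display _b[union]
```

Because ttest reports mean(0) - mean(1) while the regression coefficient is mean(1) - mean(0), the two estimates and their t statistics match in absolute value and differ only in sign.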

[Figures: scatter plots of X vs. Y. Panels show Group 1 and Group 2 separately, each group with its own OLS line, and the pooled sample with the pooled OLS line.]

[Figure: Plot of X vs. Y showing the pooled sample OLS line, the Group 1 OLS line, and the Group 2 OLS line.]

Sort the data by groups, then run a regression for each of the separate groups.

. sort group
. by group: reg y x

-> group = 1

      Source |  SS  df  MS
       Model |  2.2  1  2.2
    Residual |  21..22033
       Total |  .0.02
Number of obs = 0    F( 1, ) = .    Prob > F = 0.0000
R-squared = 0.2    Adj R-squared = 0.20    Root MSE = .33

           y |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
           x |  .31.00. 0.000.0.3
       _cons |  .0.02 2.22 0.000.303.301

-> group = 2

      Source |  SS  df  MS
       Model |  3.02  1  3.02
    Residual |  20.02.203
       Total |  .003.00
Number of obs = 0    F( 1, ) = .    Prob > F = 0.0000
R-squared = 0.1    Adj R-squared = 0.2    Root MSE = .

           y |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
           x |  .2.002. 0.000.02.22
       _cons |  ..122 3. 0.000.233.022

The coefficients on X in both models are pretty similar.

Now pool the two groups:

. reg y x

      Source |  SS  df  MS
       Model |  33.3031  1  33.3031
    Residual |  33.10  1  1.2
       Total |  1.1  1  3.0233
Number of obs = 200    F( 1, 1) = .3    Prob > F = 0.0000
R-squared = 0.    Adj R-squared = 0.    Root MSE = 1.32

           y |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
           x |  .222.03.1 0.000.1.231
       _cons |  .01.333. 0.000 3.33.1

When you ignore the fact that group 2 has higher outcomes and higher x's, the pooled regression overstates the impact of x.

Omitted variables bias. Let the model be $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$, and let the omitted variable be related to the included one by
$x_{2i} = \delta_0 + \delta_1 x_{1i} + v_i$
If $x_{2i}$ is left out of the regression, then
$E[\hat\beta_1] = \beta_1 + \beta_2 \hat\delta_1$
In the synthetic problem $\beta_2 > 0$ and $\hat\delta_1 > 0$, and so $E[\hat\beta_1] > \beta_1$.

Generate a dummy variable for one of the groups using logical operators, then run a regression with x and the dummy variable.

. gen group2=group==2
. reg y x group2

      Source |  SS  df  MS
       Model |  .33  2  33.0
    Residual |  2.30  1  .212
       Total |  1.1  1  3.0233
Number of obs = 200    F( 2, 1) = .1    Prob > F = 0.0000
R-squared = 0.0    Adj R-squared = 0.3    Root MSE = .

           y |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
           x |  .1.00 1.0 0.000.03.32
      group2 |  2.003.033 3.03 0.000 2.23 3.0
       _cons |  .3. 1. 0.000.2323.03

Return to tobacco model
Regress ln(per capita consumption) on taxes and a time trend.
Concern: who are the lowest taxing states?
Is the model subject to an omitted variables bias?
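The synthetic two-group problem can be reproduced with simulated data. The sketch below is not the data behind the slides: the group sizes, means, and coefficients are assumptions chosen so that group 2 has both higher x and higher y while the slope of y on x is the same in both groups.

```stata
* Simulated two-group data: all parameter values are illustrative assumptions
clear
set seed 2015
set obs 200
gen group = 1 + (_n > 100)
gen group2 = group == 2

* Group 2 has higher x on average, and a higher intercept in y
gen x = rnormal(15, 3) + 10*group2
gen y = 2 + 0.3*x + 3*group2 + rnormal(0, 1)

* Pooled regression without the dummy: slope on x absorbs the group shift
reg y x

* Adding the dummy should bring the slope back near the assumed 0.3
reg y x group2

* Separate regressions by group, for comparison
bysort group: reg y x
```

With these assumed values the pooled slope should come out well above 0.3, while the slope in the model with group2 and in the by-group regressions should be close to 0.3.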

State rank per capita consumption - 200

State | Rank | Per capita packs/year
KY    | 2 1.
VA    | .
TN    | .
NC    | .
SC    | 2.2
MD    | 3.
US    | .2

Two new variables:
A time trend, =1 in 1st year, 2 in second, etc.
A dummy if the state produces tobacco.

. * time trend
. gen trend=year-
. label var trend "=1 in 1st year, 2 in second, etc"
. * tobacco producing state
. gen tob_state=(state=="nc" | state=="va" | state=="sc" | state=="ky" | state=="md" | state=="tn")
. * run regression with tax and trend
. reg packs_pc real_tax trend

      Source |  SS  df  MS
       Model |  320.  2  12.2
    Residual |  1.2  1  0.002
       Total |  1.32  1  00.31
Number of obs = 20    F( 2, 1) = 1.3    Prob > F = 0.0000
R-squared = 0.1    Adj R-squared = 0.03    Root MSE = 20.

    packs_pc |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
    real_tax |  -.3.030 -1. 0.000 -.2 -.22
       trend |  -1.32.0 -. 0.000 -1.22 -1.233
       _cons |  . 2..3 0.000 1. 1.30

The trend coefficient says how many packs per person tobacco consumption falls each year.
The real_tax coefficient says how many packs each one-cent increase in the tax reduces consumption.

. ttest packs_pc, by(tob_state)

Two-sample t test with equal variances
   Group |  Obs  Mean  Std. Err.  Std. Dev.  [95% Conf. Interval]
       0 |  00 3.31.32 2.31 1..1
       1 |  0 0.32 2.0 2.3.022.33
combined |  20.021.2 2.23.3.30
    diff |  -2.3 2. -32.0 -21.0
diff = mean(0) - mean(1)    t = -.22
Ho: diff = 0    degrees of freedom = 1
Ha: diff < 0     Pr(T < t) = 0.0000
Ha: diff != 0    Pr(|T| > |t|) = 0.0000
Ha: diff > 0     Pr(T > t) = 1.0000

Tobacco producing states have substantially higher consumption than non-tobacco states.

. * correlation between tax and other variables
. reg real_tax trend tob_state

      Source |  SS  df  MS
       Model |  .32  2  22.1
    Residual |  230.2  1  22.3022
       Total |  31.0  1  30.1
Number of obs = 20    F( 2, 1) = 2.0    Prob > F = 0.0000
R-squared = 0.33    Adj R-squared = 0.33    Root MSE = 1.0

    real_tax |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
       trend |  1.3.0 1. 0.000 1.231 1.3
   tob_state |  -22.330 1.3332 -1.2 0.000 -2.2023 -1.3
       _cons |  .3.20.3 0.000 3.33.

Tobacco producing states have substantially lower taxes than other states.

Know two facts:
Consumption is higher in tobacco producing states.
Taxes are lower in tobacco producing states.

What should happen to the tax coefficient when the tob_state dummy is added to the model?
Let $\beta_1$ be the coefficient on real_tax, $\beta_2$ the coefficient on tob_state, and $\hat\delta_1$ the slope relating tob_state to real_tax. With tob_state omitted,
$E[\hat\beta_1] = \beta_1 + \beta_2 \hat\delta_1$
Here $\beta_2 > 0$ and $\hat\delta_1 < 0$, and so $E[\hat\beta_1] < \beta_1$: the tax coefficient estimated without the dummy is too negative, and adding tob_state should move it toward zero.
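The bias formula used above follows from a one-line substitution. The derivation below is a sketch for the two-regressor case (it ignores the trend term for simplicity); $\delta_0$, $\delta_1$, and $v_i$ denote the intercept, slope, and error of an auxiliary regression of the omitted variable on the included one, which is notation assumed here for the illustration.

```latex
\begin{align*}
y_i    &= \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i
          && \text{(full model)} \\
x_{2i} &= \delta_0 + \delta_1 x_{1i} + v_i
          && \text{(omitted variable on included variable)} \\
y_i    &= (\beta_0 + \beta_2 \delta_0)
          + (\beta_1 + \beta_2 \delta_1)\, x_{1i}
          + (\varepsilon_i + \beta_2 v_i)
          && \text{(substitute for } x_{2i}\text{)}
\end{align*}
% The short regression of y on x_1 alone therefore recovers
% \beta_1 + \beta_2 \delta_1 rather than \beta_1.
% Tobacco example: \beta_2 > 0 (higher consumption in tobacco states),
% \delta_1 < 0 (lower taxes in tobacco states), so \beta_2 \delta_1 < 0.
```

The sign of the bias is the sign of the product $\beta_2 \delta_1$, which is why knowing the two facts (higher consumption, lower taxes) is enough to predict the direction in which the tax coefficient moves.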

. * add tobacco producing state dummy
. reg packs_pc real_tax trend tob_state

      Source |  SS  df  MS
       Model |  32.1  3  .0
    Residual |  330.  1  2.223
       Total |  1.32  1  00.31
Number of obs = 20    F( 3, 1) = 2.22    Prob > F = 0.0000
R-squared = 0.1    Adj R-squared = 0.    Root MSE = 20.

    packs_pc |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
    real_tax |  -.22.023 -1. 0.000 -.1 -.2
       trend |  -1.. -. 0.000 -1.01 -1.3
   tob_state |  . 2.223.2 0.000.2 1.3
       _cons |  1.03 2.3332. 0.000.3.1

College GPA example

variable name | storage type | display format | variable label
male          | float | %.0g | dummy variable, =1 if male
business      | float | %.0g | dummy variable, =1 if business major
engineer      | float | %.0g | dummy variable, =1 if engineer
greek         | float | %.0g | dummy variable, =1 if in sorority/fraternity
college_gpa   | float | %.0g | college GPA, 4.0 scale
hs_gpa        | float | %.0g | high school GPA, 4.0 scale
act           | float | %.0g | ACT score, 1-36
pc            | float | %.0g | dummy variable, =1 if own a PC

. * run regression
. reg college_gpa hs_gpa act male greek business engineer pc

      Source |  SS  df  MS
       Model |  .12.3012
    Residual |  1. 3.3
       Total |  1.00 10.1
Number of obs = 11    F( , 3) = .1    Prob > F = 0.0000
R-squared = 0.222    Adj R-squared = 0.2    Root MSE = .33031

 college_gpa |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
      hs_gpa |  .1.0. 0.000.21.1
         act |  .001.02 0.0 0.2 -.03.0302
        male |  .0.0 0.33 0.0 -.0.33
       greek |  .0322331.001 0. 0.3 -.033.
    business |  .01.0 0. 0. -.02.2021
    engineer |  -.21.1 -1. 0.02 -.20.03
          pc |  .1.02 3.0 0.003.031.223
       _cons |  1.1.333 3.2 0.001.02 1.32

Return to cps.dta

. gen ln_weekly_earn=ln(weekly_earn)
. gen union=union_status==1
. gen nonwhite=((race==2)|(race==3))

. * run basic regression
. * ln(weekly earnings) on age, educ, union nonwhite
. reg ln_weekly age years_educ union nonwhite

      Source |  SS  df  MS
       Model |  1.03 3.200
    Residual |  32.3 1.23
       Total |  23.33.2321
Number of obs =    F( , 1) = 12.0    Prob > F = 0.0000
R-squared = 0.21    Adj R-squared = 0.21    Root MSE = .31

ln_weekly_~n |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
         age |  .002.0002. 0.000.023.0
  years_educ |  .0022.00133.32 0.000.03.02
       union |  .10.0002 1. 0.000.223.
    nonwhite |  -.2.0000 -1.0 0.000 -.12 -.230
       _cons |  .02.012 23. 0.000.022.21

Now change the reference group.

. gen non_union=union_status==2
. gen white=race==1
. * now change the reference groups for the
. * dummy variables, adding non_union and white
. * to the model
. * ln(weekly earnings) on age, educ, nonunion white
. reg ln_weekly age years_educ non_union white

      Source |  SS  df  MS
       Model |  1.03 3.200
    Residual |  32.3 1.23
       Total |  23.33.2321
Number of obs =    F( , 1) = 12.0    Prob > F = 0.0000
R-squared = 0.21    Adj R-squared = 0.21    Root MSE = .31

ln_weekly_~n |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
         age |  .002.0002. 0.000.023.0
  years_educ |  .0022.00133.32 0.000.03.02
   non_union |  -.10.0002 -1. 0.000 -. -.223
       white |  .2.0000 1.0 0.000.230.12
       _cons |  .03.01 230. 0.000.21.0

Notice that changing the reference groups on the dummy variables does not change R2 or the coefficients on the other variables. The only things that change are the sign on each dummy variable, which flips, and the constant.

Generate dummies for each region of the country.

. * generate regional dummy variables
. gen region1=region==1
. gen region2=region==2
. gen region3=region==3
. gen region4=region==4
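The sign flip seen when union and nonwhite are swapped for non_union and white can be checked with a short substitution. The sketch below assumes, as the variable definitions suggest, that the two categories of each dummy are exhaustive, so that non\_union $= 1 -$ union and white $= 1 -$ nonwhite; the $\beta$ labels are notation introduced here for the illustration.

```latex
\begin{align*}
\ln w_i &= \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{educ}_i
           + \beta_3\,\text{union}_i + \beta_4\,\text{nonwhite}_i + \varepsilon_i \\
        &= \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{educ}_i
           + \beta_3\,(1 - \text{non\_union}_i)
           + \beta_4\,(1 - \text{white}_i) + \varepsilon_i \\
        &= (\beta_0 + \beta_3 + \beta_4) + \beta_1\,\text{age}_i + \beta_2\,\text{educ}_i
           - \beta_3\,\text{non\_union}_i - \beta_4\,\text{white}_i + \varepsilon_i
\end{align*}
```

The two specifications describe the same fitted values, which is why R2 and the coefficients on age and education are unchanged, the dummy coefficients keep their magnitude with the opposite sign, and only the constant absorbs the shift $\beta_3 + \beta_4$.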

Do something silly: include all four dummy variables in the model.

. * do something dumb -- include all dummy variables
. reg ln_weekly age years_educ union nonwhite region1-region4

      Source |  SS  df  MS
       Model |  1.22 21.0322
    Residual |  31. 1.13
       Total |  23.33.2321
Number of obs =    F( , 1) = 1.3    Prob > F = 0.0000
R-squared = 0.20    Adj R-squared = 0.2    Root MSE = .331

ln_weekly_~n |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
         age |  .0003.0002. 0.000.02.0
  years_educ |  .032.0012. 0.000.0301.0
       union |  .003.00 1.1 0.000.322.2
    nonwhite |  -.323.003 -1.3 0.000 -.123 -.13
     region1 |  -.0021.001 -0.0 0.3 -.021.003
     region2 |  -.032.0023 -.3 0.000 -.0 -.033
     region3 |  -.01.000 -. 0.000 -.02 -.021
     region4 |  (dropped)
       _cons |  .33.0203 22. 0.000.3.3

Stata will remind you that you cannot run a model with all the dummies included: it drops one of them.

. * run model with regional dummy variables
. reg ln_weekly age years_educ union nonwhite region2-region4

      Source |  SS  df  MS
       Model |  1.22 21.0322
    Residual |  31. 1.13
       Total |  23.33.2321
Number of obs =    F( , 1) = 1.3    Prob > F = 0.0000
R-squared = 0.20    Adj R-squared = 0.2    Root MSE = .331

ln_weekly_~n |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
         age |  .0003.0002. 0.000.02.0
  years_educ |  .032.0012. 0.000.0301.0
       union |  .003.00 1.1 0.000.322.2
    nonwhite |  -.323.003 -1.3 0.000 -.123 -.13
     region2 |  -.031.00 -.21 0.000 -.02 -.02
     region3 |  -.0.003 -.1 0.000 -.0 -.03
     region4 |  .0021.001 0.0 0.3 -.003.021
       _cons |  .30.020 22.0 0.000.1.01

With region 1 omitted, each regional coefficient is a difference from region 1, so differences between two included regions come from subtracting coefficients:
Difference between region 3 and region 4 = (coefficient on region3) - (coefficient on region4)
Difference between region 2 and region 4 = (coefficient on region2) - (coefficient on region4)

[Table: critical values of the F distribution, with degrees of freedom in the numerator across the columns and degrees of freedom in the denominator down the rows.]

. * test whether the regional effects are all zero
. test region2 region3 region4

 ( 1) region2 = 0
 ( 2) region3 = 0
 ( 3) region4 = 0

F( 3, 1) = 3.
Prob > F = 0.0000
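Stata's factor-variable notation handles the omitted category automatically, which avoids building region1 through region4 by hand. The sketch below uses simulated data rather than cps.dta; the variable names x and y and all coefficient values are assumptions for illustration only.

```stata
* Simulated data: four regions with assumed regional shifts in y
clear
set seed 42
set obs 2000
gen region = ceil(4*runiform())
gen x = rnormal()
gen y = 1 + 0.5*x + 0.2*(region==2) - 0.1*(region==3) + 0.3*(region==4) + rnormal()

* i.region creates the dummies and omits the lowest category (region 1)
reg y x i.region

* Joint F test that all regional effects are zero
testparm i.region

* ib4.region makes region 4 the reference group instead
reg y x ib4.region

* The joint test gives the same F statistic under either reference group
test 1.region 2.region 3.region
```

The choice of base category changes how each regional coefficient is read, but not the fit of the model or the joint test of the regional effects.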

Change the reference group from region 1 to region 4.
All the coefficients are now in relation to the omitted group, region 4. E.g., the coefficient on region 3 is now the difference between region 3 and region 4.

. * change the reference group from region1 to region4
. reg ln_weekly age years_educ union nonwhite region1-region3

      Source |  SS  df  MS
       Model |  1.22 21.0322
    Residual |  31. 1.13
       Total |  23.33.2321
Number of obs =    F( , 1) = 1.3    Prob > F = 0.0000
R-squared = 0.20    Adj R-squared = 0.2    Root MSE = .331

ln_weekly_~n |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
         age |  .0003.0002. 0.000.02.0
  years_educ |  .032.0012. 0.000.0301.0
       union |  .003.00 1.1 0.000.322.2
    nonwhite |  -.323.003 -1.3 0.000 -.123 -.13
     region1 |  -.0021.001 -0.0 0.3 -.021.003
     region2 |  -.032.0023 -.3 0.000 -.0 -.033
     region3 |  -.01.000 -. 0.000 -.02 -.021
       _cons |  .33.0203 22. 0.000.3.3

The coefficients on the other variables stay the same.
Notice that the SSE, SSM, and R2 do not change at all.
The coefficient on region 1 is the negative of the coefficient on region 4 from the previous model.
The coefficients on regions 2 and 3 are exactly as we would expect.

Definitions
Obesity is based on the Body Mass Index:
$\text{BMI} = \text{weight (kg)} / (\text{height in m})^2 = 703 \times \text{weight (pounds)} / (\text{height in inches})^2$

BMI < 20: underweight
20 <= BMI < 25: ideal
25 <= BMI < 30: overweight
30 <= BMI: obese

Obesity Rates Over Time

Group   | Obesity | Overweight
(columns: / 1/00 / 1/00)
All     | 1. 30...
Males   | .2 2...0
Females | 1. 3.0 1.1 2.0
Black F | . 2. 0. 0..0
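A quick worked check of the two BMI formulas may help. The two people below are hypothetical, chosen only to illustrate the arithmetic; the factor 703 is the unit conversion from pounds and inches to kilograms and meters.

```latex
\begin{align*}
\text{Metric:}\quad
  \text{BMI} &= \frac{80\ \text{kg}}{(1.80\ \text{m})^2}
              = \frac{80}{3.24} \approx 24.7 \\[4pt]
\text{US units:}\quad
  \text{BMI} &= 703 \times \frac{150\ \text{lb}}{(65\ \text{in})^2}
              = \frac{105450}{4225} \approx 24.96 \\[4pt]
\text{Conversion factor:}\quad
  \frac{0.45359\ \text{kg/lb}}{(0.0254\ \text{m/in})^2} &\approx 703
\end{align*}
```

Both hypothetical people fall in the 20 to 25 range, so neither would be coded as overweight under the 25 cutoff.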

Contains data from bmi1.dta
obs: 1,2    vars:    2 Sep 200 0:    size: 33,3 (.% of memory free)

variable name | storage type | display format | variable label
age      | byte  | %.0g | age in years
sex      | byte  | %.0g | =1 if male, =2 if female
income   | int   | %.0g | annual family income
educ     | byte  | %.0g | years of education
srhealth | byte  | %.0g | self report health, 1=excel, 2=vgood, 3=good, 4=fair, 5=poor
bmi      | float | %.0g | body mass index
totalexp | long  | %.0g | total annual expenditures on medical care
smoker   | byte  | %.0g | dummy variable, =1 if current smoker
race     | float | %.0g | =1 if white non-hisp, =2 if black non-hisp, =3 other race, =4 hispanic

. * generate race dummy variables
. gen black=race==2
. gen other_race=race==3
. gen hispanic=race==4
. label var black "=1 if black, non hispanic"
. label var other_race "=1 if other race, non hispanic"
. label var hispanic "=1 if hispanic"
. * generate overweight dummy
. gen overweight=bmi>=25
. label var overweight "dummy, =1 if overweight"
. * get table of overweight
. tab overweight

 dummy, =1 if overweight |  Freq.  Percent  Cum.
                       0 |  3 2. 2.
                       1 |  2 0.0 0.00
                   Total |  1,2 0.00

. reg overweight age educ incomel male black hispanic other_race smoker

      Source |  SS  df  MS
       Model |  1. 2.3
    Residual |  2. 0.120
       Total |  2..200
Number of obs =    F( , 0) = .    Prob > F = 0.0000
R-squared = 0.001    Adj R-squared = 0.01    Root MSE = .32

  overweight |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
         age |  .002.0013.2 0.000.00.003
        educ |  -.021.00333 -3.00 0.003 -.0213 -.00
     incomel |  -.03.031 -0.0 0.0 -.031.023
        male |  -.1.0222 -.3 0.000 -.13 -.02
       black |  .32.03. 0.000..20
    hispanic |  .12.03301 3.2 0.001.03.1
  other_race |  -.0.030 -1.01 0.31 -.1230.022
      smoker |  -.01.03121 -0. 0.1 -.03.000
       _cons |  .21.3 1.3 0.03 -.03 1.23
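The regression above has a 0/1 outcome, so it is a linear probability model: each coefficient is the change in the probability of being overweight for a one-unit change in the regressor. The sketch below uses simulated data, not bmi1.dta, and every parameter value in it is an assumption. It also adds two things that are not in the original commands: a guard for missing bmi, because Stata treats a missing value as larger than any number and bmi>=25 would therefore code missing observations as 1, and robust standard errors, because the error variance of a 0/1 outcome depends on the regressors.

```stata
* Simulated data: all parameter values are illustrative assumptions
clear
set seed 7
set obs 1500
gen age = 20 + floor(45*runiform())
gen smoker = runiform() < 0.25
gen bmi = 22 + 0.08*age - 1*smoker + rnormal(0, 4)

* Guard against missing bmi, which would otherwise satisfy bmi>=25
gen overweight = bmi >= 25 if !missing(bmi)
label var overweight "dummy, =1 if overweight"
tab overweight

* Linear probability model with heteroskedasticity-robust standard errors
reg overweight age smoker, vce(robust)
```

In this setup a coefficient of, say, 0.005 on age would be read as a half percentage point increase in the probability of being overweight per year of age.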