STATA Program for OLS cps87_or.do


Michael Powell

* the data for this project is a small subsample;
* of full time (30 or more hours) male workers;
* aged from the outgoing rotation;
* samples of the 1987 current population survey;

* this line defines the semicolon as the ;
* end of line delimiter;
# delimit ;

* set memory for 10 meg;
set memory 10m;

* write results to a log file;
* the replace option writes over old;
* log files;
log using cps87_or.log,replace;

* open stata data set;
use c:\bill\stata\cps87_or;

* list variables and labels in data set;
desc;

* generate new variables;
* lines 1-2 illustrate basic math functions;
* lines 3-4 illustrate logical operators;
* line 5 illustrates the OR statement;
* line 6 illustrates the AND statement;
* after you construct new variables, compress the data again;
gen age2=age*age;
gen earnwkl=ln(earnwke);
gen union=unionm==1;
gen topcode=earnwke==999;
gen nonwhite=((race==2)|(race==3));
gen big_ne=((region==1)&(smsa==1));

* label the data;
label var age2 "age squared";
label var earnwkl "log earnings per week";
label var topcode "=1 if earnwkl is topcoded";
label var union "1=in union, 0 otherwise";
label var nonwhite "1=nonwhite, 0=white" ;
label var big_ne "1= live in big smsa from northeast, 0=otherwsie";

* get descriptive statistics;
sum;

* get detailed descriptives for continuous variables;
sum earnwke, detail;

* get frequencies of discrete variables;
tabulate unionm;
tabulate race;

* get two-way table of frequencies;
tabulate region smsa, row column cell;

* run simple regression;
reg earnwkl age age2 educ nonwhite union;

* run regression adding smsa, region and race fixed-effects;
* the xi command constructs the dummies for you;
* the lowest numbered dummy is usually the;
* omitted variable;
xi: reg earnwkl age age2 educ union i.race i.region i.smsa;
more;

* close log file;
log close;
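The regressions in this program fit a log-earnings equation that is quadratic in age. As a rough sketch in standard notation (the coefficient symbols $\beta_k$ and the error term $\varepsilon$ are introduced here for illustration and are not part of the original program), the first specification and two common ways to read its coefficients are:

```latex
\begin{align*}
\ln(\text{earnwke}) &= \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2
   + \beta_3\,\text{educ} + \beta_4\,\text{nonwhite} + \beta_5\,\text{union} + \varepsilon \\[4pt]
\frac{\partial \ln(\text{earnwke})}{\partial\,\text{age}} &= \beta_1 + 2\beta_2\,\text{age},
   \qquad \text{age}^{*} = -\frac{\beta_1}{2\beta_2} \\[4pt]
\%\Delta\,\text{earnwke}\ (\text{union vs. nonunion}) &= 100\left(e^{\beta_5} - 1\right)
\end{align*}
```

Here $\text{age}^{*}$ is the turning point of the age profile (a peak when $\beta_2 < 0$), and the last line is the exact percentage difference implied by a dummy coefficient in a log-linear model; for small $\beta_5$ it is close to $100\,\beta_5$.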

STATA Results for OLS cps87_or.log

       log:  c:\bill\stata\cps87_or.log
  log type:  text
 opened on:  6 Nov 2004, 08:14:10

. * open stata data set;
. use c:\bill\stata\cps87_or;

. * list variables and labels in data set;
. desc;

Contains data from c:\bill\stata\cps87_or.dta
  obs:        19,906
 vars:             7                          6 Nov :11
 size:       636,992 (93.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
age             float  %9.0g                  age in years
race            float  %9.0g                  1=white, non-hisp, 2=place, n.h, 3=hisp
educ            float  %9.0g                  years of education
unionm          float  %9.0g                  1=union member, 2=otherwise
smsa            float  %9.0g                  1=live in 19 largest smsa, 2=other smsa, 3=non smsa
region          float  %9.0g                  1=east, 2=midwest, 3=south, 4=west
earnwke         float  %9.0g                  usual weekly earnings
-------------------------------------------------------------------------------
Sorted by:

. * generate new variables;
. * lines 1-2 illustrate basic math functions;
. * lines 3-4 illustrate logical operators;
. * line 5 illustrates the OR statement;
. * line 6 illustrates the AND statement;
. * after you construct new variables, compress the data again;
. gen age2=age*age;
. gen earnwkl=ln(earnwke);
. gen union=unionm==1;
. gen topcode=earnwke==999;
. gen nonwhite=((race==2)|(race==3));
. gen big_ne=((region==1)&(smsa==1));

4 . * label the data;. label var age2 "age squared";. label var earnwkl "log earnings per week";. label var topcode "=1 if earnwkl is topcoded";. label var union "1=in union, 0 otherwise";. label var nonwhite "1=nonwhite, 0=white" ;. label var big_ne "1= live in big smsa from northeast, 0=otherwsie";. compress; age was float now byte race was float now byte educ was float now byte unionm was float now byte smsa was float now byte region was float now byte earnwke was float now int age2 was float now int union was float now byte topcode was float now byte nonwhite was float now byte big_ne was float now byte. more;. * get descriptive statistics;. sum; Variable Obs Mean Std. Dev. Min Max age race educ unionm smsa region earnwke age earnwkl union topcode nonwhite big_ne * get detailed descriptics for continuous variables;. sum earnwke, detail; usual weekly earnings Percentiles Smallest 1%

5 5% % Obs % Sum of Wgt % 449 Mean Largest Std. Dev % % Variance % Skewness % Kurtosis more;. * get frequencies of discrete variables;. tabulate unionm; 1=union member, 2=otherwise Freq. Percent Cum , , Total 19, tabulate race; 1=white, non-hisp, 2=place, n.h, 3=hisp Freq. Percent Cum , , , Total 19, more;. * get two-way table of frequencies;. tabulate region smsa, row column cell; Key frequency row percentage column percentage cell percentage =east, 2=midwest, 1=live in 19 largest smsa, 3=south, 2=other smsa, 3=non smsa 4=west Total

6 1 2,806 1, , ,501 1,742 1,592 4, ,501 2,542 1,904 5, ,487 1,507 1,133 4, Total 7,295 7,140 5,471 19, more;. *run simple regression;. reg earnwkl age age2 educ nonwhite union; Source SS df MS Number of obs = F( 5, 19900) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = earnwkl Coef. Std. Err. t P> t [95% Conf. Interval] age age educ nonwhite union _cons more;. * run regression addinf smsa, region and race fixed-effects;. * the xi command constructs the dummies for you;. * the lowest numbered dummy is usually the;. * omitted variable;. xi: reg earnwkl age age2 educ union i.race i.region i.smsa; i.race _Irace_1-3 (naturally coded; _Irace_1 omitted) 172

7 i.region _Iregion_1-4 (naturally coded; _Iregion_1 omitted) i.smsa _Ismsa_1-3 (naturally coded; _Ismsa_1 omitted) Source SS df MS Number of obs = F( 11, 19894) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = earnwkl Coef. Std. Err. t P> t [95% Conf. Interval] age age educ union _Irace_ _Irace_ _Iregion_ _Iregion_ _Iregion_ _Ismsa_ _Ismsa_ _cons more;. * close log file;. log close; log: c:\bill\stata\cps87_or.log log type: text closed on: 6 Nov 2004, 08:14:19 173

STATA Program for Probit/Logit Models workplace.do

* the data for this program are a random sample;
* of 10k observations from the data used in;
* evans, farrelly and montgomery, aer, 1999;
* the data are indoor workers in the 1991 and 1993;
* national health interview survey. the survey;
* identifies whether the worker smoked and whether;
* the worker faces a workplace smoking ban;

* set semi colon as the end of line;
# delimit;

* ask it NOT to pause;
set more off;

* open log file;
log using c:\bill\jpsm\workplace1.log,replace;

* use the workplace data set;
use c:\bill\jpsm\workplace1;

* print out variable labels;
desc;

* get summary statistics;
sum;

* run a linear probability model for comparison purposes;
* estimate white standard errors to control for heteroskedasticity;
reg smoker age incomel male black hispanic
  hsgrad somecol college worka, robust;

* run probit model;
probit smoker age incomel male black hispanic
  hsgrad somecol college worka;

* predict probability of smoking;
predict pred_prob_smoke;

* get detailed descriptive data about predicted prob;
sum pred_prob, detail;

* predict binary outcome with 50% cutoff;
gen pred_smoke1=pred_prob_smoke>=.5;
label variable pred_smoke1 "predicted smoking, 50% cutoff";

* compare actual values;
tab smoker pred_smoke1, row col cell;

* ask for marginal effects/treatment effects;
mfx compute;

* the same type of variables can be produced with;
* prchange. this command is however more flexible;
* in that you can change the reference individual;
prchange, help;

* get marginal effect/treatment effects for specific person;
* male, age 40, college educ, white, without workplace smoking ban;
* if a variable is not specified, its value is assumed to be;
* the sample mean. in this case, the only variable i am not;
* listing is mean log income;
prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);

* using a wald test, test the null hypothesis that;
* all the education coefficients are zero;
test hsgrad somecol college;

* how to run the same test with a -2 log like test;
* estimate the unrestricted model and save the estimates ;
* in urmodel;
probit smoker age incomel male black hispanic
  hsgrad somecol college worka;
estimates store urmodel;

* estimate the restricted model. save results in rmodel;
probit smoker age incomel male black hispanic
  worka;
estimates store rmodel;
lrtest urmodel rmodel;

* run logit model;
logit smoker age incomel male black hispanic
  hsgrad somecol college worka;

* ask for marginal effects/treatment effects;
* logit model;
mfx compute;

log close;
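The "-2 log like test" in the program compares the maximized log likelihoods of the two stored probit models. As a brief sketch in standard notation (the symbols $\ell_U$, $\ell_R$, and $q$ are introduced here for illustration and do not appear in the program):

```latex
\begin{align*}
LR &= -2\left(\ell_R - \ell_U\right) \;\overset{a}{\sim}\; \chi^2_{q}
   \quad \text{under } H_0, \\
H_0 &: \beta_{\text{hsgrad}} = \beta_{\text{somecol}} = \beta_{\text{college}} = 0,
   \qquad q = 3
\end{align*}
```

Here $\ell_U$ is the log likelihood of the unrestricted model (`urmodel`), $\ell_R$ that of the restricted model (`rmodel`), and $q$ counts the three education coefficients set to zero. The Wald test from `test hsgrad somecol college` addresses the same null hypothesis using only the unrestricted estimates.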

10 STATA Results for Probit/Logit Models workplace.log log: c:\bill\jpsm\workplace1.log log type: text opened on: 4 Nov 2004, 07:29:21. * use the workplace data set;. use c:\bill\jpsm\workplace1;. * print out variable labels;. desc; Contains data from c:\bill\jpsm\workplace1.dta obs: 16,258 vars: Oct :27 size: 325,160 (96.9% of memory free) > - storage display value variable name type format label variable label > - smoker byte %9.0g is current smoking worka byte %9.0g has workplace smoking bans age byte %9.0g age in years male byte %9.0g male black byte %9.0g black hispanic byte %9.0g hispanic incomel float %9.0g log income hsgrad byte %9.0g is hs graduate somecol byte %9.0g has some college college float %9.0g > - Sorted by:. * get summary statistics;. sum; Variable Obs Mean Std. Dev. Min Max smoker worka age male black hispanic incomel hsgrad somecol college * run a linear probability model for comparison purposes; 176

11 . * estimate white standard errors to control for heteroskedasticity;. reg smoker age incomel male black hispanic > hsgrad somecol college worka, robust; Regression with robust standard errors Number of obs = F( 9, 16248) = Prob > F = R-squared = Root MSE = Robust smoker Coef. Std. Err. t P> t [95% Conf. Interval] age incomel male black hispanic hsgrad somecol college worka _cons * run probit model;. probit smoker age incomel male black hispanic > hsgrad somecol college worka; Iteration 0: log likelihood = Iteration 1: log likelihood = Iteration 2: log likelihood = Iteration 3: log likelihood = Probit estimates Number of obs = LR chi2(9) = Prob > chi2 = Log likelihood = Pseudo R2 = smoker Coef. Std. Err. z P> z [95% Conf. Interval] age incomel male black hispanic hsgrad somecol college worka _cons *predict probability of smoking;. predict pred_prob_smoke; 177

12 (option p assumed; Pr(smoker)). * get detailed descriptive data about predicted prob;. sum pred_prob, detail; Pr(smoker) Percentiles Smallest 1% % % Obs % Sum of Wgt % Mean Largest Std. Dev % % Variance % Skewness % Kurtosis * predict binary outcome with 50% cutoff;. gen pred_smoke1=pred_prob_smoke>=.5;. label variable pred_smoke1 "predicted smoking, 50% cutoff";. * compare actual values;. tab smoker pred_smoke1, row col cell; Key frequency row percentage column percentage cell percentage predicted smoking, is current 50% cutoff smoking 0 1 Total , , , , Total 16, ,

13 . * ask for marginal effects/treatment effects;. mfx compute; Marginal effects after probit y = Pr(smoker) (predict) = variable dy/dx Std. Err. z P> z [ 95% C.I. ] X age incomel male* black* hispanic* hsgrad* somecol* college* worka* (*) dy/dx is for discrete change of dummy variable from 0 to 1. * the same type of variables can be produced with;. * prchange. this command is however more flexible;. * in that you can change the reference individual;. prchange, help; probit: Changes in Predicted Probabilities for smoker min->max 0->1 -+1/2 -+sd/2 MargEfct age incomel male black hispanic hsgrad somecol college worka Pr(y x) age incomel male black hispanic hsgrad somecol x= sd(x)= college worka x= sd(x)= Pr(y x): probability of observing each y for specified x values Avg Chg : average of absolute value of the change across categories Min->Max: change in predicted probability as x changes from its minimum to its maximum 0->1: change in predicted probability as x changes from 0 to 1 -+1/2: change in predicted probability as x changes from 1/2 unit below base value to 1/2 unit above 179

14 -+sd/2: change in predicted probability as x changes from 1/2 standard dev below base to 1/2 standard dev above MargEfct: the partial derivative of the predicted probability/rate with respect to a given independent variable. * get marginal effect/treatment effects for specific person;. * male, age 40, college educ, white, without workplace smoking ban;. * if a variable is not specified, its value is assumed to be;. * the sample mean. in this case, the only variable i am not;. * listing is mean log income;. prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0); probit: Changes in Predicted Probabilities for smoker min->max 0->1 -+1/2 -+sd/2 MargEfct age incomel male black hispanic hsgrad somecol college worka Pr(y x) age incomel male black hispanic hsgrad somecol x= sd(x)= college worka x= sd(x)= * using a wald test, test the null hypothesis that;. * all the education coefficients are zero;. test hsgrad somecol college; ( 1) hsgrad = 0 ( 2) somecol = 0 ( 3) college = 0 chi2( 3) = Prob > chi2 = * how to run the same tets with a -2 log like test;. * estimate the unresticted model and save the estimates ;. * in urmodel;. probit smoker age incomel male black hispanic > hsgrad somecol college worka; Iteration 0: log likelihood = Iteration 1: log likelihood = Iteration 2: log likelihood = Iteration 3: log likelihood =

15 Probit estimates Number of obs = LR chi2(9) = Prob > chi2 = Log likelihood = Pseudo R2 = smoker Coef. Std. Err. z P> z [95% Conf. Interval] age incomel male black hispanic hsgrad somecol college worka _cons estimates store urmodel;. * estimate the restricted model. save results in rmodel;. probit smoker age incomel male black hispanic > worka; Iteration 0: log likelihood = Iteration 1: log likelihood = Iteration 2: log likelihood = Probit estimates Number of obs = LR chi2(6) = Prob > chi2 = Log likelihood = Pseudo R2 = smoker Coef. Std. Err. z P> z [95% Conf. Interval] age incomel male black hispanic worka _cons estimates store rmodel;. lrtest urmodel rmodel; likelihood-ratio test LR chi2(3) = (Assumption: rmodel nested in urmodel) Prob > chi2 = * run logit model;. logit smoker age incomel male black hispanic 181

16 > hsgrad somecol college worka; Iteration 0: log likelihood = Iteration 1: log likelihood = Iteration 2: log likelihood = Iteration 3: log likelihood = Logit estimates Number of obs = LR chi2(9) = Prob > chi2 = Log likelihood = Pseudo R2 = smoker Coef. Std. Err. z P> z [95% Conf. Interval] age incomel male black hispanic hsgrad somecol college worka _cons * ask for marginal effects/treatment effects;. * logit model;. mfx compute; Marginal effects after logit y = Pr(smoker) (predict) = variable dy/dx Std. Err. z P> z [ 95% C.I. ] X age incomel male* black* hispanic* hsgrad* somecol* college* worka* (*) dy/dx is for discrete change of dummy variable from 0 to 1. log close; log: c:\bill\jpsm\workplace1.log log type: text closed on: 4 Nov 2004, 07:30:16 182
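The `mfx compute` tables for the probit and logit models report changes in the predicted probability of smoking rather than raw coefficients. A minimal sketch of what is being computed, in standard notation (the symbols $\Phi$, $\phi$, $\Lambda$, and $\bar{x}$ are assumptions introduced here; by default `mfx` evaluates at the sample means):

```latex
\begin{align*}
\text{probit:}\quad
  \frac{\partial \Pr(\text{smoker}=1 \mid x)}{\partial x_k}\bigg|_{x=\bar{x}}
  &= \phi(\bar{x}'\beta)\,\beta_k \\[4pt]
\text{logit:}\quad
  \frac{\partial \Pr(\text{smoker}=1 \mid x)}{\partial x_k}\bigg|_{x=\bar{x}}
  &= \Lambda(\bar{x}'\beta)\left[1-\Lambda(\bar{x}'\beta)\right]\beta_k,
  \qquad \Lambda(z)=\frac{e^{z}}{1+e^{z}} \\[4pt]
\text{dummy } d:\quad
  \Delta \Pr
  &= F\!\left(\bar{x}_{-d}'\beta_{-d} + \beta_d\right) - F\!\left(\bar{x}_{-d}'\beta_{-d}\right),
  \qquad F \in \{\Phi, \Lambda\}
\end{align*}
```

The last line corresponds to the starred rows in the output, where dy/dx is the discrete change as the dummy variable moves from 0 to 1 with the other covariates held at their means.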

STATA Program for Odds Ratio in Logit Models natal95.do

* this data set is a small 0.5% random sample;
* of observations from the 1995 natality detail;
* data. we will examine the impact of smoking;
* on birth weight. two large states, NY and CA, do not;
* record mothers smoking status. therefore, of the ;
* 4 million births in the US, only 3 million have all;
* the necessary data so there should be 3 million*.005;
* or roughly 15,000 obs;

* set semi colon as the end of line;
# delimit;

* ask it NOT to pause;
set more off;

* open log file;
log using c:\bill\jpsm\natal95.log,replace;

* use the natality detail data set;
use c:\bill\jpsm\natal95;

* print out variable labels;
desc;

* construct indicator for low birth weight;
gen lowbw=birthw<=2500;
label variable lowbw "dummy variable, =1 ifbw<2500 grams";

* get frequencies;
tab lowbw smoked, col row cell;

* run a logit model;
xi: logit lowbw smoked age married i.educ5 i.race4;

* get marginal effects;
mfx compute;

* run a logit but report the odds ratios instead;
xi: logistic lowbw smoked age married i.educ5 i.race4;

log close;
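The `logit` and `logistic` commands in this program fit the same model and differ only in how the estimates are displayed. A short sketch of the relationship, in standard notation (the symbols $p(x)$ and $OR_k$ are introduced here for illustration):

```latex
\begin{align*}
\ln\frac{p(x)}{1-p(x)} &= x'\beta,
   \qquad p(x) = \Pr(\text{lowbw}=1 \mid x) \\[4pt]
OR_k &= \frac{\text{odds}(x_k + 1)}{\text{odds}(x_k)} = e^{\beta_k}
\end{align*}
```

So each odds ratio reported by `logistic` is the exponentiated `logit` coefficient for the same variable, and an odds ratio above 1 corresponds to a positive coefficient.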

18 STATA Results for Odds Ratio in Logit Models natal95.log log: c:\bill\jpsm\natal95.log log type: text opened on: 4 Nov 2004, 05:48:05. * use the natality detail data set;. use c:\bill\jpsm\natal95;. * print out variable labels;. desc; Contains data from c:\bill\jpsm\natal95.dta obs: 14,230 vars: 7 27 Oct :58 size: 170,760 (98.4% of memory free) > - storage display value variable name type format label variable label > - birthw int %9.0g birth weight in grams smoked byte %9.0g =1 if mom smoked during pregnancy age byte %9.0g moms age at birth married byte %9.0g =1 if married race4 byte %9.0g 1=white,2=black,3=asian,4=other educ5 byte %9.0g 1=0-8, 2=9-11, 3=12, 4=13-15, 5=16+ visits byte %9.0g prenatal visits > - Sorted by:. * construct indicator for low birth weight;. gen lowbw=birthw<=2500;. label variable lowbw "dummy variable, =1 ifbw<2500 grams";. * get frequencies;. tab lowbw smoked, col row cell; Key frequency row percentage column percentage cell percentage dummy variable, 184

19 =1 =1 if mom smoked ifbw<2500 during pregnancy grams 0 1 Total ,626 1,745 13, Total 12,285 1,945 14, * run a logit model;. xi: logit lowbw smoked age married i.educ5 i.race4; i.educ5 _Ieduc5_1-5 (naturally coded; _Ieduc5_1 omitted) i.race4 _Irace4_1-4 (naturally coded; _Irace4_1 omitted) Iteration 0: log likelihood = Iteration 1: log likelihood = Iteration 2: log likelihood = Iteration 3: log likelihood = Iteration 4: log likelihood = Logit estimates Number of obs = LR chi2(10) = Prob > chi2 = Log likelihood = Pseudo R2 = lowbw Coef. Std. Err. z P> z [95% Conf. Interval] smoked age married _Ieduc5_ _Ieduc5_ _Ieduc5_ _Ieduc5_ _Irace4_ _Irace4_ _Irace4_ _cons * get marginal effects;. mfx compute; Marginal effects after logit y = Pr(lowbw) (predict) 185

20 = variable dy/dx Std. Err. z P> z [ 95% C.I. ] X smoked* age married* _Ieduc~2* _Ieduc~3* _Ieduc~4* _Ieduc~5* _Irace~2* _Irace~3* _Irace~4* (*) dy/dx is for discrete change of dummy variable from 0 to 1. * run a logit but report the odds ratios instead;. xi: logistic lowbw smoked age married i.educ5 i.race4; i.educ5 _Ieduc5_1-5 (naturally coded; _Ieduc5_1 omitted) i.race4 _Irace4_1-4 (naturally coded; _Irace4_1 omitted) Logistic regression Number of obs = LR chi2(10) = Prob > chi2 = Log likelihood = Pseudo R2 = lowbw Odds Ratio Std. Err. z P> z [95% Conf. Interval] smoked age married _Ieduc5_ _Ieduc5_ _Ieduc5_ _Ieduc5_ _Irace4_ _Irace4_ _Irace4_ log close; log: c:\bill\jpsm\natal95.log log type: text closed on: 4 Nov 2004, 05:48:39 * this example is attributed to jeff smith from; * the economics department at michigan. the data; * set contains a sample of 1500 females who; * participated in the job training partnership act program; * each respondent could have received one of 4 job training; * services. 1=classroom training. 2=on the job training; * 3= job search assistance, 4=other; 186

STATA Program for Ordered Probit Models sr_health_status.do

* the data for this example are adults, 18-64;
* who answered the cancer control supplement to;
* the 1994 national health interview survey;
* the key outcome is self reported health status;
* coded 1-5, poor, fair, good, very good, excellent;
* a key covariate is current smoking status and whether;
* one smoked 5 years ago;

# delimit;
set memory 20m;
set matsize 200;
set more off;
log using c:\bill\jpsm\sr_health_status.log,replace;

* load up sas data set;
use c:\bill\jpsm\sr_health_status;

* get contents of data file;
desc;

* get summary statistics;
sum;

* get tabulation of sr_health;
tab sr_health;

* run OLS models, just to look at the raw correlations in data;
reg sr_health male age educ famincl black othrace smoke smoke5;

* do ordered probit, self reported health status;
oprobit sr_health male age educ famincl black othrace smoke smoke5;

* get marginal effects, evaluated at y=5 (excellent);
mfx compute, predict(outcome(5));

* get marginal effects, evaluated at y=3 (good);
mfx compute, predict(outcome(3));

* use prchange, evaluate marginal effects for;
* 40 year old white female with a college degree;
* never smoked with average log income;
prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);

log close;
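The `mfx compute, predict(outcome(j))` calls in this program report how each covariate shifts the probability of one particular health category. As a hedged sketch in standard notation (the cut-point symbols $\mu_j$ are an assumption introduced here; they play the role of the `_cut` parameters in the `oprobit` output):

```latex
\begin{align*}
\Pr(\text{sr\_health}=j \mid x) &= \Phi(\mu_j - x'\beta) - \Phi(\mu_{j-1} - x'\beta),
   \qquad j = 1,\dots,5, \quad \mu_0 = -\infty,\ \mu_5 = +\infty \\[4pt]
\frac{\partial \Pr(\text{sr\_health}=j \mid x)}{\partial x_k}
   &= \left[\phi(\mu_{j-1} - x'\beta) - \phi(\mu_j - x'\beta)\right]\beta_k
\end{align*}
```

A positive $\beta_k$ always raises the probability of the top category ($j=5$) and lowers that of the bottom category ($j=1$); for middle categories such as $j=3$ the sign depends on where $x'\beta$ sits relative to the cut points, which is why the program asks for the marginal effects outcome by outcome.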

22 STATA Results for Ordered Probit Models sr_health_status.log log: c:\bill\iadb\sr_health_status.log log type: text opened on: 1 Nov 2004, 12:06:56. * load up sas data set;. use sr_health_status;. * get contents of data file;. desc; Contains data from sr_health_status.dta obs: 12,900 vars: 9 1 Nov :51 size: 322,500 (98.5% of memory free) > - storage display value variable name type format label variable label > - male byte %9.0g =1 if male age byte %9.0g age in years educ byte %9.0g years of education smoke byte %9.0g current smoker smoke5 byte %9.0g smoked in past 5 years black float %9.0g =1 if respondent is black othrace float %9.0g =1 if other race (white is ref) sr_health float %9.0g 1-5 self reported health, 5=excel, 1=poor famincl float %9.0g log family income > - Sorted by:. * get summary statistics;. sum; Variable Obs Mean Std. Dev. Min Max male age educ smoke smoke black othrace sr_health famincl * get tabulation of sr_health;. tab sr_health; 188

23 1-5 self reported health, 5=excel, 1=poor Freq. Percent Cum , , , Total 12, * run OLS models, just to look at the raw correlations in data;. reg sr_health male age educ famincl black othrace smoke smoke5; Source SS df MS Number of obs = F( 8, 12891) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = sr_health Coef. Std. Err. t P> t [95% Conf. Interval] male age educ famincl black othrace smoke smoke _cons * do ordered probit, self reported health status;. oprobit sr_health male age educ famincl black othrace smoke smoke5; Iteration 0: log likelihood = Iteration 1: log likelihood = Iteration 2: log likelihood = Iteration 3: log likelihood = Ordered probit estimates Number of obs = LR chi2(8) = Prob > chi2 = Log likelihood = Pseudo R2 = sr_health Coef. Std. Err. z P> z [95% Conf. Interval] male age

24 educ famincl black othrace smoke smoke _cut (Ancillary parameters) _cut _cut _cut * get marginal effects, evaluated at y=5 (excellent);. mfx compute, predict(outcome(5)); Marginal effects after oprobit y = Pr(sr_health==5) (predict, outcome(5)) = variable dy/dx Std. Err. z P> z [ 95% C.I. ] X male* age educ famincl black* othrace* smoke* smoke5* (*) dy/dx is for discrete change of dummy variable from 0 to 1. * get marginal effects, evaluated at y=3 (good);. mfx compute, predict(outcome(3)); Marginal effects after oprobit y = Pr(sr_health==3) (predict, outcome(3)) = variable dy/dx Std. Err. z P> z [ 95% C.I. ] X male* age educ famincl black* othrace* smoke* smoke5* (*) dy/dx is for discrete change of dummy variable from 0 to 1. * use prchange, evaluate marginal effects for;. * 40 year old white female with a college degree;. * never smoked with average log income;. prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16); 190

25 oprobit: Changes in Predicted Probabilities for sr_health male Avg Chg > > age Avg Chg Min->Max / sd/ MargEfct Min->Max / sd/ MargEfct educ Avg Chg Min->Max / sd/ MargEfct Min->Max / sd/ MargEfct famincl Avg Chg Min->Max / sd/ MargEfct Min->Max / sd/ MargEfct black Avg Chg > othrace 5 0->

26 Avg Chg > > smoke Avg Chg > > smoke5 Avg Chg > > Pr(y x) male age educ famincl black othrace smoke x= sd(x)= smoke5 x= 0 sd(x)= log close; log: c:\bill\iadb\sr_health_status.log log type: text closed on: 1 Nov 2004, 12:07:40 192

STATA Program for Multinomial Logit Model Job_training_example.do

* set end of line marker;
# delimit;
set more off;

* increase memory;
set memory 20m;

* write results to file;
log using c:\bill\jpsm\job_training_example.log,replace;

* load up sas data set;
use c:\bill\jpsm\job_training_example;

* get contents of data file;
desc;

* get summary statistics;
sum;

* get frequency of choice variable;
tab choice;

* run multinomial logit. omitted groups are;
* whites, those with > 12 years of ed, those w/ work experience;
* base(#) tells STATA what category should be the reference option;
* base(4) is using other as the reference group;
mlogit choice age black hisp nvrwrk lths hsgrad, base(4);

* get marginal effects for the 4 options, on the job training;
mfx compute, predict(outcome(1));
mfx compute, predict(outcome(2));
mfx compute, predict(outcome(3));
mfx compute, predict(outcome(4));

* test for IIA using the Hausman test;
* the program eliminates one choice at ;
* a time then compares the unrestricted;
* estimates to the restricted ones;
mlogtest, hausman;

log close;
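The role of `base(4)` in the `mlogit` command can be seen from the choice probabilities. A minimal sketch in standard notation (the symbols $\beta_j$ are introduced here for illustration; the normalization $\beta_4 = 0$ is what choosing outcome 4 as the reference group amounts to):

```latex
\begin{align*}
\Pr(\text{choice}=j \mid x) &= \frac{\exp(x'\beta_j)}{\sum_{m=1}^{4}\exp(x'\beta_m)},
   \qquad j = 1,\dots,4, \quad \beta_4 = 0 \\[4pt]
\ln\frac{\Pr(\text{choice}=j \mid x)}{\Pr(\text{choice}=4 \mid x)} &= x'\beta_j
\end{align*}
```

Each block of reported coefficients is therefore a log-odds contrast against "other" (outcome 4). Because the ratio of any two choice probabilities does not involve the remaining alternatives, the model imposes the IIA assumption that `mlogtest, hausman` examines.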

28 STATA Reults for Multinomial Logit Model Job_training_example.log log: c:\bill\jpsm\job_training_example.log log type: text opened on: 27 May 2006, 06:15:58. * load up sas data set;. use c:\bill\jpsm\job_training_example;. * get contents of data file;. desc; Contains data from c:\bill\jpsm\job_training_example.dta obs: 1,500 vars: 9 17 May :09 size: 24,000 (99.9% of memory free) > - storage display value variable name type format label variable label > - pid long %10.0g personal ID number age byte %4.0f age in years lths byte %9.0g =1 if education < hs grad hsgrad byte %9.0g =1 if education is 12 years gths byte %9.0g =1 of education is > 12 years black byte %9.0g =1 if black, =0 otherwise hisp byte %9.0g =1 if hispanic, =0 otherwise nvrwrk byte %9.0g =1 if never worked, =0 otherwise choice byte %9.0g > - Sorted by:. * get summary statistics;. sum; Variable Obs Mean Std. Dev. Min Max pid age lths hsgrad gths black hisp nvrwrk choice * get frequency of choice variable;. tab choice; 194

29 choice Freq. Percent Cum Total 1, * run multinomial logit. omitted groups are;. * whites, those with > 12 years of ed, those w/ work experience;. * base(#) tells STATA what category should be the reference option;. * base(4) is using other as the reference group;. mlogit choice age black hisp nvrwrk lths hsgrad, base(4); Iteration 0: log likelihood = Iteration 1: log likelihood = Iteration 2: log likelihood = Iteration 3: log likelihood = Iteration 4: log likelihood = Multinomial logistic regression Number of obs = 1500 LR chi2(18) = Prob > chi2 = Log likelihood = Pseudo R2 = choice Coef. Std. Err. z P> z [95% Conf. Interval] 1 age black hisp nvrwrk lths hsgrad _cons age black hisp nvrwrk lths hsgrad _cons age black hisp nvrwrk lths hsgrad _cons (Outcome choice==4 is the comparison group) 195

30 . * get marginal effects for the 4 options, on the job training;. mfx compute, predict(outcome(1)); Marginal effects after mlogit y = Pr(choice==1) (predict, outcome(1)) = variable dy/dx Std. Err. z P> z [ 95% C.I. ] X age black* hisp* nvrwrk* lths* hsgrad* (*) dy/dx is for discrete change of dummy variable from 0 to 1. mfx compute, predict(outcome(2)); Marginal effects after mlogit y = Pr(choice==2) (predict, outcome(2)) = variable dy/dx Std. Err. z P> z [ 95% C.I. ] X age black* hisp* nvrwrk* lths* hsgrad* (*) dy/dx is for discrete change of dummy variable from 0 to 1. mfx compute, predict(outcome(3)); Marginal effects after mlogit y = Pr(choice==3) (predict, outcome(3)) = variable dy/dx Std. Err. z P> z [ 95% C.I. ] X age black* hisp* nvrwrk* lths* hsgrad* (*) dy/dx is for discrete change of dummy variable from 0 to 1. mfx compute, predict(outcome(4)); Marginal effects after mlogit y = Pr(choice==4) (predict, outcome(4)) 196

31 = variable dy/dx Std. Err. z P> z [ 95% C.I. ] X age black* hisp* nvrwrk* lths* hsgrad* (*) dy/dx is for discrete change of dummy variable from 0 to 1. * test for IIA using the Hausam test;. * the program eliminates one choice at ;. * a time then compares the unrestricted;. * estimates to the restricted ones;. mlogtest, hausman; **** Hausman tests of IIA assumption Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives. Omitted chi2 df P>chi2 evidence for Ho for Ho for Ho log close; log: c:\bill\jpsm\job_training_example.log log type: text closed on: 27 May 2006, 06:17:03 197

STATA Program for Conditional and Mixed Logit Models Travel_choice_example.do

* set end of line marker;
# delimit;
set more off;

* increase memory;
set memory 20m;

* write results to file;
log using c:\bill\jpsm\travel_choice_example.log,replace;

* load up sas data set;
use c:\bill\jpsm\travel_choice_example;

* get contents of data file;
desc;

* get summary statistics;
sum;

* get frequency of options;
tab choice;

* construct dummy variables for intercepts;
* with j choices, need j-1 options;
gen air=mode==1;
gen train=mode==2;
gen bus=mode==3;
gen car=mode==4;

* interact hhinc and group size with choice dummies;
gen hhinc_air=air*hhinc;
gen hhinc_train=train*hhinc;
gen hhinc_bus=bus*hhinc;

* if mode of transportation is a car, costs are costs;
* if mode is bus/train/air, costs are grp_size x costs;
gen group_costs=car*costs+(1-car)*groupsize*costs;

* get means by choices;
sum time group_costs if mode==1;
sum time group_costs if mode==2;
sum time group_costs if mode==3;
sum time group_costs if mode==4;

* run mcfaddens choice model. for covariates add;
* a) j-1 option dummies;
* b) variables that vary by choice;

clogit choice air train bus time group_costs, group(hhid);

* run another model but add;
* c) income interacted w/ choice dummies;
clogit choice air train bus time group_costs hhinc_*, group(hhid);

* print out odds ratios;
listcoef;

* in this section we simulate the change in the;
* choices if we increase the travel time;
* by car by 30 minutes;

* get the predicted probabilities given original;
* values of Xs;
predict pred0;

* for mode=4, add 30 minutes;
replace time=time+30 if mode==4;

* get new predicted probabilities with new time;
predict pred30;

* change in probabilities;
gen change_p=pred30-pred0;

* get means of change in probs;
sum change_p if mode==1;
sum change_p if mode==2;
sum change_p if mode==3;
sum change_p if mode==4;

* before you forget, change time back to;
* original value;
replace time=time-30 if mode==4;

log close;
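The simulation at the end of this program relies on the conditional logit choice probabilities. As a rough sketch in standard notation (the symbols $z_{ij}$, $\alpha_j$, and $\gamma$ are introduced here for illustration; $i$ indexes households and $j$ the four travel modes):

```latex
\begin{align*}
\Pr(\text{household } i \text{ chooses mode } j)
   &= \frac{\exp(\alpha_j + z_{ij}'\gamma)}{\sum_{m=1}^{4}\exp(\alpha_m + z_{im}'\gamma)},
   \qquad \alpha_{\text{car}} = 0 \\[4pt]
\Delta p_{ij} &= p_{ij}\!\left(\text{time}_{i,\text{car}} + 30\right) - p_{ij}\!\left(\text{time}_{i,\text{car}}\right)
\end{align*}
```

Here $z_{ij}$ collects the attributes that vary by mode (`time`, `group_costs`), and the $\alpha_j$ are the `air`, `train`, and `bus` intercepts with car as the omitted option. The second line is what `change_p` stores: the probabilities sum to one within each household, so the changes across the four modes sum to zero.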

34 STATA Results for Conditional and Mixed Logit Models Travel_choice_example.log log: c:\bill\jpsm\travel_choice_example.log log type: text opened on: 27 May 2006, 07:42:17. * load up sas data set;. use c:\bill\jpsm\travel_choice_example;. * get contents of data file;. desc; Contains data from c:\bill\jpsm\travel_choice_example.dta obs: 840 vars: 7 17 May :08 size: 11,760 (99.9% of memory free) > - storage display value variable name type format label variable label > - hhid int %8.0g household ID mode byte %8.0g 1=air, 2=train, 3=bus, 4=car choice byte %8.0g =1 if choice, =0 otherwise time int %8.0g travel time in minutes costs int %8.0g travel costs in dollars hhinc byte %8.0g household income (x1000) groupsize byte %8.0g # of people in traveling party > - Sorted by:. * get summary statistics;. sum; Variable Obs Mean Std. Dev. Min Max hhid mode choice time costs hhinc groupsize * get freqency of options;. tab choice; =1 if choice, =0 otherwise Freq. Percent Cum

35 Total * construct dummy variables for intercepts;. * with j choices, need j-1 options;. gen air=mode==1;. gen train=mode==2;. gen bus=mode==3;. gen car=mode==4;. * interact hhinc and group size with choice dummies;. gen hhinc_air=air*hhinc;. gen hhinc_train=train*hhinc;. gen hhinc_bus=bus*hhinc;. * if mode of transportation is a car, costs are costs;. * if mode is bus/train/air, costs are grp_size x costs;. gen group_costs=car*costs+(1-car)*groupsize*costs;. * get means by choices;. sum time group_costs if mode==1; Variable Obs Mean Std. Dev. Min Max time group_costs sum time group_costs if mode==2; Variable Obs Mean Std. Dev. Min Max time group_costs sum time group_costs if mode==3; Variable Obs Mean Std. Dev. Min Max time group_costs sum time group_costs if mode==4; Variable Obs Mean Std. Dev. Min Max time group_costs * run mcfaddens choice model. for covariates add;. * a) j-1 option dummies; 201
