STATA Program for OLS cps87_or.do

* the data for this project is a small subsample;
* of full time (30 or more hours) male workers;
* aged 21-64 from the outgoing rotation;
* samples of the 1987 current population survey;

* this line defines the semicolon as the ;
* end of line delimiter;
# delimit ;

* set memory for 10 meg;
set memory 10m;

* write results to a log file;
* the replace option writes over old;
* log files;
log using cps87_or.log,replace;

* open stata data set;
use c:\bill\stata\cps87_or;

* list variables and labels in data set;
desc;

* generate new variables;
* lines 1-2 illustrate basic math functions;
* lines 3-4 illustrate logical operators;
* line 5 illustrates the OR statement;
* line 6 illustrates the AND statement;
* after you construct new variables, compress the data again;
gen age2=age*age;
gen earnwkl=ln(earnwke);
gen union=unionm==1;
gen topcode=earnwke==999;
gen nonwhite=((race==2)|(race==3));
gen big_ne=((region==1)&(smsa==1));

* label the data;
label var age2 "age squared";
label var earnwkl "log earnings per week";
label var topcode "=1 if earnwkl is topcoded";
label var union "1=in union, 0 otherwise";
label var nonwhite "1=nonwhite, 0=white" ;
label var big_ne "1= live in big smsa from northeast, 0=otherwise";

* get descriptive statistics;
sum;

* get detailed descriptives for continuous variables;
sum earnwke, detail;

* get frequencies of discrete variables;
tabulate unionm;
tabulate race;

* get two-way table of frequencies;
tabulate region smsa, row column cell;

* run simple regression;
reg earnwkl age age2 educ nonwhite union;

* run regression adding smsa, region and race fixed-effects;
* the xi command constructs the dummies for you;
* the lowest numbered dummy is usually the;
* omitted variable;
xi: reg earnwkl age age2 educ union i.race i.region i.smsa;

more;

* close log file;
log close;
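The OR and AND indicators built in the program can be sanity-checked against the frequencies that the results log reports. What follows is a minimal Python sketch, not part of the original do-file: the counts are copied from the tabulate output in the log, and the Python variable names are illustrative assumptions.

```python
# Counts copied from the cps87_or results log (tabulate output).
n_obs = 19906
race_counts = {1: 17103, 2: 1642, 3: 1161}  # tabulate race
region1_smsa1 = 2806                         # cell with region==1 and smsa==1
union_members = 4597                         # unionm==1 in tabulate unionm

# nonwhite = (race==2)|(race==3): OR of two 0/1 conditions
mean_nonwhite = (race_counts[2] + race_counts[3]) / n_obs

# big_ne = (region==1)&(smsa==1): AND of two 0/1 conditions
mean_big_ne = region1_smsa1 / n_obs

# union = (unionm==1): a single logical test
mean_union = union_members / n_obs

print(f"mean nonwhite = {mean_nonwhite:.7f}")
print(f"mean big_ne   = {mean_big_ne:.7f}")
print(f"mean union    = {mean_union:.7f}")
```

The three shares can be compared with the means that sum reports for nonwhite, big_ne and union in the log.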

STATA Results for OLS cps87_or.log

       log:  c:\bill\stata\cps87_or.log
  log type:  text
 opened on:  6 Nov 2004, 08:14:10

. * open stata data set;
. use c:\bill\stata\cps87_or;

. * list variables and labels in data set;
. desc;

Contains data from c:\bill\stata\cps87_or.dta
  obs:        19,906
 vars:             7                          6 Nov 2004 08:11
 size:       636,992 (93.9% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
age             float  %9.0g                  age in years
race            float  %9.0g                  1=white, non-hisp, 2=place, n.h, 3=hisp
educ            float  %9.0g                  years of education
unionm          float  %9.0g                  1=union member, 2=otherwise
smsa            float  %9.0g                  1=live in 19 largest smsa, 2=other smsa, 3=non smsa
region          float  %9.0g                  1=east, 2=midwest, 3=south, 4=west
earnwke         float  %9.0g                  usual weekly earnings
------------------------------------------------------------------------------
Sorted by:

. * generate new variables;
. * lines 1-2 illustrate basic math functions;
. * lines 3-4 illustrate logical operators;
. * line 5 illustrates the OR statement;
. * line 6 illustrates the AND statement;
. * after you construct new variables, compress the data again;
. gen age2=age*age;
. gen earnwkl=ln(earnwke);
. gen union=unionm==1;
. gen topcode=earnwke==999;
. gen nonwhite=((race==2)|(race==3));
. gen big_ne=((region==1)&(smsa==1));

. * label the data;
. label var age2 "age squared";
. label var earnwkl "log earnings per week";
. label var topcode "=1 if earnwkl is topcoded";
. label var union "1=in union, 0 otherwise";
. label var nonwhite "1=nonwhite, 0=white" ;
. label var big_ne "1= live in big smsa from northeast, 0=otherwise";

. compress;
age was float now byte
race was float now byte
educ was float now byte
unionm was float now byte
smsa was float now byte
region was float now byte
earnwke was float now int
age2 was float now int
union was float now byte
topcode was float now byte
nonwhite was float now byte
big_ne was float now byte

. more;

. * get descriptive statistics;
. sum;

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
         age |   19906    37.96619   11.15348         21         64
        race |   19906    1.199136    .525493          1          3
        educ |   19906    13.16126   2.795234          0         18
      unionm |   19906    1.769065   .4214418          1          2
        smsa |   19906    1.908369   .7955814          1          3
-------------+------------------------------------------------------
      region |   19906    2.462373   1.079514          1          4
     earnwke |   19906     488.264   236.4713         60        999
        age2 |   19906    1565.826   912.4383        441       4096
     earnwkl |   19906    6.067307    .513047   4.094345   6.906755
       union |   19906    .2309354   .4214418          0          1
-------------+------------------------------------------------------
     topcode |   19906    .0719381   .2583919          0          1
    nonwhite |   19906    .1408118   .3478361          0          1
      big_ne |   19906    .1409625   .3479916          0          1

. * get detailed descriptives for continuous variables;
. sum earnwke, detail;

                    usual weekly earnings
-------------------------------------------------------------
      Percentiles      Smallest
 1%          128             60
 5%          178             60
10%          210             60       Obs               19906
25%          300             63       Sum of Wgt.       19906

50%          449                      Mean            488.264
                        Largest       Std. Dev.      236.4713
75%          615            999
90%          865            999       Variance        55918.7
95%          999            999       Skewness        .668646
99%          999            999       Kurtosis       2.632356

. more;

. * get frequencies of discrete variables;
. tabulate unionm;

    1=union |
    member, |
2=otherwise |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      4,597       23.09       23.09
          2 |     15,309       76.91      100.00
------------+-----------------------------------
      Total |     19,906      100.00

. tabulate race;

   1=white, |
  non-hisp, |
   2=place, |
n.h, 3=hisp |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     17,103       85.92       85.92
          2 |      1,642        8.25       94.17
          3 |      1,161        5.83      100.00
------------+-----------------------------------
      Total |     19,906      100.00

. more;

. * get two-way table of frequencies;
. tabulate region smsa, row column cell;

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
|  cell percentage  |
+-------------------+

   1=east, |
2=midwest, |    1=live in 19 largest smsa,
  3=south, |    2=other smsa, 3=non smsa
    4=west |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |     2,806      1,349        842 |     4,997
           |     56.15      27.00      16.85 |    100.00
           |     38.46      18.89      15.39 |     25.10
           |     14.10       6.78       4.23 |     25.10
-----------+---------------------------------+----------
         2 |     1,501      1,742      1,592 |     4,835
           |     31.04      36.03      32.93 |    100.00
           |     20.58      24.40      29.10 |     24.29
           |      7.54       8.75       8.00 |     24.29
-----------+---------------------------------+----------
         3 |     1,501      2,542      1,904 |     5,947
           |     25.24      42.74      32.02 |    100.00
           |     20.58      35.60      34.80 |     29.88
           |      7.54      12.77       9.56 |     29.88
-----------+---------------------------------+----------
         4 |     1,487      1,507      1,133 |     4,127
           |     36.03      36.52      27.45 |    100.00
           |     20.38      21.11      20.71 |     20.73
           |      7.47       7.57       5.69 |     20.73
-----------+---------------------------------+----------
     Total |     7,295      7,140      5,471 |    19,906
           |     36.65      35.87      27.48 |    100.00
           |    100.00     100.00     100.00 |    100.00
           |     36.65      35.87      27.48 |    100.00

. more;

. * run simple regression;
. reg earnwkl age age2 educ nonwhite union;

      Source |       SS       df       MS          Number of obs =   19906
-------------+------------------------------       F(  5, 19900) = 1775.70
       Model |  1616.39963     5  323.279927       Prob > F      =  0.0000
    Residual |  3622.93905 19900  .182057239       R-squared     =  0.3085
-------------+------------------------------       Adj R-squared =  0.3083
       Total |  5239.33869 19905  .263217216       Root MSE      =  .42668

------------------------------------------------------------------------------
     earnwkl |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0679808   .0020033    33.93   0.000    .0640542    .0719075
        age2 |  -.0006778   .0000245   -27.69   0.000   -.0007258   -.0006299
        educ |    .069219   .0011256    61.50   0.000    .0670127    .0714252
    nonwhite |  -.1716133   .0089118   -19.26   0.000   -.1890812   -.1541453
       union |   .1301547   .0072923    17.85   0.000    .1158613    .1444481
       _cons |   3.630805   .0394126    92.12   0.000    3.553553    3.708057
------------------------------------------------------------------------------

. more;

. * run regression adding smsa, region and race fixed-effects;
. * the xi command constructs the dummies for you;
. * the lowest numbered dummy is usually the;
. * omitted variable;
. xi: reg earnwkl age age2 educ union i.race i.region i.smsa;
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
i.region          _Iregion_1-4        (naturally coded; _Iregion_1 omitted)
i.smsa            _Ismsa_1-3          (naturally coded; _Ismsa_1 omitted)

      Source |       SS       df       MS          Number of obs =   19906
-------------+------------------------------       F( 11, 19894) =  920.86
       Model |  1767.66908    11  160.697189       Prob > F      =  0.0000
    Residual |  3471.66961 19894  .174508375       R-squared     =  0.3374
-------------+------------------------------       Adj R-squared =  0.3370
       Total |  5239.33869 19905  .263217216       Root MSE      =  .41774

------------------------------------------------------------------------------
     earnwkl |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .070194   .0019645    35.73   0.000    .0663435    .0740446
        age2 |  -.0007052    .000024   -29.37   0.000   -.0007522   -.0006581
        educ |   .0643064   .0011285    56.98   0.000    .0620944    .0665184
       union |   .1131485    .007257    15.59   0.000    .0989241    .1273729
    _Irace_2 |  -.2329794   .0110958   -21.00   0.000    -.254728   -.2112308
    _Irace_3 |  -.1795253   .0134073   -13.39   0.000   -.2058047   -.1532458
  _Iregion_2 |  -.0088962   .0085926    -1.04   0.301   -.0257383     .007946
  _Iregion_3 |  -.0281747    .008443    -3.34   0.001   -.0447238   -.0116257
  _Iregion_4 |   .0318053   .0089802     3.54   0.000    .0142034    .0494071
    _Ismsa_2 |  -.1225607   .0072078   -17.00   0.000   -.1366886   -.1084328
    _Ismsa_3 |  -.2054124   .0078651   -26.12   0.000   -.2208287   -.1899961
       _cons |    3.76812   .0391241    96.31   0.000    3.691434    3.844807
------------------------------------------------------------------------------

. more;

. * close log file;
. log close;
       log:  c:\bill\stata\cps87_or.log
  log type:  text
 closed on:  6 Nov 2004, 08:14:19
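
The summary statistics printed in the header of the first regression table can be rebuilt from its sums of squares. The Python sketch below is an illustration added for this purpose, not part of the original program; all inputs are copied from the log and the variable names are assumptions. The last step computes the age at which the fitted quadratic age profile peaks, which is an implication of the point estimates and not a result reported in the log.

```python
# Inputs copied from the first "reg earnwkl ..." table in the log.
ss_model, df_model = 1616.39963, 5
ss_resid, df_resid = 3622.93905, 19900

ss_total = ss_model + ss_resid
r_squared = ss_model / ss_total                          # share of variation explained
f_stat = (ss_model / df_model) / (ss_resid / df_resid)   # MS model / MS residual
root_mse = (ss_resid / df_resid) ** 0.5                  # sqrt of residual mean square

# Quadratic in age: b_age*age + b_age2*age^2 peaks at -b_age/(2*b_age2)
b_age, b_age2 = 0.0679808, -0.0006778
peak_age = -b_age / (2 * b_age2)

print(f"R-squared = {r_squared:.4f}")
print(f"F         = {f_stat:.2f}")
print(f"Root MSE  = {root_mse:.5f}")
print(f"peak age  = {peak_age:.1f}")
```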

STATA Program for Probit/Logit Models workplace.do

* the data for this program are a random sample;
* of 10k observations from the data used in;
* evans, farrelly and montgomery, aer, 1999;
* the data are indoor workers in the 1991 and 1993;
* national health interview survey. the survey;
* identifies whether the worker smoked and whether;
* the worker faces a workplace smoking ban;

* set semi colon as the end of line;
# delimit;

* ask it NOT to pause;
set more off;

* open log file;
log using c:\bill\jpsm\workplace1.log,replace;

* use the workplace data set;
use c:\bill\jpsm\workplace1;

* print out variable labels;
desc;

* get summary statistics;
sum;

* run a linear probability model for comparison purposes;
* estimate white standard errors to control for heteroskedasticity;
reg smoker age incomel male black hispanic
hsgrad somecol college worka, robust;

* run probit model;
probit smoker age incomel male black hispanic
hsgrad somecol college worka;

* predict probability of smoking;
predict pred_prob_smoke;

* get detailed descriptive data about predicted prob;
sum pred_prob, detail;

* predict binary outcome with 50% cutoff;
gen pred_smoke1=pred_prob_smoke>=.5;
label variable pred_smoke1 "predicted smoking, 50% cutoff";

* compare actual values;
tab smoker pred_smoke1, row col cell;

* ask for marginal effects/treatment effects;
mfx compute;

* the same type of variables can be produced with;
* prchange. this command is however more flexible;
* in that you can change the reference individual;
prchange, help;

* get marginal effect/treatment effects for specific person;
* male, age 40, college educ, white, without workplace smoking ban;
* if a variable is not specified, its value is assumed to be;
* the sample mean. in this case, the only variable i am not;
* listing is mean log income;
prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);

* using a wald test, test the null hypothesis that;
* all the education coefficients are zero;
test hsgrad somecol college;

* how to run the same test with a -2 log like test;
* estimate the unrestricted model and save the estimates ;
* in urmodel;
probit smoker age incomel male black hispanic
hsgrad somecol college worka;
estimates store urmodel;

* estimate the restricted model. save results in rmodel;
probit smoker age incomel male black hispanic
worka;
estimates store rmodel;
lrtest urmodel rmodel;

* run logit model;
logit smoker age incomel male black hispanic
hsgrad somecol college worka;

* ask for marginal effects/treatment effects;
* logit model;
mfx compute;

log close;
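
The -2 log likelihood test in the program compares the log likelihoods of the unrestricted and restricted probit models. As a rough illustration of the arithmetic behind lrtest, the Python sketch below recomputes the statistic by hand; the three log likelihood values are copied from the workplace results log, and the variable names are assumptions made for this example.

```python
# Log likelihoods copied from the workplace1 log.
ll_null = -9171.443          # iteration 0 (constant-only model)
ll_unrestricted = -8761.7208 # probit with hsgrad, somecol, college
ll_restricted = -9022.1031   # probit without the education dummies

# LR statistic: twice the gap between unrestricted and restricted fits
lr_stat = 2 * (ll_unrestricted - ll_restricted)
lr_df = 3                    # three education coefficients set to zero

# The same formula against the constant-only model gives the LR chi2(9)
# printed in the probit header, and the pseudo R2 uses the same two numbers.
lr_overall = 2 * (ll_unrestricted - ll_null)
pseudo_r2 = 1 - ll_unrestricted / ll_null

print(f"LR chi2({lr_df}) = {lr_stat:.2f}")
print(f"LR chi2(9) = {lr_overall:.2f}")
print(f"Pseudo R2  = {pseudo_r2:.4f}")
```

The first figure can be compared with the lrtest line in the log, and the other two with the header of the unrestricted probit output.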

STATA Results for Probit/Logit Models workplace.log

       log:  c:\bill\jpsm\workplace1.log
  log type:  text
 opened on:  4 Nov 2004, 07:29:21

. * use the workplace data set;
. use c:\bill\jpsm\workplace1;

. * print out variable labels;
. desc;

Contains data from c:\bill\jpsm\workplace1.dta
  obs:        16,258
 vars:            10                          28 Oct 2004 05:27
 size:       325,160 (96.9% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
smoker          byte   %9.0g                  is current smoking
worka           byte   %9.0g                  has workplace smoking bans
age             byte   %9.0g                  age in years
male            byte   %9.0g                  male
black           byte   %9.0g                  black
hispanic        byte   %9.0g                  hispanic
incomel         float  %9.0g                  log income
hsgrad          byte   %9.0g                  is hs graduate
somecol         byte   %9.0g                  has some college
college         float  %9.0g
------------------------------------------------------------------------------
Sorted by:

. * get summary statistics;
. sum;

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
      smoker |   16258      .25163    .433963          0          1
       worka |   16258    .6851396   .4644745          0          1
         age |   16258    38.54742   11.96189         18         87
        male |   16258    .3947595    .488814          0          1
       black |   16258    .1119449   .3153083          0          1
-------------+------------------------------------------------------
    hispanic |   16258    .0607086   .2388023          0          1
     incomel |   16258    10.42097   .7624525   6.214608   11.22524
      hsgrad |   16258    .3355271   .4721889          0          1
     somecol |   16258    .2685447   .4432161          0          1
     college |   16258    .3293763   .4700012          0          1

. * run a linear probability model for comparison purposes;

. * estimate white standard errors to control for heteroskedasticity;
. reg smoker age incomel male black hispanic
> hsgrad somecol college worka, robust;

Regression with robust standard errors             Number of obs =   16258
                                                   F(  9, 16248) =   99.26
                                                   Prob > F      =  0.0000
                                                   R-squared     =  0.0488
                                                   Root MSE      =  .42336

------------------------------------------------------------------------------
             |               Robust
      smoker |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0004776   .0002806    -1.70   0.089   -.0010276    .0000725
     incomel |  -.0287361   .0047823    -6.01   0.000     -.03811   -.0193621
        male |   .0168615   .0069542     2.42   0.015    .0032305    .0304926
       black |  -.0356723   .0110203    -3.24   0.001   -.0572732   -.0140714
    hispanic |   -.070582   .0136691    -5.16   0.000    -.097375    -.043789
      hsgrad |  -.0661429   .0162279    -4.08   0.000   -.0979514   -.0343345
     somecol |  -.1312175   .0164726    -7.97   0.000   -.1635056   -.0989293
     college |  -.2406109   .0162568   -14.80   0.000    -.272476   -.2087459
       worka |   -.066076   .0074879    -8.82   0.000    -.080753    -.051399
       _cons |   .7530714   .0494255    15.24   0.000    .6561919    .8499509
------------------------------------------------------------------------------

. * run probit model;
. probit smoker age incomel male black hispanic
> hsgrad somecol college worka;

Iteration 0:   log likelihood =  -9171.443
Iteration 1:   log likelihood =  -8764.068
Iteration 2:   log likelihood = -8761.7211
Iteration 3:   log likelihood = -8761.7208

Probit estimates                                   Number of obs =   16258
                                                   LR chi2(9)    =  819.44
                                                   Prob > chi2   =  0.0000
Log likelihood = -8761.7208                        Pseudo R2     =  0.0447

------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0012684   .0009316    -1.36   0.173   -.0030943    .0005574
     incomel |   -.092812   .0151496    -6.13   0.000   -.1225047   -.0631193
        male |   .0533213   .0229297     2.33   0.020    .0083799    .0982627
       black |  -.1060518    .034918    -3.04   0.002     -.17449   -.0376137
    hispanic |  -.2281468   .0475128    -4.80   0.000   -.3212701   -.1350235
      hsgrad |  -.1748765   .0436392    -4.01   0.000   -.2604078   -.0893453
     somecol |   -.363869   .0451757    -8.05   0.000   -.4524118   -.2753262
     college |  -.7689528   .0466418   -16.49   0.000    -.860369   -.6775366
       worka |  -.2093287   .0231425    -9.05   0.000   -.2546873   -.1639702
       _cons |    .870543    .154056     5.65   0.000    .5685989    1.172487
------------------------------------------------------------------------------

. * predict probability of smoking;
. predict pred_prob_smoke;

(option p assumed; Pr(smoker))

. * get detailed descriptive data about predicted prob;
. sum pred_prob, detail;

                         Pr(smoker)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0959301       .0615221
 5%     .1155022       .0622963
10%     .1237434       .0633929       Obs               16258
25%     .1620851       .0733495       Sum of Wgt.       16258

50%     .2569962                      Mean           .2516653
                        Largest       Std. Dev.      .0960007
75%     .3187975       .5619798
90%     .3795704       .5655878       Variance       .0092161
95%     .4039573       .5684112       Skewness       .1520254
99%     .4672697       .6203823       Kurtosis       2.149247

. * predict binary outcome with 50% cutoff;
. gen pred_smoke1=pred_prob_smoke>=.5;
. label variable pred_smoke1 "predicted smoking, 50% cutoff";

. * compare actual values;
. tab smoker pred_smoke1, row col cell;

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
|  cell percentage  |
+-------------------+

           |  predicted smoking,
is current |      50% cutoff
   smoking |         0          1 |     Total
-----------+----------------------+----------
         0 |    12,153         14 |    12,167
           |     99.88       0.12 |    100.00
           |     74.93      35.90 |     74.84
           |     74.75       0.09 |     74.84
-----------+----------------------+----------
         1 |     4,066         25 |     4,091
           |     99.39       0.61 |    100.00
           |     25.07      64.10 |     25.16
           |     25.01       0.15 |     25.16
-----------+----------------------+----------
     Total |    16,219         39 |    16,258
           |     99.76       0.24 |    100.00
           |    100.00     100.00 |    100.00
           |     99.76       0.24 |    100.00

. * ask for marginal effects/treatment effects;
. mfx compute;

Marginal effects after probit
      y  = Pr(smoker) (predict)
         =  .24093439
------------------------------------------------------------------------------
variable |      dy/dx   Std. Err.      z    P>|z|   [    95% C.I.   ]        X
---------+--------------------------------------------------------------------
     age |  -.0003951     .00029   -1.36   0.173   -.000964   .000174  38.5474
 incomel |  -.0289139     .00472   -6.13   0.000    -.03816  -.019668   10.421
   male* |   .0166757      .0072    2.32   0.021    .002568   .030783   .39476
  black* |  -.0320621     .01023   -3.13   0.002   -.052111  -.012013  .111945
hispanic*|  -.0658551     .01259   -5.23   0.000   -.090536  -.041174  .060709
 hsgrad* |   -.053335     .01302   -4.10   0.000    -.07885   -.02782  .335527
somecol* |  -.1062358     .01228   -8.65   0.000   -.130308  -.082164  .268545
college* |  -.2149199     .01146  -18.76   0.000   -.237378  -.192462  .329376
  worka* |  -.0668959     .00756   -8.84   0.000    -.08172  -.052072   .68514
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * the same type of variables can be produced with;
. * prchange. this command is however more flexible;
. * in that you can change the reference individual;
. prchange, help;

probit: Changes in Predicted Probabilities for smoker

           min->max      0->1     -+1/2    -+sd/2  MargEfct
     age    -0.0269   -0.0004   -0.0004   -0.0047   -0.0004
 incomel    -0.1589   -0.0361   -0.0289   -0.0220   -0.0289
    male     0.0167    0.0167    0.0166    0.0081    0.0166
   black    -0.0321   -0.0321   -0.0330   -0.0104   -0.0330
hispanic    -0.0659   -0.0659   -0.0710   -0.0170   -0.0711
  hsgrad    -0.0533   -0.0533   -0.0544   -0.0257   -0.0545
 somecol    -0.1062   -0.1062   -0.1130   -0.0502   -0.1134
 college    -0.2149   -0.2149   -0.2366   -0.1123   -0.2396
   worka    -0.0669   -0.0669   -0.0652   -0.0303   -0.0652

                0        1
Pr(y|x)    0.7591   0.2409

             age   incomel      male     black  hispanic    hsgrad   somecol
    x=   38.5474    10.421    .39476   .111945   .060709   .335527   .268545
sd(x)=   11.9619   .762452   .488814   .315308   .238802   .472189   .443216

         college     worka
    x=   .329376    .68514
sd(x)=   .470001   .464475

 Pr(y|x): probability of observing each y for specified x values
Avg|Chg|: average of absolute value of the change across categories
Min->Max: change in predicted probability as x changes from its minimum to its maximum
    0->1: change in predicted probability as x changes from 0 to 1
   -+1/2: change in predicted probability as x changes from 1/2 unit below base value to 1/2 unit above
  -+sd/2: change in predicted probability as x changes from 1/2 standard dev below base to 1/2 standard dev above
MargEfct: the partial derivative of the predicted probability/rate with respect to a given independent variable

. * get marginal effect/treatment effects for specific person;
. * male, age 40, college educ, white, without workplace smoking ban;
. * if a variable is not specified, its value is assumed to be;
. * the sample mean. in this case, the only variable i am not;
. * listing is mean log income;
. prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);

probit: Changes in Predicted Probabilities for smoker

           min->max      0->1     -+1/2    -+sd/2  MargEfct
     age    -0.0323   -0.0005   -0.0005   -0.0056   -0.0005
 incomel    -0.1795   -0.0320   -0.0344   -0.0263   -0.0345
    male     0.0198    0.0198    0.0198    0.0097    0.0198
   black    -0.0385   -0.0385   -0.0394   -0.0124   -0.0394
hispanic    -0.0804   -0.0804   -0.0845   -0.0202   -0.0847
  hsgrad    -0.0625   -0.0625   -0.0648   -0.0306   -0.0649
 somecol    -0.1235   -0.1235   -0.1344   -0.0598   -0.1351
 college    -0.2644   -0.2644   -0.2795   -0.1335   -0.2854
   worka    -0.0742   -0.0742   -0.0776   -0.0361   -0.0777

                0        1
Pr(y|x)    0.6479   0.3521

             age   incomel      male     black  hispanic    hsgrad   somecol
    x=        40    10.421    .39476         0         0         0         0
sd(x)=   11.9619   .762452   .488814   .315308   .238802   .472189   .443216

         college     worka
    x=   .329376         0
sd(x)=   .470001   .464475

. * using a wald test, test the null hypothesis that;
. * all the education coefficients are zero;
. test hsgrad somecol college;

 ( 1)  hsgrad = 0
 ( 2)  somecol = 0
 ( 3)  college = 0

           chi2(  3) =  504.78
         Prob > chi2 =  0.0000

. * how to run the same test with a -2 log like test;
. * estimate the unrestricted model and save the estimates ;
. * in urmodel;
. probit smoker age incomel male black hispanic
> hsgrad somecol college worka;

Iteration 0:   log likelihood =  -9171.443
Iteration 1:   log likelihood =  -8764.068
Iteration 2:   log likelihood = -8761.7211
Iteration 3:   log likelihood = -8761.7208

Probit estimates                                   Number of obs =   16258
                                                   LR chi2(9)    =  819.44
                                                   Prob > chi2   =  0.0000
Log likelihood = -8761.7208                        Pseudo R2     =  0.0447

------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0012684   .0009316    -1.36   0.173   -.0030943    .0005574
     incomel |   -.092812   .0151496    -6.13   0.000   -.1225047   -.0631193
        male |   .0533213   .0229297     2.33   0.020    .0083799    .0982627
       black |  -.1060518    .034918    -3.04   0.002     -.17449   -.0376137
    hispanic |  -.2281468   .0475128    -4.80   0.000   -.3212701   -.1350235
      hsgrad |  -.1748765   .0436392    -4.01   0.000   -.2604078   -.0893453
     somecol |   -.363869   .0451757    -8.05   0.000   -.4524118   -.2753262
     college |  -.7689528   .0466418   -16.49   0.000    -.860369   -.6775366
       worka |  -.2093287   .0231425    -9.05   0.000   -.2546873   -.1639702
       _cons |    .870543    .154056     5.65   0.000    .5685989    1.172487
------------------------------------------------------------------------------

. estimates store urmodel;

. * estimate the restricted model. save results in rmodel;
. probit smoker age incomel male black hispanic
> worka;

Iteration 0:   log likelihood =  -9171.443
Iteration 1:   log likelihood = -9022.2473
Iteration 2:   log likelihood = -9022.1031

Probit estimates                                   Number of obs =   16258
                                                   LR chi2(6)    =  298.68
                                                   Prob > chi2   =  0.0000
Log likelihood = -9022.1031                        Pseudo R2     =  0.0163

------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0003514   .0009163     0.38   0.701   -.0014445    .0021473
     incomel |  -.1802868   .0143242   -12.59   0.000   -.2083617    -.152212
        male |  -.0117546   .0223519    -0.53   0.599   -.0555635    .0320543
       black |  -.0650982   .0345516    -1.88   0.060   -.1328181    .0026217
    hispanic |   -.152071   .0465132    -3.27   0.001   -.2432351   -.0609069
       worka |  -.2501544   .0227794   -10.98   0.000   -.2948012   -.2055076
       _cons |    1.37729   .1472574     9.35   0.000     1.08867    1.665909
------------------------------------------------------------------------------

. estimates store rmodel;
. lrtest urmodel rmodel;

likelihood-ratio test                              LR chi2(3)  =    520.76
(Assumption: rmodel nested in urmodel)             Prob > chi2 =    0.0000

. * run logit model;
. logit smoker age incomel male black hispanic
> hsgrad somecol college worka;

Iteration 0:   log likelihood =  -9171.443
Iteration 1:   log likelihood = -8770.6512
Iteration 2:   log likelihood = -8760.9282
Iteration 3:   log likelihood = -8760.9112

Logit estimates                                    Number of obs =   16258
                                                   LR chi2(9)    =  821.06
                                                   Prob > chi2   =  0.0000
Log likelihood = -8760.9112                        Pseudo R2     =  0.0448

------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0026236   .0015594    -1.68   0.092   -.0056799    .0004327
     incomel |  -.1518663   .0251899    -6.03   0.000   -.2012376    -.102495
        male |   .0942472   .0390171     2.42   0.016    .0177751    .1707192
       black |   -.196468   .0598366    -3.28   0.001   -.3137456   -.0791904
    hispanic |  -.4024453   .0825043    -4.88   0.000   -.5641507   -.2407399
      hsgrad |  -.2906189   .0707661    -4.11   0.000    -.429318   -.1519199
     somecol |  -.6092455    .073822    -8.25   0.000   -.7539339   -.4645571
     college |  -1.325203   .0780572   -16.98   0.000   -1.478192   -1.172214
       worka |  -.3508271   .0389286    -9.01   0.000   -.4271257   -.2745285
       _cons |   1.467936    .255991     5.73   0.000    .9662025    1.969669
------------------------------------------------------------------------------

. * ask for marginal effects/treatment effects;
. * logit model;
. mfx compute;

Marginal effects after logit
      y  = Pr(smoker) (predict)
         =  .23812502
------------------------------------------------------------------------------
variable |      dy/dx   Std. Err.      z    P>|z|   [    95% C.I.   ]        X
---------+--------------------------------------------------------------------
     age |   -.000476     .00028   -1.68   0.092    -.00103   .000078  38.5474
 incomel |  -.0275518     .00457   -6.03   0.000     -.0365  -.018604   10.421
   male* |   .0171866     .00715    2.40   0.016    .003174     .0312   .39476
  black* |  -.0342102     .00998   -3.43   0.001   -.053765  -.014655  .111945
hispanic*|  -.0661959     .01217   -5.44   0.000   -.090044  -.042347  .060709
 hsgrad* |  -.0513887     .01219   -4.22   0.000   -.075278    -.0275  .335527
somecol* |   -.102284     .01141   -8.97   0.000   -.124644  -.079924  .268545
college* |  -.2120833      .0108  -19.64   0.000   -.233248  -.190919  .329376
  worka* |  -.0657566      .0075   -8.76   0.000   -.080464   -.05105   .68514
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. log close;
       log:  c:\bill\jpsm\workplace1.log
  log type:  text
 closed on:  4 Nov 2004, 07:30:16
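
For the continuous covariates, the marginal effects reported by mfx can be recovered from the coefficient tables. The Python sketch below is an added illustration under the standard formulas: for probit the coefficient is scaled by the normal density at the index, and for logit by p(1-p). The predicted probabilities and coefficients are copied from the log; the variable names are assumptions. Dummy variables are left out because mfx reports a discrete 0 to 1 change for them rather than a derivative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

# Probit: dy/dx = pdf(x'b) * b, with x'b recovered from Pr(smoker) at the means
p_probit = 0.24093439
index_at_means = std_normal.inv_cdf(p_probit)
scale_probit = std_normal.pdf(index_at_means)
probit_coef = {"age": -0.0012684, "incomel": -0.092812}
me_probit = {name: scale_probit * b for name, b in probit_coef.items()}

# Logit: dy/dx = p * (1 - p) * b
p_logit = 0.23812502
scale_logit = p_logit * (1 - p_logit)
logit_coef = {"age": -0.0026236, "incomel": -0.1518663}
me_logit = {name: scale_logit * b for name, b in logit_coef.items()}

for name in probit_coef:
    print(f"{name:8s} probit {me_probit[name]:.7f}  logit {me_logit[name]:.7f}")
```

Although the probit and logit coefficients differ in scale, the implied marginal effects are close to each other and to the linear probability model estimates.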

STATA Program for Odds Ratio in Logit Models natal95.do

* this data set is a small 0.5% random sample;
* of observations from the 1995 natality detail;
* data. we will examine the impact of smoking;
* on birth weight. two large states, NY and CA, do not;
* record mothers smoking status. therefore, of the ;
* 4 million births in the US, only 3 million have all;
* the necessary data so there should be 3 million*.005;
* or roughly 15,000 obs;

* set semi colon as the end of line;
# delimit;

* ask it NOT to pause;
set more off;

* open log file;
log using c:\bill\jpsm\natal95.log,replace;

* use the natality detail data set;
use c:\bill\jpsm\natal95;

* print out variable labels;
desc;

* construct indicator for low birth weight;
gen lowbw=birthw<=2500;
label variable lowbw "dummy variable, =1 ifbw<2500 grams";

* get frequencies;
tab lowbw smoked, col row cell;

* run a logit model;
xi: logit lowbw smoked age married i.educ5 i.race4;

* get marginal effects;
mfx compute;

* run a logit but report the odds ratios instead;
xi: logistic lowbw smoked age married i.educ5 i.race4;

log close;
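
The logit and logistic commands in this program fit the same model; logistic reports exponentiated coefficients. The Python sketch below is an added illustration of that relationship, using a few coefficients and one confidence interval copied from the natal95 results log; the dictionary and variable names are assumptions.

```python
from math import exp

# Logit coefficients copied from the natal95 log (subset for illustration).
logit_coef = {
    "smoked": 0.6740651,
    "age": 0.0080537,
    "married": -0.3954044,
}

# Odds ratio = exp(logit coefficient)
odds_ratio = {name: exp(b) for name, b in logit_coef.items()}

# Confidence limits transform the same way: exponentiate each endpoint.
ci_smoked_logit = (0.4980861, 0.8500441)
ci_smoked_or = tuple(exp(limit) for limit in ci_smoked_logit)

for name, value in odds_ratio.items():
    print(f"{name:8s} odds ratio = {value:.6f}")
print(f"smoked 95% CI = ({ci_smoked_or[0]:.6f}, {ci_smoked_or[1]:.5f})")
```

An odds ratio above 1 (smoked) corresponds to a positive logit coefficient and one below 1 (married) to a negative coefficient; the z statistics and p-values are identical across the two tables.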

STATA Results for Odds Ratio in Logit Models natal95.log

       log:  c:\bill\jpsm\natal95.log
  log type:  text
 opened on:  4 Nov 2004, 05:48:05

. * use the natality detail data set;
. use c:\bill\jpsm\natal95;

. * print out variable labels;
. desc;

Contains data from c:\bill\jpsm\natal95.dta
  obs:        14,230
 vars:             7                          27 Oct 2004 14:58
 size:       170,760 (98.4% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
birthw          int    %9.0g                  birth weight in grams
smoked          byte   %9.0g                  =1 if mom smoked during pregnancy
age             byte   %9.0g                  moms age at birth
married         byte   %9.0g                  =1 if married
race4           byte   %9.0g                  1=white,2=black,3=asian,4=other
educ5           byte   %9.0g                  1=0-8, 2=9-11, 3=12, 4=13-15, 5=16+
visits          byte   %9.0g                  prenatal visits
------------------------------------------------------------------------------
Sorted by:

. * construct indicator for low birth weight;
. gen lowbw=birthw<=2500;
. label variable lowbw "dummy variable, =1 ifbw<2500 grams";

. * get frequencies;
. tab lowbw smoked, col row cell;

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
|  cell percentage  |
+-------------------+

     dummy |
 variable, |
        =1 |   =1 if mom smoked
 ifbw<2500 |   during pregnancy
     grams |         0          1 |     Total
-----------+----------------------+----------
         0 |    11,626      1,745 |    13,371
           |     86.95      13.05 |    100.00
           |     94.64      89.72 |     93.96
           |     81.70      12.26 |     93.96
-----------+----------------------+----------
         1 |       659        200 |       859
           |     76.72      23.28 |    100.00
           |      5.36      10.28 |      6.04
           |      4.63       1.41 |      6.04
-----------+----------------------+----------
     Total |    12,285      1,945 |    14,230
           |     86.33      13.67 |    100.00
           |    100.00     100.00 |    100.00
           |     86.33      13.67 |    100.00

. * run a logit model;
. xi: logit lowbw smoked age married i.educ5 i.race4;
i.educ5           _Ieduc5_1-5         (naturally coded; _Ieduc5_1 omitted)
i.race4           _Irace4_1-4         (naturally coded; _Irace4_1 omitted)

Iteration 0:   log likelihood =  -3244.039
Iteration 1:   log likelihood = -3149.3534
Iteration 2:   log likelihood = -3137.0703
Iteration 3:   log likelihood = -3136.9913
Iteration 4:   log likelihood = -3136.9912

Logit estimates                                    Number of obs =   14230
                                                   LR chi2(10)   =  214.10
                                                   Prob > chi2   =  0.0000
Log likelihood = -3136.9912                        Pseudo R2     =  0.0330

------------------------------------------------------------------------------
       lowbw |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smoked |   .6740651   .0897869     7.51   0.000    .4980861    .8500441
         age |   .0080537    .006791     1.19   0.236   -.0052564    .0213638
     married |  -.3954044   .0882471    -4.48   0.000   -.5683654   -.2224433
   _Ieduc5_2 |  -.1949335   .1626502    -1.20   0.231   -.5137221    .1238551
   _Ieduc5_3 |  -.1925099   .1543239    -1.25   0.212   -.4949791    .1099594
   _Ieduc5_4 |  -.4057382   .1676759    -2.42   0.016   -.7343769   -.0770994
   _Ieduc5_5 |  -.3569715   .1780322    -2.01   0.045   -.7059081   -.0080349
   _Irace4_2 |   .7072894   .0875125     8.08   0.000    .5357681    .8788107
   _Irace4_3 |    .386623    .307062     1.26   0.208   -.2152075    .9884535
   _Irace4_4 |   .3095536   .2047899     1.51   0.131   -.0918271    .7109344
       _cons |  -2.755971   .2104916   -13.09   0.000   -3.168527   -2.343415
------------------------------------------------------------------------------

. * get marginal effects;
. mfx compute;

Marginal effects after logit
      y  = Pr(lowbw) (predict)
         =  .05465609
------------------------------------------------------------------------------
variable |      dy/dx   Std. Err.      z    P>|z|   [    95% C.I.   ]        X
---------+--------------------------------------------------------------------
 smoked* |   .0436744     .00706    6.18   0.000    .029834   .057514  .136683
     age |   .0004161     .00035    1.19   0.236   -.000271   .001104  26.6564
married* |  -.0218806      .0052   -4.21   0.000   -.032074  -.011687  .683204
_Ieduc~2*|  -.0095123     .00749   -1.27   0.204   -.024188   .005164  .165495
_Ieduc~3*|  -.0096965     .00758   -1.28   0.201   -.024554   .005161  .345397
_Ieduc~4*|  -.0190499     .00714   -2.67   0.008   -.033043  -.005057   .22319
_Ieduc~5*|  -.0169077     .00771   -2.19   0.028   -.032027  -.001788  .216093
_Irace~2*|   .0453844     .00675    6.72   0.000    .032148   .058621   .17168
_Irace~3*|   .0236917     .02204    1.07   0.282   -.019506    .06689  .010401
_Irace~4*|    .018225     .01363    1.34   0.181   -.008488   .044938  .031694
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * run a logit but report the odds ratios instead;
. xi: logistic lowbw smoked age married i.educ5 i.race4;
i.educ5           _Ieduc5_1-5         (naturally coded; _Ieduc5_1 omitted)
i.race4           _Irace4_1-4         (naturally coded; _Irace4_1 omitted)

Logistic regression                                Number of obs =   14230
                                                   LR chi2(10)   =  214.10
                                                   Prob > chi2   =  0.0000
Log likelihood = -3136.9912                        Pseudo R2     =  0.0330

------------------------------------------------------------------------------
       lowbw | Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smoked |   1.962198   .1761796     7.51   0.000    1.645569     2.33975
         age |   1.008086   .0068459     1.19   0.236    .9947574    1.021594
     married |   .6734077   .0594262    -4.48   0.000    .5664506    .8005604
   _Ieduc5_2 |   .8228894   .1338431    -1.20   0.231    .5982646    1.131852
   _Ieduc5_3 |   .8248862   .1272996    -1.25   0.212    .6095837    1.116233
   _Ieduc5_4 |   .6664847   .1117534    -2.42   0.016    .4798043    .9257979
   _Ieduc5_5 |   .6997924   .1245856    -2.01   0.045    .4936601    .9919973
   _Irace4_2 |   2.028485   .1775178     8.08   0.000     1.70876    2.408034
   _Irace4_3 |   1.472001   .4519957     1.26   0.208    .8063741    2.687076
   _Irace4_4 |   1.362817   .2790911     1.51   0.131    .9122628    2.035893
------------------------------------------------------------------------------

. log close;
       log:  c:\bill\jpsm\natal95.log
  log type:  text
 closed on:  4 Nov 2004, 05:48:39

* this example is attributed to jeff smith from;
* the economics department at michigan. the data;
* set contains a sample of 1500 females who;
* participated in the job training partnership act program;
* each respondent could have received one of 4 job training;
* services. 1=classroom training. 2=on the job training;
* 3= job search assistance, 4=other;

STATA Program for Ordered Probit Models sr_health_status.do

* the data for this example are adults, 18-64;
* who answered the cancer control supplement to;
* the 1994 national health interview survey;
* the key outcome is self reported health status;
* coded 1-5, poor, fair, good, very good, excellent;
* a key covariate is current smoking status and whether;
* one smoked 5 years ago;

# delimit;
set memory 20m;
set matsize 200;
set more off;
log using c:\bill\jpsm\sr_health_status.log,replace;

* load up sas data set;
use c:\bill\jpsm\sr_health_status;

* get contents of data file;
desc;

* get summary statistics;
sum;

* get tabulation of sr_health;
tab sr_health;

* run OLS models, just to look at the raw correlations in data;
reg sr_health male age educ famincl black othrace smoke smoke5;

* do ordered probit, self reported health status;
oprobit sr_health male age educ famincl black othrace smoke smoke5;

* get marginal effects, evaluated at y=5 (excellent);
mfx compute, predict(outcome(5));

* get marginal effects, evaluated at y=3 (good);
mfx compute, predict(outcome(3));

* use prchange, evaluate marginal effects for;
* 40 year old white female with a college degree;
* never smoked with average log income;
prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);

log close;
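
The outcome probabilities that mfx evaluates after oprobit follow from the estimated cut points and the index at the sample means. The Python sketch below is an added illustration under the usual ordered probit formula, Pr(y=j) = F(cut_j - xb) - F(cut_(j-1) - xb); the coefficients, cut points and means are copied from the results log, and the function and variable names are assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

# oprobit coefficients and sample means (X column of mfx), copied from the log
coef = {"male": 0.1281241, "age": -0.0202308, "educ": 0.0827086,
        "famincl": 0.2398957, "black": -0.221508, "othrace": -0.2425083,
        "smoke": -0.2086096, "smoke5": -0.1529619}
x_bar = {"male": 0.438062, "age": 39.8412, "educ": 13.2402,
         "famincl": 10.2131, "black": 0.124264, "othrace": 0.04124,
         "smoke": 0.289147, "smoke5": 0.081395}
cuts = [0.4858634, 1.269036, 2.247251, 3.094606]  # _cut1 .. _cut4

# Index evaluated at the sample means
xb = sum(coef[name] * x_bar[name] for name in coef)

def outcome_prob(j):
    """Pr(sr_health == j) at the means, for j = 1..5."""
    lower = std_normal.cdf(cuts[j - 2] - xb) if j > 1 else 0.0
    upper = std_normal.cdf(cuts[j - 1] - xb) if j < 5 else 1.0
    return upper - lower

# Marginal effect of a continuous variable on Pr(y=5): pdf(cut4 - xb) * b
me_age_excellent = std_normal.pdf(cuts[3] - xb) * coef["age"]

print(f"Pr(y=5) = {outcome_prob(5):.5f}")
print(f"Pr(y=3) = {outcome_prob(3):.5f}")
print(f"dPr(y=5)/d(age) = {me_age_excellent:.7f}")
```

The two probabilities can be compared with the y = Pr(sr_health==5) and y = Pr(sr_health==3) lines that mfx prints in the log, and the derivative with the dy/dx entry for age at outcome 5.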

STATA Results for Ordered Probit Models sr_health_status.log

       log:  c:\bill\iadb\sr_health_status.log
  log type:  text
 opened on:  1 Nov 2004, 12:06:56

. * load up sas data set;
. use sr_health_status;

. * get contents of data file;
. desc;

Contains data from sr_health_status.dta
  obs:        12,900
 vars:             9                          1 Nov 2004 11:51
 size:       322,500 (98.5% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
male            byte   %9.0g                  =1 if male
age             byte   %9.0g                  age in years
educ            byte   %9.0g                  years of education
smoke           byte   %9.0g                  current smoker
smoke5          byte   %9.0g                  smoked in past 5 years
black           float  %9.0g                  =1 if respondent is black
othrace         float  %9.0g                  =1 if other race (white is ref)
sr_health       float  %9.0g                  1-5 self reported health, 5=excel, 1=poor
famincl         float  %9.0g                  log family income
------------------------------------------------------------------------------
Sorted by:

. * get summary statistics;
. sum;

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
        male |   12900     .438062   .4961681          0          1
         age |   12900    39.84124   11.60603         21         64
        educ |   12900    13.24016    2.73325          0         18
       smoke |   12900    .2891473    .453384          0          1
      smoke5 |   12900    .0813953   .2734519          0          1
-------------+------------------------------------------------------
       black |   12900    .1242636   .3298948          0          1
     othrace |   12900    .0412403   .1988532          0          1
   sr_health |   12900    3.888992   1.063713          1          5
     famincl |   12900    10.21313     .95086   6.214608   11.22524

. * get tabulation of sr_health;
. tab sr_health;

   1-5 self |
   reported |
    health, |
   5=excel, |
     1=poor |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        342        2.65        2.65
          2 |        991        7.68       10.33
          3 |      3,068       23.78       34.12
          4 |      3,855       29.88       64.00
          5 |      4,644       36.00      100.00
------------+-----------------------------------
      Total |     12,900      100.00

. * run OLS models, just to look at the raw correlations in data;
. reg sr_health male age educ famincl black othrace smoke smoke5;

      Source |       SS       df       MS              Number of obs =   12900
-------------+------------------------------           F(  8, 12891) =  350.85
       Model |  2609.62058     8  326.202572           Prob > F      =  0.0000
    Residual |  11985.4163 12891  .929750704           R-squared     =  0.1788
-------------+------------------------------           Adj R-squared =  0.1783
       Total |  14595.0369 12899  1.13148592           Root MSE      =  .96424

------------------------------------------------------------------------------
   sr_health |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .1033877   .0172399     6.00   0.000     .0695949    .1371804
         age |  -.0189687   .0007472   -25.39   0.000    -.0204333   -.0175041
        educ |    .074539   .0033897    21.99   0.000     .0678946    .0811833
     famincl |   .2299388   .0099542    23.10   0.000     .2104271    .2494504
       black |  -.2127016   .0265726    -8.00   0.000    -.2647878   -.1606153
     othrace |  -.2120907   .0429632    -4.94   0.000    -.2963049   -.1278765
       smoke |  -.1800193   .0196221    -9.17   0.000    -.2184815   -.1415572
      smoke5 |  -.1356116   .0317119    -4.28   0.000    -.1977716   -.0734515
       _cons |   1.362405   .1005616    13.55   0.000     1.165289     1.55952
------------------------------------------------------------------------------

. * do ordered probit, self reported health status;
. oprobit sr_health male age educ famincl black othrace smoke smoke5;

Iteration 0:   log likelihood = -17591.791
Iteration 1:   log likelihood = -16403.785
Iteration 2:   log likelihood = -16401.987
Iteration 3:   log likelihood = -16401.987

Ordered probit estimates                          Number of obs   =      12900
                                                  LR chi2(8)      =    2379.61
                                                  Prob > chi2     =     0.0000
Log likelihood = -16401.987                       Pseudo R2       =     0.0676

------------------------------------------------------------------------------
   sr_health |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .1281241   .0195747     6.55   0.000     .0897583    .1664899
         age |  -.0202308   .0008499   -23.80   0.000    -.0218966    -.018565
        educ |   .0827086   .0038547    21.46   0.000     .0751535    .0902637
     famincl |   .2398957   .0112206    21.38   0.000     .2179037    .2618878
       black |   -.221508    .029528    -7.50   0.000    -.2793818   -.1636341
     othrace |  -.2425083   .0480047    -5.05   0.000    -.3365958   -.1484208
       smoke |  -.2086096   .0219779    -9.49   0.000    -.2516855   -.1655337
      smoke5 |  -.1529619   .0357995    -4.27   0.000    -.2231277   -.0827961
-------------+----------------------------------------------------------------
       _cut1 |   .4858634    .113179          (Ancillary parameters)
       _cut2 |   1.269036     .11282
       _cut3 |   2.247251   .1138171
       _cut4 |   3.094606   .1145781
------------------------------------------------------------------------------

. * get marginal effects, evaluated at y=5 (excellent);
. mfx compute, predict(outcome(5));

Marginal effects after oprobit
      y  = Pr(sr_health==5) (predict, outcome(5))
         =  .34103717
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
   male* |   .0471251      .00722    6.53   0.000    .03298   .06127   .438062
     age |  -.0074214      .00031  -23.77   0.000  -.008033  -.00681   39.8412
    educ |   .0303405      .00142   21.42   0.000   .027565  .033116   13.2402
 famincl |   .0880025      .00412   21.37   0.000    .07993  .096075   10.2131
  black* |  -.0781411      .00996   -7.84   0.000  -.097665 -.058617   .124264
othrace* |  -.0843227      .01567   -5.38   0.000  -.115043 -.053602    .04124
  smoke* |  -.0749785      .00773   -9.71   0.000   -.09012 -.059837   .289147
 smoke5* |  -.0545062      .01235   -4.41   0.000  -.078719 -.030294   .081395
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * get marginal effects, evaluated at y=3 (good);
. mfx compute, predict(outcome(3));

Marginal effects after oprobit
      y  = Pr(sr_health==3) (predict, outcome(3))
         =  .25239744
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
   male* |  -.0276959      .00425   -6.51   0.000  -.036029 -.019363   .438062
     age |   .0043717       .0002   21.81   0.000   .003979  .004765   39.8412
    educ |  -.0178727      .00089  -20.02   0.000  -.019623 -.016123   13.2402
 famincl |  -.0518395      .00261  -19.85   0.000  -.056959  -.04672   10.2131
  black* |   .0464219      .00599    7.75   0.000   .034675  .058169   .124264
othrace* |   .0501493      .00934    5.37   0.000   .031834  .068464    .04124
  smoke* |   .0443735      .00464    9.56   0.000   .035272  .053476   .289147
 smoke5* |   .0323707      .00739    4.38   0.000   .017882   .04686   .081395
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * use prchange, evaluate marginal effects for;
. * 40 year old white female with a college degree;
. * never smoked with average log income;
. prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);

oprobit: Changes in Predicted Probabilities for sr_health

male
            Avg|Chg|           1           2           3           4
    0->1    .0203868   -.0020257  -.00886671  -.02677558  -.01329902

                   5
    0->1   .05096698

age
            Avg|Chg|           1           2           3           4
Min->Max   .13358317    .0184785   .06797072   .17686112   .07064757
   -+1/2   .00321942   .00032518   .00141642   .00424452   .00206241
  -+sd/2   .03728014   .00382077   .01648743   .04910323    .0237889
MargEfct   .00321947   .00032515   .00141639   .00424462   .00206252

                   5
Min->Max  -.33395794
   -+1/2  -.00804856
  -+sd/2  -.09320036
MargEfct  -.00804868

educ
            Avg|Chg|           1           2           3           4
Min->Max   .21397413  -.10945692  -.19725057  -.22822781   .07974288
   -+1/2   .01315829  -.00133136  -.00579271  -.01734608  -.00842556
  -+sd/2   .03589903   -.0036753  -.01587057  -.04728749  -.02291423
MargEfct   .01316202   -.0013293  -.00579057  -.01735309  -.00843208

                   5
Min->Max   .45519245
   -+1/2   .03289571
  -+sd/2   .08974758
MargEfct   .03290504

famincl
            Avg|Chg|           1           2           3           4
Min->Max   .16759798  -.05486112  -.13623201  -.22790183   .00276569
   -+1/2   .03808549  -.00390581  -.01684746  -.05016185  -.02429861
  -+sd/2   .03622223   -.0037093  -.01601486  -.04771243  -.02311897
MargEfct   .03817633  -.00385563   -.0167955  -.05033251  -.02445719

                   5
Min->Max   .41622926
   -+1/2   .09521371
  -+sd/2   .09055558
MargEfct   .09544083

black
            Avg|Chg|           1           2           3           4
    0->1   .03467907   .00473166   .01835598   .04779626   .01581377

                   5
    0->1  -.08669767

othrace
            Avg|Chg|           1           2           3           4
    0->1   .03787661   .00532324   .02040636   .05239134    .0165706

                   5
    0->1  -.09469151

smoke
            Avg|Chg|           1           2           3           4
    0->1   .03270518   .00438228   .01712416   .04497364   .01528287

                   5
    0->1  -.08176297

smoke5
            Avg|Chg|           1           2           3           4
    0->1   .02411037   .00299019     .012047   .03281575   .01242298

                   5
    0->1  -.06027591

                   1           2           3           4           5
 Pr(y|x)   .00563112   .03431748   .17979275   .30986777   .47039089

            male      age     educ  famincl    black  othrace    smoke
    x=   .438062       40       16  10.2131        0        0        0
sd(x)=   .496168   11.606  2.73325   .95086  .329895  .198853  .453384

          smoke5
    x=         0
sd(x)=   .273452

. log close;
       log:  c:\bill\iadb\sr_health_status.log
  log type:  text
 closed on:   1 Nov 2004, 12:07:40
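The predicted probabilities and marginal effects that mfx reports can be traced back to the oprobit coefficients and cut points by hand. The sketch below is only an illustration: every number is copied from the oprobit and sum output in this log, the scalar names (xb, p5, p3, me_age5) are made up for the example, and normal() and normalden() assume Stata 9 or later (earlier releases call them norm() and normd()).

```stata
* sketch only: all numbers are copied from the oprobit and sum output above;
# delimit ;
* linear index at the sample means, ordered probit has no constant term;
scalar xb = .1281241*.438062 - .0202308*39.84124 + .0827086*13.24016
          + .2398957*10.21313 - .221508*.1242636 - .2425083*.0412403
          - .2086096*.2891473 - .1529619*.0813953;
* Pr(y=5) is one minus the normal cdf evaluated at cut4 minus xb;
scalar p5 = 1 - normal(3.094606 - scalar(xb));
* Pr(y=3) is the normal cdf at cut3 minus xb, less the cdf at cut2 minus xb;
scalar p3 = normal(2.247251 - scalar(xb)) - normal(1.269036 - scalar(xb));
* marginal effect of age on Pr(y=5) is the age coefficient times the normal density;
scalar me_age5 = -.0202308*normalden(3.094606 - scalar(xb));
display "Pr(y=5) at means       = " p5;
display "Pr(y=3) at means       = " p3;
display "d Pr(y=5) / d age      = " me_age5;
```

If the numbers were copied correctly, the three results should land close to the .34103717, .25239744 and -.0074214 that mfx prints, up to rounding in the displayed coefficients.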

STATA Program for Multinomial Logit Model Job_training_example.do

* set end of line marker;
# delimit;
set more off;

* increase memory;
set memory 20m;

* write results to file;
log using c:\bill\jpsm\job_training_example.log,replace;

* load up sas data set;
use c:\bill\jpsm\job_training_example;

* get contents of data file;
desc;

* get summary statistics;
sum;

* get frequency of choice variable;
tab choice;

* run multinomial logit. omitted groups are;
* whites, those with > 12 years of ed, those w/ work experience;
* base(#) tells STATA what category should be the reference option;
* base(4) is using other as the reference group;
mlogit choice age black hisp nvrwrk lths hsgrad, base(4);

* get marginal effects for the 4 options, on the job training;
mfx compute, predict(outcome(1));
mfx compute, predict(outcome(2));
mfx compute, predict(outcome(3));
mfx compute, predict(outcome(4));

* test for IIA using the Hausman test;
* the program eliminates one choice at ;
* a time then compares the unrestricted;
* estimates to the restricted ones;
mlogtest, hausman;

log close;

STATA Results for Multinomial Logit Model Job_training_example.log

      log:  c:\bill\jpsm\job_training_example.log
 log type:  text
opened on:  27 May 2006, 06:15:58

. * load up sas data set;
. use c:\bill\jpsm\job_training_example;

. * get contents of data file;
. desc;

Contains data from c:\bill\jpsm\job_training_example.dta
  obs:         1,500
 vars:             9                          17 May 2006 15:09
 size:        24,000 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
pid             long   %10.0g                 personal ID number
age             byte   %4.0f                  age in years
lths            byte   %9.0g                  =1 if education < hs grad
hsgrad          byte   %9.0g                  =1 if education is 12 years
gths            byte   %9.0g                  =1 of education is > 12 years
black           byte   %9.0g                  =1 if black, =0 otherwise
hisp            byte   %9.0g                  =1 if hispanic, =0 otherwise
nvrwrk          byte   %9.0g                  =1 if never worked, =0 otherwise
choice          byte   %9.0g
-------------------------------------------------------------------------------
Sorted by:

. * get summary statistics;
. sum;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         pid |      1500      129138    19201.99     100139     167859
         age |      1500      32.904    9.241558         22         73
        lths |      1500    .3806667    .4857127          0          1
      hsgrad |      1500    .4393333    .4964714          0          1
        gths |      1500         .18    .3843156          0          1
-------------+--------------------------------------------------------
       black |      1500        .296    .4566432          0          1
        hisp |      1500    .1113333    .3146494          0          1
      nvrwrk |      1500    .1533333    .3604287          0          1
      choice |      1500    2.195333     1.19029          1          4

. * get frequency of choice variable;
. tab choice;

     choice |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        642       42.80       42.80
          2 |        225       15.00       57.80
          3 |        331       22.07       79.87
          4 |        302       20.13      100.00
------------+-----------------------------------
      Total |      1,500      100.00

. * run multinomial logit. omitted groups are;
. * whites, those with > 12 years of ed, those w/ work experience;
. * base(#) tells STATA what category should be the reference option;
. * base(4) is using other as the reference group;
. mlogit choice age black hisp nvrwrk lths hsgrad, base(4);

Iteration 0:   log likelihood = -1955.8922
Iteration 1:   log likelihood = -1889.2935
Iteration 2:   log likelihood = -1888.2987
Iteration 3:   log likelihood = -1888.2957
Iteration 4:   log likelihood = -1888.2957

Multinomial logistic regression                   Number of obs   =       1500
                                                  LR chi2(18)     =     135.19
                                                  Prob > chi2     =     0.0000
Log likelihood = -1888.2957                       Pseudo R2       =     0.0346

------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
         age |   .0071385   .0081098     0.88   0.379    -.0087564    .0230334
       black |   1.219628   .1833561     6.65   0.000     .8602566    1.578999
        hisp |   .0372041   .2238755     0.17   0.868    -.4015838     .475992
      nvrwrk |   .0747461    .190311     0.39   0.694    -.2982567    .4477489
        lths |  -.0084065   .2065292    -0.04   0.968    -.4131964    .3963833
      hsgrad |   .3780081   .2079569     1.82   0.069    -.0295799     .785596
       _cons |   .0295614   .3287135     0.09   0.928    -.6147052    .6738279
-------------+----------------------------------------------------------------
2            |
         age |    .008348   .0099828     0.84   0.403     -.011218    .0279139
       black |   .5236467   .2263064     2.31   0.021     .0800942    .9671992
        hisp |  -.8671109   .3589538    -2.42   0.016    -1.570647   -.1635743
      nvrwrk |   -.704571   .2840205    -2.48   0.013    -1.261241   -.1479011
        lths |  -.3472458   .2454952    -1.41   0.157    -.8284075    .1339159
      hsgrad |  -.0812244   .2454501    -0.33   0.741    -.5622979     .399849
       _cons |  -.3362433   .3981894    -0.84   0.398     -1.11668    .4441936
-------------+----------------------------------------------------------------
3            |
         age |    .030957   .0087291     3.55   0.000     .0138483    .0480657
       black |    .835996   .2102365     3.98   0.000     .4239399    1.248052
        hisp |   .5933104   .2372465     2.50   0.012     .1283157    1.058305
      nvrwrk |  -.6829221   .2432276    -2.81   0.005    -1.159639   -.2062047
        lths |  -.4399217   .2281054    -1.93   0.054        -.887    .0071566
      hsgrad |   .1041374   .2248972     0.46   0.643    -.3366529    .5449278
       _cons |  -.9863286   .3613369    -2.73   0.006    -1.694536   -.2781213
------------------------------------------------------------------------------
(Outcome choice==4 is the comparison group)

. * get marginal effects for the 4 options, on the job training;
. mfx compute, predict(outcome(1));

Marginal effects after mlogit
      y  = Pr(choice==1) (predict, outcome(1))
         =  .43659091
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |  -.0017587      .00146   -1.21   0.228  -.004618  .001101    32.904
  black* |    .179935      .03034    5.93   0.000   .120472  .239398      .296
   hisp* |  -.0204535      .04343   -0.47   0.638  -.105568  .064661   .111333
 nvrwrk* |   .1209001      .03702    3.27   0.001   .048352  .193448   .153333
   lths* |   .0615804      .03864    1.59   0.111  -.014162  .137323   .380667
 hsgrad* |   .0881309      .03679    2.40   0.017   .016015  .160247   .439333
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(outcome(2));

Marginal effects after mlogit
      y  = Pr(choice==2) (predict, outcome(2))
         =  .14782959
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |  -.0004167      .00102   -0.41   0.683  -.002415  .001582    32.904
  black* |  -.0422033      .01899   -2.22   0.026  -.079433 -.004974      .296
   hisp* |  -.1000902      .02168   -4.62   0.000  -.142578 -.057603   .111333
 nvrwrk* |  -.0648702      .02278   -2.85   0.004  -.109524 -.020217   .153333
   lths* |  -.0287375      .02424   -1.19   0.236  -.076244  .018769   .380667
 hsgrad* |  -.0376757      .02394   -1.57   0.115  -.084588  .009237   .439333
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(outcome(3));

Marginal effects after mlogit
      y  = Pr(choice==3) (predict, outcome(3))
         =  .22017632
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0043574      .00112    3.87   0.000   .002153  .006561    32.904
  black* |   .0006449      .02521    0.03   0.980  -.048765  .050055      .296
   hisp* |   .1365429      .04163    3.28   0.001   .054948  .218138   .111333
 nvrwrk* |  -.0932408      .02627   -3.55   0.000  -.144732  -.04175   .153333
   lths* |  -.0621007      .02926   -2.12   0.034  -.119449 -.004752   .380667
 hsgrad* |  -.0161374      .02891   -0.56   0.577  -.072798  .040523   .439333
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(outcome(4));

Marginal effects after mlogit
      y  = Pr(choice==4) (predict, outcome(4))
         =  .19540318
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   -.002182      .00116   -1.88   0.060  -.004459  .000095    32.904
  black* |  -.1383767      .02096   -6.60   0.000  -.179454 -.097299      .296
   hisp* |  -.0159992      .02986   -0.54   0.592  -.074524  .042525   .111333
 nvrwrk* |   .0372109       .0308    1.21   0.227  -.023149   .09757   .153333
   lths* |   .0292578      .03014    0.97   0.332  -.029808  .088324   .380667
 hsgrad* |  -.0343177      .02938   -1.17   0.243    -.0919  .023264   .439333
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * test for IIA using the Hausman test;
. * the program eliminates one choice at ;
. * a time then compares the unrestricted;
. * estimates to the restricted ones;
. mlogtest, hausman;

**** Hausman tests of IIA assumption

 Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

 Omitted |      chi2   df   P>chi2   evidence
---------+------------------------------------
       1 |    -5.283   14    1.000   for Ho
       2 |     0.353   14    1.000   for Ho
       3 |     2.041   14    1.000   for Ho
----------------------------------------------

. log close;
       log:  c:\bill\jpsm\job_training_example.log
  log type:  text
 closed on:  27 May 2006, 06:17:03
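The four base probabilities that mfx prints (the "y =" lines) are the multinomial logit probabilities evaluated at the sample means. As a check on how the coefficients map into probabilities, the sketch below rebuilds them by hand. It is only an illustration: every number is copied from the mlogit and sum output in this log, and the scalar names (xb1, xb2, xb3, denom, p1 to p4) are invented for the example.

```stata
* sketch only: coefficients and means are copied from the mlogit and sum output;
# delimit ;
* linear index for outcomes 1, 2 and 3 at the sample means;
scalar xb1 = .0295614 + .0071385*32.904 + 1.219628*.296 + .0372041*.1113333
           + .0747461*.1533333 - .0084065*.3806667 + .3780081*.4393333;
scalar xb2 = -.3362433 + .008348*32.904 + .5236467*.296 - .8671109*.1113333
           - .704571*.1533333 - .3472458*.3806667 - .0812244*.4393333;
scalar xb3 = -.9863286 + .030957*32.904 + .835996*.296 + .5933104*.1113333
           - .6829221*.1533333 - .4399217*.3806667 + .1041374*.4393333;
* the base outcome, choice 4, has its index normalized to zero;
scalar denom = 1 + exp(scalar(xb1)) + exp(scalar(xb2)) + exp(scalar(xb3));
scalar p1 = exp(scalar(xb1))/scalar(denom);
scalar p2 = exp(scalar(xb2))/scalar(denom);
scalar p3 = exp(scalar(xb3))/scalar(denom);
scalar p4 = 1/scalar(denom);
display "Pr(choice==1) = " p1;
display "Pr(choice==2) = " p2;
display "Pr(choice==3) = " p3;
display "Pr(choice==4) = " p4;
```

If the numbers were copied correctly, the results should be close to the .43659091, .14782959, .22017632 and .19540318 reported by the four mfx calls, and they sum to one by construction.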

STATA Program for Conditional and Mixed Logit Models Travel_choice_example.do

* set end of line marker;
# delimit;
set more off;

* increase memory;
set memory 20m;

* write results to file;
log using c:\bill\jpsm\travel_choice_example.log,replace;

* load up sas data set;
use c:\bill\jpsm\travel_choice_example;

* get contents of data file;
desc;

* get summary statistics;
sum;

* get frequency of options;
tab choice;

* construct dummy variables for intercepts;
* with j choices, need j-1 options;
gen air=mode==1;
gen train=mode==2;
gen bus=mode==3;
gen car=mode==4;

* interact hhinc and group size with choice dummies;
gen hhinc_air=air*hhinc;
gen hhinc_train=train*hhinc;
gen hhinc_bus=bus*hhinc;

* if mode of transportation is a car, costs are costs;
* if mode is bus/train/air, costs are grp_size x costs;
gen group_costs=car*costs+(1-car)*groupsize*costs;

* get means by choices;
sum time group_costs if mode==1;
sum time group_costs if mode==2;
sum time group_costs if mode==3;
sum time group_costs if mode==4;

* run mcfaddens choice model. for covariates add;
* a) j-1 option dummies;
* b) variables that vary by choice;
clogit choice air train bus time group_costs, group(hhid);

* run another model but add;
* c) income interacted w/ choice dummies;
clogit choice air train bus time group_costs hhinc_*, group(hhid);

* print out odds ratios;
listcoef;

* in this section we simulate the change in the;
* choices if we increase the travel time;
* by car by 30 minutes;

* get the predicted probabilities given original;
* values of Xs;
predict pred0;

* for mode=4, add 30 minutes;
replace time=time+30 if mode==4;

* get new predicted probabilities with new time;
predict pred30;

* change in probabilities;
gen change_p=pred30-pred0;

* get means of change in probs;
sum change_p if mode==1;
sum change_p if mode==2;
sum change_p if mode==3;
sum change_p if mode==4;

* before you forget, change time back to;
* original value;
replace time=time-30 if mode==4;

log close;
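The predicted probabilities used in the simulation come from the conditional logit formula, in which each mode's exponentiated index is divided by the sum of the exponentiated indexes over the four modes available to the household. A minimal sketch of that calculation follows. It assumes the second clogit model above has just been estimated on the same data, that the default prediction after clogit is the within-group choice probability, and that Stata 9 or later is in use (earlier releases spell the egen function sum() rather than total()). The variable names xb_hat, exb, denom, p_manual and p_stata are made up for the example.

```stata
* sketch only: assumes the second clogit model above has just been estimated;
# delimit ;
* linear index for every household-by-mode row;
predict xb_hat, xb;
gen exb=exp(xb_hat);
* add up exp of the index over the four modes within each household;
egen denom=total(exb), by(hhid);
* conditional logit probability of each mode within the household;
gen p_manual=exb/denom;
* default prediction from predict, for comparison;
predict p_stata;
sum p_manual p_stata;
```

Under those assumptions the two variables should agree row by row, which makes clear that adding 30 minutes to the car time changes every mode's probability through the shared denominator.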

STATA Results for Conditional and Mixed Logit Models Travel_choice_example.log

      log:  c:\bill\jpsm\travel_choice_example.log
 log type:  text
opened on:  27 May 2006, 07:42:17

. * load up sas data set;
. use c:\bill\jpsm\travel_choice_example;

. * get contents of data file;
. desc;

Contains data from c:\bill\jpsm\travel_choice_example.dta
  obs:           840
 vars:             7                          17 May 2006 14:08
 size:        11,760 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
hhid            int    %8.0g                  household ID
mode            byte   %8.0g                  1=air, 2=train, 3=bus, 4=car
choice          byte   %8.0g                  =1 if choice, =0 otherwise
time            int    %8.0g                  travel time in minutes
costs           int    %8.0g                  travel costs in dollars
hhinc           byte   %8.0g                  household income (x1000)
groupsize       byte   %8.0g                  # of people in traveling party
-------------------------------------------------------------------------------
Sorted by:

. * get summary statistics;
. sum;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        hhid |       840      1105.5    60.65721       1001       1210
        mode |       840         2.5      1.1187          1          4
      choice |       840         .25    .4332707          0          1
        time |       840    520.7548    294.0959         80       1440
       costs |       840    110.8798    47.97835         30        269
-------------+--------------------------------------------------------
       hhinc |       840    34.54762    19.67604          2         72
   groupsize |       840    1.742857     1.01035          1          6

. * get frequency of options;
. tab choice;

      =1 if |
    choice, |
         =0 |
  otherwise |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        630       75.00       75.00
          1 |        210       25.00      100.00
------------+-----------------------------------
      Total |        840      100.00

. * construct dummy variables for intercepts;
. * with j choices, need j-1 options;
. gen air=mode==1;
. gen train=mode==2;
. gen bus=mode==3;
. gen car=mode==4;

. * interact hhinc and group size with choice dummies;
. gen hhinc_air=air*hhinc;
. gen hhinc_train=train*hhinc;
. gen hhinc_bus=bus*hhinc;

. * if mode of transportation is a car, costs are costs;
. * if mode is bus/train/air, costs are grp_size x costs;
. gen group_costs=car*costs+(1-car)*groupsize*costs;

. * get means by choices;
. sum time group_costs if mode==1;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        time |       210     194.719    53.04284         80        397
 group_costs |       210    174.1905    100.7172         58        495

. sum time group_costs if mode==2;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        time |       210    643.9762    255.2972        265       1148
 group_costs |       210    237.1667    195.2864         42       1015

. sum time group_costs if mode==3;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        time |       210     671.119    236.9175        255       1150
 group_costs |       210    212.3952     172.224         45        910

. sum time group_costs if mode==4;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        time |       210    573.2048    274.8547        180       1440
 group_costs |       210    95.41429    46.82743         30        238

. * run mcfaddens choice model. for covariates add;
. * a) j-1 option dummies;