STATA Program for OLS cps87_or.do


* the data for this project is a small subsample;
* of full time (30 or more hours) male workers;
* aged from the outgoing rotation;
* samples of the 1987 current population survey;
* this line defines the semicolon as the ;
* end of line delimiter;
# delimit ;
* set memory for 10 meg;
set memory 10m;
* write results to a log file;
* the replace option writes over old;
* log files;
log using cps87_or.log,replace;
* open stata data set;
use c:\bill\stata\cps87_or;
* list variables and labels in data set;
desc;
* generate new variables;
* lines 1-2 illustrate basic math functions;
* lines 3-4 illustrate logical operators;
* line 5 illustrates the OR statement;
* line 6 illustrates the AND statement;
* after you construct new variables, compress the data again;
gen age2=age*age;
gen earnwkl=ln(earnwke);
gen union=unionm==1;
gen topcode=earnwke==999;
gen nonwhite=((race==2)|(race==3));
gen big_ne=((region==1)&(smsa==1));
* label the data;
label var age2 "age squared";
label var earnwkl "log earnings per week";
label var topcode "=1 if earnwkl is topcoded";
label var union "1=in union, 0 otherwise";
label var nonwhite "1=nonwhite, 0=white" ;
label var big_ne "1= live in big smsa from northeast, 0=otherwise";
* get descriptive statistics;
sum;
* get detailed descriptives for continuous variables;
sum earnwke, detail;
* get frequencies of discrete variables;
tabulate unionm;
tabulate race;
* get two-way table of frequencies;
tabulate region smsa, row column cell;
* run simple regression;
reg earnwkl age age2 educ nonwhite union;
* run regression adding smsa, region and race fixed-effects;
* the xi command constructs the dummies for you;
* the lowest numbered dummy is usually the;
* omitted variable;
xi: reg earnwkl age age2 educ union i.race i.region i.smsa;
more;
* close log file;
log close;
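In Stata 11 and later the xi: prefix is not required for this model, because factor-variable notation builds the indicator variables on the fly. The lines below are a minimal sketch, not part of the original program. They assume the cps87_or data are already in memory with age2, earnwkl and union generated as above, and they use the default carriage-return delimiter rather than semicolons. Picking region 3 as the base is purely illustrative.

```stata
* sketch only: assumes Stata 11+ and cps87_or.dta loaded, with
* age2, earnwkl and union generated as in the program above

* same specification as the xi: regression; the lowest level of each
* i. variable is the omitted (base) category by default
regress earnwkl age age2 educ union i.race i.region i.smsa

* pick a different omitted category with ib#., here region 3 (south)
regress earnwkl age age2 educ union i.race ib3.region i.smsa

* joint test that all region indicators are zero
testparm i.region
```

Changing the base category changes the individual dummy coefficients but not the fitted values or the joint test.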

STATA Results for OLS cps87_or.log

log: c:\bill\stata\cps87_or.log
log type: text
opened on: 6 Nov 2004, 08:14:10

. * open stata data set;
. use c:\bill\stata\cps87_or;
. * list variables and labels in data set;
. desc;

Contains data from c:\bill\stata\cps87_or.dta
obs: 19,906
vars: 7    6 Nov :11
size: 636,992 (93.9% of memory free)

variable name   storage type   display format   variable label
age        float   %9.0g   age in years
race       float   %9.0g   1=white, non-hisp, 2=place, n.h, 3=hisp
educ       float   %9.0g   years of education
unionm     float   %9.0g   1=union member, 2=otherwise
smsa       float   %9.0g   1=live in 19 largest smsa, 2=other smsa, 3=non smsa
region     float   %9.0g   1=east, 2=midwest, 3=south, 4=west
earnwke    float   %9.0g   usual weekly earnings

Sorted by:

. * generate new variables;
. * lines 1-2 illustrate basic math functions;
. * lines 3-4 illustrate logical operators;
. * line 5 illustrates the OR statement;
. * line 6 illustrates the AND statement;
. * after you construct new variables, compress the data again;
. gen age2=age*age;
. gen earnwkl=ln(earnwke);
. gen union=unionm==1;
. gen topcode=earnwke==999;
. gen nonwhite=((race==2)|(race==3));
. gen big_ne=((region==1)&(smsa==1));
. * label the data;
. label var age2 "age squared";
. label var earnwkl "log earnings per week";
. label var topcode "=1 if earnwkl is topcoded";
. label var union "1=in union, 0 otherwise";
. label var nonwhite "1=nonwhite, 0=white" ;
. label var big_ne "1= live in big smsa from northeast, 0=otherwise";
. compress;
age was float now byte
race was float now byte
educ was float now byte
unionm was float now byte
smsa was float now byte
region was float now byte
earnwke was float now int
age2 was float now int
union was float now byte
topcode was float now byte
nonwhite was float now byte
big_ne was float now byte
. more;

. * get descriptive statistics;
. sum;
[summary table: Obs, Mean, Std. Dev., Min, Max for age, race, educ, unionm, smsa, region, earnwke, age2, earnwkl, union, topcode, nonwhite, big_ne; values not preserved in this transcription]

. * get detailed descriptives for continuous variables;
. sum earnwke, detail;
[detailed summary of usual weekly earnings: percentiles, smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness, Kurtosis; only the 50th percentile, 449, survives]
. more;

. * get frequencies of discrete variables;
. tabulate unionm;
[frequency table for 1=union member, 2=otherwise: Freq., Percent, Cum.; values not preserved]

. tabulate race;
[frequency table for 1=white, non-hisp, 2=place, n.h, 3=hisp: Freq., Percent, Cum.; values not preserved]
. more;

. * get two-way table of frequencies;
. tabulate region smsa, row column cell;
[two-way table of region (rows) by smsa (columns) with frequency, row percentage, column percentage and cell percentage; most cells not preserved. Column totals: 7,295 (19 largest smsa), 7,140 (other smsa), 5,471 (non smsa)]
. more;

. * run simple regression;
. reg earnwkl age age2 educ nonwhite union;
[regression output, F( 5, 19900): Coef., Std. Err., t, P>|t|, 95% Conf. Interval for age, age2, educ, nonwhite, union, _cons; values not preserved]
. more;

. * run regression adding smsa, region and race fixed-effects;
. * the xi command constructs the dummies for you;
. * the lowest numbered dummy is usually the;
. * omitted variable;
. xi: reg earnwkl age age2 educ union i.race i.region i.smsa;
i.race     _Irace_1-3     (naturally coded; _Irace_1 omitted)
i.region   _Iregion_1-4   (naturally coded; _Iregion_1 omitted)
i.smsa     _Ismsa_1-3     (naturally coded; _Ismsa_1 omitted)
[regression output, F( 11, 19894): Coef., Std. Err., t, P>|t|, 95% Conf. Interval for age, age2, educ, union, two _Irace_ dummies, three _Iregion_ dummies, two _Ismsa_ dummies, _cons; values not preserved]
. more;

. * close log file;
. log close;
log: c:\bill\stata\cps87_or.log
log type: text
closed on: 6 Nov 2004, 08:14:19

STATA Program for Probit/Logit Models workplace.do

* the data for this program are a random sample;
* of 10k observations from the data used in;
* evans, farrelly and montgomery, aer, 1999;
* the data are indoor workers in the 1991 and 1993;
* national health interview survey. the survey;
* identifies whether the worker smoked and whether;
* the worker faces a workplace smoking ban;
* set semi colon as the end of line;
# delimit;
* ask it NOT to pause;
set more off;
* open log file;
log using c:\bill\jpsm\workplace1.log,replace;
* use the workplace data set;
use c:\bill\jpsm\workplace1;
* print out variable labels;
desc;
* get summary statistics;
sum;
* run a linear probability model for comparison purposes;
* estimate white standard errors to control for heteroskedasticity;
reg smoker age incomel male black hispanic
hsgrad somecol college worka, robust;
* run probit model;
probit smoker age incomel male black hispanic
hsgrad somecol college worka;
* predict probability of smoking;
predict pred_prob_smoke;
* get detailed descriptive data about predicted prob;
sum pred_prob, detail;
* predict binary outcome with 50% cutoff;
gen pred_smoke1=pred_prob_smoke>=.5;
label variable pred_smoke1 "predicted smoking, 50% cutoff";
* compare actual values;
tab smoker pred_smoke1, row col cell;
* ask for marginal effects/treatment effects;
mfx compute;
* the same type of variables can be produced with;
* prchange. this command is however more flexible;
* in that you can change the reference individual;
prchange, help;
* get marginal effect/treatment effects for specific person;
* male, age 40, college educ, white, without workplace smoking ban;
* if a variable is not specified, its value is assumed to be;
* the sample mean. in this case, the only variable i am not;
* listing is mean log income;
prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);
* using a wald test, test the null hypothesis that;
* all the education coefficients are zero;
test hsgrad somecol college;
* how to run the same test with a -2 log like test;
* estimate the unrestricted model and save the estimates ;
* in urmodel;
probit smoker age incomel male black hispanic
hsgrad somecol college worka;
estimates store urmodel;
* estimate the restricted model. save results in rmodel;
probit smoker age incomel male black hispanic
worka;
estimates store rmodel;
lrtest urmodel rmodel;
* run logit model;
logit smoker age incomel male black hispanic
hsgrad somecol college worka;
* ask for marginal effects/treatment effects;
* logit model;
mfx compute;
log close;
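The mfx command used above was superseded by margins in Stata 11. The following is a hedged sketch of how the same quantities might be requested in newer versions. It assumes workplace1.dta is in memory, is meant to be run from a do-file (the /// continuation does not work interactively), and marks the 0/1 regressors with i. so that margins reports discrete changes for them, as mfx does for dummy variables. The reference person in the second margins call follows the description in the comments above, with male=1 and college=1 written out explicitly.

```stata
* sketch only: assumes Stata 11+ and workplace1.dta in memory
probit smoker age incomel i.male i.black i.hispanic ///
    i.hsgrad i.somecol i.college i.worka

* marginal effects with every covariate held at its sample mean
margins, dydx(*) atmeans

* effect of a workplace ban for a 40-year-old white male college
* graduate, with log income held at its sample mean
margins, dydx(worka) ///
    at((mean) _all age=40 male=1 black=0 hispanic=0 hsgrad=0 somecol=0 college=1)
```

Dropping atmeans from the first call would instead give average marginal effects over the sample, which generally differ from effects evaluated at the means.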

STATA Results for Probit/Logit Models workplace.log

log: c:\bill\jpsm\workplace1.log
log type: text
opened on: 4 Nov 2004, 07:29:21

. * use the workplace data set;
. use c:\bill\jpsm\workplace1;
. * print out variable labels;
. desc;

Contains data from c:\bill\jpsm\workplace1.dta
obs: 16,258
vars: Oct :27
size: 325,160 (96.9% of memory free)

variable name   storage type   display format   variable label
smoker     byte    %9.0g   is current smoking
worka      byte    %9.0g   has workplace smoking bans
age        byte    %9.0g   age in years
male       byte    %9.0g   male
black      byte    %9.0g   black
hispanic   byte    %9.0g   hispanic
incomel    float   %9.0g   log income
hsgrad     byte    %9.0g   is hs graduate
somecol    byte    %9.0g   has some college
college    float   %9.0g

Sorted by:

. * get summary statistics;
. sum;
[summary table: Obs, Mean, Std. Dev., Min, Max for smoker, worka, age, male, black, hispanic, incomel, hsgrad, somecol, college; values not preserved in this transcription]

. * run a linear probability model for comparison purposes;
. * estimate white standard errors to control for heteroskedasticity;
. reg smoker age incomel male black hispanic
> hsgrad somecol college worka, robust;
[regression with robust standard errors, F( 9, 16248): Coef., Robust Std. Err., t, P>|t|, 95% Conf. Interval for age, incomel, male, black, hispanic, hsgrad, somecol, college, worka, _cons; values not preserved]

. * run probit model;
. probit smoker age incomel male black hispanic
> hsgrad somecol college worka;
[iteration log, iterations 0-3; probit estimates, LR chi2(9): Coef., Std. Err., z, P>|z|, 95% Conf. Interval for the same regressors and _cons; values not preserved]

. * predict probability of smoking;
. predict pred_prob_smoke;
(option p assumed; Pr(smoker))

. * get detailed descriptive data about predicted prob;
. sum pred_prob, detail;
[detailed summary of Pr(smoker): percentiles, smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness, Kurtosis; values not preserved]

. * predict binary outcome with 50% cutoff;
. gen pred_smoke1=pred_prob_smoke>=.5;
. label variable pred_smoke1 "predicted smoking, 50% cutoff";
. * compare actual values;
. tab smoker pred_smoke1, row col cell;
[two-way table of is current smoking (rows) by predicted smoking, 50% cutoff (columns) with frequency, row percentage, column percentage and cell percentage; values not preserved]

. * ask for marginal effects/treatment effects;
. mfx compute;
Marginal effects after probit
y = Pr(smoker) (predict)
[table: dy/dx, Std. Err., z, P>|z|, 95% C.I., X for age, incomel, male*, black*, hispanic*, hsgrad*, somecol*, college*, worka*; values not preserved]
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * the same type of variables can be produced with;
. * prchange. this command is however more flexible;
. * in that you can change the reference individual;
. prchange, help;
probit: Changes in Predicted Probabilities for smoker
[table: min->max, 0->1, -+1/2, -+sd/2, MargEfct for age, incomel, male, black, hispanic, hsgrad, somecol, college, worka, followed by Pr(y|x) and the x= and sd(x)= rows; values not preserved]
Pr(y|x): probability of observing each y for specified x values
Avg|Chg|: average of absolute value of the change across categories
Min->Max: change in predicted probability as x changes from its minimum to its maximum
0->1: change in predicted probability as x changes from 0 to 1
-+1/2: change in predicted probability as x changes from 1/2 unit below base value to 1/2 unit above
-+sd/2: change in predicted probability as x changes from 1/2 standard dev below base to 1/2 standard dev above
MargEfct: the partial derivative of the predicted probability/rate with respect to a given independent variable

. * get marginal effect/treatment effects for specific person;
. * male, age 40, college educ, white, without workplace smoking ban;
. * if a variable is not specified, its value is assumed to be;
. * the sample mean. in this case, the only variable i am not;
. * listing is mean log income;
. prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);
probit: Changes in Predicted Probabilities for smoker
[same table layout as the previous prchange output; values not preserved]

. * using a wald test, test the null hypothesis that;
. * all the education coefficients are zero;
. test hsgrad somecol college;
( 1) hsgrad = 0
( 2) somecol = 0
( 3) college = 0
[chi2( 3) and Prob > chi2; values not preserved]

. * how to run the same test with a -2 log like test;
. * estimate the unrestricted model and save the estimates ;
. * in urmodel;
. probit smoker age incomel male black hispanic
> hsgrad somecol college worka;
[iteration log, iterations 0-3; probit estimates, LR chi2(9); values not preserved]
. estimates store urmodel;

. * estimate the restricted model. save results in rmodel;
. probit smoker age incomel male black hispanic
> worka;
[iteration log, iterations 0-2; probit estimates, LR chi2(6): Coef., Std. Err., z, P>|z|, 95% Conf. Interval for age, incomel, male, black, hispanic, worka, _cons; values not preserved]
. estimates store rmodel;
. lrtest urmodel rmodel;
likelihood-ratio test (Assumption: rmodel nested in urmodel)
[LR chi2(3) and Prob > chi2; values not preserved]

. * run logit model;
. logit smoker age incomel male black hispanic
> hsgrad somecol college worka;
[iteration log, iterations 0-3; logit estimates, LR chi2(9): Coef., Std. Err., z, P>|z|, 95% Conf. Interval for all regressors and _cons; values not preserved]

. * ask for marginal effects/treatment effects;
. * logit model;
. mfx compute;
Marginal effects after logit
y = Pr(smoker) (predict)
[table: dy/dx, Std. Err., z, P>|z|, 95% C.I., X for age, incomel, male*, black*, hispanic*, hsgrad*, somecol*, college*, worka*; values not preserved]
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. log close;
log: c:\bill\jpsm\workplace1.log
log type: text
closed on: 4 Nov 2004, 07:30:16
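The likelihood-ratio statistic that lrtest reports can also be computed by hand from the two log likelihoods, which makes the "-2 log like" logic explicit. This is a minimal sketch, assuming workplace1.dta is in memory; the scalar names ll_u, ll_r and lr are hypothetical.

```stata
* sketch only: assumes workplace1.dta in memory
* unrestricted model: keep its log likelihood
quietly probit smoker age incomel male black hispanic hsgrad somecol college worka
scalar ll_u = e(ll)

* restricted model: the three education dummies are dropped
quietly probit smoker age incomel male black hispanic worka
scalar ll_r = e(ll)

* LR = 2*(lnL_u - lnL_r), chi-squared with 3 degrees of freedom under H0
scalar lr = 2*(ll_u - ll_r)
display "LR chi2(3)  = " lr
display "Prob > chi2 = " chi2tail(3, lr)
```

The degrees of freedom equal the number of restrictions, here the three excluded education coefficients.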

STATA Program for Odds Ratio in Logit Models natal95.do

* this data set is a small .5 % random sample;
* of observations from the 1995 natality detail;
* data. we will examine the impact of smoking;
* on birth weight. two large states, NY and CA, do not;
* record mothers smoking status. therefore, of the ;
* 4 million births in the US, only 3 million have all;
* the necessary data so there should be 3 million*.005;
* or roughly 15,000 obs;
* set semi colon as the end of line;
# delimit;
* ask it NOT to pause;
set more off;
* open log file;
log using c:\bill\jpsm\natal95.log,replace;
* use the natality detail data set;
use c:\bill\jpsm\natal95;
* print out variable labels;
desc;
* construct indicator for low birth weight;
gen lowbw=birthw<=2500;
label variable lowbw "dummy variable, =1 if bw<2500 grams";
* get frequencies;
tab lowbw smoked, col row cell;
* run a logit model;
xi: logit lowbw smoked age married i.educ5 i.race4;
* get marginal effects;
mfx compute;
* run a logit but report the odds ratios instead;
xi: logistic lowbw smoked age married i.educ5 i.race4;
log close;
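The odds ratios that logistic reports are the exponentiated logit coefficients, so the two commands in the program fit the same model. A short sketch of that link, assuming natal95.dta is in memory and lowbw has been generated as above:

```stata
* sketch only: assumes natal95.dta in memory and lowbw already generated
xi: logit lowbw smoked age married i.educ5 i.race4

* odds ratio for smoking = exp(logit coefficient on smoked)
display "odds ratio, smoked = " exp(_b[smoked])

* the or option makes logit report odds ratios directly, as logistic does
xi: logit lowbw smoked age married i.educ5 i.race4, or
```

The z statistics and p-values are the same in both displays; only the scale of the reported estimates changes.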

STATA Results for Odds Ratio in Logit Models natal95.log

log: c:\bill\jpsm\natal95.log
log type: text
opened on: 4 Nov 2004, 05:48:05

. * use the natality detail data set;
. use c:\bill\jpsm\natal95;
. * print out variable labels;
. desc;

Contains data from c:\bill\jpsm\natal95.dta
obs: 14,230
vars: 7    27 Oct :58
size: 170,760 (98.4% of memory free)

variable name   storage type   display format   variable label
birthw     int    %9.0g   birth weight in grams
smoked     byte   %9.0g   =1 if mom smoked during pregnancy
age        byte   %9.0g   moms age at birth
married    byte   %9.0g   =1 if married
race4      byte   %9.0g   1=white,2=black,3=asian,4=other
educ5      byte   %9.0g   1=0-8, 2=9-11, 3=12, 4=13-15, 5=16+
visits     byte   %9.0g   prenatal visits

Sorted by:

. * construct indicator for low birth weight;
. gen lowbw=birthw<=2500;
. label variable lowbw "dummy variable, =1 if bw<2500 grams";
. * get frequencies;
. tab lowbw smoked, col row cell;
[two-way table of lowbw (rows) by =1 if mom smoked during pregnancy (columns) with frequency, row percentage, column percentage and cell percentage; most cells not preserved. Column totals: 12,285 (smoked=0), 1,945 (smoked=1)]

. * run a logit model;
. xi: logit lowbw smoked age married i.educ5 i.race4;
i.educ5    _Ieduc5_1-5    (naturally coded; _Ieduc5_1 omitted)
i.race4    _Irace4_1-4    (naturally coded; _Irace4_1 omitted)
[iteration log, iterations 0-4; logit estimates, LR chi2(10): Coef., Std. Err., z, P>|z|, 95% Conf. Interval for smoked, age, married, four _Ieduc5_ dummies, three _Irace4_ dummies, _cons; values not preserved in this transcription]

. * get marginal effects;
. mfx compute;
Marginal effects after logit
y = Pr(lowbw) (predict)
[table: dy/dx, Std. Err., z, P>|z|, 95% C.I., X for smoked*, age, married*, _Ieduc~2* to _Ieduc~5*, _Irace~2* to _Irace~4*; values not preserved]
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * run a logit but report the odds ratios instead;
. xi: logistic lowbw smoked age married i.educ5 i.race4;
i.educ5    _Ieduc5_1-5    (naturally coded; _Ieduc5_1 omitted)
i.race4    _Irace4_1-4    (naturally coded; _Irace4_1 omitted)
[logistic regression, LR chi2(10): Odds Ratio, Std. Err., z, P>|z|, 95% Conf. Interval for smoked, age, married, four _Ieduc5_ dummies, three _Irace4_ dummies; values not preserved]

. log close;
log: c:\bill\jpsm\natal95.log
log type: text
closed on: 4 Nov 2004, 05:48:39

* this example is attributed to jeff smith from;
* the economics department at michigan. the data;
* set contains a sample of 1500 females who;
* participated in the job training partnership act program;
* each respondent could have received one of 4 job training;
* services. 1=classroom training. 2=on the job training;
* 3= job search assistance, 4=other;

STATA Program for Ordered Probit Models sr_health_status.do

* the data for this example are adults, 18-64;
* who answered the cancer control supplement to;
* the 1994 national health interview survey;
* the key outcome is self reported health status;
* coded 1-5, poor, fair, good, very good, excellent;
* a key covariate is current smoking status and whether;
* one smoked 5 years ago;
# delimit;
set memory 20m;
set matsize 200;
set more off;
log using c:\bill\jpsm\sr_health_status.log,replace;
* load up sas data set;
use c:\bill\jpsm\sr_health_status;
* get contents of data file;
desc;
* get summary statistics;
sum;
* get tabulation of sr_health;
tab sr_health;
* run OLS models, just to look at the raw correlations in data;
reg sr_health male age educ famincl black othrace smoke smoke5;
* do ordered probit, self reported health status;
oprobit sr_health male age educ famincl black othrace smoke smoke5;
* get marginal effects, evaluated at y=5 (excellent);
mfx compute, predict(outcome(5));
* get marginal effects, evaluated at y=3 (good);
mfx compute, predict(outcome(3));
* use prchange, evaluate marginal effects for;
* 40 year old white female with a college degree;
* never smoked with average log income;
prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);
log close;
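Because the ordered probit assigns each person a probability for every one of the five health categories, those probabilities sum to one, and the marginal effects of any covariate therefore sum to zero across outcomes. A brief sketch that checks the first part, assuming sr_health_status.dta is in memory; the variable names p1 to p5 and psum are hypothetical.

```stata
* sketch only: assumes sr_health_status.dta in memory
oprobit sr_health male age educ famincl black othrace smoke smoke5

* one predicted probability per outcome category (1=poor ... 5=excellent)
predict p1 p2 p3 p4 p5, pr

* the five probabilities should sum to one for every observation,
* up to floating-point rounding
generate psum = p1 + p2 + p3 + p4 + p5
summarize psum
```

This is why a covariate that raises the probability of excellent health must lower the probability of at least one other category.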

STATA Results for Ordered Probit Models sr_health_status.log

log: c:\bill\iadb\sr_health_status.log
log type: text
opened on: 1 Nov 2004, 12:06:56

. * load up sas data set;
. use sr_health_status;
. * get contents of data file;
. desc;

Contains data from sr_health_status.dta
obs: 12,900
vars: 9    1 Nov :51
size: 322,500 (98.5% of memory free)

variable name   storage type   display format   variable label
male        byte    %9.0g   =1 if male
age         byte    %9.0g   age in years
educ        byte    %9.0g   years of education
smoke       byte    %9.0g   current smoker
smoke5      byte    %9.0g   smoked in past 5 years
black       float   %9.0g   =1 if respondent is black
othrace     float   %9.0g   =1 if other race (white is ref)
sr_health   float   %9.0g   1-5 self reported health, 5=excel, 1=poor
famincl     float   %9.0g   log family income

Sorted by:

. * get summary statistics;
. sum;
[summary table: Obs, Mean, Std. Dev., Min, Max for male, age, educ, smoke, smoke5, black, othrace, sr_health, famincl; values not preserved in this transcription]

. * get tabulation of sr_health;
. tab sr_health;
[frequency table for 1-5 self reported health, 5=excel, 1=poor: Freq., Percent, Cum.; values not preserved]

. * run OLS models, just to look at the raw correlations in data;
. reg sr_health male age educ famincl black othrace smoke smoke5;
[regression output, F( 8, 12891): Coef., Std. Err., t, P>|t|, 95% Conf. Interval for male, age, educ, famincl, black, othrace, smoke, smoke5, _cons; values not preserved]

. * do ordered probit, self reported health status;
. oprobit sr_health male age educ famincl black othrace smoke smoke5;
[iteration log, iterations 0-3; ordered probit estimates, LR chi2(8): Coef., Std. Err., z, P>|z|, 95% Conf. Interval for the eight regressors, plus four _cut ancillary parameters; values not preserved]

. * get marginal effects, evaluated at y=5 (excellent);
. mfx compute, predict(outcome(5));
Marginal effects after oprobit
y = Pr(sr_health==5) (predict, outcome(5))
[table: dy/dx, Std. Err., z, P>|z|, 95% C.I., X for male*, age, educ, famincl, black*, othrace*, smoke*, smoke5*; values not preserved]
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * get marginal effects, evaluated at y=3 (good);
. mfx compute, predict(outcome(3));
Marginal effects after oprobit
y = Pr(sr_health==3) (predict, outcome(3))
[table: dy/dx, Std. Err., z, P>|z|, 95% C.I., X for male*, age, educ, famincl, black*, othrace*, smoke*, smoke5*; values not preserved]
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * use prchange, evaluate marginal effects for;
. * 40 year old white female with a college degree;
. * never smoked with average log income;
. prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);
oprobit: Changes in Predicted Probabilities for sr_health
[for each of male, age, educ, famincl, black, othrace, smoke, smoke5: Avg|Chg| and the change in each outcome probability (Min->Max, -+1/2, -+sd/2, MargEfct for continuous variables; 0->1 for dummies), followed by Pr(y|x) and the x= and sd(x)= rows; values not preserved]

. log close;
log: c:\bill\iadb\sr_health_status.log
log type: text
closed on: 1 Nov 2004, 12:07:40

STATA Program for Multinomial Logit Model Job_training_example.do

* set end of line marker;
# delimit;
set more off;
* increase memory;
set memory 20m;
* write results to file;
log using c:\bill\jpsm\job_training_example.log,replace;
* load up sas data set;
use c:\bill\jpsm\job_training_example;
* get contents of data file;
desc;
* get summary statistics;
sum;
* get frequency of choice variable;
tab choice;
* run multinomial logit. omitted groups are;
* whites, those with > 12 years of ed, those w/ work experience;
* base(#) tells STATA what category should be the reference option;
* base(4) is using other as the reference group;
mlogit choice age black hisp nvrwrk lths hsgrad, base(4);
* get marginal effects for the 4 options, on the job training;
mfx compute, predict(outcome(1));
mfx compute, predict(outcome(2));
mfx compute, predict(outcome(3));
mfx compute, predict(outcome(4));
* test for IIA using the Hausman test;
* the program eliminates one choice at ;
* a time then compares the unrestricted;
* estimates to the restricted ones;
mlogtest, hausman;
log close;
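Two practical notes on the program above, offered as a hedged sketch rather than as part of the original. First, mlogtest (like prchange and listcoef used elsewhere in these programs) is not built into Stata; it belongs to the user-written SPost package by Long and Freese and has to be installed before the do-file will run. Second, multinomial logit coefficients are often easier to read as relative-risk ratios against the base outcome, which the rrr option reports. The sketch assumes job_training_example.dta is in memory.

```stata
* sketch only: assumes job_training_example.dta in memory

* mlogtest, prchange and listcoef come from the user-written SPost
* package; this lists sources from which it can be installed
search spost

* same model as above, reported as relative-risk ratios, i.e.
* exp(coefficient), each relative to the base outcome 4 (other)
mlogit choice age black hisp nvrwrk lths hsgrad, base(4) rrr
```

A ratio above one means the covariate raises the odds of that training service relative to the base category; it does not by itself give the sign of the effect on the probability.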

STATA Results for Multinomial Logit Model Job_training_example.log

log: c:\bill\jpsm\job_training_example.log
log type: text
opened on: 27 May 2006, 06:15:58

. * load up sas data set;
. use c:\bill\jpsm\job_training_example;
. * get contents of data file;
. desc;

Contains data from c:\bill\jpsm\job_training_example.dta
obs: 1,500
vars: 9    17 May :09
size: 24,000 (99.9% of memory free)

variable name   storage type   display format   variable label
pid        long   %10.0g   personal ID number
age        byte   %4.0f    age in years
lths       byte   %9.0g    =1 if education < hs grad
hsgrad     byte   %9.0g    =1 if education is 12 years
gths       byte   %9.0g    =1 of education is > 12 years
black      byte   %9.0g    =1 if black, =0 otherwise
hisp       byte   %9.0g    =1 if hispanic, =0 otherwise
nvrwrk     byte   %9.0g    =1 if never worked, =0 otherwise
choice     byte   %9.0g

Sorted by:

. * get summary statistics;
. sum;
[summary table: Obs, Mean, Std. Dev., Min, Max for pid, age, lths, hsgrad, gths, black, hisp, nvrwrk, choice; values not preserved in this transcription]

. * get frequency of choice variable;
. tab choice;
[frequency table for choice: Freq., Percent, Cum.; values not preserved]

. * run multinomial logit. omitted groups are;
. * whites, those with > 12 years of ed, those w/ work experience;
. * base(#) tells STATA what category should be the reference option;
. * base(4) is using other as the reference group;
. mlogit choice age black hisp nvrwrk lths hsgrad, base(4);
[iteration log, iterations 0-4; multinomial logistic regression, Number of obs = 1500, LR chi2(18): Coef., Std. Err., z, P>|z|, 95% Conf. Interval for age, black, hisp, nvrwrk, lths, hsgrad, _cons in each of the three non-base equations; values not preserved]
(Outcome choice==4 is the comparison group)

. * get marginal effects for the 4 options, on the job training;
. mfx compute, predict(outcome(1));
Marginal effects after mlogit
y = Pr(choice==1) (predict, outcome(1))
[table: dy/dx, Std. Err., z, P>|z|, 95% C.I., X for age, black*, hisp*, nvrwrk*, lths*, hsgrad*; values not preserved]
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(outcome(2));
Marginal effects after mlogit
y = Pr(choice==2) (predict, outcome(2))
[same table layout; values not preserved]
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(outcome(3));
Marginal effects after mlogit
y = Pr(choice==3) (predict, outcome(3))
[same table layout; values not preserved]
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(outcome(4));
Marginal effects after mlogit
y = Pr(choice==4) (predict, outcome(4))
[same table layout; values not preserved]
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * test for IIA using the Hausman test;
. * the program eliminates one choice at ;
. * a time then compares the unrestricted;
. * estimates to the restricted ones;
. mlogtest, hausman;
**** Hausman tests of IIA assumption
Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.
[table: Omitted, chi2, df, P>chi2, evidence; three rows, each with evidence "for Ho"; values not preserved]

. log close;
log: c:\bill\jpsm\job_training_example.log
log type: text
closed on: 27 May 2006, 06:17:03

STATA Program for Conditional and Mixed Logit Models Travel_choice_example.do

* set end of line marker;
# delimit;
set more off;
* increase memory;
set memory 20m;
* write results to file;
log using c:\bill\jpsm\travel_choice_example.log,replace;
* load up sas data set;
use c:\bill\jpsm\travel_choice_example;
* get contents of data file;
desc;
* get summary statistics;
sum;
* get frequency of options;
tab choice;
* construct dummy variables for intercepts;
* with j choices, need j-1 options;
gen air=mode==1;
gen train=mode==2;
gen bus=mode==3;
gen car=mode==4;
* interact hhinc and group size with choice dummies;
gen hhinc_air=air*hhinc;
gen hhinc_train=train*hhinc;
gen hhinc_bus=bus*hhinc;
* if mode of transportation is a car, costs are costs;
* if mode is bus/train/air, costs are grp_size x costs;
gen group_costs=car*costs+(1-car)*groupsize*costs;
* get means by choices;
sum time group_costs if mode==1;
sum time group_costs if mode==2;
sum time group_costs if mode==3;
sum time group_costs if mode==4;
* run mcfaddens choice model. for covariates add;
* a) j-1 option dummies;
* b) variables that vary by choice;
clogit choice air train bus time group_costs, group(hhid);
* run another model but add;
* c) income interacted w/ choice dummies;
clogit choice air train bus time group_costs hhinc_*, group(hhid);
* print out odds ratios;
listcoef;
* in this section we simulate the change in the;
* choices if we increase the travel time;
* by car by 30 minutes;
* get the predicted probabilities given original;
* values of Xs;
predict pred0;
* for mode=4, add 30 minutes;
replace time=time+30 if mode==4;
* get new predicted probabilities with new time;
predict pred30;
* change in probabilities;
gen change_p=pred30-pred0;
* get means of change in probs;
sum change_p if mode==1;
sum change_p if mode==2;
sum change_p if mode==3;
sum change_p if mode==4;
* before you forget, change time back to;
* original value;
replace time=time-30 if mode==4;
log close;
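The simulation above edits time in place and then has to undo the edit by hand. One alternative, sketched here under the assumption that travel_choice_example.dta is in memory and the second clogit model has just been estimated, is to wrap the counterfactual in preserve and restore so the original data come back automatically. The variable names match the program; the use of tabstat in place of four sum commands is a substitution made for brevity.

```stata
* sketch only: assumes the clogit model above has just been estimated

* baseline predicted probabilities; created before preserve, so kept
predict pred0

preserve
* counterfactual: 30 more minutes of travel time by car (mode 4)
replace time = time + 30 if mode==4
predict pred30
generate change_p = pred30 - pred0

* mean change in choice probability for each mode
tabstat change_p, by(mode) statistics(mean)
restore
* after restore, time is back at its original values and the
* temporary variables pred30 and change_p are gone
```

Within each household the changes across the four modes should sum to zero, since the predicted probabilities sum to one both before and after the change.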

STATA Results for Conditional and Mixed Logit Models Travel_choice_example.log

log: c:\bill\jpsm\travel_choice_example.log
log type: text
opened on: 27 May 2006, 07:42:17

. * load up sas data set;
. use c:\bill\jpsm\travel_choice_example;
. * get contents of data file;
. desc;

Contains data from c:\bill\jpsm\travel_choice_example.dta
obs: 840
vars: 7    17 May :08
size: 11,760 (99.9% of memory free)

variable name   storage type   display format   variable label
hhid        int    %8.0g   household ID
mode        byte   %8.0g   1=air, 2=train, 3=bus, 4=car
choice      byte   %8.0g   =1 if choice, =0 otherwise
time        int    %8.0g   travel time in minutes
costs       int    %8.0g   travel costs in dollars
hhinc       byte   %8.0g   household income (x1000)
groupsize   byte   %8.0g   # of people in traveling party

Sorted by:

. * get summary statistics;
. sum;
[summary table: Obs, Mean, Std. Dev., Min, Max for hhid, mode, choice, time, costs, hhinc, groupsize; values not preserved in this transcription]

. * get frequency of options;
. tab choice;
[frequency table for =1 if choice, =0 otherwise: Freq., Percent, Cum.; values not preserved]

. * construct dummy variables for intercepts;
. * with j choices, need j-1 options;
. gen air=mode==1;
. gen train=mode==2;
. gen bus=mode==3;
. gen car=mode==4;
. * interact hhinc and group size with choice dummies;
. gen hhinc_air=air*hhinc;
. gen hhinc_train=train*hhinc;
. gen hhinc_bus=bus*hhinc;
. * if mode of transportation is a car, costs are costs;
. * if mode is bus/train/air, costs are grp_size x costs;
. gen group_costs=car*costs+(1-car)*groupsize*costs;
. * get means by choices;
. sum time group_costs if mode==1;
[summary table for time and group_costs; values not preserved]
. sum time group_costs if mode==2;
[summary table for time and group_costs; values not preserved]
. sum time group_costs if mode==3;
[summary table for time and group_costs; values not preserved]
. sum time group_costs if mode==4;
[summary table for time and group_costs; values not preserved]

. * run mcfaddens choice model. for covariates add;
. * a) j-1 option dummies;


More information

tm / / / / / / / / / / / / Statistics/Data Analysis User: Klick Project: Limited Dependent Variables{space -6}

tm / / / / / / / / / / / / Statistics/Data Analysis User: Klick Project: Limited Dependent Variables{space -6} PS 4 Monday August 16 01:00:42 2010 Page 1 tm / / / / / / / / / / / / Statistics/Data Analysis User: Klick Project: Limited Dependent Variables{space -6} log: C:\web\PS4log.smcl log type: smcl opened on:

More information

Advanced Econometrics

Advanced Econometrics Advanced Econometrics Instructor: Takashi Yamano 11/14/2003 Due: 11/21/2003 Homework 5 (30 points) Sample Answers 1. (16 points) Read Example 13.4 and an AER paper by Meyer, Viscusi, and Durbin (1995).

More information

Problem Set 6 ANSWERS

Problem Set 6 ANSWERS Economics 20 Part I. Problem Set 6 ANSWERS Prof. Patricia M. Anderson The first 5 questions are based on the following information: Suppose a researcher is interested in the effect of class attendance

More information

Two-stage least squares examples. Angrist: Vietnam Draft Lottery Men, Cohorts. Vietnam era service

Two-stage least squares examples. Angrist: Vietnam Draft Lottery Men, Cohorts. Vietnam era service Two-stage least squares examples Angrist: Vietnam Draft Lottery 1 2 Vietnam era service 1980 Men, 1940-1952 Cohorts Defined as 1964-1975 Estimated 8.7 million served during era 3.4 million were in SE Asia

More information

Module 9: Single-level and Multilevel Models for Ordinal Responses. Stata Practical 1

Module 9: Single-level and Multilevel Models for Ordinal Responses. Stata Practical 1 Module 9: Single-level and Multilevel Models for Ordinal Responses Pre-requisites Modules 5, 6 and 7 Stata Practical 1 George Leckie, Tim Morris & Fiona Steele Centre for Multilevel Modelling If you find

More information

u panel_lecture . sum

u panel_lecture . sum u panel_lecture sum Variable Obs Mean Std Dev Min Max datastre 639 9039644 6369418 900228 926665 year 639 1980 2584012 1976 1984 total_sa 639 9377839 3212313 682 441e+07 tot_fixe 639 5214385 1988422 642

More information

sociology SO5032 Quantitative Research Methods Brendan Halpin, Sociology, University of Limerick Spring 2018 SO5032 Quantitative Research Methods

sociology SO5032 Quantitative Research Methods Brendan Halpin, Sociology, University of Limerick Spring 2018 SO5032 Quantitative Research Methods 1 SO5032 Quantitative Research Methods Brendan Halpin, Sociology, University of Limerick Spring 2018 Lecture 10: Multinomial regression baseline category extension of binary What if we have multiple possible

More information

Nonlinear Econometric Analysis (ECO 722) Answers to Homework 4

Nonlinear Econometric Analysis (ECO 722) Answers to Homework 4 Nonlinear Econometric Analysis (ECO 722) Answers to Homework 4 1 Greene and Hensher (1997) report estimates of a model of travel mode choice for travel between Sydney and Melbourne, Australia The dataset

More information

Heteroskedasticity. . reg wage black exper educ married tenure

Heteroskedasticity. . reg wage black exper educ married tenure Heteroskedasticity. reg Source SS df MS Number of obs = 2,380 -------------+---------------------------------- F(2, 2377) = 72.38 Model 14.4018246 2 7.20091231 Prob > F = 0.0000 Residual 236.470024 2,377.099482551

More information

Chapter 6 Part 3 October 21, Bootstrapping

Chapter 6 Part 3 October 21, Bootstrapping Chapter 6 Part 3 October 21, 2008 Bootstrapping From the internet: The bootstrap involves repeated re-estimation of a parameter using random samples with replacement from the original data. Because the

More information

Table 4. Probit model of union membership. Probit coefficients are presented below. Data from March 2008 Current Population Survey.

Table 4. Probit model of union membership. Probit coefficients are presented below. Data from March 2008 Current Population Survey. 1. Using a probit model and data from the 2008 March Current Population Survey, I estimated a probit model of the determinants of pension coverage. Three specifications were estimated. The first included

More information

3. Multinomial response models

3. Multinomial response models 3. Multinomial response models 3.1 General model approaches Multinomial dependent variables in a microeconometric analysis: These qualitative variables have more than two possible mutually exclusive categories

More information

*1A. Basic Descriptive Statistics sum housereg drive elecbill affidavit witness adddoc income male age literacy educ occup cityyears if control==1

*1A. Basic Descriptive Statistics sum housereg drive elecbill affidavit witness adddoc income male age literacy educ occup cityyears if control==1 *1A Basic Descriptive Statistics sum housereg drive elecbill affidavit witness adddoc income male age literacy educ occup cityyears if control==1 Variable Obs Mean Std Dev Min Max --- housereg 21 2380952

More information

Multinomial Logit Models - Overview Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 13, 2017

Multinomial Logit Models - Overview Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 13, 2017 Multinomial Logit Models - Overview Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 13, 2017 This is adapted heavily from Menard s Applied Logistic Regression

More information

EC327: Limited Dependent Variables and Sample Selection Binomial probit: probit

EC327: Limited Dependent Variables and Sample Selection Binomial probit: probit EC327: Limited Dependent Variables and Sample Selection Binomial probit: probit. summarize work age married children education Variable Obs Mean Std. Dev. Min Max work 2000.6715.4697852 0 1 age 2000 36.208

More information

İnsan TUNALI 8 November 2018 Econ 511: Econometrics I. ASSIGNMENT 7 STATA Supplement

İnsan TUNALI 8 November 2018 Econ 511: Econometrics I. ASSIGNMENT 7 STATA Supplement İnsan TUNALI 8 November 2018 Econ 511: Econometrics I ASSIGNMENT 7 STATA Supplement. use "F:\COURSES\GRADS\ECON511\SHARE\wages1.dta", clear. generate =ln(wage). scatter sch Q. Do you see a relationship

More information

Labor Force Participation and the Wage Gap Detailed Notes and Code Econometrics 113 Spring 2014

Labor Force Participation and the Wage Gap Detailed Notes and Code Econometrics 113 Spring 2014 Labor Force Participation and the Wage Gap Detailed Notes and Code Econometrics 113 Spring 2014 In class, Lecture 11, we used a new dataset to examine labor force participation and wages across groups.

More information

Module 4 Bivariate Regressions

Module 4 Bivariate Regressions AGRODEP Stata Training April 2013 Module 4 Bivariate Regressions Manuel Barron 1 and Pia Basurto 2 1 University of California, Berkeley, Department of Agricultural and Resource Economics 2 University of

More information

The data definition file provided by the authors is reproduced below: Obs: 1500 home sales in Stockton, CA from Oct 1, 1996 to Nov 30, 1998

The data definition file provided by the authors is reproduced below: Obs: 1500 home sales in Stockton, CA from Oct 1, 1996 to Nov 30, 1998 Economics 312 Sample Project Report Jeffrey Parker Introduction This project is based on Exercise 2.12 on page 81 of the Hill, Griffiths, and Lim text. It examines how the sale price of houses in Stockton,

More information

West Coast Stata Users Group Meeting, October 25, 2007

West Coast Stata Users Group Meeting, October 25, 2007 Estimating Heterogeneous Choice Models with Stata Richard Williams, Notre Dame Sociology, rwilliam@nd.edu oglm support page: http://www.nd.edu/~rwilliam/oglm/index.html West Coast Stata Users Group Meeting,

More information

Longitudinal Logistic Regression: Breastfeeding of Nepalese Children

Longitudinal Logistic Regression: Breastfeeding of Nepalese Children Longitudinal Logistic Regression: Breastfeeding of Nepalese Children Scientific Question Determine whether the breastfeeding of Nepalese children varies with child age and/or sex of child. Data: Nepal

More information

Sociology 704: Topics in Multivariate Statistics Instructor: Natasha Sarkisian. Binary Logit

Sociology 704: Topics in Multivariate Statistics Instructor: Natasha Sarkisian. Binary Logit Sociology 704: Topics in Multivariate Statistics Instructor: Natasha Sarkisian Binary Logit Binary models deal with binary (0/1, yes/no) dependent variables. OLS is inappropriate for this kind of dependent

More information

Catherine De Vries, Spyros Kosmidis & Andreas Murr

Catherine De Vries, Spyros Kosmidis & Andreas Murr APPLIED STATISTICS FOR POLITICAL SCIENTISTS WEEK 8: DEPENDENT CATEGORICAL VARIABLES II Catherine De Vries, Spyros Kosmidis & Andreas Murr Topic: Logistic regression. Predicted probabilities. STATA commands

More information

Review questions for Multinomial Logit/Probit, Tobit, Heckit, Quantile Regressions

Review questions for Multinomial Logit/Probit, Tobit, Heckit, Quantile Regressions 1. I estimated a multinomial logit model of employment behavior using data from the 2006 Current Population Survey. The three possible outcomes for a person are employed (outcome=1), unemployed (outcome=2)

More information

Sociology Exam 3 Answer Key - DRAFT May 8, 2007

Sociology Exam 3 Answer Key - DRAFT May 8, 2007 Sociology 63993 Exam 3 Answer Key - DRAFT May 8, 2007 I. True-False. (20 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. The odds of an event occurring

More information

ECON Introductory Econometrics. Seminar 4. Stock and Watson Chapter 8

ECON Introductory Econometrics. Seminar 4. Stock and Watson Chapter 8 ECON4150 - Introductory Econometrics Seminar 4 Stock and Watson Chapter 8 empirical exercise E8.2: Data 2 In this exercise we use the data set CPS12.dta Each month the Bureau of Labor Statistics in the

More information

Quantitative Techniques Term 2

Quantitative Techniques Term 2 Quantitative Techniques Term 2 Laboratory 7 2 March 2006 Overview The objective of this lab is to: Estimate a cost function for a panel of firms; Calculate returns to scale; Introduce the command cluster

More information

The relationship between GDP, labor force and health expenditure in European countries

The relationship between GDP, labor force and health expenditure in European countries Econometrics-Term paper The relationship between GDP, labor force and health expenditure in European countries Student: Nguyen Thu Ha Contents 1. Background:... 2 2. Discussion:... 2 3. Regression equation

More information

List of figures. I General information 1

List of figures. I General information 1 List of figures Preface xix xxi I General information 1 1 Introduction 7 1.1 What is this book about?........................ 7 1.2 Which models are considered?...................... 8 1.3 Whom is this

More information

Your Name (Please print) Did you agree to take the optional portion of the final exam Yes No. Directions

Your Name (Please print) Did you agree to take the optional portion of the final exam Yes No. Directions Your Name (Please print) Did you agree to take the optional portion of the final exam Yes No (Your online answer will be used to verify your response.) Directions There are two parts to the final exam.

More information

Technical Documentation for Household Demographics Projection

Technical Documentation for Household Demographics Projection Technical Documentation for Household Demographics Projection REMI Household Forecast is a tool to complement the PI+ demographic model by providing comprehensive forecasts of a variety of household characteristics.

More information

Handout seminar 6, ECON4150

Handout seminar 6, ECON4150 Handout seminar 6, ECON4150 Herman Kruse March 17, 2013 Introduction - list of commands This week, we need a couple of new commands in order to solve all the problems. hist var1 if var2, options - creates

More information

Logistic Regression Analysis

Logistic Regression Analysis Revised July 2018 Logistic Regression Analysis This set of notes shows how to use Stata to estimate a logistic regression equation. It assumes that you have set Stata up on your computer (see the Getting

More information

Solutions for Session 5: Linear Models

Solutions for Session 5: Linear Models Solutions for Session 5: Linear Models 30/10/2018. do solution.do. global basedir http://personalpages.manchester.ac.uk/staff/mark.lunt. global datadir $basedir/stats/5_linearmodels1/data. use $datadir/anscombe.

More information

Problem Set 9 Heteroskedasticty Answers

Problem Set 9 Heteroskedasticty Answers Problem Set 9 Heteroskedasticty Answers /* INVESTIGATION OF HETEROSKEDASTICITY */ First graph data. u hetdat2. gra manuf gdp, s([country].) xlab ylab 300000 manufacturing output (US$ miilio 200000 100000

More information

Description Remarks and examples References Also see

Description Remarks and examples References Also see Title stata.com example 41g Two-level multinomial logistic regression (multilevel) Description Remarks and examples References Also see Description We demonstrate two-level multinomial logistic regression

More information

Cameron ECON 132 (Health Economics): FIRST MIDTERM EXAM (A) Fall 17

Cameron ECON 132 (Health Economics): FIRST MIDTERM EXAM (A) Fall 17 Cameron ECON 132 (Health Economics): FIRST MIDTERM EXAM (A) Fall 17 Answer all questions in the space provided on the exam. Total of 36 points (and worth 22.5% of final grade). Read each question carefully,

More information

Example 2.3: CEO Salary and Return on Equity. Salary for ROE = 0. Salary for ROE = 30. Example 2.4: Wage and Education

Example 2.3: CEO Salary and Return on Equity. Salary for ROE = 0. Salary for ROE = 30. Example 2.4: Wage and Education 1 Stata Textbook Examples Introductory Econometrics: A Modern Approach by Jeffrey M. Wooldridge (1st & 2d eds.) Chapter 2 - The Simple Regression Model Example 2.3: CEO Salary and Return on Equity summ

More information

The Multivariate Regression Model

The Multivariate Regression Model The Multivariate Regression Model Example Determinants of College GPA Sample of 4 Freshman Collect data on College GPA (4.0 scale) Look at importance of ACT Consider the following model CGPA ACT i 0 i

More information

Chapter 11 Part 6. Correlation Continued. LOWESS Regression

Chapter 11 Part 6. Correlation Continued. LOWESS Regression Chapter 11 Part 6 Correlation Continued LOWESS Regression February 17, 2009 Goal: To review the properties of the correlation coefficient. To introduce you to the various tools that can be used to decide

More information

STATA log file for Time-Varying Covariates (TVC) Duration Model Estimations.

STATA log file for Time-Varying Covariates (TVC) Duration Model Estimations. STATA log file for Time-Varying Covariates (TVC) Duration Model Estimations. This STATA 8.0 log file reports estimations in which CDER Staff Aggregates and PDUFA variable are assigned to drug-months of

More information

Modeling wages of females in the UK

Modeling wages of females in the UK International Journal of Business and Social Science Vol. 2 No. 11 [Special Issue - June 2011] Modeling wages of females in the UK Saadia Irfan NUST Business School National University of Sciences and

More information

Morten Frydenberg Wednesday, 12 May 2004

Morten Frydenberg Wednesday, 12 May 2004 " $% " * +, " --. / ",, 2 ", $, % $ 4 %78 % / "92:8/- 788;?5"= "8= < < @ "A57 57 "χ 2 = -value=. 5 OR =, OR = = = + OR B " B Linear ang Logistic Regression: Note. = + OR 2 women - % β β = + woman

More information

An Examination of the Impact of the Texas Methodist Foundation Clergy Development Program. on the United Methodist Church in Texas

An Examination of the Impact of the Texas Methodist Foundation Clergy Development Program. on the United Methodist Church in Texas An Examination of the Impact of the Texas Methodist Foundation Clergy Development Program on the United Methodist Church in Texas The Texas Methodist Foundation completed its first, two-year Clergy Development

More information

Model fit assessment via marginal model plots

Model fit assessment via marginal model plots The Stata Journal (2010) 10, Number 2, pp. 215 225 Model fit assessment via marginal model plots Charles Lindsey Texas A & M University Department of Statistics College Station, TX lindseyc@stat.tamu.edu

More information

Labor Market Returns to Two- and Four- Year Colleges. Paper by Kane and Rouse Replicated by Andreas Kraft

Labor Market Returns to Two- and Four- Year Colleges. Paper by Kane and Rouse Replicated by Andreas Kraft Labor Market Returns to Two- and Four- Year Colleges Paper by Kane and Rouse Replicated by Andreas Kraft Theory Estimating the return to two-year colleges Economic Return to credit hours or sheepskin effects

More information

Sean Howard Econometrics Final Project Paper. An Analysis of the Determinants and Factors of Physical Education Attendance in the Fourth Quarter

Sean Howard Econometrics Final Project Paper. An Analysis of the Determinants and Factors of Physical Education Attendance in the Fourth Quarter Sean Howard Econometrics Final Project Paper An Analysis of the Determinants and Factors of Physical Education Attendance in the Fourth Quarter Introduction This project attempted to gain a more complete

More information

Professor Brad Jones University of Arizona POL 681, SPRING 2004 INTERACTIONS and STATA: Companion To Lecture Notes on Statistical Interactions

Professor Brad Jones University of Arizona POL 681, SPRING 2004 INTERACTIONS and STATA: Companion To Lecture Notes on Statistical Interactions Professor Brad Jones University of Arizona POL 681, SPRING 2004 INTERACTIONS and STATA: Companion To Lecture Notes on Statistical Interactions Preliminaries 1. Basic Regression. reg y x1 Source SS df MS

More information

Getting Started in Logit and Ordered Logit Regression (ver. 3.1 beta)

Getting Started in Logit and Ordered Logit Regression (ver. 3.1 beta) Getting Started in Logit and Ordered Logit Regression (ver. 3. beta Oscar Torres-Reyna Data Consultant otorres@princeton.edu http://dss.princeton.edu/training/ Logit model Use logit models whenever your

More information

Econometric Methods for Valuation Analysis

Econometric Methods for Valuation Analysis Econometric Methods for Valuation Analysis Margarita Genius Dept of Economics M. Genius (Univ. of Crete) Econometric Methods for Valuation Analysis Cagliari, 2017 1 / 25 Outline We will consider econometric

More information

Getting Started in Logit and Ordered Logit Regression (ver. 3.1 beta)

Getting Started in Logit and Ordered Logit Regression (ver. 3.1 beta) Getting Started in Logit and Ordered Logit Regression (ver. 3. beta Oscar Torres-Reyna Data Consultant otorres@princeton.edu http://dss.princeton.edu/training/ Logit model Use logit models whenever your

More information

CHAPTER 4 ESTIMATES OF RETIREMENT, SOCIAL SECURITY BENEFIT TAKE-UP, AND EARNINGS AFTER AGE 50

CHAPTER 4 ESTIMATES OF RETIREMENT, SOCIAL SECURITY BENEFIT TAKE-UP, AND EARNINGS AFTER AGE 50 CHAPTER 4 ESTIMATES OF RETIREMENT, SOCIAL SECURITY BENEFIT TAKE-UP, AND EARNINGS AFTER AGE 5 I. INTRODUCTION This chapter describes the models that MINT uses to simulate earnings from age 5 to death, retirement

More information

Limited Dependent Variables

Limited Dependent Variables Limited Dependent Variables Christopher F Baum Boston College and DIW Berlin Birmingham Business School, March 2013 Christopher F Baum (BC / DIW) Limited Dependent Variables BBS 2013 1 / 47 Limited dependent

More information

Effect of Education on Wage Earning

Effect of Education on Wage Earning Effect of Education on Wage Earning Group Members: Quentin Talley, Thomas Wang, Geoff Zaski Abstract The scope of this project includes individuals aged 18-65 who finished their education and do not have

More information

Allison notes there are two conditions for using fixed effects methods.

Allison notes there are two conditions for using fixed effects methods. Panel Data 3: Conditional Logit/ Fixed Effects Logit Models Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised April 2, 2017 These notes borrow very heavily, sometimes

More information

Day 3C Simulation: Maximum Simulated Likelihood

Day 3C Simulation: Maximum Simulated Likelihood Day 3C Simulation: Maximum Simulated Likelihood c A. Colin Cameron Univ. of Calif. - Davis... for Center of Labor Economics Norwegian School of Economics Advanced Microeconometrics Aug 28 - Sep 1, 2017

More information

Determining Probability Estimates From Logistic Regression Results Vartanian: SW 541

Determining Probability Estimates From Logistic Regression Results Vartanian: SW 541 Determining Probability Estimates From Logistic Regression Results Vartanian: SW 541 In determining logistic regression results, you will generally be given the odds ratio in the SPSS or SAS output. However,

More information

F^3: F tests, Functional Forms and Favorite Coefficient Models

F^3: F tests, Functional Forms and Favorite Coefficient Models F^3: F tests, Functional Forms and Favorite Coefficient Models Favorite coefficient model: otherteams use "nflpricedata Bdta", clear *Favorite coefficient model: otherteams reg rprice pop pop2 rpci wprcnt1

More information

Example 7.1: Hourly Wage Equation Average wage for women

Example 7.1: Hourly Wage Equation Average wage for women 1 Stata Textbook Examples Introductory Econometrics: A Modern Approach by Jeffrey M. Wooldridge (1st & 2nd eds.) Chapter 7 - Multiple Regression Analysis with Qualitative Information: Binary (or Dummy)

More information

ECO671, Spring 2014, Sample Questions for First Exam

ECO671, Spring 2014, Sample Questions for First Exam 1. Using data from the Survey of Consumers Finances between 1983 and 2007 (the surveys are done every 3 years), I used OLS to examine the determinants of a household s credit card debt. Credit card debt

More information

COMPLEMENTARITY ANALYSIS IN MULTINOMIAL

COMPLEMENTARITY ANALYSIS IN MULTINOMIAL 1 / 25 COMPLEMENTARITY ANALYSIS IN MULTINOMIAL MODELS: THE GENTZKOW COMMAND Yunrong Li & Ricardo Mora SWUFE & UC3M Madrid, Oct 2017 2 / 25 Outline 1 Getzkow (2007) 2 Case Study: social vs. internet interactions

More information

. ********** OUTPUT FILE: CARD & KRUEGER (1994)***********.. * STATA 10.0 CODE. * copyright C 2008 by Tito Boeri & Jan van Ours. * "THE ECONOMICS OF

. ********** OUTPUT FILE: CARD & KRUEGER (1994)***********.. * STATA 10.0 CODE. * copyright C 2008 by Tito Boeri & Jan van Ours. * THE ECONOMICS OF ********** OUTPUT FILE: CARD & KRUEGER (1994)*********** * STATA 100 CODE * copyright C 2008 by Tito Boeri & Jan van Ours * "THE ECONOMICS OF IMPERFECT LABOR MARKETS" * by Tito Boeri & Jan van Ours (2008)

More information

Applied Econometrics for Health Economists

Applied Econometrics for Health Economists Applied Econometrics for Health Economists Exercise 0 Preliminaries The data file hals1class.dta contains the following variables: age male white aglsch rheuma prheuma ownh breakhot tea teasug coffee age

More information

Maximum Likelihood Estimation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 13, 2018

Maximum Likelihood Estimation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 13, 2018 Maximum Likelihood Estimation Richard Williams, University of otre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 3, 208 [This handout draws very heavily from Regression Models for Categorical

More information

Econometrics is. The estimation of relationships suggested by economic theory

Econometrics is. The estimation of relationships suggested by economic theory Econometrics is Econometrics is The estimation of relationships suggested by economic theory Econometrics is The estimation of relationships suggested by economic theory The application of mathematical

More information

WWS 508b Precept 10. John Palmer. April 27, 2010

WWS 508b Precept 10. John Palmer. April 27, 2010 WWS 508b Precept 10 John Palmer April 27, 2010 Example: married women s labor force participation The MROZ.dta data set has information on labor force participation and other characteristics of married

More information

Assignment #5 Solutions: Chapter 14 Q1.

Assignment #5 Solutions: Chapter 14 Q1. Assignment #5 Solutions: Chapter 14 Q1. a. R 2 is.037 and the adjusted R 2 is.033. The adjusted R 2 value becomes particularly important when there are many independent variables in a multiple regression

More information

Time series data: Part 2

Time series data: Part 2 Plot of Epsilon over Time -- Case 1 1 Time series data: Part Epsilon - 1 - - - -1 1 51 7 11 1 151 17 Time period Plot of Epsilon over Time -- Case Plot of Epsilon over Time -- Case 3 1 3 1 Epsilon - Epsilon

More information

The SAS System 11:03 Monday, November 11,

The SAS System 11:03 Monday, November 11, The SAS System 11:3 Monday, November 11, 213 1 The CONTENTS Procedure Data Set Name BIO.AUTO_PREMIUMS Observations 5 Member Type DATA Variables 3 Engine V9 Indexes Created Monday, November 11, 213 11:4:19

More information

Example 8.1: Log Wage Equation with Heteroscedasticity-Robust Standard Errors

Example 8.1: Log Wage Equation with Heteroscedasticity-Robust Standard Errors 1 Stata Textbook Examples Introductory Econometrics: A Modern Approach by Jeffrey M. Wooldridge (1st & 2nd eds.) Chapter 8 - Heteroskedasticity Example 8.1: Log Wage Equation with Heteroscedasticity-Robust

More information

Prof. Dr. Ben Jann. University of Bern, Institute of Sociology, Fabrikstrasse 8, CH-3012 Bern

Prof. Dr. Ben Jann. University of Bern, Institute of Sociology, Fabrikstrasse 8, CH-3012 Bern Methodological Report on Kaul and Wolf s Working Papers on the Effect of Plain Packaging on Smoking Prevalence in Australia and the Criticism Raised by OxyRomandie Prof. Dr. Ben Jann University of Bern,

More information

Post-Estimation Techniques in Statistical Analysis: Introduction to Clarify and S-Post in Stata

Post-Estimation Techniques in Statistical Analysis: Introduction to Clarify and S-Post in Stata Post-Estimation Techniques in Statistical Analysis: Introduction to Clarify and S-Post in Stata PRISM Brownbag November 16, 2004 By: Kevin Sweeney and Brandon Bartels Presenters: Dave Darmofal and Corwin

More information

To be two or not be two, that is a LOGISTIC question

To be two or not be two, that is a LOGISTIC question MWSUG 2016 - Paper AA18 To be two or not be two, that is a LOGISTIC question Robert G. Downer, Grand Valley State University, Allendale, MI ABSTRACT A binary response is very common in logistic regression

More information

Maximum Likelihood Estimation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 10, 2017

Maximum Likelihood Estimation Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 10, 2017 Maximum Likelihood Estimation Richard Williams, University of otre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 0, 207 [This handout draws very heavily from Regression Models for Categorical

More information

An Introduction to Event History Analysis

An Introduction to Event History Analysis An Introduction to Event History Analysis Oxford Spring School June 18-20, 2007 Day Three: Diagnostics, Extensions, and Other Miscellanea Data Redux: Supreme Court Vacancies, 1789-1992. stset service,

More information

Dummy Variables. 1. Example: Factors Affecting Monthly Earnings

Dummy Variables. 1. Example: Factors Affecting Monthly Earnings Dummy Variables A dummy variable or binary variable is a variable that takes on a value of 0 or 1 as an indicator that the observation has some kind of characteristic. Common examples: Sex (female): FEMALE=1

More information

Appendix. Table A.1 (Part A) The Author(s) 2015 G. Chakrabarti and C. Sen, Green Investing, SpringerBriefs in Finance, DOI /

Appendix. Table A.1 (Part A) The Author(s) 2015 G. Chakrabarti and C. Sen, Green Investing, SpringerBriefs in Finance, DOI / Appendix Table A.1 (Part A) Dependent variable: probability of crisis (own) Method: ML binary probit (quadratic hill climbing) Included observations: 47 after adjustments Convergence achieved after 6 iterations

More information

1) The Effect of Recent Tax Changes on Taxable Income

1) The Effect of Recent Tax Changes on Taxable Income 1) The Effect of Recent Tax Changes on Taxable Income In the most recent issue of the Journal of Policy Analysis and Management, Bradley Heim published a paper called The Effect of Recent Tax Changes on

More information

Question 1a 1b 1c 1d 1e 1f 2a 2b 2c 2d 3a 3b 3c 3d M ult:choice Points

Question 1a 1b 1c 1d 1e 1f 2a 2b 2c 2d 3a 3b 3c 3d M ult:choice Points Economics 102: Analysis of Economic Data Cameron Spring 2015 April 23 Department of Economics, U.C.-Davis First Midterm Exam (Version A) Compulsory. Closed book. Total of 30 points and worth 22.5% of course

More information

proc genmod; model malform/total = alcohol / dist=bin link=identity obstats; title 'Table 2.7'; title2 'Identity Link';

proc genmod; model malform/total = alcohol / dist=bin link=identity obstats; title 'Table 2.7'; title2 'Identity Link'; BIOS 6244 Analysis of Categorical Data Assignment 5 s 1. Consider Exercise 4.4, p. 98. (i) Write the SAS code, including the DATA step, to fit the linear probability model and the logit model to the data

More information

Visualisierung von Nicht-Linearität bzw. Heteroskedastizität

Visualisierung von Nicht-Linearität bzw. Heteroskedastizität Visualisierung von Nicht-Linearität bzw. Heteroskedastizität. use..\wooldridge\stata\wage2, clear. scatter wage IQ Kommentar: Folie 38. graph copy a3, replace. summ IQ Variable Obs Mean Std. Dev. Min Max

More information
