Labor Force Participation and the Wage Gap
Detailed Notes and Code
Econometrics 113, Spring 2014

In class (Lecture 11), we used a new dataset to examine labor force participation and wages across groups. To do so, we pooled cross-sections of the Current Population Survey, Outgoing Rotation Groups, with 5-year gaps between cross-sections to keep the dataset manageable. Specifically, we merged the cross-sections from 1983, 1988, 1993, 1998, 2003, 2008, and 2013, and used the fourth-month survey for each group (respondents are surveyed at multiple points).

To begin our study of labor markets, we focus on labor force participation, which is characterized by a group of dummy variables:

empl: 1 if employed, 0 otherwise.
unem: 1 if unemployed but in the labor force, 0 otherwise.
nilf: 1 if not in the labor force, 0 otherwise.

We use the summarize command to take a first look at these variables:

. su empl unem nilf

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        empl |   1116395    .6139807    .4868353          0          1
        unem |   1116395     .041342    .1990801          0          1
        nilf |   1116395    .3446773    .4752631          0          1

It is also interesting to look at the fraction of the population that is unemployed or underemployed, where underemployed means working part-time. The dummy variable unempt is equal to one when the respondent is unemployed or employed part-time.

. su unempt

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      unempt |     46154    .1809161    .3849528          0          1

We can evaluate how these variables have changed over time using the tabstat command:

. tabstat empl unem unempt nilf, by(year)

Summary statistics: mean
  by categories of: year (Year)

    year |      empl      unem    unempt      nilf
---------+----------------------------------------
    1983 |  .5812963  .0615547  .1632216   .357149
    1988 |  .6200508  .0359243  .2089526  .3440249
    1993 |   .612172  .0441608  .1951287  .3436672
    1998 |   .642206  .0288856  .2268245  .3289084
    2003 |   .628476  .0367215  .1716817  .3348025
    2008 |  .6266061  .0352009  .1779906   .338193
    2013 |  .5931076  .0435911  .1533983  .3633013
---------+----------------------------------------
   Total |  .6139807   .041342  .1809161  .3446773
--------------------------------------------------
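As a quick cross-check on the pooled means in the table above: the means of empl, unem, and nilf sum to one, so the share of labor force participants who are unemployed can be backed out as unem / (empl + unem). The sketch below does that arithmetic with Stata's display command; the scalar names are hypothetical and the two numbers are copied from the Total row.

```stata
* Pooled means copied from the Total row of the tabstat output above
scalar m_empl = .6139807
scalar m_unem = .041342

* Share unemployed among those in the labor force (empl + unem)
* Result is about .063
display m_unem / (m_empl + m_unem)
```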
Since most labor market statistics are conditioned on the part of the population that is in the labor force, we can condition the tabstat command using if nilf==0, which calculates the means using only the observations in the pooled cross-section that are in the labor force.

. tabstat empl unem unempt nilf if nilf==0, by(year)

Summary statistics: mean
  by categories of: year (Year)

    year |      empl      unem    unempt      nilf
---------+----------------------------------------
    1983 |  .9042474  .0957526  .1632216         0
    1988 |  .9452353  .0547647  .2089526         0
    1993 |  .9327159  .0672841  .1951287         0
    1998 |  .9569573  .0430427  .2268245         0
    2003 |  .9447961  .0552039  .1716817         0
    2008 |   .946811   .053189  .1779906         0
    2013 |  .9315358  .0684642  .1533983         0
---------+----------------------------------------
   Total |  .9369135  .0630865  .1809161         0
--------------------------------------------------

Not surprisingly, the recent unemployment rate of 6-7% is reflected in the mean of unem, conditional on labor force participation (.068 in 2013). Because this is how unemployment rates are calculated, the match suggests that our dataset is a reasonable representation of the US labor force and of participation in it.

1 Labor Force Participation Regressions

In Lecture Module 11, we specified a linear probability model to study labor force participation as a function of education, age, age^2, gender, and demographic characteristics. We first need to code our demographic characteristics from the survey results (in the variable wbho):

. gen age2 = age^2
. gen black = 0
. replace black = 1 if wbho==2
. gen hispanic = 0
. replace hispanic = 1 if wbho==3
. gen other = 0
. replace other = 1 if wbho==4

The omitted (reference) group is white. We will also include year fixed effects, which we estimate by including the i.year factor-variable term in the regression specification. The code and results for the regression are listed in Regression 1A.
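The gen/replace pairs used above for the demographic dummies can be written more compactly. The following is a minimal sketch, not the code used in class: in Stata a logical expression such as (wbho==2) evaluates to 1 when true and 0 otherwise, so each dummy can be created in one line. It assumes the dummies do not already exist in memory.

```stata
* Equivalent one-line versions of the gen/replace pairs.
* (wbho==2) is 1 when true and 0 otherwise, including when wbho is missing,
* which matches what gen ... = 0 followed by replace produces.
gen black    = (wbho==2)
gen hispanic = (wbho==3)
gen other    = (wbho==4)

* Quick check that each dummy lines up with the survey coding
tab wbho black
```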
However, year fixed effects may not be sufficient if education levels, the age of the workforce, and the composition of the population change within states over time. Because this is a 30-year collection of cross-sections spaced 5 years apart, large changes that differ across states could occur. To control for these possibilities, or more specifically to absorb state-year specific shocks, we treat state-year combinations as groups and estimate using fixed effects. To define state-year groups, we use:

. egen state_year = group(state year)

Then we run a fixed effects regression with xtreg, passing i(state_year) as an option alongside fe. The precise code and results are below in Regression 1B.
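For reference, there are other common ways to request the same within-group estimator in Stata. The sketch below shows two of them as alternatives to the i(state_year) option; it is offered as an illustration, not as the code used in class, and it should give the same slope coefficients as the i(state_year) fe version.

```stata
* Option 1: declare the grouping variable once, then omit i()
xtset state_year
xtreg nilf educ age age2 female black hispanic other, fe

* Option 2: absorb the state-year indicators with areg
areg nilf educ age age2 female black hispanic other, absorb(state_year)
```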
REGRESSION 1A.

. reg nilf educ age age2 i.year female black hispanic other

      Source |       SS         df       MS          Number of obs  =   1114685
-------------+--------------------------------       F( 13,1114671) =  38250.21
       Model |  77603.4956       13  5969.49966      Prob > F       =    0.0000
    Residual |  173960.545  1114671  .156064475      R-squared      =    0.3085
-------------+--------------------------------       Adj R-squared  =    0.3085
       Total |  251564.041  1114684  .225681934      Root MSE       =    .39505

------------------------------------------------------------------------------
        nilf |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |  -.0587172   .0003404  -172.48   0.000    -.0593844   -.0580499
         age |  -.0333336   .0001092  -305.14   0.000    -.0335477   -.0331195
        age2 |   .0004477   1.13e-06   397.47   0.000     .0004455    .0004499
             |
        year |
        1988 |  -.0107145   .0013691    -7.83   0.000     -.013398   -.0080311
        1993 |  -.0059709   .0013667    -4.37   0.000    -.0086495   -.0032922
        1998 |  -.0152777   .0014244   -10.73   0.000    -.0180695    -.012486
        2003 |  -.0028469   .0013742    -2.07   0.038    -.0055403   -.0001534
        2008 |  -.0073528   .0013927    -5.28   0.000    -.0100825   -.0046232
        2013 |   .0094398   .0013996     6.74   0.000     .0066967    .0121829
             |
      female |   .1315024   .0007509   175.12   0.000     .1300306    .1329742
       black |    .041159   .0012912    31.88   0.000     .0386284    .0436897
    hispanic |   .0214434   .0014136    15.17   0.000     .0186728    .0242141
       other |    .063026   .0017417    36.19   0.000     .0596124    .0664396
       _cons |   .8689099   .0025079   346.47   0.000     .8639944    .8738253
------------------------------------------------------------------------------

REGRESSION 1B.

. xtreg nilf educ age age2 female black hispanic other, i(state_year) fe

Fixed-effects (within) regression               Number of obs      =   1114685
Group variable: state_year                      Number of groups   =       357

R-sq:  within  = 0.3071                         Obs per group: min =      1262
       between = 0.5021                                        avg =    3122.4
       overall = 0.3082                                        max =     14772

                                                F(7,1114321)       =  70558.75
corr(u_i, Xb)  = 0.0184                         Prob > F           =    0.0000

------------------------------------------------------------------------------
        nilf |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |  -.0583487   .0003431  -170.07   0.000    -.0590211   -.0576762
         age |  -.0334134   .0001091  -306.26   0.000    -.0336272   -.0331996
        age2 |   .0004482   1.13e-06   398.37   0.000      .000446    .0004504
      female |   .1312328   .0007493   175.14   0.000     .1297641    .1327014
       black |   .0328807   .0013498    24.36   0.000     .0302351    .0355264
    hispanic |   .0118378   .0015081     7.85   0.000      .008882    .0147936
       other |   .0648217   .0018745    34.58   0.000     .0611477    .0684957
       _cons |   .8674742   .0024177   358.80   0.000     .8627356    .8722129
-------------+----------------------------------------------------------------
     sigma_u |  .03225172
     sigma_e |  .39415021
         rho |  .00665096   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(356, 1114321) =    16.35        Prob > F = 0.0000
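Because the specification includes both age and age2, the partial effect of age on the outcome is b_age + 2*b_age2*age, which equals zero at age = -b_age / (2*b_age2). The sketch below shows one way to have Stata compute that turning point, with a delta-method standard error, from whichever regression was run last. It is an illustration of the general technique and assumes the command is issued immediately after the xtreg call.

```stata
* Run immediately after the regression, while its estimates are in memory.
* Age at which the fitted quadratic in age has zero slope:
nlcom -_b[age] / (2 * _b[age2])

* The sign of the second derivative separates a maximum (negative)
* from a minimum (positive) of the dependent variable:
display 2 * _b[age2]
```

Keep in mind that the dependent variable here is nilf, so a minimum of the fitted curve corresponds to a maximum of labor force participation.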
To add contrast to our results on labor force participation, we now restrict the sample to those in the workforce and evaluate how the same factors relate to unemployment status. We allow for state-year fixed effects, since unemployment rates differ across states due to local shocks and other factors that are not national. The code and regression results are below in Regression 1C.

REGRESSION 1C.

. xtreg unem educ age age2 female black hispanic other if nilf==0, i(state_year) fe

Fixed-effects (within) regression               Number of obs      =    731170
Group variable: state_year                      Number of groups   =       357

R-sq:  within  = 0.0300                         Obs per group: min =       785
       between = 0.1774                                        avg =    2048.1
       overall = 0.0312                                        max =      9663

                                                F(7,730806)        =   3231.24
corr(u_i, Xb)  = 0.0084                         Prob > F           =    0.0000

------------------------------------------------------------------------------
        unem |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |  -.0187377    .000257   -72.92   0.000    -.0192413   -.0182341
         age |  -.0071768   .0001129   -63.59   0.000     -.007398   -.0069556
        age2 |   .0000665   1.32e-06    50.23   0.000     .0000639    .0000691
      female |  -.0056194   .0005602   -10.03   0.000    -.0067174   -.0045214
       black |   .0642282   .0010287    62.44   0.000      .062212    .0662443
    hispanic |   .0160746   .0011127    14.45   0.000     .0138937    .0182556
       other |   .0197317   .0014015    14.08   0.000     .0169849    .0224786
       _cons |   .2767253   .0022465   123.18   0.000     .2723222    .2811283
-------------+----------------------------------------------------------------
     sigma_u |  .02089801
     sigma_e |  .23842608
         rho |  .00762393   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(356, 730806) =    15.47         Prob > F = 0.0000

Review Questions for Final

1a. Within state-year groups, calculate the age at which labor force participation is maximized or minimized. Is this a maximum or minimum? How do we know? Be careful about the definition of nilf (not in labor force) when answering this question.

1b. Within state-year groups, calculate the age at which unemployment is maximized or minimized. Is this a maximum or minimum? How do we know?

1c. Going from Regression 1A to Regression 1B, some coefficients change a bit, while others do not (educ, female). What do you think the state-year fixed effects are controlling for in this case? Think omitted variables here.

1d. In Regression 1C, please interpret the coefficients on educ, female, and black.
2 Wage Gap Regressions

In this section, we present the detailed code and related questions for our discussion of wage gaps. We use the same dataset as above. To begin, we take the real wage, rw, which is the wage of the respondent divided by a local price index, and transform it using natural logs:

. gen ln_rw = ln(rw)

After this transformation, we regress the log real wage of each respondent on their education, age, and demographics, using year fixed effects. The code and results are in Regression 2A.

REGRESSION 2A.

. xtreg ln_rw educ age age2 female black hispanic other if nilf==0, i(year) fe
warning: existing panel variable is not year

Fixed-effects (within) regression               Number of obs      =    598155
Group variable: year                            Number of groups   =         7

R-sq:  within  = 0.3499                         Obs per group: min =     78994
       between = 0.8909                                        avg =   85450.7
       overall = 0.3557                                        max =     89543

                                                F(7,598141)        =  45993.87
corr(u_i, Xb)  = 0.0626                         Prob > F           =    0.0000

------------------------------------------------------------------------------
       ln_rw |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |   .2018287   .0005673   355.79   0.000     .2007169    .2029405
         age |   .0635323   .0002598   244.51   0.000      .063023    .0640415
        age2 |  -.0006443   3.09e-06  -208.27   0.000    -.0006504   -.0006382
      female |  -.2545932    .001236  -205.98   0.000    -.2570158   -.2521706
       black |  -.1055568   .0021608   -48.85   0.000     -.109792   -.1013217
    hispanic |  -.0892944   .0022784   -39.19   0.000    -.0937599   -.0848288
       other |  -.0398047    .002868   -13.88   0.000    -.0454259   -.0341835
       _cons |   .9987371   .0050874   196.31   0.000     .9887659    1.008708
-------------+----------------------------------------------------------------
     sigma_u |  .02503829
     sigma_e |  .47686742
         rho |  .00274928   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(6, 598141) =   237.22           Prob > F = 0.0000

Next, we use state_year fixed effects, as above, rather than year fixed effects, to absorb changes in wages attributable to state-year groups that are also correlated with demographic changes. The code and results are below in Regression 2B.

REGRESSION 2B.

. xtreg ln_rw educ age age2 female black hispanic other if nilf==0, i(state_year) fe
warning: existing panel variable is not state_year

Fixed-effects (within) regression               Number of obs      =    598155
Group variable: state_year                      Number of groups   =       357

R-sq:  within  = 0.3514                         Obs per group: min =       638
       between = 0.4984                                        avg =    1675.5
       overall = 0.3547                                        max =      7478

                                                F(7,597791)        =  46264.39
corr(u_i, Xb)  = 0.0519                         Prob > F           =    0.0000

------------------------------------------------------------------------------
       ln_rw |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |    .194149   .0005637   344.43   0.000     .1930442    .1952538
         age |   .0636368   .0002559   248.65   0.000     .0631352    .0641384
        age2 |  -.0006464   3.05e-06  -212.14   0.000    -.0006523   -.0006404
      female |  -.2543043   .0012164  -209.07   0.000    -.2566883   -.2519202
       black |  -.1257219   .0022224   -56.57   0.000    -.1300777   -.1213661
    hispanic |  -.1347845   .0023991   -56.18   0.000    -.1394867   -.1300823
       other |  -.0819694   .0030604   -26.78   0.000    -.0879678   -.0759711
       _cons |   1.027283   .0050173   204.75   0.000     1.017449    1.037116
-------------+----------------------------------------------------------------
     sigma_u |  .09603396
     sigma_e |  .46909875
         rho |  .04022452   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(356, 597791) =    61.23         Prob > F = 0.0000

Review Questions for Final

2a. Please interpret precisely the coefficient on female for both Regressions 2A and 2B.

2b. Using Regression 2B, please calculate and interpret precisely the difference in wage for a black female compared to a white male.

Next, we will evaluate how the wage gap has changed over time. We will focus on the male-female wage gap for now. Though this can be done in a variety of ways, the plan is to first define a year-specific dummy variable for females. That is, we now allow (for example) the male-female gap in 1983 to differ from its value in 2003. The code for this is below:

. gen female83 = female
. gen female88 = female
. gen female93 = female
. gen female98 = female
. gen female03 = female
. gen female08 = female
. gen female13 = female
. replace female83 = 0 if year!=1983
. replace female88 = 0 if year!=1988
. replace female93 = 0 if year!=1993
. replace female98 = 0 if year!=1998
. replace female03 = 0 if year!=2003
. replace female08 = 0 if year!=2008
. replace female13 = 0 if year!=2013

The gen command creates a variable identical to female, and the replace command then sets it to zero for all observations not from the stated year. The results of replacing female with these seven variables in the within state-year regression are below in Regression 2C.
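The seven gen/replace pairs above follow one pattern, so they can also be produced with a loop. The sketch below is an alternative under the assumption that the female83 through female13 variables have not already been created; the local macro names y and yy are hypothetical.

```stata
* Build one year-specific female dummy per survey year.
* cond(a, b, c) returns b when a is true and c otherwise, so each variable
* equals female in its own year and 0 in every other year.
foreach y in 1983 1988 1993 1998 2003 2008 2013 {
    local yy = substr("`y'", 3, 2)      // last two digits, e.g. "83"
    gen female`yy' = cond(year == `y', female, 0)
}
```

Writing the loop this way keeps the variable names identical to those used in Regression 2C, so the regression command does not need to change.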
REGRESSION 2C.

. xtreg ln_rw educ age age2 female83 female88 female93 female98 female03 female08 female13 black hispanic other if nilf==0, i(state_year) fe

Fixed-effects (within) regression               Number of obs      =    598155
Group variable: state_year                      Number of groups   =       357

R-sq:  within  = 0.3528                         Obs per group: min =       638
       between = 0.4354                                        avg =    1675.5
       overall = 0.3559                                        max =      7478

                                                F(13,597785)       =  25069.75
corr(u_i, Xb)  = 0.0155                         Prob > F           =    0.0000

------------------------------------------------------------------------------
       ln_rw |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |   .1936641   .0005633   343.83   0.000     .1925601     .194768
         age |    .063622   .0002557   248.86   0.000     .0631209    .0641231
        age2 |  -.0006464   3.04e-06  -212.37   0.000    -.0006523   -.0006404
    female83 |  -.3344265   .0031891  -104.86   0.000    -.3406771   -.3281759
    female88 |  -.3062215   .0031881   -96.05   0.000    -.3124701   -.2999729
    female93 |  -.2397609   .0031825   -75.34   0.000    -.2459985   -.2335232
    female98 |  -.2416921   .0033374   -72.42   0.000    -.2482332   -.2351509
    female03 |  -.2278291   .0031345   -72.68   0.000    -.2339726   -.2216856
    female08 |  -.2242251   .0031918   -70.25   0.000    -.2304809   -.2179693
    female13 |  -.2033225   .0032617   -62.34   0.000    -.2097154   -.1969296
       black |  -.1260882     .00222   -56.80   0.000    -.1304393   -.1217372
    hispanic |  -.1343798   .0023965   -56.07   0.000    -.1390769   -.1296827
       other |  -.0817574   .0030571   -26.74   0.000    -.0877492   -.0757657
       _cons |   1.028751   .0050119   205.26   0.000     1.018928    1.038574
-------------+----------------------------------------------------------------
     sigma_u |  .09581929
     sigma_e |  .46857791
         rho |  .04013759   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(356, 597785) =    59.96         Prob > F = 0.0000

Review Questions for Final

2c. Please comment on the direction of the wage gap over time. Precisely, please interpret the change in the wage gap from 1983 to 2013, as evidenced in Regression 2C.

2d. Suppose that I want to test precisely the difference between the coefficients on female83 and female13. Please derive a regression that allows me to do this. Show your work!

Next, we'd like to evaluate these results by looking not just within state-year groups, but also within industries and occupations. Within the dataset, we use the two-digit industry classification, ind_2d, and the two-digit occupational classification, docc03, for this purpose. Since the industry and occupational classifications are available only for 2003 onward, we drop observations for which either is not available using:

. drop if ind_2d==. | docc03==.

Then we define industry-state-year groups, occupation-state-year groups, and industry-occupation-state-year groups:

. egen ind_state_year = group(ind_2d state year)
. egen occ2_state_year = group(docc03 state year)
. egen ind_occ2_state_year = group(ind_2d docc03 state year)
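Since the same wage specification is estimated within each of these grouping variables in turn, the four runs can be written as a single loop. This is a minimal sketch, assuming state_year and the three egen variables defined above are in memory; the local macro name g is hypothetical.

```stata
* Same wage regression, with progressively finer fixed-effect groups.
* Each pass prints one full xtreg table.
foreach g in state_year ind_state_year occ2_state_year ind_occ2_state_year {
    display "Fixed effects defined by: `g'"
    xtreg ln_rw educ age age2 female black hispanic other if nilf==0, i(`g') fe
}
```

Running the coarsest grouping first, on the reduced sample, gives a baseline against which the finer groupings can be compared.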
REGRESSION 2D.

. xtreg ln_rw educ age age2 female black hispanic other if nilf==0, i(state_year) fe

Fixed-effects (within) regression               Number of obs      =    258721
Group variable: state_year                      Number of groups   =       153

R-sq:  within  = 0.3473                         Obs per group: min =       638
       between = 0.4046                                        avg =    1691.0
       overall = 0.3466                                        max =      6935

                                                F(7,258561)        =  19657.57
corr(u_i, Xb)  = 0.0242                         Prob > F           =    0.0000

------------------------------------------------------------------------------
       ln_rw |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |   .2161984   .0008921   242.35   0.000     .2144499    .2179469
         age |   .0593373   .0003969   149.50   0.000     .0585594    .0601153
        age2 |    -.00059   4.62e-06  -127.71   0.000    -.0005991    -.000581
      female |  -.2209542   .0019147  -115.40   0.000     -.224707   -.2172015
       black |  -.1397012   .0035137   -39.76   0.000     -.146588   -.1328144
    hispanic |  -.1229709   .0033608   -36.59   0.000    -.1295579   -.1163839
       other |  -.0574861   .0042478   -13.53   0.000    -.0658117   -.0491605
       _cons |   1.039937   .0080455   129.26   0.000     1.024168    1.055706
-------------+----------------------------------------------------------------
     sigma_u |  .08300552
     sigma_e |   .4852719
         rho |  .02842624   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(152, 258561) =    44.12         Prob > F = 0.0000

REGRESSION 2E.

. xtreg ln_rw educ age age2 female black hispanic other if nilf==0, i(ind_state_year) fe

Fixed-effects (within) regression               Number of obs      =    258721
Group variable: ind_state_~r                    Number of groups   =      7208

R-sq:  within  = 0.2669                         Obs per group: min =         1
       between = 0.5068                                        avg =      35.9
       overall = 0.3463                                        max =       750

                                                F(7,251506)        =  13080.20
corr(u_i, Xb)  = 0.2568                         Prob > F           =    0.0000

------------------------------------------------------------------------------
       ln_rw |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |   .1948149   .0009537   204.28   0.000     .1929457    .1966841
         age |   .0480302   .0003956   121.42   0.000     .0472549    .0488055
        age2 |  -.0004731   4.57e-06  -103.45   0.000    -.0004821   -.0004642
      female |  -.1830613   .0020392   -89.77   0.000    -.1870581   -.1790646
       black |  -.1291588   .0034449   -37.49   0.000    -.1359107   -.1224069
    hispanic |  -.0979738   .0033067   -29.63   0.000    -.1044548   -.0914929
       other |  -.0525642   .0041416   -12.69   0.000    -.0606817   -.0444467
       _cons |   1.325345   .0082127   161.38   0.000     1.309248    1.341441
-------------+----------------------------------------------------------------
     sigma_u |  .25011209
     sigma_e |  .46296176
         rho |  .22592415   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(7207, 251506) =     5.54        Prob > F = 0.0000
REGRESSION 2F.

. xtreg ln_rw educ age age2 female black hispanic other if nilf==0, i(occ2_state_year) fe

Fixed-effects (within) regression               Number of obs      =    258721
Group variable: occ2_state~r                    Number of groups   =      3361

R-sq:  within  = 0.2105                         Obs per group: min =         1
       between = 0.7906                                        avg =      77.0
       overall = 0.3429                                        max =      1039

                                                F(7,255353)        =   9727.83
corr(u_i, Xb)  = 0.3934                         Prob > F           =    0.0000

------------------------------------------------------------------------------
       ln_rw |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |   .1464703   .0010065   145.52   0.000     .1444975     .148443
         age |    .048822    .000378   129.15   0.000     .0480811     .049563
        age2 |  -.0004799   4.38e-06  -109.58   0.000    -.0004884   -.0004713
      female |  -.1866442   .0020564   -90.76   0.000    -.1906748   -.1826136
       black |  -.0983273    .003322   -29.60   0.000    -.1048384   -.0918163
    hispanic |  -.0786142   .0032107   -24.49   0.000    -.0849071   -.0723213
       other |  -.0473485   .0039969   -11.85   0.000    -.0551823   -.0395147
       _cons |    1.44415   .0079293   182.13   0.000     1.428608    1.459691
-------------+----------------------------------------------------------------
     sigma_u |   .2414583
     sigma_e |  .44916715
         rho |  .22419298   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(3360, 255353) =    16.15        Prob > F = 0.0000

REGRESSION 2G.

. xtreg ln_rw educ age age2 female black hispanic other if nilf==0, i(ind_occ2_state_year) fe

Fixed-effects (within) regression               Number of obs      =    258721
Group variable: ind_occ2_s~r                    Number of groups   =     46724

R-sq:  within  = 0.1690                         Obs per group: min =         1
       between = 0.4172                                        avg =       5.5
       overall = 0.3423                                        max =       402

                                                F(7,211990)        =   6160.71
corr(u_i, Xb)  = 0.3738                         Prob > F           =    0.0000

------------------------------------------------------------------------------
       ln_rw |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |   .1304153   .0011223   116.20   0.000     .1282155     .132615
         age |   .0427017   .0004135   103.28   0.000     .0418913     .043512
        age2 |  -.0004159   4.78e-06   -86.93   0.000    -.0004252   -.0004065
      female |  -.1702831   .0023029   -73.94   0.000    -.1747968   -.1657694
       black |  -.0879955   .0036047   -24.41   0.000    -.0950606   -.0809303
    hispanic |  -.0686034   .0034313   -19.99   0.000    -.0753287    -.061878
       other |  -.0396592   .0043013    -9.22   0.000    -.0480897   -.0312287
       _cons |   1.612161   .0087454   184.34   0.000      1.59502    1.629302
-------------+----------------------------------------------------------------
     sigma_u |   .4084345
     sigma_e |  .43896446
         rho |  .46401885   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(46723, 211990) =     2.40       Prob > F = 0.0000
Review Questions for Final

2e. Do industries and occupations contribute to the wage gap (i.e., different genders and races selecting into different industries and occupations), or is the wage gap amplified when looking within industries or occupations?

2f. Suppose that I claim that, within industry-occupation-state-year groups, the male-female wage gap is exactly twice as large as the white-black wage gap. Please write this hypothesis and a suitable alternative. Please derive an estimating equation that allows one to test this hypothesis.

2g. Write out code that does the following. Within industry-state-year groups, evaluate the differences in the male-female wage gap as a function of having a college degree. Put differently, does having a college degree affect the size or direction of the wage gap? Write out the regression specification you wish to estimate and the code that will do it (including any variables that you need to generate).
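Questions 2d and 2f ask for a derived (reparameterized) regression. As a way to check such hand-derived answers, Stata's post-estimation commands can test linear combinations of coefficients directly. The sketch below is a cross-check only and does not replace the derivation the questions ask for; each command has to be issued right after the regression it refers to.

```stata
* After Regression 2C: estimate of (female13 - female83),
* reported with a standard error, t statistic, and confidence interval
lincom female13 - female83

* After Regression 2G: Wald test of the null hypothesis that the
* coefficient on female equals twice the coefficient on black
test female = 2*black
```

A correctly derived regression should reproduce the lincom point estimate and standard error as the coefficient on a single regressor.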