PAD 705: Research Methods II
R. Karl Rethemeyer
Department of Public Administration and Policy
Rockefeller College of Public Affairs & Policy
University at Albany
State University of New York

Final Exam
Spring 2004
May 9, 2005

Name:
ID Number:

Please put ONLY your ID number on the blue books. Three (3) points will be deducted for each time your name appears in a blue book. Tear off this page and turn it in separately.

Exam Instructions

You will have 120 minutes to complete this exam. The number of points corresponds to the number of minutes you should spend on each problem.

Work the problems in your blue books. Use one blue book for each Part of the exam. Show all of your work and reasoning; partial credit will be awarded on all questions. If you have a lot of scratch-throughs or computations, please circle your answer. Work quickly and try not to leave any questions unanswered.

The exam is 120 minutes long. You MUST stop writing IMMEDIATELY when time is called. Five (5) points will be deducted for the first and second warning to stop writing. Continuing to write after two warnings constitutes exam misconduct and will result in a score of zero for the exam.

Note: Making reference to a Stata command is usually NOT a complete answer, though it may get you partial credit. For full credit, you must explain conceptually why your choice of a particular command is correct.

Good Luck!!
Part I (30 points -- 5 points for each question)

Evaluate each of these statements. State whether each is true, partially true, or false and then provide a brief explanation for your answer.

1. In principal components analysis, communality (h²) is less than or equal to 1.0.

2. Polynomial distributed lags are not recommended when the number of observations (N) is low because polynomial distributed lags use more degrees of freedom than standard methods for lagged variables.

3. Heteroskedasticity causes bias and inconsistency in coefficient estimates.

4. Maximum likelihood cannot be used to estimate models that fulfill the Gauss-Markov conditions.

5. If the estimated coefficient on age in a probit model is greater than in a logit model, then we can say that in the probit model age has a larger effect on the probability of a success than in the logit model.

6. If the independent variables are non-stochastic, then OLS is biased and inconsistent.
Part II (30 points)

It has been known for many years that young people who fail to complete high school face many more problems in later life than do people who graduate. While national leaders have demanded that schools, communities, and families make a major effort to retain students, the dropout rate remains high. According to the Educational Testing Service, nearly one in three U.S. high school students drops out.*

Governor Pataki has asked your firm to study the national determinants of high school dropout. A colleague has constructed a sample from the 1990 U.S. Census microdata of students 15 to 18 years of age (the age at which students may leave high school). Table I contains the preliminary analysis she completed. Each regression states underneath the header what technique was used for the estimation. Table II contains the variable definitions for the dataset.

Answer all questions for this Part using Tables I and II. Remember: Whenever possible you should specifically state and test a hypothesis.

1. (5 points) Using regression (6): Are men less likely to drop out than women, all else held constant?

2. (5 points) Are the ethnic variables in regression (12) (i.e., natam, chinese, japanese, pacisle, and other) jointly significant?

3. (10 points) What is the probability that a black male 17-year-old in a family of 4 with a mother but no father, who has completed 10 years of education, does not have a child, is not married, earns $325/month, and whose family earns 135% of the U.S. poverty line, will drop out of school? Show all your work.

4. (5 points) Explain the change in the size of the male coefficient between regressions (5) and (6). What does the change imply, if anything, about the underlying relationship between the coefficients in regressions (5) and (6)? Be specific about the reason for and direction of the change in the male coefficient.

5. (5 points) Using regression (14): Holding all else constant, does having a child increase the chances of dropping out by at least 20%?

6. (5 points) A colleague who was analyzing regression (13) stated: "I think something is wrong; the coefficient on black is negative and statistically significant, but a tabulation of dropouts (see below) shows that blacks drop out at a higher rate than whites." Is the colleague's analysis correct? Is something wrong with regression (13)? If the colleague is correct regarding the direction and significance of black, is there an explanation for the discrepancy between the coefficient and the tabulation results?

           |  Person dropped out
     black |         0          1 |     Total
-----------+----------------------+----------
         0 |    10,482      1,007 |    11,489
           |     91.24       8.76 |    100.00
-----------+----------------------+----------
         1 |     1,898        206 |     2,104
           |     90.21       9.79 |    100.00
-----------+----------------------+----------
     Total |    12,380      1,213 |    13,593
           |     91.08       8.92 |    100.00

* Adapted from Focus Adolescent Services (http://www.focusas.com/dropouts.html)
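The row percentages in the tabulation for question 6 can be re-derived from the cell counts. The sketch below is only an arithmetic check of the unconditional dropout rates; the dictionary layout and the names `counts` and `dropout_rate` are assumptions made for illustration, and the check says nothing about the regression results themselves.

```python
# Cell counts copied from the tabulation (rows: black = 0 or 1).
counts = {
    0: {"stayed": 10482, "dropped": 1007},
    1: {"stayed": 1898, "dropped": 206},
}


def dropout_rate(row):
    """Return the share of a row that dropped out, in percent."""
    total = row["stayed"] + row["dropped"]
    return 100.0 * row["dropped"] / total


# Unconditional dropout rate for each group, rounded as in the tabulation.
rates = {group: round(dropout_rate(row), 2) for group, row in counts.items()}

# Overall rate across both groups.
all_dropped = sum(row["dropped"] for row in counts.values())
all_total = sum(row["stayed"] + row["dropped"] for row in counts.values())
overall_rate = round(100.0 * all_dropped / all_total, 2)

print(rates, overall_rate)
```

These are raw group rates with nothing held constant, whereas a regression coefficient is a partial effect conditional on the other regressors.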
Table I
Dependent Variable: drpout (= 1 if teen dropped out of high school; 0 otherwise)

Regression:  1  2  3  4  5  6  7  8  9  10  11  12  13  14
Method:      Probit  Probit  Probit  Probit  Probit  Probit  Probit  Probit  Probit  Probit  Probit  Probit  Logit  LPM

constant   -1.366  -1.343  -1.773 (0.046)  -1.954 (0.049)  -0.429 (0.092)  -0.412 (0.094)  -8.034 (0.297)  -7.83 (0.3)  -7.271 (0.331)  -6.743 (0.338)  -6.751 (0.338)  -6.766 (0.339)  -13.586 (0.674)  -0.823 (0.047)
male       0.038 (0.03)  0.039 (0.03)  0.058 (0.031)  0.068 (0.031)  (0.032)  0.131 (0.034)  0.022  0.026  0.026  0.029  0.054 (0.073)  0.001 (0.005)
famsize    -0.005 (0.008)  0.056 (0.008)  0.073 (0.008)  0.048 (0.009)  0.035 (0.009)  0.027  0.029  0.016  0.019  0.021  0.039 (0.019)  0.004 (0.001)
nomom      0.734 (0.039)  0.576 (0.042)  0.657 (0.043)  0.576 (0.044)  0.44 (0.047)  0.433 (0.048)  0.403 (0.048)  0.31  0.305  0.31  0.587 (0.095)  0.071 (0.007)
nopop      0.364  0.366 (0.037)  0.308 (0.037)  0.252  0.25  0.242  0.086 (0.045)  0.11 (0.046)  0.109 (0.046)  0.212 (0.09)  0.021 (0.006)
educ99     -0.203 (0.011)  -0.206 (0.011)  -0.472  -0.478  -0.478  -0.47  -0.469  -0.469  -0.922 (0.032)  -0.078 (0.002)
chld       1.009 (0.077)  0.812 (0.082)  0.807 (0.082)  0.712  0.673  0.695  0.692  1.225 (0.155)  0.191 (0.015)
age        0.58  0.567  0.565  0.555  0.554  0.555  1.115 (0.043)  0.098 (0.003)
inctot     0.000025  0.000024  0.000032  0.000032  0.000032  0.000058 (0.00001)  0.000005 (0.000001)
marst      -0.087  -0.097  -0.093  -0.092  -0.155  -0.026 (0.004)
poverty    -0.001  -0.001  -0.001  -0.002  -0.00013 (0.000016)
black      -0.118 (0.049)  -0.128  -0.303 (0.097)  -0.021 (0.007)
natam      0.005 (0.149)  0.058 (0.278)  0.003
chinese    -0.353 (0.184)  -0.902 (0.405)  -0.027 (0.019)
japanese   -0.252 (0.16)  -0.52 (0.334)  -0.027 (0.018)
pacisle    0.074 (0.325)  0.213 (0.625)  0.001 (0.044)
other      -0.312 (0.302)  -0.463 (0.583)  -0.017 (0.03)

N                     13593  13593  13593  13593  13593  13593  13593  13593  13593  13593  13593  13593  13593  13593
Pseudo-R² / Adj. R²   0.0002  0.0003  0.0428  0.0550  0.1050  0.1252  0.2267  0.2267  0.2300  0.2318  0.2318  0.2401  0.2490  0.1825
Log-Likelihood        -4087.6  -4087.3  -3913.2  -3863.3  -3659.2  -3576.5  -3161.7  -3161.3  -3148.2  -3140.5  -3109.6  -3106.7  -3070.49  N/A

Note: LPM = Linear probability model
Table II
Variable Definitions

drpout     = 1 if teen dropped out
male       = 1 if teen is male
famsize    Number of own family members in household
nomom      = 1 if teen's mother is not in household
nopop      = 1 if teen's father is not in household
educ99     Educational attainment in years
chld       = 1 if teen is a parent of a child
age        Age
inctot     Teen's own total income from all sources
marst      Teen's marital status
poverty    Family income as % of Federal poverty level (in whole numbers, i.e., 266% = 266)
black      = 1 if teen is black
natam      = 1 if teen is a Native American
chinese    = 1 if teen is of Chinese descent
japanese   = 1 if teen is of Japanese descent
pacisle    = 1 if teen is of Pacific Island or other Asian descent
other      = 1 if teen is not otherwise defined in a category
white      = 1 if teen is white

Note: Each observation is coded into exactly one of the race and ethnicity categories (i.e., no person is both black and other, for instance). Thus the race and ethnicity variables are mutually exclusive and collectively exhaustive.
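Question 3 of Part II asks for a predicted probability. As a reminder of the mechanics only, the following minimal sketch shows how a linear index is turned into a probability under a probit link (standard normal CDF) and a logit link (logistic CDF). The coefficients and the teen profile are hypothetical placeholders and are NOT taken from Table I; the function names are likewise assumptions for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, used as the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Logistic CDF, used as the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_index(coefs, values):
    """Constant plus the sum of coefficient * value over the regressors."""
    return coefs["constant"] + sum(coefs[name] * values[name] for name in values)


# HYPOTHETICAL coefficients and profile, chosen only to show the steps.
coefs = {"constant": -1.0, "male": 0.1, "age": 0.05}
teen = {"male": 1, "age": 17}

index = linear_index(coefs, teen)   # -1.0 + 0.1*1 + 0.05*17
probit_p = normal_cdf(index)        # probability if coefs came from a probit
logit_p = logistic_cdf(index)       # probability if coefs came from a logit

print(index, probit_p, logit_p)
```

The same index yields different probabilities under the two links, which is why probit and logit coefficients are not directly comparable in magnitude.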
Part III (30 points)

The following system of equations is proposed as a model of the relationship between the price of gasoline, the number of cars on U.S. roads, government expenditures on public transportation (subways, buses, etc.), and the amount of carbon monoxide (CO) pollution created each year.

Eq1: $\mathrm{PGAS}_{t,i} = \beta_0 + \beta_1 \mathrm{\#CARS}_{t,i} + \beta_2 \mathrm{PUBTRANS}_{t,i} + \beta_3 \mathrm{INC}_{t-1,i} + \beta_4 \mathrm{INT}_{t,i} + \eta_{t,i}$

Eq2: $\mathrm{\#CARS}_{t,i} = \alpha_0 + \alpha_1 \mathrm{PGAS}_{t,i} + \alpha_2 \mathrm{PGAS}_{t-1,i} + \alpha_3 \mathrm{POP}_{t,i} + \alpha_4 \mathrm{INC}_{t-1,i} + \alpha_5 \mathrm{PUBTRANS}_{t,i} + \alpha_6 \mathrm{PPUBTRANS}_{t,i} + \alpha_7 \mathrm{PCARS}_{t,i} + \alpha_8 \mathrm{PCARS}_{t-1,i} + \alpha_9 \mathrm{INT}_{t,i} + \varepsilon_{t,i}$

Eq3: $\mathrm{PUBTRANS}_{t,i} = \delta_0 + \delta_1 \mathrm{\#CARS}_{t,i} + \delta_2 \mathrm{INC}_{t-1,i} + \delta_3 \mathrm{PPUBTRANS}_{t,i} + \delta_4 \mathrm{PCARS}_{t,i} + \delta_5 \mathrm{PCARS}_{t-1,i} + \delta_6 \mathrm{PGAS}_{t,i} + \nu_{t,i}$

Eq4: $\mathrm{CO}_{t,i} = \gamma_0 + \gamma_1 \mathrm{\#CARS}_{t,i} + \gamma_2 \mathrm{PUBTRANS}_{t,i} + \gamma_3 \mathrm{ELEC}_{t,i} + \upsilon_{t,i}$

Where #CARS is the number of cars operating in the U.S.; PGAS is the average annual gasoline price in the U.S.; POP is the population of the U.S.; INC is the total U.S. national earnings in a given year; PUBTRANS is the amount spent on public transportation each year by government; PCARS is the average annual price of cars in the U.S.; INT is the average domestic interest rate; ELEC is the total kilowatt hours of electricity generated in a given year; PPUBTRANS is the average price of a trip on public transportation in a given year; CO is the amount of carbon monoxide produced in the U.S. each year in metric tons.

1. (5 points) Would OLS be an appropriate estimation procedure for the first equation? Would OLS be an appropriate estimation procedure for the fourth equation? Explain why or why not for each.

2. Parts a & b are jointly worth 20 points:

   a) Can the parameters of the third equation be estimated consistently? If so, write down the equations that you would estimate and explain exactly the procedures you would use. It may be helpful to refer to Stata commands here, but be sure to explain why you used a particular command and what the command does if you refer to Stata. If not, state why not.

   b) Can the parameters of the second equation be estimated consistently? If so, write down the equations that you would estimate and explain exactly the procedures you would use. It may be helpful to refer to Stata commands here, but be sure to explain why you used a particular command and what the command does if you refer to Stata. If not, state why not.

3. (5 points) The modelers who created this system of equations hypothesized that PPUBTRANS is exogenous. Do you agree with this assessment? If so, why? If not, why not, and what would have to change about the system to take this change into account?
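Whether an equation in a simultaneous system can be estimated consistently depends in part on the order condition: the number of exogenous variables excluded from the equation must be at least the number of endogenous variables on its right-hand side. The sketch below counts both for a HYPOTHETICAL two-equation toy system (the variable names `income`, `weather`, `price`, and `quantity` are illustrative and are not part of the exam's system). The order condition is necessary but not sufficient, so this is a first screen rather than a full identification check.

```python
def order_condition(all_exogenous, included_exogenous, rhs_endogenous):
    """Compare excluded exogenous variables with right-hand-side endogenous ones."""
    excluded = set(all_exogenous) - set(included_exogenous)
    needed = len(set(rhs_endogenous))
    if len(excluded) < needed:
        return "fails order condition"
    if len(excluded) == needed:
        return "meets order condition exactly"
    return "meets order condition with spare instruments"


# HYPOTHETICAL toy system: two exogenous variables in the whole system.
system_exogenous = ["income", "weather"]

# Toy demand equation: includes income, has price as an endogenous regressor.
demand = order_condition(system_exogenous, ["income"], ["price"])

# Toy supply equation: includes every exogenous variable, so nothing is
# left over to serve as an instrument for its endogenous regressor.
supply = order_condition(system_exogenous, ["income", "weather"], ["quantity"])

print(demand)
print(supply)
```

An equation that passes this screen is a candidate for instrumental-variables estimation such as two-stage least squares, with the excluded exogenous variables serving as instruments.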
Part IV (30 points -- 5 points for each question)

Consider the following equation that models the property tax rate of NY towns and cities (indexed by i) in time period t:

$\mathrm{TAXRATE}_{t,i} = \beta_0 + \beta_1 \mathrm{DEM}_{t,i} + \beta_2 \mathrm{EDLEVEL}_{t,i} + \beta_3 \mathrm{PROPVALUE}_{t-1,i} + \beta_4 \mathrm{DEM}_{t,i} \cdot \mathrm{PROPVALUE}_{t-1,i} + \beta_5 \mathrm{UPSTATE}_{t,i} + \beta_6 \mathrm{NYC}_{t,i} + \beta_7 \mathrm{POP}_{t,i} + \beta_8 \mathrm{DEM}_{t,i} \cdot \mathrm{UPSTATE}_{t,i} + \beta_9 \mathrm{PARKLAND}_{t,i} + \beta_{10} \mathrm{LIBRARY}_{t,i} + \beta_{11} \mathrm{CRIMERATE}_{t-1,i} + \varepsilon_{t,i}$

Where TAXRATE is the tax rate in dollars per $1,000 assessed valuation; DEM is 1 when a Democrat is town manager; EDLEVEL is the average number of years of education among residents of the town; PROPVALUE is the total assessed property value for land in the town in millions of dollars; UPSTATE is 1 if the county is north of the Mohawk River and east of Utica; NYC is 1 if the town is part of New York City or the counties contiguous with it; POP is the total population of the town as reported in the 2000 U.S. Census; PARKLAND is the total amount of park land in the town in hectares; LIBRARY is 1 if the town supports a library; and CRIMERATE is the number of crimes in the town per 1,000 population. There are 425 towns and cities in New York; the dataset spans 10 years (1994-2004).

1. Explain what the coefficient β4 measures as precisely as possible. Then explain as precisely as possible what the implication would be of finding that β4 = 0.00043 and is statistically significant at the 1% level.

2. What would happen to the estimation of β3 if it was discovered that the population is systematically underreported in the Census?

3. How would you test whether Democratic town managers tend to impose higher property taxes? Be as specific as possible: Write down the hypothesis, the statistic you would need to calculate (in equation form), and the cut-off value for the statistic to be significant at the 95% level. Be sure to report degrees of freedom, if needed.

4. Is ordinary least squares (OLS) an appropriate technique to estimate the coefficients for this equation with the dataset as described above? Why or why not?

5. Let's assume that the results above are heteroskedastic. Draw a graph showing the relationship between one of the independent variables and TAXRATE that would support the conclusion that heteroskedasticity is present.

6. Now assume that the State of New York imposed a minimum tax rate of $2.50 per $1,000 assessed valuation in 1994. Would OLS be appropriate in this circumstance?
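As a reading aid for the interaction terms in the TAXRATE equation, the partial effects implied by the equation as written can be worked out directly. This is a sketch that takes the linear specification at face value and uses only the symbols defined in Part IV.

```latex
% Effect of a one-unit (one million dollar) change in lagged property value:
\frac{\partial\, \mathrm{TAXRATE}_{t,i}}{\partial\, \mathrm{PROPVALUE}_{t-1,i}}
  = \beta_3 + \beta_4\, \mathrm{DEM}_{t,i}

% Difference in expected tax rate between a Democratic and a
% non-Democratic town manager, holding the other regressors fixed:
E[\mathrm{TAXRATE}_{t,i} \mid \mathrm{DEM}_{t,i} = 1]
  - E[\mathrm{TAXRATE}_{t,i} \mid \mathrm{DEM}_{t,i} = 0]
  = \beta_1 + \beta_4\, \mathrm{PROPVALUE}_{t-1,i} + \beta_8\, \mathrm{UPSTATE}_{t,i}
```

Because DEM enters three terms, its effect is not captured by β1 alone; it varies with the town's property value and with whether the town is upstate.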