Sociology 63993 Exam 3 Answer Key - DRAFT May 8, 2007

I. True-False. (20 points) Indicate whether the following statements are true or false. If false, briefly explain why.

1. The odds of an event occurring range from negative infinity to positive infinity.

False. Odds range from 0 to positive infinity.

2. If a model is not identified, then OLS regression should be used to estimate its parameters.

False. If you want to estimate the model, you need to figure out how to get it identified, and then use a technique like 2SLS.

3. McFadden's Pseudo R^2 is popular because it uses the exact same formula for R^2 that is used in OLS regression.

False. McFadden's Pseudo R^2 is a logical analog to OLS R^2, but the formulas are not identical.

4. A Brant test can be used to test the assumptions of the ordered logit model.

True.

5. The main advantage of a multinomial logit model over an ordered logit model is that the multinomial logit model has fewer parameters and is hence easier to interpret.

False. Just the opposite is true; the ordered logit model is more parsimonious (but its assumptions are not always met).

II. Short answer. (25 pts each, 50 pts total). Answer both of the following.

II-1. (25 points) There is growing concern about diabetes in the United States. A demographer and a public health researcher have joined together to examine how demographic characteristics are related to the risk of having diabetes. They have gathered information from more than 20,000 individuals on the following:

Variable    Description
diabetes    Coded 1 if respondent has diabetes, 0 otherwise
black       Coded 1 if black, 0 otherwise
female      Coded 1 if female, 0 if male
age         Age in years

They obtain the following results:

. nestreg, lr: logit diabetes black female age, nolog

Block  1: black

Logistic regression                               Number of obs   =      20670
                                                  LR chi2(1)      =      43.58
                                                  Prob > chi2     =     0.0000
Log likelihood =  -3976.344                       Pseudo R2       =     0.0054

------------------------------------------------------------------------------
    diabetes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |    .609734   .0870692     7.00   0.000     .4390816    .7803864
       _cons |  -3.063142   .0355983   -86.05   0.000    -3.132913   -2.993371
------------------------------------------------------------------------------

Block  2: female

Logistic regression                               Number of obs   =      20670
                                                  LR chi2(2)      =      50.02
                                                  Prob > chi2     =     0.0000
Log likelihood = -3973.1221                       Pseudo R2       =     0.0063

------------------------------------------------------------------------------
    diabetes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |   .6072815   .0870934     6.97   0.000     .4365816    .7779813
      female |   .1658387   .0655102     2.53   0.011     .0374412    .2942363
       _cons |   -3.15304   .0511592   -61.63   0.000     -3.25331    -3.05277
------------------------------------------------------------------------------

Block  3: age

Logistic regression                               Number of obs   =      20670
                                                  LR chi2(3)      =     748.34
                                                  Prob > chi2     =     0.0000
Log likelihood = -3623.9656                       Pseudo R2       =     0.0936

------------------------------------------------------------------------------
    diabetes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |   .7179046    .089665     8.01   0.000     .5421644    .8936447
      female |   .1545569   .0666786     2.32   0.020     .0238692    .2852445
         age |   .0594654   .0026398    22.53   0.000     .0542916    .0646393
       _cons |  -6.405437   .1677385   -38.19   0.000    -6.734198   -6.076675
------------------------------------------------------------------------------

  +----------------------------------------------------------------+
  | Block |        LL       LR     df  Pr > LR       AIC       BIC |
  |-------+--------------------------------------------------------|
  |     1 | -3976.344    43.58      1   0.0000  7956.688  7972.561 |
  |     2 | -3973.122     6.44      1   0.0111  7952.244  7976.054 |
  |     3 | -3623.966   698.31      1   0.0000  7255.931  7287.677 |
  +----------------------------------------------------------------+

Based on the printout above, answer the following.

a. In Model 1 (i.e. Block 1), what do DEV_M, G_M, DEV_0, and McFadden's Pseudo R^2 equal?

Note that LL_M = -3976.344, so DEV_M = -2 * LL_M = 7952.688. G_M = the Model Chi-Square (labeled by Stata as LR chi2(1) in Block 1) = 43.58. DEV_0 = DEV_M + G_M = 7952.688 + 43.58 = 7996.268. McFadden's Pseudo R^2 is included in the printout and equals .0054. To confirm, Pseudo R^2 = G_M / DEV_0 = 43.58 / 7996.268 = .00545.
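The arithmetic in part (a) can be retraced outside Stata. The following is a minimal Python sketch, not part of the original exam; the function and variable names are illustrative assumptions, and the only inputs are the log likelihoods and the Block 1 LR chi-square copied from the printout. It also recomputes the stepwise LR chi-squares shown in the nestreg summary table.

```python
# Values copied from the nestreg printout (the names are illustrative).
LL_BLOCK1 = -3976.344    # log likelihood, Block 1 (black only)
LL_BLOCK2 = -3973.1221   # log likelihood, Block 2 (black, female)
LL_BLOCK3 = -3623.9656   # log likelihood, Block 3 (black, female, age)
G_BLOCK1 = 43.58         # model chi-square for Block 1, LR chi2(1)


def deviance(log_likelihood):
    """DEV = -2 * LL."""
    return -2.0 * log_likelihood


def null_deviance(dev_model, model_chi_square):
    """DEV_0 = DEV_M + G_M, since G_M = DEV_0 - DEV_M."""
    return dev_model + model_chi_square


def mcfadden_pseudo_r2(model_chi_square, dev_null):
    """McFadden's Pseudo R^2 = G_M / DEV_0."""
    return model_chi_square / dev_null


def lr_chi_square(ll_constrained, ll_unconstrained):
    """LR chi-square contrasting two nested models."""
    return 2.0 * (ll_unconstrained - ll_constrained)


dev_m = deviance(LL_BLOCK1)
dev_0 = null_deviance(dev_m, G_BLOCK1)
pseudo_r2 = mcfadden_pseudo_r2(G_BLOCK1, dev_0)
lr_block2 = lr_chi_square(LL_BLOCK1, LL_BLOCK2)  # adding female
lr_block3 = lr_chi_square(LL_BLOCK2, LL_BLOCK3)  # adding age

print(f"DEV_M = {dev_m:.3f}")
print(f"DEV_0 = {dev_0:.3f}")
print(f"Pseudo R^2 = {pseudo_r2:.5f}")
print(f"LR chi2, Block 2 vs 1 = {lr_block2:.2f}")
print(f"LR chi2, Block 3 vs 2 = {lr_block3:.2f}")
```

The stepwise chi-squares use the unrounded log likelihoods from the block headers, so they may differ from the summary table in the last printed digit.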

b. Using Model 2 (i.e. Block 2), complete the following table:

Race        Gender    Log odds    Odds    P(Diabetes)
Black       Female
Not Black   Male

Here are the answers:

Race        Gender    Log odds = a + XB                              Odds = exp(Log odds)    P(Diabetes) = Odds/(1 + Odds)
Black       Female    .6072815 + .1658387 - 3.15304 = -2.37992       .092558                 .084717
Not Black   Male      -3.15304                                       .042722                 .040972

We can use Stata to confirm the results:

. adjust black=1 female=1, xb

--------------------------------------------------
Covariates set to value: black = 1, female = 1
--------------------------------------------------
xb    -2.37992
Key:  xb = Linear Prediction

. adjust black=1 female=1, exp

--------------------------------------------------
Covariates set to value: black = 1, female = 1
--------------------------------------------------
exp(xb)    .092558
Key:  exp(xb) = exp(xb)

. adjust black=1 female=1, pr

--------------------------------------------------
Covariates set to value: black = 1, female = 1
--------------------------------------------------
pr    .084717
Key:  pr = Probability
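The hand calculations in the answer table for part (b) follow one chain: linear prediction, then odds, then probability. As a rough cross-check that does not rely on Stata, here is a small Python sketch using the Block 2 coefficients from the printout. The function names are assumptions introduced for illustration only.

```python
import math

# Block 2 coefficients copied from the printout.
INTERCEPT = -3.15304
B_BLACK = 0.6072815
B_FEMALE = 0.1658387


def log_odds(black, female):
    """Linear prediction a + XB for the given 0/1 covariate values."""
    return INTERCEPT + B_BLACK * black + B_FEMALE * female


def odds_from_log_odds(value):
    """Odds = exp(log odds)."""
    return math.exp(value)


def probability_from_odds(odds):
    """P = Odds / (1 + Odds)."""
    return odds / (1.0 + odds)


for label, black, female in [("Black female", 1, 1), ("Not Black male", 0, 0)]:
    xb = log_odds(black, female)
    odds = odds_from_log_odds(xb)
    prob = probability_from_odds(odds)
    print(f"{label}: log odds = {xb:.5f}, odds = {odds:.6f}, P = {prob:.6f}")
```

Setting both covariates to 0 leaves only the intercept, which is why the log odds for a non-Black male equal the constant.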

. adjust black=0 female=0, xb

--------------------------------------------------
Covariates set to value: black = 0, female = 0
--------------------------------------------------
xb    -3.15304
Key:  xb = Linear Prediction

. adjust black=0 female=0, exp

--------------------------------------------------
Covariates set to value: black = 0, female = 0
--------------------------------------------------
exp(xb)    .042722
Key:  exp(xb) = exp(xb)

. adjust black=0 female=0, pr

--------------------------------------------------
Covariates set to value: black = 0, female = 0
--------------------------------------------------
pr    .040972
Key:  pr = Probability

c. Three models are estimated. Which model do you think is best, and why? What does this model say about the effect of race, gender and age on diabetes?

Model 3 is best. Both Wald tests and LR chi-square tests indicate that all three variables are statistically significant and should be included in the model. The results show that blacks, women, and older people are all more likely to have diabetes.

NOTE: This problem uses a modified version of the nhanes2f data, available from the Stata website. From within Stata, type

webuse nhanes2f
expand 2

The other Stata commands used are included in the exam.

II-2. (25 points) For each of the following circumstances describe the statistical technique you would use for revealing the relationship between the dependent and independent variables. Write a few sentences explaining and justifying your answer. In some instances more than one technique may be reasonable.

a. There is ongoing controversy over whether homework helps or hinders the academic and social progress of children in grades 1-4. To address these issues, students are randomly assigned to two classrooms. One class has homework every day, the other class never has any homework. Otherwise the style of teaching and the material covered is identical in the two classes. After 12 weeks, both classes will take the same standardized tests to measure how much they have learned, how much they enjoy school, and their overall psychological well-being.

There is one treatment and multiple dependent variables. MANOVA (or a LISREL-type model) would be appropriate.

b. It is summer 2008. A brutal primary campaign has left the Democratic Party bitterly divided, and Rush Limbaugh and Ann Coulter are already cackling over what appears to be an all but certain Republican victory in November. Suddenly, however, after an impassioned call for unity by former presidents Clinton and Carter, the party convention drafts the one man it hopes can bring its warring factions together: the Academy Award winning former vice-president, Al Gore. But, even as Melissa Etheridge leads the delegates in joyous song, Gore knows that the most critical decision of his campaign is only hours away: his choice of a running mate. His instincts tell him that Hillary Clinton is the best choice. But, his instincts once told him that Joe Lieberman would be a great pick too. He has therefore commissioned an overnight telephone survey. A group of 1,000 likely voters will be asked if having Clinton as the vice-presidential nominee would make them more likely to vote for Gore, have no effect on their likelihood of voting for Gore, or make them less likely to vote for Gore. The study will further examine how voter preferences are affected by their gender, race, and party affiliation.

The dependent variable is ordinal, so an ordered logit model (or perhaps some other type of ordinal regression) is appropriate. Those who would like to sing along with Melissa can do so at http://www.melissaetheridge.com/ Make sure your computer's audio is turned on!

c. The management of a large company is interested in examining how a team-oriented approach to problem solving works. Individuals work with a partner to solve a problem. It is believed that, the harder one partner works to solve the problem, the harder the other partner works, and vice versa. Information on the IQ, income and education of each partner is also available.

Partners influence each other, so a nonrecursive model (possibly estimated by 2SLS or via a maximum likelihood technique such as LISREL provides) seems called for. Fortunately, the background variables for each partner should make the model identified.

d. A researcher is interested in the relationship between popularity and grades. She believes that students with very low grades, and students with very high grades, will be less popular than students whose grades are more in the middle.

This sounds like a nonlinear relationship. It can be estimated via OLS by including a term for grades^2 (grades squared). Alternatively, you might try a piecewise regression model.

e. A gun control group feels that the Virginia Tech shootings provide yet another reason for greater regulation of firearms. However, it fears that if it mentions the Tech shootings in its ads, it will be seen as exploitive and the ads will be ineffective. Subjects will therefore see each of two gun control ads, one which mentions Virginia Tech and one which does not. The perceived effectiveness of each ad will be measured on 100 point scales.

Since the same respondents see both ads, this calls for a matched pairs t-test.
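To make the matched pairs logic in (e) concrete, here is a minimal Python sketch of the paired t statistic. The six pairs of ratings are invented purely for illustration and are not from any study; the names are likewise assumptions. The point is that the test works on within-respondent differences, so stable differences between respondents drop out.

```python
import math
import statistics

# HYPOTHETICAL ratings (0-100) from six made-up respondents; not real data.
ratings_with_mention = [62, 55, 70, 48, 66, 59]
ratings_without_mention = [58, 57, 64, 50, 60, 55]


def paired_t_statistic(first, second):
    """Return (t, df) for a matched pairs t-test on two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("Each respondent must contribute one rating per ad.")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    # Sample standard deviation of the differences (n - 1 in the denominator).
    sd_diff = statistics.stdev(differences)
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


t_stat, df = paired_t_statistic(ratings_with_mention, ratings_without_mention)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The statistic would then be compared with a t distribution on n - 1 degrees of freedom; that lookup is left out to keep the sketch dependency-free.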

III. Essay. (30 points) Answer one of the following questions.

1. We've talked about several ways that OLS regression can be modified to deal with violations of its assumptions. Some problems, however, require the use of techniques besides OLS. For three of the following, explain why and when the method would be used instead of OLS. Be sure to make clear what assumptions would be violated if OLS was used instead.

a. 2 stage least squares
b. Logistic regression
c. Ordered logit models
d. Robust regression techniques (e.g. rreg, qreg, robust standard errors)
e. Event History Analysis
f. Hierarchical Linear Modeling

2. Your psychology professor has told you that you should almost always focus on standardized, rather than unstandardized (metric) coefficients. Explain to your professor (as politely as possible) why he is wrong. Among other things, you may want to discuss the relative strengths and weaknesses of standardized vs. unstandardized coefficients with regard to:

a. Variables with arbitrary metrics (e.g. attitudinal scales)
b. Structural equation models
c. Multiple-group comparisons
d. Interpretability of coefficients
e. Effect of random measurement error on coefficients

3. Several assumptions are made when using OLS regression. Discuss TWO of the following in depth. What does the assumption mean? When might the assumption be violated? What effects do violations of the assumption have on OLS estimates? How can violations of the assumption be avoided or dealt with? Be sure to talk about techniques such as 2SLS and logistic regression where appropriate. [NOTE: While the material from the last third of the course is especially relevant here, you should try to tie in earlier material as much as possible too. Also, keep in mind that there are often different ways an assumption can be violated, and the appropriate solutions will therefore often differ too.]

a. The effects of the independent variables are linear and additive
b. Errors are homoskedastic
c. Variables are measured without error
d. The data are a random and representative sample of the larger population.

See the course notes and readings for information pertaining to each of these subjects.

IV. Extra Credit. (10 points) Following are additional results related to the analysis of Model III in part II-1.

. quietly logit diabetes black female age
. tab1 diabetes if e(sample)

-> tabulation of diabetes if e(sample)

  diabetes, |
     1=yes, |
       0=no |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     19,672       95.17       95.17
          1 |        998        4.83      100.00
------------+-----------------------------------
      Total |     20,670      100.00

. predict prob, pr
. sum prob

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        prob |     20674    .0482766    .0417819    .005399   .2436938

. estat clas

Logistic model for diabetes

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         0             0  |          0
     -     |       998         19672  |      20670
-----------+--------------------------+-----------
   Total   |       998         19672  |      20670

Classified + if predicted Pr(D) >= .5
True D defined as diabetes != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)    0.00%
Specificity                     Pr( -|~D)  100.00%
Positive predictive value       Pr( D| +)      .%
Negative predictive value       Pr(~D| -)   95.17%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)    0.00%
False - rate for true D         Pr( -| D)  100.00%
False + rate for classified +   Pr(~D| +)      .%
False - rate for classified -   Pr( D| -)    4.83%
--------------------------------------------------
Correctly classified                        95.17%
--------------------------------------------------

. test age = female

 ( 1)  - female + age = 0

           chi2(  1) =    2.03
         Prob > chi2 =    0.1542

a) According to the classification table, what percentage of the cases were correctly classified? Why were so many cases classified correctly? Do you think this indicates that the model is outstanding? Explain whether you think the classification table is useful or not very useful in this case. Other information in the printout may make this question easier to answer.

Note that less than 5% of the respondents have diabetes, and the highest predicted probability of diabetes is only 24.37%. The classification table therefore predicts that no one will have diabetes. It is always right for the 95% of the subjects who do not have diabetes, but it is always wrong for the 5% who do. Classification tables tend not to be very helpful when you have extreme splits like this; you could easily do just as well yourself once you knew what the frequencies were for diabetes.

b) Explain what the test command is testing. Indicate whether or not you would conduct the same test, and why.

The test command is testing whether the effect of one year of age is the same as the effect of being female rather than male. Given how different the measurement of age and gender is, it would take a rather esoteric theory to make such a hypothesis substantively plausible and interesting. It is not something I would test. Perhaps the researcher just looked at the results and decided to see if he could trick students into thinking this was a smart thing to test.
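The reasoning in part (a) can be retraced from three numbers in the printout: the two frequencies from the tabulation and the maximum predicted probability. The Python sketch below is an illustration under that reading, with assumed names throughout. It also recomputes the p-value reported by the test command from its chi-square, using the standard identity for one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).

```python
import math

# Values copied from the printout.
N_DIABETES = 998
N_NO_DIABETES = 19672
MAX_PREDICTED_PROB = 0.2436938
CUTOFF = 0.5             # estat clas classifies + if Pr(D) >= .5
WALD_CHI2 = 2.03         # from: test age = female


def nobody_classified_positive(max_prob, cutoff):
    """True when even the largest predicted probability is below the cutoff."""
    return max_prob < cutoff


def accuracy_predicting_all_negative(n_positive, n_negative):
    """Share correct when every case is predicted to be a non-event."""
    return n_negative / (n_positive + n_negative)


def chi2_1df_pvalue(chi2_value):
    """Upper-tail p-value for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi2_value / 2.0))


all_negative = nobody_classified_positive(MAX_PREDICTED_PROB, CUTOFF)
baseline_accuracy = accuracy_predicting_all_negative(N_DIABETES, N_NO_DIABETES)
p_value = chi2_1df_pvalue(WALD_CHI2)

print(f"Everyone classified as non-diabetic: {all_negative}")
print(f"Accuracy from always predicting 'no diabetes': {baseline_accuracy:.2%}")
print(f"p-value for chi2(1) = {WALD_CHI2}: {p_value:.4f}")
```

The baseline accuracy uses no covariates at all, which is the sense in which the "correctly classified" figure says little about the quality of the model here.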