STA 4504/5503 Sample questions for exam 2


1. True-False questions.

(a) For General Social Survey data on Y = political ideology (categories liberal, moderate, conservative), $X_1$ = gender (1 = female, 0 = male), and $X_2$ = political party (1 = Democrat, 0 = Republican), the ML fit of the cumulative logit model is $\text{logit}[\hat{P}(Y \le j)] = \hat{\alpha}_j + 0.12x_1 + 0.96x_2$. Hence, for each gender, according to this model fit the estimated odds that a Democrat's response is liberal rather than moderate or conservative, and the estimated odds that a Democrat's response is liberal or moderate rather than conservative, are $e^{0.96} = 2.6$ times the corresponding estimated odds for a Republican's response. This odds ratio estimate indicates that in this sample Democrats tended to be more liberal than Republicans.

(b) Subjects suffering from mental depression are measured after 1 week of treatment, 2 weeks of treatment, and 4 weeks of treatment in terms of a (normal, abnormal) response outcome. Covariates are severity of condition at original diagnosis (1 = severe, 0 = mild) and treatment used (1 = new, 0 = standard). Since each subject contributes three observations to the analysis, we can use the GEE (generalized estimating equations) method to fit the model. To use this method, we must choose a working correlation matrix for the form of the dependence among the three responses, but the method is robust in the sense that it still gives appropriate estimates and standard errors for large n even if the actual correlation structure is somewhat different from the one we assumed.

(c) A difference between logit and loglinear models is that the logit model is a generalized linear model assuming a binomial random component whereas the loglinear model is a generalized linear model assuming a Poisson random component. Hence, when both are fitted to a contingency table having 50 cells, the logit model treats the cell counts as 25 binomial observations whereas the loglinear model treats the cell counts as 50 Poisson observations.

(d) The cumulative logit model assumes that the response variable Y is ordinal; it should not be used with nominal variables. By contrast, the baseline-category logit model treats Y as nominal. It can be used with ordinal Y, but it then ignores the ordering information.

(e) The cumulative logit model for J response categories corresponds to a logistic regression model holding for each of the $J - 1$ cumulative probabilities, such that the curves for each cumulative probability have exactly the same shape (i.e., the same $\beta$ parameter); that is, they increase or decrease at the same rate, so one can use $\hat{\beta}$ to describe effects that apply to all $J - 1$ of the cumulative probabilities.

(f) If X and Y are binary, and Z has K categories, so the data can be summarized in a $2 \times 2 \times K$ contingency table, one can test conditional independence of X and Y, controlling for Z, using a Wald test or a likelihood-ratio test of $H_0: \beta = 0$ in the model $\text{logit}[P(Y = 1)] = \alpha + \beta x + \beta_1 z_1 + \cdots + \beta_{K-1} z_{K-1}$, where $z_i = 1$ for observations in category i of Z and $z_i = 0$ otherwise.

(g) For a sample of retired subjects in Florida, a contingency table is used to relate X = cholesterol (8 ordered levels) to Y = whether the subject has symptoms of heart disease (yes = 1, no = 0). For the linear logit model $\text{logit}[P(Y = 1)] = \alpha + \beta x$ fitted to the 8 binomials in the $8 \times 2$ contingency table by assigning scores to the 8 cholesterol levels, the deviance statistic equals 6.0. Thus, this model provides a poor fit to the data.

(h) In the example just mentioned, at the lowest cholesterol level, the observed number of heart disease cases equals 31. The standardized residual equals 1.35. This means that the model predicted 29.65 cases (i.e., 1.35 = 31 − 29.65).

2. Multiple choice question. Circle the letter(s) for the correct response(s). More than one response may be correct.

Let $\pi$ denote the probability that a randomly selected respondent supports current laws legalizing abortion, predicted using gender of respondent (G = 0, male; G = 1, female), religious affiliation ($R_1 = 1$, Protestant, 0 otherwise; $R_2 = 1$, Catholic, 0 otherwise; $R_1 = R_2 = 0$, Jewish), and political party affiliation ($P_1 = 1$, Democrat, 0 otherwise; $P_2 = 1$, Republican, 0 otherwise; $P_1 = P_2 = 0$, Independent). The logit model with main effects has prediction equation

$\text{logit}(\hat{\pi}) = 0.11 + 0.16G - 0.57R_1 - 0.66R_2 + 0.47P_1 - 1.67P_2$

For this prediction equation,

a. Females are estimated to be more likely than males to support legalized abortion, controlling for religious affiliation and political party affiliation.

b. Controlling for gender and religious affiliation, the estimated odds that a Democrat supports legalized abortion equal $e^{0.47 - (-1.67)}$ times the estimated odds that a Republican supports legalized abortion.

c. The estimated probability that a male Jewish Independent supports legalized abortion equals $e^{0.11}/(1 + e^{0.11})$.

d. The estimated probability of supporting legalized abortion is highest for female Jewish Independents.

3. Let Y = political ideology (on an ordinal scale from 1 = very liberal to 5 = very conservative), $x_1$ = gender (1 = female, 0 = male), $x_2$ = political party (1 = Democrat, 0 = Republican).

(a) A main effects model with a cumulative logit link gives the output shown. Explain why the output reports four intercepts.

                            Standard    Wald 95% Confidence
Parameter        DF  Estimate   Error          Limits
Intercept1        1   -2.5322  0.1489   -2.8242   -2.2403
Intercept2        1   -1.5388  0.1297   -1.7931   -1.2845
Intercept3        1    0.1745  0.1162   -0.0533    0.4023
Intercept4        1    1.0086  0.1232    0.7672    1.2499
gender  female    1    0.1169  0.1273   -0.1327    0.3664
gender  male      0    0.0000  0.0000    0.0000    0.0000
party   democ     1    0.9636  0.1297    0.7095    1.2178
party   repub     0    0.0000  0.0000    0.0000    0.0000

LR Statistics For Type 3 Analysis
Source   DF   Chi-Square   Pr > ChiSq
gender    1         0.84       0.3586
party     1        56.85       <.0001

(b) Explain how to describe gender effect on political ideology with an odds ratio.

(c) Give the hypotheses to which the LR statistic for gender refers, and explain how to interpret the result of the test.

(d) When we add an interaction term to the model, we get the output shown. Explain how to find the estimated odds ratio for the gender effect on political ideology for Republicans.

                                             Standard
Parameter                     DF  Estimate      Error
Intercept1                     1   -2.6743     0.1655
Intercept2                     1   -1.6772     0.1476
Intercept3                     1    0.0424     0.1338
Intercept4                     1    0.8790     0.1389
gender        female           1    0.3661     0.1784
gender        male             0    0.0000     0.0000
party         democ            1    1.2653     0.1995
party         repub            0    0.0000     0.0000
gender*party  female  democ    1   -0.5091     0.2550
gender*party  female  repub    0    0.0000     0.0000
gender*party  male    democ    0    0.0000     0.0000
gender*party  male    repub    0    0.0000     0.0000

(e) Using the interaction model, show how to find the estimated probability that a female Republican is in the first category (very liberal).

4. You decide to use GEE methods to handle dependent observations because of repeated measurement or clustering of some type.

a. Explain what is meant by an exchangeable working correlation matrix.

b. If you ignore the dependence, will there be bias in your (i) parameter estimates, (ii) standard error estimates?

5. Consider the loglinear model of independence for a two-way contingency table. This has equation for expected frequencies $\{\mu_{ij}\}$ in an $I \times J$ contingency table,

$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$.

Motivate this model by showing how the definition of statistical independence of two categorical variables implies that a loglinear model of this form holds.

6. (b) To allow for association between X and Y, this model is extended to

$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}$.

For a $2 \times 2$ contingency table, express the log odds ratio in terms of expected frequencies, and use it to show that the odds ratio for this model equals $\exp(\lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY})$. (Hence the two-factor interaction parameters provide information about the XY association.)
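The odds-ratio identity in question 6 can be checked numerically before attempting the algebra. The sketch below is only an illustration: the $\lambda$ values are arbitrary, hypothetical numbers (not taken from the exam), and the variable names are invented for this example.

```python
import math

# Hypothetical parameter values for a 2x2 loglinear model with an XY term.
# They are illustrative only; any values would give the same agreement.
lam = 1.0
lam_x = {1: 0.3, 2: -0.3}
lam_y = {1: -0.2, 2: 0.2}
lam_xy = {(1, 1): 0.4, (1, 2): -0.4, (2, 1): -0.4, (2, 2): 0.4}

# Expected frequencies: log mu_ij = lam + lam_x[i] + lam_y[j] + lam_xy[i, j]
mu = {
    (i, j): math.exp(lam + lam_x[i] + lam_y[j] + lam_xy[(i, j)])
    for i in (1, 2)
    for j in (1, 2)
}

# Odds ratio computed directly from the expected frequencies.
odds_ratio_from_mu = (mu[(1, 1)] * mu[(2, 2)]) / (mu[(1, 2)] * mu[(2, 1)])

# Odds ratio computed from the interaction parameters alone.
odds_ratio_from_lambda = math.exp(
    lam_xy[(1, 1)] + lam_xy[(2, 2)] - lam_xy[(1, 2)] - lam_xy[(2, 1)]
)

print(odds_ratio_from_mu, odds_ratio_from_lambda)
```

The two printed values agree because $\lambda$, $\lambda_i^X$, and $\lambda_j^Y$ cancel in the ratio $\mu_{11}\mu_{22}/(\mu_{12}\mu_{21})$, leaving only the interaction terms.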

7. Consider the baseline-category logit model, for a multinomial response variable having J categories,

$\log[P(Y = j)/P(Y = J)] = \alpha_j + \beta_j x, \quad j = 1, \ldots, J - 1$.

Show how to use this model to generate a related logit model for $\log[P(Y = a)/P(Y = b)]$ using an arbitrary pair a and b of the response categories.

8. For the effect of a particular explanatory variable on an ordinal response variable, explain why the cumulative logit model has the same parameter for each logit, rather than a different parameter for each logit as is the case for the baseline-category logit model. Explain why the P-value for the effect is usually smaller with the cumulative logit model than with the baseline-category logit model.

Solutions

1. a, b, c, d, e, f are True and g, h are False.

2. a, b, c are correct.

3. a. Model refers to four cumulative probabilities, and they differ for any fixed value of explanatory variables.

b. For females, estimated odds of response in liberal direction rather than conservative direction (for any of the four cutpoints) are $e^{0.1169} = 1.12$ times estimated odds for males.

c. $H_0: \beta_1 = 0$, $H_a: \beta_1 \ne 0$. If null were true, probability would equal 0.36 of obtaining LR statistic at least as large as observed (0.84). There is not much evidence of a gender effect.

d. $e^{0.366} = 1.44$.

e. $e^{-2.674 + 0.366}/[1 + e^{-2.674 + 0.366}] = 0.09$.

4. a. Guess that each pair of observations on the response in a cluster has the same correlation.

b. (i) no, (ii) yes.

5., 6., 7. See class notes.

8. For the cumulative logit model, each logit refers to the same thing, namely the odds of falling below rather than above some point on an ordinal scale. For the baseline-category logit model, each logit deals with a different pair of outcome categories, so there is no reason to expect effects to be constant. The P-value is usually smaller for the cumulative logit model because the effect is focused on fewer parameters, so df for the chi-squared test is smaller. Concentrating an effect on a smaller df value means the test statistic tends to be farther out in the right tail, hence the P-value tends to be smaller.
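The arithmetic behind the solutions to questions 1(a), 2, and 3 can be reproduced in a few lines. This is a minimal sketch rather than part of the original solutions: the function and variable names are invented here, and the signs in the question 2 prediction equation are an assumption, reconstructed from option b (which implies a coefficient of −1.67 for $P_2$) and from the missing operators in the transcription.

```python
import math
from itertools import product


def inv_logit(eta):
    """Convert a log odds value into a probability."""
    return 1 / (1 + math.exp(-eta))


# Odds ratios are exponentiated coefficients from the printed output.
or_party_1a = math.exp(0.96)        # question 1(a), party effect
or_gender_main = math.exp(0.1169)   # solution 3b, main-effects model
or_gender_repub = math.exp(0.3661)  # solution 3d, interaction model

# Solution 3e: female Republican, first cumulative logit, interaction model.
p_female_repub_very_liberal = inv_logit(-2.6743 + 0.3661)


def abortion_logit(G, R1, R2, P1, P2):
    """Question 2 prediction equation, with signs as reconstructed (assumption)."""
    return 0.11 + 0.16 * G - 0.57 * R1 - 0.66 * R2 + 0.47 * P1 - 1.67 * P2


# Each three-level factor is coded by two dummies; (0, 0) is the reference level.
levels = [(1, 0), (0, 1), (0, 0)]
best = max(
    product((0, 1), levels, levels),
    key=lambda c: abortion_logit(c[0], *c[1], *c[2]),
)

print(round(or_party_1a, 1))                  # 2.6
print(round(or_gender_main, 2))               # 1.12
print(round(or_gender_repub, 2))              # 1.44
print(round(p_female_repub_very_liberal, 2))  # 0.09
print(best)                                   # (1, (0, 0), (1, 0))
```

Under the reconstructed signs, the combination with the largest linear predictor is G = 1, $R_1 = R_2 = 0$, $P_1 = 1$ (a female Jewish Democrat), which is consistent with option d of question 2 not being listed as correct.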

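Formulas

$\text{logit}(\pi) = \alpha + \beta_1 x_1 + \cdots + \beta_k x_k$

$\pi = \dfrac{\exp(\alpha + \beta_1 x_1 + \cdots + \beta_k x_k)}{1 + \exp(\alpha + \beta_1 x_1 + \cdots + \beta_k x_k)}$

Baseline-category logit model: $\log[P(Y = j)/P(Y = J)] = \alpha_j + \beta_j x$

$P(Y = j) = \dfrac{e^{\alpha_j + \beta_j x}}{1 + e^{\alpha_1 + \beta_1 x} + \cdots + e^{\alpha_{J-1} + \beta_{J-1} x}}, \quad j = 1, 2, \ldots, J - 1.$

Cumulative logit model: $\text{logit}[P(Y \le j)] = \alpha_j + \beta x$

$P(Y \le j) = \exp(\alpha_j + \beta x)/[1 + \exp(\alpha_j + \beta x)], \quad j = 1, 2, \ldots, J - 1.$

McNemar: $z = (n_{12} - n_{21})/\sqrt{n_{12} + n_{21}}$

SE for diff of matched proportions: $\sqrt{(n_{12} + n_{21}) - (n_{12} - n_{21})^2/n}\,/\,n$

Kappa: $\kappa = \dfrac{\sum_i \pi_{ii} - \sum_i \pi_{i+}\pi_{+i}}{1 - \sum_i \pi_{i+}\pi_{+i}}$

Independence loglinear model: $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$

(XY, XZ, YZ): $\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$

(XZ, YZ): $\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$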
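Several of the formulas above translate directly into short functions, which can help when checking hand calculations. The following is a sketch under stated assumptions: the function names and argument layouts are invented for illustration, a single explanatory variable x is assumed, and the kappa function takes a square table of counts and converts it to proportions.

```python
import math


def cumulative_probs(alphas, beta, x):
    """P(Y <= j), j = 1..J-1, from logit[P(Y <= j)] = alpha_j + beta * x."""
    return [math.exp(a + beta * x) / (1 + math.exp(a + beta * x)) for a in alphas]


def baseline_category_probs(alphas, betas, x):
    """P(Y = j), j = 1..J, with category J as the baseline (alpha_J = beta_J = 0)."""
    terms = [math.exp(a + b * x) for a, b in zip(alphas, betas)]
    denom = 1 + sum(terms)
    return [t / denom for t in terms] + [1 / denom]


def mcnemar_z(n12, n21):
    """McNemar z statistic from the two off-diagonal counts."""
    return (n12 - n21) / math.sqrt(n12 + n21)


def matched_diff_se(n12, n21, n):
    """Standard error of the difference of matched proportions."""
    return math.sqrt((n12 + n21) - (n12 - n21) ** 2 / n) / n


def kappa(table):
    """Cohen's kappa from a square table of counts (list of rows)."""
    n = sum(sum(row) for row in table)
    size = len(table)
    observed = sum(table[i][i] for i in range(size)) / n
    expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(size)
    ) / n ** 2
    return (observed - expected) / (1 - expected)


# Usage with made-up inputs (not exam data).
probs = baseline_category_probs([0.5, -0.2], [1.0, 0.3], x=2.0)
print(sum(probs))
print(mcnemar_z(10, 6))
print(kappa([[10, 0], [0, 10]]))
```

With the baseline-category function, the log of the ratio of any two category probabilities equals $(\alpha_a - \alpha_b) + (\beta_a - \beta_b)x$, which is the relationship asked for in question 7.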