Tutorial - Case Studies. In this tutorial, we show how to implement a multinomial logistic regression with TANAGRA.


12 December 2007

Subject

In this tutorial, we show how to implement a multinomial logistic regression with TANAGRA.

Logistic regression is a technique for making predictions when the dependent variable is dichotomous and the independent variables are continuous and/or discrete. The technique can be extended to handle a dependent variable with several (K > 2) levels. When the response categories are unordered, we use multinomial logistic regression. Roughly speaking, we compute a logit function for each of the K-1 categories relative to a reference group [http://www.stat.psu.edu/~jglenn/stat504/08_multilog/01_multilog_intro.htm].

Dataset

We want to explain the brand of some commodity chosen by consumers from their age and their sex. The dataset is available online [http://eric.univ-lyon2.fr/~ricco/tanagra/fichiers/brand_multinomial_dataset.xls]. The results obtained with other software, such as R, on the same dataset can be consulted for comparison [http://www.ats.ucla.edu/stat/r/dae/mlogit.htm].

Multinomial logistic regression with TANAGRA

Accessing the data and creating a new diagram

After starting TANAGRA, we create a new diagram by activating the FILE/NEW menu. In the dialog box, we choose the data file BRAND_MULTINOMIAL_DATASET.XLS and then we specify the name of the diagram. For XLS files, the import works properly provided that the workbook is not currently being edited elsewhere and that the data are located in the first sheet.
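For reference, the K-1 logits mentioned in the Subject paragraph can be written out explicitly. This is the standard textbook form of the model with the last category K taken as the reference group; the symbols \beta_{k,0}, \beta_{k,j}, x_j and z_k are notation introduced here for illustration and do not come from the TANAGRA output.

```latex
% One logit per non-reference category k = 1, ..., K-1
z_k = \ln \frac{P(Y = k \mid X)}{P(Y = K \mid X)}
    = \beta_{k,0} + \sum_{j=1}^{p} \beta_{k,j}\, x_j ,
\qquad k = 1, \dots, K-1

% Probabilities recovered from the logits
P(Y = k \mid X) = \frac{e^{z_k}}{1 + \sum_{l=1}^{K-1} e^{z_l}} ,
\qquad
P(Y = K \mid X) = \frac{1}{1 + \sum_{l=1}^{K-1} e^{z_l}}
```

With K = 3 brands and p = 2 inputs (FEMALE and AGE), this gives two equations with three coefficients each.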

The data is loaded. We check that 3 variables and 735 records have been imported.

Defining the role of the variables

In the next step, we define the role of the variables. BRAND is the TARGET attribute; FEMALE and AGE are the INPUT attributes.

Multinomial logistic regression

We add the MULTINOMIAL LOGISTIC REGRESSION component (SPV LEARNING tab) to the diagram. By default, TANAGRA uses the last encountered value of the dependent variable as the reference group. To change this choice, the simplest way is to sort the dataset accordingly. We obtain the following results (VIEW menu).

Confusion matrix (classification matrix)

The confusion matrix compares the observed values and the predicted values of the dependent variable. Several ratios can be computed from it, e.g. the error rate or the accuracy rate. An interesting ratio is the adjusted count pseudo R-square, which corrects the accuracy rate using the frequency of the most frequent value of the dependent variable (cf. http://www.ats.ucla.edu/stat/mult_pg/faq/general/psuedo_rsquareds.htm). For our example, we obtain the following adjusted count R-square, where n_k is the number of records observed in class k:

$$R^2_{AC} = \frac{\#\text{correct} - \max_k(n_k)}{n - \max_k(n_k)} = \frac{(58 + 238 + 110) - 307}{735 - 307} = \frac{99}{428} = 0.231$$

If our classifier is no better than the default classifier (which always predicts the most frequent value of the dependent variable), we obtain 0; for a perfect prediction, we obtain 1.

Adjustment quality (goodness of fit)

The next section compares the initial model, which predicts with the constant only, with our model, using the likelihood ratio principle. Other pseudo R-square indicators are available. Other indicators, such as the AIC or SC (BIC) statistics, make a trade-off between the deviance and the complexity (number of parameters) of the model; lower values are better. SC is the most stringent indicator. It shows that our model is clearly relevant (SC of the initial model: 1604.991; SC of our model: 1445.541). The likelihood ratio (LR) test reaches the same conclusion: the whole model is significant.
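The adjusted count R-square above is simple enough to check by hand. The following is a minimal sketch that reproduces the arithmetic from the figures quoted in the text; the function name and its parameter names are illustrative and are not part of TANAGRA.

```python
def adjusted_count_r2(n_correct, n_max, n):
    """Adjusted count pseudo R-square.

    n_correct: number of correctly classified records
               (sum of the confusion matrix diagonal)
    n_max:     number of records in the most frequent observed class
    n:         total number of records
    """
    return (n_correct - n_max) / (n - n_max)


# Figures quoted in the tutorial: diagonal 58, 238, 110; largest class 307; n = 735.
r2_ac = adjusted_count_r2(58 + 238 + 110, 307, 735)
print(round(r2_ac, 3))  # 0.231

# Boundary cases described in the text.
default_case = adjusted_count_r2(307, 307, 735)  # no better than the default classifier
perfect_case = adjusted_count_r2(735, 307, 735)  # perfect prediction
```

The two boundary cases evaluate to 0 and 1, matching the interpretation given for the default classifier and for a perfect prediction.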

Logit coefficients

The «_3» value is the reference group. We have 2 (i.e. K - 1) equations:

$$\ln \frac{P(Y = \_1 \mid X)}{P(Y = \_3 \mid X)} = 22.721 - 0.466 \times \text{female} - 0.686 \times \text{age}$$

$$\ln \frac{P(Y = \_2 \mid X)}{P(Y = \_3 \mid X)} = 10.947 + 0.058 \times \text{female} - 0.318 \times \text{age}$$

The Wald test is used to test the significance of each coefficient, for each equation. The Wald statistic is the square of the ratio between the coefficient and its standard error. It follows a chi-square distribution with 1 degree of freedom.

We show below the results obtained with the VGAM package for the R software.
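To see how the two logit equations turn into predicted probabilities, here is a minimal sketch in Python using only the standard library. The coefficient magnitudes are those reported above; the signs used here (negative for age in both equations and for female in the first one) are an assumption of this sketch and should be checked against the TANAGRA output. The function name and the example inputs are illustrative.

```python
import math

# (intercept, female, age) for each non-reference brand; "_3" is the reference group.
# Signs are an assumption of this sketch, see the paragraph above.
COEFS = {
    "_1": (22.721, -0.466, -0.686),
    "_2": (10.947, 0.058, -0.318),
}


def brand_probabilities(female, age):
    """Return P(brand | female, age) for the three brands."""
    logits = {
        brand: b0 + b_female * female + b_age * age
        for brand, (b0, b_female, b_age) in COEFS.items()
    }
    # The reference group has an implicit logit of 0, hence the leading 1.
    denom = 1.0 + sum(math.exp(z) for z in logits.values())
    probs = {brand: math.exp(z) / denom for brand, z in logits.items()}
    probs["_3"] = 1.0 / denom
    return probs


# Hypothetical consumers: a 24-year-old and a 38-year-old woman.
young = brand_probabilities(female=1, age=24)
older = brand_probabilities(female=1, age=38)
print(young)
print(older)
```

Because both age coefficients are negative relative to the reference group, the probability of «_3» rises with age while that of «_1» falls.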

Global evaluation of variables

In the previous step, we evaluated the relevance of each variable in each equation. Now, we evaluate the global relevance of each variable, i.e. is the coefficient of the variable equal to 0 in all the equations? This test also relies on a Wald statistic. We see here that all the variables are relevant at the 5% significance level.

Conclusion

In this tutorial, we showed how to implement and read the results of a multinomial logistic regression with TANAGRA. For more details about the method and the underlying computations, we recommend the following reference: http://www.stat.psu.edu/~jglenn/stat504/08_multilog/01_multilog_intro.htm
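Both Wald tests used in this tutorial can be sketched with the Python standard library. The per-coefficient test is the squared ratio described earlier, compared with a chi-square distribution with 1 degree of freedom. For the global test, this sketch assumes the usual quadratic form b' V^-1 b over the K - 1 = 2 coefficients of one variable, compared with a chi-square distribution with 2 degrees of freedom; this is the textbook construction and not necessarily TANAGRA's exact computation. The function names are illustrative, and the standard errors and covariance in the usage lines are hypothetical, since the tutorial text does not report them.

```python
import math


def wald_single(coef, std_err):
    """Wald test for one coefficient: W = (coef / std_err)^2, chi-square with 1 df."""
    w = (coef / std_err) ** 2
    # For 1 df, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


def wald_global_two_equations(b1, b2, var1, var2, cov12):
    """Joint Wald test that a variable's coefficients are 0 in both equations.

    b1, b2:      the variable's coefficients in the two logit equations
    var1, var2:  their estimated variances
    cov12:       their estimated covariance
    """
    det = var1 * var2 - cov12 ** 2
    # Quadratic form b' V^-1 b written out for a 2 x 2 covariance matrix.
    w = (b1 * b1 * var2 - 2.0 * b1 * b2 * cov12 + b2 * b2 * var1) / det
    # For 2 df, the chi-square survival function is exp(-w / 2).
    p_value = math.exp(-w / 2.0)
    return w, p_value


# Hypothetical standard errors and covariance, for illustration only.
w_age, p_age = wald_single(coef=-0.686, std_err=0.06)
w_glob, p_glob = wald_global_two_equations(-0.686, -0.318, 0.06 ** 2, 0.05 ** 2, 0.001)
print(w_age, p_age)
print(w_glob, p_glob)
```

A variable is declared relevant at the 5% level when the corresponding p-value is below 0.05.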