NPTEL Project. Econometric Modelling. Module 16: Qualitative Response Regression Modelling. Lecture 20: Qualitative Response Regression Modelling

Rudra P. Pradhan
Vinod Gupta School of Management
Indian Institute of Technology Kharagpur, India
Email: rudrap@vgsom.iitkgp.ernet

MODULE OBJECTIVE

This module explores the relationship between a dependent variable and a set of independent variables when the dependent variable is qualitative, that is, represented by a dummy variable. The earlier discussion of dummy-variable modelling assumed that the independent variables were qualitative. Here we turn to the case where the dependent variable itself is qualitative. In the simplest case the dependent variable takes only two values, say 1 and 0, so the regressand is a binary (dichotomous) variable. The approach is not restricted to the dichotomous case: the response variable can also be trichotomous or polychotomous.

In this section, we highlight the following:

1. WHAT IS QUALITATIVE RESPONSE ECONOMETRIC MODELLING
2. BINARY CHOICE MODEL
3. LOGIT MODEL
4. PROBIT MODEL

WHAT IS QUALITATIVE RESPONSE ECONOMETRIC MODELLING

Qualitative response modelling refers to the use of a qualitative variable in econometric modelling. Such a variable is usually called a dummy variable. A dummy variable classifies the observations into subgroups based on qualities or attributes and implicitly allows one to run individual regressions for each group. It takes the value 1 or 0 according to whether the condition is present or absent for a particular observation. In some cases, it can be coded 1, 2, 3, 4 and so on. For instance, if we study the impact of religion on income, religion is a categorical (qualitative) variable and can be coded 1, 2, 3, etc.

THE BINARY CHOICE MODEL

We start our study of qualitative response models with the binary response regression model. The outcome (or response, or endpoint) takes the values 0 and 1, which can represent success and failure. Such responses occur often in the biopharmaceutical field (dose-response studies, bioassays, clinical trials). Industrial applications include failure analysis, fatigue testing and reliability testing. For example, functional electrical testing of a semiconductor can yield a success, in which case the device works, or a failure due to a short, an open, or some other failure mode.

The model is

$$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i = x_i'\beta + \varepsilon_i, \qquad y_i = 0 \text{ or } 1, \qquad i = 1, 2, \ldots, n$$

The response $y_i$ is a Bernoulli random variable that satisfies the following conditions:

$$P(y_i = 1) = \pi_i \ \text{ with } \ 0 \le \pi_i \le 1, \qquad P(y_i = 0) = 1 - \pi_i$$

$$E(y_i) = x_i'\beta = \pi_i$$

$$\mathrm{Var}(y_i) = \sigma_{y_i}^2 = \pi_i (1 - \pi_i)$$

Treating this as an ordinary linear regression raises the following problems:

1. The error terms take on only two values, so they cannot possibly be normally distributed.
2. The variance of the observations is a function of the mean, since $\mathrm{Var}(y_i) = \pi_i (1 - \pi_i)$.
3. A linear response function could result in predicted values that fall outside the 0, 1 range, and this is impossible because $0 \le E(y_i) = \pi_i \le 1$.
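The problems with fitting a binary response by a straight line can be seen in a small simulation. The sketch below is illustrative only: the data are synthetic, and the logistic curve used to generate them, the sample size and the coefficient 1.5 are assumptions made for the example rather than values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one regressor x and a binary response y drawn from a
# logistic probability curve (an assumption of this sketch).
n = 500
x = rng.uniform(-4.0, 4.0, size=n)
true_prob = 1.0 / (1.0 + np.exp(-1.5 * x))
y = (rng.uniform(size=n) < true_prob).astype(float)

# Linear response function: regress y on a constant and x by OLS.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat

# Fitted "probabilities" can leave the [0, 1] interval.
n_outside = int(np.sum((fitted < 0.0) | (fitted > 1.0)))

# Var(y_i) = pi_i (1 - pi_i) changes with the mean pi_i,
# so the observations do not share a common variance.
implied_var = true_prob * (1.0 - true_prob)

print("OLS coefficients:", beta_hat)
print("fitted values outside [0, 1]:", n_outside)
print("smallest and largest Var(y_i):", implied_var.min(), implied_var.max())
```

The count of fitted values outside the unit interval depends on how far the regressor ranges; a narrower range of x would make the problem less visible without removing it.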

THE LOGIT MODEL

In the logit model the dependent variable is the log of the odds ratio, which is a linear function of the regressors. The probability function that underlies the logit model is the logistic distribution. If the data are available in grouped form, we can use OLS to estimate the parameters of the logit model, provided we explicitly take into account the heteroscedastic nature of the error term. If the data are available at the individual (micro) level, estimation procedures that are nonlinear in the parameters are called for.

There is a lot of empirical evidence that the response function should be nonlinear; an S shape is quite logical (see the scatter plot of the Challenger data).
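For individual-level data, one common nonlinear estimation procedure is maximum likelihood computed by Newton-Raphson iterations. The lecture does not name a specific algorithm, so the choice of method here is an assumption, as are the helper name `fit_logit`, the synthetic data and the "true" coefficients used to generate them.

```python
import numpy as np

def fit_logit(X, y, n_iter=25, tol=1e-10):
    """Maximum-likelihood logit fit by Newton-Raphson (illustrative helper).

    X is an (n, k) design matrix that already contains a constant column;
    y is a length-n array of 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # fitted probabilities
        gradient = X.T @ (y - p)                 # score of the log-likelihood
        weights = p * (1.0 - p)                  # Bernoulli variances
        hessian = X.T @ (X * weights[:, None])   # negative Hessian (information)
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic micro-level data; the coefficients are hypothetical demo values.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.2])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = fit_logit(X, y)
print("estimated coefficients:", beta_hat)
```

The weights `p * (1 - p)` in the iteration are the same Bernoulli variances that make the error term heteroscedastic, which is why a plain unweighted least-squares fit is not appropriate.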

The usual LOGIT model is presented as follows:

$$E(y) = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)} = \frac{1}{1 + \exp(-x'\beta)}$$

After simplification, writing $p = E(y)$, the LOGIT model can be presented in this format:

$$\log\left[\frac{p}{1 - p}\right] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$$

or else, we can write

$$\log\left[\frac{p}{1 - p}\right] = \sum_k \beta_k x_k$$

The LOGIT model is related to the odds for a binary outcome. It has the following features:

1. As p goes from 0 to 1, the logit L goes from minus infinity to plus infinity; that is, although the probabilities lie between 0 and 1, the logits are not so bounded.
2. The logit model is not subject to the problems due to heteroscedastic or non-normal error distributions.
3. The logit of the outcome tends to have a linear relationship with the explanatory variables.

THE PROBIT MODEL

The structure of the PROBIT model is very similar to that of the LOGIT model. The difference lies in the functional form: PROBIT uses the normal distribution function, whereas LOGIT uses the logistic function.
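The logit transformation and its inverse, together with the normal distribution function that PROBIT uses in place of the logistic function, can be sketched in a few lines. The function names are illustrative, and the probabilities and index values in the loops are arbitrary choices for the demonstration.

```python
import math

def logistic_cdf(z):
    """Logistic function: maps any real index z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds log[p / (1 - p)]: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def normal_cdf(z):
    """Standard normal distribution function, used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The logit is unbounded even though p stays between 0 and 1,
# and the logistic function undoes the transformation.
for p in (0.001, 0.25, 0.5, 0.75, 0.999):
    z = logit(p)
    print(f"p = {p:<5}  logit = {z:+.3f}  back to p = {logistic_cdf(z):.3f}")

# Both functions turn a linear index into a probability.
for z in (-2.0, 0.0, 2.0):
    print(f"index = {z:+.1f}  logistic = {logistic_cdf(z):.4f}  normal = {normal_cdf(z):.4f}")
```

Both functions are S-shaped and equal 0.5 at an index of zero; they differ in how quickly they approach 0 and 1.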

The PROBIT model uses as the outcome variable the z-score t associated with a given p, where

$$F^{-1}(p) = t = \sum_k \beta_k x_k$$

and F is the standard normal distribution function. The functional relationship between t and p is the equation for the normal curve, that is,

$$E(p) = \int_{-\infty}^{\sum_k \beta_k x_k} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$$

The estimation procedure is very similar to that of the LOGIT model, and from a comparative point of view the estimated results and findings are also very similar. However, the following differences can be noted:

1. In a LOGIT model, the coefficients β0 and β1 are on the log-odds scale: β1 is the change in the log of the odds per unit change in x.
2. In a PROBIT model, the coefficients β0 and β1 are on the z-score scale: β1 is the change in the z-score per unit change in x. Equivalently, β1 is the change in probability units (probits) per unit change in x.

The chief difference between the LOGIT and PROBIT models is that the logistic distribution has slightly flatter tails; that is, the normal (PROBIT) curve approaches the axes faster than the logistic (LOGIT) curve.
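The statement about the tails can be checked numerically. In the sketch below the logistic distribution is rescaled to unit variance so that the two curves are compared on the same scale; that rescaling, the function names and the evaluation points are assumptions of this example, not part of the lecture.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function (probit curve)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def unit_variance_logistic_cdf(z):
    """Logistic distribution function rescaled to variance 1.

    The standard logistic has variance pi^2 / 3, so the index is
    multiplied by pi / sqrt(3) to match the normal curve's variance.
    """
    return 1.0 / (1.0 + math.exp(-z * math.pi / math.sqrt(3.0)))

# Lower-tail probabilities: the logistic curve stays further from zero.
for z in (-4.0, -3.0, -2.0):
    print(
        f"z = {z:+.1f}  logistic tail = {unit_variance_logistic_cdf(z):.6f}"
        f"  normal tail = {normal_cdf(z):.6f}"
    )
```

Near the centre of the distribution the two curves are close, which is why the two models usually lead to similar conclusions unless many observations lie in the extreme tails.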

Qualitatively, LOGIT and PROBIT models give similar results, but the estimates of the parameters of the two models are not directly comparable.
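One way to see why the estimates differ in scale is to compare the variances of the two underlying distributions. The relation below is a rule of thumb under that variance-matching argument, not an exact identity; a smaller factor of about 1.6 is also often used in practice.

```latex
\[
\operatorname{Var}(\text{standard logistic}) = \frac{\pi^2}{3},
\qquad
\operatorname{Var}(\text{standard normal}) = 1
\]
\[
\hat{\beta}_{\text{logit}} \approx \frac{\pi}{\sqrt{3}}\,\hat{\beta}_{\text{probit}}
\approx 1.81\,\hat{\beta}_{\text{probit}}
\]
```

Comparisons between the two models are therefore better made on predicted probabilities or marginal effects than on raw coefficients.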

Module 16: Qualitative Response Regression Modelling. Lecture 21: Qualitative Response Regression Modelling (Contd.)