Probits

Catalina Stefanescu, Vance W. Berger, Scott Hershberger

Catalina Stefanescu: London Business School. Email: cstefanescu@london.edu
Vance W. Berger: University of Maryland Baltimore County. Email: vance917@comcast.net
Scott Hershberger: California State University, Department of Psychology. Email: scotth@csulb.edu

Abstract

Probit models belong to the class of latent variable threshold models for analyzing binary data. They arise by assuming that the binary response is the indicator of the event that an unobserved latent variable exceeds a given threshold. Estimation can be done in either a likelihood or a Bayesian framework. Probit models can be generalized to the analysis of a variety of qualitative and limited dependent variables, as well as to the analysis of correlated data.

Key Words: Binary data; Latent variables; Threshold models.

Probit models arose in the context of the analysis of dichotomous data. Let $Y_1, \ldots, Y_n$ be $n$ binary variables and let $x_1, \ldots, x_n \in \mathbb{R}^p$ denote the corresponding vectors of covariates. The flexible class of Probit models may be obtained by assuming that the response $Y_i$ ($1 \le i \le n$) is an indicator of the event that some unobserved continuous variable, $Z_i$ say, exceeds a threshold, which can be taken to be zero without loss of generality. Specifically, let $Z_1, \ldots, Z_n$ be latent continuous variables and assume that

$$Y_i = I_{\{Z_i > 0\}}, \qquad Z_i = x_i \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \qquad i = 1, \ldots, n, \tag{1}$$

where $\beta \in \mathbb{R}^p$ is the vector of regression parameters. In this formulation, $x_i \beta$ is sometimes called the index function [11]. The marginal probability of a positive response with covariate vector $x$ is given by

$$p(x) = \Pr(Y = 1; x) = \Pr(x\beta + \varepsilon > 0) = 1 - \Phi(-x\beta), \tag{2}$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function. The last equality takes $\sigma = 1$, since only the ratio $\beta/\sigma$ can be identified from binary responses. Also,

$$\mathrm{Var}(Y; x) = p(x)\{1 - p(x)\} = \{1 - \Phi(-x\beta)\}\,\Phi(-x\beta).$$

As a way of relating stimulus and response, the Probit model is a natural choice in situations where a threshold interpretation is readily available. Examples include attitude measurement, assigning pass/fail grades for an examination based on a cut-off mark, and categorization of illness severity based on an underlying continuous scale [10].

Probit models first arose in connection with bioassay [4]. In toxicology experiments, for example, sets of test animals are subjected to different levels $x$ of a toxin. The proportion $p(x)$ of animals surviving at dose $x$ can then be modelled as a function of $x$, following (2). The surviving proportion is increasing in the dose when $\beta > 0$ and decreasing in the dose when $\beta < 0$. Surveys of the toxicology literature on Probit modelling are included in [7] and [9].

Probit models belong to the wider class of generalized linear models [13]. This class also includes the logit models, which arise when the random errors $\varepsilon_i$ in (1) have a logistic distribution. Since the logistic distribution is similar to the normal except in the tails, whenever the binary response probability $p$ belongs to (0.1, 0.9) it is difficult to discriminate between the logit and Probit functions solely on the grounds of goodness of fit. As Greene [11] remarks, "it is difficult to justify the choice of one distribution or another on theoretical grounds... in most applications, it seems not to make much difference."

Estimation of the Probit model is usually based on maximum likelihood methods. The nonlinear likelihood equations require an iterative solution; the Hessian is always negative definite, so the log likelihood is globally concave. The asymptotic covariance matrix of the maximum likelihood estimator can be estimated by using an estimate of the expected Hessian [2], or with the estimator developed by Berndt, Hall, Hall and Hausman [3]. Windmeijer [18] provides a survey of the many goodness of fit measures developed for binary choice models, and in particular for Probits.

The following data example was first presented by Bliss [4]. Table 1 reports the number of beetles killed after five hours of exposure to carbon disulfide at various concentrations. A probit model fitted by maximum likelihood gives $\Phi^{-1}[\hat{p}(x)] = -34.96 + 19.74x$. The table also reports the fitted values from the probit model corresponding to the different dose levels $x$.
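The maximum likelihood fit quoted for the beetle data can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes the dose and count columns of Table 1, uses Fisher scoring as one common iterative scheme for the likelihood equations, and takes its starting values from a least-squares fit to empirical probits. The function name `fit_grouped_probit` and all variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Table 1 (Bliss [4]): log dose, number of beetles exposed, number killed.
log_dose = np.array([1.691, 1.724, 1.755, 1.784, 1.811, 1.837, 1.861, 1.884])
n_beetles = np.array([59, 60, 62, 56, 63, 59, 62, 60], dtype=float)
n_killed = np.array([6, 13, 18, 28, 52, 53, 61, 60], dtype=float)


def fit_grouped_probit(x, n, y, max_iter=100, tol=1e-10):
    """Fisher scoring for Phi^{-1}[p(x)] = b0 + b1 * x with binomial counts.

    Illustrative helper (an assumption of this sketch, not from the source).
    Returns the estimate and the expected information from the last iteration.
    """
    X = np.column_stack([np.ones_like(x), x])
    # Starting values: least squares on empirical probits, with the observed
    # proportions shrunk away from 0 and 1 so that norm.ppf stays finite.
    beta = np.linalg.lstsq(X, norm.ppf((y + 0.5) / (n + 1.0)), rcond=None)[0]
    for _ in range(max_iter):
        eta = X @ beta
        p = np.clip(norm.cdf(eta), 1e-12, 1 - 1e-12)
        dens = norm.pdf(eta)
        # Score vector and expected (Fisher) information of the binomial
        # log likelihood under the probit link.
        score = X.T @ (dens * (y - n * p) / (p * (1 - p)))
        weights = n * dens**2 / (p * (1 - p))
        info = X.T @ (weights[:, None] * X)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, info


beta_hat, info = fit_grouped_probit(log_dose, n_beetles, n_killed)
# Standard errors from the inverse expected information.
std_err = np.sqrt(np.diag(np.linalg.inv(info)))
# Fitted number killed at each dose: n * Phi(b0 + b1 * x).
fitted = n_beetles * norm.cdf(beta_hat[0] + beta_hat[1] * log_dose)

print("intercept, slope:", np.round(beta_hat, 2))
print("standard errors: ", np.round(std_err, 2))
print("fitted killed:   ", np.round(fitted, 1))
```

The estimates should land close to the reported intercept and slope, and the fitted counts close to the last column of Table 1. Small differences can arise from rounding of the log doses.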

Table 1: Beetles killed after exposure to carbon disulfide

Log dose x    Number of beetles    Number killed    Fitted values (Probit)
1.691         59                   6                 3.4
1.724         60                   13               10.7
1.755         62                   18               23.4
1.784         56                   28               33.8
1.811         63                   52               49.6
1.837         59                   53               53.4
1.861         62                   61               59.7
1.884         60                   60               59.2

The maximum likelihood estimator in a Probit model is sometimes called a quasi-maximum likelihood estimator (QMLE), since the normal probability model may be misspecified. The QMLE is not consistent when the model exhibits any form of heteroscedasticity, nonlinear covariate effects, unmeasured heterogeneity or omitted variables [11]. In this setting, White [17] proposed a robust sandwich estimator for the asymptotic covariance matrix of the QMLE.

As an alternative to maximum likelihood estimation, Albert and Chib [1] developed a framework for estimation of latent threshold models for binary data, using data augmentation. The univariate Probit is a special case of this class of models, and data augmentation can be implemented by means of Gibbs sampling. Under this framework, the class of Probit regression models can be extended by using mixtures of normal distributions to model the latent data.

There is a large literature on generalizations of the Probit model to the analysis of a variety of qualitative and limited dependent variables. For example, McKelvey and Zavoina [14] extend the Probit model to the analysis of ordinal dependent variables, while Tobin [16] discusses a class of models in which the dependent variable is limited in range. In particular, the Probit model specified in (1) can be generalized by allowing the error terms $\varepsilon_i$ to be correlated. This leads to a multivariate Probit model, useful for the analysis of clustered binary data. The multivariate Probit focuses on the conditional expectation given the cluster-level random effect, and thus it belongs to the class of cluster-specific approaches for modelling correlated data, as opposed to population-average approaches, of which the most common examples are the GEE-type methods [19].

The multivariate Probit model has several attractive features which make it particularly suitable for the analysis of correlated binary data. First, the connection to the Gaussian distribution allows for flexible modelling of the association structure and straightforward interpretation of the parameters. For example, the model is particularly attractive in marketing research on consumer choice, because the latent correlations capture the cross-dependencies in latent utilities across different items. Also, within the class of cluster-specific approaches, the exchangeable multivariate Probit model is more flexible than other fully specified models (such as the beta-binomial) which use compound distributions to account for overdispersion in the data. This is because both underdispersion and overdispersion can be accommodated in the multivariate Probit model through the flexible underlying covariance structure. Finally, due to the underlying threshold approach, the multivariate Probit model has the potential for extensions to the analysis of clustered mixed binary and continuous data, or of multivariate binary data ([12], [15]).

Likelihood methods are one option for inference in the multivariate Probit model (see, e.g., [5]), but they are computationally difficult due to the intractability of the expressions obtained by integrating out the latent variables. As an alternative, estimation can be done in a Bayesian framework ([6], [8]), where generic prior distributions may be employed to incorporate prior information. Implementation is usually done with Markov chain Monte Carlo methods; in particular, the Gibbs sampler is useful in models where some structure is imposed on the covariance matrix (e.g., exchangeability).

References

[1] Albert, J.H. and Chib, S. (1993) Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669–679.
[2] Amemiya, T. (1981) Qualitative response models: A survey. Journal of Economic Literature, 19, 1483–1536.
[3] Berndt, E., Hall, B., Hall, R. and Hausman, J. (1974) Estimation and inference in nonlinear structural models. Annals of Economic and Social Measurement, 3/4, 653–665.
[4] Bliss, C.I. (1935) The calculation of the dosage mortality curve. Annals of Applied Biology, 22, 134–167.

[5] Chan, J.S.K. and Kuk, A.Y.C. (1997) Maximum likelihood estimation for probit linear mixed models with correlated random effects. Biometrics, 53, 86–97.
[6] Chib, S. and Greenberg, E. (1998) Analysis of multivariate probit models. Biometrika, 85, 347–361.
[7] Cox, D. (1970) Analysis of Binary Data. London: Methuen.
[8] Edwards, Y.D. and Allenby, G.M. (2003) Multivariate analysis of multiple response data. Journal of Marketing Research, 40, 321–334.
[9] Finney, D. (1971) Probit Analysis. Cambridge: Cambridge University Press.
[10] Goldstein, H. (2003) Multilevel Statistical Models. 3rd Edition, London: Arnold.
[11] Greene, W.H. (2000) Econometric Analysis. 4th Edition, Englewood Cliffs, NJ: Prentice Hall.
[12] Gueorguieva, R.V. and Agresti, A. (2001) A correlated probit model for joint modelling of clustered binary and continuous responses. Journal of the American Statistical Association, 96, 1102–1112.
[13] McCullagh, P. and Nelder, J.A. (1989) Generalised Linear Models. 2nd Edition, London: Chapman and Hall.
[14] McKelvey, R.D. and Zavoina, W. (1975) A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology, 4, 103–120.
[15] Regan, M.M. and Catalano, P.J. (1999) Likelihood models for clustered binary and continuous outcomes: Application to developmental toxicology. Biometrics, 55, 760–768.
[16] Tobin, J. (1958) Estimation of relationships for limited dependent variables. Econometrica, 26, 24–36.
[17] White, H. (1982) Maximum likelihood estimation of misspecified models. Econometrica, 50, 1–25.
[18] Windmeijer, F. (1995) Goodness of fit measures in binary choice models. Econometric Reviews, 14, 101–116.
[19] Zeger, S.L., Liang, K.Y. and Albert, P.S. (1988) Models for longitudinal data: A generalized estimating equations approach. Biometrics, 44, 1049–1060.