Stochastic Frontier Models with Binary Type of Output

Chapter 6

Stochastic Frontier Models with Binary Type of Output

6.1 Introduction

In the previous chapters, we considered stochastic frontier models with a continuous dependent (output) variable. The output variable need not always be continuous, however; in several situations it is binary. For example, when the quality of the output is of interest rather than its quantity, the observations are binary or categorical ('good' or 'bad'). Similarly, among a group of firms that produce a particular type of product under broadly similar conditions, some may have obtained an 'excellent' certification for their products while others have not. The traditional continuous frontier methods fail in these situations.

The efficiency analysis of discrete output started with a paper by Forsund (2002), who considered a deterministic frontier analysis that uses data envelopment methods to arrive at the efficiency scores. Fe-Rodriguez (2008) considered the estimation of a stochastic frontier model when the output and inefficiency variables are discrete-valued random variables. Fe-Rodriguez and Hofler (2009) claimed to have provided a proper count data stochastic frontier model, and discussed at length the need for such models, giving a number of reasons. Griffiths et al. (2010) attempted to model individual health at the household level, with demographic and socioeconomic characteristics and lifestyle factors as inputs to health production. There the output is ordinal in nature. Data from an Australian longitudinal survey were used to illustrate health production efficiency, and a Bayesian approach with an MCMC algorithm was used for estimation and inference. All these papers dealing with a discrete or categorical output are still unpublished manuscripts or may be in press. Thus, a lot remains to be done on the theory of stochastic frontier models with categorical or count outputs.

In this chapter, we introduce a stochastic frontier regression model with a binary output variable. Our modeling approach is very close to that of Griffiths et al. (2010). We discuss the estimation of the technical efficiency of the individual decision making units. Unlike Griffiths et al. (2010), we follow the traditional maximum likelihood approach. To examine the appropriateness of the model, a few simulations were carried out. The suggested model is then applied to compute the efficiencies of decision making units in patent data, treating the output as categorical. Fe-Rodriguez and Hofler (2009) also used the same data, on 70 pharmaceutical firms from the 1976 wave of the National Bureau of Economic Research R&D Masterfile.

The chapter is organized as follows. In Section 6.2, we introduce stochastic frontier models for binary outputs. The details of the simulation study constitute Section 6.3. An empirical application of the suggested method to a particular data set forms Section 6.4, and the last section includes some conclusions.

6.2 Stochastic Frontier Logistic Regression Model for Binary Outputs

Consider a production process in which several inputs are used to produce a certain output. Suppose the output is measured by a binary variable $y$, coded as '1' and '0' (or 'good' and 'bad', etc.). The problem is to develop a stochastic frontier model to compare the efficiencies of these production units. The ordinary binary logistic regression model for the impact of the predictors $x_i$ on a binary variable $y_i$ is given by

$$y_i = \pi(x_i, \beta) + v_i, \quad i = 1, 2, \ldots, n,$$

with $y_i$ taking only the two values '1' and '0'. Here $\pi$ is the logistic function, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})'$ is the corresponding vector of explanatory variables, and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ is the parameter vector such that $P(y_i = 1) = E(y_i) = \pi(x_i, \beta)$. The term $v_i$ is the usual error, with

$$v_i = \begin{cases} 1 - \pi(x_i, \beta) & \text{if } y_i = 1, \\ -\pi(x_i, \beta) & \text{if } y_i = 0. \end{cases} \tag{6.1}$$

Note that $E(v_i) = 0$ and $Var(v_i) = \pi(x_i, \beta)[1 - \pi(x_i, \beta)]$. Since $y_i$ is a Bernoulli random variable, $v_i$ follows a two-point distribution.

Now suppose $y_i$ represents the binary output of the $i$-th decision making unit and $x_i$ is its corresponding input vector. We assume that a stochastic inefficiency factor acting on the decision making unit brings the probability of $y_i = 1$ down from the level $\pi(x_i, \beta)$ it would otherwise attain. Thus, we modify the ordinary logistic model as

$$y_i = \pi(x_i, \beta, u_i) + v_i, \quad i = 1, 2, \ldots, n, \tag{6.2}$$

where $u_i$ is a non-negative random variable representing the inefficiency component. The above model is defined as a stochastic frontier logistic regression model for binary outputs. The exponential, half normal and gamma distributions are some of the candidates for modeling the inefficiency term $u_i$.

Next we discuss the estimation of the parameters of the model introduced in (6.2) and derive an expression for estimating the individual inefficiency of each decision making unit. The probability mass function for the binary data stochastic frontier model given in (6.2) is

$$P(y_i \mid x_i, \beta, u_i) = [\pi(x_i, \beta, u_i)]^{y_i} [1 - \pi(x_i, \beta, u_i)]^{1 - y_i}, \quad y_i = 0, 1, \tag{6.3}$$

where $u_i$ is assumed to be a non-negative continuous random variable with probability density function $f(u_i; \theta)$. The distribution of $y_i$ given $x_i$, with $u_i$ integrated out, is

$$P(y_i \mid x_i, \beta, \theta) = \int_0^\infty \left( \frac{e^{x_i'\beta - u_i}}{1 + e^{x_i'\beta - u_i}} \right)^{y_i} \left( 1 - \frac{e^{x_i'\beta - u_i}}{1 + e^{x_i'\beta - u_i}} \right)^{1 - y_i} f(u_i; \theta) \, du_i. \tag{6.4}$$

It may not be possible to obtain a closed form expression for this distribution. However, the integral can be computed numerically. The likelihood is therefore

$$L = \prod_{i=1}^{n} P(y_i \mid x_i, \beta, \theta). \tag{6.5}$$

The log of the likelihood (6.5) can be maximized to obtain the maximum likelihood estimates of the parameters $(\beta, \theta)$. Under the usual regularity conditions, these estimates are consistent, asymptotically normal and efficient.
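The numerical integration in (6.4) and the log-likelihood from (6.5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes $\pi(x_i, \beta, u_i)$ to be the logistic function of $x_i'\beta - u_i$ (as the integrand of (6.4) suggests), assumes an exponential inefficiency term parameterised by its rate, uses a single regressor with an intercept, and approximates the integral with a midpoint rule. All function names are hypothetical.

```python
import math


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def conditional_pmf(y, xb, u):
    """P(y | x, beta, u) as in (6.3), with pi = logistic(x'beta - u)."""
    p = logistic(xb - u)
    return p if y == 1 else 1.0 - p


def marginal_pmf(y, xb, lam, n_grid=400):
    """P(y | x, beta, lam) as in (6.4), assuming u ~ Exponential(rate=lam).

    The substitution t = 1 - exp(-lam * u) maps the integral over (0, inf)
    onto (0, 1), where it is approximated by the midpoint rule.
    """
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid
        u = -math.log(1.0 - t) / lam
        total += conditional_pmf(y, xb, u)
    return total / n_grid


def log_likelihood(params, ys, xs):
    """Log of (6.5) for one regressor; params = (alpha, beta, lam)."""
    alpha, beta, lam = params
    if lam <= 0:
        return float("-inf")  # rate must be positive
    return sum(
        math.log(marginal_pmf(y, alpha + beta * x, lam))
        for y, x in zip(ys, xs)
    )
```

The function `log_likelihood` could be passed (with its sign flipped) to any general-purpose numerical optimiser to obtain maximum likelihood estimates; a half normal or gamma inefficiency term would need a different change of variable or quadrature rule.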

To estimate the approximate efficiency of an individual decision making unit, we use the conditional distribution of $u_i$ given the data $(y_i, x_i)$, which is given by

$$f(u_i \mid y_i, x_i) = \frac{P(y_i \mid x_i, \beta, u_i) \, f(u_i; \theta)}{P(y_i \mid x_i, \beta, \theta)}. \tag{6.6}$$

Therefore, the technical efficiency may be obtained as

$$TE_i = \int_{u_i = 0}^{\infty} e^{-u_i} \, \frac{P(y_i \mid x_i, \beta, u_i) \, f(u_i; \theta)}{P(y_i \mid x_i, \beta, \theta)} \, du_i.$$

6.3 A Simulation Study

To evaluate the suggested technical inefficiency measures, a simulation study was carried out. Results are reported for sample sizes n = 50, 100, 250 and 500. The intercept and slope were set to 0.1 and 0.5, respectively, and $x$ was generated from the Uniform(0, 2) distribution. All results reported here are based on 1000 simulations. Table 6.1 reports the simulation results for $u \sim Exp(\lambda = 0.8)$. Tables 6.2 and 6.3 report the cases $u \sim$ half normal$(0, \sigma_u = 0.8)$ and $u \sim Gamma(\lambda = 0.8, m = 2)$, respectively. From Tables 6.1, 6.2 and 6.3 we observe that the parameter estimates are quite close to their actual values. The sample size has a considerable effect on the estimation of the parameters.
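The simulation design and the efficiency measure described above can be sketched as follows. This is an illustrative sketch only, under several assumptions: the exponential distribution is parameterised by its rate $\lambda$, technical efficiency is computed as the conditional expectation of $e^{-u_i}$ given $(y_i, x_i)$, and the true parameter values are plugged in instead of maximum likelihood estimates. The function names are hypothetical.

```python
import math
import random


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate(n, alpha=0.1, beta=0.5, lam=0.8, seed=0):
    """Draw n triples (y, x, u) from the binary frontier model.

    x ~ Uniform(0, 2), u ~ Exponential(rate=lam) (rate is an assumption),
    and y ~ Bernoulli(logistic(alpha + beta * x - u)).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0)
        u = rng.expovariate(lam)
        y = 1 if rng.random() < logistic(alpha + beta * x - u) else 0
        data.append((y, x, u))
    return data


def technical_efficiency(y, xb, lam, n_grid=400):
    """E[exp(-u) | y, x] for u ~ Exponential(rate=lam).

    Numerator and denominator are both integrals against the exponential
    density; the substitution t = 1 - exp(-lam * u) turns each into an
    integral over (0, 1), approximated here by the midpoint rule.
    """
    num = 0.0
    den = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid
        u = -math.log(1.0 - t) / lam
        p = logistic(xb - u)
        weight = p if y == 1 else 1.0 - p  # P(y | x, beta, u)
        num += math.exp(-u) * weight
        den += weight
    return num / den


# Mean technical efficiency over one simulated sample of size 100.
sample = simulate(n=100, seed=42)
mean_te = sum(
    technical_efficiency(y, 0.1 + 0.5 * x, 0.8) for y, x, _ in sample
) / len(sample)
print(round(mean_te, 4))
```

Because only a binary outcome is observed for each unit, the efficiency estimate can take just two values for a given $x_i$: a unit with $y_i = 1$ receives a higher estimate than one with $y_i = 0$ at the same inputs.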

Table 6.1: Parameter estimates and mean technical efficiency using the binary stochastic frontier model with $u_i \sim Exp(\lambda = 0.8)$ (standard deviations in brackets). Columns: n, $\alpha$, $\beta$, $\lambda$, mean TE. (Table entries are missing from this copy.)

Table 6.2: Parameter estimates and mean technical efficiency using the binary stochastic frontier model with $u_i \sim$ half normal$(0, \sigma_u = 0.8)$. Columns: n, $\alpha$, $\beta$, mean TE. (Table entries are missing from this copy.)

Table 6.3: Parameter estimates and mean technical efficiency using the binary stochastic frontier model with $u_i \sim Gamma(\lambda = 0.8, m = 2)$ (standard deviations in brackets). Columns: n, $\alpha$, $\beta$, m, $\lambda$, mean TE. (Most table entries are missing from this copy. The surviving bracketed standard deviations are (0.0372), (0.0307), (0.0328) and (0.0345) in the last column of the four rows, in order, and (0.0115) in the fourth column of the third row.)

Next we discuss an application of this model to a real data set.

6.4 Empirical Application

Fe-Rodriguez and Hofler (2009) discuss the relationship between the patents awarded to a firm in a given year and its investment in R&D by estimating the production function of patents. They estimated the technical efficiency of 70 pharmaceutical firms from the 1976 wave of the National Bureau of Economic Research R&D Masterfile (Hall et al., 1986). We used the same data set, converting the number of patents to a binary random variable that takes the value one if there is at least one patent and zero if there are no patents. After converting the data into binary form, we applied the binary logistic regression stochastic frontier model with an exponential inefficiency term. Hence the model is

$$y_i = \pi(x_i, \beta, u_i) + v_i, \quad i = 1, 2, \ldots, n,$$

where $x_i = (1, \log \text{R\&D}, \log \text{Sales})$.

Table 6.4: Parameter estimates and mean technical efficiency from the Patents data. Columns: Intercept, log R&D, (log R&D)^2, log sales, $\lambda$, mean efficiency. Rows: Exponential; Half normal; Poisson HN count model (by Fe-Rodriguez). (Table entries are missing from this copy.)

Table 6.4 shows the parameter estimates for the R&D data, which differ with each distributional choice. Table 6.5 presents the quantiles of technical efficiency. The range of technical efficiency estimated by the Poisson HN count model of Fe-Rodriguez and Hofler (2009) is larger than that estimated by the binary output model. The technical efficiency estimated by the binary models also differs with the distributional assumption on $u$. The median technical efficiency is nearly the same for the binary models, but smaller than that of the Poisson HN count model. We observe that, for the same data, different models give different estimates of technical efficiency.
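The conversion of patent counts to a binary output and the construction of the input vector $x_i = (1, \log \text{R\&D}, \log \text{Sales})$ described in this section can be sketched as follows. The records and field names below are hypothetical placeholders chosen for illustration; they are not values from the R&D Masterfile.

```python
import math

# Hypothetical firm records; NOT taken from the NBER R&D Masterfile.
firms = [
    {"patents": 0, "rd": 1.2, "sales": 35.0},
    {"patents": 3, "rd": 4.5, "sales": 120.0},
    {"patents": 1, "rd": 0.8, "sales": 20.0},
]


def to_binary_design(record):
    """Return (y, x): y = 1 if the firm has at least one patent, else 0,
    and x = (1, log R&D, log Sales)."""
    y = 1 if record["patents"] >= 1 else 0
    x = (1.0, math.log(record["rd"]), math.log(record["sales"]))
    return y, x


sample = [to_binary_design(r) for r in firms]
```

Collapsing counts to a binary indicator discards the information on how many patents a firm holds, which is one reason the efficiency estimates can differ from those of a count data model.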

Table 6.5: Quantiles of technical efficiency from the Patents data set with exponential inefficiency term. Columns: Exponential; Half normal (HN); Poisson HN count model (by Fe-Rodriguez). Rows: minimum, intermediate quantiles including the 50% quantile, maximum. (Table entries are missing from this copy.)

6.5 Concluding Remarks

The existing literature has concentrated on stochastic frontier models with a continuous dependent (output) variable. However, there are situations in which the output variable is not continuous but binary. We have discussed the estimation of the technical efficiency of individual decision making units in a binary setup. From the simulation study we observe that the introduced model works reasonably well. We have applied the binary stochastic frontier model to a data set on R&D expenditure and the existence or non-existence of a patent for a firm. It is possible to extend this setup to categorical data with k categories and further to count data stochastic frontier models. This may be taken up as a future research initiative.
