Wage Determinants Analysis by Quantile Regression Tree

Communications of the Korean Statistical Society 2012, Vol. 19, No. 2, 293-301
DOI: http://dx.doi.org/10.5351/ckss.2012.19.2.293

Wage Determinants Analysis by Quantile Regression Tree

Youngjae Chang
Economist, Research Department, The Bank of Korea, 39, Namdaemun-Ro, Jung-Gu, Seoul 110-794, Korea (E-mail: yjchang@bok.or.kr)

Abstract

Quantile regression, proposed by Koenker and Bassett (1978), is a statistical technique that estimates conditional quantiles. Its advantage over ordinary least squares (OLS) regression is robustness to large outliers. The regression tree approach has been applied to OLS problems to fit flexible models. Loh (2002) proposed the GUIDE algorithm, which has negligible selection bias and relatively low computational cost. Since quantile regression can be regarded as an analogue of OLS, it can also be combined with the GUIDE regression tree method. Chaudhuri and Loh (2002) proposed a nonparametric quantile regression method that blends key features of piecewise polynomial quantile regression and tree-structured regression based on adaptive recursive partitioning. Lee and Lee (2006) investigated wage determinants in the Korean labor market using the Korean Labor and Income Panel Study (KLIPS). Following Lee and Lee, we fit three kinds of quantile regression tree models to the KLIPS data for the quantiles 0.05, 0.2, 0.5, 0.8, and 0.95. Among the three models, the piecewise multiple linear quantile regression model forms the shortest tree structure, while the piecewise constant quantile regression model generally has a deeper tree structure with more terminal nodes. Age, gender, marital status, and education appear to be the determinants of the wage level throughout the quantiles; in addition, education appears as the important determinant of the wage level in the highly paid group.

Keywords: Quantile regression, nonlinear quantile regression, tree-structured regression.

1. Introduction

Quantile regression, originally proposed by Koenker and Bassett (1978), is a statistical technique that estimates conditional quantiles. It originated from the linear $\ell_1$-regression problem of Barrodale and Roberts (1980), Bartels and Conn (1980) and others, which in turn builds on Charnes et al. (1955) and Wagner (1959). Koenker and Bassett (1978) extended these algorithms to linear quantile regression. Koenker and D'Orey (1987) improved linear quantile regression based on the simplex method by modifying the Barrodale-Roberts algorithm. Koenker and Park (1994) proposed a new approach to the computation of nonlinear quantile regression estimators based on interior point methods for solving linear programs. They discussed interior point methods for strictly linear programs (which include the linear quantile regression problem) and extended them to nonlinear problems. It turned out that, unlike the simplex method, the interior point algorithm offered a natural extension to nonlinear problems.

A regression tree approach has been applied to ordinary least squares (OLS) problems to fit flexible models. Loh (2002) proposed the GUIDE algorithm, which has a negligible selection bias and a relatively low computational cost. GUIDE is also known as a smart data mining tool with flexible model fitting methods at each node. Just as an OLS problem is solved by GUIDE, a quantile regression problem can be dealt with in a piecewise regression tree approach.

Chaudhuri and Loh (2002) proposed a nonparametric quantile regression method that blends key features of piecewise polynomial quantile regression and tree-structured regression based on adaptive recursive partitioning. Unlike least squares regression trees, which concentrate on modeling the relationship between the response and the covariates at the center of the response distribution, the method can provide insight into the nature of that relationship at the center as well as at the tails of the response distribution.

Figure 1: Example of a quantile regression tree for the α quantile. At each intermediate node, a case goes to the left child node if the condition is satisfied. The number beneath a leaf is the sample quantile of the dependent variable.

Figure 1 shows an example of a quantile regression tree: the root node contains all the training observations, and the training data are recursively partitioned by values of the input variables until reaching the terminal nodes (t4, t5, t6, t8 and t9), where the predictions are made. At each terminal node, a quantile regression model is fitted to the data in that partition. The tree-structured quantile regression algorithm has the advantage of fitting flexible models, since the piecewise linear models fitted at the terminal nodes can capture non-linearity (a toy routing sketch is given at the end of this section).

Lee and Lee (2006) investigated wage determinants in the Korean labor market using the Korean Labor and Income Panel Study (KLIPS). They used the quantile regression method for each conditional quantile wage group. The quantile regressions in their paper gave a more comprehensive picture across wage quantiles, whereas most previous labor market analyses used (mean) regression analysis and thus focused only on average statistics. They found that education does not always appear to provide the necessary job skills, so that the return to education is fairly low compared to the US labor market. Age, however, was shown to be one of the most important factors in wage determination, especially for the higher wage groups. They performed several such analyses to find the relationships between wage and the independent variables in the data.

We are interested in analyzing the KLIPS data with a quantile regression tree approach. We may find more interesting results than those of the simple quantile regression method used in Lee and Lee (2006), since the characteristics of a quantile regression tree may yield more helpful and meaningful results.

This paper is organized as follows. In Section 2, we introduce the concept of quantile regression followed by a tree-structured quantile regression algorithm (GUIDE). Section 3 covers the real data analysis, and Section 4 concludes with a summary of the real data analysis using a quantile regression tree.
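To make the routing rule in Figure 1 concrete, the following minimal Python sketch (ours, not part of the paper; the node layout and the toy split value are hypothetical) shows how a case descends the tree and is scored by the quantile model stored at the terminal node it reaches.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    # Internal nodes carry a split condition; terminal nodes carry a fitted
    # quantile model (here just a callable returning the predicted quantile).
    condition: Optional[Callable[[dict], bool]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    model: Optional[Callable[[dict], float]] = None

def predict(node: Node, case: dict) -> float:
    """Route a case to a terminal node (left child if the split condition
    is satisfied, as in Figure 1) and evaluate that node's quantile model."""
    while node.model is None:
        node = node.left if node.condition(case) else node.right
    return node.model(case)

# Toy tree: one split on AGE, constant quantile predictions at the leaves.
tree = Node(
    condition=lambda c: c["AGE"] <= 43.5,
    left=Node(model=lambda c: 150.0),
    right=Node(model=lambda c: 210.0),
)
print(predict(tree, {"AGE": 30}))  # routed left, prints 150.0
```

In the piecewise linear variants discussed later, the leaf callables would be fitted quantile regression models rather than constants.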

2. Quantile Regression and GUIDE

2.1. Quantile regression

Quantile regression analysis focuses on the conditional $\alpha$th quantile of the response variable given the predictor variables. Unlike usual regression analysis, which focuses on the conditional mean of the response given the predictors, quantile regression gives insight into the center as well as the lower and upper tails of the conditional distribution of the response, with varying choices of $\alpha$. Chaudhuri and Loh (2002) pointed out that quantile regression is quite effective as a tool to explore and model the dependence of a response on the predictors when the predictors have different effects on different parts of the conditional distribution of the response, as occurs in many econometric problems. For example, in marketing studies, where predictor variables may have different effects on high, medium and low consumption groups, quantile regression can be useful for understanding the nature of the dependence between the response and the predictors. Besides this effective modeling, an easily understood advantage of quantile regression is its robustness to large outliers compared to ordinary least squares (OLS) regression.

Quantile regression can be described as follows, according to Koenker (2005). Let $Y$ be a dependent variable and $X$ a ($d$-dimensional) predictor variable. The conditional distribution function of $Y$ given $X = x$ is
$$F(y \mid X = x) = P(Y \le y \mid X = x),$$
and consequently the conditional $\alpha$th quantile of $Y$ is
$$Q_\alpha(X = x) = F^{-1}(\alpha) = \inf\{y : F(y \mid X = x) \ge \alpha\}.$$
Let the check function $\rho_\alpha$ be
$$\rho_\alpha(u) = u\,(\alpha - I(u < 0)).$$
Then looking for the $\hat{y}$ that minimizes
$$E\,\rho_\alpha(Y - \hat{y}) = (\alpha - 1)\int_{-\infty}^{\hat{y}} (y - \hat{y})\,dF(y) + \alpha\int_{\hat{y}}^{\infty} (y - \hat{y})\,dF(y)$$
leads to the first-order condition
$$0 = (1 - \alpha)\int_{-\infty}^{\hat{y}} dF(y) - \alpha\int_{\hat{y}}^{\infty} dF(y) = F(\hat{y}) - \alpha.$$
While least-squares regression focuses only on the conditional mean $E(Y \mid X = x)$, which minimizes the expected squared error loss, the objective of linear quantile regression is to find the conditional quantile that minimizes the expected check loss $E(\rho_\alpha)$:
$$\hat{Q}_\alpha(X = x) = x^{\top}\hat{\beta}_\alpha, \qquad \hat{\beta}_\alpha = \arg\min_{\beta \in \mathbb{R}^d} E\,\rho_\alpha(Y - x^{\top}\beta).$$
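As a quick illustration of the check function and the first-order condition $F(\hat{y}) = \alpha$, here is a minimal numpy sketch (ours, with simulated wage-like data; the grid search stands in for a proper optimizer): minimizing the average check loss over a constant recovers the sample $\alpha$-quantile.

```python
import numpy as np

def check_loss(u, alpha):
    # rho_alpha(u) = u * (alpha - I(u < 0)), the Koenker-Bassett check function
    return u * (alpha - (u < 0))

rng = np.random.default_rng(0)
y = rng.lognormal(mean=3.0, sigma=0.8, size=5000)  # skewed, wage-like sample

alpha = 0.8
grid = np.linspace(y.min(), y.max(), 2001)
losses = np.array([check_loss(y - c, alpha).mean() for c in grid])
y_hat = grid[losses.argmin()]

# y_hat should be close to the sample 0.8-quantile, per F(y_hat) = alpha
print(y_hat, np.quantile(y, alpha))
```

In the linear case the same loss is minimized over the coefficient vector $\beta$ rather than over a constant, which is the problem the simplex and interior point algorithms cited in the introduction solve.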

2.2. GUIDE quantile regression

The aim of regression analysis is to discover the relationships between the response variable and the predictor variables, and eventually to use these relationships to make predictions. A regression tree is a tree-structured solution in which a constant or a relatively simple regression model is fitted to the data in each partition. Chaudhuri and Loh (2002) proposed a nonparametric quantile regression method using a regression tree. Quantile regression trees have a piecewise constant, piecewise polynomial, or piecewise multiple linear option, where each piece is obtained by fitting the corresponding model to the data in a terminal node of a binary tree. The tree is constructed by recursively partitioning the data based on repeated analyses of the residuals obtained after fitting quantile regression models. This idea is implemented in the GUIDE (Generalized, Unbiased, Interaction Detection and Estimation; Loh, 2002) software, whose multiple linear procedure is briefly sketched as follows:

1. Fit a quantile regression model to the data in the node using the algorithm of Koenker and D'Orey (1987) and compute the residuals.
2. For each observation, define a class variable Z by the sign of its residual: Z = 1 if the observation has a positive residual, and Z = 0 otherwise.
3. Construct a 2 × m cross-classification table for each predictor variable X. The rows of the table are the values of Z. If X is a numerical variable, the columns are the 4 intervals formed at the sample quartiles (m = 4); if X is a categorical variable, its m distinct values form the columns. Compute the p-value of the chi-squared test for each X based on its table.
4. Select the split variable X with the smallest p-value from the previous step, and let t_L and t_R denote the left and right subnodes of t. If X is a numerical variable, search for the split point that gives the lowest total of the sums of squared residuals in t_L and t_R, provided that the number of observations at each node is at least n_0, a user-specified value. If X is a categorical variable, search for a split of the form X ∈ C that gives the lowest weighted sum of the variances of Z in t_L and t_R, again provided that the number of observations at each node is at least n_0. Here C is a subset of the values taken by X, and the weights are proportional to the sample sizes.
5. After splitting has stopped, prune the tree with a test sample or by cross-validation.

For details, see Loh (2002). A sketch of the split-variable selection in steps 2-3 is given below.
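The residual-sign test in steps 2-3 could look like the following (our illustration, assuming a numerical predictor; the actual GUIDE implementation contains refinements not shown here).

```python
import numpy as np
from scipy.stats import chi2_contingency

def split_variable_pvalue(x, residuals):
    """Chi-squared p-value for one numerical predictor: cross-classify the
    residual sign Z against 4 intervals of x formed at its sample quartiles."""
    z = (residuals > 0).astype(int)          # step 2: Z = 1 for positive residual
    cuts = np.quantile(x, [0.25, 0.5, 0.75])
    bins = np.digitize(x, cuts)              # step 3: m = 4 quartile intervals
    table = np.zeros((2, 4))
    np.add.at(table, (z, bins), 1)           # fill the 2 x 4 contingency table
    table = table[:, table.sum(axis=0) > 0]  # drop empty columns (quartile ties)
    return chi2_contingency(table)[1]        # index 1 is the p-value

# Per step 4, the predictor with the smallest p-value would be chosen as the
# split variable, and the split point minimizing the total sum of squared
# residuals in the two subnodes would then be searched.
```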

3. Real Data Analysis

3.1. Data description

We use data from the Korean Labor and Income Panel Study (KLIPS), following Lee and Lee (2006). The Korea Labor Institute began collecting detailed data on households and individuals in 1998; the data collection is modeled after the Panel Study of Income Dynamics (PSID) of the University of Michigan. We use the 2007 data, and individuals currently employed in 2007 are selected from among the 13,738 individual observations in the dataset; that is, self-employed or unemployed observations are excluded from the analysis. The wage variable is the average monthly wage in Korean won, in units of 10,000 won. The independent variables are generated as described in Lee and Lee (2006). Education is measured as the total number of years in school: the original education variable in the dataset is categorical and was converted to a numerical variable based on the duration of schooling (for example, graduation from elementary school gives six years and middle school gives nine; a toy mapping is sketched after Table 1). Occupational types are categorized as highly skilled white-collar, lower-skilled white-collar, highly skilled blue-collar and lower-skilled blue-collar jobs. The Origin variables are dummy variables for birthplace: Origin 1 is Kyungsang-do; Origin 2 is Seoul, Incheon, and Kyunggi-do; Origin 3 is Chola-do and Jeju-do; and Origin 4 is Chungcheong-do, Kangwon-do and the rest of Korea. The Region variables are also dummy variables, with Region 1 for Seoul, Region 2 for other large metropolitan areas, and Region 3 for all other areas. The list of variables is in Table 1.

Table 1: List of variables
Dependent variable: Wage
Independent variables: Education, Age, Job experience, Total jobs, High White, Low White, High Blue, Low Blue, Origin 1 through 4, Region 1 through 3, Gender, Married, Union
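For illustration, the schooling-duration conversion described above might look like the following sketch (the category labels are our guesses; only the elementary and middle school values are stated in the paper, and the actual KLIPS coding is not given).

```python
# Hypothetical KLIPS education categories mapped to years of schooling;
# only elementary (6) and middle school (9) are stated in the text.
YEARS_OF_SCHOOLING = {
    "elementary school": 6,
    "middle school": 9,
    "high school": 12,
    "junior college": 14,
    "university": 16,
    "graduate school": 18,
}

def education_years(category: str) -> int:
    """Convert a categorical education level to a numeric duration in years."""
    return YEARS_OF_SCHOOLING[category]
```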

3.2. Results

We fit three quantile regression models to the KLIPS data using GUIDE: piecewise constant, piecewise multiple linear, and piecewise simple linear quantile regression models. A tree is presented for each of the quantiles α = 0.05, 0.20, 0.50, 0.80 and 0.95. The three kinds of trees give quite similar results; however, some trees carry more detailed information than others, depending on the models fitted at the terminal nodes.

3.2.1. Piecewise constant quantile regression tree

AGE is the first split variable in the lower quantiles (α = 0.05, 0.20), and GENDER and MARRIED are also common variables in the lower quantile trees. Notably, AGE appears three times in the lowest quantile tree, meaning that AGE divides the wage level into more pieces there than in the other quantiles. Married males between 24.5 and 43.5 years old are paid the most at the 5th percentile wage level. The 20th percentile tree looks a little different from the lowest quantile tree: more than 14.5 years of education gives the highest wage level among people over 43.5 years old, and the next highest level belongs to married males no more than 43.5 years old. Older people are paid more provided that they are relatively well educated.

The median tree gives a quite different result from the two lower-quantile trees. Education determines the wage level at the root node and is regarded as the most important variable for the wage level in this tree. The highest level comes from the group with more than 15.5 years of education and age over 36.5. The split point of the AGE variable is quite low, which means that at the median level there exist well-paid young people compared to the lower quantiles. Concerning GENDER, males are paid more, as in the previous two trees.

The only split variable in the 80th and 95th percentile trees is EDUCATION: those with more than 15.5 years of education are paid more in the higher quantiles. An education duration of 15.5 years can be regarded as roughly university-graduate level. Another notable feature of these high-wage trees is the disappearance of the two variables AGE and GENDER. It seems that education mainly determines the wage level in the well-paid groups.

Figure 2: GUIDE piecewise constant quantile regression tree (left: 5th percentile; right: 20th percentile).
Figure 3: GUIDE piecewise constant quantile regression tree (left: 50th percentile; right: 80th percentile).
Figure 4: GUIDE piecewise constant quantile regression tree (95th percentile).

3.2.2. Multiple linear quantile regression tree

For the multiple linear quantile regression trees, we do not see any split variable at the 5th, 20th, and 95th percentiles. The model fitted at a terminal node is a multiple regression, so a split structure rarely appears in comparison to the constant quantile regression trees: the multiple regression model can contain the curvature structure that is represented by the split variables in the constant regression trees.

Only the 50th and 80th percentile trees have splits, with the AGE and EDUCATION variables. The 50th percentile multiple linear regression tree has the same split variables as the 50th percentile constant tree, but the split points differ because of the models fitted at the terminal nodes. The 80th percentile multiple linear tree also has the same split variable, EDUCATION, but the split point is 12.50, a little smaller than in the constant tree; the difference is explained by the model at the terminal node. Overall, the multiple linear quantile regression trees are consistent with the constant quantile regression trees.

Figure 5: GUIDE piecewise multiple linear quantile regression tree (left: 5th percentile; right: 20th percentile).
Figure 6: GUIDE piecewise multiple linear quantile regression tree (left: 50th percentile; right: 80th percentile).
Figure 7: GUIDE piecewise multiple linear quantile regression tree (95th percentile).

3.2.3. Simple linear quantile regression tree

A simple linear tree has a simple regression model at each terminal node. The difference between the multiple linear regression tree and the simple linear regression tree is the number of predictors in the models at the terminal nodes: here we fit a model with only one predictor at each terminal node. The simple linear model fitted at a terminal node is best in the sense that it gives the lowest mean squared error among the candidate predictors (see the sketch after this subsection). Such a simple linear regression tree is useful when a piecewise constant tree has too many nodes but a piecewise linear one has too few. The MARRIED, AGE, and GENDER variables come out as split variables in the 5th percentile tree, as they do in the 5th percentile constant tree; however, the structures are quite different from each other. Note that EDUCATION appears almost everywhere as the predictor variable at the terminal nodes.

Figure 8: GUIDE piecewise simple linear quantile regression tree (left: 5th percentile; right: 20th percentile).
Figure 9: GUIDE piecewise simple linear quantile regression tree (left: 50th percentile; right: 80th percentile).
Figure 10: GUIDE piecewise simple linear quantile regression tree (95th percentile).
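A minimal sketch of the lowest-mean-squared-error choice mentioned above follows (ours, not GUIDE's code; in GUIDE's quantile mode the terminal-node fit would be a quantile regression, and this sketch uses a plain least-squares fit purely to illustrate the selection criterion).

```python
import numpy as np

def best_simple_linear(X, y):
    """Among single-predictor linear fits, return the predictor column index
    and coefficients (intercept, slope) giving the lowest mean squared error."""
    best_j, best_coef, best_mse = None, None, np.inf
    for j in range(X.shape[1]):
        A = np.column_stack([np.ones(len(y)), X[:, j]])   # intercept + one predictor
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        mse = np.mean((y - A @ coef) ** 2)
        if mse < best_mse:
            best_j, best_coef, best_mse = j, coef, mse
    return best_j, best_coef
```

Applied at each terminal node, this kind of selection explains why a single variable such as EDUCATION can appear as the fitted predictor in most leaves.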

4. Conclusion and Future Work

Following Lee and Lee, we fit three kinds of quantile regression tree models to the KLIPS data for the quantiles 0.05, 0.2, 0.5, 0.8, and 0.95. Among the three models, the piecewise multiple linear quantile regression model forms the shortest tree structure, while the piecewise constant quantile regression model generally has a deeper tree structure with more terminal nodes. This implies that we can simplify the tree structure by fitting a linear model instead of a constant at each node, a result that corresponds to the usual regression tree approach. Similar to Chang's (2010) analysis of the impact factors for the Business Survey Index (BSI) using regression trees, we can easily detect from the several quantile regression trees in this paper the important factors that impact wage levels. AGE appears as a very important determinant of wage in the lowest paid group, while EDUCATION seems to mainly determine the wage level in the well-paid groups.

Concerning GENDER, males are paid more than females in general.

There is also room for future research. We can consider extending the cross-sectional quantile regression tree analysis to compare changes in the determinants over time; applying panel data analysis methods to regression trees may make this possible.

References

Barrodale, I. and Roberts, F. D. K. (1980). Solution of the constrained l1 linear approximation problem, ACM Transactions on Mathematical Software, 6, 231-235.
Bartels, R. and Conn, A. (1980). Linearly constrained discrete l1 problems, ACM Transactions on Mathematical Software, 6, 594-608.
Chang, Y. (2010). The analysis of factors which affect business survey index using regression trees, The Korean Journal of Applied Statistics, 23, 63-71.
Charnes, A., Cooper, W. W. and Ferguson, R. O. (1955). Optimal estimation of executive compensation by linear programming, Management Science, 1, 138-151.
Chaudhuri, P. and Loh, W.-Y. (2002). Nonparametric estimation of conditional quantiles using quantile regression trees, Bernoulli, 8, 561-576.
Koenker, R. (2005). Quantile Regression, Econometric Society Monograph Series, Cambridge University Press.
Koenker, R. and Bassett, G. W. (1978). Regression quantiles, Econometrica, 46, 33-50.
Koenker, R. and D'Orey, V. (1987). Algorithm AS 229: Computing regression quantiles, Applied Statistics, 36, 383-393.
Koenker, R. and Park, B. J. (1994). An interior point algorithm for nonlinear quantile regression, Journal of Econometrics, 71, 265-283.
Lee, B.-J. and Lee, M. J. (2006). Quantile regression analysis of wage determinants in the Korean labor market, The Journal of the Korean Economy, 7, 1-31.
Loh, W.-Y. (2002). Regression trees with unbiased variable selection and interaction detection, Statistica Sinica, 12, 361-386.
Wagner, H. M. (1959). An integer linear-programming model for machine scheduling, Naval Research Logistics Quarterly, 6, 131-140.

Received January 24, 2012; Revised February 21, 2012; Accepted March 6, 2012