Nonparametric Estimation of a Hedonic Price Function

Daniel J. Henderson, Subal C. Kumbhakar, and Christopher F. Parmeter
Department of Economics
State University of New York at Binghamton

February 23, 2005

Abstract

In this paper we attempt to replicate the results of an article (Anglin and Gençay 1996) published in this journal which applied semiparametric procedures to estimate a hedonic price function. To relax additional restrictive assumptions, we also employ a fully nonparametric model that captures nonlinearity in both continuous and categorical variables. We find that the nonparametric procedure gives more intuitive and meaningful results.

Keywords: Nonparametric, Generalized Kernel Estimation, Hedonic Price
JEL Classification No.: C13, C14, D40

Daniel J. Henderson, Department of Economics, State University of New York, Binghamton, NY 13902-6000, (607) 777-4480, Fax: (607) 777-2681, e-mail: djhender@binghamton.edu.
Subal C. Kumbhakar, Department of Economics, State University of New York, Binghamton, NY 13902-6000, (607) 777-4762, Fax: (607) 777-2681, e-mail: kkar@binghamton.edu.
Christopher F. Parmeter, Department of Economics, State University of New York, Binghamton, NY 13902-6000, Fax: (607) 777-2681, e-mail: cparmet1@binghamton.edu.

1 Introduction

In this paper we present replication results of the hedonic price functions estimated in the housing market study by Anglin and Gençay (1996), hereafter AG. We also report results that can be viewed as robustness checks of their semiparametric model.

In hedonic price models it is argued that the value of a good (a house in the present case) depends on the amounts of the attributes it contains. Thus, its price will be a function of the attributes/characteristics $z_1, \ldots, z_l$. Implicit prices of the characteristics can be computed from the partial derivatives of the price function with respect to the levels of the characteristics. Since these derivatives may depend on the levels of these characteristics, the choice of functional form in empirical analysis is quite important.

To relax some of the traditional functional form assumptions, AG used a semiparametric approach in which the hedonic price function is specified as (equation 11 in AG)

$$\ln P_i = \beta_1 \mathrm{DRV}_i + \beta_2 \mathrm{REC}_i + \beta_3 \mathrm{FFIN}_i + \beta_4 \mathrm{GHW}_i + \beta_5 \mathrm{CA}_i + \beta_6 \mathrm{GAR}_i + \beta_7 \mathrm{REG}_i + q[\mathrm{LOT}_i, \mathrm{BDMS}_i, \mathrm{FB}_i, \mathrm{STY}_i] + \varepsilon_i, \quad i = 1, \ldots, n, \tag{1}$$

where $P_i$ is the price of house $i$; DRV, REC, FFIN, GHW, CA, and REG are dummy variables for driveway, recreational room, finished basement, gas water heating, central air conditioning, and preferred neighbourhood; GAR, BDMS, FB, and STY are the number of garages, bedrooms, full bathrooms, and stories; and LOT is the lot size (in square feet). As a comparison, they considered the following parametric benchmark model

$$\ln P_i = \beta_1 \mathrm{DRV}_i + \beta_2 \mathrm{REC}_i + \beta_3 \mathrm{FFIN}_i + \beta_4 \mathrm{GHW}_i + \beta_5 \mathrm{CA}_i + \beta_6 \mathrm{GAR}_i + \beta_7 \mathrm{REG}_i + \gamma_1 \ln(\mathrm{LOT}_i) + \gamma_2 \ln(\mathrm{BDMS}_i) + \gamma_3 \ln(\mathrm{FB}_i) + \gamma_4 \ln(\mathrm{STY}_i) + u_i. \tag{2}$$

We reproduced the results of this parametric model, reported in Table II of their paper. The OLS results for the other parametric models (viz., extensions of the above model in (2)) and the findings from the corresponding specification tests were also reproduced without any difficulty.

2 Semiparametric Estimation

In a general setting, we can write equation (1) as

$$y_i = x_{1i}'\beta + m(x_{2i}) + \varepsilon_i, \quad i = 1, \ldots, n. \tag{3}$$

Semiparametric models of the above form were studied by Robinson (1988) and Stock (1989) to take advantage of a known parametric relationship and an unknown functional relationship in economic models. In this setup, $y_i$ is the dependent variable, $x_{1i}$ is a $q$-dimensional vector of discrete contrasts, $\beta$ is a $q$-dimensional vector of contrast effects, $x_{2i}$ is a $k$-dimensional vector of continuous variables that affect the dependent variable through the unknown function $m(\cdot)$, $\varepsilon_i$ is a stochastic error that accounts for inherent randomness in the model, and $n$ is the sample size available to the researcher. Robinson (1988) noted that the additional information given by the linear portion of (1) leads to $\sqrt{n}$-consistent estimation of the linear parameters (when correctly specified).

While AG were concerned primarily with the linear coefficients and the out-of-sample prediction of the estimates, we thought it would be interesting to also study the unknown function and its derivatives. The derivatives of the variables in the nonlinear function are often of equal or greater interest to the researcher: it is natural to ask what the implicit price is of lot size, one extra bedroom, one extra bathroom, and so on. Specifically, we follow AG's methodology to estimate the linear coefficients, and then use local linear least squares to obtain estimates of $m(x_2)$ and its derivatives.

Our estimation of (1) gave results similar to those of the AG study, but there were subtle differences that may be attributed to differences in computation and numerical precision (note that the conclusions of the paper do not change). Specifically, we were not able to obtain the same bandwidth via the method used in their paper, and we were not able to match the linear estimates given in their Table V (using either their specified bandwidth or our calculated bandwidth). Our semiparametric estimates of the linear coefficients (along with the associated standard errors) are provided in Table 1. It is worth mentioning that the OLS estimates (reported in Table II of AG) are not much different from the semiparametric results. Thus, if the main objective is to estimate the linear parameters, the OLS results might be as good as the semiparametric results. Here we argue that the semiparametric model is more flexible as far as the estimation of implicit prices (derivatives) of the variables in the unknown function is concerned.
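
The two-step logic of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not AG's or our actual code: the data are synthetic, $x_2$ is a single continuous variable rather than the four variables in (1), and the Gaussian kernel, the fixed bandwidth `h = 0.15`, and all function names are hypothetical choices. Step 1 is the double-residual estimator of Robinson (1988); step 2 applies local linear least squares to $y_i - x_{1i}'\hat{\beta}$ and returns both the fitted value and the derivative.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nw_fit(x, targets, h):
    """Nadaraya-Watson fits of each column of `targets` on scalar x, at the sample points."""
    w = gaussian_kernel((x[:, None] - x[None, :]) / h)   # n x n kernel weights
    w /= w.sum(axis=1, keepdims=True)
    return w @ targets

def robinson_beta(y, x1, x2, h):
    """Step 1: OLS of (y - E[y|x2]) on (x1 - E[x1|x2])."""
    ey = nw_fit(x2, y[:, None], h)[:, 0]
    ex1 = nw_fit(x2, x1, h)
    beta, *_ = np.linalg.lstsq(x1 - ex1, y - ey, rcond=None)
    return beta

def local_linear(x, y, x0, h):
    """Step 2: weighted least squares of y on (1, x - x0); returns (m(x0), m'(x0))."""
    d = x - x0
    w = gaussian_kernel(d / h)
    X = np.column_stack([np.ones_like(d), d])
    XtW = X.T * w
    coef = np.linalg.solve(XtW @ X, XtW @ y)
    return coef[0], coef[1]

# Synthetic data (assumption): two dummies in the linear part, m(x2) = sin(x2).
rng = np.random.default_rng(0)
n = 500
x2 = rng.uniform(0.0, 2.0, n)
x1 = rng.binomial(1, 0.5, (n, 2)).astype(float)
beta_true = np.array([0.10, 0.20])
y = x1 @ beta_true + np.sin(x2) + 0.05 * rng.standard_normal(n)

h = 0.15  # fixed bandwidth chosen by hand for this sketch
beta_hat = robinson_beta(y, x1, x2, h)
y_tilde = y - x1 @ beta_hat                       # partial out the linear part
fits = np.array([local_linear(x2, y_tilde, x0, h) for x0 in x2])
median_derivative = np.median(fits[:, 1])         # analogue of a "median coefficient"
```

Because the dummies are partialled out before smoothing, the slope returned in step 2 is the estimated derivative of $m$ alone; summarizing it by its median across observations mirrors how the derivative estimates are reported in Table 2.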

Once we obtained the linear coefficients, we estimated the unknown function ($\tilde{y}_i = y_i - x_{1i}'\hat{\beta} = m(x_{2i}) + \varepsilon_i$) using local linear least squares, employing the same kernel function and bandwidth used by AG. The results for the local linear least squares estimation of the semiparametric model are found in Table 2. The table reports the median coefficient with respect to each variable (along with its bootstrapped standard error). One striking feature of the table is the coefficient on lot size. Our results suggest that the lot size variable (LOT) has a small, but significant, impact on the log price of a house. However, this coefficient is deceiving, since the dependent variable is the natural logarithm of price and the lot size of the property is in level form. Given that the mean values of LOT and price are 5150 square feet and 68112, respectively, the average implicit price of a square foot of a lot is ($0.00007 \times 68112 =$) 4.77 dollars. For the parametric model the average implicit price (evaluated at the mean) is ($0.303 \times (68112/5150) =$) 4.01 dollars, which is not much different from the semiparametric estimate.

One might, however, ask why the lot size variable appears in natural logarithm in the parametric model while it appears in level in the semiparametric model. One would expect (for the purpose of comparison) that LOT would be included in the unknown function in logarithmic form.¹ Further, it is not quite clear to the present authors why three of the four discrete (what we refer to as ordered categorical) variables were included in the unknown function and the fourth (GAR) was not. In fact, given that the unknown function is traditionally composed of continuous variables, we are unsure as to why any of the four discrete variables would show up inside this function in the first place.
This is especially mysterious since AG wished to compare the semiparametric model to a benchmark parametric model where three of their discrete variables were included in logarithmic form;² see their equation (9). AG did estimate several variations of their benchmark parametric model (including a log-level model), but their out-of-sample results of the semiparametric model were compared with their benchmark parametric model, which had LOT, BDMS, FB, and STY in logarithmic form.

While there are reasons to exclude shift variables from an unconstrained, unknown function (see Pagan and Ullah 1999, p. 198), we follow Rosen's (1974) suggestion and consider a fully nonlinear specification of the hedonic price equation. This setup leads us to estimate the model fully nonparametrically.

¹ We also estimated the semiparametric model in (1) with lot size measured in logs. Although the median coefficient on ln(LOT) is 0.292, the implicit price does not change significantly, nor do the coefficients on BDMS, FB, or STY.
² GAR cannot be included in logarithmic form because there are zero values for some of the observations.

3 Nonparametric Estimation

Although the techniques used in AG were cutting edge at the time the paper was written, recent advances allow us to estimate their model fully nonparametrically. Previously, in the presence of categorical regressors, authors were often forced to use semiparametric techniques (e.g., see Robinson 1988, and Stock 1989). However, Li and Racine (2004) and Racine and Li (2004) developed a method to smooth both ordered and unordered categorical data in a nonparametric kernel regression. This is especially important here because the idea is to check the robustness/appropriateness of the results from the parametric and semiparametric models.

To estimate the hedonic price function fully nonparametrically, we utilize Li-Racine Generalized Kernel Estimation. To begin, consider the nonparametric regression model

$$y_i = m(x_i) + \varepsilon_i, \quad i = 1, \ldots, n, \tag{4}$$

where $y_i$ is the dependent variable (in our case, $\ln(P)$) for observation $i$. Further, $m$ is the unknown smooth hedonic price function with argument $x_i = [x_i^c, x_i^u, x_i^o]$, where $x_i^c$ is a vector of continuous regressors (in our case, a single continuous regressor, ln(LOT)), $x_i^u$ is a vector of regressors that assume unordered discrete values (DRV, REC, FFIN, GHW, CA, REG), $x_i^o$ is a vector of regressors that assume ordered discrete values (GAR, BDMS, FB, STY), $\varepsilon_i$ is an additive error, and $n$ is the number of observations.

It is well known that estimation of the bandwidths is the most salient factor when performing nonparametric estimation. Although there exist many automatic selection methods, we utilize Hurvich, Simonoff, and Tsai's (1998) Expected Kullback-Leibler ($AIC_c$) criterion. This method chooses smoothing parameters using an improved version of a criterion based on the Akaike Information Criterion.
$AIC_c$ has been shown to perform well in small samples and avoids the tendency to undersmooth that often occurs with other approaches such as Least-Squares Cross-Validation.³

³ See Li and Racine (2004) and Racine and Li (2004) for further details on the procedure and bandwidth choice.

The results for the local linear least squares estimation of the nonparametric model can be found in Table 3.⁴ The table reports the mean coefficient with respect to each variable (along with its bootstrapped standard error), as well as the coefficients at the 25th, 50th, and 75th percentiles (labelled Q1, Q2, and Q3). The first things to notice are the mean and quartile values of the coefficients on the discrete variables. Besides the insignificant REC = 1 (which was found to be significant in both the parametric and semiparametric procedures) and BDMS = 2 (perhaps due to the fact that there were only two houses in the sample with a single bedroom), the coefficients vary significantly over the quartiles. This variation suggests that the dummy variable approach is not appropriate. In other words, assuming a constant coefficient for these variables across the entire sample is incorrect. Next, as compared to the semiparametric approach, the nonparametric approach gives smaller estimates for the unordered categorical variables.

Another benefit of the generalized kernel estimation procedure is that we can now analyze changes across ordered categorical variables without assuming a linear shift. For instance, the coefficient on GAR = 1 shows the counterfactual increase in the log price of a particular house when the number of garages is increased from zero to one, ceteris paribus. Similarly, GAR = 2 would show the counterfactual increase in the log price of a particular house when the number of garages is increased from zero to two, ceteris paribus. If the linear structure is appropriate, one would expect the coefficient on GAR ≥ 2 (this category is grouped because there are very few houses in the sample with a three-car garage) to be at least twice that of GAR = 1. This is not the case. The mean coefficient goes from 0.026 on GAR = 1 to 0.029 on GAR ≥ 2. In other words, having a one-car garage significantly increases the log price of a home, but the effect of an upgrade from a one-car to a two-car garage on the log price of a home is minimal.
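
To make the counterfactual interpretation concrete, the following Python sketch builds a product kernel of the kind proposed by Racine and Li (2004): a Gaussian kernel for a continuous regressor, a kernel equal to 1 on a match and $\lambda$ otherwise for an unordered regressor, and $\lambda^{|x - x'|}$ for an ordered regressor. Several simplifications are assumptions of the sketch rather than features of our estimation: it uses a local constant (not local linear) estimator, one regressor of each type, hand-picked bandwidths, and synthetic data with hypothetical variable names.

```python
import numpy as np

def product_kernel(xc, xu, xo, pc, pu, po, h, lam_u, lam_o):
    """Weight of every observation relative to the evaluation point (pc, pu, po)."""
    k_cont = np.exp(-0.5 * ((xc - pc) / h) ** 2)        # Gaussian, continuous regressor
    k_unord = np.where(xu == pu, 1.0, lam_u)            # 1 on a match, lam_u otherwise
    k_ord = np.power(float(lam_o), np.abs(xo - po))     # lam_o ** |category distance|
    return k_cont * k_unord * k_ord

def local_constant(y, xc, xu, xo, pc, pu, po, h, lam_u, lam_o):
    """Kernel-weighted average of y at the evaluation point."""
    w = product_kernel(xc, xu, xo, pc, pu, po, h, lam_u, lam_o)
    return float(np.sum(w * y) / np.sum(w))

# Synthetic data (assumption; this is not the AG sample).
rng = np.random.default_rng(42)
n = 400
ln_lot = rng.normal(8.5, 0.4, n)            # continuous regressor
ca = rng.integers(0, 2, n)                  # unordered dummy
gar = rng.integers(0, 3, n)                 # ordered: 0, 1 or 2 garages
gar_effect = np.array([0.0, 0.03, 0.033])   # deliberately non-linear in gar
ln_price = 7.5 + 0.4 * ln_lot + 0.15 * ca + gar_effect[gar] \
    + 0.05 * rng.standard_normal(n)

# Hold the other regressors fixed and change only the number of garages.
point = dict(pc=8.5, pu=1)
bw = dict(h=0.2, lam_u=0.05, lam_o=0.05)    # hand-picked, not data-driven
m0, m1, m2 = (
    local_constant(ln_price, ln_lot, ca, gar, po=g, **point, **bw)
    for g in (0, 1, 2)
)
effect_one_garage = m1 - m0     # counterfactual change, zero -> one garage
effect_two_garages = m2 - m0    # counterfactual change, zero -> two garages
```

Two limiting cases help in reading the smoothing parameters. With $\lambda = 0$ on the discrete regressors, only observations in the same cell receive weight, so the estimator reduces to a within-cell average; with $\lambda = 1$ the discrete regressor is smoothed out entirely and plays no role. Intermediate values borrow information from neighbouring cells, which is what allows cells with few observations to be estimated at all.
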
Finally, the coefficient on ln(LOT) is positive and significant at each quartile. Each of these results suggests that the nonparametric procedure is more appropriate for this particular data set.

⁴ All bandwidths were calculated using N©.

To pursue the idea of comparing results across models further, we compute the implicit price of lot size (which we consider to be an important variable in a home's price) for the five houses that have the median lot size of 4600 square feet. We compare two parametric models (equation (2) in the present paper, which corresponds to equation (9) in AG, and equation (2) in the present paper where the variables BDMS, FB, and STY are not logged) to the semiparametric and nonparametric models. The results are reported in Table 4. For the first house, the implicit price of lot size is almost the same across all four models. This is not the case for the other homes. For the second house (which is the most expensive in this group), the implicit price of lot size is the highest across all models. The differences in implicit price across models for the other houses are quite large. In three out of five cases, the implicit price derived from the nonparametric model is the highest. To provide a basis for comparison for these implicit prices, we also computed the average price per square foot of lot size for each of these houses. The results show that the average price per square foot is much higher than the implicit price.

Finally, we note that the parametric model is a special case of the semiparametric model. Further, the semiparametric model is a special case of the nonparametric model. When the results differ, it is often argued that the more restrictive approach is inappropriate. Thus, it might be suggested that the OLS and semiparametric results are biased because they fail to take all nonlinearities of the model into account.

4 Conclusion

In this paper we attempted to replicate the results of Anglin and Gençay (1996). Although we were able to exactly replicate their parametric results, we were unable to obtain identical results for their semiparametric procedure. Our results differed only slightly, however, and the conclusions of the model stayed the same. We therefore assume that the differences are most likely due to differences in programming software. We enhanced their findings by also estimating the unknown function and its derivatives. Further, we extended their model by using advances in the literature which allow us to smooth both continuous and categorical data. In addition to being able to smooth discrete data, our preferred model employed new techniques for bandwidth estimation.
Our results showed that the semiparametric model is too restrictive and that the use of a fully nonparametric model gives more intuitive and meaningful results.

References

[1] Anglin, P. M., and R. Gençay (1996). Semiparametric Estimation of a Hedonic Price Function, Journal of Applied Econometrics, 11, 633-48.
[2] Hurvich, C. M., J. S. Simonoff, and C.-L. Tsai (1998). Smoothing Parameter Selection in Nonparametric Regression Using an Improved Akaike Information Criterion, Journal of the Royal Statistical Society, Series B, 60, 271-93.
[3] Li, Q., and J. Racine (2004). Cross-Validated Local Linear Nonparametric Regression, Statistica Sinica, 14, 485-512.
[4] N©, Nonparametric software by Jeff Racine (http://www.economics.mcmaster.ca/racine/).
[5] Pagan, A., and A. Ullah (1999). Nonparametric Econometrics, Cambridge, Cambridge University Press.
[6] Racine, J., and Q. Li (2004). Nonparametric Estimation of Regression Functions with Both Categorical and Continuous Data, Journal of Econometrics, 119, 99-130.
[7] Robinson, P. M. (1988). Root-N-Consistent Semiparametric Regression, Econometrica, 56, 931-54.
[8] Rosen, S. (1974). Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition, Journal of Political Economy, 82, 34-55.
[9] Stock, J. (1989). Nonparametric Policy Analysis, Journal of the American Statistical Association, 84, 567-75.

Table 1
Hedonic Price Function: Comparison of Linear Coefficients from the Semiparametric Specification⁵

Variable    AG (1996)    HKP (2005)
DRV         0.147        0.117
            (0.048)      (0.028)
REC         0.078        0.077
            (0.028)      (0.027)
FFIN        0.097        0.098
            (0.023)      (0.023)
GHW         0.191        0.173
            (0.045)      (0.045)
CA          0.158        0.154
            (0.022)      (0.022)
GAR         0.064        0.056
            (0.013)      (0.012)
REG         0.124        0.120
            (0.025)      (0.025)

⁵ The natural log of price is the dependent variable in each regression. These are the estimates from AG's equation (11) and our equation (1). Both sets of estimates were calculated using the kernel function and bandwidth suggested by AG. Standard errors of the estimates are given in parentheses beneath each estimate.

Table 2
Hedonic Price Function: Local Linear Least Squares Results of Derivatives of the Unknown Function⁶

Variable    Median
LOT         0.00007
            (0.00003)
BDMS        0.0512
            (0.0451)
FB          0.1718
            (0.1441)
STY         0.0781
            (0.0997)

⁶ The natural log of price is the dependent variable in the regression. Standard errors for the local linear least squares procedure (listed in parentheses beneath each estimate) were calculated with 199 bootstrap replications using the same kernel function and bandwidth employed by AG. However, these results must be viewed with caution because the matrix of kernel values was near singular for several of the bootstrap iterations (this problem is alleviated if, e.g., we arbitrarily increase the bandwidth, which results in relatively minor changes in the coefficients).
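
The dollar figures quoted in Section 2 follow from the LOT entry above by simple arithmetic. A minimal Python sketch (the variable names are ours and purely illustrative) reproduces them from the numbers reported in the text: in a log-level specification the implicit price is the derivative times the price, whereas in a log-log specification it is the elasticity times the price-to-lot ratio.

```python
mean_price = 68112.0   # mean sale price reported in the text
mean_lot = 5150.0      # mean lot size in square feet reported in the text

median_lot_derivative = 0.00007   # Table 2: derivative of ln P with respect to LOT
loglog_lot_coefficient = 0.303    # parametric coefficient on ln(LOT) quoted in the text

# Log-level: dP/dLOT = (d ln P / d LOT) * P
semiparametric_price = median_lot_derivative * mean_price

# Log-log: dP/dLOT = (d ln P / d ln LOT) * P / LOT
parametric_price = loglog_lot_coefficient * mean_price / mean_lot

print(round(semiparametric_price, 2), round(parametric_price, 2))  # 4.77 4.01
```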

Table 3
Hedonic Price Function: Generalized Kernel Estimation⁷

Variable     Mean       Q1         Q2         Q3
DRV = 1      0.051      0.025      0.043      0.075
             (0.009)    (0.002)    (0.009)    (0.011)
REC = 1      0.000      0.000      0.000      0.000
             (0.003)    (0.003)    (0.003)    (0.003)
FFIN = 1     0.113      0.076      0.155      0.286
             (0.028)    (0.012)    (0.028)    (0.036)
GHW = 1      0.186      0.075      0.155      0.286
             (0.028)    (0.012)    (0.012)    (0.035)
CA = 1       0.142      0.104      0.138      0.174
             (0.021)    (0.020)    (0.021)    (0.034)
REG = 1      0.078      0.055      0.081      0.109
             (0.006)    (0.001)    (0.013)    (0.013)
BDMS = 2     0.014      0.023      0.016      0.012
             (0.008)    (0.011)    (0.008)    (0.008)
BDMS = 3     0.031      0.014      0.030      0.046
             (0.008)    (0.007)    (0.008)    (0.008)
BDMS = 4     0.045      0.013      0.037      0.067
             (0.005)    (0.007)    (0.008)    (0.007)
BDMS ≥ 5     0.075      0.036      0.058      0.107
             (0.013)    (0.008)    (0.018)    (0.012)
FB = 2       0.156      0.103      0.150      0.202
             (0.011)    (0.014)    (0.022)    (0.041)
FB ≥ 3       0.294      0.231      0.307      0.361
             (0.029)    (0.029)    (0.029)    (0.025)
STY = 2      0.061      0.030      0.055      0.084
             (0.004)    (0.003)    (0.004)    (0.004)
STY = 3      0.127      0.093      0.128      0.166
             (0.002)    (0.004)    (0.002)    (0.008)
STY = 4      0.197      0.167      0.185      0.250
             (0.012)    (0.008)    (0.009)    (0.008)
GAR = 1      0.026      0.005      0.024      0.046
             (0.002)    (0.003)    (0.003)    (0.003)
GAR ≥ 2      0.029      0.011      0.030      0.041
             (0.003)    (0.002)    (0.003)    (0.003)
ln(LOT)      0.404      0.320      0.390      0.473
             (0.077)    (0.069)    (0.078)    (0.069)

⁷ The natural logarithm of price is used as the dependent variable in the regression. Q1, Q2, and Q3 refer to the first, second, and third quartiles, respectively. $AIC_c$ is used for bandwidth selection. Bootstrapped standard errors (199 replications) are listed in parentheses beneath each estimate.
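
For readers unfamiliar with the bandwidth criterion named in the note above, the sketch below shows how $AIC_c$ selection works for a linear smoother $\hat{y} = Hy$. Hurvich, Simonoff, and Tsai (1998) define $AIC_c = \ln(\hat{\sigma}^2) + [1 + \mathrm{tr}(H)/n] / [1 - (\mathrm{tr}(H) + 2)/n]$, where $\hat{\sigma}^2$ is the mean squared residual. Everything else here is an assumption made for illustration: a one-dimensional Nadaraya-Watson smoother stands in for the generalized kernel estimator, the data are synthetic, and the bandwidth is chosen by a simple grid search.

```python
import numpy as np

def nw_hat_matrix(x, h):
    """Hat matrix H of a Gaussian-kernel Nadaraya-Watson smoother (fitted = H @ y)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return w / w.sum(axis=1, keepdims=True)

def aicc(y, hat):
    """Improved AIC of Hurvich, Simonoff and Tsai (1998) for a linear smoother."""
    n = len(y)
    resid = y - hat @ y
    sigma2 = np.mean(resid ** 2)
    tr = np.trace(hat)                      # effective number of parameters
    return np.log(sigma2) + (1.0 + tr / n) / (1.0 - (tr + 2.0) / n)

# Synthetic data (assumption).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(200)

grid = np.linspace(0.01, 0.5, 50)           # candidate bandwidths
scores = np.array([aicc(y, nw_hat_matrix(x, h)) for h in grid])
h_star = grid[np.argmin(scores)]            # bandwidth minimising AIC_c
```

The penalty term grows quickly as $\mathrm{tr}(H)$ approaches $n$, which is what discourages the very small bandwidths that least-squares cross-validation sometimes selects.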

Table 4
Implicit Price of Lot Size⁸

House    OLS(II)    OLS(III)    SP        NP        Price     Price/Lot
1        2.926      2.832       2.887     2.820     43000     9.348
2        8.642      8.365       11.146    15.227    127000    27.609
3        3.402      3.293       5.085     4.437     50000     10.870
4        4.083      3.952       5.266     8.266     60000     13.043
5        5.137      4.973       6.042     8.859     75500     16.413

⁸ Each house in the table has the median lot size of 4600 square feet. Each coefficient is the implicit price of lot size (e.g., $\partial P / \partial \mathrm{LOT} = \beta_{\ln(\mathrm{LOT})} \times P / \mathrm{LOT}$). OLS(II) refers to the results using the estimation procedure from Table II of AG, whereas OLS(III) refers to the results using the estimation procedure from Table III of AG. SP refers to the semiparametric results using the estimation procedure from Table 2 in our paper, while NP refers to the nonparametric results using the estimation procedure from Table 3 in our paper.
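
The relation in the note can also be read in reverse: dividing each implicit price by Price/Lot recovers the lot-size elasticity that the model assigns to that house. The short Python sketch below performs this back-calculation on the table entries. It is an illustrative check of ours with hypothetical variable names, not part of the estimation, and the recovered elasticities are only as precise as the rounded table values.

```python
lot_size = 4600.0                                    # median lot size, square feet
prices = [43000, 127000, 50000, 60000, 75500]        # Price column
ols_ii = [2.926, 8.642, 3.402, 4.083, 5.137]         # OLS(II) column
nonparametric = [2.820, 15.227, 4.437, 8.266, 8.859] # NP column

# Average price per square foot of lot (the Price/Lot column).
price_per_sqft = [p / lot_size for p in prices]

# Implied elasticity = implicit price / (P / LOT), from the relation in the note.
elasticity_ols = [ip / pps for ip, pps in zip(ols_ii, price_per_sqft)]
elasticity_np = [ip / pps for ip, pps in zip(nonparametric, price_per_sqft)]

for house, (e_ols, e_np) in enumerate(zip(elasticity_ols, elasticity_np), start=1):
    print(f"House {house}: OLS(II) {e_ols:.3f}, NP {e_np:.3f}")
```

The parametric column implies essentially the same elasticity for every house, as a constant-coefficient model must, while the nonparametric column implies elasticities that differ from house to house. This is the sense in which the nonparametric model lets the implicit price of lot size depend on the other characteristics of the home.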