A Bayesian Approach to Bandwidth Selection for Multivariate Kernel Regression with an Application to State-Price Density Estimation


Xibin Zhang, Robert D. Brooks and Maxwell L. King¹
Department of Econometrics and Business Statistics, Monash University
June 2006

Abstract: Multivariate kernel regression is an important tool for investigating the relationship between a response and a set of explanatory variables. It is generally accepted that the performance of a kernel regression estimator depends largely on the choice of bandwidth rather than on the kernel function. This nonparametric technique has been employed in a number of empirical studies, including the state-price density estimation pioneered by Aït-Sahalia and Lo (1998). However, the widespread usefulness of multivariate kernel regression has been limited by the difficulty of computing a data-driven bandwidth. In this paper, we present a Bayesian approach to bandwidth selection for multivariate kernel regression. A Markov chain Monte Carlo (MCMC) algorithm is presented to sample the bandwidth vector and the other parameters of a multivariate kernel regression model. The proposed bandwidth selection algorithm is applied to a multivariate kernel regression model that is often used to estimate the state-price density of Arrow-Debreu securities. When applying the proposed method to the DAX data, we find results similar to those reported in Aït-Sahalia and Lo (1998).

Key words: Black-Scholes formula, likelihood, Markov chain Monte Carlo, posterior density.

¹ Corresponding author. Email: max.king@buseco.monash.edu.au; telephone: +61-3-99052449; address: Monash University, Victoria 3800, Australia.

1 Introduction

The multivariate kernel regression technique helps investigate the relationship between a response and a set of explanatory variables without imposing any parametric assumption on the form of that relationship. Stanton (1997) indicated that one potentially serious problem with any parametric model, particularly when there is no economic reason to prefer one functional form over another, is misspecification, a problem further addressed by Backus, Foresi and Zin (1995), who showed that misspecification of interest rate models can lead to serious pricing and hedging errors². A proper use of relevant nonparametric techniques often helps avoid the misspecification problems of most parametric models.

² Stanton (1997) also indicated that existing parametric models of the interest rate do not even fit historical data well. Aït-Sahalia (1996) compared the marginal density implied by each parametric model with that estimated directly from the same data, and found that every parametric model of the spot rate previously proposed in the literature was rejected.

In empirical studies, the multivariate kernel regression technique can be employed to avoid having to specify a functional form for the relationship between a response and explanatory variables, which we denote as $y$ and $x = (x_1, x_2, \ldots, x_d)$, respectively. Given observations $(y_i, x_i)$, for $i = 1, 2, \ldots, n$, the multivariate kernel regression model is expressed as

$$y_i = m(x_i) + \varepsilon_i, \qquad (1)$$

where $\varepsilon_i$, for $i = 1, 2, \ldots, n$, are assumed to be independent and identically distributed with mean zero and variance $\gamma^2$. The Nadaraya-Watson estimator of $m(\cdot)$ is given by

$$\hat{m}(x, h) = \frac{n^{-1}\sum_{i=1}^{n} K_h(x - x_i)\, y_i}{n^{-1}\sum_{i=1}^{n} K_h(x - x_i)}, \qquad (2)$$

where $h = (h_1, h_2, \ldots, h_d)$ is a vector of bandwidths with all elements positive, and

$$K_h(x) = \frac{1}{h_1 h_2 \cdots h_d}\, K\!\left(\frac{x_1}{h_1}, \frac{x_2}{h_2}, \ldots, \frac{x_d}{h_d}\right),$$

with $K(\cdot)$ denoting a multivariate kernel function. This nonparametric estimation method is popular in empirical finance, having been used by Aït-Sahalia and Lo (1998, 2000), Broadie, Detemple, Ghysels and Torres (2000), Aït-Sahalia, Bickel and Stoker (2001), Breitung and Wulff (2001), Mancuso, Goodwin and Grennes (2003) and Fernandes (2006), among others. As revealed by (2), the multivariate kernel regression estimator is a weighted average of the observed values of $y$. Herrmann (2000) indicated that both the region over which this local average is taken and the amount of smoothness of the regression estimator are governed by the bandwidth, and that the performance of kernel regression estimators depends largely on the choice of bandwidth rather than on the kernel function. Multivariate kernel regression is an important technique for investigating the relationship between a response and covariates and has a number of important applications (see, for example, Donald, 1997; Stanton, 1997; Aït-Sahalia and Lo, 1998; Boudoukh, Whitelaw, Richardson and Stanton, 1997). However, its widespread usefulness has been limited by the difficulty of computing a data-driven bandwidth. We remedy this deficiency in this paper. According to Härdle and Müller (2000), the methods employed for choosing a bandwidth in kernel regression are basically the same as those employed in kernel density estimation. A large body of literature exists on bandwidth selection for univariate kernel density estimation (see Marron, 1987; Scott, 1992; Wand and Jones, 1995; Jones, Marron and Sheather, 1996, for surveys). However, the literature on bandwidth selection for multivariate kernel density estimation is quite limited. Sain, Baggerly and Scott (1994) employed the biased cross-validation method to estimate the bandwidths for bivariate kernel density estimation. Wand and Jones (1994) and Duong and Hazelton (2003) presented plug-in algorithms for choosing the bandwidth for bivariate data.
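As (2) shows, the Nadaraya-Watson estimator is a kernel-weighted average of the observed $y_i$. The following is a minimal NumPy sketch of (2). It assumes a product Gaussian kernel for $K(\cdot)$, which is one common choice rather than one prescribed by the text, and the function name `nadaraya_watson` is hypothetical.

```python
import numpy as np


def nadaraya_watson(x_eval, X, y, h):
    """Nadaraya-Watson estimate m_hat(x, h) of (2) at each row of x_eval.

    Assumes a product Gaussian kernel K(u) = prod_j phi(u_j).
    x_eval: (m, d) evaluation points; X: (n, d) regressors;
    y: (n,) responses; h: (d,) positive bandwidths.
    """
    x_eval = np.atleast_2d(np.asarray(x_eval, dtype=float))
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # a 1-D array is a single regressor
    y = np.asarray(y, dtype=float)
    h = np.asarray(h, dtype=float)
    # Scaled differences (x_j - x_ij) / h_j, shape (m, n, d)
    u = (x_eval[:, None, :] - X[None, :, :]) / h
    # Kernel weights; the constants 1/(h_1...h_d) and (2*pi)^(-d/2)
    # cancel between numerator and denominator of (2), so they are dropped.
    w = np.exp(-0.5 * np.sum(u ** 2, axis=2))
    return (w @ y) / w.sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-2.0, 2.0, size=(500, 2))
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.1, size=500)
    # Estimate of m at the origin, where the true value is 0
    print(nadaraya_watson([[0.0, 0.0]], X, y, h=[0.2, 0.2]))
```

Shrinking `h` toward zero makes each estimate collapse onto the nearest observation, which illustrates why the bandwidth, not the kernel, dominates the behaviour of the estimator.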
However, the above-mentioned biased cross-validation method and plug-in algorithms cannot be directly extended to kernel density estimation with more than two variables (see, for example, Zhang, King and Hyndman, 2006). Hence there is little guidance in the literature on how to derive a data-driven bandwidth vector for multivariate kernel regression with more than two regressors, which is an important issue in empirical studies. Fan and Gijbels (2000) presented a survey of bandwidth selection for univariate local polynomial fitting, which includes the Nadaraya-Watson estimator as a special case. They discussed two bandwidth selectors, namely the rule-of-thumb and the plug-in bandwidth selectors; the former is basically the same as the rule-of-thumb bandwidth selector for kernel density estimation documented in Scott (1992) and Bowman and Azzalini (1997)³. However, the plug-in bandwidth selector cannot be directly extended to kernel regression with more than two regressors. The rule-of-thumb bandwidth selector is applicable to multivariate kernel regression when the data are drawn from a multivariate normal density and the kernel function is the standard normal density. It is a rather crude selector: it is often used in practice in the absence of any other practical bandwidth selector, even though most interesting data are non-Gaussian. Härdle and Müller (2000) discussed bandwidth selection for multivariate kernel regression and showed that, in practice, the cross-validation method is often employed to choose a data-driven bandwidth for kernel regression. This bandwidth selection method requires a numerical optimization procedure, which becomes increasingly difficult to implement as the number of regressors increases. Zhang, King and Hyndman (2006) presented a Bayesian approach to bandwidth selection for multivariate kernel density estimation, in which the bandwidths are treated as parameters whose posterior is derived via the Kullback-Leibler information.
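For comparison with the data-driven approach developed below, the rule-of-thumb selector can be sketched as follows. This assumes the widely used normal-reference form for a Gaussian product kernel, $h_j = \{4/(d+2)\}^{1/(d+4)}\, s_j\, n^{-1/(d+4)}$, with $s_j$ the sample standard deviation of the $j$th regressor; the exact constant varies across references, and the function name is hypothetical.

```python
import numpy as np


def normal_reference_bandwidth(X):
    """Rule-of-thumb bandwidth vector under a normal-reference assumption.

    h_j = (4 / (d + 2)) ** (1 / (d + 4)) * s_j * n ** (-1 / (d + 4)),
    which is appropriate only if the data are roughly multivariate normal
    and K is a Gaussian product kernel.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                    # treat a 1-D array as one regressor
    n, d = X.shape
    s = X.std(axis=0, ddof=1)             # sample standard deviation per column
    return (4.0 / (d + 2)) ** (1.0 / (d + 4)) * s * n ** (-1.0 / (d + 4))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 3)) * np.array([1.0, 2.0, 0.5])
    print(normal_reference_bandwidth(X))  # bandwidths scale with each column's spread
```

For $d = 1$ the constant is $(4/3)^{1/5} \approx 1.06$, the familiar univariate rule of thumb. The selector uses only the spread of each regressor and ignores the shape of $m(\cdot)$, which is why the text calls it crude.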
In the context of choosing a data-driven bandwidth for multivariate kernel regression, we can also treat $h$ as a vector of parameters, whose posterior density can be obtained through the cross-validation method given a known distribution of the errors in (1). A posterior estimate of $h$ can then be derived via a Markov chain Monte Carlo (MCMC) algorithm. One important advantage of the MCMC technique for deriving a data-driven bandwidth is that it is applicable to data with any number of regressors. Moreover, the sampling algorithm becomes no more difficult to implement as the number of regressors increases.

³ Herrmann, Wand, Engel and Gasser (1995) provided a detailed discussion of the bivariate plug-in bandwidth selector.

The empirical finance literature is characterized by a number of problems that start with the state-price density (SPD), or pricing kernel, implicit in the prices of traded financial assets. Major applications of this approach have focused on option pricing (see, for example, Aït-Sahalia and Lo, 1998; Broadie, Detemple, Ghysels and Torres, 2000; Aït-Sahalia and Duarte, 2003), value-at-risk estimation (see, for example, Aït-Sahalia and Lo, 2000), modelling financial crashes (see, for example, Fernandes, 2006), modelling exchange rate dynamics (see, for example, Brandt and Santa-Clara, 2002; Inci and Lu, 2004), portfolio performance measurement (see, for example, Ayadi and Kryzanowski, 2005) and the term structure of interest rates (see, for example, Hong and Li, 2005). Yatchew and Härdle (2006) indicated that, in general, economic theory does not propose specific functional forms for state-price densities. As such, they proposed a nonparametric solution based on constrained least squares and a bootstrap procedure. One important application of multivariate kernel regression is the one pioneered by Aït-Sahalia and Lo (1998), who employed this nonparametric technique to estimate the SPD of Arrow-Debreu securities, which are known as the fundamental building blocks for analyzing economic equilibrium under uncertainty.
Aït-Sahalia and Lo (1998) showed that in a dynamic equilibrium model, the price of a security is given by

$$P_t = e^{-r_{t,\tau}\tau}\, E_t\left[Z(S_T)\right] = e^{-r_{t,\tau}\tau} \int_0^\infty Z(S_T)\, f_t(S_T)\, dS_T,$$

where $T = t + \tau$, $E_t$ represents the conditional expectation, taken with respect to $f_t$, given the information available at date $t$, $Z(S_T)$ is the payoff of the security at date $T$, $r_{t,\tau}$ is a constant risk-free interest rate between $t$ and $T$, and $f_t(S_T)$ is the date-$t$ SPD for the payoff of the security at date $T$. Aït-Sahalia and Lo (1998) argued that the SPD summarizes all relevant information for the purpose of pricing the underlying security. When the underlying security is an option, Aït-Sahalia and Lo (1998) indicated that the SPD is proportional to the second-order derivative of a call option-pricing formula with respect to the strike price, evaluated at $S_T$, and that the option-pricing formula can be estimated using the multivariate kernel regression technique. Aït-Sahalia and Lo (1998) argued that the price of a call option, denoted by $H$, is a nonlinear function of unknown form of $(S_t, X, \tau, r_{t,\tau}, \delta_{t,\tau})$, where $X$ is the strike price and $\delta_{t,\tau}$ represents the dividend rate at date $t$. Once this nonlinear relationship is estimated through multivariate kernel regression, there is no further difficulty in deriving the second-order derivative of $H$ with respect to $X$. This nonparametric approach to SPD estimation pioneered by Aït-Sahalia and Lo (1998) has been followed by a large number of empirical studies, including Huynh, Kervella and Zheng (2002), who presented two methods for dimension reduction. Aït-Sahalia and Lo (1998) presented comprehensive simulation and empirical studies to illustrate the effectiveness of the multivariate kernel regression technique in estimating SPDs. However, it appears that their bandwidth vectors were chosen subjectively. As the choice of bandwidth plays an important role in multivariate kernel regression, we remedy this problem using a modification of Zhang, King and Hyndman's (2006) algorithm for choosing data-driven bandwidths. This paper aims to investigate the problem of choosing a data-driven bandwidth vector for multivariate kernel regression, where the bandwidth vector is treated as a vector of parameters.
An algorithm will be presented to sample parameters from their posterior according to the Metropolis-Hastings rule, and the estimated bandwidth vector is asymptotically optimal with respect to the average squared error (ASE) criterion, which is discussed further in the next section. This algorithm represents the first data-driven bandwidth selection method for multivariate kernel regression with more than two regressors.
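The link between call prices and the SPD used above can be made explicit with a short derivation, added here for clarity and assuming the call price is twice differentiable in the strike. For a call with payoff $Z(S_T) = \max(S_T - X, 0)$, the pricing equation gives

$$H = e^{-r_{t,\tau}\tau}\int_X^\infty (S_T - X)\, f_t(S_T)\, dS_T.$$

Differentiating with respect to $X$ (the boundary term vanishes because the integrand is zero at $S_T = X$) gives

$$\frac{\partial H}{\partial X} = -e^{-r_{t,\tau}\tau}\int_X^\infty f_t(S_T)\, dS_T, \qquad \frac{\partial^2 H}{\partial X^2} = e^{-r_{t,\tau}\tau}\, f_t(X),$$

so that

$$f_t(S_T) = e^{r_{t,\tau}\tau}\left.\frac{\partial^2 H}{\partial X^2}\right|_{X = S_T}.$$

The discount factor $e^{r_{t,\tau}\tau}$ is why the SPD is proportional to, rather than equal to, the second derivative of the call price with respect to the strike.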

The rest of the paper is organized as follows. Section 2 provides a Bayesian approach to bandwidth selection for multivariate kernel regression models. In Section 3, we present a brief description of the nonparametric state-price density estimation method presented by Aït-Sahalia and Lo (1998). In addition, we show how the Bayesian bandwidth selection technique can be applied to the nonparametric estimation of volatility. Section 4 provides an application of the Bayesian bandwidth selection technique to volatility estimation and state-price density estimation with the DAX data. Concluding remarks are given in the last section.

2 Bayesian Bandwidth Selector

2.1 Cross-validation

As discussed in the previous section, the most important issue in multivariate kernel regression is the selection of the optimal bandwidth under a chosen criterion. One such criterion is the ASE, given by

$$\mathrm{ASE}(h) = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{m}(x_i, h) - m(x_i)\right]^2, \qquad (3)$$

and an optimal bandwidth, denoted by $\hat{h}_o$, is the one that minimizes $\mathrm{ASE}(h)$. The goodness of fit of the Nadaraya-Watson estimator $\hat{m}(x_i, h)$ can be assessed by the (averaged) sum of squared residuals

$$\mathrm{SSE}(h) = \frac{1}{n}\sum_{i=1}^{n}\left[y_i - \hat{m}(x_i, h)\right]^2, \qquad (4)$$

which Härdle and Müller (2000) refer to as the re-substitution estimate of ASE. $\mathrm{SSE}(h)$ can be made arbitrarily small by letting $h \to 0$, because $y_i$ is used in $\hat{m}(x_i, h)$ to predict itself. The cross-validation method instead estimates $m(x)$ using the data with the $i$th observation deleted, and the resulting leave-one-out estimator is

$$\hat{m}_{-i}(x_i, h) = \frac{n^{-1}\sum_{j \neq i} K_h(x_i - x_j)\, y_j}{n^{-1}\sum_{j \neq i} K_h(x_i - x_j)}, \qquad (5)$$

for $i = 1, 2, \ldots, n$. An optimal bandwidth under the cross-validation rule is the one that minimizes

$$\mathrm{CV}(h) = \sum_{i=1}^{n}\left[y_i - \hat{m}_{-i}(x_i, h)\right]^2. \qquad (6)$$

Let $\hat{h}_{cv}$ denote the bandwidth obtained through cross-validation. Härdle, Hall and Marron (1988) showed that $\mathrm{ASE}(\hat{h}_o)/\mathrm{ASE}(\hat{h}_{cv}) \to 1$ and $\hat{h}_{cv}/\hat{h}_o \to 1$ as $n \to \infty$, where the convergence is in probability. Hence $\hat{h}_{cv}$ is asymptotically optimal with respect to the ASE criterion. In general, minimizing $\mathrm{CV}(h)$ with respect to $h$ requires a numerical optimization procedure, which becomes increasingly difficult as the dimension of $x$ increases. However, if we treat $h$ as a vector of parameters, choosing a bandwidth for multivariate kernel regression is equivalent to estimating parameters from the available data. When the errors in (1) are assumed to follow a known distribution, the likelihood of $(y_1, y_2, \ldots, y_n)$ given $h$ can be derived through the cross-validation method, and the posterior density of $h$ can then be obtained accordingly.

2.2 Likelihood

Consider the multivariate kernel regression model given by (1), where we assume that $\varepsilon_i$, for $i = 1, 2, \ldots, n$, are independent and identically distributed (iid) as $N(0, \gamma^2)$, with $\gamma^2$ an unknown parameter⁴. It follows that $[y_i - m(x_i)]/\gamma$ are iid $N(0, 1)$, for $i = 1, 2, \ldots, n$. Treating $(h, \gamma^2)$ as a vector of parameters, we wish to derive the distribution of $y_i$, for $i = 1, 2, \ldots, n$. For this purpose, we assume that

$$y_i \overset{\text{iid}}{\sim} N\left(\hat{m}(x_i, h), \gamma^2\right), \qquad (7)$$

for $i = 1, 2, \ldots, n$. According to the discussion of convergence in Härdle, Hall and Marron (1988), $\hat{m}(x_i, h)$ and $\hat{m}_{-i}(x_i, h)$ are asymptotically equivalent when $n$ is large. Hence we can assume that $[y_i - \hat{m}_{-i}(x_i, h)]/\gamma$ approximately follows $N(0, 1)$, for $i = 1, 2, \ldots, n$. Given $(h, \gamma^2)$, the likelihood of $(y_1, y_2, \ldots, y_n)$ is

$$l(y_1, y_2, \ldots, y_n \mid h, \gamma^2) = (2\pi\gamma^2)^{-n/2} \exp\left\{-\frac{\sum_{i=1}^{n}\left[y_i - \hat{m}_{-i}(x_i, h)\right]^2}{2\gamma^2}\right\}. \qquad (8)$$

⁴ It should be noted that the distribution of the errors in (1) is not restricted to iid normality. The errors can be assumed to follow any known distribution, or to be correlated, as long as the likelihood can be derived. However, the focus of this paper is on estimating the bandwidth rather than on selecting an appropriate assumption for the errors; we will investigate this issue elsewhere.

2.3 Posterior Estimate of the Bandwidth Vector

Let $\pi(h)$ and $\pi(\gamma^2)$ denote the prior densities of $h$ and $\gamma^2$, respectively. According to Bayes' theorem, the posterior of $(h, \gamma^2)$ is

$$\pi(h, \gamma^2 \mid y_1, y_2, \ldots, y_n) \propto \pi(h)\, \pi(\gamma^2)\, l(y_1, y_2, \ldots, y_n \mid h, \gamma^2). \qquad (9)$$

Assume that the prior density of $\gamma^2$ is an inverted Gamma density, denoted as $IG(p/2, \nu/2)$, with density function

$$\pi(\gamma^2) \propto \left(\frac{1}{\gamma^2}\right)^{p/2+1} \exp\left\{-\frac{\nu}{2\gamma^2}\right\},$$

where $p$ and $\nu$ are hyperparameters. The prior density of $h_j$ is assumed to be

$$\pi(h_j) \propto \frac{1}{\exp\{\lambda_2 h_j^{2}\}}\exp\{-\lambda_1 h_j^{-2}\}, \qquad (10)$$

for $j = 1, 2, \ldots, d$, where $\lambda_1$ and $\lambda_2$ are positive hyperparameters controlling the shape of the prior⁵. Then the posterior of $(h, \gamma^2)$ becomes

$$\pi(h, \gamma^2 \mid y_1, y_2, \ldots, y_n) \propto \left(\frac{1}{\gamma^2}\right)^{(n+p)/2+1} \exp\left\{-\frac{\sum_{i=1}^{n}\left[y_i - \hat{m}_{-i}(x_i, h)\right]^2 + \nu}{2\gamma^2}\right\} \prod_{j=1}^{d}\pi(h_j).$$

After integrating out $\gamma^2$, we obtain the posterior of $h$:

$$\pi(h \mid y_1, y_2, \ldots, y_n) = \int \pi(h, \gamma^2 \mid y_1, y_2, \ldots, y_n)\, d\gamma^2 \qquad (11)$$

$$\propto \prod_{j=1}^{d}\pi(h_j)\left\{\frac{1}{2}\sum_{i=1}^{n}\left[y_i - \hat{m}_{-i}(x_i, h)\right]^2 + \frac{\nu}{2}\right\}^{-(n+p)/2}. \qquad (12)$$

⁵ $\lambda_1$ and $\lambda_2$ can be equal. When $h_j$ tends to infinity, the prior tends to zero; when $h_j$ tends to zero, the prior also tends to zero.
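Putting (5), (8), (9) and (13) together, a minimal sampler for $h$ might look like the sketch below; it alternates a random-walk Metropolis-Hastings update of $h$, targeting (9) with $\gamma^2$ held fixed, with a direct inverse-Gamma draw of $\gamma^2$, as the following paragraph describes. It assumes a product Gaussian kernel, and it assumes prior (10) has the form $\exp\{-\lambda_1 h_j^{-2} - \lambda_2 h_j^{2}\}$, which tends to zero at both ends as footnote 5 requires. The function names and the proposal scale `step` are hypothetical; the default hyperparameters are the values used in Section 4.1.

```python
import numpy as np


def loo_ss(h, X, y):
    """sum_i [y_i - m_hat_{-i}(x_i, h)]^2 from (5)-(6), product Gaussian kernel."""
    u = (X[:, None, :] - X[None, :, :]) / h
    w = np.exp(-0.5 * np.sum(u ** 2, axis=2))
    np.fill_diagonal(w, 0.0)                   # delete the i-th observation
    denom = w.sum(axis=1)
    if np.any(denom == 0.0):                   # h so small that (5) is undefined
        return np.inf
    return float(np.sum((y - (w @ y) / denom) ** 2))


def log_prior_h(h, lam1, lam2):
    """Log of prior (10), ASSUMED here to be exp(-lam1/h_j^2 - lam2*h_j^2)
    per component, which tends to zero as h_j -> 0 and as h_j -> infinity."""
    return float(np.sum(-lam1 / h ** 2 - lam2 * h ** 2))


def sample_bandwidth(X, y, h0, n_iter=2000, step=0.05, lam1=0.5, lam2=0.5,
                     p=2.0, nu=0.1, seed=0):
    """Random-walk MH for h given gamma^2 (target (9)), alternated with a
    direct inverse-Gamma draw of gamma^2 from (13).
    Returns draws (columns h_1..h_d, gamma^2) and the MH acceptance rate."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n = len(y)
    h = np.asarray(h0, dtype=float).copy()
    ss = loo_ss(h, X, y)
    if not np.isfinite(ss):
        raise ValueError("initial bandwidth h0 is too small")
    gamma2 = ss / n                            # starting value for gamma^2
    draws, accepted = [], 0
    for _ in range(n_iter):
        # log target in h with gamma^2 fixed: log prior + log likelihood (8)
        log_post = log_prior_h(h, lam1, lam2) - ss / (2.0 * gamma2)
        h_new = h + step * rng.standard_normal(h.shape)
        if np.all(h_new > 0.0):                # the posterior is zero for h_j <= 0
            ss_new = loo_ss(h_new, X, y)
            log_post_new = log_prior_h(h_new, lam1, lam2) - ss_new / (2.0 * gamma2)
            if np.log(rng.uniform()) < log_post_new - log_post:
                h, ss = h_new, ss_new
                accepted += 1
        # gamma^2 | h, y ~ IG((n + p)/2, (ss + nu)/2): invert a Gamma draw
        gamma2 = 1.0 / rng.gamma((n + p) / 2.0, 2.0 / (ss + nu))
        draws.append(np.append(h, gamma2))
    return np.array(draws), accepted / n_iter


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.uniform(0.0, 1.0, size=(200, 2))
    y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] + rng.normal(0.0, 0.2, size=200)
    draws, acc = sample_bandwidth(X, y, h0=[0.2, 0.2], n_iter=1000)
    # Ergodic average of h after discarding a burn-in period
    print("posterior mean of h:", draws[200:, :-1].mean(axis=0), "acceptance:", acc)
```

Each iteration costs $O(n^2 d)$ for the leave-one-out fit, but nothing in the loop depends on $d$ beyond the array width, which is the practical advantage over numerically minimizing (6).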

The random-walk Metropolis-Hastings algorithm can be employed to sample $h$, with the acceptance probability computed through (9), while $\gamma^2$ can be sampled directly from

$$\gamma^2 \sim IG\left(\frac{n+p}{2},\; \frac{1}{2}\sum_{i=1}^{n}\left[y_i - \hat{m}_{-i}(x_i, h)\right]^2 + \frac{\nu}{2}\right). \qquad (13)$$

The ergodic average, or posterior mean, of $h$ serves as an estimator of the bandwidth. The likelihood function given by (8) is flat when the components of $h$ are large. If we used uniform priors for the components of $h$ and employed the random-walk Metropolis-Hastings algorithm to sample $h$, an update of $h$ would often have a negligible effect on the acceptance probability once the components of $h$ are already large. The purpose of the priors (10) is to put low prior probability on this problematic region of the parameter space, where the likelihood is flat. See, for example, Zhang, King and Hyndman (2006) for a detailed discussion.

3 Nonparametric State-Price Density Estimation

3.1 SPD Estimator Derived via Black-Scholes Formula

Aït-Sahalia and Lo (1998) presented a summary of the relationship between the pricing of derivative securities and their SPDs. Let $S_t$ represent the date-$t$ price of an asset, $p_t$ the date-$t$ price of a derivative security written on the asset, and $Z(S_T)$ the date-$T$ payoff of the security. In a dynamic equilibrium model, $p_t$ can be expressed as the expected net present value of $Z(S_T)$, where the expectation is computed with respect to the state-price density $f_t(S_T)$. According to Aït-Sahalia and Lo's (1998) discussion, the SPD is sufficient for the purpose of asset pricing. When the derivative security is an option, Aït-Sahalia and Lo (1998) indicated that the SPD is proportional to the second-order derivative of a call option-pricing formula with respect to the strike price. Under the hypotheses of Black and Scholes (1973) and Merton (1973), the date-$t$ price of a call option maturing at date $T$ is given by

$$H_{BS}(S_t, X, \tau, r_{t,\tau}, \delta_{t,\tau}; \sigma) = S_t\, e^{-\delta_{t,\tau}\tau}\, \Phi(b_1) - X\, e^{-r_{t,\tau}\tau}\, \Phi(b_2),$$

where $X$ is the strike price, $\sigma$ is the volatility of the underlying asset, $\tau = T - t$, $\Phi(\cdot)$ is the Gaussian cumulative distribution function, and $b_1$ and $b_2$ are defined as

$$b_1 = \frac{\ln(S_t/X) + (r_{t,\tau} - \delta_{t,\tau} + \sigma^2/2)\,\tau}{\sigma\sqrt{\tau}}, \qquad b_2 = b_1 - \sigma\sqrt{\tau}.$$

Huynh, Kervella and Zheng (2002) presented the formula for the SPD, as well as those for the risk measures delta ($\Delta$) and gamma ($\Gamma$):

$$f_{BS,t}(S_T) = \frac{1}{S_T\sqrt{2\pi\sigma^2\tau}} \exp\left\{-\frac{\left[\ln(S_T/S_t) - (r_{t,\tau} - \delta_{t,\tau} - \sigma^2/2)\,\tau\right]^2}{2\sigma^2\tau}\right\}, \qquad (14)$$

$$\Delta_{BS} = \frac{\partial H_{BS}(S_t, X, \tau, r_{t,\tau}, \delta_{t,\tau}; \sigma)}{\partial S_t} = \Phi(b_1), \qquad (15)$$

$$\Gamma_{BS} = \frac{\partial^2 H_{BS}(S_t, X, \tau, r_{t,\tau}, \delta_{t,\tau}; \sigma)}{\partial S_t^2} = \frac{\phi(b_1)}{S_t\,\sigma\sqrt{\tau}}, \qquad (16)$$

where $\phi(\cdot)$ is the Gaussian density function.

3.2 Nonparametric Estimation of Option-Pricing Formula

Aït-Sahalia and Lo (1998) argued that the SPD estimator given by (14) relies on the parametric assumptions underlying the Black-Scholes option-pricing model. If any of these assumptions does not hold, option prices derived through (14) might be incorrect. Aït-Sahalia and Lo (1998) showed that the date-$t$ price of a call option, denoted by $H$, can be viewed as an unknown nonlinear function of $z = (S_t, X, \tau, r_{t,\tau}, \delta_{t,\tau})$, which can be estimated through multivariate kernel regression. The Nadaraya-Watson estimator of the relation between $H$ and $z$ is given by

$$\hat{H}(z \mid h) = \frac{n^{-1}\sum_{i=1}^{n} K_h(z - z_i)\, H_i}{n^{-1}\sum_{i=1}^{n} K_h(z - z_i)}, \qquad (17)$$

where $(H_i, z_i)$, for $i = 1, 2, \ldots, n$, are paired observations of $(H, z)$.
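To make (14) to (16) concrete, the sketch below codes the Black-Scholes call price, $\Delta$, $\Gamma$ and the lognormal SPD using only Python's standard library, and checks numerically that $e^{r_{t,\tau}\tau}\,\partial^2 H_{BS}/\partial X^2$ evaluated at $X = S_T$ reproduces (14). Function names and the finite-difference step `dX` are illustrative assumptions. Note that differentiating $H_{BS}$ with a nonzero dividend rate gives $\Delta$ and $\Gamma$ an extra factor $e^{-\delta_{t,\tau}\tau}$; the code includes it, so it agrees with (15) and (16) when $\delta_{t,\tau} = 0$.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def b1_b2(S, X, tau, r, delta, sigma):
    b1 = (math.log(S / X) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return b1, b1 - sigma * math.sqrt(tau)


def bs_call(S, X, tau, r, delta, sigma):
    """Black-Scholes call price H_BS with continuous dividend rate delta."""
    b1, b2 = b1_b2(S, X, tau, r, delta, sigma)
    return S * math.exp(-delta * tau) * norm_cdf(b1) - X * math.exp(-r * tau) * norm_cdf(b2)


def bs_delta_gamma(S, X, tau, r, delta, sigma):
    """Delta and Gamma of H_BS; the factor exp(-delta * tau) equals 1 when
    delta = 0, in which case these reduce to (15) and (16)."""
    b1, _ = b1_b2(S, X, tau, r, delta, sigma)
    q = math.exp(-delta * tau)
    return q * norm_cdf(b1), q * norm_pdf(b1) / (S * sigma * math.sqrt(tau))


def bs_spd(ST, S, tau, r, delta, sigma):
    """Lognormal state-price density f_BS,t(S_T) of (14)."""
    z = math.log(ST / S) - (r - delta - 0.5 * sigma ** 2) * tau
    return math.exp(-z * z / (2.0 * sigma ** 2 * tau)) / (
        ST * math.sqrt(2.0 * math.pi * sigma ** 2 * tau))


def spd_from_calls(ST, S, tau, r, delta, sigma, dX=0.01):
    """exp(r * tau) * d^2 H / dX^2 at X = S_T, by a central second difference."""
    H = lambda X: bs_call(S, X, tau, r, delta, sigma)
    d2H = (H(ST + dX) - 2.0 * H(ST) + H(ST - dX)) / dX ** 2
    return math.exp(r * tau) * d2H


if __name__ == "__main__":
    args = dict(S=100.0, tau=0.25, r=0.03, delta=0.0, sigma=0.2)
    for ST in (90.0, 100.0, 110.0):
        # The two columns should agree to several significant digits
        print(ST, bs_spd(ST, **args), spd_from_calls(ST, **args))
```

The same second-difference recipe applies when $H$ comes from a kernel estimate such as (17), which is why the accuracy of the estimated price function, and hence the bandwidth, matters for the SPD.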

As discussed in previous sections, choosing a data-driven bandwidth under a chosen criterion is an important issue for multivariate kernel regression. Aït-Sahalia and Lo (1998) presented a graphical demonstration to illustrate the importance of bandwidth selection. They suggested choosing bandwidths according to the formula

$$h_j = c_j\, s(z_j)\, n^{-1/(d+2p)}, \qquad (18)$$

for $j = 1, 2, \ldots, d$, where $p$ is the order of the kernel function, $s(z_j)$ is the unconditional standard deviation of $z_j$, and $c_j$ is a constant depending on the sample size. This bandwidth selector is similar to the rule-of-thumb selector and appears to be somewhat subjective. The Bayesian approach to bandwidth selection discussed in Section 2 can instead be employed to derive a data-driven bandwidth. Aït-Sahalia and Lo (1998) raised a practical concern about the dimension of the multivariate kernel regression given by (17), because it becomes increasingly difficult to derive accurate estimators of the regression function and its derivatives as the number of regressors increases. They presented three methods to reduce the number of regressors; one of these assumes that the call-option pricing formula is given by the Black-Scholes formula, except that the date-$t$ implied volatility, denoted by $\sigma_t$, is a nonparametric function of $\tilde{z} = (F_t, X, \tau)$, where $F_t$ is the date-$t$ futures price of the underlying asset. The kernel estimator of the regression function of $\sigma$ on $\tilde{z}$ is given by

$$\hat{\sigma}(F_t, X, \tau \mid \tilde{h}) = \frac{n^{-1}\sum_{i=1}^{n} K_{\tilde{h}}(\tilde{z} - \tilde{z}_i)\, \sigma_i}{n^{-1}\sum_{i=1}^{n} K_{\tilde{h}}(\tilde{z} - \tilde{z}_i)}, \qquad (19)$$

where $\sigma_i$ is the volatility implied by the price $H_i$, and $\tilde{h}$ is a vector of bandwidths. The call-option pricing function is then given by

$$\hat{H}(S_t, X, \tau, r_{t,\tau}, \delta_{t,\tau}) = H_{BS}\left(S_t, X, \tau, r_{t,\tau}, \delta_{t,\tau}; \hat{\sigma}(F_t, X, \tau \mid \tilde{h})\right), \qquad (20)$$

from which the option's $\Delta$, $\Gamma$ and SPD estimators can be obtained by substituting $\hat{\sigma}(F_t, X, \tau \mid \tilde{h})$ into (14) to (16), respectively.
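The dimension-reduction route in (19) and (20) amounts to smoothing implied volatilities over $(F_t, X, \tau)$ and feeding the result into the Black-Scholes formula. A minimal sketch follows, assuming a product Gaussian kernel and using hypothetical function names; the bandwidth `h` would come from a selector such as (18) or the Bayesian sampler of Section 2, and the synthetic smile in the demo is made up purely for illustration.

```python
import math

import numpy as np


def kernel_implied_vol(F, X, tau, Z, sigma_obs, h):
    """sigma_hat(F_t, X, tau | h) of (19), product Gaussian kernel (assumed K).
    Z: (n, 3) observed (F_i, X_i, tau_i); sigma_obs: (n,) implied volatilities."""
    u = (np.array([F, X, tau]) - np.asarray(Z, dtype=float)) / np.asarray(h, dtype=float)
    w = np.exp(-0.5 * np.sum(u ** 2, axis=1))
    return float(w @ np.asarray(sigma_obs, dtype=float) / w.sum())


def bs_call(S, X, tau, r, delta, sigma):
    """Black-Scholes call price H_BS with continuous dividend rate delta."""
    b1 = (math.log(S / X) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    b2 = b1 - sigma * math.sqrt(tau)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * math.exp(-delta * tau) * cdf(b1) - X * math.exp(-r * tau) * cdf(b2)


def semiparametric_call(S, X, tau, r, delta, F, Z, sigma_obs, h):
    """Call price (20): Black-Scholes evaluated at the kernel-estimated volatility."""
    return bs_call(S, X, tau, r, delta, kernel_implied_vol(F, X, tau, Z, sigma_obs, h))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 300
    Z = np.column_stack([np.full(n, 100.0),
                         rng.uniform(80.0, 120.0, n),
                         rng.uniform(0.05, 0.5, n)])
    # Synthetic smile: implied volatility rises with |moneyness| (illustrative only)
    sigma_obs = 0.2 + 0.5 * (Z[:, 1] / Z[:, 0] - 1.0) ** 2
    print(semiparametric_call(100.0, 95.0, 0.25, 0.03, 0.0, 100.0, Z, sigma_obs,
                              h=[1.0, 4.0, 0.1]))
```

Because only three regressors are smoothed instead of the five in (17), the kernel estimate is less affected by the dimensionality concern raised by Aït-Sahalia and Lo (1998).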

Aït-Sahalia and Lo (1998) showed that the SPD derived through the above dimension-reduction method is not significantly different from that obtained through the full nonparametric regression model (17). In what follows, we employ the bandwidth selection method presented in Section 2 to choose a data-driven bandwidth vector for the kernel estimator given by (19).

4 An Application to DAX Data

The data set contains daily settlement prices for each call-option contract of January 1997 (28 trading days) with the following variables: option price, spot price, strike price, time to maturity, risk-free interest rate, dividend, futures price and implied volatility. The sample size is $n = 2972$. This data set was provided by Huynh, Kervella and Zheng (2002), who derived results similar to those reported by Aït-Sahalia and Lo (1998).

4.1 Bandwidth Selection

We accept Aït-Sahalia and Lo's (1998) conclusion that the SPD, $\Delta$ and $\Gamma$ derived from the kernel regression of $H$ on $z$ are not significantly different from those derived from the Black-Scholes formula with volatility estimated by the kernel regression of $\sigma$ on $\tilde{z}$. The kernel regression model is given by

$$\sigma_t = m(\tilde{z}_t) + \varepsilon_t, \qquad (21)$$

for $t = 1, 2, \ldots, n$, where $\varepsilon_t$, for $t = 1, 2, \ldots, n$, are assumed to be iid and distributed as $N(0, \gamma^2)$. To obtain a closed form of the posterior density of $(\tilde{h}, \gamma^2)$, we set $\lambda_1 = \lambda_2 = 0.5$, $p = 2$ and $\nu = 0.1$; the values of $p$ and $\nu$ are quite standard in algorithms for sampling a variance parameter. The random-walk Metropolis-Hastings algorithm was employed to sample $\tilde{h}$, while $\gamma^2$ was sampled directly from the inverted Gamma density given by (13). We employed the batch-mean standard error and the simulation inefficiency factor (SIF) to check the convergence performance of the sampling algorithm (see, for example, Roberts, 1996; Kim, Shephard and Chib, 1998; Tse, Zhang and Yu, 2004). Both the batch-mean standard errors and the SIFs indicate that all the simulated chains have converged very well. Table 1 presents the estimated bandwidth vector, the estimated $\gamma^2$ and associated statistics.

Table 1: Estimated bandwidth vector and some statistics

Parameter   Mean       Standard deviation   Batch-mean standard error   SIF
γ²          0.000203   0.00000528           0.00000002                  0.8669
h₁          0.186550   0.10471060           0.00778742                  10.8934
h₂          0.077039   0.00329595           0.00006296                  1.3583
h₃          0.077039   0.00116666           0.00004014                  2.4371

Acceptance rate of the random-walk Metropolis-Hastings sampler: 0.2572

4.2 SPD Estimation via Black-Scholes Pricing Function

Given the estimated bandwidth vector, we computed the fitted values of $\sigma_t$ according to (19). Using the fitted value of $\sigma_t$ in place of $\sigma$ in (14) to (16), we derived estimates of the SPD, $\Delta$ and $\Gamma$ at any observation. Figure 1 presents graphs of the SPD, $\Delta$ and $\Gamma$ computed at the first observation. The two bandwidth selection methods produce quite similar $\Delta$ and $\Gamma$ values, except that the graph of $\Gamma$ estimated through the Bayesian bandwidth is more skewed to the left than that estimated through the subjective bandwidth. However, the SPD estimated through the Bayesian bandwidth has a lower peak and fatter tails than that estimated through the subjective bandwidth. Unfortunately, we cannot tell which SPD estimate is more accurate, because the true SPD is unknown.
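The two convergence diagnostics reported in Table 1 can be computed from a single chain of draws, as sketched below. The batch-mean standard error follows its usual definition; for the SIF, this sketch assumes one common definition, the ratio of the batch-mean variance of the sample mean to the variance it would have under independent sampling, which may differ in detail from the computation behind Table 1. The number of batches is an assumed tuning choice and the function names are hypothetical.

```python
import numpy as np


def batch_mean_se(chain, n_batches=50):
    """Batch-mean standard error of the posterior-mean estimate from one chain."""
    chain = np.asarray(chain, dtype=float)
    m = len(chain) // n_batches                      # draws per batch
    means = chain[: m * n_batches].reshape(n_batches, m).mean(axis=1)
    return means.std(ddof=1) / np.sqrt(n_batches)


def simulation_inefficiency_factor(chain, n_batches=50):
    """SIF taken as (batch-mean variance of the mean) / (iid variance of the mean).
    Values near 1 mean the chain behaves almost like an independent sample."""
    chain = np.asarray(chain, dtype=float)
    n_used = (len(chain) // n_batches) * n_batches
    iid_var_of_mean = chain[:n_used].var(ddof=1) / n_used
    return batch_mean_se(chain, n_batches) ** 2 / iid_var_of_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    iid = rng.normal(size=100_000)
    ar = np.empty(100_000)                           # AR(1) chain, coefficient 0.9
    ar[0] = 0.0
    for t in range(1, len(ar)):
        ar[t] = 0.9 * ar[t - 1] + rng.normal()
    print(simulation_inefficiency_factor(iid), simulation_inefficiency_factor(ar))
```

Read this way, an SIF of about 10.9 for $h_1$ in Table 1 would mean that roughly eleven correlated draws carry the information of one independent draw, whereas values near 1 indicate nearly independent sampling.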

Figure 1: Graphs of the estimated Δ, Γ and SPD. The solid lines represent estimates obtained through the Bayesian bandwidth, and the dashed lines represent estimates obtained through the subjective bandwidth. Panel (1) is the graph of Δ; panel (2) is the graph of Γ; and panel (3) is the graph of the SPD.
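The convergence checks reported in Table 1 of Section 4.1 can be illustrated with a short sketch. It uses one common batch-means formulation, which is our assumption rather than a transcription of the cited papers: the chain is split into B equal consecutive batches, the batch-mean standard error is the standard deviation of the batch means divided by √B, and the SIF is the squared batch-mean standard error divided by the variance the sample mean would have under independent draws (sample variance over chain length). Read this way, the SIF roughly counts how many correlated draws are worth one independent draw. Kim, Shephard and Chib (1998) instead estimate an inefficiency factor from sample autocorrelations. The function names and the default of 20 batches are hypothetical.

```python
import math
import random


def _mean(xs):
    return math.fsum(xs) / len(xs)


def _var(xs):
    """Unbiased sample variance."""
    mu = _mean(xs)
    return math.fsum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


def batch_mean_se(chain, n_batches=20):
    """Batch-means standard error of the sample mean of an MCMC chain.

    The chain is cut into n_batches consecutive batches of equal length;
    trailing draws that do not fill a whole batch are dropped.
    """
    m = len(chain) // n_batches
    if m < 2:
        raise ValueError("chain too short for the requested number of batches")
    means = [_mean(chain[i * m:(i + 1) * m]) for i in range(n_batches)]
    return math.sqrt(_var(means) / n_batches)


def sif(chain, n_batches=20):
    """Simulation inefficiency factor: batch-means variance of the sample mean
    divided by the variance the sample mean would have under iid draws."""
    n = n_batches * (len(chain) // n_batches)
    iid_var_of_mean = _var(chain[:n]) / n
    return batch_mean_se(chain, n_batches) ** 2 / iid_var_of_mean


if __name__ == "__main__":
    # Toy AR(1) chain with phi = 0.9; its theoretical inefficiency is
    # (1 + phi) / (1 - phi) = 19, so the printed SIF should be near 19.
    random.seed(0)
    phi, x, draws = 0.9, 0.0, []
    for _ in range(100_000):
        x = phi * x + random.gauss(0.0, 1.0)
        draws.append(x)
    print(batch_mean_se(draws), sif(draws))
```

Applied column by column to the sampled $\gamma^2$, $h_1$, $h_2$ and $h_3$, these two functions produce statistics of the kind reported in Table 1; an SIF near 1 indicates nearly independent draws, while larger values flag slower mixing.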

Guidolin and Timmermann (2003) extended the results reported by Aït-Sahalia and Lo (1998) and showed how Bayesian learning effects produce more extreme tail probabilities in SPDs than those observed in the lognormal case of the Black-Scholes model. We found that our estimated SPDs have different tail probabilities from those estimated using Aït-Sahalia and Lo's (1998) model.

5 Conclusion

This paper presented a Bayesian approach to bandwidth selection for multivariate kernel regression. The proposed sampling algorithm represents the first data-driven method for choosing bandwidth for kernel regression with more than two regressors. It provides a data-driven bandwidth for multivariate kernel regression, which we employed to estimate the state-price density of Arrow-Debreu securities. When applying the proposed bandwidth selection method to the kernel regression of implied volatility on the futures price, strike price and time to maturity, we found that the estimated volatility produces a noticeably different SPD from the one produced via the subjective bandwidth selector discussed in Huynh, Kervella and Zheng (2002) and Aït-Sahalia and Lo (1998). Our paper thus provides a data-driven solution to bandwidth selection in the SPD estimation approach pioneered by Aït-Sahalia and Lo (1998).

References

Aït-Sahalia, Y., 1996, Testing Continuous-Time Models of the Spot Interest Rate, Review of Financial Studies, 9, 385-426.

Aït-Sahalia, Y., and A.W. Lo, 1998, Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices, The Journal of Finance, 53, 499-547.

Aït-Sahalia, Y., and A.W. Lo, 2000, Nonparametric Risk Management and Implied Risk Aversion, Journal of Econometrics, 94, 9-51.

Aït-Sahalia, Y., P. Bickel, and T. Stoker, 2001, Goodness-of-fit Tests for Kernel Regression with an Application to Option Implied Volatilities, Journal of Econometrics, 105, 363-412.

Aït-Sahalia, Y., and J. Duarte, 2003, Nonparametric Option Pricing under Shape Restrictions, Journal of Econometrics, 116, 85-112.

Ayadi, M., and L. Kryzanowski, 2000, Portfolio Performance Measurement Using APM-free Kernel Models, Journal of Banking and Finance, 29, 623-659.

Backus, D.K., S. Foresi, and S.E. Zin, 1998, Arbitrage Opportunities in Arbitrage-Free Models of Bond Pricing, Journal of Business and Economic Statistics, 16, 13-24.

Boudoukh, J., R.F. Whitelaw, M. Richardson, and R. Stanton, 1997, Pricing Mortgage-Backed Securities in a Multifactor Interest Rate Environment: A Multivariate Density Estimation Approach, Review of Financial Studies, 10, 405-446.

Bowman, A.W., and A. Azzalini, 1997, Applied Smoothing Techniques for Data Analysis, Oxford University Press, London.

Brandt, M., and P. Santa-Clara, 2002, Simulated Likelihood Estimation of Diffusions with an Application to Exchange Rate Dynamics in Incomplete Markets, Journal of Financial Economics, 63, 161-210.

Breitung, J., and C. Wulff, 2001, Non-linear Error Correction and the Efficient Market Hypothesis: The Case of German Dual-class Shares, German Economic Review, 2, 419-434.

Broadie, M., J. Detemple, E. Ghysels, and O. Torres, 2000, American Options with Stochastic Dividends and Volatility: A Nonparametric Investigation, Journal of Econometrics, 94, 53-92.

Donald, S.G., 1997, Inference Concerning the Number of Factors in a Multivariate Nonparametric Relationship, Econometrica, 65, 103-131.

Duong, T., and M.L. Hazelton, 2003, Plug-in Bandwidth Selectors for Bivariate Kernel Density Estimation, Journal of Nonparametric Statistics, 15, 17-30.

Fan, J., and I. Gijbels, 2000, Local Polynomial Fitting, in M.G. Schimek (ed.), Smoothing and Regression: Approaches, Computation, and Application, John Wiley & Sons, New York, 229-276.

Fernandes, M., 2006, Financial Crashes as Endogenous Jumps: Estimation, Testing and Forecasting, Journal of Economic Dynamics and Control, 30, 111-141.

Guidolin, M., and A. Timmermann, 2003, Option Prices under Bayesian Learning: Implied Volatility Dynamics and Predictive Densities, Journal of Economic Dynamics and Control, 27, 717-769.

Härdle, W., P. Hall, and J.S. Marron, 2000, How Far Are Automatically Chosen Regression Estimators from Their Optimum? Journal of the American Statistical Association, 83, 86-97.

Härdle, W., and M. Müller, 2000, Multivariate and Semiparametric Kernel Regression, in M.G. Schimek (ed.), Smoothing and Regression: Approaches, Computation, and Application, John Wiley & Sons, New York, 357-392.

Herrmann, E., 2000, Variance Estimation and Bandwidth Selection for Kernel Regression, in M.G. Schimek (ed.), Smoothing and Regression: Approaches, Computation, and Application, John Wiley & Sons, New York, 71-108.

Herrmann, E., M.P. Wand, J. Engel, and T. Gasser, 1995, A Bandwidth Selector for Bivariate Kernel Regression, Journal of the Royal Statistical Society, Series B, 57, 171-180.

Hong, Y., and H. Li, 2005, Nonparametric Specification Testing for Continuous-time Models with Applications to Term Structure of Interest Rates, Review of Financial Studies, 18, 37-84.

Huynh, K., P. Kervella, and J. Zheng, 2002, Estimating State-Price Densities with Nonparametric Regression, in W. Härdle, T. Kleinow and G. Stahl (eds.), Applied Quantitative Finance, Chapter 8.

Inci, A., and B. Lu, 2004, Exchange Rates and Interest Rates: Can Term Structure Models Explain Currency Movements?, Journal of Economic Dynamics and Control, 28, 1595-1624.

Jones, M.C., J.S. Marron, and S.J. Sheather, 1996, A Brief Survey of Bandwidth Selection for Density Estimation, Journal of the American Statistical Association, 91, 401-407.

Kim, S., N. Shephard, and S. Chib, 1998, Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models, Review of Economic Studies, 65, 361-393.

Mancuso, A., B. Goodwin, and T. Grennes, 2003, Nonlinear Aspects of Capital Market Integration and Real Interest Rate Equalization, International Review of Economics and Finance, 12, 283-303.

Marron, J.S., 1987, A Comparison of Cross-Validation Techniques in Density Estimation, Annals of Statistics, 15, 152-162.

Roberts, G.O., 1996, Markov Chain Concepts Related to Sampling Algorithms, in W.R. Gilks, S. Richardson, and D.J. Spiegelhalter (eds.), Markov Chain Monte Carlo in Practice, Chapman & Hall, London, 45-57.

Sain, S.R., K.A. Baggerly, and D.W. Scott, 1994, Cross-Validation of Multivariate Densities, Journal of the American Statistical Association, 89, 807-817.

Scott, D.W., 1992, Multivariate Density Estimation: Theory, Practice, and Visualization, John Wiley & Sons, New York.

Stanton, R., 1997, A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk, The Journal of Finance, 52, 1973-2002.

Tse, Y.K., X. Zhang, and J. Yu, 2004, Estimation of Hyperbolic Diffusion Using the Markov Chain Monte Carlo Simulation Method, Quantitative Finance, 4, 158-169.

Wand, M.P., and M.C. Jones, 1994, Multivariate Plug-in Bandwidth Selection, Computational Statistics, 9, 97-116.

Wand, M.P., and M.C. Jones, 1995, Kernel Smoothing, Chapman & Hall, London.

Yatchew, A., and W. Härdle, 2006, Nonparametric State Price Density Estimation Using Constrained Least Squares and the Bootstrap, forthcoming in Journal of Econometrics.