Pricing of high-dimensional American options by neural networks

Michael Kohler¹,*, Adam Krzyżak² and Nebojsa Todorovic¹

¹ Department of Mathematics, University of Saarbrücken, Postfach 151150, 66041 Saarbrücken, Germany, email: kohler@math.uni-sb.de, todorovic@math.uni-sb.de

² Department of Computer Science and Software Engineering, Concordia University, 1455 De Maisonneuve Blvd. West, Montreal, Quebec, Canada H3G 1M8, email: krzyzak@cs.concordia.ca

December 2, 2006

Abstract

Pricing of American options in discrete time is considered, where the option is allowed to be based on several underlyings. It is assumed that the price processes of the underlyings are given Markov processes. We use the Monte Carlo approach to generate artificial sample paths of these price processes, and then we use the least squares neural networks regression estimates to estimate from this data the so-called continuation values, which are defined as mean values of the American options for given values of the underlyings at time $t$ subject to the constraint that the options are not exercised at time $t$. Results concerning consistency and rate of convergence of the estimates are presented, and the pricing of American options is illustrated by simulated data.

AMS classification: Primary 91B28, 60G40, 93E20; secondary 65C05, 93E24, 62G05.

Key words and phrases: American options, consistency, neural networks, nonparametric regression, optimal stopping, rate of convergence, regression based Monte Carlo methods.

* Corresponding author. Tel: +49-681-302-2435, Fax: +49-681-302-6583

Running title: Pricing American options by neural networks

1 Introduction

In this article we consider American options in discrete time. The price $V_0$ of such options can be defined as a solution of an optimal stopping problem

$$V_0 = \sup_{\tau \in \mathcal{T}(0,\dots,T)} \mathbf{E}\{f_\tau(X_\tau)\}. \tag{1}$$

Here $f_t$ is the discounted payoff function, $X_0, X_1, \dots, X_T$ is the underlying stochastic process describing e.g. the prices of the underlyings and the financial environment (like interest rates, etc.) and $\mathcal{T}(0,\dots,T)$ is the class of all $\{0,\dots,T\}$-valued stopping times, i.e., $\tau \in \mathcal{T}(0,\dots,T)$ is a measurable function of $X_0,\dots,X_T$ satisfying $\{\tau = \alpha\} \in \mathcal{F}(X_0,\dots,X_\alpha)$ for all $\alpha \in \{0,\dots,T\}$.

As a simple example consider pricing of an American put option with strike $K$ on the arithmetic mean of several correlated underlyings, where the stock values are modelled via Black-Scholes theory by

$$X_{i,t} = x_{i,0} \cdot e^{r t} \cdot \exp\Big(\sum_{j=1}^{m} \big(\sigma_{i,j} W_j(t) - \tfrac{1}{2}\sigma_{i,j}^2 t\big)\Big) \quad (i = 1,\dots,m). \tag{2}$$

Here $r > 0$ is the given riskless interest rate, $\sigma_i = (\sigma_{i,1},\dots,\sigma_{i,m})^T$ is the given volatility of the $i$-th stock, $x_{i,0}$ is the initial stock price of the $i$-th stock, and $\{W_j(t) : t \in \mathbb{R}_+\}$ $(j = 1,\dots,m)$ are independent Wiener processes. If we sell the option at time $t > 0$ and the stock prices are at this point $x = (x_1,\dots,x_m)$ (i.e., the arithmetic mean of the stock prices is $\frac{1}{m}\sum_{j=1}^m x_j$), we get the payoff

$$\max\Big\{K - \frac{1}{m}\sum_{j=1}^m x_j,\; 0\Big\},$$

and if we discount this payoff towards time zero, we get the discounted payoff function

$$f_t(x_1,\dots,x_m) = e^{-r t} \max\Big\{K - \frac{1}{m}\sum_{j=1}^m x_j,\; 0\Big\}. \tag{3}$$

But even if all the parameters are known (i.e., if $x_{i,0}$ $(i = 1,\dots,m)$ and $K$ are given and if we estimate the volatilities $\sigma_i$ $(i = 1,\dots,m)$ and the riskless interest rate from observed data from the past), it is not obvious how we can compute the price

$$V_0 = \sup_{\tau \in \mathcal{T}(0,\dots,T)} \mathbf{E}\Big\{ e^{-r \tau} \max\Big\{K - \frac{1}{m}\sum_{i=1}^m X_{i,\tau},\; 0\Big\}\Big\}$$
of the corresponding American option. In the above Black-Scholes model we can reformulate the whole problem as a free boundary problem for partial differential equations (cf., e.g., Chapter 8 in Elliott and Kopp (1999)), but the numerical solution of this free boundary problem gets very complicated if the number $m$ of underlyings gets large. In addition, for $m \le 2$ binomial trees (cf., e.g., Chapter 1 in Elliott and Kopp (1999)) are able to produce very good estimates of $V_0$, but for $m > 3$ it is basically impossible with this method to model the correlation structure of the stocks correctly.

The purpose of this article is to develop a Monte Carlo algorithm which is able to compute an approximation of the price (1) even in case that the option is based on a large number of correlated stocks, that the stock prices are not modelled by a simple Black-Scholes model as in (2) and that the payoff function is not as simple as in (3). In particular the method developed in this article is also applicable in case that the processes $X_{i,t}$ are adjusted to observed data by time series estimation as described, e.g., in Franke and Diagne (2002).

In the sequel we assume that $X_0, X_1, \dots, X_T$ is an $\mathbb{R}^d$-valued Markov process recording all necessary information about financial variables including prices of the underlying assets as well as additional risk factors driving stochastic volatility or stochastic interest rates. Neither the Markov property nor the form of the payoff as a function of the state $X_t$ is restrictive, and both can always be achieved by including supplementary variables.

The computation of (1) can be done by determination of an optimal stopping rule $\tau^* \in \mathcal{T}(0,\dots,T)$ satisfying

$$V_0 = \mathbf{E}\{f_{\tau^*}(X_{\tau^*})\}. \tag{4}$$

Let

$$q_t(x) = \sup_{\tau \in \mathcal{T}(t+1,\dots,T)} \mathbf{E}\{f_\tau(X_\tau) \mid X_t = x\} \tag{5}$$

be the so-called continuation value describing the value of the option at time $t$ given $X_t = x$ and subject to the constraint of holding the option at time $t$ rather than exercising it. Here $\mathcal{T}(t+1,\dots,T)$ is the class of all $\{t+1,\dots,T\}$-valued stopping times. It can be shown that

$$\tau^* = \inf\{s \ge 0 : q_s(X_s) \le f_s(X_s)\} \tag{6}$$
satisfies (4), i.e., $\tau^*$ is an optimal stopping time (cf., e.g., Chow, Robbins and Siegmund (1971) or Shiryayev (1978)). Therefore it suffices to compute the continuation values (5) in order to solve the optimal stopping problem (1). The continuation values satisfy the dynamic programming equations

$$q_T(x) = 0, \qquad q_t(x) = \mathbf{E}\{\max\{f_{t+1}(X_{t+1}), q_{t+1}(X_{t+1})\} \mid X_t = x\} \quad (t = 0,1,\dots,T-1). \tag{7}$$

Indeed, by analogy to (6) we have

$$q_t(x) = \mathbf{E}\{f_{\tau_t^*}(X_{\tau_t^*}) \mid X_t = x\} \quad \text{where} \quad \tau_t^* = \inf\{s \ge t+1 : q_s(X_s) \le f_s(X_s)\},$$

hence by using the Markov property of $\{X_s\}_{s=0,\dots,T}$ we get

$$q_t(X_t) = \mathbf{E}\Big\{ f_{t+1}(X_{t+1}) \cdot I_{\{q_{t+1}(X_{t+1}) \le f_{t+1}(X_{t+1})\}} + f_{\tau_{t+1}^*}(X_{\tau_{t+1}^*}) \cdot I_{\{q_{t+1}(X_{t+1}) > f_{t+1}(X_{t+1})\}} \,\Big|\, X_t \Big\}$$
$$= \mathbf{E}\big\{ \mathbf{E}\{\dots \mid X_0,\dots,X_{t+1}\} \,\big|\, X_0,\dots,X_t \big\}$$
$$= \mathbf{E}\{\max\{f_{t+1}(X_{t+1}), q_{t+1}(X_{t+1})\} \mid X_t\}.$$

Unfortunately, the conditional expectation in (7) in general cannot be computed in applications. The basic idea of regression-based Monte Carlo methods for pricing American options is to apply recursively regression estimates to artificially created samples of

$$\big(X_t, \max\{f_{t+1}(X_{t+1}), \hat q_{t+1}(X_{t+1})\}\big)$$

(so-called Monte Carlo samples) to construct estimates $\hat q_t$ of $q_t$. In connection with linear regression this was proposed in Tsitsiklis and Van Roy (1999), and, based on a different regression estimation than (7), in Longstaff and Schwartz (2001). Nonparametric least squares regression estimates have been applied and investigated in this context in Egloff (2005) and Egloff, Kohler and Todorovic (2006), smoothing spline regression estimates have been analyzed in this context in Kohler (2006b), and recursive kernel regression estimates have been considered in Barty et al. (2006).

In this article we propose to use least squares neural networks regression estimates in order to compute the conditional expectations in (7), which is particularly promising for options based on several underlyings, where high-dimensional regression problems have
to be solved in order to compute approximations of the continuation values. Due to the well-known curse of dimensionality it is difficult to choose here a reasonable nonparametric regression estimate, and neural networks belong together with regression trees (cf., e.g., Breiman et al. (1984)) or interaction models (cf., e.g., Stone (1994)) to standard estimates in this field.

Below we define least squares neural networks regression estimates of the continuation values where all parameters of the estimates are chosen using the given data only. We will show that these estimates are universally consistent in the sense that their $L_2$ errors converge to zero in probability for all distributions. Furthermore, under regularity conditions on the smoothness of the continuation values we will analyze the rate of convergence of the estimates. Finally, we will illustrate the estimates by applying them to simulated data.

The precise definition of the estimates and the main theoretical results concerning consistency and rate of convergence of the estimate are given in Sections 2 and 3, respectively. The application of the estimates to simulated data will be described in Section 4, and the proofs will be given in Section 5.

2 Definition of the estimate

Let $\sigma : \mathbb{R} \to [0,1]$ be a sigmoid function, i.e., assume that $\sigma$ is monotonically increasing and satisfies

$$\sigma(x) \to 0 \quad (x \to -\infty) \qquad \text{and} \qquad \sigma(x) \to 1 \quad (x \to \infty).$$

An example of such a sigmoid function is the logistic squasher defined by

$$\sigma(x) = \frac{1}{1 + e^{-x}} \quad (x \in \mathbb{R}).$$

In the sequel we estimate the continuation values by neural networks with $k \in \mathbb{N}$ hidden neurons and a sigmoid function $\sigma$. We will use the principle of least squares to fit such a function to the data, and for technical reasons we will restrict the sum of the absolute values of the output weights. The choice of the number $k$ of hidden neurons will be data-driven by using sample splitting.

Let $\beta_n > 0$ (which we will choose later such that $\beta_n \to \infty$ $(n \to \infty)$) and let $\mathcal{F}_k(\beta_n)$
be a class of neural networks defined by

$$\mathcal{F}_k(\beta_n) = \Big\{ \sum_{i=1}^{k} c_i \cdot \sigma(a_i^T x + b_i) + c_0 \;:\; a_i \in \mathbb{R}^d,\; b_i \in \mathbb{R},\; \sum_{i=0}^{k} |c_i| \le \beta_n \Big\}, \tag{8}$$

where $\sigma$ is the sigmoid function from above.

In the sequel we describe an algorithm to estimate the continuation values $q_t$ recursively. To do this we generate artificial independent Markov processes $\{X_{i,t}^{(l)}\}_{t=0,\dots,T}$ $(l = 0,1,\dots,T-1,\; i = 1,2,\dots,n)$ which are identically distributed as $\{X_t\}_{t=0,\dots,T}$. Then we use these so-called Monte Carlo samples to generate recursively data to estimate $q_t$ by using the regression representation given in (7).

We start with

$$\hat q_{n,T}(x) = 0 \quad (x \in \mathbb{R}^d).$$

Fix $t \in \{0,1,\dots,T-1\}$. Given an estimate $\hat q_{n,t+1}$ of $q_{t+1}$, we estimate

$$q_t(x) = \mathbf{E}\{\max\{f_{t+1}(X_{t+1}), q_{t+1}(X_{t+1})\} \mid X_t = x\}$$

by applying a neural networks regression estimate to an approximative sample of

$$\big(X_t, \max\{f_{t+1}(X_{t+1}), q_{t+1}(X_{t+1})\}\big).$$

With the notation

$$\hat Y_{i,t}^{(t)} = \max\{f_{t+1}(X_{i,t+1}^{(t)}), \hat q_{n,t+1}(X_{i,t+1}^{(t)})\}$$

(where we have suppressed the dependency of $\hat Y_{i,t}^{(t)}$ on $n$) this approximative sample is given by

$$\Big\{ \big(X_{i,t}^{(t)}, \hat Y_{i,t}^{(t)}\big) : i = 1,\dots,n \Big\}. \tag{9}$$

Observe that this sample depends on the $t$-th sample of $\{X_s\}_{s=0,\dots,T}$ and $\hat q_{n,t+1}$, i.e., for each time step $t$ we use a new sample of the stochastic process $\{X_s\}_{s=0,\dots,T}$ in order to define our data (9).

To choose the parameter $k$ of the neural networks regression estimate fully automatically we use splitting of the sample. Thus we subdivide (9) in a learning sample of size $n_l = \lceil n/2 \rceil$ and a testing sample of size $n_t = n - n_l$ and define for a given $k \in \mathcal{P}_n = \{1,\dots,n\}$ a regression estimate of $q_t$ by

$$\hat q_{n_l,t}^{k} = \arg\min_{f \in \mathcal{F}_k(\beta_n)} \frac{1}{n_l} \sum_{i=1}^{n_l} \big|f(X_{i,t}^{(t)}) - \hat Y_{i,t}^{(t)}\big|^2, \tag{10}$$
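where $z = \arg\min_{x \in D} f(x)$ is an abbreviation for $z \in D$ and $f(z) = \min_{x \in D} f(x)$. Here we assume for simplicity that the above minima exist, however we do not require them to be unique. Then we minimize the empirical $L_2$ risk on the testing sample in order to choose the value of the parameter $k$. So we choose

$$\hat k = \arg\min_{k \in \mathcal{P}_n} \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n_l,t}^{k}(X_{i,t}^{(t)}) - \hat Y_{i,t}^{(t)}\big|^2 \tag{11}$$

and define our final neural networks regression estimate of $q_t$ by

$$\hat q_{n,t}(x) = \hat q_{n_l,t}^{\hat k}(x) \quad (x \in \mathbb{R}^d). \tag{12}$$

3 Theoretical results

We say that $a_n = O_P(b_n)$ if $\limsup_{n \to \infty} \mathbf{P}(a_n > c \cdot b_n) = 0$ for some finite constant $c$. Our main theoretical result is the following theorem.

Theorem 1 Let $L > 0$. Assume that $X_0, X_1, \dots, X_T$ is an $\mathbb{R}^d$-valued Markov process and that the discounted payoff function $f_t$ is bounded in absolute value by $L$. Define the estimate $\hat q_{n,t}$ by (10), (11) and (12) for some $\beta_n > 0$. Let $k_n \in \mathcal{P}_n$ and assume that $k_n, \beta_n$ satisfy

$$\beta_n \to \infty \quad (n \to \infty), \qquad k_n \to \infty \quad (n \to \infty), \qquad \frac{\beta_n^4 \cdot k_n \cdot \log n}{n} \to 0 \quad (n \to \infty).$$

Then

$$\int |\hat q_{n,t}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) = O_P\Big( \frac{\beta_n^4 \cdot k_n \cdot \log n}{n} + \max_{s \in \{t,t+1,\dots,T-1\}} \inf_{f \in \mathcal{F}_{k_n}(\beta_n)} \int |f(x) - q_s(x)|^2 \,\mathbf{P}_{X_s}(dx) \Big)$$

for all $t \in \{0,1,\dots,T\}$.

As a first consequence we get consistency of the estimate.

Corollary 1 Let $L > 0$. Assume that $X_0, X_1, \dots, X_T$ is an $\mathbb{R}^d$-valued Markov process and that the discounted payoff function $f_t$ is bounded in absolute value by $L$, i.e.,

$$|f_t(x)| \le L \quad \text{for } x \in \mathbb{R}^d \text{ and } t \in \{0,1,\dots,T\}. \tag{13}$$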
Define the estimate $\hat q_{n,t}$ by (10), (11) and (12). Let $\beta_n > 0$ and assume that $\beta_n$ satisfies

$$\beta_n \to \infty \quad (n \to \infty), \qquad \frac{\beta_n^4 \cdot \log n}{n} \to 0 \quad (n \to \infty).$$

Then

$$\int |\hat q_{n,t}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) \to 0 \quad \text{in probability}$$

for all $t \in \{0,1,\dots,T\}$.

Proof of Corollary 1. Because of the conditions of Corollary 1 we can choose $k_n \in \mathcal{P}_n$ such that

$$k_n \to \infty \quad (n \to \infty) \qquad \text{and} \qquad \frac{\beta_n^4 \cdot k_n \cdot \log n}{n} \to 0 \quad (n \to \infty).$$

By Theorem 1 we get

$$\int |\hat q_{n,t}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) = O_P\Big( \frac{\beta_n^4 \cdot k_n \cdot \log n}{n} + \max_{s \in \{t,t+1,\dots,T-1\}} \inf_{f \in \mathcal{F}_{k_n}(\beta_n)} \int |f(x) - q_s(x)|^2 \,\mathbf{P}_{X_s}(dx) \Big)$$

for all $t \in \{0,1,\dots,T\}$. Condition (13) implies that $q_t$ is bounded, hence we get by Lemma 16.2 in Györfi et al. (2002)

$$\max_{s \in \{t,t+1,\dots,T-1\}} \inf_{f \in \mathcal{F}_{k_n}(\beta_n)} \int |f(x) - q_s(x)|^2 \,\mathbf{P}_{X_s}(dx) \to 0 \quad (n \to \infty).$$

The above corollary shows that the $L_2$ error of our estimate converges to zero in probability for sample size of the Monte Carlo sample tending to infinity. In view of an application with necessarily finite sample size it would be nice to know how quickly the error converges to zero for sample size tending to infinity. It is well-known in nonparametric regression that assumptions on the underlying distribution, in particular on the smoothness of the regression function, are necessary in order to be able to derive non-trivial rate of convergence results (see, e.g., Cover (1968), Devroye (1982) or Chapter 3 in Györfi et al. (2002)). For our neural networks estimate we restrict the smoothness of the continuation values by imposing constraints on their Fourier transformation (see below).

In addition we assume that the stochastic process is bounded. Usually in modelling of financial processes one models them by unbounded processes. In this case we choose a large value $A > 0$ and replace $X_t$ by its bounded approximation

$$X_t^A = X_{\min\{t,\tau_A\}} \quad \text{where} \quad \tau_A = \inf\{s \ge 0 : X_s \notin [-A,A]^d\}.$$
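The bounded approximation $X_t^A = X_{\min\{t,\tau_A\}}$ can be sketched in a few lines of Python with NumPy. This is an illustrative sketch and not code from the article: the function name `stop_at_exit` and the array layout `(n_paths, T + 1, d)` are assumptions made here. In discrete time the frozen value $X_{\tau_A}$ itself lies outside $[-A,A]^d$; the sketch leaves it there and applies no further truncation.

```python
import numpy as np


def stop_at_exit(paths, A):
    """Freeze each path at its first exit from the cube [-A, A]^d.

    paths: array of shape (n_paths, T + 1, d) (layout assumed for this sketch).
    Returns an array of the same shape holding X_{min(t, tau_A)}.
    """
    paths = np.asarray(paths, dtype=float)
    n, steps, _ = paths.shape
    # outside[i, s] is True if path i is outside the cube at time s
    outside = np.any(np.abs(paths) > A, axis=2)
    exits = outside.any(axis=1)
    # tau_A = first exit time; for paths that never exit use the last index,
    # so that min(t, tau_A) = t for every t
    tau = np.where(exits, outside.argmax(axis=1), steps - 1)
    idx = np.minimum(np.arange(steps)[None, :], tau[:, None])
    return paths[np.arange(n)[:, None], idx]


if __name__ == "__main__":
    demo = np.array([[[0.0], [1.0], [3.0], [1.0], [0.0]]])
    print(stop_at_exit(demo, 2.0)[0, :, 0])
```

If values strictly inside the cube are required, `np.clip(result, -A, A)` could be applied afterwards; that would be an additional truncation on top of the stopping.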

Here we assume for simplicity that the stochastic process has continuous paths in order to be able to neglect an additional truncation of $X_t^A$. This boundedness assumption enables us to estimate the price of the American option from samples of polynomial size in the number of free parameters, in contrast to Monte Carlo estimation from standard (unbounded) Black-Scholes models, where Glasserman and Yu (2004) showed that samples of exponential size in the number of free parameters are needed.

Next we analyze the rate of convergence of the estimate. To this end we need to introduce the class of functions having Fourier transform with the first absolute moment finite. The Fourier transform $F$ of a function $f \in L_1(\mathbb{R}^d)$ is defined by

$$F(\omega) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} e^{-i \omega^T x} f(x)\,dx \quad (\omega \in \mathbb{R}^d).$$

If $F \in L_1(\mathbb{R}^d)$ then the inverse formula

$$f(x) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} e^{i \omega^T x} F(\omega)\,d\omega \tag{14}$$

holds almost everywhere with respect to the Lebesgue measure. Let $0 < C < \infty$ and consider the class of functions $\mathcal{F}_C$ for which (14) holds on $\mathbb{R}^d$ and, in addition,

$$\int_{\mathbb{R}^d} \|\omega\| \, |F(\omega)|\,d\omega \le C. \tag{15}$$

A class of functions satisfying (15) is a subclass of functions with Fourier transform having first absolute moment finite, i.e., $\int_{\mathbb{R}^d} \|\omega\| \, |F(\omega)|\,d\omega < \infty$ (these functions are continuously differentiable on $\mathbb{R}^d$). The next corollary provides the rate of convergence of the estimate.

Corollary 2 Let $L > 0$. Assume that $X_0, X_1, \dots, X_T$ is an $\mathbb{R}^d$-valued Markov process, $X_t \in [-A,A]^d$ almost surely for some $A > 0$ and all $t \in \{0,1,\dots,T\}$, that the discounted payoff function $f_t$ is bounded in absolute value by $L$, i.e.,

$$|f_t(x)| \le L \quad \text{for } x \in \mathbb{R}^d \text{ and } t \in \{0,1,\dots,T\},$$

and that the Fourier transform $Q_t$ of $q_t$ satisfies (14) and (15) for all $x \in \mathbb{R}^d$ and all $t \in \{0,\dots,T\}$. Let $\beta_n = \mathrm{const} \cdot \log n$ and define the estimate $\hat q_{n,t}$ by (10), (11) and (12). Then

$$\int |\hat q_{n,t}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) = O_P\Big( \Big(\frac{\log^5 n}{n}\Big)^{1/2} \Big)$$

for all $t \in \{0,1,\dots,T\}$.
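As a heuristic for where the rate in Corollary 2 comes from, the two terms of the bound in Theorem 1 can be balanced against each other. The following is a sketch only: the constants $c$ and $c'$ are placeholders introduced here, and the approximation bound of order $1/k_n$ is the one supplied by Lemma 16.8 in Györfi et al. (2002) under the Fourier condition (15).

```latex
\begin{align*}
\text{estimation term:} \quad
  & \frac{\beta_n^4 \, k_n \log n}{n} = c^4 \, \frac{k_n \log^5 n}{n}
    \qquad (\beta_n = c \log n), \\
\text{approximation term:} \quad
  & \inf_{f \in \mathcal{F}_{k_n}(\beta_n)} \int |f(x) - q_s(x)|^2 \, \mathbf{P}_{X_s}(dx)
    \le \frac{c'}{k_n}, \\
\text{balance:} \quad
  & \frac{k_n \log^5 n}{n} \approx \frac{1}{k_n}
    \iff k_n \approx \Big( \frac{n}{\log^5 n} \Big)^{1/2}, \\
\text{resulting order of both terms:} \quad
  & \Big( \frac{\log^5 n}{n} \Big)^{1/2}.
\end{align*}
```

With this choice of $k_n$ neither term dominates, which is why the rate cannot be improved by tuning $k_n$ alone within this bound.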

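Proof of Corollary 2. Set

$$k_n = \Big\lceil \Big(\frac{n}{\log^5 n}\Big)^{1/2} \Big\rceil.$$

From Lemma 16.8 in Györfi et al. (2002) we have for sufficiently large $n$

$$\max_{s \in \{t,t+1,\dots,T-1\}} \inf_{f \in \mathcal{F}_{k_n}(\beta_n)} \int |f(x) - q_s(x)|^2 \,\mathbf{P}_{X_s}(dx) \le \frac{(2\sqrt{d} A C)^2}{k_n}.$$

Then Theorem 1 implies

$$\int |\hat q_{n,t}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) = O_P\Big( \frac{\beta_n^4 \cdot k_n \cdot \log n}{n} + \max_{s \in \{t,t+1,\dots,T-1\}} \inf_{f \in \mathcal{F}_{k_n}(\beta_n)} \int |f(x) - q_s(x)|^2 \,\mathbf{P}_{X_s}(dx) \Big)$$
$$= O_P\Big( \frac{\beta_n^4 \cdot k_n \cdot \log n}{n} + \frac{(2\sqrt{d} A C)^2}{k_n} \Big) = O_P\Big( \Big(\frac{\log^5 n}{n}\Big)^{1/2} \Big)$$

for all $t \in \{0,1,\dots,T\}$.

Remark. Assume $X_0 = x_0$ a.s. for some $x_0 \in \mathbb{R}^d$. We can estimate the price (cf. (1) and (5))

$$V_0 = \max\{f_0(x_0), q_0(x_0)\}$$

of the American option by

$$\hat V_0 = \max\{f_0(x_0), \hat q_{n,0}(x_0)\}.$$

Since the distribution of $X_0$ is concentrated on $x_0$, under the assumptions of Corollary 2 we have the following error bound:

$$|\hat V_0 - V_0|^2 = \big|\max\{f_0(x_0), \hat q_{n,0}(x_0)\} - \max\{f_0(x_0), q_0(x_0)\}\big|^2 \le |\hat q_{n,0}(x_0) - q_0(x_0)|^2 = O_P\Big( \Big(\frac{\log^5 n}{n}\Big)^{1/2} \Big).$$

4 Application to simulated data

In this section, we illustrate the finite sample behavior of our algorithm by comparing it with the Tsitsiklis-Van Roy algorithm and the Longstaff-Schwartz algorithm proposed by Tsitsiklis and Van Roy (1999) and Longstaff and Schwartz (2001), respectively.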
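To make the regression-based recursion (7) behind this comparison concrete, here is a minimal Python sketch of a baseline in the spirit of Tsitsiklis and Van Roy: the continuation values are fitted backward in time by polynomial least squares, and the resulting stopping rule is then evaluated on freshly simulated paths. It is not the neural network estimate of this article. All function names are hypothetical, and the parameter values (one stock, $x_0 = 100$, $r = 0.05$, $\sigma = 0.25$, strike 90, 12 time steps, degree 3) are illustrative choices. Two simplifications are assumed: a single path sample is reused for all time steps, and exercise is only allowed when the payoff is positive, a practical safeguard that is not part of rule (6).

```python
import numpy as np


def simulate_paths(n, m, x0=100.0, r=0.05, sigma=0.25, rng=None):
    """Simulate n Black-Scholes paths on the grid t_j = j/m, j = 0..m."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(m + 1) / m
    increments = rng.normal(0.0, np.sqrt(1.0 / m), size=(n, m))
    w = np.concatenate([np.zeros((n, 1)), np.cumsum(increments, axis=1)], axis=1)
    return x0 * np.exp((r - 0.5 * sigma**2) * t + sigma * w)


def discounted_put(j, x, m, strike=90.0, r=0.05):
    """Discounted payoff exp(-r t_j) * max(K - x, 0) at time t_j = j/m."""
    return np.exp(-r * j / m) * np.maximum(strike - x, 0.0)


def fit_continuation_values(paths, degree=3, x0=100.0):
    """Backward recursion (7) with polynomial least squares at each step."""
    n, m = paths.shape[0], paths.shape[1] - 1
    coefs = [None] * m        # coefs[j] parametrizes the estimate at t_j, j < m
    q_next = np.zeros(n)      # the continuation value at maturity is zero
    for j in range(m - 1, -1, -1):
        # regression target: max{f_{t+1}(X_{t+1}), q_hat_{t+1}(X_{t+1})}
        y = np.maximum(discounted_put(j + 1, paths[:, j + 1], m), q_next)
        basis = np.vander(paths[:, j] / x0, degree + 1)
        coefs[j], *_ = np.linalg.lstsq(basis, y, rcond=None)
        q_next = basis @ coefs[j]
    return coefs


def lower_bound_price(coefs, paths, x0=100.0):
    """Average discounted payoff of the estimated stopping rule on new paths."""
    n, m = paths.shape[0], paths.shape[1] - 1
    payoff = np.zeros(n)
    alive = np.ones(n, dtype=bool)
    for j in range(m + 1):
        f = discounted_put(j, paths[:, j], m)
        if j == m:
            stop = alive.copy()   # at maturity every remaining path stops
        else:
            q = np.vander(paths[:, j] / x0, len(coefs[j])) @ coefs[j]
            # guard f > 0: never exercise at zero payoff (assumed safeguard)
            stop = alive & (f > 0.0) & (q <= f)
        payoff[stop] = f[stop]
        alive &= ~stop
    return float(payoff.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coefs = fit_continuation_values(simulate_paths(10000, 12, rng=rng))
    print(lower_bound_price(coefs, simulate_paths(10000, 12, rng=rng)))
```

Because the stopping rule is evaluated on paths that were not used for fitting, the returned average estimates the value of a feasible (generally suboptimal) stopping rule, so in expectation it does not exceed the true price.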

We simulate the paths of the underlying stocks with a simple Black-Scholes model. The time to maturity is assumed to be one year. We discretize the time interval $[0,1]$ by dividing it into $m$ equidistant time steps with $t_0 = 0 < t_1 < \dots < t_m = 1$. In the first two examples we consider an option on a single stock. The prices of the underlying stock at time points $t_j$ $(j = 0,\dots,m)$ are then given by

$$X_{i,t_j} = x_0 \cdot \exp\big((r - \tfrac{1}{2}\sigma^2) \cdot t_j + \sigma \cdot W_{t_j}\big) \quad (i = 1,\dots,n,\; j = 0,\dots,m).$$

We choose $x_0 = 100$, $r = 0.05$, $m = 12$ and discount factors $e^{-r t_j}$ for $j = 0,\dots,m$. For our algorithm we use a sample size of $n = 2000$, while for the other algorithms we use a sample size of $n = 10000$. For our algorithm we set the number of learning and testing samples to $n_l = n_t = 1000$. To simplify the implementation we select the number $k$ of hidden neurons by sample splitting as described in Section 2 from the set $\{2^0, 2^1, \dots, 2^5\}$. The neural networks least squares estimate is computed approximately by backpropagation (i.e., by gradient descent). For the Longstaff-Schwartz and Tsitsiklis-Van Roy algorithms we use polynomials of degree 3 in the one-dimensional case and degree 1 in the high-dimensional case, since these choices yield the best results. We apply all three algorithms to 100 independently generated sets of paths.

We would like to stress that all three algorithms provide lower bounds to the optimal stopping value. Since we evaluate the approximative optimal stopping rule on newly generated data, a higher MCE indicates a better performance of the algorithm. We compare the algorithms using boxplots. Observe that the higher the boxplot of the MCE, the better the performance of the corresponding algorithm.

In our first example we analyze a standard put-payoff with exercise price 90 as illustrated in Figure 1, and simulate the paths of the underlying stock with a volatility of $\sigma = 0.25$. As we can see from Figure 2, our algorithm is slightly better than the Longstaff-Schwartz algorithm and comparable to the algorithm of Tsitsiklis-Van Roy.
This is not surprising, since it is well known that for simple payoff functions the Longstaff-Schwartz as well as the Tsitsiklis-Van Roy algorithms perform very well.

Figure 1: Put-payoff with exercise price 90.

Figure 2: Boxplots for the realized option prices for the put-payoff of the Tsitsiklis-Van Roy (price TR), Longstaff-Schwartz (price LS) algorithms and our algorithm (price KKT). In the boxplot the box stretches from the 25th percentile to the 75th percentile and the median is shown as a line across the box.

In our second example we make the pricing problem more difficult. We consider $m = 48$ time steps, a strangle spread payoff with strikes 50, 90, 110 and 150 as illustrated in Figure 3, and a large volatility of $\sigma = 0.5$. Figure 4 shows that our algorithm provides a higher MCE of the option price than the Longstaff-Schwartz and Tsitsiklis-Van Roy algorithms.

Figure 3: Strangle spread payoff with strike prices 50, 90, 110 and 150.

Figure 4: Realized option prices for the strangle spread-payoff of the Tsitsiklis-Van Roy (price TR), Longstaff-Schwartz (price LS) and our algorithm (price KKT) in a 1-dimensional case.

Finally, in our third example we consider the high-dimensional case and use for the pricing problem a strangle spread function with strikes 75, 90, 110 and 125 for the average of five correlated stock prices. The stocks are ADECCO R, BALOISE R, CIBA, CLARIANT and CREDIT SUISSE R. The stock prices were observed from Nov. 10, 2000 until Oct. 3, 2003 on weekdays when the stock market was open for the total of 756 days. We estimate the volatility from data observed in the past by the historical volatility

$$\sigma = \begin{pmatrix}
0.3024 & 0.354 & 0.0722 & 0.367 & 0.64 \\
0.354 & 0.2270 & 0.063 & 0.264 & 0.60 \\
0.0722 & 0.063 & 0.077 & 0.0884 & 0.0699 \\
0.367 & 0.264 & 0.0884 & 0.2937 & 0.394 \\
0.64 & 0.60 & 0.0699 & 0.394 & 0.2535
\end{pmatrix}.$$

Again we used $x_0 = 100$, $r = 0.05$ and $m = 48$. As we can see in Figure 5, our algorithm is superior to the Longstaff-Schwartz and Tsitsiklis-Van Roy algorithms, since the higher boxplot of the MCE again indicates better performance.

Figure 5: Realized option prices for the strangle spread-payoff of the Tsitsiklis-Van Roy (price TR), Longstaff-Schwartz (price LS) and our algorithm (price KKT) in a 5-dimensional case.

5 Proofs

5.1 Auxiliary results

In the sequel we formulate auxiliary results which will be needed in the derivation of the rate of convergence. We start by defining so-called covering numbers: Let $x_1,\dots,x_n \in \mathbb{R}^d$ and set $x_1^n = (x_1,\dots,x_n)$. Define the distance $d_2(f,g)$ between
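$f, g : \mathbb{R}^d \to \mathbb{R}$ by

$$d_2(f,g) = \Big( \frac{1}{n} \sum_{i=1}^{n} |f(x_i) - g(x_i)|^2 \Big)^{1/2}.$$

An $\epsilon$-cover of $\mathcal{F}$ (w.r.t. the distance $d_2$) is a set of functions $f_1,\dots,f_\kappa : \mathbb{R}^d \to \mathbb{R}$ with the property

$$\min_{1 \le j \le \kappa} d_2(f,f_j) < \epsilon \quad \text{for all } f \in \mathcal{F}.$$

Let $\mathcal{N}_2(\epsilon, \mathcal{F}, x_1^n)$ denote the size $\kappa$ of the smallest $\epsilon$-cover of $\mathcal{F}$ w.r.t. the distance $d_2$, and set $\mathcal{N}_2(\epsilon, \mathcal{F}, x_1^n) = \infty$ if there does not exist any $\epsilon$-cover of $\mathcal{F}$ of a finite size. $\mathcal{N}_2(\epsilon, \mathcal{F}, x_1^n)$ is called the $L_2$-$\epsilon$-covering number of $\mathcal{F}$ on $x_1^n$.

In the appendix we will prove the following bound on the covering number of $\mathcal{F}_k(\beta_n)$, where $\mathcal{F}_k(\beta_n)$ is defined by (8).

Lemma 1 Let $\mathcal{F}_k(\beta_n)$ be defined by (8), let $\epsilon > 0$ and let $x_1^n \in (\mathbb{R}^d)^n$. Then

$$\mathcal{N}_2(\epsilon, \mathcal{F}_k(\beta_n), x_1^n) \le \Big( \frac{12 e \beta_n (k+1)}{\epsilon} \Big)^{(4d+9)k+1}. \tag{16}$$

In the proof we will use results concerning regression estimation in case of additional measurement errors in the dependent variable, which we describe in the sequel. Let $(X,Y), (X_1,Y_1), (X_2,Y_2), \dots$ be independent and identically distributed $\mathbb{R}^d \times \mathbb{R}$-valued random variables with $\mathbf{E} Y^2 < \infty$. Let $m(x) = \mathbf{E}\{Y \mid X = x\}$ be the corresponding regression function. Assume that we want to estimate $m$ from observed data, but instead of a sample

$$\mathcal{D}_n = \{(X_1,Y_1),\dots,(X_n,Y_n)\}$$

of $(X,Y)$ we have only available a set of data

$$\bar{\mathcal{D}}_n = \{(X_1,\hat Y_{1,n}),\dots,(X_n,\hat Y_{n,n})\}$$

where the only assumption on $\hat Y_{1,n},\dots,\hat Y_{n,n}$ is that the measurement error

$$\frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat Y_{i,n}|^2 \tag{17}$$

is small. In particular we do not assume that the random variables in $\bar{\mathcal{D}}_n$ are independent or identically distributed. In the sequel we are interested in the influence of the measurement error (17) on the $L_2$ error of a regression estimate applied to the data $\bar{\mathcal{D}}_n$.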
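The setting with perturbed responses can be illustrated numerically. The sketch below is an assumption-laden toy example and not part of the article: it uses a linear function class (polynomials of degree 3) instead of neural networks, and the regression function, the noise level and the form of the perturbation are all made up for illustration. For such a linear class the least squares fit is an orthogonal projection, so the empirical squared distance between the fit on $Y_i$ and the fit on $\hat Y_{i,n}$ is at most the measurement error (17); the results described below concern more general classes, which this sketch does not verify.

```python
import numpy as np


def measurement_error(y, y_hat):
    """The quantity (1/n) * sum_i |Y_i - Yhat_i|^2 from (17)."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))


def least_squares_fit(x, y, degree=3):
    """Fitted values at x of the least squares polynomial of the given degree."""
    basis = np.vander(x, degree + 1)
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return basis @ coef


rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=n)    # responses Y_i (toy model)
y_hat = y + 0.05 * np.cos(5.0 * x) * np.sign(y)     # perturbed, dependent responses

err = measurement_error(y, y_hat)
fit_clean = least_squares_fit(x, y)
fit_perturbed = least_squares_fit(x, y_hat)
gap = float(np.mean((fit_clean - fit_perturbed) ** 2))
print(f"measurement error (17): {err:.5f}, squared gap between the fits: {gap:.5f}")
```

The perturbation here depends on both $x$ and $y$, so the perturbed responses are neither independent of the data nor identically distributed noise, which is the situation the text describes.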

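As we do not assume anything on the difference between the true $y$-values $Y_i$ and the observed values $\hat Y_{i,n}$ besides the assumption that (17) is small, it is clear that there is no chance to get rid of this measurement error completely. But a natural conjecture is that a small measurement error (17) does only slightly influence the $L_2$ error of suitably defined regression estimates. That this conjecture is indeed true was proven for the least squares estimates in Kohler (2006a). Next we describe the part of this result which will be needed in the proof of our main result.

Assume $Y_i, \hat Y_{i,n} \in [-L, L]$ a.s. $(i = 1,\dots,n)$ and define the estimate $m_n$ by

$$m_n = \arg\min_{f \in \mathcal{F}_n} \frac{1}{n} \sum_{i=1}^{n} |f(X_i) - \hat Y_{i,n}|^2,$$

where $\mathcal{F}_n$ is a set of functions $f : \mathbb{R}^d \to \mathbb{R}$. Then the following result holds.

Lemma 2 Assume that $Y - m(X)$ is sub-Gaussian in the sense that

$$C^2 \, \mathbf{E}\Big\{ e^{(Y - m(X))^2 / C^2} - 1 \,\Big|\, X \Big\} \le \sigma_0^2 \quad \text{almost surely} \tag{18}$$

for some $C, \sigma_0 > 0$. Let $\beta_n, L \ge 1$ and assume that the regression function is bounded in absolute value by $L$ and that $\beta_n$ satisfies $\beta_n \to \infty$ $(n \to \infty)$. Let $\mathcal{F}_n$ be a set of functions $f : \mathbb{R}^d \to [-\beta_n, \beta_n]$ and define the estimate $m_n$ as above. Then there exist constants $c_1, c_2, c_3 > 0$ depending only on $\sigma_0$ and $C$ such that for any $\delta_n$ which satisfies

$$\delta_n \to 0 \quad (n \to \infty) \qquad \text{and} \qquad \frac{n \cdot \delta_n}{\beta_n^2} \to \infty \quad (n \to \infty)$$

and

$$c_1 \frac{\sqrt{n}\,\delta}{\beta_n^2} \ge \int_{c_2 \delta / \beta_n^2}^{\sqrt{\delta}} \Big( \log \mathcal{N}_2\Big( \frac{u}{4\beta_n}, \Big\{ f - g : f \in \mathcal{F}_n,\; \frac{1}{n}\sum_{i=1}^{n} |f(x_i) - g(x_i)|^2 \le \frac{\delta}{\beta_n^2} \Big\}, x_1^n \Big) \Big)^{1/2} du$$

for all $\delta \ge \delta_n$, all $x_1,\dots,x_n \in \mathbb{R}^d$ and all $g \in \mathcal{F}_n \cup \{m\}$ we have

$$\mathbf{P}\Big\{ \int |m_n(x) - m(x)|^2 \mu(dx) > c_3 \Big( \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat Y_{i,n}|^2 + \delta_n + \inf_{f \in \mathcal{F}_n} \int |f(x) - m(x)|^2 \mu(dx) \Big) \Big\} \to 0$$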
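for $n \to \infty$.

Proof. See proof of Theorem 1 in Kohler (2006a) and observe that we can assume $\beta_n \ge L$ since $\beta_n \to \infty$ for $n \to \infty$.

The above lemma enables us to analyze the rate of convergence of the estimate for a fixed function space. Next we explain how we can use the data to choose an appropriate function space from a finite collection $\{\mathcal{F}_{n,k} : k \in \mathcal{P}_n\}$ of function spaces. To do this we split the sample into a learning sample

$$\hat{\mathcal{D}}_{n_l} = \big\{ (X_1,\hat Y_{1,n}),\dots,(X_{n_l},\hat Y_{n_l,n}) \big\}$$

of size $n_l = \lceil n/2 \rceil$ and a testing sample

$$\big\{ (X_{n_l+1},\hat Y_{n_l+1,n}),\dots,(X_n,\hat Y_{n,n}) \big\}$$

of size $n_t = n - n_l$. For fixed $k \in \mathcal{P}_n$ we use the learning sample to define an estimate $m_{n_l}^{k}$ by

$$m_{n_l}^{k} = \arg\min_{f \in \mathcal{F}_{n,k}} \frac{1}{n_l} \sum_{i=1}^{n_l} |f(X_i) - \hat Y_{i,n}|^2.$$

Next we choose $\hat k \in \mathcal{P}_n$ by minimizing the empirical $L_2$ risk on the testing sample, i.e., we set

$$m_n(x) = m_{n_l}^{\hat k}(x) \quad (x \in \mathbb{R}^d),$$

where

$$\hat k = \arg\min_{k \in \mathcal{P}_n} \frac{1}{n_t} \sum_{i=n_l+1}^{n} |m_{n_l}^{k}(X_i) - \hat Y_{i,n}|^2.$$

Then the following result holds.

Lemma 3 Assume that $Y - m(X)$ is sub-Gaussian in the sense that (18) holds for some $C, \sigma_0 > 0$ and assume $|\mathcal{P}_n| \to \infty$ $(n \to \infty)$. Assume furthermore that conditioned on $X_1,\dots,X_n$ the data sets $\hat{\mathcal{D}}_{n_l}$ and $\{Y_{n_l+1},\dots,Y_n\}$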
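are independent. Let for each $k \in \mathcal{P}_n$ a set $\mathcal{F}_{n,k}$ of functions $f : \mathbb{R}^d \to \mathbb{R}$ be given and let the estimate $m_n$ be defined as above. Then

$$\frac{1}{n_t} \sum_{i=n_l+1}^{n} |m_n(X_i) - m(X_i)|^2 = O_P\Big( \frac{\log |\mathcal{P}_n|}{n_t} + \frac{1}{n_t} \sum_{i=n_l+1}^{n} |Y_i - \hat Y_{i,n}|^2 + \min_{k \in \mathcal{P}_n} \frac{1}{n_t} \sum_{i=n_l+1}^{n} |m_{n_l}^{k}(X_i) - m(X_i)|^2 \Big).$$

Proof. The result follows by applying Lemma 2 in Kohler (2006a) conditioned on $\hat{\mathcal{D}}_{n_l}$ and $X_1,\dots,X_n$ and with $\mathcal{F}_n = \{m_{n_l}^{k} : k \in \mathcal{P}_n\}$. Here we bound the covering number by the finite cardinality $|\mathcal{P}_n|$ of the set of estimates.

5.2 Proof of Theorem 1

Before we start with the proof, observe that the boundedness of the discounted payoff function $f_t$ by $L$ implies

$$|q_t(x)| \le L \quad \text{for } x \in \mathbb{R}^d.$$

W.l.o.g. assume $\beta_n \ge L$ since $\beta_n \to \infty$ for $n$ tending to infinity. In the sequel we will show

$$\int |\hat q_{n,s}(x) - q_s(x)|^2 \,\mathbf{P}_{X_s}(dx) = O_P\Big( \frac{\beta_n^4 \cdot k_n \cdot \log n}{n} + \max_{t \in \{s,s+1,\dots,T-1\}} \inf_{f \in \mathcal{F}_{k_n}(\beta_n)} \int |f(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) \Big) \tag{19}$$

for all $s \in \{0,1,\dots,T\}$. For $s = T$ we have

$$\hat q_{n,T}(x) = 0 = q_T(x),$$

so the assertion is trivial. So let $t < T$ and assume that the assertion holds for $s \in \{t+1,\dots,T\}$. By induction it suffices to show (19) for $s = t$, which we will show in the sequel in seven steps.

In the first step of the proof we show

$$\int |\hat q_{n,t}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) = O_P\Big( \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n,t}(X_{i,t}^{(t)}) - q_t(X_{i,t}^{(t)})\big|^2 + \frac{\beta_n^4 \log |\mathcal{P}_n|}{n_t} \Big).$$

Let $\mathcal{D}_{n,t}$ be the set of all $X_{j,s}^{(r)}$ with either $r \ge t+1$, $s \in \{0,\dots,T\}$ and $j \in \{1,\dots,n\}$ or $r = t$, $s \in \{0,\dots,T\}$ and $j \in \{1,\dots,n_l\}$. Conditioned on $\mathcal{D}_{n,t}$,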
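$\{\hat q_{n_l,t}^{k} : k \in \mathcal{P}_n\}$ consists of $|\mathcal{P}_n|$ different functions. Furthermore, because of boundedness of $\hat q_{n_l,t}^{k}$ and $q_t$ by $\beta_n$ we have

$$\sigma_k^2 := \mathbf{Var}\Big\{ \big|\hat q_{n_l,t}^{k}(X_{n_l+1,t}^{(t)}) - q_t(X_{n_l+1,t}^{(t)})\big|^2 \,\Big|\, \mathcal{D}_{n,t} \Big\} \le \mathbf{E}\Big\{ \big|\hat q_{n_l,t}^{k}(X_{n_l+1,t}^{(t)}) - q_t(X_{n_l+1,t}^{(t)})\big|^4 \,\Big|\, \mathcal{D}_{n,t} \Big\} \le 4\beta_n^2 \int |\hat q_{n_l,t}^{k}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx).$$

Using this and the Bernstein inequality (cf., e.g., Lemma A.2 in Györfi et al. (2002)) we get (using the notation $\epsilon_n = c_4 \beta_n^4 \log |\mathcal{P}_n| / n_t$):

$$\mathbf{P}\Big\{ \int |\hat q_{n,t}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) > (4\beta_n^2 + 1) \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n,t}(X_{i,t}^{(t)}) - q_t(X_{i,t}^{(t)})\big|^2 + \epsilon_n \,\Big|\, \mathcal{D}_{n,t} \Big\}$$
$$\le |\mathcal{P}_n| \max_{k \in \mathcal{P}_n} \mathbf{P}\Big\{ \int |\hat q_{n_l,t}^{k}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) > (4\beta_n^2 + 1) \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n_l,t}^{k}(X_{i,t}^{(t)}) - q_t(X_{i,t}^{(t)})\big|^2 + \epsilon_n \,\Big|\, \mathcal{D}_{n,t} \Big\}$$
$$\le |\mathcal{P}_n| \max_{k \in \mathcal{P}_n} \mathbf{P}\Big\{ \int |\hat q_{n_l,t}^{k}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) + 4\beta_n^2 \int |\hat q_{n_l,t}^{k}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) > (4\beta_n^2 + 1) \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n_l,t}^{k}(X_{i,t}^{(t)}) - q_t(X_{i,t}^{(t)})\big|^2 + \epsilon_n + \sigma_k^2 \,\Big|\, \mathcal{D}_{n,t} \Big\}$$
$$= |\mathcal{P}_n| \max_{k \in \mathcal{P}_n} \mathbf{P}\Big\{ \int |\hat q_{n_l,t}^{k}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) - \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n_l,t}^{k}(X_{i,t}^{(t)}) - q_t(X_{i,t}^{(t)})\big|^2 > \frac{\sigma_k^2 + \epsilon_n}{4\beta_n^2 + 1} \,\Big|\, \mathcal{D}_{n,t} \Big\}$$
$$\le |\mathcal{P}_n| \max_{k \in \mathcal{P}_n} \exp\left( - \frac{n_t \Big( \frac{\sigma_k^2 + \epsilon_n}{4\beta_n^2 + 1} \Big)^2}{2\sigma_k^2 + 2\,\frac{\sigma_k^2 + \epsilon_n}{4\beta_n^2 + 1} \cdot \frac{4\beta_n^2}{3}} \right)$$
$$\le |\mathcal{P}_n| \max_{k \in \mathcal{P}_n} \exp\left( - \frac{n_t (\sigma_k^2 + \epsilon_n)}{2(4\beta_n^2 + 1)^2 + 2(4\beta_n^2 + 1) \frac{4\beta_n^2}{3}} \right)$$
$$\le |\mathcal{P}_n| \exp\left( - \frac{c_4 \, \beta_n^4 \log |\mathcal{P}_n|}{2(4\beta_n^2 + 1)^2 + 2(4\beta_n^2 + 1) \frac{4\beta_n^2}{3}} \right) \to 0 \quad (n \to \infty),$$

provided we choose $c_4$ sufficiently large.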

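In the second step of the proof we show

$$\frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n,t}(X_{i,t}^{(t)}) - q_t(X_{i,t}^{(t)})\big|^2 = O_P\Big( \frac{\log |\mathcal{P}_n|}{n_t} + \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n,t+1}(X_{i,t+1}^{(t)}) - q_{t+1}(X_{i,t+1}^{(t)})\big|^2 + \min_{k \in \mathcal{P}_n} \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n_l,t}^{k}(X_{i,t}^{(t)}) - q_t(X_{i,t}^{(t)})\big|^2 \Big).$$

To do this we apply Lemma 3. In the context of Lemma 3 we have

$$X_i = X_{i,t}^{(t)}, \quad Y_i = \max\{f_{t+1}(X_{i,t+1}^{(t)}), q_{t+1}(X_{i,t+1}^{(t)})\} \quad \text{and} \quad \hat Y_{i,n} = \max\{f_{t+1}(X_{i,t+1}^{(t)}), \hat q_{n,t+1}(X_{i,t+1}^{(t)})\}. \tag{20}$$

Observing

$$\frac{1}{n_t} \sum_{i=n_l+1}^{n} |Y_i - \hat Y_{i,n}|^2 \le \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|q_{t+1}(X_{i,t+1}^{(t)}) - \hat q_{n,t+1}(X_{i,t+1}^{(t)})\big|^2$$

the assertion follows from Lemma 3 if we apply it conditioned on $\mathcal{D}_{n,t}$.

In the third step of the proof we show

$$\frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n,t+1}(X_{i,t+1}^{(t)}) - q_{t+1}(X_{i,t+1}^{(t)})\big|^2 = O_P\Big( \int |\hat q_{n,t+1}(x) - q_{t+1}(x)|^2 \,\mathbf{P}_{X_{t+1}}(dx) + \frac{\beta_n^4 \log |\mathcal{P}_n|}{n_t} \Big).$$

Using

$$\mathbf{P}\Big\{ \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n,t+1}(X_{i,t+1}^{(t)}) - q_{t+1}(X_{i,t+1}^{(t)})\big|^2 > (4\beta_n^2 + 1) \int |\hat q_{n,t+1}(x) - q_{t+1}(x)|^2 \,\mathbf{P}_{X_{t+1}}(dx) + \epsilon_n \,\Big|\, \mathcal{D}_{n,t} \Big\}$$
$$= \mathbf{P}\Big\{ \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n,t+1}(X_{i,t+1}^{(t)}) - q_{t+1}(X_{i,t+1}^{(t)})\big|^2 - \int |\hat q_{n,t+1}(x) - q_{t+1}(x)|^2 \,\mathbf{P}_{X_{t+1}}(dx) > 4\beta_n^2 \int |\hat q_{n,t+1}(x) - q_{t+1}(x)|^2 \,\mathbf{P}_{X_{t+1}}(dx) + \epsilon_n \,\Big|\, \mathcal{D}_{n,t} \Big\}$$

this follows as in the first step by an application of the Bernstein inequality.

In the fourth step of the proof we show

$$\min_{k \in \mathcal{P}_n} \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n_l,t}^{k}(X_{i,t}^{(t)}) - q_t(X_{i,t}^{(t)})\big|^2 = O_P\Big( \int |\hat q_{n_l,t}^{k_n}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) + \frac{\beta_n^4 \log |\mathcal{P}_n|}{n_t} \Big).$$

To see this, we observe that we have as in the third step of the proof

$$\frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n_l,t}^{k_n}(X_{i,t}^{(t)}) - q_t(X_{i,t}^{(t)})\big|^2 = O_P\Big( \int |\hat q_{n_l,t}^{k_n}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) + \frac{\beta_n^4 \log |\mathcal{P}_n|}{n_t} \Big),$$

hence the assertion follows from

$$\min_{k \in \mathcal{P}_n} \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n_l,t}^{k}(X_{i,t}^{(t)}) - q_t(X_{i,t}^{(t)})\big|^2 \le \frac{1}{n_t} \sum_{i=n_l+1}^{n} \big|\hat q_{n_l,t}^{k_n}(X_{i,t}^{(t)}) - q_t(X_{i,t}^{(t)})\big|^2.$$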

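In the fifth step of the proof we show

$$\int |\hat q_{n_l,t}^{k_n}(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) = O_P\Big( \frac{1}{n_l} \sum_{i=1}^{n_l} |Y_i - \hat Y_{i,n}|^2 + \delta_n + \inf_{f \in \mathcal{F}_{k_n}(\beta_n)} \int |f(x) - q_t(x)|^2 \,\mathbf{P}_{X_t}(dx) \Big)$$

where $\delta_n = c_6 \cdot \beta_n^4 \cdot k_n \cdot \log n / n$ and $Y_i$ and $\hat Y_{i,n}$ are defined in (20). To do this we show that with this choice of $\delta_n$ the conditions of Lemma 2 are satisfied. Observe that $Y$ is bounded in absolute value by $L$, hence (18) holds. By Lemma 1 we get for $g \in \mathcal{F}_{k_n}(\beta_n) \cup \{q_t\}$

$$\mathcal{N}_2\Big( \frac{u}{4\beta_n}, \Big\{ f - g : f \in \mathcal{F}_{k_n}(\beta_n),\; \frac{1}{n} \sum_{i=1}^{n} |f(x_i) - g(x_i)|^2 \le \frac{\delta}{\beta_n^2} \Big\}, x_1^n \Big) \le \mathcal{N}_2\Big( \frac{u}{4\beta_n}, \mathcal{F}_{k_n}(\beta_n), x_1^n \Big) \le \Big( \frac{48 e \beta_n^2 (k_n + 1)}{u} \Big)^{(4d+9)k_n+1},$$

thus

$$\int_{c_2 \delta / \beta_n^2}^{\sqrt{\delta}} \Big( \log \mathcal{N}_2\Big( \frac{u}{4\beta_n}, \Big\{ f - g : f \in \mathcal{F}_{k_n}(\beta_n),\; \frac{1}{n} \sum_{i=1}^{n} |f(x_i) - g(x_i)|^2 \le \frac{\delta}{\beta_n^2} \Big\}, x_1^n \Big) \Big)^{1/2} du \le \int_{c_2 \delta / \beta_n^2}^{\sqrt{\delta}} \Big( \log \Big( \frac{48 e \beta_n^2 (k_n + 1)}{u} \Big)^{(4d+9)k_n+1} \Big)^{1/2} du.$$

Let $\delta > 1/n$. Then by bounding $u$ from below by $c_2 \delta / \beta_n^2$ and using a constant $c_5 > 0$ we get

$$\int_{c_2 \delta / \beta_n^2}^{\sqrt{\delta}} \Big( \log \Big( \frac{48 e \beta_n^2 (k_n + 1)}{u} \Big)^{(4d+9)k_n+1} \Big)^{1/2} du \le \int_{c_2 \delta / \beta_n^2}^{\sqrt{\delta}} \Big( \log \Big( \frac{48 e \beta_n^4 (k_n + 1)}{c_2 \delta} \Big)^{(4d+9)k_n+1} \Big)^{1/2} du$$
$$\le \int_{c_2 \delta / \beta_n^2}^{\sqrt{\delta}} \Big( \log \Big( \frac{48 e \beta_n^4 (k_n + 1) \, n}{c_2} \Big)^{(4d+9)k_n+1} \Big)^{1/2} du \le c_5 \cdot \sqrt{\delta} \cdot \sqrt{k_n} \cdot (\log n)^{1/2}.$$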

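This together with

$$c_1 \frac{\sqrt{n}\,\delta}{\beta_n^2} \ge c_1 \frac{\sqrt{n}\,\sqrt{\delta}}{\beta_n^2} \sqrt{\frac{c_6 \beta_n^4 k_n \log n}{n}} \ge c_5 \cdot \sqrt{\delta} \cdot \sqrt{k_n} \cdot (\log n)^{1/2}$$

shows that

$$\delta_n := c_6 \cdot \frac{\beta_n^4 \cdot k_n \cdot \log n}{n}$$

satisfies the condition of Lemma 2.

In the sixth step of the proof we show

$$\frac{1}{n_l} \sum_{i=1}^{n_l} |Y_i - \hat Y_{i,n}|^2 = O_P\Big( \int |\hat q_{n,t+1}(x) - q_{t+1}(x)|^2 \,\mathbf{P}_{X_{t+1}}(dx) + \frac{\beta_n^4 \log |\mathcal{P}_n|}{n_l} \Big).$$

First we observe

$$\frac{1}{n_l} \sum_{i=1}^{n_l} |Y_i - \hat Y_{i,n}|^2 \le \frac{1}{n_l} \sum_{i=1}^{n_l} \big|\hat q_{n,t+1}(X_{i,t+1}^{(t)}) - q_{t+1}(X_{i,t+1}^{(t)})\big|^2.$$

To show

$$\frac{1}{n_l} \sum_{i=1}^{n_l} \big|\hat q_{n,t+1}(X_{i,t+1}^{(t)}) - q_{t+1}(X_{i,t+1}^{(t)})\big|^2 = O_P\Big( \int |\hat q_{n,t+1}(x) - q_{t+1}(x)|^2 \,\mathbf{P}_{X_{t+1}}(dx) + \frac{\beta_n^4 \log |\mathcal{P}_n|}{n_l} \Big)$$

we condition on all data points $X_{j,s}^{(r)}$ with $r \ge t+1$, $s \in \{0,\dots,T\}$ and $j \in \{1,\dots,n\}$. Then the assertion follows by an application of the Bernstein inequality as in steps 1 and 3.

In the seventh and last step of the proof we observe that we get by induction

$$\int |\hat q_{n,t+1}(x) - q_{t+1}(x)|^2 \,\mathbf{P}_{X_{t+1}}(dx) = O_P\Big( \frac{\beta_n^4 \cdot k_n \cdot \log n}{n} + \max_{s \in \{t+1,\dots,T-1\}} \inf_{f \in \mathcal{F}_{k_n}(\beta_n)} \int |f(x) - q_s(x)|^2 \,\mathbf{P}_{X_s}(dx) \Big).$$

We complete the proof by gathering the above results.

6 Appendix

Lemma 4 Let $\mathcal{F}$ and $\mathcal{G}$ be two families of real functions on $\mathbb{R}^m$. If $\mathcal{F} \oplus \mathcal{G}$ denotes the set of functions $\{f + g : f \in \mathcal{F}, g \in \mathcal{G}\}$, then for any $z_1^n \in (\mathbb{R}^m)^n$ and $\epsilon, \delta > 0$, we have

$$\mathcal{N}_2(\epsilon + \delta, \mathcal{F} \oplus \mathcal{G}, z_1^n) \le \mathcal{N}_2(\epsilon, \mathcal{F}, z_1^n) \cdot \mathcal{N}_2(\delta, \mathcal{G}, z_1^n).$$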

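Proof of Lemma 4. Let $\{f_1,\dots,f_K\}$ and $\{g_1,\dots,g_\Lambda\}$ be an $\epsilon$-cover and a $\delta$-cover of $\mathcal{F}$ and $\mathcal{G}$, respectively, on $z_1^n$ of minimal size. Then, for every $f \in \mathcal{F}$ and $g \in \mathcal{G}$, there exist $\kappa \in \{1,\dots,K\}$ and $\lambda \in \{1,\dots,\Lambda\}$ such that

$$\Big( \frac{1}{n} \sum_{i=1}^{n} |f(z_i) - f_\kappa(z_i)|^2 \Big)^{1/2} < \epsilon \qquad \text{and} \qquad \Big( \frac{1}{n} \sum_{i=1}^{n} |g(z_i) - g_\lambda(z_i)|^2 \Big)^{1/2} < \delta.$$

By the triangle inequality for norms we have

$$\Big( \frac{1}{n} \sum_{i=1}^{n} |f(z_i) + g(z_i) - f_\kappa(z_i) - g_\lambda(z_i)|^2 \Big)^{1/2} \le \Big( \frac{1}{n} \sum_{i=1}^{n} |f(z_i) - f_\kappa(z_i)|^2 \Big)^{1/2} + \Big( \frac{1}{n} \sum_{i=1}^{n} |g(z_i) - g_\lambda(z_i)|^2 \Big)^{1/2} < \epsilon + \delta,$$

which proves that $\{f_\kappa + g_\lambda : 1 \le \kappa \le K,\; 1 \le \lambda \le \Lambda\}$ is an $(\epsilon + \delta)$-cover of $\mathcal{F} \oplus \mathcal{G}$ on $z_1^n$.

Lemma 5 Let $\mathcal{F}$ and $\mathcal{G}$ be two families of real functions on $\mathbb{R}^m$ such that $|f(x)| \le M_1$ and $|g(x)| \le M_2$ for all $x \in \mathbb{R}^m$, $f \in \mathcal{F}$, $g \in \mathcal{G}$. If $\mathcal{F} \odot \mathcal{G}$ denotes the set of functions $\{f \cdot g : f \in \mathcal{F}, g \in \mathcal{G}\}$ then, for any $z_1^n \in (\mathbb{R}^m)^n$ and $\epsilon, \delta > 0$ we have

$$\mathcal{N}_2(\epsilon + \delta, \mathcal{F} \odot \mathcal{G}, z_1^n) \le \mathcal{N}_2(\epsilon / M_2, \mathcal{F}, z_1^n) \cdot \mathcal{N}_2(\delta / M_1, \mathcal{G}, z_1^n).$$

Proof of Lemma 5. Let $\{f_1,\dots,f_K\}$ and $\{g_1,\dots,g_\Lambda\}$ be an $\epsilon/M_2$-cover and a $\delta/M_1$-cover of $\mathcal{F}$ and $\mathcal{G}$, respectively, on $z_1^n$ of minimal size. By the boundedness of $f$ and $g$ we can assume w.l.o.g. $|f_\kappa(z)| \le M_1$, $|g_\lambda(z)| \le M_2$. For every $f \in \mathcal{F}$ and $g \in \mathcal{G}$, there exist $\kappa \in \{1,\dots,K\}$ and $\lambda \in \{1,\dots,\Lambda\}$ such that

$$\Big( \frac{1}{n} \sum_{i=1}^{n} |f(z_i) - f_\kappa(z_i)|^2 \Big)^{1/2} < \frac{\epsilon}{M_2} \qquad \text{and} \qquad \Big( \frac{1}{n} \sum_{i=1}^{n} |g(z_i) - g_\lambda(z_i)|^2 \Big)^{1/2} < \frac{\delta}{M_1}.$$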

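We have, by the triangle inequality of norms

$$\Big( \frac{1}{n} \sum_{i=1}^{n} |f(z_i) g(z_i) - f_\kappa(z_i) g_\lambda(z_i)|^2 \Big)^{1/2}$$
$$= \Big( \frac{1}{n} \sum_{i=1}^{n} \big|f(z_i) \big(g_\lambda(z_i) + g(z_i) - g_\lambda(z_i)\big) - f_\kappa(z_i) g_\lambda(z_i)\big|^2 \Big)^{1/2}$$
$$\le \Big( \frac{1}{n} \sum_{i=1}^{n} \big|g_\lambda(z_i) \big(f(z_i) - f_\kappa(z_i)\big)\big|^2 \Big)^{1/2} + \Big( \frac{1}{n} \sum_{i=1}^{n} \big|f(z_i) \big(g(z_i) - g_\lambda(z_i)\big)\big|^2 \Big)^{1/2}$$
$$\le M_2 \Big( \frac{1}{n} \sum_{i=1}^{n} |f(z_i) - f_\kappa(z_i)|^2 \Big)^{1/2} + M_1 \Big( \frac{1}{n} \sum_{i=1}^{n} |g(z_i) - g_\lambda(z_i)|^2 \Big)^{1/2} < \epsilon + \delta,$$

which implies that $\{f_\kappa g_\lambda : 1 \le \kappa \le K,\; 1 \le \lambda \le \Lambda\}$ is an $(\epsilon + \delta)$-cover of $\mathcal{F} \odot \mathcal{G}$ on $z_1^n$.

Proof of Lemma 1. Define the following classes of functions:

$$\mathcal{G}_1 = \{a^T x + b : a \in \mathbb{R}^d, b \in \mathbb{R}\},$$
$$\mathcal{G}_2 = \{\sigma(a^T x + b) : a \in \mathbb{R}^d, b \in \mathbb{R}\},$$
$$\mathcal{G}_3 = \{c \cdot \sigma(a^T x + b) : a \in \mathbb{R}^d, b \in \mathbb{R}, c \in [-\beta_n, \beta_n]\},$$

where $\sigma : \mathbb{R} \to [0,1]$ is a sigmoid function, i.e. $\sigma$ is a nondecreasing function with the property $\lim_{x \to -\infty} \sigma(x) = 0$ and $\lim_{x \to \infty} \sigma(x) = 1$, and $\beta_n > 0$. $\mathcal{G}_1$ is a linear vector space of dimension $d+1$, thus Theorem 9.5 in Györfi et al. (2002) implies

$$V_{\mathcal{G}_1^+} \le d + 2,$$

where $\mathcal{G}^+$ denotes the set

$$\mathcal{G}^+ = \big\{ \{(z,t) \in \mathbb{R}^d \times \mathbb{R} : t \le g(z)\} \; ; \; g \in \mathcal{G} \big\}$$

of all subgraphs of functions of $\mathcal{G}$ and $V_{\mathcal{G}^+}$ is the so-called VC-dimension of $\mathcal{G}^+$ (see Györfi et al. (2002), Definition 9.6). Since $\sigma$ is a nondecreasing function, Lemma 16.3 in Györfi et al. (2002) yields

$$V_{\mathcal{G}_2^+} \le d + 2.$$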

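Thus, by Theorem 9.4 in Györfi et al. (2002), we have for $0 < \epsilon < 1/4$

$$\mathcal{N}_2(\epsilon, \mathcal{G}_2, x_1^n) \le \mathcal{M}_2(\epsilon, \mathcal{G}_2, x_1^n) \le 3 \Big( \frac{2e}{\epsilon^2} \log \frac{3e}{\epsilon^2} \Big)^{d+2} \le 3 \Big( \frac{3e}{\epsilon^2} \Big)^{2d+4}.$$

By Lemma 5 we have for $0 < \epsilon/(2\beta_n) < 1/4$ or equivalently $0 < \epsilon < \beta_n/2$

$$\mathcal{N}_2(\epsilon, \mathcal{G}_3, x_1^n) \le \mathcal{N}_2\Big( \frac{\epsilon}{2}, \{c : |c| \le \beta_n\}, x_1^n \Big) \cdot \mathcal{N}_2\Big( \frac{\epsilon}{2\beta_n}, \mathcal{G}_2, x_1^n \Big) \le \frac{2\beta_n}{\epsilon/2} \cdot 3 \Big( \frac{3e}{(\epsilon/(2\beta_n))^2} \Big)^{2d+4} \le \Big( \frac{12 e \beta_n}{\epsilon} \Big)^{4d+9}.$$

By applying Lemma 4 we obtain for $0 < \epsilon < (k+1) \cdot \beta_n/2$

$$\mathcal{N}_2(\epsilon, \mathcal{F}_k(\beta_n), x_1^n) \le \mathcal{N}_2\Big( \frac{\epsilon}{k+1}, \{c_0 : |c_0| \le \beta_n\}, x_1^n \Big) \cdot \Big( \mathcal{N}_2\Big( \frac{\epsilon}{k+1}, \mathcal{G}_3, x_1^n \Big) \Big)^{k}$$
$$\le \frac{2\beta_n (k+1)}{\epsilon} \cdot \Big( \frac{12 e \beta_n (k+1)}{\epsilon} \Big)^{(4d+9)k} \le \Big( \frac{12 e \beta_n (k+1)}{\epsilon} \Big)^{(4d+9)k+1}. \tag{21}$$

By boundedness of $\mathcal{F}_k(\beta_n)$, the proof is trivial for $\epsilon \ge (k+1) \cdot \beta_n/2 \ge \beta_n$, which completes the proof.

References

[1] Barty, K., Girardeau, P., Roy, J.-S., and Strugarek, C. (2006). Application of kernel-based stochastic gradient algorithms to option pricing. Preprint.

[2] Breiman, L., Friedman, J. H., Olshen, R., and Stone, C. J. (1984). Classification and regression trees. Wadsworth, Belmont, CA.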

[3] Chow, Y. S., Robbins, H., and Siegmund, D. (1971). Great Expectations: The Theory of Optimal Stopping. Houghton Mifflin, Boston.

[4] Cover, T. (1968). Rates of convergence for nearest neighbor procedures. In: Proceedings of the Hawaii International Conference on System Sciences, pp. 413-415, Honolulu, HI.

[5] Devroye, L. (1982). Necessary and sufficient conditions for the almost everywhere convergence of nearest neighbor regression function estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 61, pp. 467-481.

[6] Egloff, D. (2005). Monte Carlo Algorithms for Optimal Stopping and Statistical Learning. Annals of Applied Probability 15, pp. 1-37.

[7] Egloff, D., Kohler, M., and Todorovic, N. (2006). A dynamic look-ahead Monte Carlo algorithm for pricing American options. Submitted for publication.

[8] Elliott, R. J., and Kopp, P. E. (1999). Mathematics of Financial Markets. Springer.

[9] Franke, J., and Diagne, M. (2002). Estimating market risk with neural networks. Technical report, University of Kaiserslautern. To appear in Statistics & Decisions 2006/07.

[10] Glasserman, P., and Yu, B. (2004). Number of paths versus number of basis functions in American option pricing. Annals of Applied Probability 14, pp. 1-30.

[11] Györfi, L., Kohler, M., Krzyżak, A., and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics, Springer.

[12] Kohler, M. (2006a). Nonparametric regression with additional measurement errors in the dependent variable. Journal of Statistical Planning and Inference 136, pp. 3339-3361.

[13] Kohler, M. (2006b). A regression based smoothing spline Monte Carlo algorithm for pricing American options. Submitted for publication.

[14] Longstaff, F. A., and Schwartz, E. S. (2001). Valuing American options by simulation: a simple least-squares approach. Review of Financial Studies 14, pp. 113-147.

[15] Shiryayev, A. N. (1978). Optimal Stopping Rules. Applications of Mathematics, Springer Verlag.

[16] Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. Annals of Statistics 22, pp. 118-184.

[17] Tsitsiklis, J. N., and Van Roy, B. (1999). Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Trans Autom. Control 44, pp. 1840-1851.