Portfolio Optimization for Options

Yaxiong Zeng (1), Diego Klabjan (2)

(1) PhD Candidate, Industrial Engineering and Management Sciences, Northwestern University; E-mail: yaxiongzeng2015@u.northwestern.edu
(2) Professor, Industrial Engineering and Management Sciences; Director, MS in Analytics, Northwestern University; E-mail: d-klabjan@northwestern.edu

Abstract

Option portfolio optimization for European options has already been studied, but the more challenging American options have not. We propose approximate dynamic programming algorithms for both European and American option portfolios, using piecewise linear functions and an iterative progressive hedging method combined with approximate dynamic programming that uses a quadratic approximation to Q-values. By means of experiments, we show that our algorithms perform very well.

Keywords: Portfolio optimization; Option portfolio; Approximate dynamic programming

1 Introduction

Portfolio optimization has been a classic topic in financial engineering since the inception of Modern Portfolio Theory by Markowitz (1952). With different objectives and constraints, a large body of literature has discussed the optimal capital allocation to financial assets such as stocks and bonds in a portfolio (see Brandt (2010) for a survey). Despite the wide recognition that options can help complete the market, only a handful of papers discuss a specific portfolio consisting mainly or only of options. One may argue that broadly researched portfolio optimization methods can be applied to obtain the weight allocation among options. However, it is already hard to find a stochastic process that exactly captures the behavior of option returns, which often hit zero. Furthermore, the early exercise feature of American options makes it more difficult to apply methods of classic portfolio optimization to option portfolios.

The very limited existing literature on adding options to a portfolio considers only European options, which are more convenient to incorporate thanks to their structural simplicity. American options, with their flexible exercise timing, are much more challenging to model and solve in a portfolio. Investors solve an American option portfolio in two steps: first determine when to exercise the options in the absence of all other considerations, and then find the weights. However, in the simple example given next, we show that this is not necessarily a good strategy.

Suppose there are two independent American options in a portfolio, denoted by option 1 and option 2. By the pricing algorithm for American options in Longstaff and Schwartz (2001), we can find their optimal exercise time. Then we implement a portfolio optimization for that particular exercise time of both options with the objective of minimizing the variance of the portfolio return based on the Markowitz mean-variance model. We have three constraints. The first constraint requires completeness of the weights, i.e. the weights sum to 1. The next constraint requires that the portfolio returns at least 95% of the average of the maximum of the two options' returns. The last one is the no-short-selling constraint, which translates into nonnegativity of the two weights. We set the initial values of the underlying asset for option 1 and 2 to be $20 and $30 with volatility 0.2 and 0.3, respectively. We simulate the underlying asset returns using a geometric Brownian motion (GBM) model. In a discrete time setting, options can only be exercised at the end of month 1, 2, ..., 6, where month 6 is the maturity time. Their exercise prices are $19.8 and $29, respectively, and the risk free rate is 6%. From the pricing algorithm, the option prices are $0.8 and $1.7, respectively, and the optimal exercise time for both options is in month 5.

Table 1 presents the variances of the portfolio return under different combinations of the exercise time. It is apparent that if we exercise both options in month 5, we would not obtain the minimum variance. Instead, we should exercise option 1 in month 5 and option 2 in month 2. Therefore, the optimal exercise time can be different if we consider the weight and timing decisions together. We should indeed optimize the exercise time and weights jointly instead of sequentially. In fact, this is the motivation and goal of our Approximate Dynamic Programming (ADP) algorithm for American option portfolios.

Table 1: Variance of portfolio returns with different exercise times (rows: exercise month of option 1; columns: exercise month of option 2; digits as extracted, decimal points not recoverable)

Month     1     2     3     4     5     6
1       216   374   334   281   266   257
2       259   518   308   225   208   206
3       493   216   247   188   203   219
4       309   230   192   224   251   280
5       217   167   217   260   293   333
6       187   179   242   292   335   382

In this paper, we design ADP algorithms for a portfolio of European options and a portfolio of American options. For a European option portfolio, we apply a piecewise linear approximation to find the value function of wealth of each option and then obtain the weights for each option through optimization, where the slopes of the piecewise linear functions are updated. Iterations are performed to tune the slopes. It turns out that such an approach is much better than that of Faias and Santa-Clara (2011), where myopic single-period optimization is used. This is the only prior work on portfolio optimization of European options.

For American option portfolios, we consider two stages: the optimization stage and the evaluation stage. In the optimization stage, we employ an iterative progressive hedging algorithm to find the weights and exercise time of all options at each time period, where the Q-values are approximated by regression. For American options, Q-learning with regression is more appropriate than piecewise linear approximation of value functions since the state space and actions are discrete (exercise time). Each iteration of progressive hedging requires solving an ADP. Particularly, we include a penalty term in the myopic problem with approximate Q-values to drive the weights into convergence since we need only one set of weights (and not weights per period). In the evaluation stage, we mimic the situation of real-time trade and refine when to exercise the options for each simulated sample path given the weights from the optimization stage. The algorithm at this stage is similar to the preceding one except that the progressive hedging part is omitted while the regression-based Q-value function approximation remains. The evaluation stage refines the values of exercise times. For this, we have designed two algorithms. In one we use a modified existing algorithm for exercising American options (our modification of the existing algorithm is needed because it cannot handle a portfolio of American options). The second algorithm uses a variant of our Q-learning algorithm.

We compare our algorithms against an existing algorithm for European options and, for a small number of American options and time periods, against a quasi-optimal benchmark algorithm that enumerates all possible weights and exercise time periods based on perfect information. We conclude by means of a computational study that our algorithms perform very well. To the best of our knowledge, this is the first algorithm for solving a portfolio of American options. Under the assumption of no short selling, our algorithm for a European option portfolio gives an average return of 67% over a 30-month time horizon, while the algorithm from Faias and Santa-Clara (2011) loses almost all the money. For an American option portfolio, our algorithm yields a small gap from quasi optimal in a relatively long time horizon, at around 10%.

The main contributions of this work are as follows.

1. This is the first paper that designs ADP algorithms for option portfolios, both European and American.
2. It is the first work that adds American options into an option portfolio and explicitly takes the optimal exercise time into account concurrently with weights.
3. We develop a non-standard progressive hedging algorithm combined with ADP for solving the underlying option portfolio problem.
4. Along the way, we also exhibit a new algorithm for finding exercise times of a portfolio of American options. It significantly outperforms an adaptation to the portfolio setting of the only existing algorithm for finding the time to exercise an American option.

The structure of this paper is as follows. Section 2 provides a literature review. Section 3 introduces the models and algorithms used for option portfolios. Section 4 discusses the numerical results. Section 5 draws conclusions and presents future work.

2 Literature Review

In portfolio theory, Markowitz proposed the mean-variance model, an intuitive method that can handle single-period models well. However, investment is not simply a one-period decision. Arrival of new information or changes in the overall objective can prompt adjustments in trading strategies. Thus, a multi-period model is more appropriate to cope with current complex portfolio optimization problems.

Usually, researchers treat portfolio optimization in continuous or discrete time. Merton (1969, 1971, 1975) first introduces portfolio choice problems in continuous time using stochastic calculus. The continuous setting makes it possible to find a closed-form solution in some simple cases; for example, Merton (1990) gives an analytical solution to a portfolio optimization problem with a Brownian motion model using logarithm or power utility functions. In this work we discuss the discrete-time setting, which can be formulated as a dynamic programming problem. From a practical perspective, we solve the option portfolio optimization problem in discrete time to circumvent potentially high transaction costs when trading options continuously. There are a number of ADP methods used in the portfolio optimization literature, see Birge (2008) for a survey, but none when options are present.

In terms of option portfolios, there is very limited work. Since no existing literature adds American options into a portfolio, we only summarize papers that build portfolios with European options. Liu and Pan (2003) introduce derivatives into portfolios comprised of only primitive assets such as bonds and stocks. In a continuous time model, they obtain analytical results and conclude that options can improve the portfolio performance because they can complete the markets by adding risk factors such as stochastic volatility and price jumps. Ilhan et al (2004) build a portfolio model consisting of only one option and one stock based on stochastic volatility, while our model does not limit the number of derivatives and thus no closed-form expressions exist. By a utility-indifference pricing mechanism, they further attain the optimal static composition.

Faias and Santa-Clara (2011) first formally introduce the concept of an option portfolio. They provide a method that models European option portfolios in a discrete setting with a finite time horizon. Although they find weights for multiple time periods, they solve the problem as a sequence of purely myopic problems. They treat each time period as a single-period model and obtain the weights only for that particular period rather than considering the entire time horizon. In other words, their solution is myopic (see Section 3 for detail). We overcome this weakness by proposing an ADP method. Constantinides et al (2012) discuss an option portfolio constituted to maintain targeted maturity, moneyness and market beta. Their focus is to explain the cross-sectional variation of index option returns rather than to improve the portfolio performance. Other relevant papers are Jones (2006), Driessen and Maenhout (2013) and Eraker (2013), who also, to some extent, discuss the role of European options in a portfolio. Their perspectives on optimizing portfolios vary, such as put mispricing, portfolio insurance, and option trading.

These papers consider options only as European options. No literature has taken American options into account when constructing portfolios. We are the first to add American options into portfolios and particularly deal with the optimal exercise time.

In this paper, we apply a piecewise linear approximation in the ADP algorithm for European option portfolios and least-square recursive regression in the ADP algorithm for American option portfolios. The general idea of these two approximation methods can be found in Powell (2011). As the name suggests, algorithmic piecewise linear approximation uses piecewise linear functions to approximate value functions. The main task is to update the slopes appropriately. On the other hand, regression-based approximations need to update regression coefficients. In the American option algorithm, besides regression, we introduce progressive hedging (PH) to find weights in the optimization stage. PH was proposed by Rockafellar and Wets (1991) and uses a penalty term to lead optimization into convergence. See Bianchi et al (2009) for a survey of recent applications using PH.

3 Models and Algorithms

In this section we discuss the European and American option portfolios separately. We build two different models, one for each portfolio, and solve them using ADP algorithms.

3.1 Model and Algorithm for European Option Portfolio

A European option portfolio consists of a number of European options that depend on one or more underlying assets. It also contains a risk-free account that stores part of the total wealth while the remainder is allocated to the options. In general, we want to maximize the utility of the portfolio terminal wealth. We assume that the time horizon is finite and investors can only trade options at discrete times \(t = 1, \dots, T\). The maturity time of each option is one time period. Suppose the number of options is \(N\), and they are based on the underlying asset whose price \(A_t = (A_{1,t}, A_{2,t}, \dots, A_{N,t})\) evolves based on a stochastic process. The risk-free asset follows the return process \(r_{0,t}\). The portfolio is self-financing.

The price of option \(i\) is \(p_i = (p_{i,1}, p_{i,2}, \dots, p_{i,T})\), and the strike price is \(K_i = (K_{i,1}, K_{i,2}, \dots, K_{i,T})\). Then the return of option \(i\) during time \(t\) is simply

\[ r_{i,t} = \begin{cases} \dfrac{(A_{i,t} - K_{i,t})^+}{p_{i,t}} & \text{if call option;} \\[2ex] \dfrac{(K_{i,t} - A_{i,t})^+}{p_{i,t}} & \text{if put option,} \end{cases} \qquad (4) \]

where \((\cdot)^+ = \max(\cdot, 0)\).

The portfolio strategy over the entire horizon is represented by

\[ w = (w_1, w_2, \dots, w_T) \in X = \Big\{ w \in \mathbb{R}_+^{(N+1) \times T} : \sum_{i=0}^{N} w_{i,t} = 1 \text{ for every } t \Big\}, \]

where \(w_t = (w_{0,t}, w_{1,t}, \dots, w_{N,t})^T\). Weight \(w_{0,t}\) is the weight of the risk-free asset during time \(t\) and \(w_{i,t}\), for every \(i \in \{1, \dots, N\}\), is the weight of option \(i\) during time \(t\) in the portfolio. Particularly, we do not allow short selling and borrowing.

The total wealth of the portfolio thus follows

\[ W^{total}_{t+1} = W^{total}_t \, r^{portfolio}_t, \quad \text{where } r^{portfolio}_t = \sum_{i=0}^{N} r_{i,t} w_{i,t} \]

is the rate of return of the portfolio during time period \(t\).
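Formula (4) and the wealth recursion can be illustrated with a small sketch. It assumes that all returns, including the risk-free return at index 0, are gross returns (payoff divided by price), which is what the multiplicative wealth update implies. The function names `option_return` and `next_total_wealth` are hypothetical and only serve this illustration.

```python
def option_return(asset_price, strike, option_price, kind):
    """Gross one-period return of a European option, following formula (4)."""
    if option_price <= 0:
        raise ValueError("option_price must be positive")
    if kind == "call":
        payoff = max(asset_price - strike, 0.0)
    elif kind == "put":
        payoff = max(strike - asset_price, 0.0)
    else:
        raise ValueError("kind must be 'call' or 'put'")
    return payoff / option_price


def next_total_wealth(total_wealth, weights, returns):
    """Wealth recursion W_{t+1} = W_t * sum_i r_i * w_i.

    Index 0 is the risk-free account (assumed gross return). Weights must be
    nonnegative and sum to 1, i.e. no short selling and no borrowing.
    """
    if len(weights) != len(returns):
        raise ValueError("weights and returns must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    portfolio_return = sum(r * w for r, w in zip(returns, weights))
    return total_wealth * portfolio_return


# Illustrative numbers only: one call and one put on the same asset.
call_r = option_return(22.0, 20.0, 1.0, "call")
put_r = option_return(22.0, 20.0, 1.0, "put")
wealth = next_total_wealth(100.0, [0.5, 0.3, 0.2], [1.005, call_r, put_r])
```

An option that expires out of the money returns zero, so the wealth placed in it is lost entirely. This is the behavior the introduction refers to when it notes that option returns often hit zero.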

The objective function then reads

\[ \max_{w \in X} E\, U\big(W^{total}_1, W^{total}_2, \dots, W^{total}_T\big), \quad \text{where } W^{total}_t = W^{total}_t(w_1, \dots, w_{t-1}), \qquad (5) \]

and \(U(\cdot)\) is the utility function. Clearly, the terminal wealth is a function of the weights throughout the horizon. We assume an additive utility function in the following form:

\[ U\big(W^{total}_1, W^{total}_2, \dots, W^{total}_T\big) = \sum_{t=1}^{T} U\big(W^{total}_t\big). \]

Without loss of generality, we suppose there are \(N\) European options based on a single underlying asset in the following analysis. In the beginning of each time period, only one decision is made: the weights of wealth allocated to each option and the risk-free account.

3.1.1 Optimal Option Portfolio Strategy (OOPS, Faias and Santa-Clara, 2011)

As introduced above, OOPS is a simple but intuitive method that solves a portfolio problem explicitly for European options. The resulting solution is not optimal, but we use this term in order to be consistent with the terms used by Faias and Santa-Clara. We next summarize their key ideas.

By simulating \(N_1\) series of underlying asset values and substituting them into (4), they first get \(N_1\) series of option returns \(\{r^n_{i,t}\}_{t=1}^{T}\) for each option \(i\) during each time period \(t\) and sample \(n \in \{1, \dots, N_1\}\). Then from time 1 to the last period \(T\), they perform the following unconstrained optimization to obtain optimal weights for the next time period:

\[ \max_{w_t} E\, U\big(W^{total}_t \, r^{portfolio}_t\big) \approx \max_{w_t} \frac{1}{N_1} \sum_{n=1}^{N_1} U\big(W^{total}_t \, r^{portfolio,n}_t\big), \quad \text{where } r^{portfolio,n}_t = \sum_{i=0}^{N} r^n_{i,t} w_{i,t}. \]

After the optimization, they then update the portfolio wealth and step forward into the next time period. Note that they do not have any constraints in the optimization, which means the weights can be negative. If the weight of an option is negative, it corresponds to short selling. However, if the weight of the risk-free account turns negative, it requires net borrowing of money. We do not allow such features in our model, even though they can be easily incorporated. Another point is that their method is myopic. By treating each time period independently, they only optimize the utility for the next time period instead of the entire time horizon.
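A single myopic step of this kind can be sketched as a sample-average optimization. The sketch below is not the OOPS implementation. It assumes one call option plus a risk-free account, a power utility with risk aversion 2, GBM dynamics for the underlying, and a grid search over weights in [0, 1] (the no-short-selling, no-borrowing setting of this paper rather than the unconstrained problem). All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_call_returns(n_paths, s0, strike, price, mu, sigma, dt, seed=0):
    """Simulate one-period gross returns of a call under GBM (assumed dynamics)."""
    rng = random.Random(seed)
    returns = []
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s1 = s0 * math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        returns.append(max(s1 - strike, 0.0) / price)
    return returns


def power_utility(wealth, gamma=2.0):
    """CRRA utility (illustrative choice); ruin is treated as minus infinity."""
    if wealth <= 0:
        return -math.inf
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def myopic_weight(option_returns, riskfree_return, wealth=1.0, grid=100):
    """Option weight on a grid in [0, 1] that maximizes the sample-average utility."""
    best_w, best_val = 0.0, -math.inf
    for k in range(grid + 1):
        w = k / grid
        total = 0.0
        for r in option_returns:
            total += power_utility(wealth * ((1.0 - w) * riskfree_return + w * r))
        avg = total / len(option_returns)
        if avg > best_val:
            best_w, best_val = w, avg
    return best_w


# Illustrative parameters loosely based on the example in the introduction.
sample_returns = simulate_call_returns(500, 20.0, 20.0, 0.8, 0.06, 0.2, 1.0 / 12.0, seed=1)
w_option = myopic_weight(sample_returns, riskfree_return=1.005)
```

Repeating this step period by period, each time with the updated wealth, gives a sequence of single-period solutions. Nothing in the step looks beyond the next period, which is the myopia discussed above.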
Essetially, their method is a repeatig process that solves multiple sigle-period models 312 ADP Algorithm We ow propose a ADP method that treats problem (5) as a multi-period model This method is a extesio to the portfolio optimizatio method i Powell (2011), 6

We first defie relevat variables Let the wealth of risk free accout ad each optio durig every time period be deoted byw i,t for every i { 0,1,, },t { 1,,T } The decisio variable, actio, is the ( ) m, wealth trasfer x m,,t betwee assets m ad i time period t We deote x t = x m,,t ote that istead of the weights i the geeral model ad OOPS, we defie wealth trasfers to be the decisio variables The weights ca be implied fromw i,t as w i,t = W i,t W t total We also defie the followig post decisio state variables W x i,t = W i,t + x m,i,t After a decisio is made durig some time period t, exogeous iformatioω t+1 = ω t+1 ( A i,t,k i,t, p i,t ) realizes With A i,t, K i,t ad p i,t, we calculate the retur r i,t of optio i by (4) The we update the state variable usig the followig trasitio fuctio m=0 W i,t+1 = r i,t W i,t x The objective is to maximize the total utility over time, ie T max E U W i,t+1 x 1,x 2,,x t i=0 t=0 Based o the objective, the optimality equatio reads x V t 1 W x x total ( 0,t 1,,W,t 1 ) = E max U W t x t ( ) + V x t W x x ( 0,t,,W,t ) W x x { 0,t 1,,W,t 1 } Post decisio state variables elimiate the eed to fid the oe-step trasitio matrix by pullig the expectatio operator out of the max operator, which makes the algorithm computatioally tractable To solve this ADP, we apply the piecewise liear approximatio method to estimate the value fuctio We let the estimatio V i,t x, W i,t x, 1 ( W i,t ) = vi,t m=1 1 x, ( )vi,t ( W i,t ) ( m 1) + W x, x, i,t W i,t approximate the value fuctio of W x, i,t i the th 1 iteratio otatio i is the floor operator Here vi,t is the slope of the piecewise liear fuctio of asset i durig time t i the th iteratio Algorithm 1 lists the etire algorithm for computig the weights We add up the value fuctios of each asset to get the value fuctio approximatio of the portfolio wealth (Step 21) Across iteratios, we wat to improve the performace of the piecewise liear approximatio i estimatig the value fuctios, 1 which is doe by updatig slopes vi,t 
based o dual variables i the mathematical program The 7

mathematical program also gives the optimal decisio of wealth trasfer (Step 22) Agai, i every iteratio we update the slope values ad use them i the ext iteratio (Step 24) We follow the trick provided by Powell (2011) to calculate the slopes (Step 23) It is straightforward calculus to verify Step 23 To make the slopes form a cocave fuctio, we apply the separable, projective approximatio routie (SPAR) algorithm provided i Powell et al (2004) Cocavity implies that maximizatio i Step 22 is a liear program Thus v! i,t Algorithm 1: 0 Iitialize vt, 0 W1, =1 is the derivative of δv! t x δw i,t 1 Step 1 Choose a sample path of the uderlyig asset price A t ad determie the optio ad strike prices Based o these data, calculate returs r i,t Step 2 For t=1,,t 1 Let 2 Solve Let x t 3 Compute V! t = max U(W total, t ) + V t st x t 1 x, V t ( W i,t ) = 1 V i,t x, W i,t i=0 ( ) 1 x, ( ( W i,t )) = U(W total, t ) + max V 1 x, t W i,t x t ( ( )) x i, j,t = W i,t, for every i; (6) j=0 x i, j,t 0, for every i, j be a optimal solutio to the maximizatio problem, ad v i,t be the dual variable of (6) 4 Update vt 1 usig vi,t 1 v! i,t total ( ) δw i,t δv " t = = δu W t x δw i,t 1 ( )vi,t 1 ( ) + α 1 v i,t W total total, + v # i,t t =W t r i,t ( m) = 1 α 1 1 m, if m = x, Wi,t 1 vi,t 1 ( m), otherwise 5 Fid the post-decisio statew x, i,t, the ext pre-decisio statew i,t+1 solutio i Step 22 ad update portfolio wealthw total, t+1 = Step 3 = + 1 If 1, go back to Step 1 i=0 W i,t+1 based o a optimal 8

T ( ) t=1 Step 4 Retur value fuctios V! 1 t 32 Model ad Algorithms for America Optio Portfolio I solvig the portfolio optimizatio problem comprisig America optios, we face challeges ot oly to come up with the optimal weights, but also the optimal exercise timig As a result, direct optimizatio from the Europea optio case is ot applicable ay more To tackle the challege, we create a Q-value fuctio that is a product of optio weights, exercise time ad the uderlyig asset price, which are the most crucial factors for a America optio portfolio Q-learig is more appropriate i this cotext sice exercise times are discrete values They are preset i both the state space ad actios Sice actios are discrete (to exercise a optio or ot), regressio is more appropriate to approximate the Q-value fuctio; it does ot ivolve derivatives While i Europea optios a time period correspods to the maturity time, i the America optio case the etire time horizo correspods to maturity time T of the optios cosidered Each time period correspods to a decisio epoch about which subset of optios to exercise For this reaso the America optio algorithm oly provides oe set of optimal weights, but the Europea versio has a set of weights for each time period (at the ed of each time period Europea optios expire) Give a set of weights, the optimal exercise times have to be determied by meas of a ADP Followig such a approach it is ot clear how to chage or adjust the weights sice the uderlyig fuctio is the value fuctio of a ADP The idea is to relax the restrictio that we have a sigle weight vector Istead we assume that there is a weight vector per time period which are the adjusted i each iteratio (icludig time period) I summary, the weights are for each optio ad every time period To drive the weight optimizatio ito covergece, we itroduce a progressive hedgig compoet ito the optimality equatio PH adds a factor to the myopic optimizatio problem that pealizes the weights to differ across time periods At the ed of the 
PH algorithm with embedded ADP, we the fix the portfolio weights by takig the average across all time periods ad deploy a evaluatio stage, which oly allows exercise time as the decisio variable This provides a more accurate evaluatio of the average retur tha the umber give by optimizatio We measure our performace accordig to the evaluatio stage I the followig aalysis, we retai the otatio from the Europea case For simplicity we do ot iclude a risk-free accout 321 Optimizatio Stage I this stage, our mai task is to fid optio weights that will be used at the evaluatio stage Both weights ad exercise times are decisio variables i the optimizatio We first defie relevat variables Let the exercise time of each optio ( ) S t = s 1,t,s 2,t,,s,t 9

ad uderlyig asset prices A t be the state variables Here s i,t is the exercise status idicator which equals to t if optio i is exercised i time period t, ad 0 otherwise Particularly, s i,0 = 0, for every i = 1,, Asset prices are part of the state space for algorithmic purposes (the Q- value approximatio is also a fuctio of these prices) This also allows the opportuity to stochastically geerate them based o a time depedet process without a eed to chage the algorithm Oe of the decisio variables is the set of optios that should be exercised i time period t ote that the exercise times deped o the realizatio of asset prices ad returs but for simplicity we omit this depedecy i our otatio We represet the optio set by a vector of idex variables, deoted by y t For example, if optios 1 ad 2 are to be exercised, the y t = { 1,2} Hece, the correspodig post decisio state variable is S t y ( ) = S t + t 1 y t, (7) S t, y t where1 y t is a -elemet vector of idicator variables with elemet values equal to 1 if correspodig ( ) optios are to be exercised, ad 0 otherwise I the previous example, 1 y t = 1,1,0,0,,0 Oce the state variable s i,t of optio i chages from 0 to a positive iteger of time, it is fixed to this iteger i later time periods Aother decisio variable is the optio weight i each time period ( ) w t = w 0,t,w 1,t,,w,t ote that i this settig we have differet weights for each optio at differet time periods, which deviates from the practice that oly oe set of weights is required before a ivestmet To obtai the optimal oe set of weights w i for every i, we take the average of w i,t for every t This set of weights is fixed ad the used i the evaluatio stage Here we follow the strategy outlied i the itroductio of the sectio I other words, we have side costraits w 1 = w 2 = = w T = w These costraits are relaxed i the PH spirit ( ( w) ) The objective fuctio for our problem is max E U W T w termial portfolio wealth The optimality equatio reads where V t ( ) = max S t, A t w t T { } 
t=1, where w = w t ( ) S t, A t E max a,b, Here W T is the a = max y t { i: s i,t =0} V S y t+1 ( t (S t, y t ), A t ), b = U w i,t r i,si,t + w i,t r i,t i: s i,t 0 i: s i,t =0 + V S t+1 ( y t ( S t,{ i : s i,t = 0} ), A t ) Cotributio a is the value fuctio if less tha (R 1) optios are exercised, where R is the umber of uexercised optios up till ow It correspods to the case whe ot all optios are exercised Hece, y t is a proper subset of { i : s i,t = 0} Term b is the value fuctio if all the remaiig optios are exercised, thereby o optimizatio is ivolved i this equatio Moreover, oly after all optios are exercised we total ca add the utility of portfolio wealth Without loss of geerality, we assumew 1 = 1, ad hece 10

$$W_T^{total} = \sum_{i:\, s_{i,t} \neq 0} w_{i,t}\, r_{i,s_{i,t}} + \sum_{i:\, s_{i,t} = 0} w_{i,t}\, r_{i,t}$$

as the argument of the utility function. The first sum in $W_T^{total}$ is the weighted return of the options exercised before time period $t$. The second sum is the weighted return of the remaining options, which are exercised in time period $t$. The maximum of $a$ and $b$ is then taken as the objective function, which is optimized to find the best weights and exercise times.

Instead of approximating $V_t$, here we use Q-learning by approximating $V_{t+1}$ in $a$ and $b$ as a function of $S_t$, $A_t$ and $w_t$. In essence, we rewrite

$$a = \max_{y_t \subset \{i:\, s_{i,t} = 0\}} Q_{t+1}\big(S_t^y(S_t, y_t), A_t, w_t\big),$$

$$b = U\Big(\sum_{i:\, s_{i,t} \neq 0} w_{i,t}\, r_{i,s_{i,t}} + \sum_{i:\, s_{i,t} = 0} w_{i,t}\, r_{i,t}\Big) + Q_{t+1}\big(S_t^y(S_t, \{i : s_{i,t} = 0\}), A_t, w_t\big).$$

This is not quite the standard Q-value approximation but a minor variation. To approximate $Q_{t+1}$, we use recursive least squares regression for nonstationary data. In the regression,

$$Q_{t+1}(S_t^y, A_t, w_t) = \sum_{i=0}^{N} \theta_{i,t+1}\, \big(w_{i,t}\, s_{i,t}^y\, A_t\big),$$

where $\theta$ are the regression coefficients, and the term inside the parentheses is the product of three features: weights, exercise times and asset prices.

We also include a progressive hedging mechanism to lead our algorithm to convergence. We would like to impose that $w_1 = w_2 = \dots = w_T = w$, which means all time periods should have the same set of weights. To achieve this, we add a penalty term to the optimality equation to drive the optimal weights to converge to $w$. Hence, the adjusted optimality equation becomes

$$V_t(S_t, A_t) = \max_{w_t} E\Big[\max(a, b) - (z_t)^T w_t - \frac{\rho}{2}\, \|w_t - w\|_2^2 \,\Big|\, S_t, A_t\Big],$$

where parameter $z_t$ represents the cumulative difference between $w_t$ and $w$.

The algorithm, called IPH (iterative progressive hedging), is presented in Algorithm 2. In Step 2, we find the updated value of $V_t$ based on the current approximation to $Q_{t+1}$. Step 3 exhibits the standard formulas for updating regression coefficients when a single new observation is added. After each iteration of progressive hedging, the new average weight $w^{NEW}$ and the cumulative deviation $z_t$ are updated in Steps 5 and 6. We terminate the algorithm if the norm of the difference between $w^{NEW}$ and $w$ is less than a given threshold $g_{term}$ (Step 7); otherwise, we let $w$ take the new value (Step 1). The resulting single set of weights $w$ is then used as an input to the evaluation stage.
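The recursive least squares update mentioned above (a single new observation with a forgetting factor for nonstationary data) can be sketched as follows. This is a minimal illustration rather than the authors' MATLAB implementation: the function name `rls_update`, the list-based matrix representation and the demo values (including the initial matrix and `lam`) are assumptions made here for illustration.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def rls_update(theta: Vector, B: Matrix, phi: Vector, v_hat: float,
               lam: float = 0.95) -> Tuple[Vector, Matrix]:
    """One recursive least squares step with forgetting factor lam.

    theta : current regression coefficients
    B     : current symmetric matrix (plays the role of B_t^{n-1})
    phi   : feature vector of the new observation
    v_hat : newly observed value to be fitted
    """
    k = len(theta)
    # B * phi; B is symmetric, so this is also the transpose of phi^T * B
    B_phi = [sum(B[i][j] * phi[j] for j in range(k)) for i in range(k)]
    # gamma = lam + phi^T B phi
    gamma = lam + sum(phi[i] * B_phi[i] for i in range(k))
    # prediction error: current prediction minus observed value
    eps = sum(theta[i] * phi[i] for i in range(k)) - v_hat
    # theta_new = theta - (1 / gamma) * B phi * eps
    theta_new = [theta[i] - B_phi[i] * eps / gamma for i in range(k)]
    # B_new = (1 / lam) * (B - (1 / gamma) * B phi phi^T B)
    B_new = [[(B[i][j] - B_phi[i] * B_phi[j] / gamma) / lam
              for j in range(k)] for i in range(k)]
    return theta_new, B_new


# Hypothetical demo: recover v = 2 * x1 - x2 from noise-free observations.
rng = random.Random(0)
theta, B = [0.0, 0.0], [[1e4, 0.0], [0.0, 1e4]]  # initial values are assumptions
for _ in range(50):
    phi = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    theta, B = rls_update(theta, B, phi, 2.0 * phi[0] - phi[1], lam=1.0)
print(theta)
```

With `lam = 1.0` all observations are weighted equally; values below 1 discount older observations, which is the usual motivation for this variant when the fitted target drifts across iterations.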

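Algorithm 2:
a. Initialize $\theta_t^0$, $\lambda$, $B_t^0 = \varepsilon I$; create samples $A_t^n$, $r_{i,t}^n$
b. Set $g_k = 1$, $w^{NEW} = \{1/N\}_{N \times 1}$, $z = \{0\}_{N \times T}$
Loop
  1. Let $w = w^{NEW}$
  For $n = 1, \dots, N_1$
    For $t = 1, \dots, T$
      2. Solve
         $$\hat{V}_t^n = \max_{w_t} \Big\{ \max(a, b) - (z_t^n)^T w_t - \frac{\rho}{2}\, \|w_t - w\|_2^2 \Big\},$$
         where
         $$a = \max_{y_t \subset \{i:\, s_{i,t} = 0\}} Q_{t+1}^{n-1}\big(S_t^y(S_t, y_t), A_t, w_t\big),$$
         $$b = U\Big(\sum_{i:\, s_{i,t} \neq 0} w_{i,t}\, r_{i,s_{i,t}} + \sum_{i:\, s_{i,t} = 0} w_{i,t}\, r_{i,t}\Big) + Q_{t+1}^{n-1}\big(S_t^y(S_t, \{i : s_{i,t} = 0\}), A_t, w_t\big),$$
         $$Q_{t+1}^{n-1}\big(S_t^{y,n}, A_t^n, w_t\big) = \sum_{i=0}^{N} \theta_{i,t+1}^{n-1}\, \big(w_{i,t}\, s_{i,t}^{y,n}\, A_t^n\big).$$
         Let $w_t^{n,*}$ be an optimal solution and let $y_t^{n,*}$ be an optimal solution to the maximization problem for computing $a$, or $\{i : s_{i,t} = 0\}$, depending on which term attains the maximum in $\max(a, b)$. By using $y_t^{n,*}$ and (7), we update the exercise time of each option $i$ to $s_{i,t}^{y*,n}$. We then define $\varphi_{i,t}^n = w_{i,t}^{n,*}\, s_{i,t}^{y*,n}\, A_t^n$ for every option $i$.
      3. Update
         $$\theta_t^n = \theta_t^{n-1} - H_t^n\, \varphi_t^n\, \varepsilon_t^n,$$
         where
         $$\varepsilon_t^n = (\theta_t^{n-1})^T \varphi_t^n - \hat{V}_t^n, \quad H_t^n = \frac{1}{\gamma_t^n} B_t^{n-1}, \quad \gamma_t^n = \lambda + (\varphi_t^n)^T B_t^{n-1} \varphi_t^n,$$
         $$B_t^n = \frac{1}{\lambda} \Big( B_t^{n-1} - \frac{1}{\gamma_t^n}\, B_t^{n-1} \varphi_t^n (\varphi_t^n)^T B_t^{n-1} \Big).$$
      4. Find the next pre-decision state $S_{t+1}^n = S_t^y\big(S_t^n, y_t^{n,*}\big)$
    End
  End
  5. Update $w^{NEW} = \frac{1}{N_1 T} \sum_{n,t} w_t^{n,*}$
  6. Update $z_t^n = z_t^n + \rho\, \big(w_t^{n,*} - w^{NEW}\big)$ for all $n$, $t$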

  7. If $\|w^{NEW} - w\| < g_{term}$, exit
End

The most important output of IPH is the set of weights $w$ (although the approximate Q-value function is also an output).

3.2.2 Evaluation stage

With weights from the optimization stage, we now move on to the evaluation stage. By fixing the option weights, we mimic real-world practice, which does not allow weights to vary over time. In other words, exercise time now becomes the only decision variable. This stage evaluates the weights more precisely by using two different algorithms tailored specifically for finding exercise times.

The first algorithm is a stripped-down version of IPH, which omits the progressive hedging loop and does not need to solve for the weights, since they are given from the optimization stage. With these simplifications, the new algorithm, called IPH-ADP, only finds the exercise time of each option in every sample path (given the weights).

The second evaluation algorithm is a modification of Longstaff and Schwartz (2001). They proposed a Least Squares Monte Carlo (LSMC) algorithm, which prices an American option by regression and returns the optimal exercise time for each sample path. We modify their method, apply it in the portfolio setting and evaluate its performance in finding the optimal exercise time. To capture risk aversion, the cash flows in the original algorithm are replaced by the utility of option returns. The algorithm assumes an additive utility function, so that the portfolio utility is the sum of the individual utilities of the options. Algorithms 3 and 4, presented in the Appendix, constitute the modified LSMC algorithm. The resulting overall algorithm is labeled IPH-LSMC (weights obtained by IPH and evaluation done by our version of LSMC).

Note that the two evaluation algorithms simply provide two ways to evaluate the weights, by determining exercise times in two different ways. One is based on our own ADP (a stripped-down version of IPH where weights are fixed but the subsets of options to exercise are explicitly captured in states and actions) and the other is a modification of a known algorithm based on LSMC.

3.2.3 Quasi-Optimal Benchmark Algorithm

To benchmark our IPH algorithm at the optimization stage, we also introduce a quasi-optimal algorithm that sweeps all possible combinations of portfolio weights in fine-granular discrete steps. Since there are many weight values, the algorithm works only for a small number of options. For each portfolio weight, we first search for the best exercise time of each option in every sample path, which is found by enumerating all possible sets of time periods. This implies that the number of time periods also needs to be reasonably low. Then we select the weight combination with the largest portfolio utility, given the optimal exercise times based on the sampled paths. The weights obtained from this process are further used at the evaluation stage.

We want to assess the quality of IPH in comparison to the quasi-optimal weights obtained by enumeration. The two resulting versions, based on the two evaluation methods, are denoted QO-ADP and QO-LSMC. For example, comparing QO-ADP with IPH-ADP assesses the quality of the weights by using the same evaluation algorithm.

To additionally assess the quality of the evaluation, we can also assume perfect information at the evaluation stage to find the optimal exercise time. The corresponding algorithm then simply enumerates all possible sets of exercise times and picks the one that yields the largest portfolio utility, given the quasi-optimal weights. We call the resulting algorithm QO-PERFECT. We stress that obtaining quasi-optimal weights at the optimization stage and optimal exercise times at the evaluation stage requires perfect information, which is impossible in reality. Thus, except for the enumeration of weights, QO-PERFECT provides a lower bound on an optimal solution. In what follows we use QO-PERFECT as the baseline and all other solutions are measured against it.

4 Numerical Results

In this section we present numerical comparisons of the various algorithms proposed. We use the CRRA utility function $U(W) = \frac{W^{1-\gamma}}{1-\gamma}$ with risk aversion parameter $\gamma = 9$, based on Rosenberg and Engle (2002). All algorithms have been implemented in MATLAB on an Apple Mac computer with a 4.0 GHz Intel Core i7 processor and 16 GB of RAM.

4.1 Results for European Option Portfolio

We assume there are four options in the portfolio: an ATM put option, a 5% OTM put option, an ATM call option and a 5% OTM call option, all of which have higher liquidity. They are all written on a single underlying asset, the S&P 500 index. In addition, there is a risk-free account. We perform a 30-month test for both algorithms, where one month is a time period.

At the beginning, we fit the parameters of the generalized extreme value (GEV) distribution to historical data and then sample the initial index returns 500 times. As we move forward in time from one month to the next, we re-fit the GEV parameters based on the historical values available up to the current time. For example, we use 15 years of historical S&P 500 data to train the initial GEV parameters. Options expire at the end of the 1st month and we enter a new time period. Then we add the known index value during the 1st month to fit a new set of GEV parameters. The new parameters are used to sample asset returns for the 2nd month. The 30 sets of GEV parameters are not listed here. The simulated index (asset) values can be obtained by $A_t = r_t A_{t-1}$.

Because historical option prices are not available, the option prices are computed with the Black-Scholes formula, using the historical index volatilities as inputs. The risk-free rate is defined to be the one-month London Interbank Offered Rate (LIBOR). We use the same sample paths and data to evaluate both the ADP and OOPS algorithms. For simplicity, the same set of samples is used for optimization and evaluation. It takes 75 seconds to run ADP and 6 seconds to run OOPS.

Figure 1 shows the comparison of portfolio returns during each time period. Mean values of portfolio returns are used. We find that our ADP method outperforms the OOPS method. Out of the 30 months, the ADP algorithm surpasses OOPS in 28 months. The across-time mean portfolio return of OOPS is -618%, while ADP returns 7251%. From Figure 2 we see that the ADP algorithm also outperforms OOPS in terms of cumulative wealth. The mean cumulative return is -9263% for OOPS and 6675% for ADP.

Figure 1: Comparison of portfolio returns
Figure 2: Portfolio wealth over time

Figure 3 shows the convergence of the ADP algorithm. We pick the mean rate of return in month 4 and plot it against the increasing number of iterations. We find that the mean values stabilize after 300 iterations. This is also the case for the cumulative portfolio return, see Figure 4.

Figure 3: Mean rate of return during month 4
Figure 4: Cumulative portfolio return
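The Black-Scholes pricing step used for the four European options can be sketched as below. This is a minimal illustration under stated assumptions: no dividends, a continuously compounded rate, and strikes set to 95%, 100% and 105% of the spot for the 5% OTM put, the ATM options and the 5% OTM call. The function names and all input values (spot, rate, volatility) are hypothetical and not taken from the paper.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot: float, strike: float, rate: float, vol: float,
                  maturity: float, kind: str = "call") -> float:
    """Black-Scholes price of a European call or put without dividends."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discount = math.exp(-rate * maturity)
    if kind == "call":
        return spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    if kind == "put":
        return strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
    raise ValueError("kind must be 'call' or 'put'")


# Hypothetical inputs: index level, annualized one-month rate and volatility.
spot, rate, vol, one_month = 2000.0, 0.002, 0.15, 1.0 / 12.0
portfolio = {
    "ATM put": black_scholes(spot, spot, rate, vol, one_month, "put"),
    "5% OTM put": black_scholes(spot, 0.95 * spot, rate, vol, one_month, "put"),
    "ATM call": black_scholes(spot, spot, rate, vol, one_month, "call"),
    "5% OTM call": black_scholes(spot, 1.05 * spot, rate, vol, one_month, "call"),
}
for name, price in portfolio.items():
    print(f"{name}: {price:.2f}")
```

In a setup like the one described above, the volatility argument would be the historical index volatility available at the start of each month and the rate would be the one-month LIBOR.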

4.2 Results for American Option Portfolio

In this section, we show numerical results for an American option portfolio by comparing the different algorithms. We simulate 20 months of log returns of the underlying asset using the GEV distribution with parameters $k = -0.149$, $\sigma = 0.0153$, $\mu = -0.00545$. These parameters are fitted based on 15 years of historical S&P 500 index values. After simulating the monthly returns, we then calculate the cumulative monthly index returns for each sample path. Figure 5 shows that the cumulative index return is positively skewed based on the calibrated parameters.

The portfolio contains four American options: an ATM put option, a 5% OTM put option, an ATM call option and a 5% OTM call option, as in the European case. All depend on the same underlying index. Their prices are determined using the algorithm proposed by Longstaff and Schwartz (2001), with LIBOR rates and historical volatilities as the pricing inputs.

At the optimization stage, we use 50 iterations (sample paths) to find weights, with $g_{term} = 0.1$ and $\rho = 1$ for $T = 5, 10$, and $g_{term} = 0.15$ and $\rho = 2$ for $T = 15, 20$. These parameters are chosen to trade off execution time and utility performance. At the evaluation stage, we use 1,000 iterations. A larger number of iterations is used at the evaluation stage in order to create a histogram of exercise times (Figure 8) with a reasonable number of bins. Note that we resample for the evaluation stage. In the benchmark algorithm we discretize the weights in steps of 0.05.

Figure 5: Monthly cumulative return of simulated index fitted by GEV distribution

To see how well the algorithm performs, we calculate the gap between the utility achieved using IPH-ADP or IPH-LSMC and that of the benchmark algorithm. Since the QO-PERFECT algorithm has perfect information for both weights and exercise times, we need to penalize it in order to get a fair gap. We therefore add a gradient-based penalty to the QO-PERFECT (baseline) portfolio utility. The gradient-based penalty was proposed by Brown and Smith (2011). It takes 80 seconds to run IPH, 2 seconds to run the ADP-based evaluation and 1 second to run LSMC.
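The simulation of monthly log returns from the fitted GEV distribution can be sketched with inverse-transform sampling. This is a hedged illustration: it assumes the shape parameter $k$ follows the MATLAB convention, in which the distribution function is $\exp\{-(1 + k(x-\mu)/\sigma)^{-1/k}\}$ for $k \neq 0$, which is not confirmed by the text. The function names are hypothetical.

```python
import math
import random
from typing import List, Optional


def gev_sample(k: float, sigma: float, mu: float, rng: random.Random) -> float:
    """Draw one GEV variate by inverting the distribution function."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); rng.random() returns values in [0, 1)
        u = rng.random()
    if k == 0.0:  # Gumbel limit of the GEV family
        return mu - sigma * math.log(-math.log(u))
    return mu + sigma * ((-math.log(u)) ** (-k) - 1.0) / k


def simulate_cumulative_log_returns(n_paths: int, n_months: int,
                                    k: float = -0.149, sigma: float = 0.0153,
                                    mu: float = -0.00545,
                                    rng: Optional[random.Random] = None
                                    ) -> List[List[float]]:
    """Return, for each path, the running sum of monthly log returns."""
    rng = rng or random.Random()
    paths = []
    for _ in range(n_paths):
        total, path = 0.0, []
        for _ in range(n_months):
            total += gev_sample(k, sigma, mu, rng)
            path.append(total)
        paths.append(path)
    return paths


# Hypothetical usage: 1,000 paths of 20 months with a fixed seed.
paths = simulate_cumulative_log_returns(1000, 20, rng=random.Random(42))
mean_final = sum(p[-1] for p in paths) / len(paths)
print(f"mean 20-month cumulative log return: {mean_final:.4f}")
```

Under the same assumption, an index level path follows from a starting level `A0` as `A0 * math.exp(c)` for each cumulative log return `c`. For negative `k` the sampled returns are bounded above by `mu - sigma / k`.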

Figure 6: Utility gap from baseline
Figure 7: Certainty equivalent of returns

Note that both IPH-ADP and IPH-LSMC use the weights from the IPH algorithm. Using the IPH-ADP algorithm, the utility gap without penalty tends to stabilize around 25% to 31%. With penalty, the gap is reduced by 7% to 15% (Figure 6). We see that as the number of time periods increases, the utility gap with penalty tends to decrease. IPH-LSMC leads to a higher utility gap, with its penalized gap even higher than the unpenalized gap of the IPH-ADP algorithm.

The next experiment allows perfect information only for determining portfolio weights at the optimization stage. We want to find how far the IPH weights are from quasi-optimal. As discussed in Section 3.2.3, despite the different weight choices, we still use the ADP and LSMC algorithms to measure performance. In other words, we compare IPH-ADP against QO-ADP, and IPH-LSMC against QO-LSMC. From Figure 7, the difference in certainty equivalent (CE) between the IPH-ADP algorithm and QO-ADP narrows, decreasing from around 10% to 4.6%. However, although it rises slowly as the number of time periods increases, the CE of the IPH-LSMC algorithm is very low, less than 8%.
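For reference, the CRRA utility and the associated certainty equivalent can be computed as in the sketch below. It assumes the standard definition $CE = U^{-1}(E[U(W)])$ applied to positive gross wealth outcomes that are equally likely across sample paths; whether the CE values reported above are computed in exactly this way is not stated in the text. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def crra_utility(wealth: float, gamma: float) -> float:
    """CRRA utility W^(1-gamma) / (1-gamma); log utility when gamma == 1."""
    if wealth <= 0.0:
        raise ValueError("CRRA utility requires positive wealth")
    if gamma == 1.0:
        return math.log(wealth)
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def certainty_equivalent(wealths: Sequence[float], gamma: float) -> float:
    """Sure wealth level with the same utility as the sample mean utility."""
    mean_u = sum(crra_utility(w, gamma) for w in wealths) / len(wealths)
    if gamma == 1.0:
        return math.exp(mean_u)
    # Invert U: W = ((1 - gamma) * u)^(1 / (1 - gamma))
    return ((1.0 - gamma) * mean_u) ** (1.0 / (1.0 - gamma))


# Hypothetical gross portfolio returns over four sample paths.
sample = [0.95, 1.02, 1.10, 1.04]
ce = certainty_equivalent(sample, gamma=9.0)
print(f"mean = {sum(sample) / len(sample):.4f}, CE = {ce:.4f}")
```

With a risk aversion as high as $\gamma = 9$, the certainty equivalent is pulled strongly toward the worst outcomes, so it can sit well below the mean return.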

Figure 8: Exercise time of each option

In Figure 8, we plot only the exercise times for options that are exercised in each sample path using the IPH-ADP algorithm. Options tend to be exercised in late periods before maturity.

Figure 9: Utility gap from baseline
Figure 10: Certainty equivalent of return

In the 3rd experiment, we vary the CRRA parameter (Figures 9 and 10). The utility gap between IPH-ADP and QO-PERFECT is less than 16% with penalty, while the CE of IPH-ADP is around 5% less than the penalized QO-PERFECT value.

As the final experiment, we increase the number of options to 10 and evaluate the performance of the IPH-ADP algorithm. The 6 added options are: a 5% ITM call option, a 5% ITM put option, a 25% OTM call option, a 25% OTM put option, a 25% ITM call option and a 25% ITM put option. The portfolio is now symmetric; there are both puts and calls, ITM and OTM, for all values of moneyness. With the larger number of options, we set $g_{term} = 0.25$, $\rho = 4$ and $T = 20$. Because the running time of QO (which enumerates all options) increases exponentially, we did not run the QO benchmark, as a single run would take days to finish. The CE of IPH-ADP is 141%, similar to the 4-option case. The algorithm takes 8 minutes to run, which shows its scalability.
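The growth in the number of weight combinations that the quasi-optimal benchmark must sweep can be illustrated with a short sketch. It assumes nonnegative weights on a grid of step 0.05 (20 steps) that sum to one across the options only; whether the risk-free account also receives a grid weight is not stated, so the counts are illustrative. The function names are hypothetical.

```python
import itertools
import math
from typing import Iterator, Tuple


def count_weight_grids(n_assets: int, n_steps: int) -> int:
    """Number of nonnegative weight vectors on the grid that sum to one."""
    # Stars and bars: distribute n_steps units among n_assets positions.
    return math.comb(n_steps + n_assets - 1, n_assets - 1)


def weight_grid(n_assets: int, n_steps: int) -> Iterator[Tuple[float, ...]]:
    """Enumerate all weight vectors with entries j / n_steps summing to one."""
    slots = n_steps + n_assets - 1
    for bars in itertools.combinations(range(slots), n_assets - 1):
        parts, prev = [], -1
        for bar in bars + (slots,):
            parts.append(bar - prev - 1)  # units between consecutive bars
            prev = bar
        yield tuple(p / n_steps for p in parts)


# Step 0.05 corresponds to 20 grid steps.
for n_options in (4, 10):
    print(n_options, "options:", count_weight_grids(n_options, 20),
          "weight combinations")
```

Each of these combinations additionally requires enumerating exercise times on every sample path, so the total work grows much faster than the weight count alone suggests.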

5 Conclusions and Future Work

In this paper, we have proposed a model of European and American option portfolios and used ADP algorithms to find excellent portfolio compositions. Our algorithms outperform the existing OOPS algorithm in terms of cumulative return and perform better than the LSMC algorithm with regard to the utility gap from optimal and the CE of return. The gap and CE are better with a longer time horizon, while with an increasing CRRA parameter, the gap is relatively stable and the CE decreases.

Future work can focus on two parts.

1) Adding transaction costs to the model: The option market has high transaction costs. Consequences of this friction can be pricing anomalies such as violations of put-call parity (Phillips and Smith, 1980). Hence, a next step is to take transaction costs into account in our model. We could follow Eraker (2013) and use the bid-ask spread as the transaction cost.

2) Allowing borrowing and short-selling of options: So far we do not allow borrowing from the bank, and short-selling options is also not permitted. Usually, entering into a short sale transaction is a way for traders to profit. This feature can also be explored in future work.

References

Bianchi, L., Dorigo, M., Gambardella, L. M., & Gutjahr, W. J. (2009). A survey on metaheuristics for stochastic combinatorial optimization. Natural Computing: an international journal, 8(2), 239-287.
Birge, J. R. (2007). Optimization methods in dynamic portfolio management. Handbooks in Operations Research and Management Science, 15, 845-865.
Brandt, M. W. (2009). Portfolio choice problems. Handbook of Financial Econometrics, 1, 269-336.
Brown, D. B., & Smith, J. E. (2011). Dynamic portfolio optimization with transaction costs: Heuristics and dual bounds. Management Science, 57(10), 1752-1770.
Constantinides, G. M., Jackwerth, J. C., & Savov, A. (2013). The puzzle of index option returns. Review of Asset Pricing Studies.
Driessen, J., & Maenhout, P. (2013). The world price of jump and volatility risk. Journal of Banking & Finance, 37(2), 518-536.
Ilhan, A., Jonsson, M., & Sircar, R. (2004). Portfolio optimization with derivatives and indifference pricing. Indifference Pricing (ed. Carmona), 181-210.
Eraker, B. (2013). The performance of model based option trading strategies. Review of Derivatives Research, 16(1), 1-23.
Faias, J., & Santa-Clara, P. (2011). Optimal option portfolio strategies. In AFA 2011 Denver Meetings Paper. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1569380
Jones, C. S. (2006). A nonlinear factor analysis of S&P 500 index option returns. The Journal of Finance, 61(5), 2325-2363.
Liu, J., & Pan, J. (2003). Dynamic derivative strategies. Journal of Financial Economics, 69(3), 401-430.
Longstaff, F. A., & Schwartz, E. S. (2001). Valuing American options by simulation: a simple least-squares approach. Review of Financial Studies, 14(1), 113-147.
Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7(1), 77-91.

Merton, R. C. (1969). Lifetime portfolio selection under uncertainty: The continuous-time case. The Review of Economics and Statistics, 51(3), 247-257.
Merton, R. C. (1971). Optimum consumption and portfolio rules in a continuous-time model. Journal of Economic Theory, 3(4), 373-413.
Merton, R. C. (1975). Theory of finance from the perspective of continuous time. Journal of Financial and Quantitative Analysis, 10(04), 659-674.
Merton, R. C. (1990). Continuous-time finance. Cambridge, Mass.: B. Blackwell.
Phillips, S. M., & Smith, C. W. (1980). Trading costs for listed options: The implications for market efficiency. Journal of Financial Economics, 8(2), 179-201.
Powell, W., Ruszczyński, A., & Topaloglu, H. (2004). Learning algorithms for separable approximations of discrete stochastic optimization problems. Mathematics of Operations Research, 29(4), 814-836.
Powell, W. B. (2011). Approximate Dynamic Programming: Solving the curses of dimensionality. John Wiley & Sons.
Rockafellar, R. T., & Wets, R. J. B. (1991). Scenarios and policy aggregation in optimization under uncertainty. Mathematics of Operations Research, 16(1), 119-147.
Rosenberg, J. V., & Engle, R. F. (2002). Empirical pricing kernels. Journal of Financial Economics, 64(3), 341-372.

Appendix

Here we present the modified algorithm for finding the exercise times of a portfolio of American options assessed by a utility function. The algorithm is a modification of LSMC from Longstaff and Schwartz (2001), which deals with a single option and no utility. Algorithm 3 finds a set of linear regression parameters for each option in every time period. By comparing the holding value against the exercise value, Algorithm 3 determines which options we should exercise in each time period. After Algorithm 3 computes the regression coefficients based on a set of sample paths, Algorithm 4 resamples the paths and computes exercise times by using the regression coefficients from Algorithm 3. Note that Algorithm 3 starts from the last time period $T$, while Algorithm 4 goes forward from the first time period. The weights from IPH are used only in Algorithm 4.

Algorithm 3:
a. Sample $A_t^n$, $r_{i,t}^n$ for $n = 1, \dots, N_1$
b. Calculate the exercise value function $V_{i,t}^{n,e} = U(r_{i,t}^n)$ for all $i$, $n$, $t$
c. Set $V_{i,t}^{n,h} = V_{i,t}^{n,e}$, $e^* = T$ for all $n$, $i$ (superscript $e$ stands for "exercise"; $h$ stands for "hold"; $e^*$ stands for "optimal exercise time")
For $t = T-1$ to $1$
  1. Apply least squares regression based on the $N_1$ realizations and find the values of $c$, $b_1$, $b_2$ by solving
     $$\xi^m + c_{i,t} + b_{i,1,t}\, A_t^m + b_{i,2,t}\, (A_t^m)^2 = V_{i,e^*}^{m,e} \quad \text{for all } i,\ m \in \{1, \dots, N_1\}$$
  For $n = 1$ to $N_1$
    2. Set $V_{i,t}^{n,h} = c_{i,t} + b_{i,1,t}\, A_t^n + b_{i,2,t}\, (A_t^n)^2$ for all $i$
    3. Compare $V_{i,t}^{n,h}$ and $V_{i,t}^{n,e}$: if $V_{i,t}^{n,h} \le V_{i,t}^{n,e}$, update $e^* = t$, $V_{i,e^*}^{n,e} = V_{i,t}^{n,e}$ for all $i$
  End
End
4. Return $c$, $b_1$, $b_2$
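The backward regression pass of Algorithm 3 can be sketched for a single option as follows. This is an illustration under assumptions, not the authors' implementation: the exercise utilities $U(r_{i,t}^n)$ are taken as precomputed inputs, the quadratic fit is solved through the normal equations, and the names `solve_linear`, `fit_quadratic` and `lsmc_backward` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]  # fails if the system is singular
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, ...]:
    """Least squares fit y ~ c + b1 * x + b2 * x^2 via the normal equations."""
    feats = [[1.0, x, x * x] for x in xs]
    ata = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    aty = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]
    return tuple(solve_linear(ata, aty))


def lsmc_backward(asset: Sequence[Sequence[float]],
                  ex_util: Sequence[Sequence[float]]
                  ) -> Tuple[Dict[int, Tuple[float, ...]], List[int]]:
    """Backward pass for one option; asset[n][t], ex_util[n][t], t = 0..T."""
    n_paths, T = len(asset), len(asset[0]) - 1
    e_star = [T] * n_paths  # current best exercise time on each path
    coeffs: Dict[int, Tuple[float, ...]] = {}
    for t in range(T - 1, 0, -1):
        xs = [asset[n][t] for n in range(n_paths)]
        # Regress the utility at the current best exercise time on A_t.
        ys = [ex_util[n][e_star[n]] for n in range(n_paths)]
        c, b1, b2 = fit_quadratic(xs, ys)
        coeffs[t] = (c, b1, b2)
        for n in range(n_paths):
            hold = c + b1 * xs[n] + b2 * xs[n] ** 2
            if hold <= ex_util[n][t]:  # exercising is at least as good
                e_star[n] = t
    return coeffs, e_star
```

Unlike the original Longstaff and Schwartz procedure, this sketch regresses on all paths and applies no discounting, which follows the steps as they are written in Algorithm 3. The fit needs at least three distinct asset values per period, otherwise the normal equations are singular.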

Algorithm 4:
a. Sample $A_t^n$, $r_{i,t}^n$ for $n = 1, \dots, N_1$
b. Apply Algorithm 3 to obtain the vectors $c_t = (c_{i,t})_{i=1}^{N}$, $b_{1,t} = (b_{i,1,t})_{i=1}^{N}$, $b_{2,t} = (b_{i,2,t})_{i=1}^{N}$
For $n = 1, \dots, N_1$
  1. Set $I = \{1, \dots, N\}$
  For $t = 0, 1, \dots, T$
    2. Solve
       $$\max_{y_t \in \{0,1\}} \sum_{i \in I} w_i \Big[ (1 - y_{i,t})\, V_{i,t}^{h}(A_t^n) + y_{i,t}\, V_{i,t}^{e}(A_t^n) \Big],$$
       where
       $$V_{i,t}^{h}(A_t^n) = c_{i,t} + b_{i,1,t}\, A_t^n + b_{i,2,t}\, (A_t^n)^2, \qquad V_{i,t}^{e}(A_t^n) = U(r_{i,t}^n).$$
       Let $y_t^{n,*}$ be an optimal decision of the optimization problem.
    3. Set $I = I \setminus \{i : y_{i,t}^{n,*} = 1\}$, $t_i^{n,*} = t$ for $i$ with $y_{i,t}^{n,*} = 1$
  End
End
4. Return the exercise times $t_i^{n,*}$ for each sample path $n$ and option $i$
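The forward pass of Algorithm 4 on one sample path can be sketched as below. Several assumptions are made here and should be read as such: weights are nonnegative, so the maximization in Step 2 separates across options and each remaining option is exercised when its exercise utility is at least its fitted holding value; regression coefficients are available only for $t = 1, \dots, T-1$, so options never exercised earlier are assigned time $T$; and the names and data layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Coeffs = Dict[int, Tuple[float, float, float]]  # t -> (c, b1, b2)


def forward_exercise_times(asset_path: Sequence[float],
                           ex_util: Sequence[Sequence[float]],
                           coeffs: Sequence[Coeffs],
                           weights: Sequence[float]
                           ) -> Tuple[Dict[int, int], float]:
    """Exercise times and weighted utility for one sample path.

    asset_path[t] : asset value at t = 0..T
    ex_util[i][t] : utility of exercising option i at time t
    coeffs[i][t]  : quadratic holding-value coefficients of option i
    weights[i]    : fixed portfolio weight of option i
    """
    if any(w < 0.0 for w in weights):
        raise ValueError("this sketch assumes nonnegative weights")
    T = len(asset_path) - 1
    remaining = set(range(len(weights)))
    times: Dict[int, int] = {}
    for t in range(1, T):
        a = asset_path[t]
        exercised = set()
        for i in remaining:
            c, b1, b2 = coeffs[i][t]
            hold = c + b1 * a + b2 * a * a
            # Separable objective: exercise iff it is at least as good as holding.
            if ex_util[i][t] >= hold:
                exercised.add(i)
                times[i] = t
        remaining -= exercised
    for i in remaining:  # options still held are exercised at maturity
        times[i] = T
    # Additive portfolio utility: weighted sum of the individual utilities.
    total = sum(weights[i] * ex_util[i][times[i]] for i in range(len(weights)))
    return times, total


# Hypothetical two-option example with constant fitted holding values.
path = [100.0, 101.0, 99.0, 102.0]
hold_coeffs: List[Coeffs] = [{1: (5.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0)},
                             {1: (5.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0)}]
utilities = [[0.0, 1.0, 6.0, 2.0], [0.0, 1.0, 1.0, 3.0]]
print(forward_exercise_times(path, utilities, hold_coeffs, [0.5, 0.5]))
```

Applying the function to every resampled path yields the exercise time of each option per path, and averaging the returned utilities over paths gives a sample estimate of the portfolio utility for the fixed weights.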