MULTISTAGE STOCHASTIC PROGRAMS WITH A RANDOM NUMBER OF STAGES: DYNAMIC PROGRAMMING EQUATIONS, SOLUTION METHODS, AND APPLICATION TO PORTFOLIO SELECTION


Vincent Guigues
School of Applied Mathematics, FGV
Praia de Botafogo, Rio de Janeiro, Brazil
vguigues@fgv.br

Abstract. We introduce the class of multistage stochastic optimization problems with a random number of stages. For such problems, we show how to write dynamic programming equations and how to solve these equations using the Stochastic Dual Dynamic Programming algorithm. Finally, we consider a portfolio selection problem over an optimization period of random duration. For several instances of this problem, we show the gain obtained using a policy that takes the randomness of the number of stages into account over a policy built taking a fixed number of stages (namely the maximal possible number of stages).

Keywords: Stochastic programming; Random number of stages; SDDP; Portfolio selection.

AMS subject classifications: 90C15, 90C90.

1. Introduction

Multistage Stochastic Programs (MSPs) are common in finance and in many areas of engineering; see for instance [44] and references therein. These models are useful when a sequence of decisions has to be taken over an optimization period of $T$ stages, knowing that these decisions must satisfy random constraints almost surely and induce random costs [44, 3, 36, 41, 39]. To the best of our knowledge, all MSPs considered so far in the literature have a known (in general finite) number of stages. However, for many applications the number of stages, i.e., the real optimization period, is not known in advance. It is easy to name a few of these applications:

- A company may want to determine optimal investments over its lifetime [31, 28, 4, 3]. In this situation, the optimization period ends when the company disappears, either because it goes bankrupt, because it is bought by another company, or because it decides to stop its activities [9]. These three stopping times, which determine the number of stages $T$, are indeed random.
- A fund manager can decide to stop his investments when the fund reaches a given target. This stopping time is again random, depending on the random returns of the investments and on the investment strategy.
- Multistage stochastic portfolio optimization [1, 2, 12]: an individual may invest his money in financial assets until his death or until he obtains a given amount used for covering some expense [4]. Again, both stopping times are random.
- A hedge fund may have to deal with longevity risk [32, 8, 42, 27], with payout ratios for a given set of individuals spreading over random time windows.

The examples above come from finance, but in many other areas, for instance logistics [6], power management [33, 34, 19], and capacity planning and expansion [11], MSPs with a random optimization period could be useful, especially for long-term optimization [13] when the optimization period depends on the lifetime of individuals or companies. It is therefore natural to consider multistage stochastic programs having a random number of stages. The study of these problems passes through two successive steps: (i) a modelling step to define a policy and (ii) an algorithmic step to build a policy, i.e., a solution method allowing us to compute decisions for any realization of the uncertainty. Guided by the fact that there exist efficient solution methods for MSPs based on dynamic programming equations (for instance Stochastic Dual Dynamic Programming (SDDP) [34] and Approximate Dynamic Programming (ADP) [4]), our goal for (i) is to write dynamic programming equations for a multistage stochastic program with a random number of stages.

The paper is organized as follows. In Section 2, we define multistage stochastic optimization problems with a random number of stages and show how to write dynamic programming equations for these problems. We show in particular that, compared with the case where the number of stages is fixed, two new features appear: first, we need to add an extra state variable, denoted by $D_{t-1}$ for stage $t$, allowing us to know whether the optimization period is already over; second, for each stage, instead of just one cost-to-go function we have two cost-to-go functions, one for when the optimization period is already over (this is the null function) and another one for when additional stages remain in the optimization period. In Section 3, we write dynamic programming equations for a slightly larger class of MSPs, still having a random number of stages, but where the cost function for the last (random) stage and the cost functions for the remaining stages are taken from two sets of functions. We provide a portfolio selection model as an example of such problems. In the case when the underlying stochastic process $\xi_t$ neither depends on its past $(\xi_1,\dots,\xi_{t-1})$ nor on $D_t$, and $D_t$ only depends on $D_{t-1}$, we detail in Section 4 the SDDP algorithm to solve the dynamic programming equations written in Section 2. This variant of SDDP, called SDDP-TSto, is very similar to the variants of SDDP presented in [37, 29] where the underlying stochastic process depends on a Markov chain (process $(D_t)$ in our case) and a value function is used for each stage and each state of the Markov chain. Finally, in Section 5, we consider a portfolio problem with a random optimization period and the corresponding dynamic programming equations, given in Section 3. We detail the SDDP algorithm applied to these equations and present the results of numerical tests which compare, for several instances, the performance of a policy that takes the randomness of the number of stages into account with the performance of a policy built taking a fixed value for the number of stages, namely the maximal possible value.
2. Writing dynamic programming equations for multistage stochastic programs with a random number of stages

Consider a risk-neutral multistage stochastic optimization problem with $T_{\max}$ known stages of the form
\[
(1)\qquad
\begin{array}{cl}
\inf & \mathbb{E}_{\xi_2,\dots,\xi_{T_{\max}}}\Big[\sum_{t=1}^{T_{\max}} f_t(x_t,x_{t-1},\xi_t)\Big]\\
& x_t \in X_t(x_{t-1},\xi_t)\ \text{a.s.},\ x_t\ \mathcal{F}_t\text{-measurable},\ t=1,\dots,T_{\max},
\end{array}
\]
where $x_0$ is given, $\xi_1$ is deterministic, $(\xi_t)_{t=2}^{T_{\max}}$ is a stochastic process, $\mathcal{F}_t$ is the sigma-algebra $\mathcal{F}_t:=\sigma(\xi_j,\ j\le t)$, and $X_t(x_{t-1},\xi_t)$ is a subset of $\mathbb{R}^n$. In the objective, the expectation is computed with respect to the distribution of $\xi_2,\dots,\xi_{T_{\max}}$. We assume that the problem above is well defined. We will come back in Section 4 to the assumptions needed for problem (1) to be well defined and for SDDP to apply as a solution method.

Our goal in this section is to define multistage stochastic optimization problems where the number of stages is no longer fixed ($T_{\max}$ in (1)) but stochastic, and to derive dynamic programming equations, under several assumptions, for such problems. We will assume that

(H1) the number of stages $T$ is a discrete random variable taking values in $\{2,\dots,T_{\max}\}$.

The number of stages $T$, or stopping time, induces the Bernoulli process $D_t$, $t=1,\dots,T_{\max}$ (a death process), where $D_t = 1_{T>t}$ is the indicator of the event $\{T>t\}$:
\[
(2)\qquad D_t = 1_{T>t} = \begin{cases} 0 & \text{if the optimization period ended at } t \text{ or before},\\ 1 & \text{otherwise}.\end{cases}
\]
Therefore $T$ can be written as the following function of process $(D_t)$:
\[
(3)\qquad T = \min\big\{1\le t\le T_{\max} : D_t = 0\big\}.
\]
Clearly, $D_t$, $t=1,\dots,T_{\max}$, are dependent random variables, and the distribution of $D_t$ given $D_{t-1}$ is known as soon as the distribution of $T$ is known. More precisely, since we have at least 2 stages, $D_1$ takes value 1 with probability 1. Next, denoting $p_t = \mathbb{P}(T=t)$ and $q_t = \mathbb{P}(D_t=0\,|\,D_{t-1}=1)$, we have $q_2 = \mathbb{P}(T=2) = p_2$, and for $t\in\{2,\dots,T_{\max}\}$ we get
\[
\begin{array}{lcl}
p_t = \mathbb{P}(T=t) & = & \mathbb{P}(D_2=1, D_3=1, \dots, D_{t-1}=1, D_t=0)\\[2pt]
& = & \mathbb{P}(D_2=1)\Big[\prod_{k=3}^{t-1}\mathbb{P}(D_k=1\,|\,D_2=1,\dots,D_{k-1}=1)\Big]\mathbb{P}(D_t=0\,|\,D_2=1,\dots,D_{t-1}=1)\\[2pt]
& = & (1-q_2)\Big[\prod_{k=3}^{t-1}\mathbb{P}(D_k=1\,|\,D_{k-1}=1)\Big]\mathbb{P}(D_t=0\,|\,D_{t-1}=1) \;=\; q_t\prod_{k=2}^{t-1}(1-q_k).
\end{array}
\]
Therefore the transition probabilities $q_t$, $t=2,3,\dots,T_{\max}$, can be computed using the recurrence
\[
(4)\qquad q_t = \frac{p_t}{\prod_{k=2}^{t-1}(1-q_k)},\quad t=3,\dots,T_{\max},
\]
starting from $q_2 = p_2$ (note that $q_{T_{\max}} = 1$). By definition of $D_t$, we also have $\mathbb{P}(D_t=0\,|\,D_{t-1}=0)=1$, or equivalently $\mathbb{P}(D_t=1\,|\,D_{t-1}=0)=0$. We represent in Figure 1 the scenario tree of the realizations of $D_1,D_2,\dots,D_{T_{\max}}$ (for an example where $T_{\max}=5$), as well as the transition probabilities between the nodes of this scenario tree.

In the case when the number of stages is stochastic, the decision $x_t$ for stage $t$ is not only a function of the history $\xi_{[t]} = (\xi_1,\xi_2,\dots,\xi_t)$ of process $(\xi_t)$, as in (1), but also depends on the history of process $(D_t)$. Therefore, we come to the following definition of a risk-neutral multistage stochastic optimization problem with a random number $T$ of stages:
\[
(5)\qquad
\begin{array}{cl}
\inf & \mathbb{E}_{\xi_2,\dots,\xi_{T_{\max}},D_2,\dots,D_{T_{\max}}}\Big[\sum_{t=1}^{T} f_t(x_t,x_{t-1},\xi_t)\Big]\\
& x_t\in X_t(x_{t-1},\xi_t)\ \text{a.s.},\ x_t\ \mathcal{F}_t\text{-measurable},\ t=1,\dots,T_{\max},
\end{array}
\]
where $\mathcal{F}_t$ is the sigma-algebra
\[
(6)\qquad \mathcal{F}_t = \sigma(\xi_j, D_j,\ j\le t)
\]
and where $T$ is the function of $(D_t)$ given by (3). Note that in the objective of (5) the expectation is computed with respect to the distribution of $\xi_2,\dots,\xi_{T_{\max}},D_2,\dots,D_{T_{\max}}$.
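The recurrence (4) is straightforward to implement. The following sketch (ours, not from the paper; the function name and the uniform example distribution are illustrative) computes the transition probabilities $q_t$ from a given distribution $p_t=\mathbb{P}(T=t)$:

```python
def transition_probs(p):
    """Compute q_t = P(D_t = 0 | D_{t-1} = 1), t = 2, ..., T_max, from the
    distribution p[t] = P(T = t) via recurrence (4)."""
    T_max = max(p)
    q = {2: p[2]}                 # q_2 = p_2
    survival = 1.0 - q[2]         # running product prod_{k=2}^{t-1} (1 - q_k)
    for t in range(3, T_max + 1):
        q[t] = p[t] / survival    # recurrence (4)
        survival *= 1.0 - q[t]
    return q                      # q[T_max] == 1 up to rounding

# Example: T uniform on {2, 3, 4, 5}
q = transition_probs({t: 0.25 for t in range(2, 6)})
```

On the uniform example this returns $q_2=1/4$, $q_3=1/3$, $q_4=1/2$, $q_5=1$, consistent with the observation that $q_{T_{\max}}=1$.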

Plugging (3) into (5), problem (5) can be written
\[
(7)\qquad
\begin{array}{cl}
\inf & \mathbb{E}_{\xi_2,\dots,\xi_{T_{\max}},D_2,\dots,D_{T_{\max}}}\Big[\sum_{1\le t\le \min\{1\le \tau\le T_{\max}:\ D_\tau=0\}} f_t(x_t,x_{t-1},\xi_t)\Big]\\
& x_t\in X_t(x_{t-1},\xi_t)\ \text{a.s.},\ x_t\ \mathcal{F}_t\text{-measurable},\ t=1,\dots,T_{\max}.
\end{array}
\]
To write dynamic programming equations for (7), we now define the state vectors. The state vector at stage $t$ is given by $x_{t-1}$ (the decision taken at the previous stage) and the relevant history of processes $(\xi_t)$ and $(D_t)$. Though all the history $\xi_{[t-1]} = (\xi_1,\dots,\xi_{t-1})$ of process $(\xi_t)$ until stage $t-1$ may be necessary, we argue that it is enough to put in the state vector for stage $t$ only the past value $D_{t-1}$ of $(D_t)$. Indeed, if $D_{t-1}=1$ then the whole history of $(D_t)$ until $t-1$ is known: we know that $D_j=1$ for $1\le j\le t-1$; on the other hand, if $D_{t-1}=0$ then, whatever the history of $(D_t)$ until $t-1$, we know that the cost function is null for stage $t$ because the optimization period ended at $t-1$ or before. Consequently, the state vector at stage $t$ is $(x_{t-1},\xi_{[t-1]},D_{t-1})$ and we introduce for each stage $t=2,\dots,T_{\max}$ two functions:

- $Q_t$ such that $Q_t(x_{t-1},\xi_{[t-1]},D_{t-1},\xi_t,D_t)$ is the optimal mean cost from $t$ on, starting at $t$ from state $(x_{t-1},\xi_{[t-1]},D_{t-1})$ and knowing the values $\xi_t$ and $D_t$ of processes $(\xi_t)$ and $(D_t)$ at $t$;
- $\mathcal{Q}_t$ given by
\[
(8)\qquad \mathcal{Q}_t(x_{t-1},\xi_{[t-1]},D_{t-1}) = \mathbb{E}_{\xi_t,D_t}\big[Q_t(x_{t-1},\xi_{[t-1]},D_{t-1},\xi_t,D_t)\,\big|\,D_{t-1},\xi_{[t-1]}\big],
\]
i.e., $\mathcal{Q}_t(x_{t-1},\xi_{[t-1]},D_{t-1})$ is the optimal mean cost from $t$ on, starting at $t$ from state $(x_{t-1},\xi_{[t-1]},D_{t-1})$.

We also set $\mathcal{Q}_{T_{\max}+1}(x_{T_{\max}},\xi_{[T_{\max}]},D_{T_{\max}}) \equiv 0$. With these definitions, clearly for $t=2,\dots,T_{\max}$ we have
\[
(9)\qquad \mathcal{Q}_t(x_{t-1},\xi_{[t-1]},0) = 0.
\]
Next, for $t=2,\dots,T_{\max}$, the functions $\mathcal{Q}_t(\cdot,\cdot,1)$ satisfy the recurrence
\[
(10)\qquad \mathcal{Q}_t(x_{t-1},\xi_{[t-1]},1) = \mathbb{E}_{\xi_t,D_t}\big[Q_t(x_{t-1},\xi_{[t-1]},1,\xi_t,D_t)\,\big|\,D_{t-1}=1,\xi_{[t-1]}\big]
\]
where
\[
(11)\qquad Q_t(x_{t-1},\xi_{[t-1]},1,\xi_t,0) = \left\{\begin{array}{l}\inf_{x_t}\ f_t(x_t,x_{t-1},\xi_t)\\ x_t\in X_t(x_{t-1},\xi_t),\end{array}\right.
\]
and
\[
(12)\qquad Q_t(x_{t-1},\xi_{[t-1]},1,\xi_t,1) = \left\{\begin{array}{l}\inf_{x_t}\ f_t(x_t,x_{t-1},\xi_t)+\mathcal{Q}_{t+1}(x_t,(\xi_{[t-1]},\xi_t),1)\\ x_t\in X_t(x_{t-1},\xi_t).\end{array}\right.
\]
The reasons for equations (10)-(12) are clear:

- $Q_t(x_{t-1},\xi_{[t-1]},1,\xi_t,0)$ is the optimal mean cost from stage $t$ on, knowing that the optimization period ends at $t$ and that $\xi_t$ is the value of process $(\xi_t)$ at stage $t$. Therefore it is obtained by minimizing the immediate stage-$t$ cost while satisfying the constraints for stage $t$.
- $Q_t(x_{t-1},\xi_{[t-1]},1,\xi_t,1)$ is the optimal mean cost from stage $t$ on, knowing that the optimization period continues after $t$ and that $\xi_t$ is the value of process $(\xi_t)$ at stage $t$. Therefore it is obtained by minimizing the immediate stage-$t$ cost plus the future optimal mean cost (which is $\mathcal{Q}_{t+1}(x_t,\xi_{[t]},D_t) = \mathcal{Q}_{t+1}(x_t,(\xi_{[t-1]},\xi_t),1)$ since $D_t=1$) while satisfying the constraints for stage $t$.

We observe that equations (9)-(12) can be written in the following compact form: for $t=2,\dots,T_{\max}$,
\[
(13)\qquad \mathcal{Q}_t(x_{t-1},\xi_{[t-1]},D_{t-1}) = \mathbb{E}_{\xi_t,D_t}\big[Q_t(x_{t-1},\xi_{[t-1]},D_{t-1},\xi_t,D_t)\,\big|\,D_{t-1},\xi_{[t-1]}\big]
\]
where
\[
(14)\qquad Q_t(x_{t-1},\xi_{[t-1]},D_{t-1},\xi_t,D_t) = \left\{\begin{array}{l}\inf_{x_t}\ D_{t-1} f_t(x_t,x_{t-1},\xi_t)+\mathcal{Q}_{t+1}(x_t,\xi_{[t]},D_t)\\ x_t\in X_t(x_{t-1},\xi_t).\end{array}\right.
\]
Setting $D_0=1$, recalling that $D_1=1$ and that $\mathcal{F}_t$ is given by (6), it is straightforward to see that the optimal value of (7) can be expressed as
\[
(15)\qquad \left\{\begin{array}{l}\inf_{x_1}\ D_0 f_1(x_1,x_0,\xi_1)+\mathcal{Q}_2(x_1,\xi_1,D_1)\\ x_1\in X_1(x_0,\xi_1),\end{array}\right.
\]
and that (13)-(14) are dynamic programming equations for the problem
\[
(16)\qquad
\begin{array}{cl}
\inf & \mathbb{E}_{\xi_2,\dots,\xi_{T_{\max}},D_2,\dots,D_{T_{\max}}}\Big[\sum_{t=1}^{T_{\max}} D_{t-1} f_t(x_t,x_{t-1},\xi_t)\Big]\\
& x_t\in X_t(x_{t-1},\xi_t)\ \text{a.s.},\ x_t\ \mathcal{F}_t\text{-measurable},\ t=1,\dots,T_{\max},
\end{array}
\]
which is an equivalent reformulation of (7).

The impact of the randomness of $T$ on the dynamic programming equations is clear from reformulation (16) of problem (7). In this reformulation, the number of stages is fixed and known: it is the maximal possible number of stages $T_{\max}$ for $T$. Therefore, it takes the form of a usual multistage stochastic optimization problem where the random variable $T$ was replaced by the interstage dependent random process $(D_t)$ and the cost function $f_t$ at stage $t$ was replaced by the random cost function $D_{t-1} f_t$. Indeed, when the optimization period ended at $t-1$ or before, the cost function is null for stage $t$, or equivalently can be expressed as $D_{t-1} f_t$, since $D_{t-1}=0$ in this case. On the other hand, if the optimization period did not end at $t-1$, then $D_{t-1}=1$ and again the cost function for stage $t$ can be expressed as $D_{t-1} f_t$ ($= f_t$ in this case).

Note that in these equations $(\xi_t,D_t)$ can depend on $\xi_{[t-1]}$. Clearly $D_t$ depends on $D_{t-1}$, but $(\xi_t,D_t)$ can be independent of $\xi_{[t-1]}$. In this situation, $\xi_{[t-1]}$ is not needed in the state vector at $t$ and the dynamic programming equations simplify as follows: $\mathcal{Q}_{T_{\max}+1}(x_{T_{\max}},D_{T_{\max}}) \equiv 0$ and, for $t=2,\dots,T_{\max}$, we have
\[
(17)\qquad \mathcal{Q}_t(x_{t-1},D_{t-1}) = \mathbb{E}_{\xi_t,D_t}\big[Q_t(x_{t-1},D_{t-1},\xi_t,D_t)\,\big|\,D_{t-1}\big]
\]
where
\[
(18)\qquad Q_t(x_{t-1},D_{t-1},\xi_t,D_t) = \left\{\begin{array}{l}\inf_{x_t}\ D_{t-1} f_t(x_t,x_{t-1},\xi_t)+\mathcal{Q}_{t+1}(x_t,D_t)\\ x_t\in X_t(x_{t-1},\xi_t).\end{array}\right.
\]
Finally, let us consider the case when $\xi_t$ does not depend on $(\xi_{[t-1]},D_t)$ and $D_t$ only depends on $D_{t-1}$. In this setting, $(D_t)$ is an inhomogeneous Markov chain with two states: an absorbing state corresponding to the case when the optimization period is over, and a second state where the optimization period is still not over. We assume that the distribution of $\xi_t$ is discrete with finite support $\{\xi_{t1},\dots,\xi_{tM_t}\}$, with $p_{tj} = \mathbb{P}(\xi_t = \xi_{tj})$. The scenario trees for $(\xi_1,\dots,\xi_{T_{\max}})$ and $((D_1,\xi_1),(D_2,\xi_2),\dots,(D_{T_{\max}},\xi_{T_{\max}}))$ (nodes and transition probabilities) are represented in Figure 1 (right and bottom left plots) on an example where $T_{\max}=3$ and where $\xi_t$ has two possible realizations for all $t=2,\dots,T_{\max}$. With these assumptions, dynamic programming equations (17)-(18) can be written as follows: $\mathcal{Q}_{T_{\max}+1}(x_{T_{\max}},D_{T_{\max}}) \equiv 0$,
\[
(19)\qquad \mathcal{Q}_t(x_{t-1},0) = 0,\quad t=2,\dots,T_{\max},
\]

and, for $t=2,\dots,T_{\max}$,
\[
(20)\qquad \mathcal{Q}_t(x_{t-1},1) = (1-q_t)\sum_{j=1}^{M_t} p_{tj}\, Q_t(x_{t-1},1,\xi_{tj},1) + q_t\sum_{j=1}^{M_t} p_{tj}\, Q_t(x_{t-1},1,\xi_{tj},0),
\]
where $q_t$ is given by (4),
\[
(21)\qquad Q_t(x_{t-1},1,\xi_{tj},1) = \left\{\begin{array}{l}\inf_{x_t}\ f_t(x_t,x_{t-1},\xi_{tj})+\mathcal{Q}_{t+1}(x_t,1)\\ x_t\in X_t(x_{t-1},\xi_{tj}),\end{array}\right.
\]
and
\[
(22)\qquad Q_t(x_{t-1},1,\xi_{tj},0) = \left\{\begin{array}{l}\inf_{x_t}\ f_t(x_t,x_{t-1},\xi_{tj})\\ x_t\in X_t(x_{t-1},\xi_{tj}).\end{array}\right.
\]

[Figure 1. Scenario trees (assuming that $\xi_t$ does not depend on $(\xi_{[t-1]},D_t)$): the scenario tree of $(D_1,D_2,\dots,D_{T_{\max}})$ with $T_{\max}=5$ and transition probabilities $q_t$, $1-q_t$; the scenario tree of $(\xi_1,\xi_2,\dots,\xi_{T_{\max}})$ with $T_{\max}=3$; and the scenario tree of $((D_1,\xi_1),(D_2,\xi_2),\dots,(D_{T_{\max}},\xi_{T_{\max}}))$ with $T_{\max}=3$.]

Remark 2.1. Observe that the dynamic programming equations above correspond to a model that minimizes the expected cost with respect to the distribution of $(\xi_1,\xi_2,\dots,\xi_{T_{\max}},D_1,D_2,\dots,D_{T_{\max}})$. By the Law of Large Numbers, this model is useful when the corresponding policy is repeatedly applied by individuals sharing the same distribution of the number of stages $T$. An example would be a group of companies sharing the same distribution for their lifetime; see [9] and Section 5.
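To make equations (19)-(22) concrete, here is a small backward-induction sketch on a toy instance of our own (one-dimensional decisions discretized on a grid, cost $f_t(x_t,x_{t-1},\xi_t)=\xi_t\,|x_t-x_{t-1}|$, and $X_t$ equal to the whole grid; all names and numbers are illustrative, not from the paper):

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 11)
T_max = 4
xis = {t: [0.5, 1.5] for t in range(2, T_max + 1)}   # support of xi_t
p = {t: [0.5, 0.5] for t in range(2, T_max + 1)}     # p_tj = P(xi_t = xi_tj)
q = {2: 0.2, 3: 0.4, 4: 1.0}                         # q_t, with q_{T_max} = 1

def f(x, x_prev, xi):
    # stage cost f_t(x_t, x_{t-1}, xi_t) of the toy instance
    return xi * abs(x - x_prev)

# V[t][i] stores Q_t(grid[i], 1) of (20); Q_t(., 0) = 0 is implicit, cf. (19).
V = {T_max + 1: np.zeros(len(grid))}
for t in range(T_max, 1, -1):
    V[t] = np.zeros(len(grid))
    for i, x_prev in enumerate(grid):
        total = 0.0
        for j, xi in enumerate(xis[t]):
            immediate = np.array([f(x, x_prev, xi) for x in grid])
            cont = np.min(immediate + V[t + 1])   # (21): horizon continues
            stop = np.min(immediate)              # (22): horizon ends at t
            total += p[t][j] * ((1 - q[t]) * cont + q[t] * stop)
        V[t][i] = total                           # (20)
```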

3. Dynamic programming equations for more general models

3.1. Writing dynamic programming equations. In the previous section, we considered models where the cost functions for stages $t\le T$ are taken from the collection of functions $(f_t)$, namely $f_t$ for stage $t$ as long as $t\le T$. It is possible to write dynamic programming equations for more general risk-neutral stochastic programming models having a random number $T$ of stages where, for $1\le t\le \min(T_1,T)$, the cost function is $f_{t,1}$ for some random variable $T_1$; for $\min(T_1,T)+1\le t\le \min(T_1+T_2,T)$, the cost function is $f_{t,2}$ for some random variable $T_2$; and so on. As a special case, assume that $T_1 = T-1$: for $t=1,\dots,T-1$, the cost function is $f_t(x_t,x_{t-1},\xi_t)$, and for $t=T$ the cost function is $\bar f_t(x_t,x_{t-1},\xi_t)$. In this situation, recalling definition (2) of $D_t$, the cost function for stage $t$ can be written
\[
(23)\qquad D_t f_t(x_t,x_{t-1},\xi_t) + (D_{t-1}-D_t)\,\bar f_t(x_t,x_{t-1},\xi_t).
\]
Indeed, if $t<T$ we have $D_{t-1}=D_t=1$ and $D_t f_t(x_t,x_{t-1},\xi_t) + (D_{t-1}-D_t)\bar f_t(x_t,x_{t-1},\xi_t) = f_t(x_t,x_{t-1},\xi_t)$; if $t=T$ we have $D_{t-1}=1$, $D_t=0$, and $D_t f_t(x_t,x_{t-1},\xi_t) + (D_{t-1}-D_t)\bar f_t(x_t,x_{t-1},\xi_t) = \bar f_t(x_t,x_{t-1},\xi_t)$; if $t>T$ no costs are incurred: we have $D_{t-1}=D_t=0$ and $D_t f_t(x_t,x_{t-1},\xi_t) + (D_{t-1}-D_t)\bar f_t(x_t,x_{t-1},\xi_t) = 0$; as required. In the next section we present a simple portfolio problem modelled by an MSP of this type.

Therefore, for the special case we are dealing with now, with cost function (23) for stage $t$, we obtain the multistage stochastic program
\[
(24)\qquad
\begin{array}{cl}
\inf & \mathbb{E}_{\xi_2,\dots,\xi_{T_{\max}},D_2,\dots,D_{T_{\max}}}\Big[\sum_{t=1}^{T_{\max}} D_t f_t(x_t,x_{t-1},\xi_t)+(D_{t-1}-D_t)\bar f_t(x_t,x_{t-1},\xi_t)\Big]\\
& x_t\in X_t(x_{t-1},\xi_t)\ \text{a.s.},\ x_t\ \mathcal{F}_t\text{-measurable},\ t=1,\dots,T_{\max},
\end{array}
\]
where $\mathcal{F}_t$ is the sigma-algebra given by (6). Observe that when $\bar f_t(x_t,x_{t-1},\xi_t) = f_t(x_t,x_{t-1},\xi_t)$, we are back to the stochastic programs considered in the previous section, i.e., problem (24) becomes problem (16). Clearly, (24) is obtained by replacing in (16) the stage-$t$ cost function $D_{t-1} f_t(x_t,x_{t-1},\xi_t)$ by (23).
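The case analysis behind (23) can be encoded directly; the helper below (hypothetical, for illustration only) returns the stage-$t$ cost value from the pair $(D_{t-1},D_t)$:

```python
def stage_cost(D_prev, D_cur, f_t, fbar_t):
    """Stage-t cost selector of equation (23): f_t while the horizon
    continues (t < T), fbar_t at the final stage (t = T), and zero once
    the horizon is over (t > T)."""
    return D_cur * f_t + (D_prev - D_cur) * fbar_t

assert stage_cost(1, 1, 3.0, 7.0) == 3.0   # t < T
assert stage_cost(1, 0, 3.0, 7.0) == 7.0   # t == T
assert stage_cost(0, 0, 3.0, 7.0) == 0.0   # t > T
```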

Therefore, dynamic programming equations for (24) are obtained by updating the cost functions correspondingly in the dynamic programming equations (13)-(14) for (16); i.e., the dynamic programming equations for (24) are $\mathcal{Q}_{T_{\max}+1}(x_{T_{\max}},\xi_{[T_{\max}]},D_{T_{\max}}) \equiv 0$ and, for $t=2,\dots,T_{\max}$,
\[
(25)\qquad \mathcal{Q}_t(x_{t-1},\xi_{[t-1]},D_{t-1}) = \mathbb{E}_{\xi_t,D_t}\big[Q_t(x_{t-1},\xi_{[t-1]},D_{t-1},\xi_t,D_t)\,\big|\,D_{t-1},\xi_{[t-1]}\big]
\]
where
\[
(26)\qquad Q_t(x_{t-1},\xi_{[t-1]},D_{t-1},\xi_t,D_t) = \left\{\begin{array}{l}\inf_{x_t}\ D_t f_t(x_t,x_{t-1},\xi_t)+(D_{t-1}-D_t)\bar f_t(x_t,x_{t-1},\xi_t)+\mathcal{Q}_{t+1}(x_t,\xi_{[t]},D_t)\\ x_t\in X_t(x_{t-1},\xi_t).\end{array}\right.
\]
Now assume that $\xi_t$ does not depend on $(\xi_{[t-1]},D_t)$, that $D_t$ only depends on $D_{t-1}$, and that the distribution of $\xi_t$ is discrete with finite support $\{\xi_{t1},\dots,\xi_{tM_t}\}$. Denoting $p_{tj} = \mathbb{P}(\xi_t=\xi_{tj})$, equations (25)-(26) simplify as follows: $\mathcal{Q}_{T_{\max}+1}(x_{T_{\max}},0) = \mathcal{Q}_{T_{\max}+1}(x_{T_{\max}},1) \equiv 0$; $\mathcal{Q}_t(x_{t-1},0) \equiv 0$ for $t=2,\dots,T_{\max}$; and, for $t=2,\dots,T_{\max}$, we have
\[
(27)\qquad \mathcal{Q}_t(x_{t-1},1) = (1-q_t)\sum_{j=1}^{M_t} p_{tj}\,Q_t(x_{t-1},1,\xi_{tj},1) + q_t\sum_{j=1}^{M_t} p_{tj}\,Q_t(x_{t-1},1,\xi_{tj},0),
\]
where $q_t$ is given by (4),
\[
(28)\qquad Q_t(x_{t-1},1,\xi_{tj},1) = \left\{\begin{array}{l}\inf_{x_t}\ f_t(x_t,x_{t-1},\xi_{tj})+\mathcal{Q}_{t+1}(x_t,1)\\ x_t\in X_t(x_{t-1},\xi_{tj}),\end{array}\right.
\]
and
\[
(29)\qquad Q_t(x_{t-1},1,\xi_{tj},0) = \left\{\begin{array}{l}\inf_{x_t}\ \bar f_t(x_t,x_{t-1},\xi_{tj})\\ x_t\in X_t(x_{t-1},\xi_{tj}).\end{array}\right.
\]

3.2. Example: a simple portfolio problem. We consider the portfolio selection problem with direct transaction costs given in [22]. When the number of stages is random, we obtain a problem from the class introduced in Section 3.1. We first recall the dynamic programming equations for this model when the number of stages $T_{\max}$ is fixed and known. Let $x_t(i)$ be the dollar value of asset $i=1,\dots,n+1$ at the end of stage $t=1,\dots,T_{\max}$, where asset $n+1$ is cash; let $\xi_t(i)$ be the return of asset $i$ at $t$; let $y_t(i)$ be the amount of asset $i$ sold at the end of $t$ and $z_t(i)$ the amount of asset $i$ bought at the end of $t$, with $\eta_t(i)>0$ and $\nu_t(i)>0$ the respective proportional selling and buying transaction costs at $t$. Each component $x_0(i)$, $i=1,\dots,n+1$, of $x_0$ is known. The budget available at the beginning of the investment period is $\sum_{i=1}^{n+1}\xi_1(i)x_0(i)$, and $u(i)$ represents the maximal percentage of capital that can be invested in asset $i$. For $t=1,\dots,T_{\max}$, given a portfolio $x_{t-1}=(x_{t-1}(1),\dots,x_{t-1}(n),x_{t-1}(n+1))$ and $\xi_t$, we define the set $X_t(x_{t-1},\xi_t)$ as the set of portfolios $(x_t,y_t,z_t)\in\mathbb{R}^{n+1}\times\mathbb{R}^n\times\mathbb{R}^n$ satisfying
\[
\begin{array}{l}
x_t(n+1) = \xi_t(n+1)x_{t-1}(n+1)+\sum_{i=1}^n\big((1-\eta_t(i))y_t(i)-(1+\nu_t(i))z_t(i)\big),\\[2pt]
x_t(i) = \xi_t(i)x_{t-1}(i)-y_t(i)+z_t(i),\ i=1,\dots,n,\\[2pt]
y_t(i) \le \xi_t(i)x_{t-1}(i),\ i=1,\dots,n,\\[2pt]
x_t(i) \le u(i)\sum_{j=1}^{n+1}\xi_t(j)x_{t-1}(j),\ i=1,\dots,n,\\[2pt]
x_t(i),y_t(i),z_t(i)\ge 0,\ i=1,\dots,n.
\end{array}
\]
With this notation, the following dynamic programming equations of a risk-neutral portfolio model can be written: for $t=T_{\max}$, we solve the problem
\[
(30)\qquad Q_{T_{\max}}(x_{T_{\max}-1},\xi_{T_{\max}}) = \left\{\begin{array}{l}\inf\ \bar f_{T_{\max}}(x_{T_{\max}},x_{T_{\max}-1},\xi_{T_{\max}}) := -\mathbb{E}\Big[\sum_{i=1}^{n+1}\xi_{T_{\max}+1}(i)x_{T_{\max}}(i)\Big]\\ x_{T_{\max}}\in X_{T_{\max}}(x_{T_{\max}-1},\xi_{T_{\max}}),\end{array}\right.
\]
while at stages $t=T_{\max}-1,\dots,1$ we solve
\[
(31)\qquad Q_t(x_{t-1},\xi_t) = \left\{\begin{array}{l}\inf\ \mathcal{Q}_{t+1}(x_t)\\ x_t\in X_t(x_{t-1},\xi_t),\end{array}\right.
\]
where
\[
(32)\qquad \mathcal{Q}_t(x_{t-1}) = \mathbb{E}_{\xi_t}\big[Q_t(x_{t-1},\xi_t)\big],\quad t=2,\dots,T_{\max}.
\]
Now, for $t=1,\dots,T_{\max}$, define
\[
(33)\qquad f_t(x_t,x_{t-1},\xi_t) \equiv 0\quad\text{and}\quad \bar f_t(x_t,x_{t-1},\xi_t) = -\mathbb{E}\Big[\sum_{i=1}^{n+1}\xi_{t+1}(i)x_t(i)\Big].
\]
Since the number of stages is fixed to $T_{\max}$, we have $D_t=1$, $t=1,\dots,T_{\max}-1$, and $D_{T_{\max}}=0$ almost surely, and the portfolio problem we have just described is of the form (24) with $f_t$, $\bar f_t$ as in (33); i.e., we obtain the portfolio problem
\[
(34)\qquad
\begin{array}{cl}
\inf & \mathbb{E}_{\xi_2,\dots,\xi_{T_{\max}}}\big[\bar f_{T_{\max}}(x_{T_{\max}},x_{T_{\max}-1},\xi_{T_{\max}})\big]\\
& x_t\in X_t(x_{t-1},\xi_t)\ \text{a.s.},\ x_t\ \mathcal{F}_t\text{-measurable},\ t=1,\dots,T_{\max}.
\end{array}
\]
With this model, we minimize the expected loss of the portfolio (or, equivalently, maximize the mean income), taking into account the transaction costs, non-negativity constraints, and bounds imposed on the different securities.
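As an illustration, a one-stage subproblem over $X_t(x_{t-1},\xi_t)$ with the terminal objective of (30)/(33) can be written with a generic convex-programming modeler. The sketch below uses cvxpy with hypothetical argument names (the paper's experiments use Mosek; here we take a common scalar position bound `u` for simplicity):

```python
import cvxpy as cp
import numpy as np

def stage_subproblem(x_prev, xi, eta, nu, u, expected_next_return):
    """Terminal-stage portfolio subproblem over X_t(x_{t-1}, xi_t) of
    Section 3.2 with objective -E[xi_{t+1}]^T x_t (a sketch, not the
    paper's implementation)."""
    n = len(x_prev) - 1                      # asset n+1 (index n) is cash
    x = cp.Variable(n + 1)
    y = cp.Variable(n)                       # amounts sold
    z = cp.Variable(n)                       # amounts bought
    wealth = xi @ x_prev                     # total budget after returns
    cons = [
        x[n] == xi[n] * x_prev[n] + (1 - eta) @ y - (1 + nu) @ z,
        x[:n] == xi[:n] * x_prev[:n] - y + z,
        y <= xi[:n] * x_prev[:n],            # cannot sell more than held
        x[:n] <= u * wealth,                 # position limits
        x >= 0, y >= 0, z >= 0,
    ]
    prob = cp.Problem(cp.Minimize(-expected_next_return @ x), cons)
    prob.solve()
    return x.value, prob.value
```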

Now assume that the number of stages is random with discrete distribution on $\{2,\dots,T_{\max}\}$, and define $D_t$ by (2). We obtain the portfolio problem (24) with $f_t$, $\bar f_t$ as in (33). If $\xi_t$ does not depend on $(\xi_{[t-1]},D_t)$, $D_t$ only depends on $D_{t-1}$, and the distribution of $\xi_t$ is discrete with finite support $\{\xi_{t1},\dots,\xi_{tM_t}\}$ with $p_{tj}=\mathbb{P}(\xi_t=\xi_{tj})$, we can write the following dynamic programming equations for the corresponding portfolio problem: $\mathcal{Q}_{T_{\max}+1}(x_{T_{\max}},0) = \mathcal{Q}_{T_{\max}+1}(x_{T_{\max}},1) \equiv 0$; $\mathcal{Q}_t(x_{t-1},0)\equiv 0$ for $t=2,\dots,T_{\max}$; and, for $t=2,\dots,T_{\max}$, we have
\[
(35)\qquad \mathcal{Q}_t(x_{t-1},1) = (1-q_t)\sum_{j=1}^{M_t}p_{tj}\,Q_t(x_{t-1},1,\xi_{tj},1)+q_t\sum_{j=1}^{M_t}p_{tj}\,Q_t(x_{t-1},1,\xi_{tj},0),
\]
where $q_t$ is given by (4),
\[
(36)\qquad Q_t(x_{t-1},1,\xi_{tj},1) = \left\{\begin{array}{l}\inf_{x_t}\ \mathcal{Q}_{t+1}(x_t,1)\\ x_t\in X_t(x_{t-1},\xi_{tj}),\end{array}\right.
\]
and
\[
(37)\qquad Q_t(x_{t-1},1,\xi_{tj},0) = \left\{\begin{array}{l}\inf_{x_t}\ -\mathbb{E}[\xi_{t+1}]^{\mathsf T} x_t\\ x_t\in X_t(x_{t-1},\xi_{tj}).\end{array}\right.
\]
In the case when the number of stages is $T_{\max}$ (deterministic), we have $q_t=0$ for $t=2,\dots,T_{\max}-1$ and $q_{T_{\max}}=1$, and dynamic programming equations (35), (36), (37) become, as expected, (30), (31), (32) (with the notation $\mathcal{Q}_t(x_{t-1})$ instead of $\mathcal{Q}_t(x_{t-1},1)$, $Q_t(x_{t-1},\xi_{tj})$ instead of $Q_t(x_{t-1},1,\xi_{tj},1)$, and $Q_{T_{\max}}(x_{T_{\max}-1},\xi_{T_{\max}})$ instead of $Q_{T_{\max}}(x_{T_{\max}-1},1,\xi_{T_{\max}},0)$).

4. SDDP for multistage stochastic risk-neutral programs with a random number of stages

4.1. Assumptions. Consider optimization problem (16), where $\xi_t$ does not depend on $(\xi_{[t-1]},D_t)$ and $D_t$ only depends on $D_{t-1}$. We assume that the distributions of $T$ and of $\xi_t$ are discrete: the support of $T$ is $\{2,\dots,T_{\max}\}$ and the support of $\xi_t$ is $\Theta_t = \{\xi_{t1},\dots,\xi_{tM_t}\}$ with $p_{ti} = \mathbb{P}(\xi_t=\xi_{ti})>0$, $i=1,\dots,M_t$. In this context, equations (19), (20), (21), (22) are the dynamic programming equations for (16). We can now apply Stochastic Dual Dynamic Programming (SDDP, [34]) to solve these dynamic programming equations, as long as the recourse functions $\mathcal{Q}_t(\cdot,1)$ are convex. SDDP has been used to solve many real-life problems, and several extensions of the method have been considered, such as DOASA [38], CUPPS [7], ReSA [23], AND [3], risk-averse ([2, 21, 26, 37, 43, 44]) or inexact ([18]) variants; see also [16, 24] for adaptations to interstage dependent processes and [46] for extensions to integer stochastic programs. SDDP builds approximations of the cost-to-go functions which take the form of a maximum of affine functions. To ensure convexity of the functions $\mathcal{Q}_t(\cdot,1)$, we need convexity of the functions $f_t(\cdot,\cdot,\xi_t)$ and of the multifunctions $X_t(\cdot,\xi_t)$ for almost every $\xi_t$. We will consider two settings: linear and nonlinear programs.

Linear problems. In this setting, $f_t(x_t,x_{t-1},\xi_t) = c_t^{\mathsf T} x_t$ is linear,
\[
(38)\qquad X_t(x_{t-1},\xi_t) := \{x_t\in\mathbb{R}^n : A_t x_t + B_t x_{t-1} = b_t,\ x_t\ge 0\},
\]
and the random vector $\xi_t$ corresponds to the concatenation of the elements in the random matrices $A_t$, $B_t$, which have a known finite number of rows, and the random vectors $b_t$, $c_t$. We assume:

(H2-L) The set $X_1(x_0,\xi_1)$ is nonempty and bounded, and for every $x_1\in X_1(x_0,\xi_1)$, for every $t=2,\dots,T$, for every realization $\tilde\xi_2,\dots,\tilde\xi_t$ of $\xi_2,\dots,\xi_t$, and for every $x_\tau\in X_\tau(x_{\tau-1},\tilde\xi_\tau)$, $\tau=2,\dots,t-1$, the set $X_t(x_{t-1},\tilde\xi_t)$ is nonempty and bounded.

Nonlinear problems. In this case,
\[
(39)\qquad X_t(x_{t-1},\xi_t) = \{x_t\in\mathbb{R}^n : x_t\in\mathcal{X}_t,\ g_t(x_t,x_{t-1},\xi_t)\le 0,\ A_t x_t + B_t x_{t-1} = b_t\},
\]
and $\xi_t$ contains in particular the random elements in the matrices $A_t$, $B_t$ and the vector $b_t$. Of course, as a special case (and as is often the case in applications), the nonlinear problems we are interested in can have nonlinear cost and constraint functions for stage $t$ that do not depend on $x_{t-1}$, namely of the form $f_t(x_t,\xi_t)$ and $g_t(x_t,\xi_t)$. We assume that for $t=1,\dots,T$ there exists $\varepsilon_t>0$ such that:

(H2-NL)-(a) $\mathcal{X}_t$ is nonempty, convex, and compact.
(H2-NL)-(b) For every $x_t,x_{t-1}\in\mathbb{R}^n$, the function $f_t(x_t,x_{t-1},\cdot)$ is measurable, and for every $j=1,\dots,M_t$, the function $f_t(\cdot,\cdot,\xi_{tj})$ is convex, lower semicontinuous, and finite on $\mathcal{X}_t\times\mathcal{X}_{t-1}^{\varepsilon_t}$ (where $\mathcal{X}_{t-1}^{\varepsilon_t}$ denotes the $\varepsilon_t$-fattening of $\mathcal{X}_{t-1}$).
(H2-NL)-(c) For every $j=1,\dots,M_t$, each component $g_{t,i}(\cdot,\cdot,\xi_{tj})$, $i=1,\dots,p$, of the function $g_t(\cdot,\cdot,\xi_{tj})$ is convex, lower semicontinuous, and finite on $\mathcal{X}_t\times\mathcal{X}_{t-1}^{\varepsilon_t}$.
(H2-NL)-(d) For every $j=1,\dots,M_t$ and every $x_{t-1}\in\mathcal{X}_{t-1}^{\varepsilon_t}$, the set $X_t(x_{t-1},\xi_{tj})$ is nonempty.
(H2-NL)-(e) If $t\ge 2$, for every $j=1,\dots,M_t$, there exists
\[
\bar x_{t,j} = (\bar x_{t,j,t},\bar x_{t,j,t-1}) \in \big(\mathrm{ri}(\mathcal{X}_t)\times\mathcal{X}_{t-1}\big)\cap \mathrm{ri}\big(\{g_t(\cdot,\cdot,\xi_{tj})\le 0\}\big)
\]
such that $\bar x_{t,j,t}\in X_t(\bar x_{t,j,t-1},\xi_{tj})$.

Assumptions (H2-NL)-(a),(b),(c) in the nonlinear case imply the convexity of the cost-to-go functions $\mathcal{Q}_t(\cdot,1)$. The assumptions above also ensure (both in the linear and nonlinear cases) that SDDP applied to dynamic programming equations (20), (21), (22) will converge, as long as the samples in the forward passes are independent; see [38, 17, 14] for details.

4.2. Algorithm. We now describe the steps of SDDP applied to dynamic programming equations (19), (20), (21), (22). We denote by SDDP-TSto this SDDP method for solving (16).[1] SDDP-TSto is very similar to the variants of SDDP presented in [37, 29], where the underlying stochastic process depends on a Markov chain.[2] This Markov chain (process $(D_t)$ for our dynamic programming equations) has only two states in our case. The cost-to-go function is null in one of these states (when $D_{t-1}=0$), and the goal of SDDP-TSto is to approximate the cost-to-go function in the other state (when $D_{t-1}=1$) for all stages, i.e., the cost-to-go functions $\mathcal{Q}_t(\cdot,1)$, $t=2,\dots,T_{\max}$.

At the end of iteration $k$, the algorithm has computed, for the cost-to-go functions $\mathcal{Q}_t(\cdot,1)$, $t=2,\dots,T_{\max}$, approximations $Q_t^k(\cdot,1)$, $t=2,\dots,T_{\max}$, which are maxima of $k+1$ affine functions called cuts:
\[
(40)\qquad Q_t^k(x_{t-1},1) = \max_{0\le j\le k}\ \theta_t^j + \langle\beta_t^j, x_{t-1}\rangle.
\]
At iteration $k$, a realization of the number of stages and a sample of $(\xi_t)_{1\le t\le T_{\max}}$ are generated. Decisions $x_t^k$, $t=1,\dots,T_{\max}$, are computed along this sample in a forward pass, replacing the (unknown) function $\mathcal{Q}_t(x_{t-1},1)$ by $Q_t^{k-1}(x_{t-1},1)$. In the backward pass of iteration $k$, the decisions $x_t^k$ are then used to compute the coefficients $\theta_t^k,\beta_t^k$, $t=2,\dots,T_{\max}$.

[1] TSto in the acronym SDDP-TSto refers to the fact that this SDDP method solves stochastic programs with $T$ stochastic, $T$ being the number of stages.
[2] However, in [37, 29], problems are linear, whereas we will detail SDDP-TSto both for linear and nonlinear stochastic programs.
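A cut collection of the form (40) is conveniently stored as lists of intercepts and slopes; the small class below is an illustrative sketch (names are ours, not from the paper):

```python
import numpy as np

class CutApproximation:
    """Polyhedral lower model Q_t^k(x, 1) = max_j theta_j + <beta_j, x>
    of equation (40), one instance per stage t."""
    def __init__(self, theta0, beta0):
        self.thetas = [theta0]            # intercepts theta_t^j
        self.betas = [np.asarray(beta0)]  # slopes beta_t^j

    def add_cut(self, theta, beta):
        self.thetas.append(theta)
        self.betas.append(np.asarray(beta))

    def value(self, x):
        # maximum over the k + 1 affine minorants
        return max(th + b @ x for th, b in zip(self.thetas, self.betas))
```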

SDDP-TSto, Step 1: Initialization. For $t=2,\dots,T_{\max}$, take for $Q_t^0(\cdot,1)$ a known lower bounding affine function $\theta_t^0+\langle\beta_t^0,\cdot\rangle$ for $\mathcal{Q}_t(\cdot,1)$. Set the iteration count $k$ to 1, and set $Q_{T_{\max}+1}^k(\cdot,1) = Q_{T_{\max}+1}^k(\cdot,0) \equiv 0$ and $Q_t^k(\cdot,0)\equiv 0$, $t=2,\dots,T_{\max}$. Fix a parameter $0<\mathrm{Tol}<1$ (for the stopping criterion). Compute $q_t$, $t=2,\dots,T_{\max}$, using (4), starting from $q_2=\mathbb{P}(T=2)$.

SDDP-TSto, Step 2: Forward pass. We generate a sample
\[
((\tilde\xi_1^k,\tilde D_1^k),(\tilde\xi_2^k,\tilde D_2^k),\dots,(\tilde\xi_{T_{\max}}^k,\tilde D_{T_{\max}}^k))
\]
from the distribution of $((\xi_1,D_1),(\xi_2,D_2),\dots,(\xi_{T_{\max}},D_{T_{\max}}))$, with the convention that $\tilde\xi_1^k=\xi_1$, $\tilde D_1^k=1$. Set $\mathrm{Cost}_k=0$.
For $t=1,\dots,T_{\max}$, we compute an optimal solution $x_t^k$ of
\[
(41)\qquad \left\{\begin{array}{l}\inf_{x_t\in\mathbb{R}^n}\ \tilde D_{t-1}^k f_t(x_t,x_{t-1}^k,\tilde\xi_t^k)+Q_{t+1}^{k-1}(x_t,\tilde D_t^k)\\ x_t\in X_t(x_{t-1}^k,\tilde\xi_t^k),\end{array}\right.
\]
where $\tilde D_0^k=1$ and $x_0^k=x_0$, and we set $\mathrm{Cost}_k \leftarrow \mathrm{Cost}_k + \tilde D_{t-1}^k f_t(x_t^k,x_{t-1}^k,\tilde\xi_t^k)$.
End For
Upper bound computation: if $k\ge N$, compute
\[
\overline{\mathrm{Cost}}_k = \frac{1}{N}\sum_{j=k-N+1}^k \mathrm{Cost}_j,\qquad \hat\sigma_{N,k}^2 = \frac{1}{N}\sum_{j=k-N+1}^k\big[\mathrm{Cost}_j-\overline{\mathrm{Cost}}_k\big]^2,
\]
and the upper bound
\[
U_k = \overline{\mathrm{Cost}}_k + \frac{\hat\sigma_{N,k}}{\sqrt N}\, t_{N-1,1-\alpha},
\]
where $t_{N-1,1-\alpha}$ is the $(1-\alpha)$-quantile of the Student distribution with $N-1$ degrees of freedom.
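The upper-bound computation is a standard sample-mean confidence bound over the last $N$ forward-pass costs; a sketch (assuming scipy for the Student quantile) follows:

```python
import numpy as np
from scipy.stats import t as student_t

def sddp_upper_bound(costs, N, alpha=0.05):
    """(1 - alpha)-confidence upper bound U_k of Step 2, computed from the
    forward-pass costs of the last N iterations (a sketch of the formulas
    in the text; alpha = 0.05 is an illustrative default)."""
    window = np.asarray(costs[-N:])
    mean = window.mean()
    sigma = window.std(ddof=0)               # the text uses the 1/N variance
    quantile = student_t.ppf(1 - alpha, df=N - 1)
    return mean + sigma / np.sqrt(N) * quantile
```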

SDDP-TSto, Step 3: Backward pass. Let $Q_t^k(x_{t-1},D_{t-1},\xi_t,D_t)$ be the function given by
\[
(42)\qquad Q_t^k(x_{t-1},D_{t-1},\xi_t,D_t) = \left\{\begin{array}{l}\inf_{x_t}\ D_{t-1} f_t(x_t,x_{t-1},\xi_t)+Q_{t+1}^k(x_t,D_t)\\ x_t\in X_t(x_{t-1},\xi_t).\end{array}\right.
\]
Set $Q_{T_{\max}+1}^k(\cdot,1) = Q_{T_{\max}+1}^k(\cdot,0) \equiv 0$.
For $t=T_{\max}$ down to $t=2$,
  Set $Q_t^k(\cdot,0)\equiv 0$.
  For $j=1,\dots,M_t$,
    compute $Q_t^k(x_{t-1}^k,1,\xi_{tj},1)$; compute
    \[
    Q_t(x_{t-1}^k,1,\xi_{tj},0) = \left\{\begin{array}{l}\inf_{x_t}\ f_t(x_t,x_{t-1}^k,\xi_{tj})\\ x_t\in X_t(x_{t-1}^k,\xi_{tj});\end{array}\right.
    \]
    compute a subgradient $\beta_{tj}^k$ of $Q_t^k(\cdot,1,\xi_{tj},1)$ at $x_{t-1}^k$ and a subgradient $\gamma_{tj}^k$ of $Q_t(\cdot,1,\xi_{tj},0)$ at $x_{t-1}^k$.
  End For
  Compute
  \[
  (43)\qquad
  \begin{array}{rcl}
  \theta_t^k & = & (1-q_t)\displaystyle\sum_{j=1}^{M_t} p_{tj}\big(Q_t^k(x_{t-1}^k,1,\xi_{tj},1)-\langle\beta_{tj}^k,x_{t-1}^k\rangle\big) + q_t\displaystyle\sum_{j=1}^{M_t} p_{tj}\big(Q_t(x_{t-1}^k,1,\xi_{tj},0)-\langle\gamma_{tj}^k,x_{t-1}^k\rangle\big),\\[4pt]
  \beta_t^k & = & (1-q_t)\displaystyle\sum_{j=1}^{M_t} p_{tj}\beta_{tj}^k + q_t\displaystyle\sum_{j=1}^{M_t} p_{tj}\gamma_{tj}^k.
  \end{array}
  \]
End For
Lower bound computation: compute the lower bound $L_k$ on the optimal value of (16) given by
\[
L_k = \left\{\begin{array}{l}\inf_{x_1}\ f_1(x_1,x_0,\xi_1)+Q_2^k(x_1,1)\\ x_1\in X_1(x_0,\xi_1).\end{array}\right.
\]
SDDP-TSto, Step 4: If $k\ge N$ and $\dfrac{U_k-L_k}{U_k}\le\mathrm{Tol}$, then stop; otherwise set $k\leftarrow k+1$ and go to Step 2.
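The aggregation (43) of the per-realization values and subgradients into the cut coefficients $(\theta_t^k,\beta_t^k)$ can be written compactly; the sketch below (names are ours) assumes the per-realization quantities have already been computed:

```python
import numpy as np

def new_cut(q_t, p_t, Q_cont, Q_stop, beta, gamma, x_prev):
    """Cut coefficients (theta_t^k, beta_t^k) of equation (43).
    Q_cont[j], Q_stop[j] are Q_t^k(x_prev, 1, xi_tj, 1) and
    Q_t(x_prev, 1, xi_tj, 0); beta[j], gamma[j] are the corresponding
    subgradients at x_prev (a sketch)."""
    p = np.asarray(p_t)
    beta = np.asarray(beta)      # shape (M_t, n)
    gamma = np.asarray(gamma)    # shape (M_t, n)
    theta_k = ((1 - q_t) * p @ (np.asarray(Q_cont) - beta @ x_prev)
               + q_t * p @ (np.asarray(Q_stop) - gamma @ x_prev))
    beta_k = (1 - q_t) * (p @ beta) + q_t * (p @ gamma)
    return theta_k, beta_k
```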

We now show that the cuts computed by SDDP-TSto are valid, that $L_k$ is a lower bound on the optimal value of the problem, and that the sequence of optimal values of the approximate first-stage problems converges almost surely to the optimal value of (16).

Theorem 4.1. Consider optimization problem (16), where for $t=1,\dots,T_{\max}$, $\xi_t$ does not depend on $(\xi_{[t-1]},D_t)$ and $D_t$ only depends on $D_{t-1}$. Assume that (H1) holds and that the distribution of $\xi_t$ is discrete for $t=1,\dots,T_{\max}$. In the case of linear problems ($X_t$ as in (38)), assume that (H2-L) holds; in the case of nonlinear problems ($X_t$ as in (39)), assume that (H2-NL)-(a)-(e) hold. Consider the sequences $(x_t^k)_{k\ge 1}$, $t=1,\dots,T_{\max}$, and $(Q_t^k(\cdot,1))_{k\ge 0}$, $t=2,\dots,T_{\max}$, generated by SDDP-TSto to solve the corresponding dynamic programming equations (19), (20), (21), (22). Assume that the samples in the forward passes are independent: the sample $((\tilde\xi_1^k,\tilde D_1^k),(\tilde\xi_2^k,\tilde D_2^k),\dots,(\tilde\xi_{T_{\max}}^k,\tilde D_{T_{\max}}^k))$ in the forward pass of iteration $k$ is a realization of a random vector $\gamma_k = ((\xi_1^k,D_1^k),(\xi_2^k,D_2^k),\dots,(\xi_{T_{\max}}^k,D_{T_{\max}}^k))$ which has the distribution of $((\xi_1,D_1),(\xi_2,D_2),\dots,(\xi_{T_{\max}},D_{T_{\max}}))$, and $\gamma_1,\gamma_2,\dots$ are independent. Then

(i) for $t=2,\dots,T_{\max}+1$ and all $k\ge 1$, $Q_t^k(\cdot,1)$ is a lower bounding function for $\mathcal{Q}_t(\cdot,1)$: for all $x_{t-1}$, we have $\mathcal{Q}_t(x_{t-1},1)\ge Q_t^k(x_{t-1},1)$ almost surely;
(ii) $L_k$ computed in Step 3 of SDDP-TSto is a lower bound on the optimal value of (16);
(iii) almost surely, the limit of the sequence $(f_1(x_1^k,x_0,\xi_1)+Q_2^k(x_1^k,1))_{k\ge 1}$ is the optimal value of (16).

Proof. (i) The proof is by induction on $k$ and $t$. For $k=0$, we have $\mathcal{Q}_t(\cdot,1)\ge Q_t^0(\cdot,1)$, $t=2,\dots,T_{\max}+1$. Now assume that
\[
(44)\qquad \mathcal{Q}_t(\cdot,1)\ge Q_t^{k-1}(\cdot,1),\quad t=2,\dots,T_{\max}+1,
\]
for some $k\ge 1$. We show by backward induction on $t$ that $\mathcal{Q}_t(\cdot,1)\ge Q_t^k(\cdot,1)$, $t=2,\dots,T_{\max}+1$. For $t=T_{\max}+1$, we have $\mathcal{Q}_t(\cdot,1)=Q_t^k(\cdot,1)$ (both functions are null). Now assume that $\mathcal{Q}_{t+1}(\cdot,1)\ge Q_{t+1}^k(\cdot,1)$ for some $t\in\{2,\dots,T_{\max}\}$ (induction hypothesis). We want to show that
\[
(45)\qquad \mathcal{Q}_t(\cdot,1)\ge Q_t^k(\cdot,1).
\]
The induction hypothesis, together with the definitions of $Q_t$ and $Q_t^k$, implies that for all $j=1,\dots,M_t$:
\[
(46)\qquad Q_t(\cdot,1,\xi_{tj},1)\ge Q_t^k(\cdot,1,\xi_{tj},1).
\]
Therefore, we get
\[
(47)\qquad
\begin{array}{rcl}
\mathcal{Q}_t(\cdot,1) & \stackrel{(20)}{=} & (1-q_t)\displaystyle\sum_{j=1}^{M_t}p_{tj}\,Q_t(\cdot,1,\xi_{tj},1)+q_t\displaystyle\sum_{j=1}^{M_t}p_{tj}\,Q_t(\cdot,1,\xi_{tj},0)\\[4pt]
& \stackrel{(46)}{\ge} & (1-q_t)\displaystyle\sum_{j=1}^{M_t}p_{tj}\,Q_t^k(\cdot,1,\xi_{tj},1)+q_t\displaystyle\sum_{j=1}^{M_t}p_{tj}\,Q_t(\cdot,1,\xi_{tj},0)\\[4pt]
& \ge & (1-q_t)\displaystyle\sum_{j=1}^{M_t}p_{tj}\big[Q_t^k(x_{t-1}^k,1,\xi_{tj},1)+\langle\beta_{tj}^k,\cdot-x_{t-1}^k\rangle\big]+q_t\displaystyle\sum_{j=1}^{M_t}p_{tj}\big[Q_t(x_{t-1}^k,1,\xi_{tj},0)+\langle\gamma_{tj}^k,\cdot-x_{t-1}^k\rangle\big]\\[4pt]
& = & \theta_t^k+\langle\beta_t^k,\cdot\rangle,\ \text{by definition of }\theta_t^k,\beta_t^k,
\end{array}
\]
where for the second inequality we have used the subgradient inequality and the definition of $\beta_{tj}^k,\gamma_{tj}^k$. Combining (44), (47), and the relation $Q_t^k(\cdot,1)=\max\big(Q_t^{k-1}(\cdot,1),\theta_t^k+\langle\beta_t^k,\cdot\rangle\big)$, we obtain (45), which achieves the induction step and the proof of (i).

(ii) It suffices to observe that the optimal value of (16) is the optimal value of (15) and that, by (i), $\mathcal{Q}_2(x_1,1)\ge Q_2^k(x_1,1)$ (recall that under our assumptions $\mathcal{Q}_2$ does not depend on $\xi_1$).

(iii) can be proved following the convergence proofs of SDDP from [38] in the linear case and from [17] in the nonlinear case, which apply under our assumptions.

In the steps of SDDP-TSto above, we have not detailed the computation of $\beta_{tj}^k$ and $\gamma_{tj}^k$. In the linear and nonlinear settings mentioned above, the formulas for these coefficients are given below. When $\xi_t=\xi_{tj}$, we denote by $A_{tj}$, $B_{tj}$, and $b_{tj}$ the realizations of $A_t$, $B_t$, and $b_t$, respectively.

Computation of $\beta_{tj}^k$ and $\gamma_{tj}^k$ in the nonlinear case. Formulas for the cuts computed by SDDP when $X_t$ is of the form (39) were given in [17]; see Lemma 2.1 in [17]. We recall these formulas below. For the optimization problem
\[
Q_t^k(x_{t-1}^k,1,\xi_{tj},1) = \left\{\begin{array}{ll}\inf_{x_t}\ f_t(x_t,x_{t-1}^k,\xi_{tj})+Q_{t+1}^k(x_t,1) & \\ A_{tj}x_t+B_{tj}x_{t-1}^k = b_{tj}, & [\lambda_{tj}^{k1}]\\ g_t(x_t,x_{t-1}^k,\xi_{tj})\le 0, & [\mu_{tj}^{k1}]\\ x_t\in\mathcal{X}_t, & \end{array}\right.
\]
denote by $x_{tj}^{k1}$ an optimal solution, consider the Lagrangian
\[
L(x_t,\lambda,\mu;x_{t-1}^k,\xi_{tj}) = f_t(x_t,x_{t-1}^k,\xi_{tj})+Q_{t+1}^k(x_t,1)+\lambda^{\mathsf T}(b_{tj}-A_{tj}x_t-B_{tj}x_{t-1}^k)+\mu^{\mathsf T} g_t(x_t,x_{t-1}^k,\xi_{tj}),
\]
and optimal Lagrange multipliers $(\lambda_{tj}^{k1},\mu_{tj}^{k1})$. Similarly, for the optimization problem
\[
Q_t(x_{t-1}^k,1,\xi_{tj},0) = \left\{\begin{array}{ll}\inf_{x_t}\ f_t(x_t,x_{t-1}^k,\xi_{tj}) & \\ A_{tj}x_t+B_{tj}x_{t-1}^k = b_{tj}, & [\lambda_{tj}^{k2}]\\ g_t(x_t,x_{t-1}^k,\xi_{tj})\le 0, & [\mu_{tj}^{k2}]\\ x_t\in\mathcal{X}_t,\end{array}\right.
\]

denote by $x_{tj}^{k2}$ an optimal solution, consider the Lagrangian
\[
L(x_t,\lambda,\mu;x_{t-1}^k,\xi_{tj}) = f_t(x_t,x_{t-1}^k,\xi_{tj})+\lambda^{\mathsf T}(b_{tj}-A_{tj}x_t-B_{tj}x_{t-1}^k)+\mu^{\mathsf T} g_t(x_t,x_{t-1}^k,\xi_{tj}),
\]
and optimal Lagrange multipliers $(\lambda_{tj}^{k2},\mu_{tj}^{k2})$. Let $f_{t,x_{t-1}}'(x_{tj}^{k1},x_{t-1}^k,\xi_{tj})$ (resp. $f_{t,x_{t-1}}'(x_{tj}^{k2},x_{t-1}^k,\xi_{tj})$) be a subgradient of the convex function $f_t(x_{tj}^{k1},\cdot,\xi_{tj})$ (resp. $f_t(x_{tj}^{k2},\cdot,\xi_{tj})$) at $x_{t-1}^k$. Let $g_{t,i,x_{t-1}}'(x_{tj}^{k1},x_{t-1}^k,\xi_{tj})$ (resp. $g_{t,i,x_{t-1}}'(x_{tj}^{k2},x_{t-1}^k,\xi_{tj})$) be a subgradient of the convex function $g_{t,i}(x_{tj}^{k1},\cdot,\xi_{tj})$ (resp. $g_{t,i}(x_{tj}^{k2},\cdot,\xi_{tj})$) at $x_{t-1}^k$. With this notation, setting
\[
\begin{array}{rcl}
\beta_{tj}^k & = & f_{t,x_{t-1}}'(x_{tj}^{k1},x_{t-1}^k,\xi_{tj})-B_{tj}^{\mathsf T}\lambda_{tj}^{k1}+\displaystyle\sum_{i=1}^m \mu_{tj}^{k1}(i)\, g_{t,i,x_{t-1}}'(x_{tj}^{k1},x_{t-1}^k,\xi_{tj}),\\[4pt]
\gamma_{tj}^k & = & f_{t,x_{t-1}}'(x_{tj}^{k2},x_{t-1}^k,\xi_{tj})-B_{tj}^{\mathsf T}\lambda_{tj}^{k2}+\displaystyle\sum_{i=1}^m \mu_{tj}^{k2}(i)\, g_{t,i,x_{t-1}}'(x_{tj}^{k2},x_{t-1}^k,\xi_{tj}),
\end{array}
\]
$\beta_{tj}^k$ is a subgradient of $Q_t^k(\cdot,1,\xi_{tj},1)$ at $x_{t-1}^k$ and $\gamma_{tj}^k$ is a subgradient of $Q_t(\cdot,1,\xi_{tj},0)$ at $x_{t-1}^k$ (see Lemma 2.1 in [17] for a justification).

Computation of $\beta_{tj}^k$ and $\gamma_{tj}^k$ in the linear case. Formulas for the cuts in the linear case are well known. Due to (H2-L), the optimal value of the linear program
\[
Q_t^k(x_{t-1}^k,1,\xi_{tj},1) = \left\{\begin{array}{ll}\inf_{x_t}\ c_{tj}^{\mathsf T}x_t+Q_{t+1}^k(x_t,1) & \\ A_{tj}x_t+B_{tj}x_{t-1}^k = b_{tj}, & [\lambda_{tj}^{k1}]\\ x_t\ge 0 & \end{array}\right.
\]
is the optimal value of the corresponding dual problem:
\[
(48)\qquad Q_t^k(x_{t-1}^k,1,\xi_{tj},1) = \left\{\begin{array}{l}\sup_{\lambda,\mu}\ \lambda^{\mathsf T}(b_{tj}-B_{tj}x_{t-1}^k)+\sum_{i=0}^k \mu_i\theta_{t+1}^i\\ A_{tj}^{\mathsf T}\lambda+\sum_{i=0}^k \mu_i\beta_{t+1}^i\le c_{tj},\\ \sum_{i=0}^k \mu_i = 1,\ \mu_i\ge 0,\ i=0,\dots,k.\end{array}\right.
\]
Let $(\lambda_{tj}^{k1},\mu_{tj}^{k1})$ be an optimal solution of dual problem (48). Similarly, due to (H2-L), the optimal value of the linear program
\[
Q_t(x_{t-1}^k,1,\xi_{tj},0) = \left\{\begin{array}{ll}\inf_{x_t}\ c_{tj}^{\mathsf T}x_t & \\ A_{tj}x_t+B_{tj}x_{t-1}^k = b_{tj}, & [\lambda_{tj}^{k2}]\\ x_t\ge 0 & \end{array}\right.
\]
is the optimal value of the dual problem
\[
(49)\qquad Q_t(x_{t-1}^k,1,\xi_{tj},0) = \left\{\begin{array}{l}\sup_\lambda\ \lambda^{\mathsf T}(b_{tj}-B_{tj}x_{t-1}^k)\\ A_{tj}^{\mathsf T}\lambda\le c_{tj}.\end{array}\right.
\]
Let $\lambda_{tj}^{k2}$ be an optimal solution of dual problem (49). With this notation, setting $\beta_{tj}^k = -B_{tj}^{\mathsf T}\lambda_{tj}^{k1}$ and $\gamma_{tj}^k = -B_{tj}^{\mathsf T}\lambda_{tj}^{k2}$, $\beta_{tj}^k$ is a subgradient of $Q_t^k(\cdot,1,\xi_{tj},1)$ at $x_{t-1}^k$ and $\gamma_{tj}^k$ is a subgradient of $Q_t(\cdot,1,\xi_{tj},0)$ at $x_{t-1}^k$.

Remark 4.2. Clearly, in SDDP-TSto we can eliminate the functions $\mathcal{Q}_t(\cdot,0)$ and $Q_t^k(\cdot,0)$, since they are known, and replace them by the null function. Therefore, all we have to do is to approximate the cost-to-go functions $\mathcal{Q}_t(\cdot,1)$, and we can alleviate the notation in SDDP-TSto by writing $\mathcal{Q}_t(\cdot)$ instead of $\mathcal{Q}_t(\cdot,1)$ and $Q_t^k(\cdot)$ instead of $Q_t^k(\cdot,1)$.
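In the linear case, the subgradient $-B_{tj}^{\mathsf T}\lambda$ of (49) can be read off the equality-constraint duals returned by an LP solver; a sketch with scipy's HiGHS interface (assuming the subproblem is feasible and bounded, cf. (H2-L)) follows:

```python
import numpy as np
from scipy.optimize import linprog

def stopped_value_and_subgradient(c, A, B, b, x_prev):
    """Solve inf{c^T x : A x = b - B x_prev, x >= 0} and return its value
    together with the subgradient -B^T lambda of equation (49) with respect
    to x_prev, using the equality-constraint marginals reported by HiGHS
    (a sketch; the paper's experiments use Mosek)."""
    res = linprog(c, A_eq=A, b_eq=b - B @ x_prev,
                  bounds=(0, None), method="highs")
    lam = res.eqlin.marginals   # duals of A x = b - B x_prev
    return res.fun, -B.T @ lam
```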

For the sake of completeness, we now write SDDP-TSto using Remark 4.2.

SDDP-TSto, Step 1: Initialization. For $t=2,\dots,T_{\max}$, take for $Q_t^0(\cdot) = \theta_t^0+\langle\beta_t^0,\cdot\rangle$ a known lower bounding affine function for $\mathcal{Q}_t(\cdot,1)$. Set the iteration count $k$ to 1 and set $Q_{T_{\max}+1}^k(\cdot)\equiv 0$. Fix a parameter $0<\mathrm{Tol}<1$ (for the stopping criterion). Compute $q_t$, $t=2,\dots,T_{\max}$, using (4), starting from $q_2=\mathbb{P}(T=2)$.

SDDP-TSto, Step 2: Forward pass. We generate a sample $((\tilde\xi_1^k,\tilde D_1^k),(\tilde\xi_2^k,\tilde D_2^k),\dots,(\tilde\xi_{T_{\max}}^k,\tilde D_{T_{\max}}^k))$ from the distribution of $((\xi_1,D_1),(\xi_2,D_2),\dots,(\xi_{T_{\max}},D_{T_{\max}}))$, with the convention that $\tilde\xi_1^k=\xi_1$, $\tilde D_1^k=1$. Set $\mathrm{Cost}_k=0$ and $t\leftarrow 1$.
While $\tilde D_t^k=1$, we compute an optimal solution $x_t^k$ of
\[
(50)\qquad \left\{\begin{array}{l}\inf_{x_t\in\mathbb{R}^n}\ f_t(x_t,x_{t-1}^k,\tilde\xi_t^k)+Q_{t+1}^{k-1}(x_t)\\ x_t\in X_t(x_{t-1}^k,\tilde\xi_t^k),\end{array}\right.
\]
where $x_0^k=x_0$; we set $\mathrm{Cost}_k\leftarrow\mathrm{Cost}_k+f_t(x_t^k,x_{t-1}^k,\tilde\xi_t^k)$ and $t\leftarrow t+1$.
End While
We compute an optimal solution $x_t^k$ of
\[
(51)\qquad \left\{\begin{array}{l}\inf_{x_t\in\mathbb{R}^n}\ f_t(x_t,x_{t-1}^k,\tilde\xi_t^k)\\ x_t\in X_t(x_{t-1}^k,\tilde\xi_t^k);\end{array}\right.
\]
we set $\mathrm{Cost}_k\leftarrow\mathrm{Cost}_k+f_t(x_t^k,x_{t-1}^k,\tilde\xi_t^k)$ and $t\leftarrow t+1$.
While $t\le T_{\max}-1$, we compute an optimal solution $x_t^k$ of
\[
(52)\qquad \left\{\begin{array}{l}\inf_{x_t\in\mathbb{R}^n}\ 0\\ x_t\in X_t(x_{t-1}^k,\tilde\xi_t^k)\end{array}\right.
\]
(note that $\tilde D_t^k=0$ and the objective function is null; we only need a feasible point), and we set $t\leftarrow t+1$.
End While
Upper bound computation: same as before.
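In the forward pass, the sampled death process drives the three phases (continue, stopped stage, feasibility only); sampling $(\tilde D_t^k)$ from the transition probabilities $q_t$ can be sketched as follows (names and the example values are ours):

```python
import numpy as np

def sample_death_path(q, T_max, rng):
    """Sample (D_1, ..., D_{T_max}): D_1 = 1, the horizon ends at stage t
    with probability q[t] = P(D_t = 0 | D_{t-1} = 1), and state 0 is
    absorbing (a sketch)."""
    D = [1]
    for t in range(2, T_max + 1):
        if D[-1] == 1 and rng.random() >= q[t]:
            D.append(1)   # horizon continues with probability 1 - q[t]
        else:
            D.append(0)   # horizon ended at t or earlier
    return D

rng = np.random.default_rng(0)
path = sample_death_path({2: 0.25, 3: 1/3, 4: 0.5, 5: 1.0}, 5, rng)
```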

SDDP-TSto, Step 3: Backward pass. Let $Q_t^k(x_{t-1},\xi_t)$ be the function given by
\[
(53)\qquad Q_t^k(x_{t-1},\xi_t) = \left\{\begin{array}{l}\inf_{x_t}\ f_t(x_t,x_{t-1},\xi_t)+Q_{t+1}^k(x_t)\\ x_t\in X_t(x_{t-1},\xi_t).\end{array}\right.
\]
Set $Q_{T_{\max}+1}^k(\cdot)\equiv 0$.
For $t=T_{\max}$ down to $t=2$,
  For $j=1,\dots,M_t$,
    compute $Q_t^k(x_{t-1}^k,\xi_{tj})$; compute
    \[
    Q_t(x_{t-1}^k,1,\xi_{tj},0) = \left\{\begin{array}{l}\inf_{x_t}\ f_t(x_t,x_{t-1}^k,\xi_{tj})\\ x_t\in X_t(x_{t-1}^k,\xi_{tj});\end{array}\right.
    \]
    compute a subgradient $\beta_{tj}^k$ of $Q_t^k(\cdot,\xi_{tj})$ at $x_{t-1}^k$ and a subgradient $\gamma_{tj}^k$ of $Q_t(\cdot,1,\xi_{tj},0)$ at $x_{t-1}^k$.
  End For
  Compute $\theta_t^k,\beta_t^k$, replacing $Q_t^k(x_{t-1}^k,1,\xi_{tj},1)$ in (43) by $Q_t^k(x_{t-1}^k,\xi_{tj})$.
End For
Lower bound computation: same as before.

SDDP-TSto, Step 4: same as before.

5. Numerical experiments: portfolio selection with a random investment period

5.1. SDDP-TSto for the portfolio problem. We consider the portfolio problem given in Section 3.2 and the corresponding dynamic programming equations (35), (36), (37) when the number of stages is stochastic. We now write the SDDP algorithm to solve these dynamic programming equations under the assumptions used to write them in Section 3.2.

Step 1: Initialization. For $t=2,\dots,T_{\max}$, take for $Q_t^0(\cdot)$ a known lower bounding affine function $\theta_t^0+\langle\beta_t^0,\cdot\rangle$ for $\mathcal{Q}_t(\cdot,1)$. Set the iteration count $k$ to 1 and set $Q_{T_{\max}+1}^k(\cdot)\equiv 0$. Fix a parameter $0<\mathrm{Tol}<1$ (for the stopping criterion). Compute $q_t$, $t=2,\dots,T_{\max}$, using (4), starting from $q_2=\mathbb{P}(T=2)$.

Step 2: Forward pass. We generate a sample $((\tilde\xi_1^k,\tilde D_1^k),(\tilde\xi_2^k,\tilde D_2^k),\dots,(\tilde\xi_{T_{\max}}^k,\tilde D_{T_{\max}}^k))$ from the distribution of $((\xi_1,D_1),(\xi_2,D_2),\dots,(\xi_{T_{\max}},D_{T_{\max}}))$, with the convention that $\tilde\xi_1^k=\xi_1$, $\tilde D_1^k=1$. Set $t\leftarrow 1$.
While $\tilde D_t^k=1$, we compute an optimal solution $x_t^k$ of
\[
(54)\qquad \left\{\begin{array}{l}\inf_{x_t\in\mathbb{R}^n}\ Q_{t+1}^{k-1}(x_t)\\ x_t\in X_t(x_{t-1}^k,\tilde\xi_t^k),\end{array}\right.
\]
where $x_0^k=x_0$, and we set $t\leftarrow t+1$.
End While
We compute an optimal solution $x_t^k$ of
\[
(55)\qquad \left\{\begin{array}{l}\inf_{x_t\in\mathbb{R}^n}\ -\mathbb{E}\Big[\sum_{i=1}^{n+1}\xi_{t+1}(i)x_t(i)\Big]\\ x_t\in X_t(x_{t-1}^k,\tilde\xi_t^k);\end{array}\right.
\]
we set $\mathrm{Cost}_k \leftarrow -\mathbb{E}\Big[\sum_{i=1}^{n+1}\xi_{t+1}(i)x_t^k(i)\Big]$ and $t\leftarrow t+1$.
While $t\le T_{\max}-1$, we compute an optimal solution $x_t^k$ of
\[
(56)\qquad \left\{\begin{array}{l}\inf_{x_t\in\mathbb{R}^n}\ 0\\ x_t\in X_t(x_{t-1}^k,\tilde\xi_t^k)\end{array}\right.
\]
(note that $\tilde D_t^k=0$ and the objective function is null; we only need a feasible point), and we set $t\leftarrow t+1$.
End While

Upper bound computation: if $k\ge N$, compute
\[
\overline{\mathrm{Cost}}_k = \frac{1}{N}\sum_{j=k-N+1}^k \mathrm{Cost}_j,\qquad \hat\sigma_{N,k}^2 = \frac{1}{N}\sum_{j=k-N+1}^k\big[\mathrm{Cost}_j-\overline{\mathrm{Cost}}_k\big]^2,
\]
and the upper bound
\[
U_k = \overline{\mathrm{Cost}}_k + \frac{\hat\sigma_{N,k}}{\sqrt N}\, t_{N-1,1-\alpha},
\]
where $t_{N-1,1-\alpha}$ is the $(1-\alpha)$-quantile of the Student distribution with $N-1$ degrees of freedom.

Step 3: Backward pass. Set $Q_{T_{\max}+1}^k(\cdot)\equiv 0$.
For $t=T_{\max}$ down to $t=2$,
  For $j=1,\dots,M_t$,
    Solve the optimization problem
    \[
    \left\{\begin{array}{l}\inf\ -\mathbb{E}\Big[\sum_{i=1}^{n+1}\xi_{t+1}(i)x_t(i)\Big]\\ x_t\in X_t(x_{t-1}^k,\xi_{tj}),\end{array}\right.
    \]
    with Lagrangian $L(x_t,y_t,z_t,\lambda_1,\mu_1,\delta_1)$ given by
    \[
    -\mathbb{E}\Big[\sum_{i=1}^{n+1}\xi_{t+1}(i)x_t(i)\Big] + \Big\langle\lambda_1,\ \xi_{tj}\circ x_{t-1}-x_t+\begin{bmatrix}-y_t+z_t\\ (e-\eta_t)^{\mathsf T}y_t-(e+\nu_t)^{\mathsf T}z_t\end{bmatrix}\Big\rangle + \big\langle\mu_1,\ y_t-\xi_{tj}(1{:}n)\circ x_{t-1}(1{:}n)\big\rangle + \big\langle\delta_1,\ x_t(1{:}n)-(\xi_{tj}^{\mathsf T}x_{t-1})u\big\rangle,
    \]
    where $e$ is a vector of ones in $\mathbb{R}^n$, $\lambda_1\in\mathbb{R}^{n+1}$, $\mu_1,\delta_1\in\mathbb{R}^n$, and, for vectors $x,y$, the vector $x\circ y$ has components $(x\circ y)(i)=x(i)y(i)$ and $\langle x,y\rangle = x^{\mathsf T}y$. For this Lagrangian, let $(\lambda_{1tj}^k,\mu_{1tj}^k,\delta_{1tj}^k)$ be optimal Lagrange multipliers.
    Solve the optimization problem
    \[
    \left\{\begin{array}{l}\inf\ Q_{t+1}^k(x_t)\\ x_t\in X_t(x_{t-1}^k,\xi_{tj}),\end{array}\right.
    \]
    with Lagrangian $L(x_t,y_t,z_t,\lambda_2,\mu_2,\delta_2)$ given by
    \[
    Q_{t+1}^k(x_t) + \Big\langle\lambda_2,\ \xi_{tj}\circ x_{t-1}-x_t+\begin{bmatrix}-y_t+z_t\\ (e-\eta_t)^{\mathsf T}y_t-(e+\nu_t)^{\mathsf T}z_t\end{bmatrix}\Big\rangle + \big\langle\mu_2,\ y_t-\xi_{tj}(1{:}n)\circ x_{t-1}(1{:}n)\big\rangle + \big\langle\delta_2,\ x_t(1{:}n)-(\xi_{tj}^{\mathsf T}x_{t-1})u\big\rangle,
    \]
    where $e$ is a vector of ones in $\mathbb{R}^n$, $\lambda_2\in\mathbb{R}^{n+1}$, $\mu_2,\delta_2\in\mathbb{R}^n$. For this Lagrangian, let $(\lambda_{2tj}^k,\mu_{2tj}^k,\delta_{2tj}^k)$ be optimal Lagrange multipliers.
    Compute
    \[
    \gamma_{tj}^k = \Big(\lambda_{1tj}^k-(u^{\mathsf T}\delta_{1tj}^k)e-\begin{bmatrix}\mu_{1tj}^k\\ 0\end{bmatrix}\Big)\circ\xi_{tj},\qquad
    \beta_{tj}^k = \Big(\lambda_{2tj}^k-(u^{\mathsf T}\delta_{2tj}^k)e-\begin{bmatrix}\mu_{2tj}^k\\ 0\end{bmatrix}\Big)\circ\xi_{tj},
    \]
    where $e$ is a vector of ones in $\mathbb{R}^{n+1}$.
  End For

  Compute[3] $\theta_t^k = 0$ and
  \[
  \beta_t^k = (1-q_t)\sum_{j=1}^{M_t}p_{tj}\beta_{tj}^k + q_t\sum_{j=1}^{M_t}p_{tj}\gamma_{tj}^k.
  \]
End For
Lower bound computation: compute the lower bound $L_k$ on the optimal value of the portfolio problem given by
\[
L_k = \left\{\begin{array}{l}\inf_{x_1}\ Q_2^k(x_1)\\ x_1\in X_1(x_0,\xi_1).\end{array}\right.
\]
Step 4: If $k\ge N$ and $\dfrac{U_k-L_k}{U_k}\le\mathrm{Tol}$, then stop; otherwise set $k\leftarrow k+1$ and go to Step 2.

[3] Observe that the intercept of the cuts is zero for this application.

5.2. Numerical results. Our goal in this section is to compare SDDP and SDDP-TSto on the risk-neutral portfolio problem with direct transaction costs given in Section 3.2. All subproblems in the forward and backward passes of SDDP and SDDP-TSto were solved numerically using the interior-point solver of the Mosek Optimization Toolbox [1]. The following parameters are chosen for our experiments.

Distributions of $T$ and of the returns $(\xi_t)$. In [9], the lifetime of more than 25,000 publicly traded North American companies, from 1950 to 2009, was analyzed. It was shown that mortality rates are independent of the company's age, that the typical half-life of a publicly traded company is about a decade, and that the exponential distribution is a good fit for the lifetime of these companies over this period. Therefore, for such companies, the exponential distribution makes sense for the duration of the optimization period of a portfolio problem. However, since the number of stages is almost surely in the set $\{2,\dots,T_{\max}\}$, instead of an exponential distribution we take for $T$ a translation of a discretization of an exponential distribution conditioned on the event that this exponential distribution belongs to $[\frac12, T_{\max}-\frac12)$, with $T_{\max}=10$.[4] More precisely, let $X\sim\mathcal{E}(\lambda)$ be the exponential distribution with parameter $\lambda$ and expectation $\mathbb{E}[X]=\frac1\lambda$. Setting $Y = X|A$, where $A$ is the event $A=\{\omega : \frac12\le X(\omega)\le T_{\max}-\frac12\}$, and defining the random variable $T$ by $T=t+1$ if and only if $t-\frac12\le Y<t+\frac12$ for $t=1,\dots,T_{\max}-1$, the distribution of the number of stages $T$ is given by
\[
\mathbb{P}(T=t+1) = \mathbb{P}\Big(t-\frac12\le Y<t+\frac12\Big) = \frac{\mathbb{P}\big(t-\frac12\le X<t+\frac12\big)}{\mathbb{P}(A)} = \frac{e^{-\lambda(t-\frac12)}-e^{-\lambda(t+\frac12)}}{e^{-\lambda/2}-e^{-\lambda(T_{\max}-\frac12)}},
\]
for $t=1,\dots,T_{\max}-1$. The histogram of the distribution of $T$ is represented in Figure 2, together with the graph of the density of $2+X$ over the interval $[2,10]$.

The return of the risk-free asset $n+1$ is 1.01 for every stage. The returns $\xi_t$, $t=2,\dots,T_{\max}$, have discrete distributions with $M_t=M$ realizations, each having probability $\frac1M$ (in the notation of Section 4, $p_{tj}=\frac1M$, $t=2,\dots,T_{\max}$, $j=1,\dots,M$). The realizations are obtained as follows. Let $n$ be the number of assets (in the instances chosen, $n\in\{4,8,20\}$, i.e., $n$ is even). We define a matrix $A$ of average returns by
\[
A(t,i) = \left\{\begin{array}{ll}
1.06 & \text{if } 1\le t\le 4,\ 1\le i\le n/2,\\
1.04 & \text{if } 5\le t\le T_{\max}+1=11,\ 1\le i\le n/2,\\
1.05 & \text{if } 1\le t\le T_{\max}+1=11,\ n/2+1\le i\le n.
\end{array}\right.
\]

[4] Considering the values chosen for the realizations of the returns and today's usual asset returns, we can consider that a stage corresponds to a year and that the maximal duration of the optimization period is $T_{\max}=10$ years.
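The discretized, conditioned, and translated exponential distribution above can be generated as follows (a sketch; the value of $\lambda$ passed in the example is illustrative only):

```python
import numpy as np

def stage_distribution(lam, T_max):
    """Distribution p_t = P(T = t) on {2, ..., T_max} obtained by
    discretizing an Exp(lam) variable conditioned on [1/2, T_max - 1/2)
    and translating by one stage, as in Section 5.2 (a sketch)."""
    t = np.arange(1, T_max)                        # t = 1, ..., T_max - 1
    mass = np.exp(-lam * (t - 0.5)) - np.exp(-lam * (t + 0.5))
    norm = np.exp(-lam * 0.5) - np.exp(-lam * (T_max - 0.5))
    return {int(tt) + 1: float(m / norm) for tt, m in zip(t, mass)}

p = stage_distribution(0.5, 10)   # sums to 1 by construction
```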

[Figure 2. Histogram of the distribution of $T$ with support $\{2,\dots,T_{\max}\}=\{2,\dots,10\}$ and density of $2+\mathcal{E}(\lambda)$ (dotted line) on the interval $[2,10]$.]

We generate independently, for every $t=2,\dots,T_{\max}+1$ and $i=1,\dots,n$, a sample $\xi_{tj}(i)$, $j=1,\dots,M$, of size $M$ from the Gaussian distribution with expectation $A(t,i)$ and standard deviation 0.025. These samples define the supports $\{\xi_{tj},\ j=1,\dots,M\}$ of the distributions of $\xi_t$, $t=2,\dots,T_{\max}$, recalling that $\xi_{tj}(n+1)=1.01$ (the risk-free asset return). The vector of expectations $\mathbb{E}[\xi_t]$, $t=2,\dots,T_{\max}+1$, is given by $\mathbb{E}[\xi_t(i)]=\frac1M\sum_{j=1}^M\xi_{tj}(i)$, $i=1,\dots,n$, while $\mathbb{E}[\xi_t(n+1)]=\xi_t(n+1)=1.01$. The first-stage return $\xi_1(i)$ is obtained by generating, independently from the previous samples, a realization of the Gaussian distribution with expectation $A(1,i)$ and standard deviation 0.025, for $i=1,\dots,n$.

Remaining parameters of the portfolio problem. The initial portfolio $x_0$ has components $x_0(i)$, $i=1,\dots,n+1$, uniformly distributed in $[0,1]$ (vector of initial wealth in each asset). The largest position in any security is set to 100%, i.e., $u(i)=1$ for $i=1,\dots,n$. Several values of the transaction costs will be considered; see below.

Parameters of the SDDP and SDDP-TSto methods. Using the notation of the previous section, SDDP and SDDP-TSto are run with parameters $N=200$, $\alpha=0.05$, and $\mathrm{Tol}=0.05$.

Comparison of the distributions of income and of the mean income of both policies on Monte-Carlo simulations. We generate instances of the portfolio problem using the parameters described above, varying $n$ in the set $\{4,8,20\}$, with transaction costs in the set $\{0.001, 0.01, 0.03, 0.05, 0.07\}$. For each instance, we run the (traditional) SDDP algorithm considering that the number of stages is fixed to the maximal possible number of stages $T_{\max}=10$. We end up with approximate cost-to-go functions which can be used to define a policy, called the SDDP policy. Similarly, running the SDDP-TSto method of Section 5.1 on the portfolio problem, we obtain approximate cost-to-go functions for each stage that can be used to define a policy called the SDDP-TSto policy.[5]

[5] Observe that all problem data are simulated. Our objective in this experiment is not to test the model on a single set of real data but rather to use the portfolio problem as an example of a multistage stochastic optimization problem for which the stochasticity assumption on $T$ makes sense, and to present preliminary results on a few instances of this problem generated with simulated data. All parameters used to generate the instances are given.
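The generation of the return supports can be sketched as follows (names are ours; `A` stands for the average-return matrix above, with row $t-1$ corresponding to stage $t$, and `sigma` and `risk_free` for the standard deviation and risk-free return of Section 5.2):

```python
import numpy as np

def return_samples(A, n, M, T_max, risk_free, sigma, rng):
    """Generate the supports {xi_tj : j = 1, ..., M} of the stage returns:
    Gaussian draws around the average returns A(t, i), with the return of
    the risk-free asset n+1 held fixed (a sketch)."""
    xi = {}
    for t in range(2, T_max + 2):
        samples = rng.normal(loc=A[t - 1, :n], scale=sigma, size=(M, n))
        xi[t] = np.hstack([samples, np.full((M, 1), risk_free)])
    return xi   # xi[t] has shape (M, n + 1); row j is realization xi_tj
```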

These policies are compared on a set of Monte-Carlo simulations. For each simulation, a value of $T$ is sampled, and a sample of size $T$ is generated for the returns. Applying both policies on these trajectories, we obtain, for each simulation, an income at the end of the (stochastic) optimization period with each policy. The empirical mean income (over this set of simulations) for both policies is given in Table 1 for the instances of the portfolio problem tested.

[Table 1. Empirical mean income obtained with SDDP and SDDP-TSto on several instances of the portfolio problem (35), (36), (37). Columns: $n$, $T$, $\nu_t(i)=\eta_t(i)$, SDDP-TSto, SDDP.]

We also represent in Figures 3 and 4 the empirical distribution of the income obtained with the SDDP-TSto policy minus the income obtained with the SDDP policy, as well as the empirical distribution of the income obtained with the SDDP-TSto policy divided by the income obtained with the SDDP policy. We see that, compared with SDDP, the mean income with SDDP-TSto is (as expected) larger on all instances, and the income is larger on nearly all scenarios for all instances. This illustrates the advantage of using an appropriate model that takes the stochasticity of the number of stages into account or, equivalently, the loss entailed by the use of an inadequate model.

6. Conclusion

We introduced the class of multistage stochastic programs with a random number of stages. We explained how to write dynamic programming equations for such problems and detailed the SDDP algorithm for solving these dynamic programming equations. We have shown the applicability and interest of the proposed models and methodology for portfolio selection. As future work, it would be interesting to consider more general hybrid stochastic programs with transition probabilities between objective and cost functions, meaning that at each stage not only the parameters but also the cost and constraint functions are random, possibly depending on past values of the parameters and of the cost and constraint functions. It would also be interesting to use the proposed models and methodology for other applications, for instance Asset Liability Management or the applications mentioned in the introduction.

[Figure 3. Empirical distribution of the income obtained with the SDDP-TSto policy minus the income obtained with the SDDP policy (right plots), and empirical distribution of the income obtained with the SDDP-TSto policy divided by the income obtained with the SDDP policy (left plots), for several instances of the portfolio problem with $n=4$ assets and transaction costs $\nu_t(i)=\eta_t(i)\in\{0.001, 0.01, 0.03, 0.05, 0.07\}$.]


More information

Multistage Stochastic Mixed-Integer Programs for Optimizing Gas Contract and Scheduling Maintenance

Multistage Stochastic Mixed-Integer Programs for Optimizing Gas Contract and Scheduling Maintenance Multistage Stochastic Mixed-Integer Programs for Optimizing Gas Contract and Scheduling Maintenance Zhe Liu Siqian Shen September 2, 2012 Abstract In this paper, we present multistage stochastic mixed-integer

More information

Handout 8: Introduction to Stochastic Dynamic Programming. 2 Examples of Stochastic Dynamic Programming Problems

Handout 8: Introduction to Stochastic Dynamic Programming. 2 Examples of Stochastic Dynamic Programming Problems SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 8: Introduction to Stochastic Dynamic Programming Instructor: Shiqian Ma March 10, 2014 Suggested Reading: Chapter 1 of Bertsekas,

More information

Approximations of Stochastic Programs. Scenario Tree Reduction and Construction

Approximations of Stochastic Programs. Scenario Tree Reduction and Construction Approximations of Stochastic Programs. Scenario Tree Reduction and Construction W. Römisch Humboldt-University Berlin Institute of Mathematics 10099 Berlin, Germany www.mathematik.hu-berlin.de/~romisch

More information

Optimal Security Liquidation Algorithms

Optimal Security Liquidation Algorithms Optimal Security Liquidation Algorithms Sergiy Butenko Department of Industrial Engineering, Texas A&M University, College Station, TX 77843-3131, USA Alexander Golodnikov Glushkov Institute of Cybernetics,

More information

Equity correlations implied by index options: estimation and model uncertainty analysis

Equity correlations implied by index options: estimation and model uncertainty analysis 1/18 : estimation and model analysis, EDHEC Business School (joint work with Rama COT) Modeling and managing financial risks Paris, 10 13 January 2011 2/18 Outline 1 2 of multi-asset models Solution to

More information

Decomposition Methods

Decomposition Methods Decomposition Methods separable problems, complicating variables primal decomposition dual decomposition complicating constraints general decomposition structures Prof. S. Boyd, EE364b, Stanford University

More information

Non replication of options

Non replication of options Non replication of options Christos Kountzakis, Ioannis A Polyrakis and Foivos Xanthos June 30, 2008 Abstract In this paper we study the scarcity of replication of options in the two period model of financial

More information

4: SINGLE-PERIOD MARKET MODELS

4: SINGLE-PERIOD MARKET MODELS 4: SINGLE-PERIOD MARKET MODELS Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 4: Single-Period Market Models 1 / 87 General Single-Period

More information

Handout 4: Deterministic Systems and the Shortest Path Problem

Handout 4: Deterministic Systems and the Shortest Path Problem SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 4: Deterministic Systems and the Shortest Path Problem Instructor: Shiqian Ma January 27, 2014 Suggested Reading: Bertsekas

More information

Characterization of the Optimum

Characterization of the Optimum ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 5. Portfolio Allocation with One Riskless, One Risky Asset Characterization of the Optimum Consider a risk-averse, expected-utility-maximizing

More information

Credit Value Adjustment (Payo-at-Maturity contracts, Equity Swaps, and Interest Rate Swaps)

Credit Value Adjustment (Payo-at-Maturity contracts, Equity Swaps, and Interest Rate Swaps) Credit Value Adjustment (Payo-at-Maturity contracts, Equity Swaps, and Interest Rate Swaps) Dr. Yuri Yashkir Dr. Olga Yashkir July 30, 2013 Abstract Credit Value Adjustment estimators for several nancial

More information

Introduction to Probability Theory and Stochastic Processes for Finance Lecture Notes

Introduction to Probability Theory and Stochastic Processes for Finance Lecture Notes Introduction to Probability Theory and Stochastic Processes for Finance Lecture Notes Fabio Trojani Department of Economics, University of St. Gallen, Switzerland Correspondence address: Fabio Trojani,

More information

Forecast Horizons for Production Planning with Stochastic Demand

Forecast Horizons for Production Planning with Stochastic Demand Forecast Horizons for Production Planning with Stochastic Demand Alfredo Garcia and Robert L. Smith Department of Industrial and Operations Engineering Universityof Michigan, Ann Arbor MI 48109 December

More information

Lecture 17: More on Markov Decision Processes. Reinforcement learning

Lecture 17: More on Markov Decision Processes. Reinforcement learning Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture

More information

Continuous Time Mean Variance Asset Allocation: A Time-consistent Strategy

Continuous Time Mean Variance Asset Allocation: A Time-consistent Strategy Continuous Time Mean Variance Asset Allocation: A Time-consistent Strategy J. Wang, P.A. Forsyth October 24, 2009 Abstract We develop a numerical scheme for determining the optimal asset allocation strategy

More information

6.231 DYNAMIC PROGRAMMING LECTURE 10 LECTURE OUTLINE

6.231 DYNAMIC PROGRAMMING LECTURE 10 LECTURE OUTLINE 6.231 DYNAMIC PROGRAMMING LECTURE 10 LECTURE OUTLINE Rollout algorithms Cost improvement property Discrete deterministic problems Approximations of rollout algorithms Discretization of continuous time

More information

OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF FINITE

OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF FINITE Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference 005 Seville, Spain, December 1-15, 005 WeA11.6 OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF

More information

Scenario reduction and scenario tree construction for power management problems

Scenario reduction and scenario tree construction for power management problems Scenario reduction and scenario tree construction for power management problems N. Gröwe-Kuska, H. Heitsch and W. Römisch Humboldt-University Berlin Institute of Mathematics Page 1 of 20 IEEE Bologna POWER

More information

On Existence of Equilibria. Bayesian Allocation-Mechanisms

On Existence of Equilibria. Bayesian Allocation-Mechanisms On Existence of Equilibria in Bayesian Allocation Mechanisms Northwestern University April 23, 2014 Bayesian Allocation Mechanisms In allocation mechanisms, agents choose messages. The messages determine

More information

Scenario tree generation for stochastic programming models using GAMS/SCENRED

Scenario tree generation for stochastic programming models using GAMS/SCENRED Scenario tree generation for stochastic programming models using GAMS/SCENRED Holger Heitsch 1 and Steven Dirkse 2 1 Humboldt-University Berlin, Department of Mathematics, Germany 2 GAMS Development Corp.,

More information

Optimal Dam Management

Optimal Dam Management Optimal Dam Management Michel De Lara et Vincent Leclère July 3, 2012 Contents 1 Problem statement 1 1.1 Dam dynamics.................................. 2 1.2 Intertemporal payoff criterion..........................

More information

A class of coherent risk measures based on one-sided moments

A class of coherent risk measures based on one-sided moments A class of coherent risk measures based on one-sided moments T. Fischer Darmstadt University of Technology November 11, 2003 Abstract This brief paper explains how to obtain upper boundaries of shortfall

More information

4 Reinforcement Learning Basic Algorithms

4 Reinforcement Learning Basic Algorithms Learning in Complex Systems Spring 2011 Lecture Notes Nahum Shimkin 4 Reinforcement Learning Basic Algorithms 4.1 Introduction RL methods essentially deal with the solution of (optimal) control problems

More information

Credit Risk in Lévy Libor Modeling: Rating Based Approach

Credit Risk in Lévy Libor Modeling: Rating Based Approach Credit Risk in Lévy Libor Modeling: Rating Based Approach Zorana Grbac Department of Math. Stochastics, University of Freiburg Joint work with Ernst Eberlein Croatian Quants Day University of Zagreb, 9th

More information

Optimal construction of a fund of funds

Optimal construction of a fund of funds Optimal construction of a fund of funds Petri Hilli, Matti Koivu and Teemu Pennanen January 28, 29 Introduction We study the problem of diversifying a given initial capital over a finite number of investment

More information

Asset Pricing and Equity Premium Puzzle. E. Young Lecture Notes Chapter 13

Asset Pricing and Equity Premium Puzzle. E. Young Lecture Notes Chapter 13 Asset Pricing and Equity Premium Puzzle 1 E. Young Lecture Notes Chapter 13 1 A Lucas Tree Model Consider a pure exchange, representative household economy. Suppose there exists an asset called a tree.

More information

SOLVING ROBUST SUPPLY CHAIN PROBLEMS

SOLVING ROBUST SUPPLY CHAIN PROBLEMS SOLVING ROBUST SUPPLY CHAIN PROBLEMS Daniel Bienstock Nuri Sercan Özbay Columbia University, New York November 13, 2005 Project with Lucent Technologies Optimize the inventory buffer levels in a complicated

More information

An Application of Ramsey Theorem to Stopping Games

An Application of Ramsey Theorem to Stopping Games An Application of Ramsey Theorem to Stopping Games Eran Shmaya, Eilon Solan and Nicolas Vieille July 24, 2001 Abstract We prove that every two-player non zero-sum deterministic stopping game with uniformly

More information

Stochastic Optimal Control

Stochastic Optimal Control Stochastic Optimal Control Lecturer: Eilyan Bitar, Cornell ECE Scribe: Kevin Kircher, Cornell MAE These notes summarize some of the material from ECE 5555 (Stochastic Systems) at Cornell in the fall of

More information

Integer Programming Models

Integer Programming Models Integer Programming Models Fabio Furini December 10, 2014 Integer Programming Models 1 Outline 1 Combinatorial Auctions 2 The Lockbox Problem 3 Constructing an Index Fund Integer Programming Models 2 Integer

More information

2.1 Mean-variance Analysis: Single-period Model

2.1 Mean-variance Analysis: Single-period Model Chapter Portfolio Selection The theory of option pricing is a theory of deterministic returns: we hedge our option with the underlying to eliminate risk, and our resulting risk-free portfolio then earns

More information

Dynamic Programming: An overview. 1 Preliminaries: The basic principle underlying dynamic programming

Dynamic Programming: An overview. 1 Preliminaries: The basic principle underlying dynamic programming Dynamic Programming: An overview These notes summarize some key properties of the Dynamic Programming principle to optimize a function or cost that depends on an interval or stages. This plays a key role

More information

Contagion models with interacting default intensity processes

Contagion models with interacting default intensity processes Contagion models with interacting default intensity processes Yue Kuen KWOK Hong Kong University of Science and Technology This is a joint work with Kwai Sun Leung. 1 Empirical facts Default of one firm

More information

A No-Arbitrage Theorem for Uncertain Stock Model

A No-Arbitrage Theorem for Uncertain Stock Model Fuzzy Optim Decis Making manuscript No (will be inserted by the editor) A No-Arbitrage Theorem for Uncertain Stock Model Kai Yao Received: date / Accepted: date Abstract Stock model is used to describe

More information

The Correlation Smile Recovery

The Correlation Smile Recovery Fortis Bank Equity & Credit Derivatives Quantitative Research The Correlation Smile Recovery E. Vandenbrande, A. Vandendorpe, Y. Nesterov, P. Van Dooren draft version : March 2, 2009 1 Introduction Pricing

More information

Optimal Securitization via Impulse Control

Optimal Securitization via Impulse Control Optimal Securitization via Impulse Control Rüdiger Frey (joint work with Roland C. Seydel) Mathematisches Institut Universität Leipzig and MPI MIS Leipzig Bachelier Finance Society, June 21 (1) Optimal

More information

Tug of War Game. William Gasarch and Nick Sovich and Paul Zimand. October 6, Abstract

Tug of War Game. William Gasarch and Nick Sovich and Paul Zimand. October 6, Abstract Tug of War Game William Gasarch and ick Sovich and Paul Zimand October 6, 2009 To be written later Abstract Introduction Combinatorial games under auction play, introduced by Lazarus, Loeb, Propp, Stromquist,

More information

Martingales. by D. Cox December 2, 2009

Martingales. by D. Cox December 2, 2009 Martingales by D. Cox December 2, 2009 1 Stochastic Processes. Definition 1.1 Let T be an arbitrary index set. A stochastic process indexed by T is a family of random variables (X t : t T) defined on a

More information

MATH 5510 Mathematical Models of Financial Derivatives. Topic 1 Risk neutral pricing principles under single-period securities models

MATH 5510 Mathematical Models of Financial Derivatives. Topic 1 Risk neutral pricing principles under single-period securities models MATH 5510 Mathematical Models of Financial Derivatives Topic 1 Risk neutral pricing principles under single-period securities models 1.1 Law of one price and Arrow securities 1.2 No-arbitrage theory and

More information

Hedging with Life and General Insurance Products

Hedging with Life and General Insurance Products Hedging with Life and General Insurance Products June 2016 2 Hedging with Life and General Insurance Products Jungmin Choi Department of Mathematics East Carolina University Abstract In this study, a hybrid

More information

Brownian Motion, the Gaussian Lévy Process

Brownian Motion, the Gaussian Lévy Process Brownian Motion, the Gaussian Lévy Process Deconstructing Brownian Motion: My construction of Brownian motion is based on an idea of Lévy s; and in order to exlain Lévy s idea, I will begin with the following

More information

Dynamic Portfolio Execution Detailed Proofs

Dynamic Portfolio Execution Detailed Proofs Dynamic Portfolio Execution Detailed Proofs Gerry Tsoukalas, Jiang Wang, Kay Giesecke March 16, 2014 1 Proofs Lemma 1 (Temporary Price Impact) A buy order of size x being executed against i s ask-side

More information

RMSC 4005 Stochastic Calculus for Finance and Risk. 1 Exercises. (c) Let X = {X n } n=0 be a {F n }-supermartingale. Show that.

RMSC 4005 Stochastic Calculus for Finance and Risk. 1 Exercises. (c) Let X = {X n } n=0 be a {F n }-supermartingale. Show that. 1. EXERCISES RMSC 45 Stochastic Calculus for Finance and Risk Exercises 1 Exercises 1. (a) Let X = {X n } n= be a {F n }-martingale. Show that E(X n ) = E(X ) n N (b) Let X = {X n } n= be a {F n }-submartingale.

More information

Multistage Stochastic Demand-side Management for Price-Making Major Consumers of Electricity in a Co-optimized Energy and Reserve Market

Multistage Stochastic Demand-side Management for Price-Making Major Consumers of Electricity in a Co-optimized Energy and Reserve Market Multistage Stochastic Demand-side Management for Price-Making Major Consumers of Electricity in a Co-optimized Energy and Reserve Market Mahbubeh Habibian Anthony Downward Golbon Zakeri Abstract In this

More information

Dynamic Risk Management in Electricity Portfolio Optimization via Polyhedral Risk Functionals

Dynamic Risk Management in Electricity Portfolio Optimization via Polyhedral Risk Functionals Dynamic Risk Management in Electricity Portfolio Optimization via Polyhedral Risk Functionals A. Eichhorn and W. Römisch Humboldt-University Berlin, Department of Mathematics, Germany http://www.math.hu-berlin.de/~romisch

More information

Online Appendix: Extensions

Online Appendix: Extensions B Online Appendix: Extensions In this online appendix we demonstrate that many important variations of the exact cost-basis LUL framework remain tractable. In particular, dual problem instances corresponding

More information

Stochastic Dual Dynamic integer Programming

Stochastic Dual Dynamic integer Programming Stochastic Dual Dynamic integer Programming Shabbir Ahmed Georgia Tech Jikai Zou Andy Sun Multistage IP Canonical deterministic formulation ( X T ) f t (x t,y t ):(x t 1,x t,y t ) 2 X t 8 t x t min x,y

More information

Exponential utility maximization under partial information

Exponential utility maximization under partial information Exponential utility maximization under partial information Marina Santacroce Politecnico di Torino Joint work with M. Mania AMaMeF 5-1 May, 28 Pitesti, May 1th, 28 Outline Expected utility maximization

More information

Lecture IV Portfolio management: Efficient portfolios. Introduction to Finance Mathematics Fall Financial mathematics

Lecture IV Portfolio management: Efficient portfolios. Introduction to Finance Mathematics Fall Financial mathematics Lecture IV Portfolio management: Efficient portfolios. Introduction to Finance Mathematics Fall 2014 Reduce the risk, one asset Let us warm up by doing an exercise. We consider an investment with σ 1 =

More information

PAULI MURTO, ANDREY ZHUKOV

PAULI MURTO, ANDREY ZHUKOV GAME THEORY SOLUTION SET 1 WINTER 018 PAULI MURTO, ANDREY ZHUKOV Introduction For suggested solution to problem 4, last year s suggested solutions by Tsz-Ning Wong were used who I think used suggested

More information

Essays on Some Combinatorial Optimization Problems with Interval Data

Essays on Some Combinatorial Optimization Problems with Interval Data Essays on Some Combinatorial Optimization Problems with Interval Data a thesis submitted to the department of industrial engineering and the institute of engineering and sciences of bilkent university

More information

Approximate Revenue Maximization with Multiple Items

Approximate Revenue Maximization with Multiple Items Approximate Revenue Maximization with Multiple Items Nir Shabbat - 05305311 December 5, 2012 Introduction The paper I read is called Approximate Revenue Maximization with Multiple Items by Sergiu Hart

More information

Citation: Dokuchaev, Nikolai Optimal gradual liquidation of equity from a risky asset. Applied Economic Letters. 17 (13): pp

Citation: Dokuchaev, Nikolai Optimal gradual liquidation of equity from a risky asset. Applied Economic Letters. 17 (13): pp Citation: Dokuchaev, Nikolai. 21. Optimal gradual liquidation of equity from a risky asset. Applied Economic Letters. 17 (13): pp. 135-138. Additional Information: If you wish to contact a Curtin researcher

More information

3.2 No-arbitrage theory and risk neutral probability measure

3.2 No-arbitrage theory and risk neutral probability measure Mathematical Models in Economics and Finance Topic 3 Fundamental theorem of asset pricing 3.1 Law of one price and Arrow securities 3.2 No-arbitrage theory and risk neutral probability measure 3.3 Valuation

More information

Risk Neutral Valuation

Risk Neutral Valuation copyright 2012 Christian Fries 1 / 51 Risk Neutral Valuation Christian Fries Version 2.2 http://www.christian-fries.de/finmath April 19-20, 2012 copyright 2012 Christian Fries 2 / 51 Outline Notation Differential

More information

Lecture 11: Bandits with Knapsacks

Lecture 11: Bandits with Knapsacks CMSC 858G: Bandits, Experts and Games 11/14/16 Lecture 11: Bandits with Knapsacks Instructor: Alex Slivkins Scribed by: Mahsa Derakhshan 1 Motivating Example: Dynamic Pricing The basic version of the dynamic

More information

Credit Risk Models with Filtered Market Information

Credit Risk Models with Filtered Market Information Credit Risk Models with Filtered Market Information Rüdiger Frey Universität Leipzig Bressanone, July 2007 ruediger.frey@math.uni-leipzig.de www.math.uni-leipzig.de/~frey joint with Abdel Gabih and Thorsten

More information

Sequential Decision Making

Sequential Decision Making Sequential Decision Making Dynamic programming Christos Dimitrakakis Intelligent Autonomous Systems, IvI, University of Amsterdam, The Netherlands March 18, 2008 Introduction Some examples Dynamic programming

More information

Asymptotic methods in risk management. Advances in Financial Mathematics

Asymptotic methods in risk management. Advances in Financial Mathematics Asymptotic methods in risk management Peter Tankov Based on joint work with A. Gulisashvili Advances in Financial Mathematics Paris, January 7 10, 2014 Peter Tankov (Université Paris Diderot) Asymptotic

More information

Outline. 1 Introduction. 2 Algorithms. 3 Examples. Algorithm 1 General coordinate minimization framework. 1: Choose x 0 R n and set k 0.

Outline. 1 Introduction. 2 Algorithms. 3 Examples. Algorithm 1 General coordinate minimization framework. 1: Choose x 0 R n and set k 0. Outline Coordinate Minimization Daniel P. Robinson Department of Applied Mathematics and Statistics Johns Hopkins University November 27, 208 Introduction 2 Algorithms Cyclic order with exact minimization

More information

Global convergence rate analysis of unconstrained optimization methods based on probabilistic models

Global convergence rate analysis of unconstrained optimization methods based on probabilistic models Math. Program., Ser. A DOI 10.1007/s10107-017-1137-4 FULL LENGTH PAPER Global convergence rate analysis of unconstrained optimization methods based on probabilistic models C. Cartis 1 K. Scheinberg 2 Received:

More information

4 Martingales in Discrete-Time

4 Martingales in Discrete-Time 4 Martingales in Discrete-Time Suppose that (Ω, F, P is a probability space. Definition 4.1. A sequence F = {F n, n = 0, 1,...} is called a filtration if each F n is a sub-σ-algebra of F, and F n F n+1

More information

3.4 Copula approach for modeling default dependency. Two aspects of modeling the default times of several obligors

3.4 Copula approach for modeling default dependency. Two aspects of modeling the default times of several obligors 3.4 Copula approach for modeling default dependency Two aspects of modeling the default times of several obligors 1. Default dynamics of a single obligor. 2. Model the dependence structure of defaults

More information

Log-Robust Portfolio Management

Log-Robust Portfolio Management Log-Robust Portfolio Management Dr. Aurélie Thiele Lehigh University Joint work with Elcin Cetinkaya and Ban Kawas Research partially supported by the National Science Foundation Grant CMMI-0757983 Dr.

More information

Binomial model: numerical algorithm

Binomial model: numerical algorithm Binomial model: numerical algorithm S / 0 C \ 0 S0 u / C \ 1,1 S0 d / S u 0 /, S u 3 0 / 3,3 C \ S0 u d /,1 S u 5 0 4 0 / C 5 5,5 max X S0 u,0 S u C \ 4 4,4 C \ 3 S u d / 0 3, C \ S u d 0 S u d 0 / C 4

More information

3 Arbitrage pricing theory in discrete time.

3 Arbitrage pricing theory in discrete time. 3 Arbitrage pricing theory in discrete time. Orientation. In the examples studied in Chapter 1, we worked with a single period model and Gaussian returns; in this Chapter, we shall drop these assumptions

More information

King s College London

King s College London King s College London University Of London This paper is part of an examination of the College counting towards the award of a degree. Examinations are governed by the College Regulations under the authority

More information

Self-organized criticality on the stock market

Self-organized criticality on the stock market Prague, January 5th, 2014. Some classical ecomomic theory In classical economic theory, the price of a commodity is determined by demand and supply. Let D(p) (resp. S(p)) be the total demand (resp. supply)

More information

6. Martingales. = Zn. Think of Z n+1 as being a gambler s earnings after n+1 games. If the game if fair, then E [ Z n+1 Z n

6. Martingales. = Zn. Think of Z n+1 as being a gambler s earnings after n+1 games. If the game if fair, then E [ Z n+1 Z n 6. Martingales For casino gamblers, a martingale is a betting strategy where (at even odds) the stake doubled each time the player loses. Players follow this strategy because, since they will eventually

More information

Structural Models of Credit Risk and Some Applications

Structural Models of Credit Risk and Some Applications Structural Models of Credit Risk and Some Applications Albert Cohen Actuarial Science Program Department of Mathematics Department of Statistics and Probability albert@math.msu.edu August 29, 2018 Outline

More information

Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing

Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing Optimal Search for Parameters in Monte Carlo Simulation for Derivative Pricing Prof. Chuan-Ju Wang Department of Computer Science University of Taipei Joint work with Prof. Ming-Yang Kao March 28, 2014

More information