Dynamic portfolio choice: a simulation-and-regression approach

Optim Eng (2017) 18:369–406
DOI 10.1007/s11081-017-9347-4

Michel Denault · Erick Delage · Jean-Guy Simonato³

Received: 9 April 2015 / Revised: 9 February 2016 / Accepted: December 2016 / Published online: March 2017
© Springer Science+Business Media New York 2017

Abstract Simulation-and-regression algorithms have become a standard tool for solving dynamic programs in many areas, in particular financial engineering and computational economics. In virtually all cases, the regression is performed on the state variables, for example on current market prices. However, it is possible to regress on decision variables as well, and this opens up new possibilities. We present numerical evidence of the performance of such an algorithm, in the context of dynamic portfolio choices in discrete-time (and thus incomplete) markets. The problem is fundamentally the one considered in some recent papers that also use simulations and/or regressions: discrete time, multi-period reallocation, and maximization of terminal utility. In contrast to that literature, we regress on decision variables and we rely neither on Taylor expansions nor on derivatives of the utility function. Only basic tools are used, bundled in a dynamic programming framework: simulations, which can be black-boxed, as a representation of exogenous state variable dynamics; regression surfaces, as non-anticipative representations of expected future utility; and nonlinear or quadratic optimization, to identify the best portfolio choice at each time step. The resulting approach is simple, highly flexible and offers good performance in time and precision.

Keywords Simulation-and-regression methods · Least-squares Monte Carlo methods · Dynamic programming · Portfolio choice · Portfolio optimization

Correspondence: Michel Denault, michel.denault@hec.ca
1 Department of Decision Sciences, HEC Montréal, Montréal, Canada
2 GERAD, Montréal, Canada
3 Department of Finance, HEC Montréal, Montréal, Canada

1 Introduction

The characterization of portfolio choices is a classical and important topic in finance. With few exceptions, dynamic portfolio choice models (multi-period settings with a possible portfolio reallocation at each time point) do not have closed-form solutions and must rely on numerical methods. Examples can be found in Brennan et al. (1997), with the numerical solution of partial differential equations, Dammon et al. (2000), with binomial trees, or Balduzzi and Lynch (1999), with numerical integration by quadrature. The main drawback of these lattice approaches is the difficulty of handling many state variables and realistic stock return processes.

More recently, some authors have proposed the use of Monte Carlo simulations to handle these portfolio problems. These methods offer a high level of flexibility in modeling the behaviour of the stochastic variables. Typically, they can even be black-boxed, meaning that no information is required about the underlying stochastic processes, as long as simulation samples can be generated. One example is Detemple et al. (2003), who use Malliavin calculus and Monte Carlo simulations to compute optimal portfolios in a continuous-time (complete-market) dynamic setting. Their approach can handle a large number of state variables and is shown to converge to the optimal solutions. This method is, however, best suited to complete-market settings with diffusion processes. Incomplete-market problems can be handled, but only if an analytical solution is available for the dual problem. In a discrete-time, incomplete-market setting, Brandt et al. (2005) and Garlappi and Skoulakis (2010) propose simulation-and-regression algorithms to compute optimal portfolios for a variety of problems.
These algorithms use a Taylor expansion of the value function at every time point, and a combination of least-squares regressions and Monte Carlo simulations or quadratures to obtain the expected values entering these approximate value functions.

The algorithm proposed here is related to these last two papers and falls within the category of simulation-and-regression approaches. However, unlike the above algorithms, we avoid the need for a Taylor series expansion of the value function. This gives a simpler algorithm, which does not require the work and complication associated with computing the derivatives of the utility function. These simplifications are achieved by introducing the weights of the assets as independent variables in the regressions. The resulting regression surfaces then become functions of the weights, which can be optimized to find the optimal portfolios.

¹ As outlined in Brandt (2010), a financial market is complete when all future outcomes (states) are spanned by the payoffs of traded assets. As shown in Cox and Huang (1989, 1991), the assumption of complete markets simplifies dynamic portfolio choices and can lead to closed-form solutions for the investment policies. Such complete markets are, however, typically associated with a continuous-time setting. Here, we examine a discrete-time trading problem for which the market is incomplete. This framework does not allow explicit solutions except in a few very specific cases.

The proposed approach can handle several risk sources besides the risky asset returns. These risk sources can take the form of state variables that affect the returns in the simulations. Depending on whether or not these sources of risk help in predicting some characteristics of the returns, they can be included in the independent variables of the regressions to condition the expected values.

Although the present paper mainly deals with problems involving the utility of terminal wealth, with positivity constraints on the portfolio weights, other objective functions and constraints could be handled. For example, one could define a dynamic problem that minimizes the mismatch between assets and liabilities for a pension fund allocating wealth across different asset categories. Such a problem would require redefining the objective function in the procedure outlined in the paper, but would keep the overall procedure fairly similar. Constraints that are relevant to the problem and that are functions of the portfolio weights can be imposed during the optimization of the regression functions. We also note that, although we work with a constant interest rate, the proposed method can accommodate a stochastic interest rate, which can be simulated and used straightforwardly in the simulation-and-regression procedure.

As with other simulation-and-regression procedures, we use a recursive dynamic programming algorithm. Simulation-based approaches offer great flexibility to handle realistic discrete-time stochastic processes (more so than lattice methods). For example, GARCH processes with non-normal errors or discrete-time processes with jumps can be used in these procedures without much modification. Moreover, in contrast to Taylor-expansion-based methods, the proposed algorithm can handle utility functions that are not differentiable everywhere, such as piecewise linear utility functions; this has not been attempted with Taylor-expansion-based approaches.
Although it is impossible to completely avoid the curse of dimensionality, the tests performed here also indicate that the proposed method, like other simulation-and-regression methods, suffers much less from this curse than quadrature approaches.

There is a well-established literature on the use of simulations and regressions to solve generic dynamic programs, which often goes under the name of approximate dynamic programming. For example, 20 years ago, Bradtke and Barto (1996) introduced their least-squares temporal differences learning algorithm. Including decision variables in the value function is not new either, as can be seen in the literature on Q-learning, with the fundamental contributions of Watkins (1989) and Watkins and Dayan (1992); later on, Lagoudakis et al. (2002) introduced what could be described as least-squares Q-learning (for discrete states, discrete decisions and infinite-horizon problems). This dynamic programming literature, which uses simulations and often regressions, was in fact developed by different communities, and goes under the various names of reinforcement learning, artificial intelligence, machine learning, optimal control, neuro-dynamic programming, Markov decision processes and approximate dynamic programming. The books by Bertsekas (2012) and Sutton and Barto (1998) offer a wide and deep coverage of these fields. Volume II of Bertsekas (2012) contains, in the third and last part of section 6.6.6, a generic, discrete-state algorithm that corresponds quite well to the method developed here, although it is not clear whether the author implied the possibility of a simulation-based variant; it appears to be an original contribution of the author.

Much closer to finance problems, but still embedded in the Q-learning literature, we mention Tsitsiklis and Van Roy (2001), who tackle (finite-horizon) option pricing problems. Such problems typically have a simple decision space: exercise the option, or continue holding it.

The methods of approximate dynamic programming have not often been applied to portfolio optimization problems. We found two papers in all. The first is Weissensteiner (2009), who applies Q-learning to a consumption and investment problem, though without regressions and in the forward-pass manner that is standard in Q-learning. The numerical experiments are limited to a single-asset problem with a closed-form solution. The second paper is that of Nijman et al. (2007), cited in Koijen et al. (2010). It is published as a finance paper rather than a methodological one, hence the algorithm is somewhat hidden in the companion working paper. The authors' approach, which they say is a combination of ideas taken from Brandt et al. (2005) and Carroll (2006), is similar to the one adopted here, including regression on the decision variables.

The work presented here, however, differs from it in many ways. First, it focuses on exploring the possibilities of the method, and it provides results on a diverse series of portfolio choice problems, which are compared with numerical quadrature or analytical solutions. These examples show various difficulties coming from the size of the problem (number of risky assets, number of periods), the stochastic behavior of the returns (serial dependence) and the utility function (dependence on wealth). To the best of our knowledge, incomplete-market portfolio choice problems of this size and complexity (see Sect. 4.4) have not been solved previously in the literature. We also illustrate the limits of Taylor expansion approaches with an example on conditional value at risk and expectile minimization. Second, we find that with a careful implementation, the computation of the regression surfaces themselves (not an approximation) is not as computationally unattractive as Koijen et al. suggest (see more details in Sect. 3.4).
Third, we work directly on the optimization problems, not on a set of Karush-Kuhn-Tucker equations. This offers more insight and flexibility in choosing the bases and incorporating constraints, and is arguably simpler as well. Fourth, we introduce an inverse utility transformation similar to that proposed in Garlappi and Skoulakis (2010). This transformation allows for an important reduction in the number of discrete wealth points required to achieve precise results. Fifth, we compare two different approaches to evaluate a decision, dubbed realized values and regression surface values. Sixth, we give preliminary evidence that a two-step procedure with regression surfaces on smaller domains can provide enhanced precision, see example.

We point out briefly the main algorithmic alternative for the type of problems we study, which is stochastic programming. The two approaches differ importantly in their treatment of the uncertainty, with stochastic programming relying on scenario trees. Such a difference makes a meaningful comparison of the two techniques rather non-trivial. See for example Chapter 6 of Zenios (2007) for an overview of stochastic programming for portfolio optimization.

Given this brief literature review, this paper's contribution to the literature is best viewed as the application of a variant of relatively well-studied algorithms to a class of problems on which they have not been applied, i.e. non-simplistic portfolio choice problems (including auto-regressive stochastics, multi-dimensionality, non-homogeneous and/or non-differentiable utility functions, etc.).

² We found the papers by Koijen et al. and section 6.6.6 of Bertsekas (2012) after the paper was written up; in fact, we came to the method we propose by generalizing simulation-and-regression techniques used in option pricing.

In the next section, we outline the problem of the investor. We describe the approach in Sect. 3, with a simple example first, to provide the intuition, and the general algorithm next. Numerical results are presented in Sect. 4 and the paper concludes with Sect. 5.

2 Problem description

In this section, we describe the problem, indicate the main hypotheses, and introduce the notation. We consider an investor endowed with wealth $W_0$ at time $t = 0$ who wants to maximize the expected utility of his terminal wealth at date $T$. The investor can trade $N$ risky assets and a risk-free security at times $t = 0, 1, \ldots, T-1$. The investor's decisions should thus account for the possibility of rebalancing his portfolio at future time points. In general, for this situation, the optimal investment decision will be different from that of an investor with a myopic single-period horizon, and a dynamic approach is required.³ We also assume that there are no frictions like taxes or transaction costs, but we note that the proposed computational approach could be amended to include these costs.⁴

With these assumptions in mind, the investor's problem can be expressed as

$$V_0(W_0, S_0) = \max_{\{x_{\min} \le x_t \le x_{\max}\}_{t=0}^{T-1}} E_0\left[u(W_T)\right]$$

with the following constraints for all $t$:

$$W_{t+1} = W_t \left(x_t^\top R_{t+1} + R_f\right)$$

where $E_t$ denotes the expectation conditional on the information available at time $t$ (here the initial wealth $W_0$ and initial state variable value $S_0$), $u(\cdot)$ is a concave utility function with the usual regularity conditions, $R_f$ is the gross risk-free rate with simple compounding, and $R_{t+1}$ is an $N$-vector of simple returns in excess of the risk-free rate for the time period between $t$ and $t+1$.
Portfolio weights between times $t$ and $t+1$ are represented by the $N$-vector $x_t$. Any linear constraints are represented as $x \in X$, which usually contains lower and upper bounds, and an allocation constraint (sum to 100%). We assume for simplicity that the vector of excess stock returns is generated from a known discrete-time stochastic process; however, the only requirement is the ability to simulate excess stock returns. It is also assumed that the process is Markovian; Example 4, for instance, relies on an autoregressive process. The state variable process $S_t$ is Markovian; it includes the excess returns and possibly more than the returns (see Example 3). For notational simplicity, however, we keep the state variables $W_t$ and $S_t$ separate. As in most of the literature, the investor does not have market power, i.e. his decisions do not impact the returns.

In the above problem, wealth is an endogenous variable that varies stochastically with time according to the portfolio weights $x_t$ and the exogenous random returns $R_{t+1}$. By endogenous, we mean that such a state variable is influenced by the decisions; exogenous variables are not. Depending on the utility function, the current wealth level may or may not influence the optimal portfolio weights. Similarly, depending on the return process, the current state of the process might influence the optimal weights.

Using the principle of optimality of dynamic programming, the above value function can be rewritten as the following recursion for $t = T-1, \ldots, 0$:

$$V_t(W_t, S_t) = \max_{x_{\min} \le x_t \le x_{\max}} E_t\left[V_{t+1}\left(W_t \left(x_t^\top R_{t+1} + R_f\right), S_{t+1}\right)\right] \tag{1}$$

with the terminal condition

$$V_T(W_T, S_T) = u(W_T).$$

³ It can be optimal for a multi-period investor to invest myopically under specific circumstances. A first circumstance is the case of constant investment opportunities, i.e. i.i.d. returns. A second circumstance is the case of stochastic but unhedgeable investment opportunities. In this situation, the available assets cannot be used to hedge against changes in future investment opportunities and a myopic policy will be optimal. Finally, a logarithmic utility function will also yield an optimal myopic policy. See Brandt (2010) for more details.

⁴ Such a modification would, however, be outside the scope of this paper as it would require non-trivial changes. For example, with proportional transaction costs as in Balduzzi and Lynch (1999), the decisions of the investor, in addition to being conditioned on potential state variable values, should also be conditioned on an inherited allocation, which is the result of the previous allocation and the realization of random returns. At each reallocation point, the investor should assess whether the potential gain in expected utility from a new optimal allocation is sufficient to justify the loss associated with the transaction costs triggered by the new allocation.

3 A simulation-and-regression algorithm that regresses on decision variables

In this section, we describe the approach adopted here to solve the above dynamic programming recursion. Section 3.1 discusses the three main tasks at each time step of the recursion. Section 3.2 introduces the algorithm with a short, intuitive numerical example.
Section 3.3 provides the algorithm itself, and Sect. 3.4 discusses some implementation details.

3.1 Overview

Solving the dynamic program (1) with the proposed approach involves three main tasks at each time step, as discussed below. It is important to first make a distinction between the state variable wealth and other state variables such as past returns (or dividends, economic indicators, etc.). The former is endogenous in that its value depends on portfolio decisions; the latter are exogenous, in that they are independent of portfolio decisions. Endogenous state variables do not lend themselves easily to simulation-and-regression schemes, at least not in the nonstationary case we consider (see Denault et al. 2013 for a pseudo-simulation treatment of endogenous state variables). As such, wealth is discretized, as (all) state variables usually are in classical dynamic programming. Exogenous state variables are treated by simulation-and-regression. The three main tasks that build the approach are as follows.

Build a representation of the value function. In the absence of analytical solutions, the value function $V_t$ needs to be represented compactly. We first build a parametric representation of future utilities, in terms of bases formed with both the exogenous state variables and the decision variables. Using the exogenous state variables is rather natural, see the second task (expectations). Using the decision variables as bases is not as common, but the rationale is simple, see the third task. Note that this parametric representation is not an approximate value function; it is broader in scope, since it approximates future utilities under all decisions, optimal and suboptimal.

Computation of expectations. Expectations are used to account for the unknown future. They enforce non-anticipativity, i.e. decisions must not rely on future information. This is where simulations and regressions do their work. Monte Carlo path simulations serve to represent the dynamics of the exogenous state variables and returns. The parametric value function approximation is fitted by regression to implement the expectation.

Optimization of the portfolio positions. The above parametric representation of future utilities, conditional on any simulated state variable value and wealth level, can be maximized (under constraints) with respect to the portfolio decision. This optimal portfolio decision can be used to compute the value function (for that specific state variable value and wealth level).
To conclude, the value function representation at any point in time is therefore a discrete set of numbers, each associated with a specific path of the exogenous state variable and a specific wealth level.

3.2 Introductory example

An intuitive understanding of the algorithm is best conveyed by a simple example; the reader who so wishes can skip to the general algorithm in Sect. 3.3. Here, all the computations have been performed with full precision. The reported values are, however, rounded to the fourth digit (or more if required).

Consider the two-period case of a portfolio consisting of one risky and one risk-free asset with a constant gross risk-free rate $R_f = 1.015$. The investor, endowed with current wealth $W_0 = 1$ at time $t = 0$, wants to maximize the utility of his wealth at time $t = 2$. Asset allocation takes place at times $t = 0$ and $t = 1$. Neither short sales nor borrowing are allowed.

The algorithm requires a preliminary step where grids of discrete portfolio weights and wealths are built, and excess returns are simulated. Let us use the following grid of portfolio weights $x$ for the risky asset (the weight of the risk-free asset is simply $1 - x$):

$$x \in \{0.0,\ 0.3,\ 0.7,\ 1.0\},$$

while the grid of wealths $w_{k,t}$, with $t \in \{0, 1\}$, is provided in Table 1.

Table 1 Grid of wealths

| $t = 0$ | $t = 1$ |
|---|---|
| $w_0 = 1$ | $w_{1,1} = 1.3$ |
| | $w_{2,1} = 0.8$ |

We keep this introduction as simple as possible and use only two sample paths of excess returns at each period (Period 1 from $t = 0$ to $t = 1$, Period 2 from $t = 1$ to $t = 2$), as in Table 2.

Table 2 Sample paths of excess returns

| Path | Period 1 | Period 2 |
|---|---|---|
| a | $R_1^{(a)} = 0.030$ | $R_2^{(a)} = 0.022$ |
| b | $R_1^{(b)} = 0.040$ | $R_2^{(b)} = -0.020$ |

In a second step, the recursive dynamic programming procedure is implemented with the above quantities. Starting at $t = 1$, one period before maturity, we perform the following operations for each state of wealth at this time period. We first compute a sample of wealths at $t = 2$. For example, from the wealth level $w_{1,1} = 1.3$, we compute a wealth at $t = 2$ using the first weight and path a of the simulated excess returns:

$$W\left(x_1, R_2^{(a)}\right) = w_{1,1}\left(x_1 R_2^{(a)} + R_f\right) = 1.3 \times (0.00 \times 0.022 + 1.015) = 1.3195.$$

Using all the possible combinations of simulated returns at $t = 2$ and portfolio weights on the grid, the end-of-period sample of wealths at $t = 2$ is:

$$\left[W(x_1, R_2^{(a)}),\ W(x_2, R_2^{(a)}),\ W(x_3, R_2^{(a)}),\ W(x_4, R_2^{(a)}),\ W(x_1, R_2^{(b)}),\ W(x_2, R_2^{(b)}),\ W(x_3, R_2^{(b)}),\ W(x_4, R_2^{(b)})\right]$$
$$= \left[1.3195\ \ 1.3281\ \ 1.3395\ \ 1.3481\ \ 1.3195\ \ 1.3117\ \ 1.3013\ \ 1.2935\right].$$

Computing the utility of these end-of-period wealths with the constant absolute risk aversion (CARA) utility function $u(w) = -\exp(-\gamma w)$ with $\gamma = 3$, the following linear regression system can be formed:

$$\begin{bmatrix} -0.0191 \\ -0.0186 \\ -0.0180 \\ -0.0175 \\ -0.0191 \\ -0.0195 \\ -0.0202 \\ -0.0206 \end{bmatrix} = \begin{bmatrix} 1.0 & 0.0 & 0.0^2 \\ 1.0 & 0.3 & 0.3^2 \\ 1.0 & 0.7 & 0.7^2 \\ 1.0 & 1.0 & 1.0^2 \\ 1.0 & 0.0 & 0.0^2 \\ 1.0 & 0.3 & 0.3^2 \\ 1.0 & 0.7 & 0.7^2 \\ 1.0 & 1.0 & 1.0^2 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \\ \epsilon_6 \\ \epsilon_7 \\ \epsilon_8 \end{bmatrix}, \tag{2}$$

where the elements on the left-hand side are the utilities of the computed wealths at $t = 2$, and the rows of the independent variable matrix are formed with a constant, the discretized portfolio weight for the risky asset and its squared value. In this system, the $\epsilon_i$ are random errors with zero expected value, and $\beta_1$ to $\beta_3$ are unknown coefficients, whose estimated values obtained with an ordinary least-squares regression are:

$$\hat\beta_1 = -0.0191, \quad \hat\beta_2 = 7.44 \times 10^{-5}, \quad \hat\beta_3 = -6.39 \times 10^{-5}.$$

Using these estimated coefficients, the approximate expected utility given a wealth level $w_{1,1} = 1.3$ at $t = 1$ can be written as a continuous function of the portfolio weights,⁵ which can be used to find the optimal proportion as:

$$\hat{x}_{1,1} = \arg\max_{0 \le x \le 1}\ -0.0191 + \left(7.44 \times 10^{-5}\right) x - \left(6.39 \times 10^{-5}\right) x^2.$$

The optimal portfolio weight can easily be verified to be $\hat{x}_{1,1} = 0.582$, with a function value at the optimum of $\hat{v}_{1,1} = -0.0191$. This represents the highest achievable expected utility value, conditional on the information available at $t = 1$ and a wealth value of 1.3. It is thus a numerical estimate of a point on the value function for a wealth of 1.3. Doing similar calculations for the other state of wealth, $w_{2,1} = 0.8$ at $t = 1$, we obtain Table 3, which summarizes the computed quantities at $t = 1$ for each state of wealth.

In this example, the occurrence of identical optimal portfolios for different paths at the same wealth level is normal, and is due to the fact that we do not regress on exogenous state variables; if we did introduce exogenous state variables, there would be one optimization problem per path (and wealth level), and the solutions would likely be different.
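The $t = 1$ step of the example can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the example's data ($R_f = 1.015$, $\gamma = 3$, the four-point weight grid and the two Period 2 excess returns), fits the quadratic by ordinary least squares through the normal equations, and maximizes the fitted parabola over $[0, 1]$ in closed form. The helper names (`solve`, `ols`, `solve_wealth_level`) are hypothetical.

```python
import math

R_F = 1.015                      # gross risk-free rate of the example
GAMMA = 3.0                      # CARA risk-aversion coefficient
X_GRID = [0.0, 0.3, 0.7, 1.0]    # grid of risky-asset weights
R2 = {"a": 0.022, "b": -0.020}   # Period 2 excess returns on paths a and b


def utility(w):
    """CARA utility u(w) = -exp(-gamma * w)."""
    return -math.exp(-GAMMA * w)


def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares through the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def solve_wealth_level(w):
    """Regress utilities on (1, x, x^2), then maximize the fitted parabola on [0, 1]."""
    rows, y = [], []
    for path in ("a", "b"):
        for x in X_GRID:
            rows.append([1.0, x, x * x])
            y.append(utility(w * (x * R2[path] + R_F)))
    b1, b2, b3 = ols(rows, y)
    # Candidates: both bounds and, when it lies inside the bounds, the stationary point.
    candidates = [0.0, 1.0]
    if b3 != 0.0 and 0.0 <= -b2 / (2.0 * b3) <= 1.0:
        candidates.append(-b2 / (2.0 * b3))
    x_opt = max(candidates, key=lambda x: b1 + b2 * x + b3 * x * x)
    return (b1, b2, b3), x_opt, b1 + b2 * x_opt + b3 * x_opt ** 2


results = {w: solve_wealth_level(w) for w in (1.3, 0.8)}
for w, (beta, x_opt, v_opt) in results.items():
    print(w, [f"{b:.3e}" for b in beta], round(x_opt, 3), round(v_opt, 4))
```

Under these assumptions the sketch should recover, up to rounding, the coefficients and the optimal weights reported in the text for the two wealth levels. Because the regressors depend only on $x$, both paths share the same fitted surface, which is why the optimal weight is identical on paths a and b.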
The fact that the values are the same for different paths at the same wealth level is also normal; note however that in an alternate specification of the algorithm, illustrated at the end of this section, the values will likely be different, even though we do not regress on exogenous state variables.

⁵ This continuous function is a diminutive version of the regression surface $L_{k,t}(x, s)$ that will be introduced in Sect. 3.3.

Table 3 Values associated with two wealths at $t = 1$

| $w_{k,1}$ | $\hat{x}^{(j)}_{k,1}$ | $\hat{v}^{(j)}_{k,1}$ |
|---|---|---|
| 1.3 | $\hat{x}^{(a)}_{1,1} = 0.582$ | $\hat{v}^{(a)}_{1,1} = -0.0191$ |
| | $\hat{x}^{(b)}_{1,1} = 0.582$ | $\hat{v}^{(b)}_{1,1} = -0.0191$ |
| 0.8 | $\hat{x}^{(a)}_{2,1} = 0.945$ | $\hat{v}^{(a)}_{2,1} = -0.0874$ |
| | $\hat{x}^{(b)}_{2,1} = 0.945$ | $\hat{v}^{(b)}_{2,1} = -0.0874$ |

Using the above values, the optimal weight at $t = 0$ can be computed using a similar procedure started at the wealth value $w_0 = 1$. We first compute a sample of wealths at $t = 1$ using the combinations of simulated returns and portfolio weights. This sample of wealths at $t = 1$ is

$$\left[W(x_1, R_1^{(a)}),\ W(x_2, R_1^{(a)}),\ W(x_3, R_1^{(a)}),\ W(x_4, R_1^{(a)}),\ W(x_1, R_1^{(b)}),\ W(x_2, R_1^{(b)}),\ W(x_3, R_1^{(b)}),\ W(x_4, R_1^{(b)})\right]$$
$$= \left[1.0150\ \ 1.0240\ \ 1.0360\ \ 1.0450\ \ 1.0150\ \ 1.0270\ \ 1.0430\ \ 1.0550\right].$$

Using these wealths, we can generate a sample of points on the value function at $t = 1$ that will be used as dependent variables in a regression. This can be done by interpolation with the pairs of wealths and points on the value function, which are available in Table 3. For example, a first point on the value function at $t = 1$, associated with decision $x = 0$ and path a, can be obtained by a linear interpolation computed with the quantities in Table 4.

Table 4 Values for interpolation at wealth 1.0150

| Wealth | Value |
|---|---|
| 1.3 | $-0.0191$ |
| 1.0150 | ? |
| 0.8 | $-0.0874$ |

With these, the linearly interpolated value is $-0.0580$. A sample of points on the value function at $t = 1$ can be generated in a similar fashion to obtain the linear system

$$\begin{bmatrix} -0.0580 \\ -0.0568 \\ -0.0552 \\ -0.0539 \\ -0.0580 \\ -0.0564 \\ -0.0542 \\ -0.0526 \end{bmatrix} = \begin{bmatrix} 1.0 & 0.0 & 0.0^2 \\ 1.0 & 0.3 & 0.3^2 \\ 1.0 & 0.7 & 0.7^2 \\ 1.0 & 1.0 & 1.0^2 \\ 1.0 & 0.0 & 0.0^2 \\ 1.0 & 0.3 & 0.3^2 \\ 1.0 & 0.7 & 0.7^2 \\ 1.0 & 1.0 & 1.0^2 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \\ \epsilon_6 \\ \epsilon_7 \\ \epsilon_8 \end{bmatrix}, \tag{3}$$

and the optimal portfolio allocation as the solution of:

$$\hat{x}_0 = \arg\max_{0 \le x \le 1}\ -0.0580 + 0.0048\,x - \left(9.47 \times 10^{-17}\right) x^2,$$

which yields $\hat{x}_0 = 1.0$, the optimal portfolio at $t = 0$. This completes the dynamic program.

Alternate specification of the algorithm: realized values

The algorithm illustrated above is straightforward; we later refer to it as the regression surface value specification (see Sect. 3.4). However, the regression surface values $\hat{v}_{k,1}$ from Table 3 used above in the interpolations are not the only choice that could be made for such calculations. At $t = 1$, alternate values, which we call realized values, can be computed using the optimal portfolio weights $\hat{x}_{k,1}$. These values also represent a maximum expected utility value, but with the additional information provided by the realized paths at $t = 2$. For path a and wealth $w_{1,1} = 1.3$ at $t = 1$, such a realized value is computed with

$$v^{(a)}_{1,1} = u\left(w_{1,1}\left(\hat{x}_{1,1} R_2^{(a)} + R_f\right)\right) = u\left(1.3 \times (0.582 \times 0.022 + 1.015)\right) = -0.0182,$$

while it is

$$v^{(b)}_{1,1} = u\left(1.3 \times (0.582 \times (-0.020) + 1.015)\right) = -0.0200$$

for sample path b (and still $w_{1,1} = 1.3$). The information that would be available at $t = 1$ could then be summarized as in Table 5.

Table 5 Values associated with two wealths at $t = 1$, for the realized values alternate specification

| $w_{k,1}$ | $\hat{x}_{k,1}$ | $\hat{v}_{k,1}$ | $W(\hat{x}_{k,1}, R_2^{(j)})$ | $v^{(j)}_{k,1}$ |
|---|---|---|---|---|
| 1.3 | 0.582 | $-0.0191$ | $W(\hat{x}_{1,1}, R_2^{(a)}) = 1.3361$ | $v^{(a)}_{1,1} = -0.0182$ |
| | | | $W(\hat{x}_{1,1}, R_2^{(b)}) = 1.3044$ | $v^{(b)}_{1,1} = -0.0200$ |
| 0.8 | 0.945 | $-0.0874$ | $W(\hat{x}_{2,1}, R_2^{(a)}) = 0.8286$ | $v^{(a)}_{2,1} = -0.0833$ |
| | | | $W(\hat{x}_{2,1}, R_2^{(b)}) = 0.7969$ | $v^{(b)}_{2,1} = -0.0916$ |

Then, setting up to solve the problem at $t = 0$, the required interpolations could be done with the values for $W(\hat{x}_{k,1}, R_2^{(j)})$ and $v^{(j)}_{k,1}$ found in Table 5. For example, for the first component of the vector of generated wealths (wealth = 1.0150) used in the previous interpolation case, the linearly interpolated value would have been computed with the quantities in Table 6:

Table 6 Values for interpolation with realized values (first generated wealth)

| Wealth | Value |
|---|---|
| 1.3000 | $-0.0182$ |
| 1.0150 | ? |
| 0.8000 | $-0.0833$ |

and would have yielded a value of $-0.0553$. However, for the fifth component of the vector of generated wealths, also of value 1.0150 but associated with path b this time, the realized value approach would have interpolated with the quantities in Table 7:

Table 7 Values for interpolation with realized values (fifth generated wealth)

| Wealth | Value |
|---|---|
| 1.3000 | $-0.0200$ |
| 1.0150 | ? |
| 0.8000 | $-0.0916$ |

and would have yielded a value of $-0.0608$.

These two alternate specifications and their consequences will be discussed in detail in Sect. 3.4, but let us already underline the fact that the algorithm remains non-anticipative in both cases, since the differences occur only after the optimal solution has been chosen.

Finally, it should be noted that the above example does not include state variables. For problems with state variables, the regression should include monomial bases involving these variables, since the optimal portfolio depends on their values on a given sample path. In this case, one optimization per sample path must be performed. We provide in the Appendix an example similar to the one above, but which includes a state variable in the regressions.

3.3 Complete algorithm

This section provides the complete algorithm, as applied to the portfolio choice problem. We refer to it as LSMC S&D to emphasize that it is a Least-Squares Monte Carlo method with regression on State variables and Decision variables.

As already outlined in the above subsection, two sets of operations are involved in the algorithm. A first set of operations performs the preliminary work, which mainly consists of simulations. A second set of operations does the backward recursive work.
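Before the detailed steps, the two ways of producing dependent values at $t = 0$ in the introductory example of Sect. 3.2 (regression surface values versus realized values) can be compared with a few lines of code. This is a minimal sketch under stated assumptions: it uses the rounded values of Tables 3 and 5 as inputs rather than full-precision ones, and the names `interpolate` and `dependent_values` are hypothetical.

```python
from bisect import bisect_right

R_F = 1.015
X_GRID = [0.0, 0.3, 0.7, 1.0]
R1 = {"a": 0.030, "b": 0.040}    # Period 1 excess returns
WEALTH_GRID = [0.8, 1.3]         # wealth grid at t = 1, in increasing order

# Rounded regression surface values (Table 3): identical on both paths.
SURFACE = {"a": [-0.0874, -0.0191], "b": [-0.0874, -0.0191]}
# Rounded realized values (Table 5): path dependent.
REALIZED = {"a": [-0.0833, -0.0182], "b": [-0.0916, -0.0200]}


def interpolate(w, grid, values):
    """Piecewise-linear interpolation on an increasing grid.

    Outside the grid the end segments are extended linearly (an assumption)."""
    i = min(max(bisect_right(grid, w) - 1, 0), len(grid) - 2)
    slope = (values[i + 1] - values[i]) / (grid[i + 1] - grid[i])
    return values[i] + slope * (w - grid[i])


def dependent_values(table, w0=1.0):
    """Interpolated value for every (path, weight) pair, in the order used in the text."""
    out = []
    for path in ("a", "b"):
        for x in X_GRID:
            w1 = w0 * (x * R1[path] + R_F)   # wealth reached at t = 1
            out.append(interpolate(w1, WEALTH_GRID, table[path]))
    return out


surface_values = dependent_values(SURFACE)
realized_values = dependent_values(REALIZED)
for s, r in zip(surface_values, realized_values):
    print(f"{s:.4f}  {r:.4f}")
```

With regression surface values, the first and fifth components coincide because both correspond to wealth 1.0150. With realized values they differ, since each path interpolates through its own pairs, as in Tables 6 and 7.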

Preliminary work: simulations and grid formation

1. Compute and store a grid of $N$-dimensional portfolio weight vectors $x_i$ for $i = 1$ to $n_x$, with elements satisfying the linear constraints $x_i \in X$, for some feasible set $X$ which includes an allocation constraint (sum to 100%) and usually lower and upper bounds on each portfolio weight.
2. Simulate and store $n_r$ paths of $N$-dimensional excess return vectors $\{R_t^{(j)}\}_{j=1}^{n_r}$ for $t = 1$ to $T$, and of $M$-dimensional state variable vectors $\{s_t^{(j)}\}_{j=1}^{n_r}$ for $t = 0$ to $T$. The number of state variables ($M$) may be different from the number of risky assets ($N$), but both are simulated $n_r$ times. Note that the returns $R_{t+1}^{(j)}$ associated with path $j$ will in most cases depend on the information set at time $t$ contained in the vector $s_t^{(j)}$. The state variable vectors contain the information known at a given time point, with which an optimal decision can be taken. This set of variables is problem dependent and will vary with the time series process associated with the returns. For example, with a predictable return process, the state variable vector might contain the returns up to time $t$, or an exogenous mean-reverting market price of risk which determines the return levels for the next period.
3. Compute and store a grid of representative future wealth values $\{w_{k,t}\}_{k=1}^{n_w(t)}$ for $t = 0$ to $T$. To do so, the weights grid and simulated returns can be used. The number of points in the grid is time varying to allow a uniform coverage of the range of possible wealths; it grows from a unique wealth at $t = 0$ to $n_w(T)$ grid points at $t = T$. See the Notes on the algorithm below for details.
4. Choose the set of $n_b$ bases for the regression surfaces. For example, it could be all monomials of first and second degree in the decision variables and the exogenous state variables.
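The four preliminary steps above can be sketched as follows for a single risky asset. This is only an illustration under loud assumptions that are not in the paper: i.i.d. normal excess returns stand in for the black-boxed simulator, no separate state variable is simulated, the wealth grid is taken uniform between the smallest and largest wealth reachable with the grid weights, and the grid size grows linearly with time. All function names and parameter values are hypothetical.

```python
import random


def weight_grid(n_x, lower=0.0, upper=1.0):
    """Step 1: evenly spaced risky-asset weights; the risk-free weight is 1 - x."""
    return [lower + (upper - lower) * i / (n_x - 1) for i in range(n_x)]


def simulate_excess_returns(n_r, horizon, mean=0.02, stdev=0.05, seed=0):
    """Step 2: n_r paths of excess returns for t = 1..T.

    The i.i.d. normal dynamics are an assumption; any simulator can be plugged in."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, stdev) for _ in range(horizon)] for _ in range(n_r)]


def wealth_grids(w0, weights, returns, r_f, points_per_period):
    """Step 3: one wealth grid per date t = 0..T.

    Each grid uniformly covers the range reachable with the grid weights (an assumption),
    starting from the single wealth w0 at t = 0."""
    horizon = len(returns[0])
    grids = [[w0]]
    low = high = w0
    for t in range(horizon):
        growth = [x * path[t] + r_f for x in weights for path in returns]
        low, high = low * min(growth), high * max(growth)
        n_w = points_per_period * (t + 1)
        grids.append([low + (high - low) * k / (n_w - 1) for k in range(n_w)])
    return grids


def basis(x, s):
    """Step 4: constant plus monomials of degree 1 and 2 in one decision x and one state s."""
    return [1.0, x, s, x * x, x * s, s * s]


weights = weight_grid(n_x=5)
returns = simulate_excess_returns(n_r=100, horizon=3)
grids = wealth_grids(1.0, weights, returns, r_f=1.015, points_per_period=4)
print([len(g) for g in grids], len(basis(0.5, 0.1)))
```

The growing grid size mirrors the remark in step 3 that the number of wealth points is time varying; the exact construction used by the authors is described in their Notes on the algorithm and may differ from this one.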

Recursive work: backward recursive computations.

Begin Time loop, from $t = T - 1$ to $0$:
  Begin Wealth loop, from $k = 1$ to $n_w(t)$:

  1. Generate a sample of $n_x \times n_r$ wealths at $t + 1$ using the simulated returns and the discretized portfolio weights:

     $W_{k,t+1}(x_i, R_{t+1}^{(j)}) := w_{k,t} \, (x_i^\top R_{t+1}^{(j)} + R_f)$ for $i = 1$ to $n_x$ and $j = 1$ to $n_r$.   (4)

  2. Generate a sample of $n_x \times n_r$ values $v_{k,t+1}(x_i, s_t^{(j)})$. For all but the last time period, the value $v_{k,t+1}(x_i, s_t^{(j)})$ is computed by interpolation through a set of $n_w(t+1)$ pairs $\{(v_{k,t+1}^{(j)}, w_{k,t+1})\}_{k=1}^{n_w(t+1)}$, where each $v_{k,t+1}^{(j)}$ is a previously computed point estimate of the value function, as discussed in step 6. The interpolation is done between the first components of the pairs at the values of the second component $W_{k,t+1}(x_i, R_{t+1}^{(j)})$ defined in Eq. (4). For the single case of $t = T - 1$, the estimates $v_{k,t+1}(x_i, s_t^{(j)})$ are computed from the utility function applied to the wealth obtained from Eq. (4), i.e. $v_{k,T}(x_i, s_{T-1}^{(j)}) = u(W_{k,T}(x_i, R_T^{(j)}))$.

  3. Construct $n_x \times n_r$ basis vectors $B_{i,j}$, each of length $n_b$. Each vector $B_{i,j}$ is associated with a pair $(x_i, s_t^{(j)})$ for $i = 1$ to $n_x$ and $j = 1$ to $n_r$.

  4. Regress the dependent values $v_{k,t+1}(x_i, s_t^{(j)})$ on the independent basis vectors $B_{i,j}$. There are $n_x \times n_r$ such pairs. The resulting regression surface is denoted $L_{k,t}(x, s)$ and is specified by the $n_b$-dimensional vector $\beta_{k,t}$ of regression coefficients (one coefficient per basis).

  5. Optimize the regression surface $L_{k,t}(x, s)$ with respect to the positions $x \in X$ for each of the $j = 1, \ldots, n_r$ information sets $s_t^{(j)}$ (see $X$ in the first preliminary step):

     $\hat{x}_{k,t}^{(j)} = \arg\max_{x \in X} L_{k,t}(x; s_t^{(j)})$ for $j = 1$ to $n_r$,

     which gives $n_r$ estimates of optimal portfolio weight vectors $\hat{x}_{k,t}^{(j)}$.

  6. Compute the $n_r$ values $v_{k,t}^{(j)}$ associated with the optimal positions $\hat{x}_{k,t}^{(j)}$ by:
     - computing the realized wealth $W_{k,t+1}(\hat{x}_{k,t}^{(j)}, R_{t+1}^{(j)})$ for $j = 1$ to $n_r$;
     - interpolating those realized wealths through the $\{(v_{k,t+1}^{(j)}, w_{k,t+1})\}_{k=1}^{n_w(t+1)}$ pairs.
     This step is like the previous interpolation step, but on a smaller scale ($n_r$ instead of $n_x \times n_r$ interpolations).

  End Wealth loop.
End Time loop.
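The body of the Wealth loop (steps 1 to 6) can be sketched as follows for one wealth level. This is a minimal sketch under simplifying assumptions that are not part of the algorithm as stated: a single risky asset, a single state variable, plain linear interpolation, and the basis $(1, x, x^2, s, s^2, xs)$. The function name `backward_step` and all argument names are hypothetical. Because the fitted surface is quadratic in $x$, the sketch maximizes it in closed form on an interval instead of calling a general optimizer.

```python
import numpy as np


def backward_step(w_k, x_grid, R_next, s_t, w_next, v_next, R_f):
    """One (k, t) iteration: returns (beta, x_hat, v_hat).

    v_next[j] holds the value estimates of path j on the wealth grid w_next.
    """
    n_x, n_r = len(x_grid), len(R_next)
    # Step 1: wealths at t+1 for every (weight, path) pair, Eq. (4).
    W = w_k * (np.outer(x_grid, R_next) + R_f)              # shape (n_x, n_r)
    # Step 2: values by interpolation through each path's own estimates
    # (np.interp holds the end values constant outside the grid).
    v = np.empty_like(W)
    for j in range(n_r):
        v[:, j] = np.interp(W[:, j], w_next, v_next[j])
    # Steps 3-4: basis vectors and least-squares regression.
    X = np.repeat(x_grid, n_r)      # matches the row-major order of v.ravel()
    S = np.tile(s_t, n_x)
    B = np.column_stack([np.ones_like(X), X, X**2, S, S**2, X * S])
    beta, *_ = np.linalg.lstsq(B, v.ravel(), rcond=None)
    # Step 5: maximize the surface over x in [lo, hi], one problem per path.
    lo, hi = x_grid.min(), x_grid.max()
    b = beta[1] + beta[5] * s_t     # path-dependent slope in x
    c = beta[2]                     # curvature in x
    cand = np.stack([np.full(n_r, lo), np.full(n_r, hi)])
    if c < 0:                       # interior stationary point is a maximum
        cand = np.vstack([cand, np.clip(-b / (2 * c), lo, hi)])
    surf = b * cand + c * cand**2   # terms without x do not affect the argmax
    x_hat = cand[np.argmax(surf, axis=0), np.arange(n_r)]
    # Step 6: realized values at the optimal weights.
    W_hat = w_k * (x_hat * R_next + R_f)
    v_hat = np.array([np.interp(W_hat[j], w_next, v_next[j]) for j in range(n_r)])
    return beta, x_hat, v_hat


# Illustrative call for the last decision date, with made-up inputs.
rng = np.random.default_rng(0)
n_r = 200
R_next = rng.normal(0.01, 0.05, n_r)
s_t = rng.normal(0.0, 1.0, n_r)
w_next = np.linspace(0.5, 1.5, 51)
v_next = np.tile(-1.0 / (4.0 * w_next**4), (n_r, 1))   # CRRA utility, gamma = 5
beta, x_hat, v_hat = backward_step(1.0, np.linspace(0.0, 1.0, 11),
                                   R_next, s_t, w_next, v_next, 1.004)
```

With several risky assets, step 5 would instead require a constrained quadratic program per path.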

Notes on the algorithm

The wealth grid that is produced in the Preliminary Work should be wide enough to cover the range of possibilities, but not so large as to become an excessive computational burden. We provide details on our attempts to strike a balance in Examples 3 and 4 below.

In step 4, note that we regress a value at $t + 1$ on vectors of positions $x_i$ that apply from $t$ to $t + 1$, and on information sets available at $t$, i.e. $s_t^{(j)}$. The regression surface $L_{k,t}(x, s)$ is a conditional expectation; more precisely, it is an approximate representation of the expected utility when taking position $x$ at time $t$, conditional on the information available at that time.

At the completion of the algorithm, the value at wealth level $W_{0,0}$ and time $t = 0$ can be approximated as the average over $j$ of the values $v_{0,0}^{(j)}$, assuming of course that all paths were started with the same state variable value.

An optimal policy is available and defined by the set of regression parameters $\beta_{k,t}$: they describe a surface which must simply be optimized over the decision variables. Notice that for the first period, the policy and initial wealth are unique. This policy is typically what one is interested in: after the first period, the investor would re-optimize, so that in practice, the policies associated with later periods hold little interest. If one nevertheless wishes to find those later policies, an optimal policy can be computed from the (stored) regression parameters at each of two neighboring wealth grid levels; an approximate optimal policy at intermediate wealth levels can then be derived by interpolation.

3.4 Implementation details

A few implementation details are worth mentioning, as they motivate the approach or improve its performance.

Interpolation and inverse utility transformation. As noted in Garlappi and Skoulakis (2010), approximating an almost linear function is much easier than doing so for a nonlinear function.
We introduce a very simple inverse utility transformation at the two interpolation steps above: apply the inverse utility function to the $v_{k,t+1}^{(j)}$ values before interpolating, interpolate (linearly), and apply the utility function back to the result of the interpolation. This small procedure is quite effective at reducing the number of wealth points that are required for a specified precision. Unless specifically noted, the inverse utility transformation was always used.

Realized values versus regression surface values. In the last step of the algorithm, it is important to note that $v_{k,t}^{(j)}$ is not the value $L_{k,t}(\hat{x}_{k,t}^{(j)}, s_t^{(j)})$ that is an output of the maximization problem. Indeed, once the optimal solution is found, there are (at least) two ways that a value can be associated with that solution (and path). One can use the technique described in the algorithm, giving what we call realized values.
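The interpolation used to obtain a realized value, with and without the inverse utility transformation, can be sketched as follows. This is an illustrative sketch only: it assumes a CRRA utility with a hypothetical coefficient `GAMMA = 5`, a two-point wealth grid, and the helper names `u`, `u_inv` and `interp_value`, none of which come from the original text.

```python
import numpy as np

GAMMA = 5.0   # hypothetical risk aversion coefficient, for illustration only


def u(W):
    """CRRA utility."""
    return W ** (1.0 - GAMMA) / (1.0 - GAMMA)


def u_inv(v):
    """Inverse of the CRRA utility (maps a value back to a wealth level)."""
    return ((1.0 - GAMMA) * v) ** (1.0 / (1.0 - GAMMA))


def interp_value(W, w_grid, v_grid, transform=True):
    """Interpolate value estimates v_grid, given on w_grid, at wealth W."""
    if not transform:
        return np.interp(W, w_grid, v_grid)
    # Interpolate in inverse-utility space, where the function is
    # close to linear in wealth, then map back with the utility function.
    return u(np.interp(W, w_grid, u_inv(v_grid)))


# A deliberately coarse grid, on which the values are exactly u(w).
w_grid = np.array([0.8, 1.3])
v_grid = u(w_grid)
exact = u(1.05)
err_transformed = abs(interp_value(1.05, w_grid, v_grid) - exact)
err_plain = abs(interp_value(1.05, w_grid, v_grid, transform=False) - exact)
```

In this artificial case the grid values are exactly $u(w)$, so the transformed interpolation recovers the value up to rounding while the plain linear interpolation does not; with estimated values the gain would be less clear-cut.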

One could also simply use the objective value at the optimal solution, $L_{k,t}(\hat{x}_{k,t}^{(j)}, s_t^{(j)})$, as the new value of $v_{k,t}^{(j)}$. We call these regression surface values. Essentially, the regression surface approach avoids step 6 of the algorithm, but uses a value that is, in a way, tainted by the imperfect fit of the regression surface.

It is important to mention that the use of future information in the realized values approach is not anticipative: the optimal choice does not rely on this information; only the value associated with the optimal choice does, after the choice has been made.

This alternative is akin to the issue discussed by Longstaff and Schwartz (2001) in the context of option pricing; there the authors argue in favor of realized values, in opposition to Tsitsiklis and Van Roy (2001), who used regression surface values. While the regression surface approach is believed to bear more bias in option pricing, we find no such evidence in portfolio choice. We tested both approaches and found very little difference between the answers, and certainly no systematic bias, for our portfolio choice problems. All numerical results below were obtained with the realized values approach.

Finally, we note that a closely related issue is discussed in van Binsbergen and Brandt (2007), Garlappi and Skoulakis (2009) and Denault and Simonato (2017). In those papers, the use of a specific utility function (i.e., CRRA) means that the solutions are independent of wealth, which in turn makes the issue of realized values versus regression surface values more meaningful. See the papers for details.

Regression procedure. The regression step must be implemented with some care, but it is easy to do so. Essentially, the matrix of the so-called normal equations must be built as a sum of rank-one matrices (there are $n_x \times n_r$ such matrices), and this operation is best done in a compiled language (we used C).
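The accumulation of the normal equations as a sum of rank-one matrices can be illustrated with the short Python sketch below. It is written as an explicit loop only to mirror the description; the function name `solve_normal_equations` and the synthetic data are assumptions of this example, and a compiled implementation would follow the same pattern.

```python
import numpy as np


def solve_normal_equations(basis_rows, values):
    """Least-squares coefficients from A beta = c, where
    A = sum_m b_m b_m^T and c = sum_m v_m b_m over all basis vectors b_m."""
    n_b = len(basis_rows[0])
    A = np.zeros((n_b, n_b))
    c = np.zeros(n_b)
    for b, v in zip(basis_rows, values):
        b = np.asarray(b, dtype=float)
        A += np.outer(b, b)      # rank-one update of the normal matrix
        c += v * b               # matching update of the right-hand side
    return np.linalg.solve(A, c)


# Synthetic check: noiseless data generated from known coefficients.
rng = np.random.default_rng(1)
B = rng.normal(size=(50, 3))
y = B @ np.array([1.0, -2.0, 0.5])
beta = solve_normal_equations(B, y)
```

Since the matrix `A` depends only on the basis vectors and not on the regressed values, it could in principle be formed once and reused with several right-hand sides.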
It is also worth mentioning that it is possible to ensure that most of the regression work is done only once per time step.

We differ from Nijman et al. (2007) on this topic of regression. Indeed, these authors do a two-step regression: one regression in the space of the state variables and then one in the space of the decision variables. They find the resulting approximate regression surface more computationally attractive. Here, the optimizations, not the regressions, still claim the majority of the computing time, even with a comparatively large problem with six decision variables and dozens of bases. For smaller problems, the time spent on regressions is small even in absolute terms, of the order of seconds, tens of seconds at the most.

Finally, when simulating the paths, note that it is practical to ensure that the state variable before $t = 0$, $S_0$, does not have the same value for all paths, even if a common value would be coherent with a real-life situation; otherwise, the matrix composed of all the basis vectors $B_{i,j}$ has (at least) two constant columns, and the regression will fail because of singularity.

Optimization. Note that a large number of optimizations must be performed: one per simulated path, per discrete wealth level, and per time level. For this reason, for larger problems, we use linear and quadratic terms only for the bases in the decision variables. The resulting quadratic optimization problems can be solved very efficiently with dedicated software and warm starts. (Note that in Sect. 4 some problems in smaller dimensions were also tested with higher degree monomials.) Bases in the state variables are also limited to quadratic, for the sake of computational simplicity. All cross-products of order two (also between decision variables and state variables) are included.

4 Numerical experiments

We present the optimal weights at $t = 0$ for a variety of problems. The first example is very simple but still displays some fundamental characteristics of the proposed approach; we also use it to illustrate a potential improvement direction for the method. The following examples have various features to highlight the behaviour of the algorithm. Example 2 moves away from the one-risky-asset case and example 3 shows the effect of returns that depend on state variables. Example 4 is quite general: multiple periods, multiple assets, depending on multiple sources of uncertainty, with a CARA utility function. Finally, examples 5 and 6 show the impact of using a non-differentiable utility function, comparing the derivative-based approaches of Brandt et al. (2005) and Garlappi and Skoulakis (2010) with our proposed algorithm, hereafter referred to as BGSS05, GS10, and LSMC S&D (LSMC on State and Decision variables), respectively.

In the case of example 4, we also provide certainty equivalent values for both the quadrature approach and LSMC S&D. Certainty equivalents are the standard metric to compare policies obtained by dynamic programming. They are computed by running an established policy through a set of simulations (simulated returns in the case of example 4). These simulations are out-of-sample, in the sense that they were not used to establish the LSMC S&D policy; and of course the same set of out-of-sample simulations is provided to all policies.
Applying the inverse utility function to the sample mean utility (the average of the utilities at $T$ obtained over all simulation paths for a given policy) gives the certainty equivalent value (associated with that policy).

Our LSMC S&D algorithm and the quadrature approach were programmed mostly in Matlab, with C procedures helping out at a few critical places. For most tests, a desktop personal computer was sufficient. For the larger problems, we used a multi-CPU server to run the optimizations in parallel. To give an idea of the computing times, a problem of six assets (five risky assets), with VAR returns (see example 4), a horizon of four periods, 0,000 simulations, a basis of order two in the state variables and the decision variables (including all cross-products, 66 basis terms in total), and a decision variable grid at 0% intervals, is solved in less than 6 min on a desktop computer with four CPUs. With this number of simulations, the solution is stable only to more or less or % on each asset, but the objective function is already very stable.

4.1 Example 1: two assets, one period, independent returns

We first examine an example taken from Brandt et al. (2005). It is a one-period example with one risky asset and one risk-free asset, used to assess the behavior of the method in a simple setting. The stock excess return follows a shifted log-normal distribution with constant mean $\mu$ and standard deviation $\sigma$. We use a constant relative risk aversion (CRRA) utility function, defined as

$u(W_T) = \frac{W_T^{1-\gamma}}{1-\gamma}$   (CRRA)

where $\gamma$ is a risk aversion coefficient. The weights of the risky and risk-free assets are constrained to be non-negative. As is well known from the portfolio literature, there is no available analytical solution for this case, and the optimal portfolio is independent of the wealth level, which is set here to $W_0 =$ . For this example, the wealth does not need to be discretized since it is a one-period problem. The preliminary work can be described as follows:

- Simulate the returns in excess of the risk-free rate with $R_T^{(j)} = R_f \, (e^{\mu + \sigma \varepsilon_{j,t}} - 1)$, where the $\varepsilon_{j,t}$ are standard normal random variates.⁶
- Set the uniform grid of portfolio weights as $x_i = x_1 + (i - 1)\,\delta_x$ for $i = 1$ to $n_x$, with $x_1 = 0$, $x_{n_x} = 1$ and $\delta_x = (x_{n_x} - x_1)/(n_x - 1)$.
- Define the basis vector as $(1, x, x^2)^\top$, consisting of a constant, the portfolio weight for the risky asset and its squared value. A basis with the portfolio weights raised up to the fourth power is also examined, i.e. $(1, x, x^2, x^3, x^4)^\top$.

There is no recursive work because this is a one-period problem. The wealths at time $T$ are computed with $W_{0,T}(x_i, R_T^{(j)}) = w_0 \, (x_i R_T^{(j)} + R_f)$ for $i = 1$ to $n_x$ and $j = 1$ to $n_r$, while the sample of value function points is computed with $v_{0,T}(x_i, R_T^{(j)}) = u(W_{0,T}(x_i, R_T^{(j)}))$ for $i = 1$ to $n_x$ and $j = 1$ to $n_r$. The regression coefficients are estimated by solving the normal equations, while the optimization is performed using a golden section search algorithm.

Table 8 presents the optimal weights at $t = 0$ computed for two scenarios of values for the mean and standard deviation of the excess return.
The first case roughly corresponds to typical values that could be found with monthly returns, with a risk-free rate equal to 0.05/12, while the second is for annual values with a risk-free rate of 0.05. The weights have been obtained with three methods. The first method is a numerical quadrature approach, which is used here to provide a reference point. The second method is the BGSS05 simulation and regression approach. The third method is the LSMC S&D approach proposed in this paper, for different grids of portfolio weights and different sets of monomial bases (either with maximum degree of two or up to degree four). For the second and third methods, ten optimizations have been performed, each with different simulations of $n_r = 5000$ sample paths. The table presents the average of these 10 optimizations, while the standard deviations of these ten weights are reported below in parentheses. As shown by these standard deviations, the results are fairly stable and do not vary much.

Table 8 Example 1, portfolio weights

                      $\mu = 0.01$, $\sigma = 0.05$                                    $\mu = 0.10$, $\sigma = 0.15$
                      $\gamma=5$  $\gamma=10$  $\gamma=15$  $\gamma=20$     $\gamma=5$  $\gamma=10$  $\gamma=15$  $\gamma=20$
Quadrature            0.9000      0.4499       0.999        0.48            0.9890      0.4944       0.388        0.46
LSMC S&D, one-step, $\delta_x = 0.0$
  B2                  0.8974      0.4507       0.306        0.390           0.9363      0.4984       0.3390       0.330
                      (0.00)      (0.00)       (0.00)       (0.00)          (0.003)     (0.004)      (0.00)       (0.006)
  B4                  0.905       0.4506       0.300        0.4             0.9909      0.4964       0.357        0.53
                      (0.00)      (0.00)       (0.00)       (0.000)         (0.005)     (0.00)       (0.009)      (0.04)
LSMC S&D, two-step, $\delta_x = 0.0$
  B2                  0.903       0.4483       0.3009       0.58            0.99        0.4698       0.339        0.540
                      (0.00)      (0.00)       (0.00)       (0.00)          (0.005)     (0.09)       (0.00)       (0.00)
  B4                  0.905       0.4506       0.3003       0.5             0.9946      0.4949       0.3306       0.464
                      (0.00)      (0.00)       (0.00)       (0.000)         (0.005)     (0.00)       (0.00)       (0.00)
BGSS05                0.8966      0.4489       0.993        0.45            0.6799      0.3574       0.45         0.835
                      (0.00)      (0.00)       (0.000)      (0.000)         (0.007)     (0.003)      (0.00)       (0.00)

Quadrature are the weights computed with an adaptive Simpson quadrature algorithm. LSMC S&D, one-step and LSMC S&D, two-step are the average of 10 weights computed with the LSMC S&D approach with one and two steps, with each weight computed using 5000 sample paths. Numbers in parentheses are the standard errors of the 10 computed weights. $\delta_x$ is the increment used to build a grid of portfolio weights used as basis in the regressions. B2 are weights computed with a monomial basis of up to degree two; B4 are weights computed with our restricted set of monomial bases of up to degree four. Common random numbers are used for the LSMC S&D approaches. BGSS05 are the average of 10 weights computed with the Brandt et al. (2005) least-squares Monte Carlo simulation approach, with each weight computed using 5000 sample paths.

As already noted in their paper, Brandt et al. (2005) find that their method is more precise for small pairs of $\mu$ and $\sigma$ than for larger values of these parameters. This result is also obtained here, and the weights computed with this approach are slightly off from the quadrature values for $\mu = 0.1$ and $\sigma = 0.15$, especially for larger values of the risk aversion coefficient.

Based on results not reported here for the sake of brevity, we observed empirically that the LSMC S&D approach has a very similar accuracy whether the discretization is $\delta_x = 0.$ or finer, as with $\delta_x = 0.$ or $\delta_x = 0.05$. Hence, it is possible to have a reasonable precision without imposing a large computational burden. As expected, this approach performs better when monomials of higher degree are also included in the basis. However, the differences are not large and the quadratic case offers a reasonable performance. Finally, for the annual case, for a risk aversion coefficient of 20, the LSMC S&D and BGSS05 approaches deteriorate and have difficulties pinning down the solution.

⁶ In order to reduce the sample variability, a moment matching simulation is used to generate the standard normal random variates. See, for example, Barraquand (1995) or Boyle et al. (1997).
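The one-period procedure of Example 1 can be sketched end to end in a few lines of Python. This is a hedged sketch rather than the authors' code. It makes several assumptions: moment matching is done by simply standardizing the normal draws, the excess return is taken as $R_f(e^{\mu + \sigma\varepsilon} - 1)$, initial wealth is set to 1, and the parameter values in the usage line are illustrative monthly values. The function names `crra`, `golden_section_max` and `one_period_weight` are hypothetical.

```python
import math
import numpy as np


def crra(W, gamma):
    """CRRA utility of terminal wealth."""
    return W ** (1.0 - gamma) / (1.0 - gamma)


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):              # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                        # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def one_period_weight(mu, sigma, gamma, R_f, n_r=5000, dx=0.1,
                      degree=2, seed=0, w0=1.0):
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_r)
    eps = (eps - eps.mean()) / eps.std()        # assumed simple moment matching
    R = R_f * (np.exp(mu + sigma * eps) - 1.0)  # assumed excess-return model
    n_x = int(round(1.0 / dx)) + 1
    x = np.linspace(0.0, 1.0, n_x)              # uniform grid of risky weights
    W = w0 * (np.outer(x, R) + R_f)             # terminal wealths, (n_x, n_r)
    v = crra(W, gamma).ravel()                  # sample of value function points
    B = np.vander(np.repeat(x, n_r), degree + 1, increasing=True)
    beta = np.linalg.solve(B.T @ B, B.T @ v)    # normal equations
    surface = lambda z: float(np.polyval(beta[::-1], z))
    return golden_section_max(surface, 0.0, 1.0)


# Illustrative monthly parameters (assumed values).
w_star = one_period_weight(mu=0.01, sigma=0.05, gamma=5.0,
                           R_f=math.exp(0.05 / 12))
```

Passing `degree=4` switches to the basis with powers up to four, and a different `seed` gives another replication, which is how an average and a dispersion over several runs could be obtained.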