Robust Dual Dynamic Programming
Angelos Georghiou, Angelos Tsoukalas, Wolfram Wiesemann
American University of Beirut, Olayan School of Business
31 May 2017
Inspired by SDDP

Stochastic optimization
- Optimizes the expected value: $\min_x \; \mathbb{E}_P[f(x, \xi)]$
- Needs to know the distribution

Robust optimization
- Optimizes for the worst case: $\min_x \; \max_{\xi \in \Xi} f(x, \xi)$
- Works with uncertainty sets

In both settings the starting point is Nested Benders decomposition: it leads to SDDP in the stochastic case and to RDDP in the robust case.
Multistage Robust Optimization

$$\begin{array}{ll}
\text{minimize} & \displaystyle\max_{\xi \in \Xi} \; \sum_{t=1}^{T} q_t^\top x_t(\xi^t) \\[2mm]
\text{subject to} & T_t x_{t-1}(\xi^{t-1}) + W_t x_t(\xi^t) \ge H_t \xi_t \quad \forall \xi \in \Xi, \; \forall t \\[1mm]
& x_t(\xi^t) \in \mathbb{R}^{n_t}, \quad \xi_t \in \Xi_t, \quad \xi^t = (\xi_1, \dots, \xi_t)
\end{array}$$

(The stochastic-programming counterpart replaces the worst case by the expectation $\mathbb{E}\big[\sum_{t=1}^{T} q_t^\top x_t(\xi^t)\big]$.)

- Optimize over decision policies $x_t(\cdot)$.
- $\Xi_t$ polyhedral uncertainty sets.
- Constraints entangle consecutive stages.
- Infinite number of variables and constraints.
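For concreteness, a minimal two-stage instance of this template (our own toy, with $q_1 = 2$, $q_2 = 1$, $T_2 = W_2 = H_2 = 1$, $\Xi_2 = [-1, 1]$ and nonnegative decisions):

$$\min_{x_1 \ge 0} \;\; \max_{\xi_2 \in [-1,1]} \;\; \min_{\substack{x_2(\xi_2) \ge 0 \\ x_1 + x_2(\xi_2) \ge \xi_2}} \;\; 2\,x_1 + x_2(\xi_2)$$

The optimal recourse is $x_2(\xi_2) = \max\{0, \xi_2 - x_1\}$; since the recourse value is convex in $\xi_2$, the worst case sits at the extreme point $\xi_2 = 1$, and the problem reduces to $\min_{x_1 \ge 0} 2x_1 + \max\{0, 1 - x_1\}$, with optimum $x_1 = 0$ and value $1$.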
Assumptions

Relatively complete recourse
- Any feasible partial solution $x_1, \dots, x_t$ can be extended to a complete feasible solution $x_1, \dots, x_T$.
- Can otherwise be addressed using feasibility cuts.

Right-hand side uncertainty only
- $T_t x_{t-1}(\xi^{t-1}) + W_t x_t(\xi^t) \ge H_t \xi_t$
- Ensures finite convergence.
- Makes the problem easier and the relationship with Nested Benders/SDDP easier to see.

Both assumptions can be lifted.
Nested Formulation

The multistage problem can be expressed through a nested formulation:

$$\min_{x_1 \in X_1} \; q_1^\top x_1 + \underbrace{\max_{\xi_2 \in \Xi_2} \min_{x_2 \in X_2(x_1, \xi_2)} \; q_2^\top x_2 + \dots + \underbrace{\max_{\xi_T \in \Xi_T} \min_{x_T \in X_T(x_{T-1}, \xi_T)} \; q_T^\top x_T}_{Q_T(x_{T-1})}}_{Q_2(x_1)}$$

First-stage problem:
$$\min_{x_1 \in \mathbb{R}^{n_1}} \; q_1^\top x_1 + Q_2(x_1) \quad \text{s.t.} \quad W_1 x_1 \ge h_1$$

Stage-$t$ problem:
$$Q_t(x_{t-1}) = \max_{\xi_t \in \Xi_t} \; \min_{x_t \in \mathbb{R}^{n_t}} \; \big\{ q_t^\top x_t + Q_{t+1}(x_t) \; : \; T_t x_{t-1} + W_t x_t \ge H_t \xi_t \big\}$$
Nested Formulation (cont.)

[Figure: chain of value functions $Q_2 \to Q_3 \to \dots \to Q_T \to Q_{T+1}$, with $Q_{T+1} \equiv 0$.]

$$\min_{x_1 \in \mathbb{R}^{n_1}} \; q_1^\top x_1 + Q_2(x_1) \quad \text{s.t.} \quad W_1 x_1 \ge h_1$$

$$Q_t(x_{t-1}) = \max_{\xi_t \in \text{ext}\,\Xi_t} \; \min_{x_t \in \mathbb{R}^{n_t}} \; \big\{ q_t^\top x_t + Q_{t+1}(x_t) \; : \; T_t x_{t-1} + W_t x_t \ge H_t \xi_t \big\}$$

- The optimal value of the inner problem is convex in $\xi_t$.
- We can therefore replace $\Xi_t$ with its extreme points $\text{ext}\,\Xi_t$.
- The problem decomposes. If only we knew the value functions...
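For intuition, a brute-force sketch of this recursion on a made-up 1-D instance (grid search in place of LP solves; all data are ours, and the effort grows exponentially with $T$, which is exactly what the decomposition methods below avoid):

```python
# Brute-force evaluation of Q_t(x_{t-1}) = max over ext(Xi_t) of the inner min,
# on a toy 1-D instance with coupling -x_{t-1} + x_t >= xi_t.
import numpy as np

T_STAGES = 3
q = [2.0, 4.0, 1.0]                          # stage costs q_t (made up)
ext_Xi = {2: [-1.0, 1.0], 3: [-1.0, 1.0]}    # extreme points of Xi_t
grid = np.linspace(0.0, 15.0, 31)            # decision grid (toy shortcut)

def Q(t, x_prev):
    """Worst-case cost-to-go Q_t(x_{t-1}); Q_{T+1} is identically 0."""
    if t > T_STAGES:
        return 0.0
    return max(
        min((q[t - 1] * x + Q(t + 1, x)      # inner min over x_t on the grid
             for x in grid
             if x >= x_prev + xi),           # coupling: -x_{t-1} + x_t >= xi_t
            default=float("inf"))            # infeasible => +inf (toy shortcut)
        for xi in ext_Xi[t]                  # outer max over extreme points
    )

x1 = min(grid, key=lambda x: q[0] * x + Q(2, x))   # first-stage decision
print(x1, q[0] * x1 + Q(2, x1))                    # prints 0.0 6.0
```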
Nested Benders Decomposition

[Figure: scenario tree over $\xi_2, \xi_3$; the forward pass (FP) passes decisions $x_1, x_2, x_3$ down the tree, the backward pass (BP) returns cuts. Each node solves
$\min_{x_2 \in \mathbb{R}^{n_2}} \; q_2^\top x_2 + \mathcal{Q}_3(x_2)$ s.t. $T_2 x_1 + W_2 x_2 \ge H_2 \xi_2$,
and the BP refines via $\max_{\xi_2 \in \text{ext}\,\Xi_2}$ over these subproblems.]

- Maintain one (outer approximation of a) value function per node.
- Traverse the scenario tree forwards and backwards.
- FP: at every node, solve and decide where to refine; move $x_t$ forward. We refine at all nodes, i.e., for all scenarios.
- BP: introduce Benders cuts to refine the outer approximations.
- But cuts are valid for all nodes of a stage.
Towards SDDP: Cut Sharing

- Maintain one approximation per stage.
- But which scenario should be propagated forwards? Where should the approximation be refined?
- Exponential number of end-to-end choices.

SDDP solution (for stochastic programming): pick at random!
- Small number of refinements.
- Good performance in practice.
- No deterministic upper bound/termination criterion.
- Stochastic convergence.
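As a toy illustration of the combinatorics (the stagewise extreme points below are made up): enumerating end-to-end scenario paths is exponential in the horizon, whereas SDDP's forward pass samples a single path per iteration.

```python
# End-to-end scenario paths grow as |ext Xi_t|^(T-1); SDDP samples one per pass.
import itertools, random

ext_Xi = {t: [-1.0, 1.0] for t in range(2, 5)}       # toy: T = 4, 2 points/stage
paths = list(itertools.product(*ext_Xi.values()))
print(len(paths))                                    # 2**3 = 8 end-to-end choices
sampled = tuple(random.choice(pts) for pts in ext_Xi.values())
print(sampled)                                       # SDDP: one random path
```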
Robust Dual Dynamic Programming

- Not all scenarios are important: pick worst-case scenarios.
- Maintain both inner and outer approximations of each value function.
- In the FP:
  - use inner approximations to choose scenarios;
  - use outer approximations to choose decisions (points of refinement).
- In the BP, refine both inner and outer approximations.
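To make the two approximations concrete, here is a minimal 1-D sketch (the class names and data structures are ours, for illustration only): the outer approximation is a pointwise maximum of cuts, and the inner approximation interpolates exactly evaluated points, which upper-bounds a convex function.

```python
# Minimal 1-D sketch of the two bounding objects RDDP maintains per stage.
import numpy as np

class OuterApprox:
    """Lower bound on a convex Q_t: pointwise max of cuts a + b*x."""
    def __init__(self):
        self.cuts = [(-1e9, 0.0)]               # trivial initial cut
    def add_cut(self, a, b):
        self.cuts.append((a, b))
    def __call__(self, x):
        return max(a + b * x for a, b in self.cuts)

class InnerApprox:
    """Upper bound on a convex Q_t: piecewise-linear interpolation through
    exactly evaluated points (x, Q_t(x)); chords of a convex function lie
    above it, so this is valid on [lo, hi] if Q_t is bounded by `big` there."""
    def __init__(self, lo, hi, big=1e9):
        self.pts = {lo: big, hi: big}           # conservative endpoint values
    def add_point(self, x, v):
        self.pts[x] = min(self.pts.get(x, np.inf), v)
    def __call__(self, x):
        xs = sorted(self.pts)
        return float(np.interp(x, xs, [self.pts[k] for k in xs]))

outer, inner = OuterApprox(), InnerApprox(-15.0, 15.0)
outer.add_cut(5.0, 4.0)            # e.g. a backward-pass cut  Q(x) >= 5 + 4x
inner.add_point(0.5, 7.0)          # e.g. an exact evaluation  Q(0.5) = 7
assert outer(0.5) <= inner(0.5)    # sandwich:  outer <= Q <= inner
```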
Where to Refine? Forward Pass

Choose the scenario by maximizing a convex function over $\Xi_t$, using the inner approximation $\overline{Q}_{t+1}$:

$$\xi_t^f = \arg\max_{\xi_t \in \Xi_t} \; \min_{x_t \in \mathbb{R}^{n_t}} \; \big\{ q_t^\top x_t + \overline{Q}_{t+1}(x_t) \; : \; T_t x_{t-1}^f + W_t x_t \ge H_t \xi_t \big\}$$

Choose the decision by minimizing a convex function, using the outer approximation $\underline{Q}_{t+1}$:

$$x_t^f = \arg\min_{x_t \in \mathbb{R}^{n_t}} \; \big\{ q_t^\top x_t + \underline{Q}_{t+1}(x_t) \; : \; T_t x_{t-1}^f + W_t x_t \ge H_t \xi_t^f \big\}$$
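A hedged sketch of these two forward-pass subproblems on a made-up 1-D instance, with a grid search in place of the LP solves; `inner` and `outer` below stand for the stage-$(t+1)$ approximations, and the coupling constraint is a toy stand-in for $T_t x_{t-1}^f + W_t x_t \ge H_t \xi_t$.

```python
# Forward pass at stage t: the worst-case extreme point is found with the
# INNER approximation of Q_{t+1}, the trial decision with the OUTER one.
import numpy as np

grid = np.linspace(-15.0, 15.0, 301)
q_t, x_prev = 4.0, 0.5                      # stage cost and incoming x_{t-1}^f

def stage_min(xi, value_fn):
    """min_x q_t*x + value_fn(x)  s.t.  x >= x_prev + xi  (toy coupling)."""
    feas = grid[grid >= x_prev + xi]
    vals = q_t * feas + np.array([value_fn(x) for x in feas])
    return feas[np.argmin(vals)], float(vals.min())

inner = lambda x: np.interp(x, [-15.0, 0.0, 15.0], [60.0, 5.0, 40.0])  # upper bd.
outer = lambda x: max(a + b * x for a, b in [(0.0, 0.0), (5.0, -2.0)]) # max of cuts

ext_Xi_t = [-1.0, 1.0]                                   # extreme points of Xi_t
xi_f = max(ext_Xi_t, key=lambda xi: stage_min(xi, inner)[1])   # worst scenario
x_f, _ = stage_min(xi_f, outer)                                # trial decision
print(xi_f, x_f)
```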
How to Refine? Backward Pass

$$\xi_t^b = \arg\max_{\xi_t \in \Xi_t} \; \min_{x_t \in \mathbb{R}^{n_t}} \; \big\{ q_t^\top x_t + \overline{Q}_{t+1}(x_t) \; : \; T_t x_{t-1}^f + W_t x_t \ge H_t \xi_t \big\},$$

with corresponding inner solution $x_t^b$.

- Refine the inner approximation: add the point $\big(x_{t-1}^f, \; q_t^\top x_t^b + \overline{Q}_{t+1}(x_t^b)\big)$ to the description of $\overline{Q}_t$.
- Refine the outer approximation $\underline{Q}_t$ with a hyperplane (Benders cut) at $x_{t-1}^f$, obtained by solving (the dual of)

$$\underline{Q}_t(x_{t-1}^f) = \min_{x_t \in \mathbb{R}^{n_t}} \; \big\{ q_t^\top x_t + \underline{Q}_{t+1}(x_t) \; : \; T_t x_{t-1}^f + W_t x_t \ge H_t \xi_t^b \big\}$$

- Subgradients of perturbation functions correspond to Lagrange multipliers.
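The cut itself can be read off the dual multipliers of the stage LP. A sketch with made-up data, assuming SciPy's HiGHS backend (which exposes the duals of the inequality system as `res.ineqlin.marginals`); by LP duality, $T_t^\top \lambda$ is a subgradient of the optimal value at $x_{t-1}^f$, and relatively complete recourse keeps the LP feasible.

```python
# Backward-pass cut from LP duals: solve the stage LP at (x_{t-1}^f, xi_t^b),
# then Q_t(y) >= intercept + slope * y is a valid lower bound by LP duality.
import numpy as np
from scipy.optimize import linprog

q  = np.array([4.0, 1.0])                         # stage cost q_t (toy)
W  = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Tt = np.array([[-1.0], [0.0], [-0.5]])
H  = np.array([[1.0], [1.0], [0.0]])
x_prev, xi_b = np.array([0.5]), np.array([1.0])

# min q^T x  s.t.  W x >= H xi - T x_prev   <=>   -W x <= T x_prev - H xi
b = Tt @ x_prev - H @ xi_b
res = linprog(q, A_ub=-W, b_ub=b, bounds=[(None, None)] * 2, method="highs")
lam = res.ineqlin.marginals          # dv/db for the <= system (HiGHS duals)
slope = Tt.T @ lam                   # subgradient of Q_t w.r.t. x_{t-1}
intercept = res.fun - slope @ x_prev
print("cut: Q_t(y) >=", float(intercept), "+", float(slope[0]), "* y")
```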
Worked Example

[Figure, repeated across the example slides: three panels plotting the stage objectives $2.0\,x_1 + Q_2(x_1)$, $4.0\,x_2 + Q_3(x_2)$ and $1.0\,x_3$ over roughly $[-15, 15]$, together with the evolving inner and outer approximations.]

The animation steps through three RDDP iterations; each runs a forward pass (update the region on which the approximations are refined, then pick a trial decision) followed by a backward pass (add a point to each inner approximation $\overline{Q}_t$ and a Benders cut to each outer approximation $\underline{Q}_t$):

- Iteration 1. Forward: $x_1^f = 5.9$; stage-2 region updated to $[-15.0, 5.0]$, $x_2^f = -5.0$; stage-3 region updated to $[12.0, 15.0]$, $x_3^f = 12.0$. Backward: $\overline{Q}_3$ gains the point $(-5.0, 32.0)$ and $\underline{Q}_3$ a cut; $\overline{Q}_2$ gains $(5.9, 2.2)$ and $\underline{Q}_2$ a cut.
- Iteration 2. Forward: $x_1^f = -1.0$; stage-2 region $[-15.0, 1.9]$, $x_2^f = 1.9$; stage-3 region $[5.9, 15.0]$, $x_3^f = 5.9$. Backward: $\overline{Q}_3$ gains $(1.9, 15.4)$ and $\underline{Q}_3$ a cut; $\overline{Q}_2$ gains $(-1.0, 35.4)$ and $\underline{Q}_2$ a cut.
- Iteration 3. Forward: $x_1^f = -1.1$; stage-2 region $[-15.0, 2.0]$, $x_2^f = 2.0$; stage-3 region $[2.0, 15.0]$, $x_3^f = 2.0$. Backward: $\overline{Q}_3$ gains $(2.0, -6.0)$ and $\underline{Q}_3$ a cut; $\overline{Q}_2$ gains $(-1.1, -3.8)$ and $\underline{Q}_2$ a cut.
Numerical Results: Inventory Control
Numerical Results

Scalability w.r.t. the horizon $T \in \{10, 50, 100\}$
- 5 products
- 4 random variables per stage ($2^4 = 16$ scenarios)

[Figure: three panels (instances 5-4-10, 5-4-50, 5-4-100) plotting relative distance (%) against optimization time (secs).]

RDDP scales better than LDR (linear decision rules) w.r.t. the horizon.
Numerical Results

Scalability w.r.t. the number of products $\in \{10, 15, 20\}$
- horizon $T = 10$
- 4 random variables per stage ($2^4 = 16$ scenarios)

[Figure: three panels (instances 10-4-10, 15-4-10, 20-4-10) plotting relative distance (%) against optimization time (secs).]

RDDP does not resolve the curse of dimensionality, but it can address problems of practical importance.
Numerical Results

Scalability w.r.t. the number of random variables $\in \{5, 7, 9\}$
- i.e., scenarios per stage $\in \{32, 128, 512\}$
- products $\in \{6, 8, 10\}$
- horizon $T = 10$

[Figure: three panels (instances 6-5-10, 8-7-10, 10-9-10) plotting relative distance (%) against optimization time (secs).]
Current Work

Extension to stochastic programming (robust optimization → stochastic optimization)
- Same inner approximation, different algorithm.
- Same deterministic convergence guarantees.
- Preliminary results indicate comparable complexity.
RDDP Summary

- Converges to an optimal solution.
- Implementable strategy at every iteration (upper bound).
- Lower bound available at every iteration.
- Finite convergence for right-hand side/technology-matrix uncertainty.
- Deterministic asymptotic convergence for recourse-matrix/objective uncertainty.
- The two-stage subproblems are hard; state-of-the-art multistage instances correspond to small two-stage problems.
- Outlook: stochastic optimization/distributionally robust optimization.