Portfolio selection with multiple risk measures
Garud Iyengar
Columbia University, Industrial Engineering and Operations Research
Joint work with Carlos Abad

Outline
- Portfolio selection and risk measures
  - Variance
  - Value at Risk
  - Coherent risk measures: spectral risk measures
- Finite sample approximations
  - Linear programs: very large and ill-conditioned
  - Cannot handle multiple risk models in practice
- Our contribution: fast first-order algorithm
  - Decomposition algorithm
  - Can handle very large instances
  - Can accommodate multiple risk models

Portfolio selection and risk measures
- Two objectives: maximize expected return and minimize risk
  - Maximize return with a risk bound
  - Maximize a weighted combination of return and risk
- Markowitz mean-variance portfolio selection model (1952)
  - risk = variance of the return
  - Portfolio selection: convex quadratic program
  - Variance is only appropriate for elliptical distributions
  - Does not model tail losses well
- Value-at-Risk: $\mathrm{VaR}_\beta(\tilde{L}) = \inf\{v : P(\tilde{L} \le v) \ge \beta\}$
  - Mandated in the Basel-II accord
  - Captures the probability of tail losses but not their magnitude
  - Not a convex risk measure: portfolio selection is hard

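The Value-at-Risk definition above can be evaluated on a finite sample by replacing $P$ with the empirical distribution. The following is a minimal sketch under that assumption; the function name `empirical_var` and the small tolerance used to guard the floating-point product `beta * n` are illustrative choices, not part of the talk.

```python
import math

import numpy as np


def empirical_var(losses, beta):
    """Smallest v such that the empirical P(L <= v) is at least beta."""
    sorted_losses = np.sort(np.asarray(losses, dtype=float))
    n = sorted_losses.size
    # The k-th smallest loss satisfies P(L <= v) = k / n, so we need k >= beta * n.
    # The tiny tolerance guards against round-off in beta * n.
    k = max(math.ceil(beta * n - 1e-9), 1)
    return float(sorted_losses[k - 1])
```

For example, with the ten losses 1, 2, ..., 10 and $\beta = 0.9$, the smallest value with at least 90% of the samples at or below it is 9.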
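Coherent risk measures
Coherent risk measures $\rho(\tilde{L})$ satisfy
(i) Monotonicity: if $\tilde{X} \le \tilde{Y}$, then $\rho(\tilde{X}) \le \rho(\tilde{Y})$
(ii) Positive homogeneity: for all $\alpha \ge 0$, $\rho(\alpha\tilde{X}) = \alpha\rho(\tilde{X})$
(iii) Convexity: for all $0 \le \alpha \le 1$, $\rho(\alpha\tilde{X} + (1-\alpha)\tilde{Y}) \le \alpha\rho(\tilde{X}) + (1-\alpha)\rho(\tilde{Y})$
(iv) Cash-invariance: for all $\alpha \in \mathbb{R}$, $\rho(\tilde{X} + \alpha) = \rho(\tilde{X}) + \alpha$
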
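Coherent risk measures
- Coherent risk measures $\rho(\tilde{L})$ satisfy
  $$\rho(\tilde{L}) = \sup_{Q \in \mathcal{Q}} \mathbb{E}_Q[\tilde{L}]$$
- Expected Shortfall:
  $$\mathrm{ES}_\beta(\tilde{L}) = \frac{1}{1-\beta}\int_\beta^1 \mathrm{VaR}_p(\tilde{L})\,dp$$
  Also called Average Value at Risk. Almost the same as Conditional Value-at-Risk.
- Mean Upper Semi-deviation ($\lambda \in [0, 1]$):
  $$\rho_{\text{upper}}(\tilde{L}) = \mathbb{E}_P[\tilde{L}] + \lambda\Big(\mathbb{E}_P\big[\big((\tilde{L} - \mathbb{E}_P[\tilde{L}])^+\big)^2\big]\Big)^{1/2}$$
- Convex risk measures: portfolio optimization is easy (in theory!)
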
Spectral risk measures
- Spectral Risk Measure (Acerbi (2002)):
  $$M_\gamma(\tilde{L}) = \int_0^1 \mathrm{ES}_\beta(\tilde{L})\,d\gamma(\beta)$$
  $\gamma : [0, 1] \to \mathbb{R}_+$: probability measure
- Bertsimas and Brown (2010): Distortion Risk Measures
- Spectral risk measures are coherent, comonotone additive, and law-invariant
  - Law-invariant risk measure: if $\tilde{X}$ and $\tilde{Y}$ have the same distribution, then $\rho(\tilde{X}) = \rho(\tilde{Y})$
  - Sampling methods only work for law-invariant risk measures
- Law-invariant risk measures $\rho_{LI}(\tilde{L})$ satisfy (Kusuoka 2001)
  $$\rho_{LI}(\tilde{L}) = \max_{\gamma \in \Gamma} M_\gamma(\tilde{L})$$

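Finite approximations
- $N$ samples of losses: $l = [l_1, \ldots, l_N]$
- Finite sample approximation of ES: a linear program
  $$\mathrm{ES}_\beta(l) = \max_q \; q^\top l \quad \text{s.t. } \mathbf{1}^\top q = 1,\; 0 \le q \le \frac{1}{(1-\beta)N}\mathbf{1}$$
- Spectral risk measures:
  $$M_\gamma(l) = \sum_{j=1}^m \gamma_j\,\mathrm{ES}_{\beta_j}(l)$$
- Law-invariant risk measures:
  $$\rho(l) = \max_{\gamma \in \Gamma} \Big\{ \sum_{j=1}^m \gamma_j\,\mathrm{ES}_{\beta_j}(l) \Big\}$$
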
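The linear program for the sample Expected Shortfall has a simple greedy solution: place the maximum allowed weight $1/((1-\beta)N)$ on the largest losses until the weights sum to one. The sketch below illustrates this and the weighted sum that defines a finite spectral risk measure. The function names are hypothetical, and it assumes $0 \le \beta < 1$.

```python
import numpy as np


def expected_shortfall(losses, beta):
    """Sample ES: max q.l  s.t.  sum(q) = 1, 0 <= q <= 1 / ((1 - beta) * N)."""
    desc = np.sort(np.asarray(losses, dtype=float))[::-1]  # largest loss first
    n = desc.size
    cap = 1.0 / ((1.0 - beta) * n)
    # Greedy LP solution: weight `cap` on the largest losses, then the
    # leftover mass on the next one, and zero on the rest.
    q = np.clip(1.0 - cap * np.arange(n), 0.0, cap)
    return float(q @ desc)


def spectral_risk(losses, gammas, betas):
    """Finite spectral risk measure: sum_j gamma_j * ES_{beta_j}(losses)."""
    return sum(g * expected_shortfall(losses, b) for g, b in zip(gammas, betas))
```

With losses 1, ..., 10 and $\beta = 0.8$, the cap is 0.5, so the weight sits on the two largest losses and the result is the average of 9 and 10.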
Mean-spectral risk portfolio selection problem
- $n$ assets: portfolio $x \in \mathbb{R}^n$ with $\mathbf{1}^\top x = 1$.
- $m$ different risk models
  - $L_k \in \mathbb{R}^{N_k \times n}$: loss matrix for the $k$-th model
  - $\rho_k(L_k x) = \sum_{j=1}^{d_k} \gamma_j^{(k)}\,\mathrm{ES}_{\beta_j^{(k)}}(L_k x)$
- Why bother with multiple risk models?
  - Good, bad, ugly historical return periods
  - Risk models with different periods
  - Robustness to parameters
- Portfolio selection problem:
  $$\max_x \; \mu^\top x - \lambda\|x\|_1 \quad \text{s.t. } \rho_k(L_k x) \le \alpha_k,\; k = 1, \ldots, m,\quad \mathbf{1}^\top x = 1,\; x \in \mathcal{B}$$

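LP formulation
- LP duality (Rockafellar and Uryasev):
  $$\mathrm{ES}_\beta(L) = \min_z \Big\{ z + \frac{1}{(1-\beta)N}\sum_{j=1}^N (L_j - z)^+ \Big\}$$
- Portfolio selection problem:
  $$\max_{x, z} \; \mu^\top x - \lambda\|x\|_1$$
  $$\text{s.t. } \sum_{l=1}^{d_k} \gamma_{kl}\Big(z_{kl} + \frac{1}{(1-\beta_{kl})N_k}\sum_{j=1}^{N_k}\big((L_k x)_j - z_{kl}\big)^+\Big) \le \alpha_k \;\; \forall k,\quad \mathbf{1}^\top x = 1,\; x \in \mathcal{B}$$
- Complexity of LP = $O((mdN + n)^3)$
  - $n = 100$, $m = 5$ models, $d = 3$, $N = 10{,}000$: $mdN + n = 150{,}100$.
- The LP is very badly ill-conditioned
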
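The Rockafellar and Uryasev formula can be checked numerically on a small sample. The objective is convex and piecewise linear in $z$ with kinks at the sample losses, so a minimizer can be found among the samples themselves. The sketch below evaluates the objective at every sample, which costs $O(N^2)$ and is meant only as an illustration; the function name is hypothetical and $0 \le \beta < 1$ is assumed.

```python
import numpy as np


def es_rockafellar_uryasev(losses, beta):
    """min_z  z + 1 / ((1 - beta) * N) * sum_j max(l_j - z, 0)."""
    l = np.asarray(losses, dtype=float)
    scale = 1.0 / ((1.0 - beta) * l.size)
    # excess[i, j] = max(l_j - l_i, 0): shortfall of sample j above candidate z = l_i.
    excess = np.maximum(l[None, :] - l[:, None], 0.0)
    objective = l + scale * excess.sum(axis=1)
    return float(objective.min())
```

For losses 1, ..., 10 and $\beta = 0.8$, the candidate $z = 9$ gives $9 + 0.5 \cdot 1 = 9.5$, which matches the average of the two largest losses.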
Penalty formulation
- Decouple the risk measures: reduces complexity.
- Penalty formulation:
  $$\min_x \; \eta\big(\lambda\|x\|_1 - \mu^\top x\big) + \underbrace{\Big(\max_{1 \le k \le m}\{\rho_k(L_k x) - \alpha_k\}\Big)^+}_{g(x)} \quad \text{s.t. } \mathbf{1}^\top x = 1,\; x \in \mathcal{B}$$
- Solve for a decreasing sequence of values of $\eta$
- The objective is non-smooth:
  - $\max\{\cdot\}$ is non-smooth
  - $\rho_k$ contains ES terms that are non-smooth
- Sub-gradient algorithms are very slow!

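To make the penalty term concrete, the following sketch evaluates $g(x)$, the largest violation of the $m$ risk budgets (zero when every budget holds). The argument layout (one loss matrix, one weight list, one level list, and one budget per model) and the function names are assumptions made for this illustration.

```python
import numpy as np


def expected_shortfall(losses, beta):
    """Sample ES via the greedy solution of the capped-weight LP."""
    desc = np.sort(np.asarray(losses, dtype=float))[::-1]
    cap = 1.0 / ((1.0 - beta) * desc.size)
    q = np.clip(1.0 - cap * np.arange(desc.size), 0.0, cap)
    return float(q @ desc)


def penalty_g(x, loss_matrices, gammas, betas, alphas):
    """g(x) = max(max_k (rho_k(L_k x) - alpha_k), 0)."""
    violations = []
    for L_k, gamma_k, beta_k, alpha_k in zip(loss_matrices, gammas, betas, alphas):
        scenario_losses = L_k @ x  # one portfolio loss per sample of model k
        rho_k = sum(g * expected_shortfall(scenario_losses, b)
                    for g, b in zip(gamma_k, beta_k))
        violations.append(rho_k - alpha_k)
    return max(max(violations), 0.0)
```

With a single model whose scenario losses are 1, ..., 10, weight 1 on $\mathrm{ES}_{0.8}$, and budget $\alpha = 9$, the spectral risk is 9.5 and the penalty is 0.5.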
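Smooth the non-smooth function g(x)
- The max term $\max_{1 \le k \le m}\{x_k\}$ can be written as a linear program:
  $$\max_{1 \le k \le m}\{x_k\} = \max_u \; \sum_{k=1}^m u_k x_k \quad \text{s.t. } \mathbf{1}^\top u = 1,\; u \ge 0$$
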
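Smooth the non-smooth function g(x)
- Smooth the max term $\max_{1 \le k \le m}\{x_k\}$:
  $$\textstyle\max_\delta(x) = \max_u \; \sum_{k=1}^m u_k x_k - \frac{\delta}{2}\|u\|_2^2 \quad \text{s.t. } \mathbf{1}^\top u = 1,\; u \ge 0$$
  - Easy QP: can be solved by a 1-dimensional search
  - $\nabla \max_\delta(x) = u^*$
- Smooth the Expected Shortfall term:
  $$\mathrm{ES}_{\beta,\nu}(l) = \max_q \; q^\top l - \frac{\nu}{2}\|q\|_2^2 \quad \text{s.t. } \mathbf{1}^\top q = 1,\; 0 \le q \le \frac{1}{(1-\beta)N}\mathbf{1}$$
  - Harder QP: can still be solved by a 1-dimensional search
  - $\nabla \mathrm{ES}_{\beta,\nu}(l) = q^*$
- Smoothed function:
  $$g_{\nu\delta}(x) = \textstyle\max_\delta\Big(\Big\{\sum_{l=1}^{d_k} \gamma_{kl}\,\mathrm{ES}_{\beta_{kl},\nu}(L_k x) - \alpha_k\Big\}_{k=1}^m\Big)$$
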
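Both smoothed problems are Euclidean projections in disguise: completing the square shows that $u^*$ is the projection of $x/\delta$ onto the simplex, and $q^*$ is the projection of $l/\nu$ onto the simplex with an upper cap. One way to realize the 1-dimensional search is bisection on the shift $\tau$ in $q_i = \min(\max(v_i - \tau, 0), \text{cap})$. The sketch below takes that route; the talk does not say which search is used, so the bisection, the function names, and the iteration count are assumptions.

```python
import numpy as np


def project_capped_simplex(v, cap, iters=200):
    """Project v onto {q : sum(q) = 1, 0 <= q <= cap} by bisection on tau."""
    v = np.asarray(v, dtype=float)
    lo, hi = v.min() - cap, v.max()  # sum is n * cap >= 1 at lo and 0 at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, cap).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, cap)


def smooth_max(x, delta):
    """Return (max_delta(x), gradient u*)."""
    x = np.asarray(x, dtype=float)
    u = project_capped_simplex(x / delta, 1.0)  # cap 1 is inactive on the simplex
    return float(u @ x - 0.5 * delta * (u @ u)), u


def smooth_es(losses, beta, nu):
    """Return (ES_{beta,nu}(l), gradient q* with respect to l)."""
    l = np.asarray(losses, dtype=float)
    cap = 1.0 / ((1.0 - beta) * l.size)
    q = project_capped_simplex(l / nu, cap)
    return float(q @ l - 0.5 * nu * (q @ q)), q
```

By the chain rule, the gradient of $\mathrm{ES}_{\beta,\nu}(L_k x)$ with respect to $x$ is $L_k^\top q^*$. As $\delta$ and $\nu$ shrink, the smoothed values approach the exact max and the exact sample ES from below.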
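FISTA for fixed η
- Smoothed penalty formulation:
  $$\min_x \; \eta\big(\lambda\|x\|_1 - \mu^\top x\big) + g_{\nu\delta}(x) \quad \text{s.t. } \mathbf{1}^\top x = 1,\; x \in \mathcal{B}$$
- Proximal gradient algorithm: in every iteration we need to solve
  $$\min_x \; \eta\lambda\|x\|_1 + h(x; y^{(k)}) \quad \text{s.t. } \mathbf{1}^\top x = 1,\; x \in \mathcal{B}$$
  $$h(x; y^{(k)}) = \big(-\eta\mu + \nabla g_{\nu\delta}(y^{(k)})\big)^\top (x - y^{(k)}) + \frac{L}{2}\|x - y^{(k)}\|_2^2$$
- $\nabla g_{\nu\delta}$: $O(dmN)$ complexity
- $\ell_1$-penalized separable QP
  - Number of variables equal to number of assets
  - Efficiently solvable even with side constraints
- Complexity: $O(mdN + n^3)$ compared to $O((mdN + n)^3)$
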
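The per-iteration subproblem separates across assets once the budget constraint $\mathbf{1}^\top x = 1$ is dualized with a scalar multiplier $\tau$. The sketch below assumes, purely for illustration, that $\mathcal{B}$ is a box `lower <= x <= upper` with finite bounds satisfying `sum(lower) <= 1 <= sum(upper)`; the talk does not specify $\mathcal{B}$. For a fixed $\tau$ each coordinate is a soft-threshold followed by clipping, and $\tau$ is found by bisection. Here `grad` stands for $-\eta\mu + \nabla g_{\nu\delta}(y^{(k)})$, `lip` for $L$, and `eta_lam` for $\eta\lambda$; all names are hypothetical.

```python
import numpy as np


def prox_step(y, grad, lip, eta_lam, lower, upper, iters=200):
    """min eta_lam*||x||_1 + grad.(x - y) + lip/2*||x - y||^2
    s.t. sum(x) = 1, lower <= x <= upper  (box assumed for illustration)."""
    y = np.asarray(y, dtype=float)
    grad = np.asarray(grad, dtype=float)
    lower = np.broadcast_to(np.asarray(lower, dtype=float), y.shape)
    upper = np.broadcast_to(np.asarray(upper, dtype=float), y.shape)
    thr = eta_lam / lip

    def x_of(tau):
        v = y - (grad + tau) / lip                           # gradient step shifted by tau
        s = np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)    # soft-threshold (l1 term)
        return np.clip(s, lower, upper)                      # 1-D projection onto the box

    # Bracket tau: at tau_lo every coordinate sits at its upper bound,
    # at tau_hi every coordinate sits at its lower bound.
    tau_lo = np.min(lip * (y - upper - thr) - grad)
    tau_hi = np.max(lip * (y - lower + thr) - grad)
    for _ in range(iters):
        tau = 0.5 * (tau_lo + tau_hi)
        if x_of(tau).sum() > 1.0:   # sum(x_of(tau)) is nonincreasing in tau
            tau_lo = tau
        else:
            tau_hi = tau
    return x_of(0.5 * (tau_lo + tau_hi))
```

A FISTA iteration would call this step at the extrapolated point $y^{(k)}$ and then update the momentum term; only the subproblem is sketched here.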
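Other portfolio selection problems
- Weighted sparse mean-spectral risk:
  $$\max_x \; \mu^\top x - \lambda\|x\|_1 - \sum_{k=1}^m \eta_k\,\rho_k(L_k x) \quad \text{s.t. } \mathbf{1}^\top x = 1,\; x \in \mathcal{B}$$
- Sparse mean-max spectral risk portfolio selection problem:
  $$\max_x \; \mu^\top x - \lambda\|x\|_1 - \eta\Big(\max_{k=1,\ldots,m} \rho_k(L_k x)\Big) \quad \text{s.t. } \mathbf{1}^\top x = 1,\; x \in \mathcal{B}$$
- Suppose the Kusuoka representation set is $\Gamma = \mathrm{conv}(\gamma_1, \ldots, \gamma_m)$. Then
  $$\rho(\tilde{X}) = \max_{1 \le k \le m} \{ M_{\gamma_k}(\tilde{X}) \}$$
- The method extends to law-invariant coherent risk measures.
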
Problem scaling results
Compared the performance of our FISTA-based code and Gurobi.

                                                  Average CPU time (s)
assets  risk models  # ES  samples  max err (%)      Gurobi    OurCode
    10            5     3      100         0.22       0.118      0.464
    10            5     3      500         0.41       1.222      2.912
    10            5     3     1000         0.01       1.994      0.505
    10            5     3     1500         0.00       2.924      0.994
   100            5     3     1000         0.00      29.494      1.542
   100            5     3     5000         0.00     242.974      8.706
   100            5     3    10000         0.00     373.729     26.155
   100            5     3    15000         x.xx       x.xxx     27.862
  1000            5     3     5000         1.65   38378.000     48.637
  1000            5     3    10000         x.xx       x.xxx    108.408
  1000            5     3    15000         x.xx       x.xxx    183.520

x.xx = Gurobi exited without computing a solution

Derivative portfolio selection
- Alexander, Coleman and Li. Minimizing VaR and CVaR for a portfolio of derivatives. J. Banking and Finance, 2006.
- 4 correlated assets and
  - 12 vanilla European calls and puts
  - 12 binary European calls and puts
- The option prices are computed using the Black-Scholes formula.
- Nominal portfolio: $N = 25{,}000$ samples using one risk model: $\sigma^2 = \sigma_0^2$
- Robust portfolio: $N = 25{,}000$ samples with three risk models: $\sigma^2 = [1.05, 1, 0.95]\,\sigma_0^2$
  - This problem is intractable for Gurobi.
- Test: risk budget violation on 10 sets of $N = 25{,}000$ samples.

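For reference, the Black-Scholes prices of the two option types mentioned above can be sketched as follows. The sketch assumes no dividends and a cash-or-nothing binary payoff of 1; the talk does not state which binary payoff or parameter values were used, so the function names and conventions here are illustrative only.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1_d2(spot, strike, rate, sigma, maturity):
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (
        sigma * math.sqrt(maturity))
    return d1, d1 - sigma * math.sqrt(maturity)


def vanilla_price(spot, strike, rate, sigma, maturity, is_call=True):
    """Black-Scholes price of a European call or put (no dividends)."""
    d1, d2 = _d1_d2(spot, strike, rate, sigma, maturity)
    discount = math.exp(-rate * maturity)
    if is_call:
        return spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    return strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)


def binary_price(spot, strike, rate, sigma, maturity, is_call=True):
    """Cash-or-nothing binary paying 1 at maturity if it ends in the money."""
    _, d2 = _d1_d2(spot, strike, rate, sigma, maturity)
    discount = math.exp(-rate * maturity)
    return discount * norm_cdf(d2 if is_call else -d2)
```

Scaling the variance as in the three robust risk models corresponds to calling these functions with `sigma` multiplied by the square root of 1.05, 1, or 0.95.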
Derivative portfolio: Numerical results

Low risk α = 1 and high sparsity λ = λ_0
prob   µᵀx     # ρ_test > α   max ρ_test   µ 5 x    # x_i ≠ 0   cpu time
nom    0.0181   5              1.0133       0.0180    7            352.69
rob    0.0176   0              0.9831       0.0175    4            887.13

Low risk α = 1 and low sparsity λ = λ_0/16
prob   µᵀx     # ρ_test > α   max ρ_test   µ 5 x    # x_i ≠ 0   cpu time
nom    0.0228   6              1.0197       0.0193   42           1078.80
rob    0.0221   0              0.9857       0.0190   48           3130.00

High risk α = 3 and low sparsity λ = λ_0/16
prob   µᵀx     # ρ_test > α   max ρ_test   µ 5 x    # x_i ≠ 0   cpu time
nom    0.0645   6              3.0697       0.0550   68           1066.90
rob    0.0626   0              2.9709       0.0535   64           2898.00

Conclusions
- Fast first-order algorithm for portfolio selection with multiple spectral risk constraints
- Each step of the algorithm is a separable convex QP in the number of assets.
- The algorithm is very efficient both in theory and in practice.
  - Can prove a theoretical complexity bound
  - Tested the algorithm with $n = 200$, $N = 25{,}000$ and $m = 5$ risk models
  - Even a MATLAB implementation is superior to a state-of-the-art LP solver!