Risk-averse Reinforcement Learning for Algorithmic Trading


Slide 1 – Risk-averse Reinforcement Learning for Algorithmic Trading
Yun Shen (1), Ruihong Huang (2,3), Chang Yan (2), Klaus Obermayer (1)
(1) Technische Universität Berlin, (2) Humboldt-Universität zu Berlin, (3) LOBSTER team
IEEE CIFEr, London, March 28, 2014

Slide 2 – Introduction
Transaction cost: TC = X(P_d - P_0) + [ \sum_j x_j P_j - \sum_j x_j P_0 + (X - \sum_j x_j)(P_n - P_0) ] + visible, where the first term is investment-related, the bracketed term is trading-related, and the visible costs are commission fees, taxes, etc.
Task: liquidate a large inventory over a short time horizon.
Data: high-frequency (millisecond) limit orders in NASDAQ.
Method: reinforcement learning (RL) + risk control.
Kissell & Glantz, Optimal Trading Strategies, 2003; Nevmyvaka et al., ICML, 2006.
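As a quick illustration, here is a minimal Python sketch of the cost decomposition above. The reading of P_0 as the arrival price, P_d as the decision price and P_n as the closing price follows Kissell & Glantz; all volumes, prices and fills below are invented for illustration, not data from the talk.

```python
# Transaction-cost decomposition TC = investment-related + trading-related + visible.
# All numbers are made up; P_0 = arrival price, P_d = decision price, P_n = closing price.

X = 2000                                   # shares to liquidate
P_d, P_0, P_n = 100.10, 100.00, 99.80      # decision, arrival, closing price
fills = [(800, 99.95), (700, 99.90)]       # executed (volume x_j, price P_j) pairs
visible = 0.50                             # commissions, taxes, etc.

traded = sum(x for x, _ in fills)
invest_related = X * (P_d - P_0)
trading_related = (sum(x * p for x, p in fills) - traded * P_0
                   + (X - traded) * (P_n - P_0))
TC = invest_related + trading_related + visible
print(invest_related, trading_related, TC)
```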

Slide 3 – Risk
Uncertain future prices, volumes, etc. Example: the 2010 flash crash.
Standard RL is risk-neutral: it does not take risk into consideration and performs badly when outlier events happen.
Variance as a risk measure: with non-Gaussian noise it is computationally infeasible, and Gaussian noise is not always the case; we need new measures of risk and new algorithms.
Nevmyvaka et al., ICML, 2006; Almgren & Chriss, J. of Risk, 2001.
"On May 6, 2010, the prices of many U.S.-based equity products experienced an extraordinarily rapid decline and recovery. That afternoon, major equity indices in both the futures and securities markets, each already down over 4% from their prior-day close, suddenly plummeted a further 5-6% in a matter of minutes before rebounding almost as quickly." -- CFTC & SEC Report

Slide 4 – Markov decision processes
Objective (the trading-related cost \sum_j x_j P_j - \sum_j x_j P_0 + (X - \sum_j x_j)(P_n - P_0)):
max_\pi J(\pi, s) := E[S_T | S_1 = s, \pi] = E[ \sum_{t=1}^T R(S_t, A_t) | S_1 = s, \pi ]
Reward function R: S x A x \Omega -> R; policy \pi = [\pi_1, \pi_2, ..., \pi_T], \pi_t: S -> A.
Key assumption (Markov): P(S_{t+1} | F_t) = P(S_{t+1} | S_t, A_t).
J(\pi, s) = E^{\pi_1}_{S_1 = s}[ R(S_1, A_1) + E^{\pi_2}_{S_2}[ R(S_2, A_2) + ... + E^{\pi_T}_{S_T}[ R(S_T, A_T) ] ... ] ]
Adding risk: max_\pi E^\pi[S_T] - \lambda V^\pi[S_T], where \lambda controls the risk sensitivity; example of V: standard deviation.
It is difficult to solve this problem except in the case of Gaussian noise.
See, e.g., Puterman, 1994; Almgren & Chriss, 2000; Mannor & Tsitsiklis, 2011.
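To make the two criteria concrete, here is a small Monte Carlo sketch that evaluates one fixed policy under the risk-neutral objective E[S_T] (total reward) and under the mean-deviation objective E[S_T] - \lambda Std[S_T] from the slide. The environment simulate_episode and the trivial policy are toy stand-ins, not the trading MDP of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_episode(policy, T=5):
    """Total reward of one episode under `policy` in a toy stochastic environment."""
    s, total = 0.0, 0.0
    for t in range(T):
        a = policy(s, t)
        r = a * (1.0 + 0.2 * rng.standard_normal()) - 0.01 * s  # noisy toy reward
        s += r
        total += r
    return total

policy = lambda s, t: 1.0                 # trivial fixed policy
returns = np.array([simulate_episode(policy) for _ in range(10_000)])

lam = 0.5                                 # risk-sensitivity parameter
print("risk-neutral objective E[S_T]       :", returns.mean())
print("risk-averse objective E - lam * Std :", returns.mean() - lam * returns.std())
```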

Slide 5 – Evaluation function / risk measure
Risk-neutral evaluation: E^{\pi_1}_{S_1 = s}[ R(S_1, A_1) + E^{\pi_2}_{S_2}[ R(S_2, A_2) + ... + E^{\pi_T}_{S_T}[ R(S_T, A_T) ] ... ] ]
Risk-averse evaluation: U^{\pi_1}_{S_1 = s}[ R(S_1, A_1) + U^{\pi_2}_{S_2}[ R(S_2, A_2) + ... + U^{\pi_T}_{S_T}[ R(S_T, A_T) ] ... ] ]
U(. | s, a) is a risk measure for all (s, a): monotonicity, translation invariance, concavity/coherency.
Utility-based shortfall: U_{s,a}(X) = sup{ m \in R : E_{s,a}[u(X - m)] >= 0 }, where u is a concave, continuous and strictly increasing function satisfying u(0) = 0; a concave u means risk aversion.
Shen et al., SIAM J. on Control & Optimization, 2013; Artzner et al., Math. Finance, 1999; Föllmer & Schied, Finance & Stochastics, 2002; Föllmer & Schied, 2004.
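The utility-based shortfall of an empirical sample can be computed by a one-dimensional search, since E[u(X - m)] is continuous and decreasing in m. Below is a hedged sketch; the Gaussian sample and the exponential-type concave u in the usage lines are arbitrary illustration choices, not the ones from the talk.

```python
import numpy as np

def shortfall(samples, u, tol=1e-8):
    """U(X) = sup{ m : E[u(X - m)] >= 0 }, found by bisection on m."""
    x = np.asarray(samples, dtype=float)
    lo, hi = x.min(), x.max()      # E[u(X - lo)] >= 0 >= E[u(X - hi)] because u(0) = 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.mean(u(x - mid)) >= 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return lo

X = np.random.default_rng(1).normal(0.0, 1.0, 100_000)
print(shortfall(X, lambda x: x))                    # linear u: recovers the mean, ~0
print(shortfall(X, lambda x: 1.0 - np.exp(-x)))     # concave u: risk averse, below the mean
```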

Slide 6 – Utility-based shortfall
U_{s,a}(X) = sup{ m \in R : E_{s,a}[u(X - m)] >= 0 }
We consider u(x) = (1/\lambda) [ (x+1)^\lambda - 1 ] for x >= 0 and u(x) = x for x < 0.
\lambda \in (0, 1) controls the degree of risk averseness; \lambda = 1 recovers U_{s,a}(.) = E_{s,a}(.).
Example: the lottery {(r_1, p), (r_2, 1-p)} with r_1 > r_2 induces the subjective probability w(p) = (U(p) - r_2) / (r_1 - r_2) (prospect theory).
[Figure: plots of u(x) against x and of the weighting function w(p) against p.]
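Here is a small sketch of this particular u and of the probability weighting it induces on the binary lottery; the bisection mirrors the shortfall definition above, and the values r_1 = 1, r_2 = 0, \lambda = 0.5 are arbitrary illustration choices.

```python
import numpy as np

def u(x, lam=0.5):
    """u(x) = (1/lam)[(x+1)^lam - 1] for x >= 0, and u(x) = x for x < 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, ((np.maximum(x, 0.0) + 1.0) ** lam - 1.0) / lam, x)

def U_binary(r1, r2, p, lam=0.5, tol=1e-10):
    """Shortfall of {(r1, p), (r2, 1-p)}: root in m of p*u(r1 - m) + (1-p)*u(r2 - m)."""
    lo, hi = min(r1, r2), max(r1, r2)
    for _ in range(200):
        m = 0.5 * (lo + hi)
        if p * u(r1 - m, lam) + (1 - p) * u(r2 - m, lam) >= 0:
            lo = m
        else:
            hi = m
        if hi - lo < tol:
            break
    return lo

r1, r2, lam = 1.0, 0.0, 0.5
for p in (0.1, 0.5, 0.9):
    w = (U_binary(r1, r2, p, lam) - r2) / (r1 - r2)
    print(f"p = {p:.1f}  ->  w(p) = {w:.3f}")      # w(p) <= p: pessimistic weighting
```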

Slide 7 – Risk-averse reinforcement learning
At the t-th time point, repeat action a at state s N times to obtain samples {R_i, s'_i}, i = 1, 2, ..., N.
Iterative update (*): Q_t^{(i+1)}(s, a) = Q_t^{(i)}(s, a) + (1/i) u( R_i + max_{a'} Q_{t+1}(s'_i, a') - Q_t^{(i)}(s, a) )
\pi_t^{(N)}(s) = arg max_{a \in A} Q_t^{(N)}(s, a) -> \pi_t^*, the optimal policy, as N -> \infty.
Shen et al., Neural Computation, 2014; Dunkel & Weber, Math. Oper. Res., 2010.
Algorithm:
  initialize Q_{T+1}(s, a) = 0 for all s in S, a in A;
  for t = T to 1 do
    initialize Q_t(s, a) = 0 for all s in S, a in A;
    for each state s in S and a in A do
      for n = 1 to N do
        execute action a at s to obtain a sampled reward R and successor state s';
        update Q_t(s, a) according to (*);
      end for
    end for
  end for
Properties: needs no knowledge of the transition model P(S_{t+1} | S_t, A_t), i.e. data driven; online algorithm that adapts to new data easily; parallel computing is possible; risk is controlled by u, specifically by \lambda.
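A hedged Python sketch of the Q-iteration in the box above, for a generic finite-horizon problem with tabular states and actions. The simulator sample_step(t, s, a), the toy dynamics in the usage lines and the default \lambda are placeholders of mine; in the talk the samples come from replaying limit orders against LOBSTER data.

```python
import numpy as np

def u(x, lam=0.5):
    """Risk-shaping function from slide 6: concave on gains, linear on losses."""
    return ((x + 1.0) ** lam - 1.0) / lam if x >= 0.0 else x

def risk_averse_q(sample_step, n_states, n_actions, T, N, lam=0.5):
    """Backward-in-time, sample-based Q-iteration implementing update (*)."""
    Q = [np.zeros((n_states, n_actions)) for _ in range(T + 1)]   # Q[T] stays 0
    for t in range(T - 1, -1, -1):          # the slide indexes t = T..1; here 0-based
        for s in range(n_states):
            for a in range(n_actions):
                for i in range(1, N + 1):
                    r, s_next = sample_step(t, s, a)
                    target = r + Q[t + 1][s_next].max()
                    Q[t][s, a] += (1.0 / i) * u(target - Q[t][s, a], lam)  # update (*)
    policy = [q.argmax(axis=1) for q in Q[:T]]   # greedy policy for each time step
    return Q, policy

# Toy usage: 3 states, 2 actions, reward = action + noise, the state never changes.
rng = np.random.default_rng(0)
toy_step = lambda t, s, a: (a + 0.1 * rng.standard_normal(), s)
Q, pi = risk_averse_q(toy_step, n_states=3, n_actions=2, T=4, N=500)
print(pi[0])   # greedy first-step policy; should select action 1 in every state
```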

Slide 8 – Problem setting
Task: sell V shares of AMZN in NASDAQ within H minutes.
Data: provided by LOBSTER with two price levels, i.e., only the two best asks and bids.
The data are split into training and test periods; the test period contains the flash crash on May 6, 2010.
Performance evaluation (in basis points): cost = 10^4 (mid-quote at time 0 - average execution price) / (mid-quote at time 0).
Risk: standard deviation of costs; 95%-quantile of costs (cost_95%).
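The cost metric above as a tiny helper; the mid-quote and execution price in the example are made-up numbers.

```python
def trading_cost_bps(mid_quote_0, avg_exec_price):
    """Cost of the sell task in basis points, relative to the initial mid-quote."""
    return 1e4 * (mid_quote_0 - avg_exec_price) / mid_quote_0

print(trading_cost_bps(100.00, 99.85))   # ~15 bps: sold 15 bps below the initial mid-quote
```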

Slide 9 – MDP formulation
Time resolution: total time horizon H = 10 min; limit orders are submitted to the market at t = nH/T, n = 0, 1, ..., T-1.
States: V = the target volume, I = the number of inventory units; inventory state i = vI/V for remaining volume v; market variables: spread, volume imbalance, signed volume, etc.
Actions: a = submit a sell limit order at price ask - a (unit: US cents) with all remaining shares.
Rewards: the cash inflow resulting from a (partial) execution of the limit order placed at ask - a.
[Figure: illustration of the limit order placement in the order book.]
cf. Nevmyvaka et al., ICML, 2006.
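A small sketch of the state and action encoding described above. Only the inventory formula i = vI/V and the cent offset below the best ask are taken from the slide; the ceiling used for discretisation and the numbers in the usage lines are assumptions for illustration.

```python
import math

def inventory_state(v_remaining, V_target, I_units):
    """Map remaining volume v to a discrete inventory level in {0, ..., I} (ceil is an assumed choice)."""
    return math.ceil(v_remaining * I_units / V_target)

def sell_limit_price(best_ask_cents, a_cents):
    """Price of the sell limit order: a cents below the best ask (both in US cents)."""
    return best_ask_cents - a_cents

print(inventory_state(1300, 2000, 5))    # 4: roughly 4 of 5 inventory units remain
print(sell_limit_price(30125, 2))        # 30123 cents
```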

Slide 10 – Results I: tuning of \lambda
[Figure: four panels plotting average trading cost, standard deviation, 95%-quantile cost and cost at the flash crash against \lambda, each comparing the risk-neutral baseline with the settings V=2k, T=5 and V=2k, T=10.]

Slide 11 – Results II: Flash Crash

Slide 12 – Results II: Flash Crash
[Figure: bar chart of trading costs on the flash crash spot, comparing risk-neutral (RN) and risk-averse (RA) agents, each with and without the spread as a state variable, for the settings V=1k/T=5/I=5, V=1k/T=10/I=10, V=2k/T=5/I=5 and V=2k/T=10/I=10.]

Slide 13 – Results III: overall performance
[Figure: three panels showing average trading cost, standard deviation and 95%-quantile cost over the whole test period, comparing RN/RA agents with and without the spread variable, for the settings V=1k/T=5/I=5, V=1k/T=10/I=10, V=2k/T=5/I=5 and V=2k/T=10/I=10.]

Slide 14 – Conclusion and outlook
Conclusion: our novel risk-averse RL significantly reduces the trading cost at the flash crash spot and markedly lowers risk over the whole test period, at the price of a slight increase in average trading cost.
Outlook: model market impact; expand the state space and test with various market variables; expand the action space with order volume; try other u-functions, perhaps even risk-seeking ones.
Thank you for your patience!
