Pairs trading

ROBERT J. ELLIOTT†, JOHN VAN DER HOEK*‡ and WILLIAM P. MALCOLM§

Quantitative Finance, Vol. 5, No. 3, June 2005, 271-276

†Haskayne School of Business, University of Calgary, Calgary, Alberta, Canada T2N 1N4
‡Department of Applied Mathematics, The University of Adelaide, Adelaide, South Australia 5005, Australia
§National ICT Australia, The Australian National University, Canberra, ACT, Australia 0200

(Received 27 December 2004; in final form 11 April 2005)

Pairs Trading is an investment strategy used by many Hedge Funds. Consider two similar stocks which trade at some spread. If the spread widens, short the high stock and buy the low stock. As the spread narrows again to some equilibrium value, a profit results. This paper provides an analytical framework for such an investment strategy. We propose a mean-reverting Gaussian Markov chain model for the spread which is observed in Gaussian noise. Predictions from the calibrated model are then compared with subsequent observations of the spread to determine appropriate investment decisions. The methodology has potential applications to generating wealth from any quantities in financial markets which are observed to be out of equilibrium.

Keywords: Pairs trading; Hedge funds; Spreads

1. Introduction

Pairs Trading is a trading or investment strategy used to exploit financial markets that are out of equilibrium. Litterman (2003) explains the philosophy of Goldman Sachs Asset Management as one of assuming that while markets may not be in equilibrium, over time they move to a rational equilibrium, and the trader has an interest to take maximum advantage from deviations from equilibrium. Pairs Trading is a trading strategy consisting of a long position in one security and a short position in another security in a predetermined ratio. If the two securities are stocks from the same financial sector (like two mining stocks), one may take this ratio to be unity.
This ratio may be selected in such a way that the resulting portfolio is market neutral, a portfolio with zero beta to the market portfolio. This portfolio is often called a spread. We shall model this spread (or the return process for this spread) as a mean-reverting process which we calibrate from market observations. This model will allow us to make predictions for this spread. If observations are larger (smaller) than the predicted value (by some threshold value) we take a long (short) position in the portfolio and we unwind the position and make a profit when the spread reverts.

A brief history and discussion of pairs trading can be found in Gatev et al. (1999) and in two recent books by Vidyamurthy (2004) and Whistler (2004). Reverre (2001) discusses a classical study of pairs trading involving Royal Dutch and Shell stocks. Pairs trading is also regarded as a special form of Statistical Arbitrage and is sometimes discussed under this topic. The idea of pairs trading can be applied to any equilibrium relationship in financial markets, or to (market neutral) portfolios of securities some held short and others held long (see Nicholas (2000)).

*Corresponding author. Email: Jvanderh@maths.adelaide.edu.au

Quantitative Finance ISSN 1469-7688 print/ISSN 1469-7696 online © 2005 Taylor & Francis, http://www.tandf.co.uk/journals, DOI: 10.1080/14697680500149370

2. The spread model

2.1. The state process

Consider a state process $\{x_k \mid k = 0, 1, 2, \ldots\}$ where $x_k$ denotes the value of some (real) variable at time $t_k = k\tau$ for $k = 0, 1, 2, \ldots$ We assume that $\{x_k\}$ is mean reverting:

$$x_{k+1} - x_k = (a - b x_k)\tau + \sigma\sqrt{\tau}\,\varepsilon_{k+1}, \tag{1}$$

where $\sigma \ge 0$, $b > 0$, $a \in \mathbb{R}$ (which we may assume is nonnegative without any loss of generality), and $\{\varepsilon_k\}$ is iid Gaussian $N(0, 1)$. Clearly, we assume that $\varepsilon_{k+1}$ is independent of $x_0, x_1, \ldots, x_k$. The process mean reverts to $\mu = a/b$ with strength $b$. Clearly,

$$x_k \sim N(\mu_k, \sigma_k^2), \tag{2}$$

where

$$\mu_k = \frac{a}{b} + \Big[\mu_0 - \frac{a}{b}\Big](1 - b\tau)^k, \tag{3}$$

and

$$\sigma_k^2 = \frac{\sigma^2 \tau}{1 - (1 - b\tau)^2}\Big[1 - (1 - b\tau)^{2k}\Big] + \sigma_0^2 (1 - b\tau)^{2k}. \tag{4}$$

It is easy to show that

$$\mu_k \to \frac{a}{b} \quad \text{as } k \to \infty, \tag{5}$$

and

$$\sigma_k^2 \to \frac{\sigma^2 \tau}{1 - (1 - b\tau)^2} \quad \text{as } k \to \infty, \tag{6}$$

provided we have chosen $\tau > 0$ and small so that $|1 - b\tau| < 1$. We can also write (1) as

$$x_{k+1} = A + B x_k + C \varepsilon_{k+1}, \tag{7}$$

with $A = a\tau \ge 0$, $0 < B = 1 - b\tau < 1$ and $C = \sigma\sqrt{\tau}$. We could also regard $x_k \cong X(k\tau)$ where $\{X(t) \mid t \ge 0\}$ satisfies the stochastic differential equation

$$dX(t) = (a - bX(t))\,dt + \sigma\,dW(t), \tag{8}$$

where $\{W(t) \mid t \ge 0\}$ is a standard Brownian motion (on some probability space).

2.2. The observation process

We assume that we have an observation process $\{y_k\}$ of $\{x_k\}$ in Gaussian noise:

$$y_k = x_k + D\omega_k, \tag{9}$$

where $\{\omega_k\}$ are iid Gaussian $N(0, 1)$ and independent of the $\{\varepsilon_k\}$ in (1) and $D > 0$. We may assume that $0 \le C < D$, which should be the case for small values of $\tau$. We set $\mathcal{Y}_k = \sigma\{y_0, y_1, \ldots, y_k\}$ which represents the information from observing $y_0, y_1, \ldots, y_k$. We will wish to compute the conditional expectation (filter):

$$\hat{x}_k = E[x_k \mid \mathcal{Y}_k], \tag{10}$$

which gives the best estimates of the hidden state process through the observed process. In order to make the estimate (10), we will need to estimate $(A, B, C, D)$ or rather $(A, B, C^2, D^2)$ from the observed data. We shall present various results for this below.

2.3. The application

We shall regard $\{y_k\}$ as a model for the observed spread of two securities at time $t_k$. We assume the observed spread is a noisy observation of some mean-reverting state process $\{x_k\}$. The $\{y_k\}$ could also model the returns of the spread portfolio as is often done in practice. If $y_k > \hat{x}_{k|k-1} = E[x_k \mid \mathcal{Y}_{k-1}]$ the spread is regarded as too large, and so the trader could take a long position in the spread portfolio and profit when a correction occurs. An alternative would be to initiate a long trade only when $y_k$ exceeds $\hat{x}_{k|k-1}$ by some threshold value. A corresponding short trade could be entered when $y_k < \hat{x}_{k|k-1}$. Various decisions have to be made by the trader.
What is a suitable pair of securities for pair trading? If our estimates for $B$ reveal $0 < B < 1$, then this is consistent with the mean-reverting model we have described. Comparing $y_k$ and $\hat{x}_{k|k-1}$ may or may not lead to a trade if thresholds must be met. How are thresholds set? See Vidyamurthy (2004) for some possibilities. When is the pairs trade unwound? There are various possibilities: the next trading time (see Reverre (2001) example) or when the spread corrects sufficiently. The price application and data bases used are often proprietary in industry applications. The machinery we present then provides some useful tools appropriate for pairs trading.

Another explicit strategy could make use of First-Passage Times results (see Finch (2004) and references cited therein) for the (standardized) Ornstein-Uhlenbeck process

$$dZ(t) = -Z(t)\,dt + \sqrt{2}\,dW(t). \tag{11}$$

Let

$$T_{0,c} = \inf\{t \ge 0,\ Z(t) = 0 \mid Z(0) = c\}, \tag{12}$$

which has a probability density function $f_{0,c}$. It is known explicitly that

$$f_{0,c}(t) = \sqrt{\frac{2}{\pi}}\,\frac{|c|\,e^{-t}}{(1 - e^{-2t})^{3/2}} \exp\!\Big(-\frac{c^2 e^{-2t}}{2(1 - e^{-2t})}\Big) \tag{13}$$

for $t > 0$. Now $f_{0,c}$ has a maximum value at $\hat{t}$ given by

$$\hat{t} = \frac{1}{2}\ln\Big[1 + \frac{1}{2}\Big(\sqrt{(c^2 - 3)^2 + 4c^2} + c^2 - 3\Big)\Big]. \tag{14}$$

We can also write (8) in the form

$$dX(t) = -\rho\,(X(t) - \mu)\,dt + \sigma\,dW(t), \tag{15}$$

where $\rho = b$ and $\mu = a/b$. When

$$X(0) = \mu + c\,\frac{\sigma}{\sqrt{2\rho}}, \tag{16}$$

the most likely time $T$ at which $X(T) = \mu$ is given by

$$T = \frac{1}{\rho}\,\hat{t}, \tag{17}$$

where $\hat{t}$ is given by (14).

Use the Ornstein-Uhlenbeck process as an approximation to (7) with $a = A/\tau$, $b = (1 - B)/\tau$ and $\sigma = C/\sqrt{\tau}$ with the calibrated values $A$, $B$, $C$. Choose a value of $c > 0$. Enter a pair trade when $y_k \ge \mu + c\,(\sigma/\sqrt{2\rho})$ and unwind the trade at time $T$ later. A corresponding pair trade would be performed when $y_k \le \mu - c\,(\sigma/\sqrt{2\rho})$ and unwound at time $T$ later.

Other methods based on up- and down-crossing results for AR(1) processes could also be considered. Corresponding results like those for the Ornstein-Uhlenbeck processes are not known.
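The first-passage timing rule built on (13)-(17) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names (`ou_params`, `first_passage_density`, `most_likely_time`, `trade_rule`) are hypothetical and not from the paper, and the time step defaults to $\tau = 1$.

```python
import math

def ou_params(A, B, C, tau=1.0):
    """Map calibrated AR(1) values (A, B, C) of (7) to (a, b, sigma) of (8)."""
    a = A / tau
    b = (1.0 - B) / tau
    sigma = C / math.sqrt(tau)
    return a, b, sigma

def first_passage_density(t, c):
    """Density (13) of the first time the standardized OU process (11) hits 0 from c."""
    e2 = math.exp(-2.0 * t)
    prefactor = math.sqrt(2.0 / math.pi) * abs(c) * math.exp(-t) / (1.0 - e2) ** 1.5
    return prefactor * math.exp(-c * c * e2 / (2.0 * (1.0 - e2)))

def most_likely_time(c):
    """Mode t_hat of the density (13), as given by (14)."""
    c2 = c * c
    return 0.5 * math.log(1.0 + 0.5 * (math.sqrt((c2 - 3.0) ** 2 + 4.0 * c2) + c2 - 3.0))

def trade_rule(A, B, C, c, tau=1.0):
    """Return (mu, band, T): enter when y_k >= mu + band or y_k <= mu - band,
    then unwind T time units later, following (15)-(17)."""
    a, b, sigma = ou_params(A, B, C, tau)
    rho, mu = b, a / b
    band = c * sigma / math.sqrt(2.0 * rho)
    T = most_likely_time(c) / rho
    return mu, band, T

# Illustrative call with the parameter values of the numerical example and c = 1.
mu, band, T = trade_rule(0.20, 0.85, 0.60, c=1.0)
```

The holding time `T` is measured in the same units as $\tau$, so with $\tau = 1$ it counts observation periods and would need rounding to a whole number of periods in practice.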

3. Filtering and estimation results

We assume some underlying probability space $(\Omega, \mathcal{F}, P)$ whose details need not concern the trader, except that $P$ represents the real world probability.

3.1. Kalman filtering

We have a state equation

$$x_{k+1} = A + B x_k + C \varepsilon_{k+1}, \tag{18}$$

and the observation equation

$$y_k = x_k + D\omega_k, \tag{19}$$

for $k = 0, 1, 2, \ldots$ Given $(A, B, C, D)$ we can compute

$$\hat{x}_k \equiv \hat{x}_{k|k} = E[x_k \mid \mathcal{Y}_k] \tag{20}$$

using the Kalman Filter (see Elliott et al. (1995) for a reference probability style proof). Let

$$R_k = \Sigma_{k|k} \equiv E\big[(x_k - \hat{x}_k)^2 \mid \mathcal{Y}_k\big]. \tag{21}$$

Then $(\hat{x}_k, R_k)$ are determined recursively as follows:

$$\hat{x}_{k+1|k} = A + B \hat{x}_{k|k}, \tag{22}$$

$$\Sigma_{k+1|k} = B^2 \Sigma_{k|k} + C^2, \tag{23}$$

$$K_{k+1} = \Sigma_{k+1|k} / (\Sigma_{k+1|k} + D^2), \tag{24}$$

$$\hat{x}_{k+1} = \hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_{k+1}\,[y_{k+1} - \hat{x}_{k+1|k}], \tag{25}$$

$$R_{k+1} = \Sigma_{k+1|k+1} = D^2 K_{k+1} = \Sigma_{k+1|k} - K_{k+1}\Sigma_{k+1|k}. \tag{26}$$

For initialization we could take $\hat{x}_0 = y_0$ and $R_0 = D^2$.

Remark: As $k \to \infty$, $R_k$ converges (monotonically) to the positive root $R$ of $B^2 R^2 + (C^2 + D^2 - B^2 D^2) R - C^2 D^2 = 0$ provided $B^2 \ne 0$, $C^2 D^2 \ne 0$. We cannot say very much about limiting values of $\hat{x}_k$ except it is exponentially forgetting of $\hat{x}_0$. However, these comments are not very important as we will only assume the model (18) holds over a short time horizon for a given set of values on $(A, B, C, D)$.

3.2. Estimation of model

We now provide estimates for $\vartheta \equiv (A, B, C^2, D^2)$ based on observations $y_0, y_1, \ldots, y_N$. We use the EM-Algorithm to find $\hat{\vartheta}$ by an iteration that provides a stationary value of the likelihood function based on the observations. In fact, let (see Elliott et al. (1995))

$$L_N(\vartheta) = E_0\Big[\frac{dP_\vartheta}{dP_0} \,\Big|\, \mathcal{Y}_N\Big] \tag{27}$$

be the likelihood function for $\vartheta \in \Theta$. The maximum likelihood estimate solves

$$\hat{\vartheta} = \arg\max_{\vartheta \in \Theta} L_N(\vartheta). \tag{28}$$

The EM-Algorithm is an iterative method to compute $\hat{\vartheta}$. If $\hat{\vartheta}_0$ is an initial estimate, the EM-Algorithm provides $\hat{\vartheta}_j$, $j = 1, 2, \ldots$, as a sequence of estimates.
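Before turning to the EM steps, the scalar Kalman recursion (22)-(26) of section 3.1 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the names `simulate`, `kalman_filter` and `limiting_R` are hypothetical, and the simulated data use the parameter values of the paper's numerical example with an arbitrary seed and starting state.

```python
import math
import random

def simulate(A, B, C, D, n, x0=0.0, seed=0):
    """Simulate states x_0..x_n from (18) and noisy observations y_0..y_n from (19)."""
    rng = random.Random(seed)
    x, xs, ys = x0, [], []
    for k in range(n + 1):
        if k > 0:
            x = A + B * x + C * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(x + D * rng.gauss(0.0, 1.0))
    return xs, ys

def kalman_filter(ys, A, B, C, D):
    """Scalar Kalman recursion (22)-(26), initialized with x_hat_0 = y_0, R_0 = D^2."""
    x_hat, R = ys[0], D * D
    preds, filts, Rs = [None], [x_hat], [R]   # no one-step prediction exists at k = 0
    for y in ys[1:]:
        x_pred = A + B * x_hat                # (22)
        S_pred = B * B * R + C * C            # (23)
        K = S_pred / (S_pred + D * D)         # (24)
        x_hat = x_pred + K * (y - x_pred)     # (25)
        R = D * D * K                         # (26)
        preds.append(x_pred)
        filts.append(x_hat)
        Rs.append(R)
    return preds, filts, Rs

def limiting_R(B, C, D):
    """Positive root of B^2 R^2 + (C^2 + D^2 - B^2 D^2) R - C^2 D^2 = 0."""
    p = C * C + D * D - B * B * D * D
    return (-p + math.sqrt(p * p + 4.0 * B * B * C * C * D * D)) / (2.0 * B * B)

xs, ys = simulate(0.20, 0.85, 0.60, 0.80, n=100)
preds, filts, Rs = kalman_filter(ys, 0.20, 0.85, 0.60, 0.80)
```

The list `preds` holds the one-step predictions $\hat{x}_{k|k-1}$ that section 2.3 compares with $y_k$, and `Rs` can be checked against `limiting_R` to see the convergence described in the Remark.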
Step 1 (the E-step): Compute (with $\tilde{\vartheta} = \hat{\vartheta}_j$)

$$Q(\vartheta, \tilde{\vartheta}) = E_{\tilde{\vartheta}}\Big[\log\frac{dP_\vartheta}{dP_{\tilde{\vartheta}}} \,\Big|\, \mathcal{Y}_N\Big]. \tag{29}$$

Step 2 (the M-step): Find

$$\hat{\vartheta}_{j+1} \in \arg\max_{\vartheta \in \Theta} Q(\vartheta, \hat{\vartheta}_j). \tag{30}$$

In the literature there are basically two procedures to implement the EM-Algorithm.

3.2.1. Shumway and Stoffer (1982) smoother approach. This method is described by Shumway and Stoffer (1982, 2000) and is an off-line calculation and makes use of smoother estimators for the Kalman Filter. We define smoothers (for $k \le N$):

$$\hat{x}_{k|N} = E[x_k \mid \mathcal{Y}_N], \tag{31}$$

$$\Sigma_{k|N} = E\big[(x_k - \hat{x}_{k|N})^2 \mid \mathcal{Y}_N\big] = E\big[(x_k - \hat{x}_{k|N})^2\big], \tag{32}$$

$$\Sigma_{k-1,k|N} = E\big[(x_k - \hat{x}_{k|N})(x_{k-1} - \hat{x}_{k-1|N})\big]. \tag{33}$$

These smoothers can be computed by

$$J_k = \frac{B\,\Sigma_{k|k}}{\Sigma_{k+1|k}}, \tag{34}$$

$$\hat{x}_{k|N} = \hat{x}_{k|k} + J_k\big[\hat{x}_{k+1|N} - (A + B\hat{x}_{k|k})\big], \tag{35}$$

$$\Sigma_{k|N} = \Sigma_{k|k} + J_k^2\big[\Sigma_{k+1|N} - \Sigma_{k+1|k}\big], \tag{36}$$

$$\Sigma_{k-1,k|N} = J_{k-1}\Sigma_{k|k} + J_k J_{k-1}\big[\Sigma_{k,k+1|N} - B\,\Sigma_{k|k}\big], \tag{37}$$

$$\Sigma_{N-1,N|N} = B(1 - K_N)\,\Sigma_{N-1|N-1}, \tag{38}$$

where initial values for this backward recursion $\hat{x}_{N|N}$ and $\Sigma_{N|N}$ are obtained from the Kalman Filter along with other estimates. Given $\vartheta_j = (A, B, C^2, D^2)$ and initial values for the Kalman Filter $\hat{x}_0 = {}^{j-1}\hat{x}_{0|N}$ and $\Sigma_{0|0} = {}^{j-1}\Sigma_{0|N}$, which are the smoothers from the previous step $(j - 1)$, the updates $\vartheta_{j+1} = (\hat{A}, \hat{B}, \hat{C}^2, \hat{D}^2)$ are computed as follows:

$$\hat{A} = \frac{\alpha\gamma - \delta\beta}{N\alpha - \delta^2}, \tag{39}$$

$$\hat{B} = \frac{N\beta - \gamma\delta}{N\alpha - \delta^2}, \tag{40}$$

$$\hat{C}^2 = \frac{1}{N}\sum_{k=1}^{N} E\big[(x_k - \hat{A} - \hat{B}x_{k-1})^2 \mid \mathcal{Y}_N\big], \tag{41}$$

$$\hat{D}^2 = \frac{1}{N+1}\sum_{k=0}^{N} E\big[(y_k - x_k)^2 \mid \mathcal{Y}_N\big], \tag{42}$$

where

$$\alpha = \sum_{k=1}^{N} E\big[x_{k-1}^2 \mid \mathcal{Y}_N\big] = \sum_{k=1}^{N}\big[\Sigma_{k-1|N} + \hat{x}_{k-1|N}^2\big],$$

$$\beta = \sum_{k=1}^{N} E[x_{k-1}x_k \mid \mathcal{Y}_N] = \sum_{k=1}^{N}\big[\Sigma_{k-1,k|N} + \hat{x}_{k-1|N}\hat{x}_{k|N}\big],$$

$$\gamma = \sum_{k=1}^{N}\hat{x}_{k|N},$$

$$\delta = \sum_{k=1}^{N}\hat{x}_{k-1|N} = \gamma - \hat{x}_{N|N} + \hat{x}_{0|N},$$

and the right-hand sides of (41) and (42) are readily computed in terms of smoothers:

$$\hat{C}^2 = \frac{1}{N}\sum_{k=1}^{N}\Big[\Sigma_{k|N} + \hat{x}_{k|N}^2 + \hat{A}^2 + \hat{B}^2\Sigma_{k-1|N} + \hat{B}^2(\hat{x}_{k-1|N})^2 - 2\hat{A}\hat{x}_{k|N} + 2\hat{A}\hat{B}\hat{x}_{k-1|N} - 2\hat{B}\Sigma_{k-1,k|N} - 2\hat{B}\hat{x}_{k|N}\hat{x}_{k-1|N}\Big],$$

$$\hat{D}^2 = \frac{1}{N+1}\sum_{k=0}^{N}\Big[y_k^2 - 2y_k\hat{x}_{k|N} + \Sigma_{k|N} + \hat{x}_{k|N}^2\Big].$$

The disadvantage of this algorithm is that, as new values of observations are given, the whole algorithm must be repeated off-line. However, if we have written a code for this estimation based on $N + 1$ observations $y_0, y_1, \ldots, y_N$, then with $y_{N+1}$ we simply provide the code with input $y_1, y_2, \ldots, y_{N+1}$. The Shumway and Stoffer algorithm has been widely tested.

3.2.2. Elliott and Krishnamurthy (1999) filter approach. This approach to the implementation of the EM-Algorithm uses filtered quantities and can be performed on-line. This was based on a new class of finite-dimensional recursive filters for linear dynamic systems, which can be adapted to equations (18) and (19). The important advantages of this filter-based EM-Algorithm compared with the (standard) smoother based EM-Algorithm include (i) substantially reduced memory requirements, and (ii) ease of parallel implementation on a multiprocessor system (see Elliott and Krishnamurthy (1997, 1999)). The details of this approach are discussed in Elliott et al. (in press), where computational issues and convergence are reported. As in section 3.2.1, we start with $\hat{\vartheta}_j = (A, B, C^2, D^2)$ and initial values for the Kalman Filter and compute the next estimate $\hat{\vartheta}_{j+1} = (\hat{A}, \hat{B}, \hat{C}^2, \hat{D}^2)$.
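For comparison with the filter-based approach, one iteration of the smoother-based update of section 3.2.1, equations (34)-(42), can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name `em_step`, the list layout, the random seed and the choice of 50 iterations are assumptions made here; the simulated data and starting values mirror those quoted in the paper's numerical example.

```python
import random

def em_step(ys, theta, x0, S0):
    """One smoother-based EM update of theta = (A, B, C2, D2); ys = [y_0, ..., y_N].
    (x0, S0) are the filter's initial mean and variance at k = 0."""
    A, B, C2, D2 = theta
    N = len(ys) - 1
    # Forward pass: Kalman filter (22)-(26), started from the supplied (x0, S0).
    xf, Sf = [x0], [S0]            # filtered means and variances
    Sp, K = [None], [None]         # predicted variances and gains (undefined at k = 0)
    for k in range(N):
        x_pred = A + B * xf[k]
        S_pred = B * B * Sf[k] + C2
        gain = S_pred / (S_pred + D2)
        xf.append(x_pred + gain * (ys[k + 1] - x_pred))
        Sf.append(S_pred - gain * S_pred)
        Sp.append(S_pred)
        K.append(gain)
    # Backward pass: smoothers (34)-(36).
    xs, Ss, J = xf[:], Sf[:], [0.0] * N
    for k in range(N - 1, -1, -1):
        J[k] = B * Sf[k] / Sp[k + 1]
        xs[k] = xf[k] + J[k] * (xs[k + 1] - (A + B * xf[k]))
        Ss[k] = Sf[k] + J[k] ** 2 * (Ss[k + 1] - Sp[k + 1])
    # Lag-one covariances (37)-(38); cov[k] stands for Sigma_{k-1,k|N}.
    cov = [None] * (N + 1)
    cov[N] = B * (1.0 - K[N]) * Sf[N - 1]
    for k in range(N - 1, 0, -1):
        cov[k] = J[k - 1] * Sf[k] + J[k] * J[k - 1] * (cov[k + 1] - B * Sf[k])
    # M-step (39)-(42).
    alpha = sum(Ss[k - 1] + xs[k - 1] ** 2 for k in range(1, N + 1))
    beta = sum(cov[k] + xs[k - 1] * xs[k] for k in range(1, N + 1))
    gamma = sum(xs[k] for k in range(1, N + 1))
    delta = gamma - xs[N] + xs[0]
    det = N * alpha - delta ** 2
    A_new = (alpha * gamma - delta * beta) / det
    B_new = (N * beta - gamma * delta) / det
    C2_new = sum(
        Ss[k] + xs[k] ** 2 + A_new ** 2 + B_new ** 2 * (Ss[k - 1] + xs[k - 1] ** 2)
        - 2 * A_new * xs[k] + 2 * A_new * B_new * xs[k - 1]
        - 2 * B_new * (cov[k] + xs[k] * xs[k - 1])
        for k in range(1, N + 1)) / N
    D2_new = sum(ys[k] ** 2 - 2 * ys[k] * xs[k] + Ss[k] + xs[k] ** 2
                 for k in range(N + 1)) / (N + 1)
    # The smoothed values at k = 0 initialize the filter in the next iteration.
    return (A_new, B_new, C2_new, D2_new), xs[0], Ss[0]

# Illustrative run on 100 simulated points (A = 0.20, B = 0.85, C = 0.60, D = 0.80).
rng = random.Random(1)
x, ys = 0.0, []
for _ in range(100):
    ys.append(x + 0.80 * rng.gauss(0.0, 1.0))
    x = 0.20 + 0.85 * x + 0.60 * rng.gauss(0.0, 1.0)
theta, x0, S0 = (1.20, 0.50, 0.30 ** 2, 0.70 ** 2), 0.0, 0.1
for _ in range(50):
    theta, x0, S0 = em_step(ys, theta, x0, S0)
```

When a new observation arrives, the same function can simply be called on the shifted window $y_1, \ldots, y_{N+1}$, as the text describes; the whole pass is repeated, which is the off-line cost that the filter-based approach is meant to avoid.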
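We introduce various quantities:

$$H^0_k = \sum_{l=0}^{k} x_l^2, \qquad H^1_k = \sum_{l=1}^{k} x_l x_{l-1}, \qquad H^2_k = \sum_{l=1}^{k} x_{l-1}^2, \qquad J_k = \sum_{l=0}^{k} x_l y_l,$$

$$I^0_k = \sum_{l=0}^{k} x_l, \qquad I^1_k = \sum_{l=1}^{k} x_{l-1}, \qquad Y_k = \sum_{l=0}^{k} y_l^2,$$

and

$$\hat{H}^0_k = E[H^0_k \mid \mathcal{Y}_k], \quad \hat{H}^1_k = E[H^1_k \mid \mathcal{Y}_k], \quad \hat{H}^2_k = E[H^2_k \mid \mathcal{Y}_k], \quad \hat{J}_k = E[J_k \mid \mathcal{Y}_k], \quad \hat{I}^0_k = E[I^0_k \mid \mathcal{Y}_k], \quad \hat{I}^1_k = E[I^1_k \mid \mathcal{Y}_k].$$

If $E = E_{\hat{\vartheta}_j}$, which means using $\hat{\vartheta}_j = (A, B, C^2, D^2)$ in the dynamics (18), (19), then $\hat{\vartheta}_{j+1} = (\hat{A}, \hat{B}, \hat{C}^2, \hat{D}^2)$ is given through

$$\hat{A} = \frac{1}{N}\Big[1 - \frac{(\hat{I}^1_N)^2}{N\hat{H}^2_N}\Big]^{-1}\Big[\hat{I}^0_N - \frac{\hat{H}^1_N\,\hat{I}^1_N}{\hat{H}^2_N}\Big], \tag{43}$$

$$\hat{B} = \frac{1}{\hat{H}^2_N}\Big[\hat{H}^1_N - \hat{A}\,\hat{I}^1_N\Big], \tag{44}$$

$$\hat{C}^2 = \frac{1}{T}\Big[\hat{H}^0_N + T\hat{A}^2 + \hat{H}^2_N\hat{B}^2 - 2\hat{A}\,\hat{I}^0_N + 2\hat{A}\hat{B}\,\hat{I}^1_N - 2\hat{B}\,\hat{H}^1_N\Big], \tag{45}$$

$$\hat{D}^2 = \frac{1}{T+1}\Big[Y_N - 2\hat{J}_N + \hat{H}^0_N\Big]. \tag{46}$$

Here $T = N$, consistent with (41) and (42). We now provide recurrences for computing the quantities in (43)-(46). Given $\vartheta_j$ we use the Kalman Filter calculations (22)-(26) to determine the values of $\hat{x}_k$ and $R_k$, from which we have ($M = 0, 1, 2$)

$$\hat{H}^M_k = a^M_k + b^M_k\hat{x}_k + d^M_k\big[R_k + \hat{x}_k^2\big], \tag{47}$$

$$\hat{J}_k = a_k + b_k\hat{x}_k, \tag{48}$$

$$\hat{I}^0_k = s^0_k + t^0_k\hat{x}_k, \tag{49}$$

$$\hat{I}^1_k = s^1_k + t^1_k\hat{x}_k, \tag{50}$$

$$Y_k = Y_{k-1} + y_k^2, \tag{51}$$

where the various coefficients are determined as follows. Set

$$\sigma_k = \frac{1}{R_k} + \frac{B^2}{C^2}, \tag{52}$$

$$\Sigma_k = \sigma_k^{-1}\frac{B}{C^2}, \tag{53}$$

$$S_k = \sigma_k^{-1}\Big[\frac{\hat{x}_k}{R_k} - \frac{AB}{C^2}\Big], \tag{54}$$

then

$$a^0_0 = 0,\quad b^0_0 = 0,\quad d^0_0 = 1,$$
$$a^0_{k+1} = a^0_k + b^0_k S_k + d^0_k\big[S_k^2 + \sigma_k^{-1}\big],$$
$$b^0_{k+1} = b^0_k\Sigma_k + S_k + 2d^0_k\Sigma_k S_k,$$
$$d^0_{k+1} = 1 + d^0_k\Sigma_k^2, \tag{55}$$

$$a^1_0 = 0,\quad b^1_0 = 0,\quad d^1_0 = 0,$$
$$a^1_{k+1} = a^1_k + b^1_k S_k + d^1_k\big[S_k^2 + \sigma_k^{-1}\big],$$
$$b^1_{k+1} = b^1_k\Sigma_k + S_k + 2d^1_k\Sigma_k S_k,$$
$$d^1_{k+1} = \Sigma_k + d^1_k\Sigma_k^2, \tag{56}$$

$$a^2_0 = 0,\quad b^2_0 = 0,\quad d^2_0 = 0,$$
$$a^2_{k+1} = a^2_k + b^2_k S_k + (d^2_k + 1)\big[S_k^2 + \sigma_k^{-1}\big],$$
$$b^2_{k+1} = b^2_k\Sigma_k + 2(d^2_k + 1)\Sigma_k S_k,$$
$$d^2_{k+1} = (1 + d^2_k)\Sigma_k^2, \tag{57}$$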

where programmers are warned to distinguish here between superscript 2 and squared terms;

$$a_0 = 0,\quad b_0 = y_0,\quad a_{k+1} = a_k + b_k S_k,\quad b_{k+1} = y_{k+1} + b_k\Sigma_k, \tag{58}$$

$$s^0_0 = 0,\quad t^0_0 = 1,\quad s^0_{k+1} = s^0_k + t^0_k S_k,\quad t^0_{k+1} = 1 + t^0_k\Sigma_k, \tag{59}$$

$$s^1_0 = 0,\quad t^1_0 = 0,\quad s^1_{k+1} = s^1_k + t^1_k S_k,\quad t^1_{k+1} = 1 + t^1_k\Sigma_k. \tag{60}$$

Remarks:

(a) Given $\hat{\vartheta}_j$ the calculation of $\hat{\vartheta}_{j+1}$ is computed by the steps: initialize the Kalman Filter with $\hat{x}_0 = y_0$ and $R_0 = D^2$. If the $\hat{x}_k$ and $R_k$ have been calculated, the various coefficients may now be calculated using (52)-(54) and then (55)-(60). Find $\hat{x}_{k+1}$ and $R_{k+1}$ from the Kalman Filter equations. Continue until $k = N$. Then compute the quantities in (47)-(51) (at $k = N$) and then $\hat{\vartheta}_{j+1}$ from (43)-(46). Some initial guess for $\hat{\vartheta}_0$ must be made, and then iterations are concluded when the values for $\hat{\vartheta}_j$ have converged sufficiently. Call this $\hat{\vartheta}(N)$.

(b) If $\hat{\vartheta}(N) = (\hat{A}, \hat{B}, \hat{C}^2, \hat{D}^2)$, we should check that $\hat{A} > 0$ and $0 < \hat{B} < 1$, else the pairs trading algorithm should not be used with this data.

(c) The procedure described in (a) could be regarded as an initialization, and need not be repeated in subsequent steps, where only one iteration should suffice to update the coefficients in the model.

3.3. Implementation of the EM-Algorithm

We will assume that model (18), (19) holds over $N$ periods. The values of $\hat{\vartheta}(N)$ and $\hat{x}_N$ are computed based on the observations $y_0, y_1, \ldots, y_N$ and a trade may be initiated as described in section 2, and possibly unwound at $t = N + 1$ (or according to some other criterion). Based on $\hat{\vartheta}(N)$, $\hat{x}_{N+1}$ is computed based on the data $y_1, y_2, \ldots, y_{N+1}$ (the most recent $N + 1$ values with the Kalman Filter initialized at $\hat{x}_1 = y_1$ and $R_1 = \hat{D}^2(N)$) and a trade initiated. $\hat{\vartheta}(N + 1)$ is calculated with one iteration using section 3.2.1 or 3.2.2 and using the Kalman Filter based on data $y_1, y_2, \ldots, y_{N+1}$. The procedure is then repeated.

4. Numerical examples

Here we will provide some simulation and calibration results which demonstrate that the Shumway and Stoffer algorithm provides a consistent and robust estimating algorithm for the model. Studies based on Elliott and Krishnamurthy are given by Elliott et al. (in press). Some initial experiments have also been performed with real data with a hedge fund.

To illustrate the typical performance of the Shumway and Stoffer EM algorithm, adapted to estimation of the set $\{A, B, C, D\}$, we consider a simulation with parameter values $A = 0.20$, $B = 0.85$, $C = 0.60$ and $D = 0.80$. Our observation set contained 100 points. To initialize the EM algorithm, the following values were used: $A = 1.20$, $B = 0.50$, $C = 0.30$ and $D = 0.70$, with $\hat{x}_{0|0} = 0$ and $\Sigma_{0|0} = 0.1$. The EM algorithm was iterated 150 times. Figures 1 and 2 show convergence of the maximum likelihood estimates of all parameters.

[Figure 1. Convergence of the maximum likelihood estimates A and B. Panels: Estimate of A; Estimate of B.]

[Figure 2. Convergence of the maximum likelihood estimates C and D. Panels: Estimate of C; Estimate of D.]

References

Elliott, R.J., Aggoun, L. and Moore, J.B., Hidden Markov Models, 1995 (Springer: Berlin).
Elliott, R.J. and Krishnamurthy, V., Exact finite-dimensional filters for maximum likelihood parameter estimation of continuous-time linear Gaussian systems. SIAM Journal on Control and Optimization, 1997, 35, 1908-1923.
Elliott, R.J. and Krishnamurthy, V., New finite-dimensional filters for parameter estimation of discrete-time linear Gaussian models. IEEE Transactions on Automatic Control, 1999, 44, 938-951.
Elliott, R.J., Malcolm, W.P. and van der Hoek, J., The numerical analysis of a filter based EM algorithm (in press).
Finch, S., Ornstein-Uhlenbeck process. Unpublished Note. Available online at: http://pauillac.inria.fr/algo/bsolve/constant/constant.html.
Gatev, E.G., Goetzmann, W.N. and Rouwenhorst, K.G., Pairs trading: performance of a relative average arbitrage rule. NBER Working Paper 7032, National Bureau of Economic Research Inc., 1999. Available online at: http://www.nber.org/papers/w7032.
Litterman, B., Modern Investment Management: An Equilibrium Approach, chapter 1, 2003 (Wiley: New York).
Nicholas, J.G., Market Neutral Investing: Long/Short Hedge Fund Strategies, 2000 (Bloomberg Professional Library, Bloomberg Press: Princeton, NJ, USA).
Reverre, S., The Complete Arbitrage Desk-book, chapter 10, 2001 (McGraw Hill: New York).
Shumway, R.H. and Stoffer, D.S., An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 1982, 3, 253-264.
Shumway, R.H. and Stoffer, D.S., Time Series Analysis and Its Applications, 2000 (Springer: New York).
Vidyamurthy, G., Pairs Trading: Quantitative Methods and Analysis, 2004 (Wiley: New York).
Whistler, M., Trading Pairs: Capturing Profits and Hedging Risk with Statistical Arbitrage Strategies, 2004 (Wiley: New York).