Statistical Arbitrage Using Eigenportfolios


MATH 238 MATHEMATICAL FINANCE FINAL PROJECT
Statistical Arbitrage Using Eigenportfolios
Date: March 20th, 2014
Stanford University

Contents
1 Abstract
Part I Signal Generation
2 Factor Models and Pairs-Trading
3 Using the Given Financial Data
4 A Model for the Residual Process
5 Generating a Trading Signal
Part II Empirical Factors
6 Empirical Correlation Matrices
7 The Marchenko-Pastur Distribution for the Distribution of Eigenvalues

1 Abstract

In this project, we develop the basic pieces of a statistical arbitrage trading algorithm. In Part I we investigate the trading signal used to drive the algorithm with a reduced set of data; in Part II we use a larger data set to create empirical factors to drive our trading strategy. This report closely follows the algorithm detailed in Avellaneda and Lee's 2010 paper "Statistical Arbitrage in the US Equities Market".

Part I Signal Generation

2 Factor Models and Pairs-Trading

In this part we look at how a trading signal might be developed for a given stock when there is only one factor available. Specifically, we look at JP Morgan's stock (ticker JPM) and use the XLF ETF (the financial-sector ETF) as our factor. We want to start trading on Sep 1st, 2014 and stop trading on Mar 1st, 2015.

One common technique for predicting the behavior of a stock is to use factor models. In this setting, historical data of the stock price or returns are regressed on certain factors; when the factors are observed in the future they can be used to make predictions about the stock's behavior. Mathematically we write this as

\[ r_{stock} = \beta^T F \]

where F is a vector of factors and β is a vector of loadings; the βs can be determined through regression.

Alternatively, this model can be arrived at by studying pairs-trading. Suppose S and F are two stocks (or a stock and an ETF) with similar characteristics. One then expects the returns of the two to be highly correlated and, once this has been corrected for, the discrepancy between the returns can be suitably modeled. In continuous form this can be stated as

\[ \frac{dS_t}{S_t} = \alpha\,dt + \beta\,\frac{dF_t}{F_t} + dX_t \]

where \( d\tilde{X}_t = \alpha\,dt + dX_t \) is a model for the residual process and \( X_t \) is a mean-reverting process. A large value of \( X_t \) suggests that the return of S is uncharacteristically large and likely to drop (relative to F), so one should go short in S and long in F.
Observation 1

For the model

\[ \frac{dS_t}{S_t} = \alpha\,dt + \beta\,\frac{dF_t}{F_t} + dX_t = \beta\,\frac{dF_t}{F_t} + d\tilde{X}_t \qquad (1) \]

β is proportional to the correlation coefficient of the stock \( S_t \) with the factor or ETF \( F_t \). Thus, it is a measure of how strongly the two are linked. In this case, since the returns are time series, β can be read as an indication of the degree of co-integration between the two. A positive value means that both increase together. A negative value, on the other hand, means that when the returns of one decrease, the returns of the other increase. If we denote the returns vector of \( S_t \) by \( r_S \) and the returns vector of \( F_t \) by \( r_F \), then we know from simple regression analysis that

\[ \beta = \mathrm{cor}(r_S, r_F)\sqrt{\frac{\mathrm{var}(r_S)}{\mathrm{var}(r_F)}} = \mathrm{sign}(\mathrm{cor}(r_S, r_F))\,\left|\mathrm{cor}(r_S, r_F)\right|\sqrt{\frac{\mathrm{var}(r_S)}{\mathrm{var}(r_F)}} \qquad (2) \]

and thus the sign of the correlation determines the sign of β.
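The relation between β and the correlation coefficient in equation (2) can be checked numerically. The following is a minimal sketch on synthetic return series; the series, their coefficients and the variable names are assumptions made for illustration and are not the project data. It uses the backslash operator for the least-squares fit so that no toolbox is needed.

```matlab
% Synthetic, hypothetical return series (not the project data).
t  = (1:60)';
rF = 0.01 * sin(0.7 * t);                      % factor (ETF) returns
rS = 0.0005 + 1.3 * rF + 0.002 * cos(2.1 * t); % stock returns with a small residual

% Ordinary least squares with an intercept: coef(1) is alpha, coef(2) is beta.
coef    = [ones(60,1) rF] \ rS;
betaOLS = coef(2);

% The same slope from the sample correlation and the two standard deviations.
c        = corrcoef(rS, rF);
betaCorr = c(1,2) * std(rS) / std(rF);

fprintf('beta (OLS) = %.6f, beta (corr) = %.6f\n', betaOLS, betaCorr);
```

The two numbers agree up to rounding because the regression slope is the sample covariance divided by the variance of the factor returns, which is the correlation times the ratio of standard deviations.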

Observation 2

We know that the XLF is the exchange traded fund consisting of stocks from the financial sector. It has 21 stocks with JPM having an index weight of 7.66 % as of March 11, 2015 as per http: // Since JP Morgan Chase is a financial sector company, we expect that its stock will perform well (poorly) when the entire sector does well (poorly). Thus, we expect that the returns of JPM and XLF will be positively correlated, and hence we expect a positive sign for β.

Some Definitions

Market-Neutral

A market-neutral trading strategy is a hedging strategy to avoid market risk. Ideally, it means that the portfolio we create has zero correlation to the market. Market-neutral strategies are attained by taking matching long and short positions in different stocks, so that more of the return comes from good stock selection and less from broad market movements. As defined by Avellaneda and Lee, a market-neutral portfolio is one where the dollar amounts \( Q_i \), i = 1, ..., N, invested in each of the stocks are such that

\[ \sum_{i=1}^{N} \beta_{ij} Q_i = 0, \qquad j = 1, 2, \ldots, m \qquad (3) \]

where \( \beta_{ij} \) is the β of stock i on the factor j.

Dollar-Neutral

A dollar-neutral strategy is a type of market-neutral strategy. It entails simultaneously establishing a long and a short position in two similar stocks (e.g. from the same sector), with each position having the same absolute dollar amount. This strategy seeks to generate significant and consistent returns while controlling risk by maintaining a low correlation to broader market averages.

β-neutral

β-neutral trading strategies are another way of achieving market-neutrality. If we use a market index such as the S&P 500 as the indicator of market behavior and let β be the regression coefficient of a given stock on the S&P 500, and hence on the market, then a β-neutral strategy creates a portfolio made up of stocks with a weighted average beta of 0, meaning that the portfolio has no market exposure.
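As an illustration of the market-neutrality condition in equation (3), the sketch below starts from an arbitrary set of dollar positions and removes the part that carries factor exposure. The loadings matrix B, the starting positions Q0 and the projection step are all assumptions chosen for this example; the report itself does not prescribe how the Q_i are found.

```matlab
% Hypothetical loadings of N = 4 stocks on m = 2 factors (illustrative numbers).
B  = [1.2  0.3;
      0.8 -0.1;
      1.0  0.5;
      0.9  0.2];
Q0 = [100; -50; 75; -25];        % desired dollar positions before hedging

% Project Q0 onto the null space of B': Q = Q0 - B*(B'*B)^(-1)*B'*Q0,
% so that sum_i beta_ij * Q_i = 0 for every factor j.
Q = Q0 - B * ((B' * B) \ (B' * Q0));

exposure = B' * Q;               % factor exposure of the adjusted portfolio
disp(Q');
disp(exposure');
```

The projection changes the positions as little as possible in the least-squares sense; with a single factor and two assets it reduces to the familiar hedge of one dollar of stock against β dollars of the factor.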
The aim, again, is to generate a profit without being exposed to market risk.

Observation 3

Assume that we regress a stock on the market (e.g. the S&P 500) after subtracting the means from the data vectors and scaling them by their standard deviations, so that the data are standardized to mean 0 and variance 1. Then β is the correlation coefficient of the stock with the market. The market has a β of 1 with itself (by definition). Stocks perfectly correlated with the market have a beta of 1. A security that is perfectly negatively correlated with the market has a beta of -1, and a security that has no correlation with the market has a beta of 0. Thus, if we are truly aiming to be market neutral, we want to be uncorrelated with market movements and hence want a small absolute value of β, close to 0.

3 Using the Given Financial Data

We have daily financial data for both JPM (JPMorgan Chase) and the XLF ETF (Financial Select Sector SPDR ETF) in CSV format.

Observation 1

To generate the β for Dec 1st, 2014, we need the 60 trading days prior to this day. The period between Sep 5th, 2014 and Dec 1st, 2014 has 87 calendar days, of which 27 are non-trading days: 26 weekend days (13 Saturdays and 13 Sundays) and 1 public holiday (Thanksgiving Day). This leaves 60 trading days.

To generate the β for Mar 1st, 2015 (a Sunday, hence a non-trading day), we again need the 60 trading days prior to this day. The period between Dec 2nd, 2014 and Mar 1st, 2015 has 89 calendar days, of which 29 are non-trading days: 25 weekend days (13 Saturdays and 12 Sundays) and 4 holidays (Christmas Day, Thursday, December 25, 2014; New Year's Day, Thursday, January 1, 2015; Martin Luther King Day, Monday, January 19, 2015; Presidents Day, Monday, February 16, 2015). This again leaves 60 trading days.

Thus, the total window over which data is required is: June 6th, 2014 to February 27, 2015.

Note: In the given data files, data is given starting Sep 2nd, 2014 up to Feb 27th, 2015. Thus, we can start using a 60-day window starting November 25th, 2014.

Observation 2

The price of an asset as a function of time is perhaps the most natural financial time series, but it is not the best way to manipulate the data mathematically. The price of a reasonable asset tends to grow roughly exponentially in time, whereas mathematical tools such as correlation and regression work most naturally with linear functions. Note that the mean value of an exponentially increasing time series has no obvious meaning, and the derivative of an exponential function is again exponential. However, the simple net return defined as

\[ R_t = \frac{P_t}{P_{t-1}} - 1 = \frac{P_t - P_{t-1}}{P_{t-1}} \qquad (4) \]

has much more robust properties. The scale or units of the prices are irrelevant. A negative return means the asset has declined in value and a positive return means that the asset has increased in value.
A zero return means that the asset is unchanged in value. The return is a complete and scale-free summary of investment performance. Furthermore, a nice property of returns is that multiplying the gross returns \( 1 + R_t \) gives the return over a longer period:

\[ 1 + R_t[k] = \frac{P_t}{P_{t-k}} = \prod_{j=0}^{k-1} \frac{P_{t-j}}{P_{t-j-1}} = \prod_{j=0}^{k-1} (1 + R_{t-j}) \qquad (5) \]

Events like a market crash can trigger a large discrepancy between the prices on two contiguous days. For example, on October 19th, 1987, the stock market crash caused the IBM stock price to fall from $134 to $103 on the same trading day. However, these price changes are algorithmically insignificant: if we are market neutral, the gain on our long position will negate the loss on the short position, thereby keeping our portfolio unaffected. When using returns, since the returns depend only on the ratio of prices rather than on the prices themselves, we can ensure that the stock prices reference the same price at a given time by dividing all the prices by a fixed constant price.

Observation 3

We choose to perform all predictions with Adjusted Closing Prices. There are I/O problems with the given CSV file, in that MATLAB cannot read in the headers correctly using textread or dlmread. Thus, we create two files of the adjusted closing prices for XLF and JPM and read those instead. Note that the data is given in reverse chronological order. Thus, we have to flip the data column vector to make it chronological.

Loss of one day due to calculation of returns: Note that we have data from Sep 2nd, 2014 to Feb 27th, 2015. Thus, we can calculate returns from Sep 3rd, 2014 to Feb 27th, 2015. Thus, we can use a 60-day window prior to the trading day under consideration (Avellaneda Paper page 762) starting Nov 26th, 2014 up to Feb 27th, 2015, which gives us predictions of β for 63 days.

The MATLAB script to perform the read and calculate the β is as follows:

    clc; clear all; close all;
    % 2 - Using Given Financial Data
    % =========================================================================
    % Read in the financial data (stored in reverse chronological order)
    JPM = dlmread('JPM_AdjClose.txt');
    JPM = (fliplr(JPM'))';   % flip to chronological order
    XLF = dlmread('XLF_AdjClose.txt');
    XLF = (fliplr(XLF'))';   % flip to chronological order
    N = length(JPM);
    % Create returns vector
    retjpm = (JPM(2:N) - JPM(1:(N-1)))./JPM(1:(N-1));
    retxlf = (XLF(2:N) - XLF(1:(N-1)))./XLF(1:(N-1));
    N = length(retjpm);
    % Evaluate regression coefficient
    alphavec = zeros((N-60),1); % Drift for Nov 26th, 2014 to Feb 27th, 2015
    betavec = zeros((N-60),1);  % Beta for Nov 26th, 2014 to Feb 27th, 2015
    for i = 61:1:N
        % Regress the previous 60 JPM returns on a constant and the XLF returns
        [beta,betaint] = regress(retjpm((i-60):(i-1)),...
            [ones(60,1) retxlf((i-60):(i-1))]);
        alphavec(i-60) = beta(1);
        betavec(i-60) = beta(2);
    end

Observation 4

Now, we plot the β and the returns of the JPM stock.
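The return calculations described above (flipping a newest-first price series, computing the simple net returns of equation (4), and compounding them as in equation (5)) can be sketched on a tiny example. The four prices below are made up for illustration and do not come from the JPM or XLF files.

```matlab
% Hypothetical adjusted closing prices, newest first (as in the data files).
pricesNewestFirst = [104.0; 101.5; 102.0; 100.0];

% Flip to chronological order before computing returns.
P = flipud(pricesNewestFirst);

% Simple net returns, equation (4): R_t = (P_t - P_{t-1}) / P_{t-1}.
R = diff(P) ./ P(1:end-1);

% Equation (5): gross returns multiply to give the multi-period gross return.
grossCompounded = prod(1 + R);
grossDirect     = P(end) / P(1);

% Scale invariance: quoting prices in different units leaves returns unchanged.
Rscaled = diff(100 * P) ./ (100 * P(1:end-1));

fprintf('compounded = %.6f, direct = %.6f\n', grossCompounded, grossDirect);
```

One return is lost relative to the number of prices, which is the one-day loss mentioned in the text.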

[Figure: two panels, "Time Evolution of β" (β vs. Dates) and "Time Evolution of JPM Returns" (Returns from JPM vs. Dates)]

The βs do not vary as rapidly as the returns of the stock. A quantitative measure of how rapidly a signal varies is the autocorrelation of the signal. The longer the correlation length (defined as the lag beyond which the autocorrelation drops below the standard error), the slower its variation. We see that the correlation length of the returns is 1 whereas the correlation length of β is 16. The plots below show these autocorrelations.

[Figure: two panels, "Autocorrelation of β" and "Autocorrelation of r_JPM" (Sample Autocorrelation vs. Lag)]

The relevant code is

    % Plot autocorrelation of betavec
    h1=figure(1);
    autocorr(betavec);
    xlabel('Lag','Interpreter','Latex','FontSize',18,'FontWeight','bold');
    ylabel('Sample Autocorrelation','Interpreter','Latex','FontSize',18,...
        'FontWeight','bold');
    title('Autocorrelation of $\beta$','Interpreter','Latex','FontSize',18,...
        'FontWeight','bold');
    set(gca,'FontSize',16,'FontWeight','bold');

    saveas(h1,'betavecautocorr','epsc');
    % Plot autocorrelation of retjpm
    h2=figure(2);
    autocorr(retjpm);
    xlabel('Lag','Interpreter','Latex','FontSize',18,'FontWeight','bold');
    ylabel('Sample Autocorrelation','Interpreter','Latex','FontSize',18,...
        'FontWeight','bold');
    title('Autocorrelation of $r_{jpm}$','Interpreter','Latex','FontSize',18,...
        'FontWeight','bold');
    set(gca,'FontSize',16,'FontWeight','bold');
    saveas(h2,'retjpmautocorr','epsc');
    % Plot betavec
    h3=figure(3);
    dates = textread('date.txt','%s');
    dates = (fliplr(dates'))';   % flip to chronological order
    plot(betavec,'LineWidth',1.2);
    axis tight;
    a = [dates(1) dates(2) dates(3) dates(4) dates(5) dates(6)];
    set(gca,'XtickLabel',a);
    xlabel('Dates','Interpreter','Latex','FontSize',18,'FontWeight','bold');
    ylabel('$\beta$','Interpreter','Latex','FontSize',18,'FontWeight','bold');
    title('Time Evolution of $\beta$','Interpreter','Latex','FontSize',18,...
        'FontWeight','bold');
    set(gca,'FontSize',16,'FontWeight','bold');
    grid on;
    saveas(h3,'beta','epsc');
    % Plot JPM
    h4=figure(4);
    dates = textread('date.txt','%s');
    dates = (fliplr(dates'))';   % flip to chronological order
    plot(retjpm(61:123),'LineWidth',1.2);
    axis tight;
    a = [dates(1) dates(2) dates(3) dates(4) dates(5) dates(6)];
    set(gca,'XtickLabel',a);
    xlabel('Dates','Interpreter','Latex','FontSize',18,'FontWeight','bold');
    ylabel('Returns from JPM','Interpreter','Latex','FontSize',18,...
        'FontWeight','bold');
    title('Time Evolution of JPM Returns','Interpreter','Latex','FontSize',18,...
        'FontWeight','bold');
    set(gca,'FontSize',16,'FontWeight','bold');
    grid on;
    saveas(h4,'JPM','epsc');

Observation 5

On the first day, i.e. November 26th, 2014, we evaluate the process:

\[ x(k) = r_{JPM}(k) - \beta_{JPM}\, r_{XLF}(k) - \alpha_{JPM} \]

\[ X(k) = \sum_{l=1}^{k} x(l) \]

The plot of X is presented below, where the black line is the mean of the process.

[Figure: "Time Evolution of Residual on Day 1 (Nov 26th, 2014)"; X residual process (Day 1) vs. Dates]

Indeed, the process appears to be mean reverting. Note that \( X_{60} = 0 \), which is an artifact of the linear least squares regression.

4 A Model for the Residual Process

Consider a portfolio that is long $1 of the stock and short $β of the ETF at time t (where β has been determined by regression on historical data). The return of such a portfolio over a small time window is approximately

\[ r_t\,dt = (r_{stock} - \beta\, r_{ETF})\,dt = d\tilde{X}_t \]

where \( d\tilde{X}_t = \alpha\,dt + dX_t \) is the residual process. For simplicity, we will discuss the case where α = 0, as α is often small in practice. Note that the process \( X_t \) corresponds to the co-integrated process computed in the previous section, which we hypothesized to be mean reverting. Intuitively, we expect the behavior of the stock to fluctuate about the factor. Suppose we model the residual process with the most basic mean-reverting stochastic process, the Ornstein-Uhlenbeck process, with dynamics

\[ dX_t = \kappa (m - X_t)\,dt + \sigma\,dW_t \]

where \( W_t \) is a standard Brownian motion. Under this model,

\[ r_t\,dt = dX_t = \kappa (m - X_t)\,dt + \sigma\,dW_t \]

which has an expected return of \( \kappa (m - X_t)\,dt \). The implications of this for a trading strategy will be discussed below; however, it should be clear that estimation of the parameters of mean reversion is important.
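To make the role of the Ornstein-Uhlenbeck parameters concrete, the sketch below evaluates the conditional mean of the process, its exact one-day AR(1) form and its equilibrium standard deviation. All parameter values (kappa, m, sigma, X0) are hypothetical and chosen only for illustration; the formulas are the standard closed-form results for the OU process.

```matlab
% Hypothetical OU parameters (illustrative only, not estimated from the data).
kappa = 40;       % mean-reversion speed, per year
m     = 0.0;      % long-run mean
sigma = 0.15;     % volatility of the residual
X0    = 0.02;     % starting displacement from the mean
dt    = 1/252;    % one trading day in years

% Conditional mean: E[X_t | X_0] = m + (X0 - m)*exp(-kappa*t).
t     = (0:60)' * dt;
meanX = m + (X0 - m) * exp(-kappa * t);

% Exact one-step discretisation: X_{n+1} = a + b*X_n + zeta_{n+1}.
b       = exp(-kappa * dt);
a       = m * (1 - b);
varZeta = sigma^2 * (1 - b^2) / (2 * kappa);   % variance of zeta
sigmaEq = sigma / sqrt(2 * kappa);             % equilibrium standard deviation

% After one mean-reversion time tau = 1/kappa the expected gap shrinks by 1/e.
tau      = 1 / kappa;
gapAtTau = (X0 - m) * exp(-kappa * tau);

fprintf('b = %.4f, sigmaEq = %.4f, gap after tau = %.5f\n', b, sigmaEq, gapAtTau);
```

The coefficients a and b are what an AR(1) regression on daily data estimates, which is why the regression output can be mapped back to kappa, m and sigma.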

Observation 1

Consider the portfolio that is long $1 of the stock S and short $β of the ETF at time t. Since the ETF is formed out of the stocks in the same industry sector as the stock S, we expect the behavior of the stock S and of all the companies in the sector (and hence of the ETF representing the sector) to be very similar, i.e. they must be correlated. However, there may be particular times when the stock S is temporarily overpriced or underpriced due to conditions particular to the company. It is natural to expect that the same conditions will also affect the price of the stocks in the same industry sector if we believe in an efficient market. This means that the difference between the value of our dollar in the stock S and our β dollars in the ETF will be nullified over a certain time period, which is the mean-reversion time of the residual process. Thus, it makes sense to assume that \( X_t \) is mean-reverting. In other words, we are betting that, in the ideal world, the returns on one dollar invested in stock S and β dollars invested in the ETF will be equal and perfect correlation will be achieved.

Estimation of OU Parameters

The MATLAB script for computing the daily OU parameters between November 26th, 2014 and Feb 27th, 2015 is shown below.

    % 3 - Model for the Residual Process
    % =========================================================================
    % Declare memory allocation
    alphavec = zeros((N-60),1);
    betavec = zeros((N-60),1);
    kappa = zeros((N-60),1);
    m = zeros((N-60),1);
    sigma = zeros((N-60),1);
    sigmaeq = zeros((N-60),1);
    a = zeros((N-60),1);
    b = zeros((N-60),1);
    varzeta = zeros((N-60),1);
    s = zeros((N-60),1);
    dt = 1/252; % 252 trading days in a year
    % Computation of OU parameters for all days between Nov 26th, 2014 and Feb 27th, 2015
    for i = 61:1:N
        % Evaluate regression coefficient
        [beta,betaint,res] = regress(retjpm((i-60):(i-1)),...
            [ones(60,1) retxlf((i-60):(i-1))]);
        alphavec(i-60) = beta(1);
        betavec(i-60) = beta(2);
        % Compute residual process on day i (cumulative sum of the residuals)
        X = zeros(60,1);
        for j = 1:60
            X(j) = sum(res(1:j));
        end
        % Regress as per AR 1 model
        [beta1,beta1int,res1] = regress(X(2:60),[ones(59,1) X(1:59)]);
        % Compute OU parameters
        kappa(i-60) = -log(beta1(2))/dt;
        m(i-60) = beta1(1)/(1-beta1(2));
        sigma(i-60) = sqrt(var(res1)*2*kappa(i-60)/(1-beta1(2)^2));
        sigmaeq(i-60) = sqrt(var(res1)/(1-beta1(2)^2));
        b(i-60) = exp(-kappa(i-60)*dt);
        a(i-60) = m(i-60)*(1-b(i-60));
        varzeta(i-60) = var(res1);
        % s-score at the end of the window, where X = 0
        s(i-60) = -m(i-60)/sigmaeq(i-60);
    end

Histogram of Mean Reversion Times

A histogram of the mean reversion times 1/κ is shown below. In Avellaneda (page ), the hypothesis is that the residual process X can be modeled as the Ornstein-Uhlenbeck process

\[ dX(t) = \kappa (m - X(t))\,dt + \sigma\,dW(t) \]

and that the parameters κ, m and σ vary slowly in relation to the Brownian motion increments dW(t) in the time window of interest. In the current simulation, the residual process for JPM is estimated over a 60-day time window, assuming implicitly that the parameters are constant over the window.

For the JPM stock, we see that the average mean reversion time is 0.026 with a standard deviation of 0.0113. Furthermore, if κ >> 1, the stock reverts quickly to its mean, and for us to be consistent with the hypothesis of constant parameters we want 1/κ << T_i = 60/252 = 0.2381. This is the case here, since the most frequent mean reversion time of 0.02 is about 9 % of T_i (as 0.02/0.23 ≈ 9%). The mean reversion times for the JPM stock are spread over an interval of length about 0.04, and since this interval is much smaller than T_i (0.04/0.2381 ≈ 17%), we can accept the hypothesis for the JPM stock and say that the 60-day window was reasonable.

Note: In the Avellaneda paper, they select stocks with mean-reversion times less than 1/2 the period. For us, the longest mean-reversion time is 1/4 the period. Thus, we are clearly in the safe zone.
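The mapping from the AR(1) regression to the OU parameters can also be tried in isolation, without the Statistics Toolbox. The sketch below is an assumption-laden illustration, not the script used for the report: it simulates a hypothetical AR(1) series with made-up coefficients (aTrue, bTrue) and random noise, fits it with the backslash operator in place of regress, and applies the same parameter formulas as the listing above. Results vary from run to run because the noise is random.

```matlab
% Hypothetical AR(1) series X_{n+1} = a + b*X_n + e_n (illustrative values only).
aTrue = 0.001;  bTrue = 0.7;  dt = 1/252;
n = 60;
e = 0.002 * randn(n-1, 1);
X = zeros(n, 1);
for k = 1:n-1
    X(k+1) = aTrue + bTrue * X(k) + e(k);
end

% Lag-1 least-squares regression (backslash replaces regress in this sketch).
coef  = [ones(n-1,1) X(1:n-1)] \ X(2:n);
aHat  = coef(1);
bHat  = coef(2);
resid = X(2:n) - (aHat + bHat * X(1:n-1));

% Map the AR(1) estimates to OU parameters.
kappa   = -log(bHat) / dt;                       % mean-reversion speed
m       = aHat / (1 - bHat);                     % long-run mean
sigmaEq = sqrt(var(resid) / (1 - bHat^2));       % equilibrium standard deviation
sigma   = sigmaEq * sqrt(2 * kappa);             % OU volatility
sScore  = -m / sigmaEq;                          % s-score with X taken as 0

fprintf('bHat = %.3f, 1/kappa = %.4f years, s = %.3f\n', bHat, 1/kappa, sScore);
```

With only 60 observations the estimate of b is noisy, which is one reason the report looks at the whole distribution of mean-reversion times rather than a single value.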

[Figure: "Histogram of Mean-Reversion Times"; Frequency vs. Mean Reversion Times 1/κ]

Observation 2

A histogram of the AR-1 coefficient b for the X process is shown below. The mean of b is 0.8387 and the standard deviation is 0.0544. Thus, the histogram of b is much more compact than that of τ = 1/κ. A quantitative measure of this is the ratio of standard deviation to mean (σ/µ). The ratio for τ is about 43.5 % whereas the ratio for b is about 6%. We can also evaluate the skewness of τ and b. We obtain:

Skewness of τ = 1.23
Skewness of b = -.14

Clearly, τ is more skewed than b. Recall that the two are related as

\[ b = \exp(-\kappa\,\Delta t) \]

The skewness of b is smaller than the skewness of κ since the change in b due to κ is damped by the exponential factor, as is evident from

\[ \frac{db}{d\kappa} = -\Delta t\, e^{-\kappa \Delta t} = -\Delta t\, e^{-\Delta t/\tau}. \]

Furthermore, the skew of τ is positive, implying that there is a tendency at some time instances for the assumptions of our hypothesis to fail, i.e. τ << T_i no longer holds. On the other hand, the skew of b is negative, meaning that there are time instances where the assumptions become more relevant, i.e. b is not close to 1. However, the bulk behavior of τ and b satisfies the assumptions in our hypothesis reasonably well.
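The relation b = exp(-κ Δt) and the sensitivity of b to κ can be checked with a few lines. This sketch takes the mean value b = 0.8387 quoted above as its only input; the finite-difference step h is an arbitrary choice made for the illustration.

```matlab
dt = 1/252;                  % one trading day in years
b  = 0.8387;                 % mean AR(1) coefficient reported in the text

% Invert b = exp(-kappa*dt) to get the mean-reversion speed and time.
kappa = -log(b) / dt;
tau   = 1 / kappa;

% Analytic derivative of b with respect to kappa: -dt*exp(-kappa*dt).
dbdkappa = -dt * exp(-kappa * dt);

% Central finite difference as an independent check.
h = 1e-4;
dbdkappaFD = (exp(-(kappa + h) * dt) - exp(-(kappa - h) * dt)) / (2 * h);

fprintf('kappa = %.2f, tau = %.4f, db/dkappa = %.3e\n', kappa, tau, dbdkappa);
```

For this value of b the sketch gives κ of roughly 44 per year and τ of roughly 0.023 years. Note that this is the mean-reversion time implied by the mean of b, which need not equal the mean of the individual 1/κ values because the mapping is nonlinear.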

[Figure: "Histogram of b"; Frequency vs. AR-1 Coefficient b]

5 Generating a Trading Signal

Suppose we invest $1 in the stock and short $β of the factor at time zero. Then the profit of this investment is given by

\[ \exp\left(\int_0^t r_s(u)\,du\right) - \beta \exp\left(\int_0^t r_f(u)\,du\right) - (1 - \beta) \]

which for small times is approximately

\[ (1 + r_s(0)\,dt) - \beta\,(1 + r_f(0)\,dt) - (1 - \beta) = r_s(0)\,dt - \beta\, r_f(0)\,dt = d\tilde{X}_0 \]

which has expectation (assuming α = 0)

\[ \kappa (m - X_t)\,dt \]

What this says is that if you were to go long a dollar in the stock and short β dollars of the factor, your (expected) instantaneous rate of return would be \( \kappa (m - X_t)\,dt \). So, clearly, you want to do this when \( m - X_t \) is positive. Furthermore, as the cumulated rate of return from the start of entering the position is the integral

\[ \int_0^t \kappa (m - X_u)\,du + \int_0^t \sigma\,dW_u \]

you want to remain in this position as long as the expected additional rate of return is positive, that is, while \( m - X_t > 0 \). The closer to zero \( m - X_t \) becomes, the more dominant the stochastic fluctuations are, and the more likely you are to start losing money (by switching to a region where the difference in the two positions becomes negative). Therefore you want to enter this position at a time when \( m - X_t \) is sufficiently positive that you will remain in the position long enough to account for transaction costs, but exit at some point before \( m - X_t \) becomes negative.

The parameter 1/κ = τ is the characteristic mean-reversion time of the process: the time it would take the deterministic system to move closer to m by a factor of e (the half-life is τ ln 2). In this sense, κ determines how quickly the process will return to its mean, and thus how quickly \( m - X_t \) goes to zero. In the stochastic setting, it gives a measure of how frequent the excursions from the mean are, and how long they will last. This relationship between κ and the average fluctuation size is evident from the role it plays in the equilibrium standard deviation of the process:

\[ \sigma_{eq} = \frac{\sigma}{\sqrt{2\kappa}} \]

\( \sigma_{eq} \) can be used to non-dimensionalize the process by measuring the size of \( m - X_t \) in terms of standard deviations. For example, taking

\[ s = \frac{X_t - m}{\sigma_{eq}} \]

scales the displacement from the mean by the average size of such deviations, allowing one to detect large excursions easily across different processes. A large and positive value of s corresponds to the belief that the return of the stock will fall relative to the return of the factor, so shorting the stock and buying the factor is a strong position; when s becomes smaller (but still positive), it is a good time to exit such a position. Conversely, when s is large and negative, taking a long position in the stock and a short position in the factor is correct. When computing the s value, updated estimates of β, κ, σ and m are used, based on the last 60 days. A trading strategy can thus be defined by four numbers: \( s_{long,enter} \), \( s_{long,exit} \), \( s_{short,enter} \) and \( s_{short,exit} \), where \( s_{long,enter} < s_{long,exit} < 0 \) and \( s_{short,enter} > s_{short,exit} > 0 \).

Observation 1

For each 60-day window, we can perform the regression of the returns of the stock on the returns of the ETF and obtain the residuals:

\[ r_n^{JPM} = \beta_0 + \beta\, r_n^{XLF} + \epsilon_n, \qquad n = 1, 2, \ldots, 60 \]

so that we get the auxiliary process

\[ X_k = \sum_{j=1}^{k} \epsilon_j, \qquad k = 1, 2, \ldots, 60 \]

Note that \( X_{60} = 0 \) is an artifact of the regression, due to the fact that the betas and residuals are estimated using the same samples. From the AR-1 model, we then estimate the parameters κ, m and σ using the formulas provided in the appendix of Avellaneda (page ). We can then calculate

\[ \sigma_{eq} = \frac{\sigma}{\sqrt{2\kappa}} \]

We are now ready to trade. The closer we are to the mean m, the more dominant the stochastic fluctuations become.
These fluctuations start representing a significant overpricing or underpricing of the stock with respect to the factors when we are sufficiently far away from the mean, and thus it is justified to take a position when this happens. A non-dimensional measure of this distance from the mean is the Z-score for the process, denoted by the letter s and called the s-score here. Hence, define

\[ s = \frac{X_t - m}{\sigma_{eq}} = -\frac{m}{\sigma_{eq}} \]

noting that \( X_t = 0 \) at the end of the estimation window. The s-score allows us to detect large excursions. Given the prior belief that the residual is mean-reverting, a large positive value means that the return of the stock will fall in a future time window, and thus entering into a short position now and closing it later will generate profits; whereas a large negative value means that the stock is currently underpriced and that the returns of the stock will rise, and thus entering into a long position now and closing it later will generate profits. We thus define explicitly the rules for trading based on the trading signal s.

Observe the trading signal and execute as below:
If s ≤ s_long,enter: enter a long position in the stock.
If s ≥ s_long,exit: if there is an open long position on the stock, exit the long position.
If s ≥ s_short,enter: enter a short position in the stock.
If s ≤ s_short,exit: if there is an open short position on the stock, exit the short position.

Trading Signal

The trading signal generated over the trading period from Nov 26th, 2014 to Feb 27th, 2015 is shown below.

[Figure: "Statistical Arbitrage"; Trading Signal vs. Time, with the Enter Short, Exit Short, Exit Long and Enter Long levels marked]

It is seen that we enter a long position and close it 2 times. Similarly, we enter a short position and close it 1 time. The code for generating this signal has already been supplied in Section 4 (Estimation of OU Parameters).

Strategy Execution

We can now execute the trading strategy over the data provided for the last 93 days if the averaging period is reduced to 30 days. We can then evaluate the Sharpe ratio of the statistical arbitrage strategy as opposed to the Sharpe ratio of trading just the JPM and XLF stocks. First let us generate the trading signal and show the positions we take on 1 share of JPM (with the opposite position on β shares of XLF). Shown below is the trading signal.

[Figure: "Statistical Arbitrage"; Trading Signal vs. Time, with legend entries Trading Signal, Long Enter, Long Exit, Short Exit, Short Enter and Position Held]

The blue line is the trading signal based on the s-score and the pink line is the position we take on 1 share of JPM. Here +1 stands for going long and -1 stands for going short. We see that we make a profit, and we can evaluate the Sharpe ratio for this profit. We assume that there is a 5% risk-free growth rate and that any money left over or lost at the end of the day is put into or taken from a bank account with an interest rate of 5%. We see that:

1. Sharpe Ratio of Statistical Arbitrage of JPM and XLF =
2. Sharpe Ratio of trading just JPM =
3. Sharpe Ratio of trading just XLF = .216

The code is supplied below.

    clc; clear all; close all;
    % 4 - Generating a Trading Signal
    % =========================================================================
    % Read in the financial data (stored in reverse chronological order)
    JPM = dlmread('JPM_AdjClose.txt');
    JPM = (fliplr(JPM'))';   % flip to chronological order
    XLF = dlmread('XLF_AdjClose.txt');
    XLF = (fliplr(XLF'))';   % flip to chronological order
    N = length(JPM);
    % Create returns vector
    retjpm = (JPM(2:N) - JPM(1:(N-1)))./JPM(1:(N-1));

    retxlf = (XLF(2:N) - XLF(1:(N-1)))./XLF(1:(N-1));
    N = length(retjpm);
    % Declare memory allocation
    alphavec = zeros((N-30),1);
    betavec = zeros((N-30),1);
    kappa = zeros((N-30),1);
    m = zeros((N-30),1);
    sigma = zeros((N-30),1);
    sigmaeq = zeros((N-30),1);
    a = zeros((N-30),1);
    b = zeros((N-30),1);
    varzeta = zeros((N-30),1);
    s = zeros((N-30),1);
    dt = 1/252; % 252 trading days in a year
    for i = 31:1:N
        % Evaluate regression coefficient over the previous 30 days
        [beta,betaint,res] = regress(retjpm((i-30):(i-1)),...
            [ones(30,1) retxlf((i-30):(i-1))]);
        alphavec(i-30) = beta(1);
        betavec(i-30) = beta(2);
        % Compute residual process on day i (cumulative sum of the residuals)
        X = zeros(30,1);
        for j = 1:30
            X(j) = sum(res(1:j));
        end
        % Regress as per AR 1 model
        [beta1,beta1int,res1] = regress(X(2:30),[ones(29,1) X(1:29)]);
        % Compute OU parameters
        kappa(i-30) = -log(beta1(2))/dt;
        m(i-30) = beta1(1)/(1-beta1(2));
        sigma(i-30) = sqrt(var(res1)*2*kappa(i-30)/(1-beta1(2)^2));
        sigmaeq(i-30) = sqrt(var(res1)/(1-beta1(2)^2));
        b(i-30) = exp(-kappa(i-30)*dt);
        a(i-30) = m(i-30)*(1-b(i-30));
        varzeta(i-30) = var(res1);
        s(i-30) = -m(i-30)/sigmaeq(i-30);
    end
    % Walk through the signal and record the position held on each day
    counter = 1;
    position = zeros(length(s),1);
    profitflag = zeros(length(s),1);
    while (counter < length(s))
        if (s(counter) >= 1.25)

            % Enter short; stay short until the signal falls below 0.75
            profitflag(counter) = 1;
            while (s(counter) >= 0.75 && counter < length(s))
                position(counter) = -1;
                counter = counter + 1;
                fprintf('a%d\n',counter);
            end
            profitflag(counter) = -1;
            continue;
        end
        if (s(counter) <= -1.25)
            % Enter long; stay long until the signal rises above -0.5
            profitflag(counter) = -1;
            while (s(counter) <= -0.5 && counter < length(s))
                position(counter) = 1;
                counter = counter + 1;
                fprintf('b%d\n',counter);
            end
            profitflag(counter) = 1;
            continue;
        end
        counter = counter + 1;
        fprintf('c%d\n',counter);
    end
    % Calculate Profit
    profit = sum(profitflag.*retjpm(31:N) - profitflag.*retxlf(31:N).*betavec);
    % Plot the trading signal
    h6=figure(6);
    plot(s,'LineWidth',1.2);
    hold on;
    line1 = -1.25*ones((N-30),1);
    line2 = -0.5*ones((N-30),1);
    line3 = 0.75*ones((N-30),1);
    line4 = 1.25*ones((N-30),1);
    plot(line1,'g-.','LineWidth',1.2);
    plot(line2,'r-.','LineWidth',1.2);
    plot(line3,'r','LineWidth',1.2);
    plot(line4,'g','LineWidth',1.2);
    plot(position,'m','LineWidth',1.2);
    xlim([1 (N-30)]); % x-range only; the y-range is left to MATLAB
    l = legend('Trading Signal','Long Enter','Long Exit','Short Exit',...
        'Short Enter','Position Held');
    set(l,'Interpreter','Latex');
    xlabel('Time','Interpreter','Latex','FontSize',18,'FontWeight','bold');
    ylabel('Trading Signal','Interpreter','Latex','FontSize',18,'FontWeight','bold');

    title('Statistical Arbitrage','Interpreter','Latex','FontSize',18,'FontWeight','bold');
    set(gca,'FontSize',16,'FontWeight','bold');
    grid on;
    saveas(h6,'TradePerformance','epsc');

Part II Empirical Factors

In the following section we study the development of empirical factors for use in the statistical arbitrage algorithm. The data was acquired through CRSP/WRDS. It contains the returns and prices for all of the stocks with a capitalization over 2 billion for the entire time window Sep 1st, 2007 to April 1st, 2014. The data file contains 5 variables: names, tickers, returns, nstocks and ndays.

6 Empirical Correlation Matrices

Suppose we want to generate empirical factors on day J; we will use an \( n_{d,ep} = 252 \) day (one year) window to compute our statistics. We start by standardizing the daily returns: if \( r_{ij} \) is the return of the i-th stock on the j-th day, then the standardized return is

\[ Y_{i,J-j} = \frac{r_{i,J-j} - \bar{r}_i}{s_i}, \qquad j = 0, \ldots, n_{d,ep} - 1 \]

where

\[ \bar{r}_i = \frac{1}{n_{d,ep}} \sum_{j=0}^{n_{d,ep}-1} r_{i,J-j}, \qquad s_i^2 = \frac{1}{n_{d,ep} - 1} \sum_{j=0}^{n_{d,ep}-1} \left( r_{i,J-j} - \bar{r}_i \right)^2 \]

The rows of the matrix \( Y = (Y_{i,J-j}) \), i = 1, ..., n_stocks, j = 0, ..., n_{d,ep} - 1, now have mean zero and variance one. The correlation matrix can then be computed as

\[ \rho_J = \frac{1}{n_{d,ep} - 1}\, Y Y^T \]

It is the principal components of this matrix that will form the basis of our empirical factors. Let \( \rho_J = V \Lambda V^T \) be the eigen-decomposition of the correlation matrix, where \( v^k \) is the k-th eigenvector, corresponding to the k-th eigenvalue \( \lambda_k \), and \( \lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_{n_{stocks}} \). Observe that

\[ \sum_{i=1}^{n_{stocks}} \lambda_i = \mathrm{trace}(\rho_J) = n_{stocks}. \]

The eigen-portfolios, or empirical risk factors, are the vectors \( p^k = (p_i^k)_{i=1,\ldots,n_{stocks}} \), where

\[ p_i^k = \frac{v_i^k}{s_i} \]

Here \( v_i^k \) is the weighting of stock i in the k-th principal component and \( s_i \) is its empirical standard deviation. The interpretation of \( p^k \) is that in the k-th empirical risk factor, we invest \( p_i^k \) dollars in stock i (if it is negative, we short the stock). The return of the k-th eigen-portfolio on day j is

\[ F_{kj} = \sum_{i=1}^{n_{stocks}} \hat{p}_i^k\, r_{ij} \]

where \( \hat{p}_i^k = p_i^k / (e^T p^k) \) (in other words, the weights have been normalized to sum to one). One crucial fact about the returns of the eigen-portfolios is that they are empirically orthogonal. Finally, the "percentage of explained variance" of a given set K of eigen-portfolios is defined by

\[ \frac{\sum_{k \in K} \lambda_k}{n_{stocks}} \times 100\% \]

Observation 1

We know that

\[ \rho_J\, v^{(k)} = \lambda_k\, v^{(k)} \qquad (6) \]

Also, \( r_{ij} \) is the return on the i-th stock on the j-th day and \( \bar{r}_i \) is the mean return on stock i. Thus, up to the normalization constants \( e^T p^k \), which do not affect orthogonality, we have

\[ \sum_{j=0}^{n_{d,ep}-1} (F_{k,J-j} - \bar{F}_k)(F_{k',J-j} - \bar{F}_{k'}) = \sum_{j=0}^{n_{d,ep}-1} \left( \sum_{i=1}^{n_{stocks}} \frac{v_i^{(k)}}{s_i} (r_{i,J-j} - \bar{r}_i) \right) \left( \sum_{i'=1}^{n_{stocks}} \frac{v_{i'}^{(k')}}{s_{i'}} (r_{i',J-j} - \bar{r}_{i'}) \right) \qquad (7) \]

Since \( Y_{ik} = (r_{ik} - \bar{r}_i)/s_i \) and \( \rho_{ii'} = \frac{1}{n_{d,ep}-1} \sum_{k} Y_{ik} Y_{i'k} \), we must have

\[ \sum_{j=0}^{n_{d,ep}-1} (F_{k,J-j} - \bar{F}_k)(F_{k',J-j} - \bar{F}_{k'}) = \sum_{i,i'=1}^{n_{stocks}} v_i^{(k)} v_{i'}^{(k')} \rho_{ii'}\, (n_{d,ep} - 1) = \sum_{i=1}^{n_{stocks}} v_i^{(k)} v_i^{(k')} \lambda_{k'}\, (n_{d,ep} - 1) = \lambda_k\, (n_{d,ep} - 1)\, \delta_{kk'} \qquad (8) \]

which is 0 if k ≠ k'.

Procedure to compute eigenportfolios

We follow the procedure outlined in Avellaneda (pages ) to generate the eigenportfolios for day 252 up to day 502 from the given data. The MATLAB script used to do that is shown below.

    clear all; close all;
    % Part II: Empirical Factors
    % =========================================================================
    % 5: Empirical Correlation Matrices
    % =========================================================================
    % Load data
    load('finalproject_data.mat');
    % Calculate Y matrix of standardized returns
    % Calculation starts on day 252 and goes up to day 502 (251 days)
    F1 = zeros(251,1);
    for day = 252:1:502

    % Calculate rbar on day
    rbar = sum(returns(:,(day-251):day),2)./252;
    % Calculate s^2 on day
    smat = returns(:,(day-251):day);
    for j = 1:252
        smat(:,j) = (smat(:,j) - rbar).^2;
    end
    s2 = sum(smat,2)./251;
    % Calculate matrix Y on day
    Y = returns(:,(day-251):day);
    for j = 1:252
        Y(:,j) = (Y(:,j) - rbar)./sqrt(s2);
    end
    % Calculate matrix rhoj on day
    rhoj = (1/251).*Y*Y';
    % Calculate eigenportfolios on day
    [V,D] = eig(rhoj);
    P = V;
    for j = 1:422
        % Divide the weight of each stock i in eigenvector j by its own
        % standard deviation s_i, as in p_i^k = v_i^k / s_i
        P(:,j) = P(:,j)./sqrt(s2);
    end
    lambda = diag(D);
end

Day 1

On the first day:
(i) Smallest number of eigenportfolios required to explain 55 % of variance = 27 (accounts for %)
(ii) Variance accounted for by the first eigenportfolio = 23.7 %

Density of Eigenvalues

We can plot the density of the eigenvalues of the correlation matrix obtained on day 1. The plot is shown below. The tail of the bulk spectrum is considerable, and when a bin size of 0.05 is selected we see that there are 3 detached eigenvalues. Specifically, they are: , and
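The count reported in item (i) follows directly from the definition of the percentage of explained variance: sort the eigenvalues in decreasing order and accumulate them until their share of the total reaches the target. The following is a minimal sketch in Python rather than the report's MATLAB; the function name `explained_variance_count` and the toy spectrum are hypothetical and are not taken from the project data.

```python
def explained_variance_count(eigenvalues, target_percent):
    """Return (k, percent): the smallest number k of leading eigenvalues
    whose share of the total reaches target_percent, and that share."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)  # equals n_stocks for a correlation matrix
    running = 0.0
    for k, lam in enumerate(ordered, start=1):
        running += lam
        percent = 100.0 * running / total
        if percent >= target_percent:
            return k, percent
    return len(ordered), 100.0


# Hypothetical spectrum for 5 stocks; the eigenvalues sum to 5,
# as they must for a 5 x 5 correlation matrix.
toy_eigenvalues = [0.5, 2.5, 0.25, 1.0, 0.75]
k, pct = explained_variance_count(toy_eigenvalues, 55.0)
print(k, pct)
```

In this toy case the largest eigenvalue alone explains 50 % of the variance, so two eigenportfolios are needed to pass the 55 % threshold.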

[Figure: Density of Eigenvalues of day 1. x-axis: Eigenvalues λ (Bin Size = 0.05); y-axis: Percent of Eigenvalues.]

Cumulative Return of the Principal Eigen-portfolio

[Figure: Cumulative Return from principal eigenportfolio. x-axis: Day of trading window; y-axis: Returns F_{1j}; series: First EigenPortfolio, SPY.]

The return on a given day $j$ of the principal eigen-portfolio, from which the cumulative return is built, is given as

$$F_{1j} = \sum_{i=1}^{n_{stocks}} \hat{p}^{(1)}_i \, r_{ij}, \qquad \text{where } p^{(k)}_i = \frac{v^{(k)}_i}{s_i} \text{ and } \hat{p}^{(k)}_i = \frac{p^{(k)}_i}{e^T p^{(k)}} .$$

We go to the Google Finance website, download the prices of the SPY ETF and calculate its returns. We see that there is excellent agreement between the market behavior and the principal eigenportfolio between Apr 1st 2006 and Apr 1st 2007. The plot is shown.

Financial Implications of Weighting

It has been pointed out in the literature that the first, or dominant, eigenvector from the PCA used to create the first eigenportfolio is associated with the market portfolio, i.e. the capitalization-weighted portfolio of all the stocks under consideration. Their behavior is qualitatively the same. Since the weights assigned to stocks in a capitalization-weighted portfolio must be positive, we see that the eigenportfolio also assigns positive weights to the stocks. Avellaneda and Lee notice that the weights in the dominant eigenportfolio are inversely proportional to the volatilities of the stocks, which is intuitively expected since the larger the market capitalization of a stock, the lower its volatility. Since the weights of the stocks in the first eigenportfolio are all positive and we know from previous parts that the eigenvectors are mutually orthogonal, the weights of the stocks in the other eigenportfolios must be a combination of positive and negative numbers. It is observed that if we relabel the stocks by arranging the coefficients of the eigenvectors in decreasing order, i.e. for the $j$th eigenportfolio we relabel the entries as

$$v^{(j)}_{n_1} \geq v^{(j)}_{n_2} \geq v^{(j)}_{n_3} \geq v^{(j)}_{n_4} \geq \ldots \geq v^{(j)}_{n_N},$$

then the nearest neighbors of stock $n_i$ tend to be in the same industry sector. As $j$ increases, this observation becomes less and less true since the eigenvectors become more noisy.
It is under this paradigm that we can interpret the eigenportfolios, at least the first few after the dominant one, as long-short portfolios at the level of industries or sectors: the $i$th stock, having a negative value, is shorted and the $j$th stock, having a positive value, is held long, with $i$ and $j$ in different industry sectors. Now, consider the picture below of the situation described in the question. The second eigenportfolio corresponds to the situation where we go long on both the $i$th and $j$th stocks, whereas the third eigenportfolio corresponds to the situation where we go short on the $i$th stock and long on the $j$th stock.
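The weighting discussed in this section can be made concrete with a small numerical sketch: divide each eigenvector entry by the stock's standard deviation, normalize so the weights sum to one, and form the daily portfolio return. The Python code below is an illustration only; the helper names, the two-stock data, and the choice to compound daily returns into a cumulative return are all assumptions, since the report does not state how its cumulative return was accumulated.

```python
def eigenportfolio_weights(eigvec, sigmas):
    """p_i = v_i / s_i, normalised so that the weights sum to one."""
    raw = [v / s for v, s in zip(eigvec, sigmas)]
    total = sum(raw)  # e^T p
    if abs(total) < 1e-12:
        raise ValueError("weights sum to ~0; cannot normalise")
    return [p / total for p in raw]


def portfolio_returns(weights, returns_by_stock):
    """Daily return F_j = sum_i w_i * r_ij, with returns_by_stock[i][j]."""
    n_days = len(returns_by_stock[0])
    return [
        sum(w * r[j] for w, r in zip(weights, returns_by_stock))
        for j in range(n_days)
    ]


def cumulative(returns):
    """Compounded cumulative return after each day (an assumed convention)."""
    out, growth = [], 1.0
    for r in returns:
        growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


# Hypothetical two-stock example: the less volatile stock gets more weight.
v = [0.6, 0.8]          # entries of a dominant eigenvector (all positive)
s = [0.02, 0.04]        # empirical standard deviations
r = [[0.01, -0.02],     # daily returns of stock 1
     [0.03, 0.01]]      # daily returns of stock 2

w = eigenportfolio_weights(v, s)
F = portfolio_returns(w, r)
C = cumulative(F)
print(w, F, C)
```

Although the second stock has the larger eigenvector entry, dividing by the standard deviations gives the first stock the larger weight, which is the inverse-volatility effect noted by Avellaneda and Lee.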

7 The Marcenko-Pastur Distribution for the Distribution of Eigenvalues

Suppose that $X$ is a matrix with $m$ rows and $n$ columns and that each entry $X_{ij}$ is an iid mean-zero, variance-one Gaussian random variable. The eigenvalues of the correlation matrix $\frac{1}{n-1} X X^T = V \Lambda V^T$ have the limiting Marcenko-Pastur density

$$\rho(\lambda) = \frac{Q}{2\pi} \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda}, \quad \lambda \in [\lambda_-, \lambda_+]; \qquad \lambda_\pm = \left(1 \pm \sqrt{1/Q}\right)^2,$$

where $Q = n/m$ and the limit is taken as $m, n \to \infty$ so that $Q$ is a constant. In other words,

$$\Pr\left(\lambda_k \in (\lambda, \lambda + d\lambda)\right) = \rho(\lambda)\,d\lambda \approx \frac{\#\text{ of eigenvalues in } (\lambda, \lambda + d\lambda)}{m} .$$

Observation 1

[Figure 1: A plot of the experimental eigenvalue density compared to the Marcenko-Pastur density (Bin Size = 0.05). x-axis: λ; y-axis: Probability Density (pdf); series: Experimental, Marcenko-Pastur.]

We generate an $m \times n$ random matrix with $Q = n/m = 5$, plot the experimental eigenvalue distribution and compare it to the theoretical Marcenko-Pastur density. We see very good agreement in the bulk spectrum. The Marcenko-Pastur distribution, or Marcenko-Pastur law, describing the asymptotic behavior of singular values of large rectangular random matrices, is indeed verified. The plot is shown in Figure 1. The smallest eigenvalue is .356 and the largest eigenvalue is . We also observe that as the bin size is refined, the experimental density fits the Marcenko-Pastur density better and better. The code used to generate the plot is shown below.

% 6 - Marcenko-Pastur Distribution
% =========================================================================

n = 5000; % Number of columns
m = 1000; % Number of rows
Q = n/m;
% Generate matrix
X = normrnd(0,1,m,n);
A = 1/(n-1)*X*X';
% Perform Eigenvalue decomposition
[E,D] = eig(A);
d = diag(D);
% Create density from histogram
locs = 0:0.05:2.5;
[heights] = hist(d,locs);
hts = heights./length(d);
% Create Marcenko-Pastur density
lambda_plus = (1+sqrt(1/Q))^2;
lambda_minus = (1-sqrt(1/Q))^2;
lambda = lambda_minus:0.05:2.2;
% Clamp at zero so the density vanishes outside [lambda_minus, lambda_plus]
rho = Q/2/pi*sqrt(max((lambda_plus - lambda).*(lambda - lambda_minus),0))./lambda;
rho = rho./sum(rho);
% Create figure
figure(1);
plot(locs,hts,lambda,rho,'LineWidth',1.2);
axis([0 2.5 0 0.05]);
l = legend('Experimental','Marcenko-Pastur');
set(l,'Interpreter','Latex');
xlabel('$\lambda$','Interpreter','Latex','FontSize',18,'FontWeight','bold');
ylabel('Probability Density (pdf)','Interpreter','Latex','FontSize',18,'FontWeight','bold');
title('Marcenko-Pastur Density (Bin Size = 0.05)','Interpreter','Latex','FontSize',18,'FontWeight','bold');
set(gca,'FontSize',16,'FontWeight','bold');
grid on;

Observation 2

We compare the density of the eigenvalues obtained from the standardized data set of 502 days to the theoretical Marcenko-Pastur density that would be obtained if the returns were IID.
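The theoretical curve used in these comparisons depends only on the ratio Q. As a hedged cross-check of the formula in Section 7, the Python sketch below evaluates the density, returns zero outside the support, and confirms numerically that it integrates to approximately one for Q = 5. The function names `mp_edges`, `mp_density` and `mp_total_mass` are hypothetical, and the midpoint rule is just one convenient way to integrate.

```python
import math


def mp_edges(q):
    """Support edges (lambda_minus, lambda_plus) for Q = n/m."""
    root = math.sqrt(1.0 / q)
    return (1.0 - root) ** 2, (1.0 + root) ** 2


def mp_density(lam, q):
    """Marcenko-Pastur density; zero outside [lambda_minus, lambda_plus]."""
    lo, hi = mp_edges(q)
    if lam <= lo or lam >= hi:
        return 0.0
    return q / (2.0 * math.pi) * math.sqrt((hi - lam) * (lam - lo)) / lam


def mp_total_mass(q, steps=20000):
    """Midpoint-rule integral of the density over its support."""
    lo, hi = mp_edges(q)
    h = (hi - lo) / steps
    return h * sum(mp_density(lo + (i + 0.5) * h, q) for i in range(steps))


lam_minus, lam_plus = mp_edges(5.0)
mass = mp_total_mass(5.0)
print(lam_minus, lam_plus, mass)
```

Returning zero outside the support plays the same role as restricting the plotting grid to the interval between the two edges: beyond them the square root would have a negative argument.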

[Figure 2: The full density of the eigenvalues obtained from the data. Notice the detached eigenvalues, also called the spike spectrum.]

[Figure 3: A zoomed version of the density of the eigenvalues obtained from the data on the support of the Marcenko-Pastur density. Notice the sharp termination of the theoretical density as compared to the flattening out of the observed density. Further, there is disagreement in the bulk spectrum.]

Observation 3

We randomize each time series in the data independently so as to remove any correlations in the returns. We then perform the same experiment as above and obtain the following.

[Figure 4: The full density of the eigenvalues obtained from the randomized data. Notice there are NO detached eigenvalues.]

[Figure 5: A zoomed version of the density of the eigenvalues obtained from the randomized data on the support of the Marcenko-Pastur density.]
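The randomization in Observation 3 leaves each stock's own distribution of returns untouched and only scrambles the alignment in time between stocks. A minimal Python sketch of that idea follows; the helper names and the two synthetic, perfectly correlated series are hypothetical and stand in for the real returns matrix.

```python
import math
import random


def shuffle_rows(returns_by_stock, seed=0):
    """Permute each time series independently (one permutation per row)."""
    rng = random.Random(seed)
    shuffled = []
    for row in returns_by_stock:
        row_copy = list(row)
        rng.shuffle(row_copy)
        shuffled.append(row_copy)
    return shuffled


def correlation(x, y):
    """Sample correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical series that move together perfectly.
x = [0.001 * t for t in range(200)]
y = [2.0 * value for value in x]

before = correlation(x, y)
sx, sy = shuffle_rows([x, y], seed=0)
after = correlation(sx, sy)
print(before, after)
```

Each shuffled row contains exactly the same values as before, so its mean and variance are unchanged, while the cross-correlation between rows drops from one to a value near zero.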

Now notice the extremely good agreement of the observed density with the theoretical Marcenko-Pastur density. We can conclude the following: when the correlations are removed from the time series data, we see very good agreement with the theoretical Marcenko-Pastur density of the eigenvalues of the correlation matrix. This agreement disappears when we use the data as is, where correlations are present. Thus, we conclude that real-life returns are NOT IID Gaussian variables and that the correlations in the data play an important role. Here we present the code for Questions 2 and 3.

clc; clear all; close all;
% Part II: Empirical Factors
% =========================================================================
% 6: Marcenko-Pastur on data
% =========================================================================
% Load data
load('finalproject_data.mat');
% Standardize Returns Matrix
stdret = returns;
for i = 1:1:422
    stdret(i,:) = stdret(i,:) - mean(stdret(i,:));
    stdret(i,:) = stdret(i,:)./std(stdret(i,:));
end
% Compute data correlation matrix
rho = (1/501)*stdret*stdret';
% Compute eigenvalues of the matrix
[V,D] = eig(rho);
lambda = diag(D);
% Create density from histogram
locs = 0:0.05:15;
[heights] = hist(lambda,locs);
hts = heights./length(lambda);
% Marcenko-Pastur parameters
n = 502; % Number of columns
m = 422; % Number of rows
Q = n/m;
lambda_plus = (1+sqrt(1/Q))^2;
lambda_minus = (1-sqrt(1/Q))^2;
lambdamp = lambda_minus:0.05:4;
% Clamp at zero so the density vanishes outside [lambda_minus, lambda_plus]
densmp = Q/2/pi*sqrt(max((lambda_plus - lambdamp).*(lambdamp - lambda_minus),0))./lambdamp;
densmp = densmp./sum(densmp);

% Create figure
h1 = figure(1);
plot(locs,hts,lambdamp,densmp,'LineWidth',1.2);
%plot(lambdamp,densmp,'LineWidth',1.2);
axis([0 4 0 0.12]);
l = legend('Experimental','Marcenko-Pastur');
set(l,'Interpreter','Latex');
xlabel('$\lambda$','Interpreter','Latex','FontSize',18,'FontWeight','bold');
ylabel('Probability Density (pdf)','Interpreter','Latex','FontSize',18,'FontWeight','bold');
title('Marcenko-Pastur Density (Bin Size = 0.05)','Interpreter','Latex','FontSize',18,'FontWeight','bold');
set(gca,'FontSize',16,'FontWeight','bold');
grid on;
saveas(h1,'MPvsRealZoom','epsc');
% Create random permutations in the time series
for i = 1:1:422
    index = randperm(numel(stdret(i,:)));
    stdret(i,:) = stdret(i,index);
end
% Compute data correlation matrix
rho = (1/501)*stdret*stdret';
% Compute eigenvalues of the matrix
[V,D] = eig(rho);
lambda = diag(D);
% Create density from histogram
locs = 0:0.05:15;
[heights] = hist(lambda,locs);
hts = heights./length(lambda);
% Create figure
h2 = figure(2);
plot(locs,hts,lambdamp,densmp,'LineWidth',1.2);
%plot(lambdamp,densmp,'LineWidth',1.2);
axis([0 4 0 0.12]);
l = legend('Experimental','Marcenko-Pastur');
set(l,'Interpreter','Latex');
xlabel('$\lambda$','Interpreter','Latex','FontSize',18,'FontWeight','bold');
ylabel('Probability Density (pdf)','Interpreter','Latex','FontSize',18,'FontWeight','bold');
title('Marcenko-Pastur Density (Bin Size = 0.05)','Interpreter','Latex','FontSize',18,'FontWeight','bold');
set(gca,'FontSize',16,'FontWeight','bold');
grid on;

saveas(h2,'MPvsReal2Zoom','epsc');
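One way to make the notion of detached eigenvalues quantitative is to compare each eigenvalue with the upper Marcenko-Pastur edge for the data's own ratio of days to stocks. The Python sketch below does this; using the upper edge as the cut-off is an assumption made for illustration and is not stated in the report, and the eigenvalue list is hypothetical. Only the dimensions 502 and 422 come from the script above.

```python
import math


def detached_eigenvalues(eigenvalues, n_days, n_stocks):
    """Return the eigenvalues above the upper Marcenko-Pastur edge,
    largest first, for Q = n_days / n_stocks."""
    q = n_days / n_stocks
    lambda_plus = (1.0 + math.sqrt(1.0 / q)) ** 2
    return sorted((lam for lam in eigenvalues if lam > lambda_plus),
                  reverse=True)


# Hypothetical eigenvalues; with 502 days and 422 stocks the upper
# edge is roughly 3.67, so only the last two values are detached.
toy_spectrum = [0.2, 1.1, 3.5, 4.0, 95.0]
spikes = detached_eigenvalues(toy_spectrum, n_days=502, n_stocks=422)
print(spikes)
```

Applied to the shuffled data of Observation 3, a check of this kind would be expected to return few or no eigenvalues, in line with Figure 4.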
