Performance of risk-based asset allocation strategies

Copenhagen Business School 2015
Master's Thesis, Cand.merc.(mat.)
01/07/2015
Authors: Simen Knutzen, Jens Retterholt
Supervisor: Martin Richter
109 pages

Abstract

This thesis measured the performance and behaviour of the most used risk-based asset allocation strategies, i.e., the global minimum variance portfolio, the equally-weighted risk contributions portfolio, the most diversified portfolio, and the equally-weighted portfolio. We analyzed how well these portfolios performed when re-investing returns, and to what degree they were useful as risk targeting tools. The covariance matrix was estimated with the method of shrinkage, in line with the most recent studies in this field. As the covariance matrix is of great importance in the construction of the portfolios, we included two shrinkage targets to examine the impact the different covariance matrices had on the results. These target matrices were chosen from the constant correlation model and the single-index model. The data subject to study were the historical components of the EURO STOXX 50 from February 1998 to March 2015.

This study found that all the risk-based asset allocation strategies performed better than the market index and the equally-weighted portfolio when re-investing returns. The results among the strategies differed when different covariance matrices were used. The equally-weighted risk contributions portfolio was the most robust towards covariance matrix changes. The global minimum variance portfolio performed best among all strategies when re-investing returns, regardless of the covariance matrix used. The study also found the equally-weighted portfolio to be the most useful for obtaining a constant risk target, since this portfolio was closest to the target and its results were insensitive to covariance matrix changes. The other strategies consistently underestimated their risk, and their risk targeting abilities were highly influenced by the choice of covariance matrix.

Contents

1 Introduction
    Problem formulation
    Limitations
    Structure
2 Theory
    Assumptions
    Definitions
    Covariance matrix
    Sample covariance matrix
    Shrinkage estimator
    Shrinkage targets
    Shrinkage intensity
    Markowitz portfolio theory
    Justification for the mean-variance strategy
    The unconstrained global minimum variance portfolio
    The long-only global minimum variance portfolio
    Maximum diversification
    Diversification ratio
    The most diversified portfolio
    The equally-weighted portfolio
    Risk parity
    Risk contribution and marginal risk contribution
    The equally-weighted risk contributions portfolio
    Numerical examples
    Equal volatility example
    Constant correlation example
    Statistical measures
    Sharpe ratio
    Sortino ratio
    Maximum drawdown
    Skewness and kurtosis
    Turnover and transaction costs
3 Methodology
    Data
    Data source and data type
    The investment universe
    Risk-free rates
    Calculation methodology
    Basic calculations
    Covariance matrix estimation
    Global minimum variance portfolio
    Most diversified portfolio
    Equally-weighted risk contributions portfolio
    Equally-weighted portfolio
    Sharpe ratio
    Sortino ratio
    Maximum drawdown
    Skewness and kurtosis
    Turnover and transaction costs
    Constant risk targeting
4 Results
    How well do these portfolios perform when re-investing returns?
    What are the characteristics of the portfolios' return distributions?
    How concentrated are the portfolio weights?
    What are the levels of turnover and transaction costs?
    To what degree can these portfolios be used to obtain a constant risk target?
    How can our choice of covariance matrix affect results?
5 Discussion
    Discussion
    Further research
6 Conclusion
Bibliography
Appendix A Results using constant correlation as target
    A.1 Figures using the constant correlation model
    A.2 Tables using the constant correlation model
Appendix B R program
    B.1 Main program
    B.2 Functions used in main program
    B.3 Graphing functions

List of Figures

2.1 The efficient frontier
Nominal/risk allocation in the 60/40 strategy
Equal risk contributions
Number of stocks fulfilling data requirement
Net return EURO STOXX 50 in euro
Risk-free rate
Cumulative performance - single-index model
Return-standard deviation - single-index model
Rolling 24-month volatility - single-index model
Rolling 24-month Sharpe ratio - single-index model
Rolling 24-month Sortino ratio - single-index model
Density of portfolio returns - single-index model
Number of stocks invested in for each portfolio - single-index model
Maximum absolute values of portfolio weights - single-index model
Minimum absolute values of portfolio weights - single-index model
Annualized turnover - single-index model
Performance after transaction costs - single-index model
Rolling 24-month volatilities with volatility target 10% - single-index model
Leverage constant over time - single-index model
Rolling 24-month volatilities with volatility target 10% - constant correlation model
Leverage constant over time - constant correlation model
Shrinkage intensity - single-index model
Shrinkage intensity - constant correlation model
Cumulative performance - constant correlation model
A.1 Rolling 24-month volatility - constant correlation model
A.2 Return-standard deviations - constant correlation model
A.3 Rolling 24-month Sharpe ratio - constant correlation model
A.4 Rolling 24-month Sortino ratio - constant correlation model
A.5 Density of portfolio returns - constant correlation model
A.6 Number of stocks invested in for each portfolio - constant correlation model
A.7 Maximum absolute values of portfolio weights - constant correlation model
A.8 Minimum absolute values of portfolio weights - constant correlation model
A.9 Annualized turnover - constant correlation model
A.10 Performance after transaction costs - constant correlation model

List of Tables

2.1 Equal volatility example
Constant correlation example
Transaction costs
Countries and supersectors covered by the EURO STOXX
Component changes in the EURO STOXX
Returns and standard deviations - single-index model
Maximum drawdowns, Sharpe ratios, Sortino ratios - single-index model
Return distribution characteristics - single-index model
Portfolio concentrations - single-index model
Turnover and transaction costs - single-index model
Sharpe ratios after transaction costs - single-index model
Volatilities with target - single-index model
Volatilities with target - constant correlation model
Statistics of shrinkage intensity
Returns and standard deviations - constant correlation model
Summary of results with both targets
Relative changes in results
A.1 Maximum drawdowns, Sharpe ratios, Sortino ratios - constant correlation model
A.2 Return distributions characteristics - constant correlation model
A.3 Portfolio concentrations - constant correlation model
A.4 Turnover and transaction costs - constant correlation model
A.5 Sharpe ratios after transaction costs - constant correlation model

Chapter 1

Introduction

The problem of how to optimally allocate wealth among assets has been, and still is, a major concern for investors. This problem has also attracted the attention of scholars from various fields and has a long history in the literature. The theory and practice of how to construct optimal portfolios have evolved over the years. After two financial crises since the beginning of the millennium, we have observed that investors' appetite for risk has changed. The increased volatility in the market has led to more focus on risk management and on the ability to construct portfolios that can handle turbulence in the market.

Portfolio strategies that only need risk as input have been revisited and expanded in recent years. These strategies do not need estimates of expected returns to be implemented, and they are drawing attention because expected returns are considered notoriously hard to predict. The oldest of these strategies is the global minimum variance portfolio, a special case of the famous mean-variance framework of Markowitz (1952). Markowitz's mean-variance framework was groundbreaking when it was published and is still highly relevant. The idea of creating a portfolio of risky assets with the lowest possible risk in the market is compelling. Thus, in recent years, funds that follow this strategy have been opened.

Choueifaty and Coignard (2008) followed up with another risk-based asset allocation strategy when they revisited the field of diversification and created the maximum diversification framework. With the goal of maximizing the benefits from diversification in mind, they introduced the diversification ratio and the most diversified portfolio, which maximizes this ratio.

Another recently studied risk-based asset allocation strategy was formulated after the idea of budgeting risk. This strategy initiated subsequent discussions on traditional risk parity approaches. The large difference in risk contributions in traditional strategies led to the idea that all assets of a portfolio should contribute equally to risk. This led the way to the equally-weighted risk contributions portfolio described by Maillard et al. (2008), where risk contributions are equal for all components in the portfolio.

1.1 Problem formulation

Some of the risk-based asset allocation strategies are relatively new. Thus, there exist only a few studies on historical data where all of these strategies are included. Given the increased interest in the field, studies of risk-based asset allocation strategies on historical data are relevant. We wanted to contribute to this field of research by studying the theory and background of these models, and by analyzing their performance on historical data from Europe.

In this thesis, the subjects of study were the three most common risk-based strategies as described by Jurczenko et al. (2013): (1) the global minimum variance (GMV) portfolio by Markowitz (1952); (2) the most diversified portfolio (MDP) by Choueifaty and Coignard (2008); and (3) the equally-weighted risk contributions (ERC) portfolio from Maillard et al. (2008). To compare the performance of these three strategies, we included the equally-weighted (1/n) portfolio, which is interesting because of its simplicity and widespread use.

The research question in this thesis is: How do GMV, MDP, ERC, and 1/n portfolios perform and behave in comparison to each other?

To answer the above research question we will answer the following sub questions:
How well do these portfolios perform when re-investing returns?
What are the characteristics of the portfolio return distributions?
How concentrated are the portfolio weights?
What are the levels of turnover and transaction costs?
To what degree can these portfolios be used to obtain a constant risk target?
How can our choice of covariance matrix affect results?

1.2 Limitations

When we refer to risk-based asset allocation strategies in this thesis, we refer to the GMV portfolio, the MDP, and the ERC portfolio. There exist more asset allocation models based solely on risk, but we have chosen to study only the above. In addition to these strategies, the equally-weighted portfolio will be included as a benchmark in this thesis.

To answer the problem formulation and sub questions we will use an investment universe of historical data with only stocks and, in some situations, a risk-free rate. The components of the EURO STOXX 50 from February 27th 1998 to March 31st 2015 will make up the stock investment universe in our analysis. No stocks other than the components of this index at a given point in time will be included, and all conclusions will be based on this data in the specified time period. We know investors most likely have more investment opportunities than these 50 stocks, but we believe an analysis of this investment universe is sufficient to draw conclusions.

The approximation for the risk-free rate in this thesis is the Euribor 1-month from December 31st 1998, and the LIBOR 1-month in Deutsche Mark prior to that date. The reason we use two interest rates is that the euro was introduced January 1st 1999, and another risk-free rate must be used before January 1st 1999. We assume the Euribor 1-month rate was available for trading one day prior to the launch of the euro because we use monthly data from the last trading day of every month. We know these interest rates are not totally risk-free since there exists a small credit risk, but we will treat them as risk-free in this thesis.

We have chosen to estimate the covariance matrix with shrinkage only. We know other promising methods exist, but we limit our analysis to shrinkage. In addition to the limitations above, we limit the analysis by assuming that all assumptions regarding the models and other methods used in this thesis are sufficiently fulfilled. The results we present will not be corrected for potential violations of assumptions, such as normally distributed stock returns.

1.3 Structure

This thesis is structured into five chapters, all of which include subsections. The first chapter contains the introduction, the problem statement, the limitations of this thesis, and now the structure. The rest of this thesis is organized as follows:

Chapter 2 - Theory
We begin this chapter by discussing central assumptions behind the theory used in this thesis. We then define some basic variables used in the derivations in the theory chapter. After the basic definitions we proceed to derive the theory behind covariance matrix estimation with shrinkage, using the constant correlation model and the single-index model. After discussing the covariance matrix we derive the GMV, MDP, ERC, and 1/n portfolios. In addition, we discuss the theory behind some measures of portfolio performance and statistics used in the analysis. These include the Sharpe ratio, Sortino ratio, skewness, kurtosis, turnover, transaction costs, and maximum drawdown.

Chapter 3 - Methodology
In this chapter we discuss the data used in this thesis in detail. We explain how we gathered the data and what data types we have used in the analysis. We explain in detail how we created the investment universe, and provide information about how the EURO STOXX 50 is constructed. Our choices regarding the risk-free rate are also discussed. The rest of the chapter explains how we have used the previously discussed theory in practice. We explain how we have implemented the theory in R, and how this program works. The R script we created to perform all calculations can be seen in the appendix. By running this script on the same data as in this thesis, all results and graphs can be replicated.

Chapter 4 - Results
In this chapter we present the results from the analysis on historical data. This presentation includes graphs and key values in tables, in addition to a presentation of the findings. All results needed to answer the research question and sub questions are presented in this chapter.

Chapter 5 - Discussion
In this chapter we discuss the weaknesses of our analysis and what might have influenced the results. We compare our results to previous findings and outline similarities and differences. We end the chapter with a proposal for future studies.

Chapter 6 - Conclusion
We end this thesis with our conclusion. Here we answer the research question by answering all of the sub questions.

Chapter 2

Theory

In this chapter all the theory used in this thesis is discussed and derived. The theory includes all the risk-based asset allocation strategies and the covariance matrix estimation with shrinkage. After discussing the different strategies we show some special characteristics of these strategies with two numerical examples. The theory behind the ratios we use in the discussion of performance is also included in this chapter. Finally, we explain how we calculate turnover and transaction costs where these measures are included in the analysis.

2.1 Assumptions

In this section the assumptions behind the theory used in this thesis are presented. We know that not all of these assumptions are valid in practice, but, as in the academic papers we have used, some assumptions are needed for the theory to be applicable.

The covariance matrix estimation with shrinkage requires some assumptions to be a valid model. We follow the same assumptions for covariance matrix shrinkage as Ledoit and Wolf (2003b,a). Their first assumption regarding the covariance matrix is that returns are independent and identically distributed in time. This assumption implies that there is no auto-correlation between the returns in time, so the return $r_{t+1}$ is independent of the return $r_t$ for all assets. Ledoit and Wolf (2003b) also assume that the returns on different stocks have the same type of distribution. This means, for example, that if the returns of stock a are normally distributed, so are the returns of stock b. It should be noted that in Ledoit and Wolf (2003b,a) stock returns are not assumed to be normally distributed. Where normally distributed returns are assumed, for example in the mean-variance model, we assume the stock returns fulfil this requirement. Ledoit and Wolf (2003b) also assume that the fourth moment is finite. By assuming this we can apply the central limit theorem to the sample covariances and variances.
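One informal way to look at the no-auto-correlation part of the assumption above is to compute the sample lag-1 autocorrelation of a return series. The sketch below is written in plain Python and is not part of the thesis's R program; the function name and the toy return series are hypothetical, and a value near zero is only a rough indication, not a formal test of independence.

```python
def lag1_autocorrelation(returns):
    """Sample lag-1 autocorrelation of a list of returns (illustrative helper)."""
    T = len(returns)
    mean = sum(returns) / T
    # Sum of squared deviations over the whole sample.
    denom = sum((r - mean) ** 2 for r in returns)
    # Sum of products of consecutive deviations.
    numer = sum((returns[t] - mean) * (returns[t + 1] - mean) for t in range(T - 1))
    return numer / denom


# Hypothetical series that flips sign every month: strongly negatively auto-correlated.
example = [0.01, -0.01] * 6
rho1 = lag1_autocorrelation(example)  # about -0.917, far from the zero implied by i.i.d. returns
```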

When we implement shrinkage with the single-index model by Sharpe (1963), Ledoit and Wolf (2003b) state that two specific assumptions are needed for the model to be used in the shrinkage framework. The first assumption is that the true covariance matrix of the population differs from the true covariance matrix of the single-index model
$$\Sigma \neq \Phi \tag{2.1}$$
The second assumption related to the single-index model is that the variance of the market portfolio is positive
$$\sigma_m^2 > 0 \tag{2.2}$$
We assume all assets are infinitely divisible. That means all stocks can be divided into any desired fraction, both long and short. Where short-sales are allowed, we assume there is no limit on how much an investor can go short in assets. This also means we assume all investors have the opportunity to short-sell assets. Where the risk-free asset is included, we place no restrictions on the amount an investor can lend or borrow at the risk-free rate.

When we rebalance the portfolios, returns and transaction costs are calculated based on the closing prices on the last trading day of each month. We assume all transactions are carried out at the closing price simultaneously with no delay. Stocks are therefore sold and bought at the same time. All stocks have buyers that are willing to pay the closing prices, and all stocks have sellers who offer to sell at the closing prices. We assume that the size of an order does not matter, and that all orders are executed instantaneously regardless of size. Since all transactions are carried out at the closing prices, we disregard the effects large orders can have on stock prices.

When we calculate returns every month we assume investors do not pay any tax on returns. Transaction costs, on the other hand, are implemented in parts of the analysis, but kept out of the optimization problems.
2.2 Definitions

In this section, the basic definitions needed for the derivation of all the portfolios are presented. Matrices will be defined with bold upper case letters and vectors with bold lower case letters. The covariance matrix will not be discussed in this section, but will have a section of its own, as we consider the estimation of this matrix to be highly important.

In the analysis we consider an investor who wants to invest in the financial markets. In each of the models in this thesis we consider two periods at a time. In general terms the investment is carried out at time $t = 0$, and the returns of the investment are collected at $t = 1$. The investor has an initial wealth, $W_0 > 0$, and he will invest all of this wealth at time $t = 0$. The investment universe consists of $n$ assets that the investor can choose to invest in. The prices of these $n$ assets at $t = 0$ and $t = 1$ are given by
$$S_{i,t} \quad \text{for } i = 1, \dots, n, \quad t = 0, 1 \tag{2.3}$$
The prices one period ahead are not known, and $S_{i,1}$ is therefore stochastic and not known until $t = 1$. The rate of return on asset $i$ from $t = 0$ to $t = 1$ is stochastic at the point of investment and is defined as
$$r_i = \frac{S_{i,1} - S_{i,0}}{S_{i,0}} \tag{2.4}$$
The vector of the $n$ rates of return is defined as
$$r = (r_1, \dots, r_n)^\top \tag{2.5}$$
and is stochastic at $t = 0$. The vector of the expected returns is defined as
$$\mu = E(r) = (E(r_1), \dots, E(r_n))^\top \tag{2.6}$$
where it is expected that not all entries in $\mu$ are equal. As we have discussed earlier, we do not need these expected returns to find the desired portfolios for the analysis. However, the expected returns are part of the general derivation of the mean-variance model and are therefore needed for parts of our derivation of the global minimum variance portfolio.

The variance of asset $i$'s returns up to period $T$ is defined as
$$\sigma_i^2 = \frac{1}{T} \sum_{t=1}^{T} (r_{i,t} - \bar{r}_i)^2 \tag{2.7}$$
where $\bar{r}_i$ is the mean return of asset $i$ up to period $T$
$$\bar{r}_i = \frac{1}{T} \sum_{t=1}^{T} r_{i,t} \tag{2.8}$$
The standard deviation of asset $i$'s returns is then
$$\sigma_i = \sqrt{\sigma_i^2} \tag{2.9}$$
All the standard deviations can be gathered in the vector $v$ for later use
$$v = (\sigma_1, \dots, \sigma_n)^\top \tag{2.10}$$
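The definitions in equations (2.4) and (2.7) to (2.9) can be sketched in a few lines of code. The example below is plain Python rather than the R used in the thesis; the function names and the price series are hypothetical and only serve as illustration. Note that the variance uses the divisor $T$, as in equation (2.7), not $T - 1$.

```python
import math


def simple_returns(prices):
    """Rate of return between consecutive prices, as in equation (2.4)."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def mean_return(returns):
    """Mean return over T periods, as in equation (2.8)."""
    return sum(returns) / len(returns)


def variance(returns):
    """Variance with divisor T, as in equation (2.7)."""
    m = mean_return(returns)
    return sum((r - m) ** 2 for r in returns) / len(returns)


def volatility(returns):
    """Standard deviation, as in equation (2.9)."""
    return math.sqrt(variance(returns))


# Hypothetical month-end prices of a single asset.
prices = [100.0, 110.0, 99.0, 108.9]
r = simple_returns(prices)   # approximately [0.10, -0.10, 0.10]
var_r = variance(r)          # approximately 2/225
vol_r = volatility(r)
```

Collecting `volatility` over all assets would give the vector $v$ of equation (2.10).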

The covariance measures the movement of two variables in relation to each other. The covariance between the returns of asset $i$ and asset $j$ is defined as
$$\sigma_{ij} = \operatorname{cov}(r_i, r_j) = \frac{\sum_{t=1}^{T} (r_{i,t} - \bar{r}_i)(r_{j,t} - \bar{r}_j)}{T} \tag{2.11}$$
The variance of an asset's return can be seen as the special case of covariance when $i = j$
$$\operatorname{cov}(r_i, r_i) = \sigma_i^2 \tag{2.12}$$
We know that the covariance is symmetric, i.e.
$$\operatorname{cov}(r_i, r_j) = \operatorname{cov}(r_j, r_i) \tag{2.13}$$
The following relation is useful when considering the variance of the sum of two variables
$$\operatorname{Var}(a r_i \pm b r_j) = a^2 \operatorname{Var}(r_i) + b^2 \operatorname{Var}(r_j) \pm 2ab \operatorname{Cov}(r_i, r_j) \tag{2.14}$$
The correlation between the returns of asset $i$ and the returns of asset $j$ is a standardized measure of covariance, and will always be between -1 and 1. The correlation is defined as
$$\operatorname{corr}(r_i, r_j) = \rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j} \tag{2.15}$$
At $t = 0$ the investment is chosen by buying $a_i$ units of asset $i$, where $i = 1, 2, 3, \dots, n$. Since all of the initial wealth at $t = 0$ is invested, we have that
$$W_0 = \sum_{i=1}^{n} a_i S_{i,0} \tag{2.16}$$
We have chosen to work with relative portfolio weights in this thesis rather than absolute portfolio weights. The value of the investment in the $i$th asset is therefore measured relative to the value of the total investment. The value of the investment in asset $i$ is given by the weight
$$x_i = \frac{a_i S_{i,0}}{\sum_{j=1}^{n} a_j S_{j,0}} = \frac{a_i S_{i,0}}{W_0} \tag{2.17}$$
and can be regarded as the proportion of the wealth at $t = 0$ invested in asset $i$. The vector of all the portfolio weights is defined as
$$x = (x_1, \dots, x_n)^\top \tag{2.18}$$
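As a small illustration of equations (2.11), (2.15) and (2.17), the following plain-Python sketch computes a covariance, a correlation and a set of relative weights. It is not taken from the thesis's R program; the helper names, return series, unit holdings and prices are all hypothetical.

```python
import math


def covariance(ri, rj):
    """Covariance with divisor T, as in equation (2.11)."""
    T = len(ri)
    mi, mj = sum(ri) / T, sum(rj) / T
    return sum((a - mi) * (b - mj) for a, b in zip(ri, rj)) / T


def correlation(ri, rj):
    """Correlation as covariance divided by both standard deviations, equation (2.15)."""
    return covariance(ri, rj) / math.sqrt(covariance(ri, ri) * covariance(rj, rj))


def weights(units, prices):
    """Relative weights x_i = a_i * S_i0 / W_0, as in equation (2.17)."""
    values = [a * s for a, s in zip(units, prices)]
    wealth = sum(values)  # W_0, equation (2.16)
    return [v / wealth for v in values]


# Hypothetical monthly returns of two assets.
r1 = [0.02, -0.01, 0.03, 0.00]
r2 = [0.01, -0.02, 0.04, 0.01]
cov12 = covariance(r1, r2)
rho12 = correlation(r1, r2)

# Hypothetical holdings: 10 units at price 20 and 5 units at price 60.
x = weights(units=[10, 5], prices=[20.0, 60.0])  # position values 200 and 300, so weights 0.4 and 0.6
```

By construction the weights returned by `weights` sum to one, and `covariance(r1, r1)` reproduces the variance of equation (2.7).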

Since the weights are measured relatively, all the weights must sum to 1
$$x^\top \mathbf{1} = \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \frac{a_i S_{i,0}}{W_0} = 1 \tag{2.19}$$
Any vector that satisfies the above is considered an investment strategy. These weights will be non-negative for all strategies in this thesis, except the unconstrained global minimum variance portfolio. All the other portfolio strategies only involve long positions in assets. Mathematically, the restriction of no short-sales is $x_i \geq 0$. When we know the portfolio weights, we can calculate the portfolio variance and portfolio standard deviation as
$$\sigma_P^2 = x^\top \Sigma x, \qquad \sigma_P = \sqrt{x^\top \Sigma x} \tag{2.20}$$
where $\Sigma$ is the covariance matrix discussed later.

Since the investor invests all his initial wealth at $t = 0$, the terminal wealth at $t = 1$ is the return of the investment plus the initial investment, i.e.
$$W_1 = \sum_{i=1}^{n} a_i S_{i,0} + \sum_{i=1}^{n} a_i (S_{i,1} - S_{i,0}) \tag{2.21}$$
$$= W_0 \left( 1 + \sum_{i=1}^{n} \frac{a_i S_{i,0}}{W_0} \cdot \frac{S_{i,1} - S_{i,0}}{S_{i,0}} \right) \tag{2.22}$$
$$= W_0 (1 + x^\top r) \tag{2.23}$$
When the portfolio weights and the actual returns are known, the terminal wealth is known. Since we know the portfolio weights and the amount invested, the expected terminal wealth is
$$E(W_1) = W_0 (1 + x^\top E(r)) = W_0 (1 + x^\top \mu) \tag{2.24}$$
The uncertainty of $W_1$, measured by its variance, is
$$\operatorname{Var}(W_1) = \operatorname{Var}(W_0 (1 + x^\top r)) \tag{2.25}$$
$$= W_0^2 \operatorname{Var}(1 + x^\top r) \tag{2.26}$$
$$= W_0^2 \, x^\top \operatorname{Var}(r) \, x \tag{2.27}$$
$$= W_0^2 \, x^\top \Sigma x \tag{2.28}$$
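Equations (2.20), (2.23) and (2.28) can be checked numerically with a small example. The sketch below is plain Python with hypothetical helper names and a hypothetical two-asset covariance matrix; it is meant as an illustration of the formulas, not as the thesis's implementation.

```python
import math


def portfolio_variance(x, cov):
    """Quadratic form x' Sigma x, as in equation (2.20)."""
    n = len(x)
    return sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))


def terminal_wealth(w0, x, r):
    """Terminal wealth W_1 = W_0 (1 + x' r), as in equation (2.23)."""
    return w0 * (1.0 + sum(xi * ri for xi, ri in zip(x, r)))


# Hypothetical covariance matrix: volatilities 20% and 30%, covariance 0.01.
cov = [[0.04, 0.01],
       [0.01, 0.09]]
x = [0.5, 0.5]

var_p = portfolio_variance(x, cov)   # 0.25*0.04 + 2*0.25*0.01 + 0.25*0.09 = 0.0375
vol_p = math.sqrt(var_p)

w0 = 1000.0
w1 = terminal_wealth(w0, x, [0.10, -0.02])  # 1000 * (1 + 0.05 - 0.01) = 1040
var_w1 = w0 ** 2 * var_p                    # equation (2.28)
```

The example also shows the diversification effect: the portfolio volatility of roughly 19.4% is below the 25% average of the two asset volatilities.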

The definitions of positive definite and positive semi-definite matrices are presented below. If we have a symmetric $n \times n$ matrix $A \in \mathbb{R}^{n \times n}$, we say that $A$ is positive semi-definite if
$$\forall\, w \in \mathbb{R}^n \setminus \{0\} : \; w^\top A w \geq 0 \tag{2.29}$$
where $w$ is an $n$-vector of real numbers. We say that the matrix $A$ is positive definite if
$$\forall\, w \in \mathbb{R}^n \setminus \{0\} : \; w^\top A w > 0 \tag{2.30}$$

2.3 Covariance matrix

In this section the covariance matrix estimators used in this thesis are presented. We present the problems of finding a good estimate of the covariance matrix of asset returns, and models for coping with these problems.

The only input we need in the risk-based asset allocation models is the covariance matrix of asset returns. The estimation process is therefore of great importance. One of the general difficulties with all covariance matrix estimation techniques is that we do not know the true covariance matrix. If we did, estimation would not be necessary. For many years the sample covariance matrix was the standard estimator. However, the performance of the sample covariance matrix has been questioned in later years, and alternative solutions have been proposed. More information about the sample covariance matrix and its problems can be found in the next section.

In this thesis we have chosen to focus our estimation on the theory of shrinkage estimators. The shrinkage estimators have shown promising results in different studies, such as Disatnik and Benninga (2007) and Ledoit and Wolf (2003a,b). The shrinkage approach is not new, and was first introduced over 50 years ago by Stein et al. (1956). This approach has been revisited in later years because of the great importance of solid covariance matrices in different fields of science. In finance, Ledoit and Wolf (2003a,b) are among the leading researchers of the shrinkage method.
We will therefore use their articles when deriving the shrinkage theory later in this section, and use the two different models from Ledoit and Wolf (2003a,b). We use two different models in our analysis because the results may differ depending on which model is used. The covariance matrix is of great importance in this thesis, so it should be discussed how different estimators of the covariance matrix yield different results. This is also the reason why some practitioners use more than one covariance matrix. For example, some investors change covariance matrix when the market is highly volatile.

Sample covariance matrix

In this section we present the sample covariance matrix and its challenges. The sample covariance matrix is historically the most used estimator of the covariance matrix. This sample estimator is also part of the shrinkage method. With the returns arranged in a $T \times N$ matrix $X$ of $T$ observations on $N$ variables, the sample mean is presented in vector form below, following Ledoit and Wolf (2003b)
$$m = \frac{1}{T} X^\top \mathbf{1} \tag{2.31}$$
The sample covariance matrix is often denoted by $S$. Ledoit and Wolf (2003b) define the sample covariance matrix as
$$S = \frac{1}{T} X^\top \left( I - \frac{1}{T} \mathbf{1} \mathbf{1}^\top \right) X \tag{2.32}$$
If we use our definitions of variance and covariance from earlier, we can write $S$ as
$$S = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{pmatrix} \tag{2.33}$$
The sample covariance matrix shown above has some important properties. We can see that $S$ will always have the dimension $N \times N$, meaning $S$ is a square matrix. Since the covariance is symmetric, as we discussed earlier in equation (2.13), $S$ is symmetric. It should be noted that the divisor $T$ in equation (2.32) is sometimes replaced by $T - 1$ in the literature. Our notation here follows Ledoit and Wolf (2003b) and Johnson and Wichern (2007). For the products in equation (2.32) to be defined with $X$ of dimension $T \times N$, the vector $\mathbf{1}$ has length $T$ and contains ones in every entry, and $I$ is the $T \times T$ identity matrix, containing ones along the diagonal and zeros in the off-diagonal entries. The matrix $X$ contains $T$ returns from $N$ stocks.

The rank of a matrix is defined as the maximum number of linearly independent rows, or the maximum number of linearly independent columns (Edwards and Penney, 1988). The column rank and the row rank can be proved to be equal. Intuitively, this means the rank cannot be greater than the smaller of $N$ and $T$. Recall that the matrix $X$ has dimension $T \times N$. Therefore it can be stated that when $N > T$, the following is true
$$\operatorname{Rank}(S) \leq T \tag{2.34}$$
The dimension problems of the covariance matrix come from the relation between $N$ and $T$. These problems are summarized by Ledoit and Wolf (2004b). They state that the ratio $N/T$ leads to the following characteristics of $S$. If the ratio is above 1, meaning that $N > T$, then $S$ is non-invertible. If the ratio $N/T$ is less than one, then $S$ is invertible (Ledoit and Wolf, 2004b). However, if $T$ is not significantly larger than $N$, $S$ becomes ill-conditioned, as Ledoit and Wolf (2004b) call it. This means that the estimation errors get amplified by inverting $S$. Since the inverse of $S$ is important in portfolio optimization, for example in the closed-form solution for the global minimum variance portfolio, this is not a wanted property. More extreme entries in the inverse lead to large concentrations in a few components of the global minimum variance portfolio.

Shrinkage estimator

In this section we present the framework for the shrinkage estimator used in this thesis. The shrinkage estimator was introduced into the theory of covariance matrix estimation as a solution to the problems with the previously discussed sample covariance matrix, $S$. Ledoit and Wolf (2003a) suggest solving the problems of non-invertible covariance matrices, and of covariance matrices with a high degree of estimation error, by combining two extremes. By combining the unstructured $S$ with a structured target, it is possible to obtain a compromise estimate that performs better than either extreme alone (Ledoit and Wolf, 2003a). The idea is to shrink the extremes in $S$ towards the centre, and thereby reduce the previously discussed problems. If we define the target matrix as $F$, we can write the convex linear combination of $S$ and the target as
$$\hat{\Sigma} = \delta F + (1 - \delta) S \tag{2.35}$$
where $\hat{\Sigma}$ is the shrinkage estimator and $\delta$ is the shrinkage intensity. The intensity can take any value between 0 and 1, but there exists an optimal intensity.

We have now set the framework for shrinkage with equation (2.35). The challenge is now to select an appropriate target, $F$, and from this to find the optimal shrinkage intensity, $\delta$. In the following sections we discuss two of the suggestions for $F$ made by Ledoit and Wolf (2003a,b).
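The convex combination in equation (2.35) is simple to write down in code. The sketch below is plain Python with a hypothetical function name and hypothetical matrices, and it takes the shrinkage intensity as a given number; choosing the optimal intensity is a separate step that is not shown here.

```python
def shrink(sample_cov, target, delta):
    """Shrinkage estimator delta * F + (1 - delta) * S, as in equation (2.35).

    delta is supplied by the caller in this sketch; it is not estimated here.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("shrinkage intensity must lie between 0 and 1")
    n = len(sample_cov)
    return [[delta * target[i][j] + (1.0 - delta) * sample_cov[i][j]
             for j in range(n)]
            for i in range(n)]


# Hypothetical 2x2 sample covariance matrix S and structured target F
# with the same variances but a smaller covariance.
S = [[0.04, 0.030],
     [0.03, 0.090]]
F = [[0.04, 0.012],
     [0.012, 0.090]]

sigma_hat = shrink(S, F, delta=0.5)  # off-diagonal entries become 0.021
```

Setting `delta=0` returns the sample covariance matrix unchanged and `delta=1` returns the target, which are the two extremes the estimator compromises between.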

The first target we present is from the constant correlation model in Ledoit and Wolf (2003a). The second target we present is from the single-index model by Sharpe (1963), derived in Ledoit and Wolf (2003b). After presenting the two suggestions for $F$, we derive how to find the optimal shrinkage intensity, $\delta$. We know from Schäfer and Strimmer (2005) and Ledoit and Wolf (2003a) that the optimal intensity gives us the estimator with the minimum mean squared error between the estimator and the expected true covariance matrix. The estimator is also always well-conditioned and positive definite. This applies even though the raw data has dimension problems, or $S$ is ill-conditioned (Schäfer and Strimmer, 2005).

Shrinkage targets

There have been various suggestions as to what the shrinkage target $F$ should be. At least five suggestions for $F$ have been proposed. It is concluded that the single-index model and the model assuming equal pairwise correlation perform best among these five (Ledoit and Wolf, 2003a,b, 2004b, Ledoit, 1996). Coqueret and Milhau (2014) tested the shrinkage method on historical data with the two targets we use in this thesis. In addition to these two estimators, they also tested other estimators. Coqueret and Milhau (2014) concluded that the performance of both shrinkage estimators was close to that of the others. On this basis, we believe these two targets will provide the strongest results in this thesis. It should be noted that these models assume all assets are from the same asset class. In our study this will not be a problem, since we focus explicitly on stocks. In the next subsection we discuss our two choices for the target $F$ in detail. We start with the constant correlation model.

Constant correlation model

Here we present the constant correlation model as the shrinkage target. Kwan (2006) defines the constant correlation model as a model where the pairwise correlations between all assets are equal. The correlation coefficient is set constant at the average correlation of all assets (Ledoit and Wolf, 2003a). This model was introduced by Elton and Gruber (1973) as a model for estimating future correlation coefficients from historical data (Kwan, 2006). The reason for using this model as the target is that this framework provides more structure than using individual correlations. The mathematics behind this model is presented below.

Before defining this framework we need some further definitions. The sample covariance matrix is $S$, and we denote its entries by $s_{ij}$ for the returns $r_i$ and $r_j$. The population covariance matrix is $\Sigma$, with entries $\sigma_{ij}$. We define the population correlation between $i$ and $j$ as $\varrho_{ij}$, and write the correlation and the average correlation as in Ledoit and Wolf (2003a)

\[ \varrho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}} \tag{2.36} \]
\[ \bar{\varrho} = \frac{2}{(N-1)N} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \varrho_{ij} \tag{2.37} \]

The expressions for the sample correlation and the average sample correlation are then

\[ sc_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}s_{jj}}} \tag{2.38} \]
\[ \overline{sc} = \frac{2}{(N-1)N} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} sc_{ij} \tag{2.39} \]

The diagonal of the constant correlation matrix is equal to the diagonal of the sample covariance matrix (Ledoit and Wolf, 2003a). This means the variances are the same in both matrices. The entries above and below the diagonal are the covariances calculated with the constant correlation coefficient. These entries can be defined as

\[ \varphi_{ii} = \sigma_{ii} \tag{2.40} \]
\[ \varphi_{ij} = \bar{\varrho} \sqrt{\sigma_{ii}\sigma_{jj}} \tag{2.41} \]

The entries of the constant correlation matrix, $F$, are then defined by Ledoit and Wolf (2003a) as

\[ f_{ii} = s_{ii} \tag{2.42} \]
\[ f_{ij} = \overline{sc} \sqrt{s_{ii}s_{jj}} \tag{2.43} \]

where we have used the relation between correlation and covariance from equation (2.41) and equation (2.38).

The single-index model

The single-index model was introduced by Sharpe (1963) and is a single-factor model. It was written as an extension of the work by Markowitz (1952). The model builds on the idea that stocks tend to move together, driven by the same economic factors (Shapiro, 2008). In this way, Sharpe (1963) created the model to simplify and describe the relationship among securities.
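As an illustration of the constant correlation target in equations (2.38), (2.39), (2.42) and (2.43), the following is a minimal NumPy sketch. The function name `constant_correlation_target` and the toy covariance matrix are assumptions made for this example only and are not taken from the thesis.

```python
import numpy as np

def constant_correlation_target(S):
    """Build the constant-correlation target F from a sample covariance matrix S."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    std = np.sqrt(np.diag(S))                 # sqrt(s_ii)
    corr = S / np.outer(std, std)             # sample correlations, equation (2.38)
    upper = np.triu_indices(n, k=1)           # all pairs with i < j
    avg_corr = corr[upper].mean()             # average sample correlation, equation (2.39)
    F = avg_corr * np.outer(std, std)         # off-diagonal entries, equation (2.43)
    np.fill_diagonal(F, np.diag(S))           # diagonal entries, equation (2.42)
    return F, avg_corr

# Toy 3-asset sample covariance matrix (made-up numbers, illustration only).
S = np.array([[0.040, 0.012, 0.006],
              [0.012, 0.090, 0.018],
              [0.006, 0.018, 0.160]])
F, avg_corr = constant_correlation_target(S)
```

In this toy case the three pairwise correlations are 0.2, 0.075 and 0.15, so every off-diagonal entry of `F` is built from their average while the variances on the diagonal are left unchanged.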

Ledoit and Wolf (2003b) define the single-index model by the following equation for stock returns

\[ r_i = \alpha_i + \beta_i r_m + \varepsilon_i \tag{2.44} \]

where $\beta_i$ measures how the return of stock $i$ moves with the return of the market, and $\varepsilon_i$ is the residual (Ledoit and Wolf, 2003b). Shapiro (2008) defines $\beta_i$ as

\[ \beta_i = \frac{\operatorname{cov}(r_i, r_m)}{\operatorname{var}(r_m)} \tag{2.45} \]

where $r_m$ is the return of the market and $r_i$ is the return of asset $i$. From the model in equation (2.44), we need to define our target matrix of variances and covariances, $F$. This matrix will be the target for shrinkage in this model. We know from Ledoit and Wolf (2003b) that the correlation between $r_m$ and $\varepsilon_i$ is zero, and that all the $\varepsilon_i$ are uncorrelated with each other. Shapiro (2008) defines the systematic risk of asset $i$ as

\[ \sigma_m^2 \beta_i^2 \tag{2.46} \]

and the firm-specific risk as

\[ \sigma^2(\varepsilon_i) \tag{2.47} \]

which is constant within each stock according to Ledoit and Wolf (2003b). In this model, the variance of a specific stock, $i$, is then

\[ \sigma_i^2 = \beta_i^2 \sigma_m^2 + \sigma^2(\varepsilon_i) \tag{2.48} \]

From the note by Shapiro (2008), we know that the covariance between assets $i$ and $j$ is

\[ \operatorname{cov}(i, j) = \beta_i \beta_j \sigma_m^2 \tag{2.49} \]

and we see that the covariance depends only on the returns' sensitivity to the market and the risk in the market itself (Shapiro, 2008). We now know how the variances and covariances are defined for our target. The covariance matrix in this model can then be defined as in Ledoit and Wolf (2003b)

\[ \Phi = \sigma_m^2 \beta \beta^\top + \Delta \tag{2.50} \]

The vector $\beta$ is the vector of all betas, and $\Delta$ is the diagonal matrix containing the variances of the residuals. According to Ledoit and Wolf (2003b), the estimator of the covariance matrix in the single-index framework can then be defined as

\[ F = s_m^2 b b^\top + D \tag{2.51} \]

where $b$ is the vector of estimated betas, and $D$ is a diagonal matrix with the estimates of the entries in $\Delta$. As Ledoit and Wolf (2004b) emphasize, we need to make two assumptions for the results in this section to be valid. These two assumptions can be seen in equation (2.1) and equation (2.2) in the assumptions section. Now that we have defined the two targets used in this thesis, we can discuss how to find the optimal intensity.

Shrinkage intensity

In this section we present the shrinkage intensity and the mathematics behind deriving its optimal value. The optimal value of the intensity, $\delta$, minimizes the distance between the estimator and the true covariance matrix (Ledoit and Wolf, 2003a). To find the optimal weighting between the target, $F$, and $S$, we need to express the performance of the estimator as a function of the intensity. The performance of an estimator is in theory measured by how close it is to the true covariance matrix. Since this matrix is not known, we need an alternative way of measuring the performance. By using a matrix norm, we can define the distance between the true covariance matrix and the estimator as a loss function. This is in line with Ledoit and Wolf (2003a). By applying the Frobenius norm to the difference between the true covariance matrix and the estimator, we arrive at the quadratic loss function presented in Ledoit and Wolf (2003a). The Frobenius norm is defined as

\[ \|Z\|_F = \sqrt{\sum_{i=1}^{N} \sum_{j=1}^{N} z_{ij}^2} \tag{2.52} \]

The above is also sometimes called the Euclidean norm. The matrix $Z$ is an $N \times N$ symmetric matrix. This is in line with the characteristics of the covariance matrix. In some of the early loss functions, such as the one introduced in Frost and Savarino (1986), the inverse of the covariance matrix was included. The problem with this is apparent from earlier sections, where the relation $N > T$ makes the estimator break down, as stated by Ledoit and Wolf (2003a).

When applying equation (2.52) to our estimator as suggested by Ledoit and Wolf (2003a), we arrive at a loss function that does not contain the inverse of the matrix. We can then present the following quadratic loss function, as in Ledoit and Wolf (2003a)

\[ L(\delta) = \|\delta F + (1 - \delta) S - \Sigma\|^2 \tag{2.53} \]
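To make the single-index target in equation (2.51) and the quadratic loss in equation (2.53) concrete, the sketch below simulates returns from a single-index model in which the true covariance matrix is known by construction. The function names, the simulated parameters and the $1/T$ scaling of the sample covariance matrix are assumptions made for this illustration; in real data $\Sigma$ is unknown and the loss cannot be evaluated directly.

```python
import numpy as np

def single_index_target(returns, market):
    """Single-index target F = s_mm * b b' + D, cf. equation (2.51)."""
    X = returns - returns.mean(axis=0)        # demeaned asset returns, shape (T, N)
    m = market - market.mean()                # demeaned market returns, shape (T,)
    T = X.shape[0]
    S = X.T @ X / T                           # sample covariance (1/T scaling is an assumption)
    s_mm = m @ m / T                          # sample variance of the market
    s_im = X.T @ m / T                        # sample covariances with the market
    b = s_im / s_mm                           # estimated betas, cf. equation (2.45)
    F = s_mm * np.outer(b, b)                 # systematic part of the target
    np.fill_diagonal(F, np.diag(S))           # adding D (residual variances) gives f_ii = s_ii
    return F, S

def quadratic_loss(delta, F, S, Sigma):
    """Squared Frobenius distance to the true matrix, equation (2.53)."""
    return np.linalg.norm(delta * F + (1.0 - delta) * S - Sigma, "fro") ** 2

# Simulated single-index world (all parameter values are made up).
rng = np.random.default_rng(0)
T, N = 60, 5
beta = np.linspace(0.6, 1.4, N)
sigma_m2 = 0.04 ** 2
resid_var = np.full(N, 0.03 ** 2)
Sigma = sigma_m2 * np.outer(beta, beta) + np.diag(resid_var)   # equation (2.50)

market = rng.normal(0.0, np.sqrt(sigma_m2), size=T)
returns = np.outer(market, beta) + rng.normal(0.0, 0.03, size=(T, N))

F, S = single_index_target(returns, market)
losses = {d: quadratic_loss(d, F, S, Sigma) for d in (0.0, 0.5, 1.0)}
```

Evaluating `losses` on a finer grid of intensities would trace out the quadratic shape of $L(\delta)$ for this particular simulated sample.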

The only problem in practice is that the true covariance matrix is not known. Therefore we need a function that can be optimized. This function needs to give us as low a loss as possible, without actually knowing the true covariance matrix. As a result, the expected loss is introduced. This can also be called the risk of our function, as presented in Ledoit and Wolf (2003a)

\[ R(\delta) = E(L(\delta)) = E\left(\|\delta F + (1 - \delta) S - \Sigma\|^2\right) \tag{2.54} \]

We now want to find the value of $\delta$ that minimizes the risk function. In the calculations here the entries of the matrices are denoted with lowercase letters. We write the risk function and its expectations as in Ledoit and Wolf (2003b)

\[ R(\delta) = \sum_{i=1}^{N} \sum_{j=1}^{N} E\left(\delta f_{ij} + (1 - \delta) s_{ij} - \sigma_{ij}\right)^2 \tag{2.55} \]
\[ = \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{Var}\left(\delta f_{ij} + (1 - \delta) s_{ij}\right) + \left[E\left(\delta f_{ij} + (1 - \delta) s_{ij} - \sigma_{ij}\right)\right]^2 \tag{2.56} \]
\[ = \sum_{i=1}^{N} \sum_{j=1}^{N} \delta^2 \operatorname{Var}(f_{ij}) + (1 - \delta)^2 \operatorname{Var}(s_{ij}) + 2\delta(1 - \delta) \operatorname{Cov}(f_{ij}, s_{ij}) + \delta^2 (\varphi_{ij} - \sigma_{ij})^2 \tag{2.57} \]

The results above were obtained using the rules for variance and covariance defined earlier, together with $E(f_{ij}) = \varphi_{ij}$ and $E(s_{ij}) = \sigma_{ij}$ in the last step. The aim is to minimize the risk in equation (2.54) as a function of $\delta$. This can be done by finding the first and second derivatives of the function. At an extreme point the first derivative of the function needs to be zero. Since we only differentiate the function (2.57) with respect to $\delta$, the rest stays constant. Ledoit and Wolf (2003b) define the first and second derivatives as

\[ R'(\delta) = 2 \sum_{i=1}^{N} \sum_{j=1}^{N} \delta \operatorname{Var}(f_{ij}) - (1 - \delta) \operatorname{Var}(s_{ij}) + (1 - 2\delta) \operatorname{Cov}(f_{ij}, s_{ij}) + \delta (\varphi_{ij} - \sigma_{ij})^2 \tag{2.58} \]
\[ R''(\delta) = 2 \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{Var}(f_{ij}) + \operatorname{Var}(s_{ij}) - 2\operatorname{Cov}(f_{ij}, s_{ij}) + (\varphi_{ij} - \sigma_{ij})^2 \tag{2.59} \]
\[ = 2 \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{Var}(f_{ij} - s_{ij}) + (\varphi_{ij} - \sigma_{ij})^2 \tag{2.60} \]

The simplification from equation (2.59) to equation (2.60) follows from the general rules of variance. We follow Ledoit and Wolf (2003b) and set $R'(\delta) = 0$. We can then solve the equation

to find $\delta$

\[ 0 = 2 \sum_{i=1}^{N} \sum_{j=1}^{N} \delta \operatorname{Var}(f_{ij}) - (1 - \delta) \operatorname{Var}(s_{ij}) + (1 - 2\delta) \operatorname{Cov}(f_{ij}, s_{ij}) + \delta (\varphi_{ij} - \sigma_{ij})^2 \tag{2.61} \]

We can rearrange the above to

\[ \sum_{i=1}^{N} \sum_{j=1}^{N} -\delta \operatorname{Var}(f_{ij}) - \delta \operatorname{Var}(s_{ij}) + 2\delta \operatorname{Cov}(f_{ij}, s_{ij}) - \delta (\varphi_{ij} - \sigma_{ij})^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} -\operatorname{Var}(s_{ij}) + \operatorname{Cov}(f_{ij}, s_{ij}) \tag{2.62} \]

We factor out $\delta$ on the left-hand side, change the sign on both sides, and use the rules of variance to obtain

\[ \delta \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{Var}(f_{ij} - s_{ij}) + (\varphi_{ij} - \sigma_{ij})^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{Var}(s_{ij}) - \operatorname{Cov}(f_{ij}, s_{ij}) \tag{2.63} \]

Finally, we can solve for $\delta$ and reproduce the result found in Ledoit and Wolf (2003b)

\[ \delta^{*} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{Var}(s_{ij}) - \operatorname{Cov}(f_{ij}, s_{ij})}{\sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{Var}(f_{ij} - s_{ij}) + (\varphi_{ij} - \sigma_{ij})^2} \tag{2.64} \]

which is the expression for the optimal intensity. As stated by Ledoit and Wolf (2003b), the fact that $R''(\delta)$ is always positive ensures that the solution obtained here is a minimum of the risk function. The problem with finding the optimal intensity is that we still do not know the true covariance matrix. Ledoit and Wolf (2003b) prove that the optimal $\delta$ behaves asymptotically like a constant $\kappa$ divided by the number of observations, $T$. The constant $\kappa$ is in their article defined in the following way

\[ \kappa = \frac{\pi - \rho}{\gamma} \tag{2.65} \]

The three new variables here cannot be found with the information we have in a regular analysis of the covariance matrix. They will therefore have to be estimated. This

means that $\kappa$ needs to be estimated as well. Ledoit and Wolf (2003b) define them as follows

\[ \pi = \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{AsyVar}\left(\sqrt{T} s_{ij}\right) \tag{2.66} \]
\[ \rho = \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{AsyCov}\left(\sqrt{T} f_{ij}, \sqrt{T} s_{ij}\right) \tag{2.67} \]
\[ \gamma = \sum_{i=1}^{N} \sum_{j=1}^{N} (\varphi_{ij} - \sigma_{ij})^2 \tag{2.68} \]

These variables cannot be found explicitly, so we need to use estimators, denoted by a hat. These estimators are derived below as in Ledoit and Wolf (2003a,b). It should be noted that the expression for the asymptotic covariance, $\rho$, contains the target matrix entries. This means that the estimation of this variable differs for our two targets, as $F$ is different in the two cases. We now find an estimator for the asymptotic variance, $\hat{\pi}$. This estimator comes from standard asymptotic theory, as stated by Ledoit and Wolf (2003b)

\[ \hat{\pi} = \sum_{i=1}^{N} \sum_{j=1}^{N} \hat{\pi}_{ij} \tag{2.69} \]
\[ \hat{\pi}_{ij} = \frac{1}{T} \sum_{t=1}^{T} \left\{ (r_{it} - \bar{r}_i)(r_{jt} - \bar{r}_j) - s_{ij} \right\}^2 \tag{2.70} \]

We now define an estimator for the asymptotic covariance. As defined in Ledoit and Wolf (2003a,b), it can be written as

\[ \rho = \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{AsyCov}\left(\sqrt{T} f_{ij}, \sqrt{T} s_{ij}\right) \tag{2.71} \]

where $\rho$ is the sum of the $\sqrt{T}$-scaled asymptotic covariances of the entries of $F$ and $S$. We re-write equation (2.71) as

\[ \rho = \sum_{i=1}^{N} \operatorname{AsyVar}\left(\sqrt{T} s_{ii}\right) + \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} \operatorname{AsyCov}\left(\sqrt{T} f_{ij}, \sqrt{T} s_{ij}\right) \tag{2.72} \]

The expression is re-written in this way because $f_{ii} = s_{ii}$ on the diagonal, so that we can use $\hat{\pi}_{ii}$ for the first part of equation (2.72). Since we already know the asymptotic variance from $\pi$, this replaces the sum along the diagonal, as done in Ledoit and Wolf (2003b). The off-diagonal entries are, as Ledoit and Wolf (2003a) state, given by $\operatorname{AsyCov}\left(\sqrt{T} f_{ij}, \sqrt{T} s_{ij}\right)$.
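The estimator $\hat{\pi}$ in equations (2.69) and (2.70) can be sketched in a few lines of NumPy. This is a hedged illustration rather than the implementation used in the thesis: the function name `pi_hat`, the simulated returns and the $1/T$ scaling of $s_{ij}$ are assumptions of this example.

```python
import numpy as np

def pi_hat(returns):
    """Estimate pi as the sum of the entries pi_ij, equations (2.69) and (2.70)."""
    X = returns - returns.mean(axis=0)            # r_it - rbar_i, shape (T, N)
    T = X.shape[0]
    S = X.T @ X / T                               # s_ij with 1/T scaling (an assumption)
    prod = X[:, :, None] * X[:, None, :]          # (r_it - rbar_i)(r_jt - rbar_j), shape (T, N, N)
    pi_mat = ((prod - S) ** 2).mean(axis=0)       # entries pi_ij, equation (2.70)
    return pi_mat.sum(), pi_mat                   # total pi, equation (2.69), and its entries

# Simulated returns, for illustration only.
rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.02, size=(120, 4))
pi_total, pi_mat = pi_hat(returns)
```

The diagonal entries of `pi_mat` are the $\hat{\pi}_{ii}$ that also appear in the estimator of $\rho$.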

This means that the entries will be different for the two target matrices, because $f_{ij}$ is part of the expression, and hence the intensity will differ for the two methods. If the target is the constant correlation matrix, we know the form of the $f_{ij}$ entries from equation (2.43). If we substitute equation (2.43) into $\operatorname{AsyCov}\left(\sqrt{T} f_{ij}, \sqrt{T} s_{ij}\right)$ we get

\[ \operatorname{AsyCov}\left(\sqrt{T}\, \overline{sc} \sqrt{s_{ii}s_{jj}}, \sqrt{T} s_{ij}\right) \tag{2.73} \]

The latter can be estimated as in Ledoit and Wolf (2003a) by

\[ \frac{\overline{sc}}{2} \left( \sqrt{\frac{s_{jj}}{s_{ii}}} \operatorname{AsyCov}\left[\sqrt{T} s_{ii}, \sqrt{T} s_{ij}\right] + \sqrt{\frac{s_{ii}}{s_{jj}}} \operatorname{AsyCov}\left[\sqrt{T} s_{jj}, \sqrt{T} s_{ij}\right] \right) \tag{2.74} \]

We now need a consistent estimator of the asymptotic covariance, and this is presented as $\hat{\vartheta}$ in Ledoit and Wolf (2003a). The two asymptotic covariances in equation (2.74) depend on two different sample variances, $s_{ii}$ and $s_{jj}$. Ledoit and Wolf (2003a) define the estimator for $\operatorname{AsyCov}\left[\sqrt{T} s_{ii}, \sqrt{T} s_{ij}\right]$ as

\[ \hat{\vartheta}_{ii,ij} = \frac{1}{T} \sum_{t=1}^{T} \left\{ (r_{i,t} - \bar{r}_i)^2 - s_{ii} \right\} \left\{ (r_{i,t} - \bar{r}_i)(r_{j,t} - \bar{r}_j) - s_{ij} \right\} \tag{2.75} \]

and the estimator for $\operatorname{AsyCov}\left[\sqrt{T} s_{jj}, \sqrt{T} s_{ij}\right]$ as

\[ \hat{\vartheta}_{jj,ij} = \frac{1}{T} \sum_{t=1}^{T} \left\{ (r_{j,t} - \bar{r}_j)^2 - s_{jj} \right\} \left\{ (r_{i,t} - \bar{r}_i)(r_{j,t} - \bar{r}_j) - s_{ij} \right\} \tag{2.76} \]

where $r_{i,t}$ is the return of asset $i$ at time $t$, as defined in the definitions section. The estimator for $\rho$ is now complete, and is defined by Ledoit and Wolf (2003a) as

\[ \hat{\rho} = \sum_{i=1}^{N} \hat{\pi}_{ii} + \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} \frac{\overline{sc}}{2} \left( \sqrt{\frac{s_{jj}}{s_{ii}}} \hat{\vartheta}_{ii,ij} + \sqrt{\frac{s_{ii}}{s_{jj}}} \hat{\vartheta}_{jj,ij} \right) \tag{2.77} \]

It is important to note that this $\hat{\rho}$ is only valid with the constant correlation matrix as target. The single-index matrix yields a different expression for the estimator. The derivation for the single-index matrix follows the same procedure as for the constant correlation matrix up to and including equation (2.72). To find the intensity using the single-index model we insert the entries of the single-index matrix in equation (2.72). These are defined by Ledoit and Wolf (2003b) as $f_{ij} = b_i b_j s_{mm} = s_{im} s_{jm} / s_{mm}$ for the off-diagonal entries and $f_{ii} = s_{ii}$ for the diagonal, where $s_{mm} = s_m^2$ denotes the sample variance of the market.

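The entry $s_{im}$ denotes the covariance of asset $i$ with the market. Inserting the single-index entries gives

\[ \rho = \sum_{i=1}^{N} \operatorname{AsyVar}\left[\sqrt{T} s_{ii}\right] + \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} \operatorname{AsyCov}\left[\sqrt{T} \frac{s_{im} s_{jm}}{s_{mm}}, \sqrt{T} s_{ij}\right] \tag{2.78} \]

For the first term we use the estimator for the asymptotic variance from equation (2.70). We then need a consistent estimator for the asymptotic covariance to get an expression for $\hat{\rho}$. According to Ledoit and Wolf (2003b), the off-diagonal elements of $\rho$ can be written as

\[ \rho_{ij} = \operatorname{AsyCov}\left[\sqrt{T} \frac{s_{im} s_{jm}}{s_{mm}}, \sqrt{T} s_{ij}\right] \tag{2.79} \]
\[ = \frac{\sigma_{jm}}{\sigma_{mm}} \operatorname{AsyCov}\left[\sqrt{T} s_{im}, \sqrt{T} s_{ij}\right] + \frac{\sigma_{im}}{\sigma_{mm}} \operatorname{AsyCov}\left[\sqrt{T} s_{jm}, \sqrt{T} s_{ij}\right] - \frac{\sigma_{im}\sigma_{jm}}{\sigma_{mm}^2} \operatorname{AsyCov}\left[\sqrt{T} s_{mm}, \sqrt{T} s_{ij}\right] \tag{2.80} \]

A consistent estimator for $\sigma_{qm}$ is the corresponding entry in the sample covariance matrix, $s_{qm}$. Here $q$ can be $q = m$, $q = i$ or $q = j$. The consistent estimator for $\operatorname{AsyCov}\left[\sqrt{T} s_{im}, \sqrt{T} s_{ij}\right]$ is defined by Ledoit and Wolf (2003b) as

\[ \hat{\varpi}_{im,ij} = \frac{1}{T} \sum_{t=1}^{T} \left\{ (r_{i,t} - \bar{r}_i)(r_{m,t} - \bar{r}_m) - s_{im} \right\} \left\{ (r_{i,t} - \bar{r}_i)(r_{j,t} - \bar{r}_j) - s_{ij} \right\} \tag{2.81} \]

The consistent estimator for $\operatorname{AsyCov}\left[\sqrt{T} s_{jm}, \sqrt{T} s_{ij}\right]$ is

\[ \hat{\varpi}_{jm,ij} = \frac{1}{T} \sum_{t=1}^{T} \left\{ (r_{j,t} - \bar{r}_j)(r_{m,t} - \bar{r}_m) - s_{jm} \right\} \left\{ (r_{i,t} - \bar{r}_i)(r_{j,t} - \bar{r}_j) - s_{ij} \right\} \tag{2.82} \]

Lastly, a consistent estimator for $\operatorname{AsyCov}\left[\sqrt{T} s_{mm}, \sqrt{T} s_{ij}\right]$ can be defined as

\[ \hat{\varpi}_{mm,ij} = \frac{1}{T} \sum_{t=1}^{T} \left\{ (r_{m,t} - \bar{r}_m)^2 - s_{mm} \right\} \left\{ (r_{i,t} - \bar{r}_i)(r_{j,t} - \bar{r}_j) - s_{ij} \right\} \tag{2.83} \]

These are the estimators of the asymptotic covariances as presented by Ledoit and Wolf (2003b). If equations (2.81), (2.82), and (2.83) are inserted into equation (2.80), along with the estimates $s_{qm}$ of the true covariances with the market, we obtain an expression for $\hat{\rho}$. This leads to the estimator of $\rho$ defined as

\[ \hat{\rho} = \sum_{i=1}^{N} \hat{\pi}_{ii} + \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} \left( \frac{s_{jm}}{s_{mm}} \hat{\varpi}_{im,ij} + \frac{s_{im}}{s_{mm}} \hat{\varpi}_{jm,ij} - \frac{s_{im} s_{jm}}{s_{mm}^2} \hat{\varpi}_{mm,ij} \right) \tag{2.84} \]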

This is the estimator for $\rho$ using the single-index matrix as target, as derived in Ledoit and Wolf (2003b). We have re-written it in the same form as the one for constant correlation in Ledoit and Wolf (2003a). The last estimator needed to complete the expression for $\kappa$ is the misspecification of the population target relative to the true covariance matrix, that is, the misspecification between $\Phi$ and $\Sigma$. We use the estimators of the population matrices, which we know to be $F$ and $S$. The sum of squared differences between these two gives us the estimator, as in Ledoit and Wolf (2003a,b)

\[ \hat{\gamma} = \sum_{i=1}^{N} \sum_{j=1}^{N} (f_{ij} - s_{ij})^2 \tag{2.85} \]

With this last estimator, we can find the estimator of $\kappa$

\[ \hat{\kappa} = \frac{\hat{\pi} - \hat{\rho}}{\hat{\gamma}} \tag{2.86} \]

This is the estimator that gives us our shrinkage intensity. From the expression below we see that the intensity depends on the value of $T$ as well, as presented in Ledoit and Wolf (2003a,b)

\[ \hat{\delta} = \max\left\{ 0, \min\left\{ \frac{\hat{\kappa}}{T}, 1 \right\} \right\} \tag{2.87} \]

The last equation tells us that the intensity has to satisfy $0 \leq \hat{\delta} \leq 1$. This is almost always fulfilled by $\hat{\kappa}/T$, but when it is not, the maximum and minimum functions keep it within the boundaries.

From the framework presented here, we obtain an estimate of the shrunk covariance matrix. There are many different models to use when estimating the matrix, but the results from Ledoit and Wolf (2003a,b) have been very promising. We chose the constant correlation model and the single-index model for their convenience and promising results. Disatnik and Benninga (2007) and Ledoit and Wolf (2003b) state that the framework presented in this section has provided promising results in research. They also argue that these models are fairly easy to implement.

2.4 Markowitz portfolio theory

We will derive the mean-variance framework by Markowitz (1952, 1959) leading up to the unconstrained global minimum variance portfolio. We then add the no-short-sales constraint to the portfolio afterwards. Before deriving the model itself, we start this section with a justification for the model.
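Before the mean-variance model is derived, the shrinkage procedure of the preceding section can be summarized in code. The following is a minimal sketch that combines equations (2.69), (2.70), (2.75) to (2.77) and (2.85) to (2.87) for the constant correlation target. It is an illustration under stated assumptions, not the implementation used in the thesis: the function name, the $1/T$ scaling of $S$ and the simulated data are all hypothetical.

```python
import numpy as np

def shrink_constant_correlation(returns):
    """Shrink the sample covariance matrix towards the constant-correlation target."""
    X = returns - returns.mean(axis=0)            # demeaned returns, shape (T, N)
    T, N = X.shape
    S = X.T @ X / T                               # sample covariance (1/T scaling assumed)
    std = np.sqrt(np.diag(S))
    corr = S / np.outer(std, std)
    avg_corr = corr[np.triu_indices(N, k=1)].mean()
    F = avg_corr * np.outer(std, std)             # target, equations (2.42) and (2.43)
    np.fill_diagonal(F, np.diag(S))

    prod = X[:, :, None] * X[:, None, :]          # products of deviations, shape (T, N, N)
    dev = prod - S                                # deviations from s_ij
    pi_mat = (dev ** 2).mean(axis=0)              # equation (2.70)
    pi_hat = pi_mat.sum()                         # equation (2.69)

    var_dev = X ** 2 - np.diag(S)                 # (r_it - rbar_i)^2 - s_ii, shape (T, N)
    theta_ii = (var_dev[:, :, None] * dev).mean(axis=0)   # theta_{ii,ij}, equation (2.75)
    theta_jj = (var_dev[:, None, :] * dev).mean(axis=0)   # theta_{jj,ij}, equation (2.76)
    ratio = np.outer(1.0 / std, std)              # ratio[i, j] = sqrt(s_jj / s_ii)
    off = 0.5 * avg_corr * (ratio * theta_ii + ratio.T * theta_jj)
    rho_hat = np.trace(pi_mat) + off.sum() - np.trace(off)   # equation (2.77)

    gamma_hat = ((F - S) ** 2).sum()              # equation (2.85), zero only if F equals S
    kappa_hat = (pi_hat - rho_hat) / gamma_hat    # equation (2.86)
    delta_hat = max(0.0, min(kappa_hat / T, 1.0)) # equation (2.87)
    return delta_hat * F + (1.0 - delta_hat) * S, delta_hat

# Simulated returns for 10 assets over 60 periods, illustration only.
rng = np.random.default_rng(2)
returns = rng.normal(0.0, 0.02, size=(60, 10))
Sigma_hat, delta_hat = shrink_constant_correlation(returns)
```

Because the target and the sample covariance matrix share the same diagonal, shrinkage in this sketch only changes the off-diagonal entries.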

2.4.1 Justification for the mean-variance strategy

In this section we provide a justification for the mean-variance framework. We find this natural since this model has led the way for many other models over the years, and it is still in focus over 60 years after it was developed by Markowitz. The background and some of the ideas behind the model deserve a discussion before we derive it mathematically. Leading up to his mean-variance model, Markowitz (1959) emphasizes the importance of considering investors' utility functions. An agent who only wants to maximize expected return will never prefer a diversified portfolio. This investor would simply invest all his initial funds in the asset with the greatest expected return (Markowitz, 1959). If more than one asset had the same greatest expected return, the investor would be indifferent between them. Such behaviour is clearly not realistic and has led to the idea of maximizing investors' utility instead of expected returns alone.

One of the important assumptions behind the minimum-variance model is that investors are risk averse. In other words, the investor wants to choose $x$ to maximize his terminal wealth, $W_1$, and at the same time minimize the risk associated with his terminal wealth, $\operatorname{Var}(W_1)$. This argument will be justified by discussing the investor's utility function and its mathematical properties following Tsiang (1972) and Zakamouline and Koekebakker (2009). We assume that the agent has a von Neumann-Morgenstern utility function, $u(W)$, which is defined over wealth (Zakamouline and Koekebakker, 2009). We assume this utility function is differentiable and that the investor wishes to maximize the expected utility at $t = 1$, $E(u(W_1))$. In accordance with Arrow (1965) we also assume the following two properties of the utility function

\[ u'(W) > 0, \quad \text{the marginal utility of wealth is positive} \tag{2.88} \]
\[ u''(W) < 0, \quad \text{the marginal utility of wealth decreases with an increase of wealth} \tag{2.89} \]

Arrow discussed further properties of risk-averse utility functions, but the above assumptions are sufficient for the arguments in this section. According to Elton and Gruber (1995), a positive marginal utility of wealth states that the utility of wealth is increasing, meaning that an investor would always prefer the investment with the largest outcome if the outcome is certain, i.e.

\[ u(W + z) > u(W) \quad \forall z > 0 \tag{2.90} \]

A decreasing marginal utility of wealth is a way to state that an investor is risk averse. This means that the investor would reject a fair gamble (Elton and Gruber, 1995)

\[ u(1) > \tfrac{1}{2} u(2) + \tfrac{1}{2} u(0) \tag{2.91} \]

By rearranging the above and multiplying by 2 we get

\[ u(1) - u(0) > u(2) - u(1) \tag{2.92} \]

meaning that the investor gets more utility from a one-unit increase from 0 to 1 than from a one-unit increase from 1 to 2. Since the value of an investment in the future is not certain, we define the variance of $W_1$ at $t = 0$ as the variance of a random variable

\[ \operatorname{Var}(W_1) = E\left[(W_1 - E(W_1))^2\right] \tag{2.93} \]

To approximate the function $u(W_1)$ at $t = 0$ we can take a Taylor series expansion of $u$ around the expected terminal wealth

\[ u(W_1) = \sum_{n=0}^{\infty} \frac{1}{n!} u^{(n)}(E(W_1)) (W_1 - E(W_1))^n \tag{2.94} \]

We keep the terms up to the second derivative of $u$ and assume the remaining terms are small enough to omit. Neglecting the terms beyond the second moment is common in the literature owing to their small size, and is suggested by Tsiang (1972). The expected utility may therefore be approximated by

\[ E(u(W_1)) \approx u(E(W_1)) + u'(E(W_1)) E(W_1 - E(W_1)) + \tfrac{1}{2} u''(E(W_1)) E\left[(W_1 - E(W_1))^2\right] \tag{2.95} \]

By using our earlier expression for $\operatorname{Var}(W_1)$ and the fact that $E(W_1 - E(W_1)) = 0$, we can rewrite the above in the same form as Levy and Markowitz (1979)

\[ E(u(W_1)) \approx u(E(W_1)) + \tfrac{1}{2} u''(E(W_1)) \operatorname{Var}(W_1) \tag{2.96} \]

We have now expressed the investor's expected utility as a function of the expected value of $W_1$ and the variance of this terminal wealth. Since $u' > 0$ and $u'' < 0$, the investor's expected utility increases with expected terminal wealth and decreases with risk, measured as the variance of terminal wealth. The mean-variance approach therefore seems natural for this investor, and we can proceed to derive the model suitable for such needs.

2.4.2 The unconstrained global minimum variance portfolio

Harry Markowitz first introduced the mean-variance concept in Markowitz (1952) and as a further developed model in Markowitz (1959). In the literature this model is often referred to as modern portfolio theory. In his model, Markowitz (1959) describes investors with two objectives. The first objective is to make the return on their investments as high as possible. The second objective is to have

the return be subject to as little uncertainty as possible. Markowitz (1959) is aware of the risk-seeking investors in the market who do not fulfil this assumption, and states that this technique is not suitable for them. He concludes that this model is for investors who, other things being equal, prefer certainty over uncertainty. We justified this kind of investor preference in the previous section by studying the assumed utility functions, which increase in wealth and decrease in variance.

The following derivation of the mean-variance model leads up to the global minimum variance portfolio, which will be included in the empirical research in this thesis. The global minimum variance portfolio can also be derived in a quicker way, but we want to derive the full mean-variance model in general. In this way we get a better understanding of the model, and see the special case of the global minimum variance portfolio graphically in the mean-standard deviation space. The calculations in this section are inspired by Markowitz (1959) and Lando and Poulsen (2006). It should be noted that Markowitz (1959) assumed that short sales were not allowed in his original work. This constraint is not included in the following derivation, but will be included later in the thesis. We now consider the investor's objective formulated as the following minimization problem

\[ \min_{x} \tfrac{1}{2} x^\top \Sigma x \quad \text{s.t.} \quad x^\top \mu = \mu_P, \quad x^\top \mathbf{1} = 1 \tag{2.97} \]

Here we find the portfolio weights such that the variance of the portfolio's return is minimized, given an expected return. It should be noted that the factor $1/2$ is included in the minimization problem to ease the mathematical calculations. This factor has no importance other than making the calculations look nicer (Lando and Poulsen, 2006). Our assumptions regarding $\Sigma$ and $\mu$ ensure unique solutions for all values of $\mu_P > 0$. To solve the problem above we set up the following Lagrange function

\[ L(x, \lambda_1, \lambda_2) = \tfrac{1}{2} x^\top \Sigma x - \lambda_1 \left(x^\top \mu - \mu_P\right) - \lambda_2 \left(x^\top \mathbf{1} - 1\right) \tag{2.98} \]

From this Lagrange function we get the following first-order conditions for optimality

\[ \frac{\partial L}{\partial x} = \tfrac{1}{2} (\Sigma + \Sigma^\top) x - \lambda_1 \mu - \lambda_2 \mathbf{1} = 0 \tag{2.99} \]
\[ \frac{\partial L}{\partial \lambda_1} = x^\top \mu - \mu_P = 0 \tag{2.100} \]
\[ \frac{\partial L}{\partial \lambda_2} = x^\top \mathbf{1} - 1 = 0 \tag{2.101} \]

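By using the fact that $\Sigma$ is symmetric we know that

\[ \tfrac{1}{2} (\Sigma + \Sigma^\top) = \Sigma \tag{2.102} \]

By inserting (2.102) in equation (2.99), rearranging the equation, and using the fact that $\Sigma$ is positive definite we can write

\[ x = \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} \tag{2.103} \]

Rearranging and combining equation (2.100) and equation (2.101) gives

\[ \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix}^\top x = \begin{bmatrix} \mu_P \\ 1 \end{bmatrix} \tag{2.104} \]

We can multiply both sides of equation (2.103) with $\begin{bmatrix} \mu & \mathbf{1} \end{bmatrix}^\top$ and use equation (2.104) to get

\[ \begin{bmatrix} \mu_P \\ 1 \end{bmatrix} = \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix}^\top x = \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix}^\top \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} \tag{2.105} \]

We now introduce the following matrix $A$ in order to ease the notation

\[ A := \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix}^\top \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} \tag{2.106} \]

As done by Lando and Poulsen (2006), we use the rules for partitioned matrices and get

\[ A = \begin{bmatrix} \mu^\top \Sigma^{-1} \mu & \mu^\top \Sigma^{-1} \mathbf{1} \\ \mu^\top \Sigma^{-1} \mathbf{1} & \mathbf{1}^\top \Sigma^{-1} \mathbf{1} \end{bmatrix} \tag{2.107} \]

It can be seen that $A$ is symmetric because $\Sigma^{-1}$ is symmetric. To further ease the notation we introduce $a$, $b$, and $c$ as notation for the entries in $A$

\[ A := \begin{bmatrix} a & b \\ b & c \end{bmatrix} \tag{2.108} \]

We now show that the matrix $A$ is positive definite. This is an important property since it ensures the existence of $A^{-1}$. The first step in showing this is introducing the arbitrary vector

\[ z = (z_1, z_2)^\top \in \mathbb{R}^2 \setminus \{0\} \tag{2.109} \]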

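Since not all the entries in $\mu$ are equal we can say that

\[ \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = z_1 \mu + z_2 \mathbf{1} \neq 0 \tag{2.110} \]

By using the previous definition of $A$ and the fact that $\Sigma^{-1}$ is positive definite, we get

\[ \forall z \in \mathbb{R}^2 \setminus \{0\} : \quad z^\top A z = z^\top \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix}^\top \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} z > 0 \tag{2.111} \]

This means that $A$ is positive definite, and we can solve equation (2.105) for $\lambda_1$ and $\lambda_2$

\[ \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = A^{-1} \begin{bmatrix} \mu_P \\ 1 \end{bmatrix} \tag{2.112} \]

We can insert the above in equation (2.103) to obtain an expression for the optimal portfolio weights

\[ \hat{x} = \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} A^{-1} \begin{bmatrix} \mu_P \\ 1 \end{bmatrix} \tag{2.113} \]

The above expression is the minimum variance portfolio for a given $\mu_P$ and thereby the portfolio weights that solve the investor's initial problem (Lando and Poulsen, 2006). To explore the risk of this portfolio we can look at the variance of the minimum variance portfolio's return

\[ \hat{\sigma}_P^2 = \operatorname{Var}(\hat{x}^\top r) \tag{2.114} \]
\[ = \hat{x}^\top \Sigma \hat{x} \tag{2.115} \]
\[ = \left( \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} A^{-1} \begin{bmatrix} \mu_P \\ 1 \end{bmatrix} \right)^\top \Sigma \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} A^{-1} \begin{bmatrix} \mu_P \\ 1 \end{bmatrix} \tag{2.116} \]
\[ = \begin{bmatrix} \mu_P & 1 \end{bmatrix} (A^{-1})^\top \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix}^\top (\Sigma^{-1})^\top \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} A^{-1} \begin{bmatrix} \mu_P \\ 1 \end{bmatrix} \tag{2.117} \]
\[ = \begin{bmatrix} \mu_P & 1 \end{bmatrix} A^{-1} \underbrace{\left( \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix}^\top \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} \right)}_{= A \text{ by def.}} A^{-1} \begin{bmatrix} \mu_P \\ 1 \end{bmatrix} \tag{2.118} \]
\[ = \begin{bmatrix} \mu_P & 1 \end{bmatrix} A^{-1} \begin{bmatrix} \mu_P \\ 1 \end{bmatrix} \tag{2.119} \]

In the calculations above, we used that $A^{-1}$ and $\Sigma^{-1}$ are symmetric since $A$ and $\Sigma$ are symmetric. The expression for $A^{-1}$ can be written as

\[ A^{-1} = \frac{1}{ac - b^2} \begin{bmatrix} c & -b \\ -b & a \end{bmatrix} \tag{2.120} \]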

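We can insert the above in equation (2.119) to obtain a simplified expression for the minimum portfolio return variance

\[ \hat{\sigma}_P^2 = \begin{bmatrix} \mu_P & 1 \end{bmatrix} \frac{1}{ac - b^2} \begin{bmatrix} c & -b \\ -b & a \end{bmatrix} \begin{bmatrix} \mu_P \\ 1 \end{bmatrix} \tag{2.121} \]
\[ = \frac{a - 2b\mu_P + c\mu_P^2}{ac - b^2} \tag{2.122} \]

The portfolio with the lowest variance among all portfolios is called the global minimum variance portfolio (Lando and Poulsen, 2006). This portfolio's expected return $\mu_{gmv}$ is found by minimizing equation (2.122) with respect to $\mu_P$

\[ \mu_{gmv} = \frac{b}{c} \tag{2.123} \]

By inserting equation (2.123) in equation (2.122) we get the expression for the smallest possible variance for any portfolio

\[ \hat{\sigma}_{gmv}^2 = \frac{a - 2b\mu_{gmv} + c\mu_{gmv}^2}{ac - b^2} = \frac{a - 2b(b/c) + c(b/c)^2}{ac - b^2} = \frac{1}{c} \tag{2.124} \]

The portfolio weights for the global minimum variance portfolio can be found by inserting equation (2.123) in equation (2.113)

\[ \hat{x}_{gmv} = \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} A^{-1} \begin{bmatrix} \mu_{gmv} \\ 1 \end{bmatrix} \tag{2.125} \]
\[ = \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} A^{-1} \begin{bmatrix} b/c \\ 1 \end{bmatrix} \tag{2.126} \]

By inserting the expression for $A^{-1}$ we get the general expression

\[ \hat{x}_{gmv} = \frac{1}{c} \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} \frac{1}{ac - b^2} \begin{bmatrix} c & -b \\ -b & a \end{bmatrix} \begin{bmatrix} b \\ c \end{bmatrix} \tag{2.127} \]
\[ = \frac{1}{c} \Sigma^{-1} \begin{bmatrix} \mu & \mathbf{1} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{2.128} \]
\[ = \frac{1}{c} \Sigma^{-1} \mathbf{1} \tag{2.129} \]

where $1/c$ is a scaling parameter which enforces the constraint that the asset weights sum to one (Clarke et al., 2013). This closed-form solution in equation (2.129) will be used in the analysis on historical data.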
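The closed-form results above can be checked numerically. The sketch below computes the weights of equation (2.113) for a target return and the global minimum variance weights of equation (2.129), and compares the resulting variances with equations (2.122) and (2.124). The covariance matrix and expected returns are made-up numbers, and the function names are assumptions of this example.

```python
import numpy as np

def min_variance_weights(Sigma, mu, mu_P):
    """Weights of equation (2.113) for a target expected return mu_P."""
    ones = np.ones(len(mu))
    M = np.column_stack([mu, ones])               # the N x 2 matrix [mu 1]
    Sigma_inv_M = np.linalg.solve(Sigma, M)       # Sigma^{-1} [mu 1]
    A = M.T @ Sigma_inv_M                         # equation (2.106)
    return Sigma_inv_M @ np.linalg.solve(A, np.array([mu_P, 1.0]))

def gmv_weights(Sigma):
    """Closed-form global minimum variance weights, equation (2.129)."""
    w = np.linalg.solve(Sigma, np.ones(Sigma.shape[0]))   # Sigma^{-1} 1
    return w / w.sum()                            # dividing by c = 1' Sigma^{-1} 1

# Toy inputs (made-up numbers, illustration only).
Sigma = np.array([[0.040, 0.012, 0.006],
                  [0.012, 0.090, 0.018],
                  [0.006, 0.018, 0.160]])
mu = np.array([0.05, 0.07, 0.10])

ones = np.ones(3)
a = mu @ np.linalg.solve(Sigma, mu)               # entries of A, equation (2.108)
b = mu @ np.linalg.solve(Sigma, ones)
c = ones @ np.linalg.solve(Sigma, ones)

x_gmv = gmv_weights(Sigma)
var_gmv = x_gmv @ Sigma @ x_gmv                   # compare with 1 / c, equation (2.124)
x_at_gmv_return = min_variance_weights(Sigma, mu, b / c)   # compare with x_gmv

x_target = min_variance_weights(Sigma, mu, 0.08)
var_target = x_target @ Sigma @ x_target          # compare with equation (2.122)
var_formula = (a - 2 * b * 0.08 + c * 0.08 ** 2) / (a * c - b ** 2)
```

Note that the unconstrained solution may contain negative weights for other inputs, which is the case the long-only formulation addresses.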

The results of the derivations in this section can be shown graphically in the two-dimensional mean-standard deviation space in figure 2.1. All the mean-variance efficient portfolios make up the top half of the hyperbola in the space. The global minimum variance portfolio is shown with the associated expected return and variance derived earlier. The portfolios from the global minimum variance portfolio and along the curve towards the upper right corner of the graph are called the efficient frontier (Merton, 1972).

Figure 2.1: The efficient frontier. Source: Lando and Poulsen (2006)

Markowitz (1959) explained the efficient portfolios as the portfolios with both higher likely return and lower uncertainty than the alternatives. By exploring all efficient portfolios and creating the efficient frontier, the investor can exclude the inefficient portfolios from consideration. The proper choice among the efficient portfolios then depends on the individual investor's risk profile. The most risk-seeking investors in the mean-variance model will invest in portfolios far up on the efficient frontier, while the most risk-averse investors will choose efficient portfolios close to or at the global minimum variance portfolio. The different portfolios on the efficient frontier will not be explored further in this thesis as they require expectations of returns. They are however useful in the graphical illustration of the global minimum variance portfolio.
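The hyperbola described above follows directly from equation (2.122). As a rough sketch, the code below evaluates the minimum standard deviation on a grid of target returns and marks the upper branch, using the same made-up covariance matrix and expected returns as an assumption; the grid bounds are arbitrary and no plotting library is required.

```python
import numpy as np

# Toy inputs (made-up numbers, illustration only).
Sigma = np.array([[0.040, 0.012, 0.006],
                  [0.012, 0.090, 0.018],
                  [0.006, 0.018, 0.160]])
mu = np.array([0.05, 0.07, 0.10])
ones = np.ones(len(mu))

a = mu @ np.linalg.solve(Sigma, mu)
b = mu @ np.linalg.solve(Sigma, ones)
c = ones @ np.linalg.solve(Sigma, ones)

mu_grid = np.linspace(0.02, 0.14, 121)            # target expected returns mu_P
var_grid = (a - 2 * b * mu_grid + c * mu_grid ** 2) / (a * c - b ** 2)   # equation (2.122)
sd_grid = np.sqrt(var_grid)                       # horizontal axis of the mean-sd space

mu_gmv = b / c                                    # equation (2.123)
sd_gmv = np.sqrt(1.0 / c)                         # square root of equation (2.124)
efficient = mu_grid >= mu_gmv                     # upper branch: the efficient frontier
frontier = np.column_stack([sd_grid[efficient], mu_grid[efficient]])
```

Plotting `mu_grid` against `sd_grid` would reproduce the shape of figure 2.1, with the leftmost point of the curve at the global minimum variance portfolio.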

2.4.3 The long-only global minimum variance portfolio

In this section we look at the global minimum variance portfolio optimization problem with a long-only constraint on the asset weights. This constraint is often used in practice because many managers do not have the possibility of short sales. Investors with long-only constraints can try the closed-form solution from equation (2.129) derived in the previous section. However, if some of the portfolio weights are negative, the managers need to solve the problem with the long-only constraint. We now look at this problem with inspiration from the article by Roncalli (2011), where he analysed the impact of weight constraints in portfolio theory. The long-only global minimum variance portfolio is the portfolio that solves the following minimization problem

\[ \min_{x} \tfrac{1}{2} x^\top \Sigma x \quad \text{s.t.} \quad x^\top \mathbf{1} = 1, \quad x_i \geq 0 \tag{2.130} \]

We set up the following Lagrange function

\[ L(x, \lambda_1, \lambda) = \tfrac{1}{2} x^\top \Sigma x - \lambda_1 \left(x^\top \mathbf{1} - 1\right) - \lambda^\top x \tag{2.131} \]

where $\lambda_1, \lambda_i \geq 0$. By re-writing the solution from Roncalli (2011), the Kuhn-Tucker conditions are

\[ \Sigma x - \lambda_1 \mathbf{1} - \lambda = 0 \tag{2.132} \]
\[ \mathbf{1}^\top x - 1 = 0 \tag{2.133} \]
\[ \min(\lambda_i, x_i) = 0 \tag{2.134} \]

The above problem does not have an analytical solution like the one we found in the previous section. The most common approach is to solve the problem numerically with a computer program. In this thesis we use a quadratic programming solver to find the constrained global minimum variance portfolio. The method for this can be seen in the methodology chapter.

2.5 Maximum diversification

In this section we discuss the theory of maximum diversification and the most diversified portfolio. The importance of diversification is well known and is a central part of finance. The framework in this section focuses on capturing the benefits of diversification. A popular diversification method is to follow the market with market capitalization weighted indices. However, Haugen and Baker (1991) concluded that this strategy is inefficient in terms of return and variance.
Roncalli (2011) was also critical to capitalization weighted indices because of their lack of risk diversification. Such findings have led to the introduction of many alternative strategies. Choueifaty and Coignard (2008) introduced a framework to construct portfolios with diversifi- 30

39 cation as criteria. The concept of maximum diversification was first formally introduced when Choueifaty (2006) introduced the diversification ratio (DR). This idea was further developed when Choueifaty and Coignard (2008) expanded the framework to include the portfolio which solves the DR, called the most diversified portfolio (MDP). In this section we will discuss both the DR and the MDP based on Choueifaty et al. (2013) and Choueifaty and Coignard (2008) Diversification ratio In this section the mathematics and ideas behind the diversification ratio will be presented before later being used to find the most diversified portfolio. Choueifaty et al. (2013) defines the DR as the ratio of the portfolio s weighted average volatility to its overall volatility. The DR incorporates the basis of diversification since the volatility of a long-only portfolio is less than or equal to the weighted sum of the assets volatilities in the portfolio (Choueifaty et al., 2013). A portfolio of only one asset has a DR of one, and a long-only portfolio of more than one asset has a DR greater than or equal to one (Choueifaty et al., 2013). The DR of a portfolio quantify the diversification gained from holding assets that are not perfectly correlated (Choueifaty et al., 2013). The intuition behind this will be discussed mathematically in this section by defining the DR, and by exploring its properties in accordance with Choueifaty et al. (2013). In the derivation of the DR we use the same notation as in the derivation of the global minimum variance portfolio. It is important to note that we assume that all portfolio weights in this section and the next section are constrained to be greater or equal to one, x i 0. Choueifaty et al. (2013) defines a portfolio s weighted average volatility as n x i σ i = x v (2.135) i=1 where the vector v is the vector of asset volatilities v = (σ 1,..., σ n ) (2.136) We know the portfolios volatility measured in standard deviation is x Σx. 
By definition, the DR is then given by

$$DR = \frac{x^\top v}{\sqrt{x^\top \Sigma x}} \tag{2.137}$$

By decomposing the DR we can examine the properties of a portfolio's diversification. Portfolios with a high concentration in few assets, and portfolios consisting of highly correlated assets, are poorly diversified and therefore have a low DR. If we decompose the DR of a portfolio into its weighted correlation and weighted concentration, we can formalize this intuition (Choueifaty et al., 2013). Choueifaty et al. (2013) decompose the DR into

$$DR = \left[ AC\,(1 - CR) + CR \right]^{-1/2} \tag{2.138}$$

Here Choueifaty et al. (2013) define AC as the volatility-weighted average correlation of the components in the portfolio, and CR as the volatility-weighted concentration ratio of the portfolio. Mathematically, AC and CR can be expressed as

$$AC = \frac{\sum_{i \neq j} (x_i \sigma_i x_j \sigma_j)\,\rho_{i,j}}{\sum_{i \neq j} (x_i \sigma_i x_j \sigma_j)}, \qquad CR = \frac{\sum_{i} (x_i \sigma_i)^2}{\left(\sum_{i} x_i \sigma_i\right)^2} \tag{2.139}$$

From the above we see that a long-only portfolio with a single asset is fully concentrated, as it has a CR of one. The lowest CR is attained by the portfolio with equal volatility weighting, for which the CR equals the inverse of the number of assets in the portfolio (Choueifaty et al., 2013). When the CR decreases the DR increases, which is natural since a low CR indicates a low volatility concentration among the components and therefore diversification (Choueifaty et al., 2013). If the average correlation among the stocks decreases, the DR also increases.

The most diversified portfolio

According to Choueifaty et al. (2013), the portfolio that maximizes the DR under chosen constraints is the most diversified portfolio. In this thesis we have chosen these constraints to be that all weights are greater than or equal to zero and that the weights sum to one. The most diversified portfolio in this thesis is then the portfolio that solves

$$\max_{x} \; \frac{x^\top v}{\sqrt{x^\top \Sigma x}} \quad \text{s.t.} \quad x^\top \mathbf{1} = 1, \; x_i \ge 0 \tag{2.140}$$

To ease further calculations we log-transform the objective function

$$\log(DR) = \log\!\left(\frac{x^\top v}{\sqrt{x^\top \Sigma x}}\right) = \log(x^\top v) - \frac{1}{2}\log(x^\top \Sigma x) \tag{2.141}$$

Maximizing the DR is equivalent to minimizing $-\log(DR)$. We can therefore consider the following optimization problem instead of problem (2.140)

$$\min_{x} \; \frac{1}{2}\log(x^\top \Sigma x) - \log(x^\top v) \quad \text{s.t.} \quad x^\top \mathbf{1} = 1, \; x_i \ge 0 \tag{2.142}$$
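As a minimal numerical sketch of the diversification ratio (2.137) and its decomposition (2.138)-(2.139), the identity can be checked for an arbitrary long-only portfolio. The volatilities, correlations, weights and function names below are hypothetical and chosen only for illustration; they are not taken from the data set of the thesis.

```python
import numpy as np

def diversification_ratio(x, cov):
    """DR = x'v / sqrt(x' Sigma x), as in equation (2.137)."""
    vol = np.sqrt(np.diag(cov))               # vector v of asset volatilities
    return (x @ vol) / np.sqrt(x @ cov @ x)

def dr_decomposition(x, cov):
    """Return (AC, CR) as defined in equation (2.139)."""
    vol = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vol, vol)
    w = x * vol                               # volatility-weighted positions x_i * sigma_i
    ww = np.outer(w, w)
    off = ~np.eye(len(x), dtype=bool)         # mask selecting the i != j terms
    ac = (ww[off] * corr[off]).sum() / ww[off].sum()
    cr = (w ** 2).sum() / w.sum() ** 2
    return ac, cr

# Hypothetical inputs (assumptions for illustration only)
vol = np.array([0.10, 0.15, 0.20])
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.5],
                 [0.1, 0.5, 1.0]])
cov = corr * np.outer(vol, vol)
x = np.array([0.5, 0.3, 0.2])

dr = diversification_ratio(x, cov)
ac, cr = dr_decomposition(x, cov)
dr_from_parts = (ac * (1.0 - cr) + cr) ** -0.5   # right-hand side of (2.138)
dr_single = diversification_ratio(np.array([1.0, 0.0, 0.0]), cov)
```

Under these assumptions `dr` and `dr_from_parts` should agree up to rounding, and the single-asset portfolio should have a DR of one.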

The Lagrangian of the problem is

$$L(x, \lambda_1, \lambda) = \frac{1}{2}\log(x^\top \Sigma x) - \log(x^\top v) - \lambda_1 (x^\top \mathbf{1} - 1) - \lambda^\top x \tag{2.143}$$

where $\lambda_1, \lambda_i \ge 0$. The Kuhn-Tucker conditions are

$$\frac{\Sigma x}{x^\top \Sigma x} - \frac{1}{x^\top v}\, v - \lambda_1 \mathbf{1} - \lambda = 0 \tag{2.144}$$

$$x^\top \mathbf{1} - 1 = 0 \tag{2.145}$$

$$\min(\lambda_i, x_i) = 0 \tag{2.146}$$

If we multiply the first condition from the left by $x^\top$, as in Choueifaty et al. (2013), the first two terms cancel and $x^\top \lambda = 0$ by (2.146), so $\lambda_1$ must be equal to zero. We also see that the first condition is independent of the second, meaning that it does not depend on the weights summing to one. Using the fact that $\lambda_1$ must be zero, we know that the optimal portfolio must satisfy

$$\Sigma x = \frac{x^\top \Sigma x}{x^\top v}\, v + \tilde{\lambda} \tag{2.147}$$

$$x^\top \mathbf{1} - 1 = 0 \tag{2.148}$$

$$\min(\tilde{\lambda}_i, x_i) = 0 \tag{2.149}$$

where we have defined

$$\tilde{\lambda} = (x^\top \Sigma x)\, \lambda \tag{2.150}$$

following Choueifaty et al. (2013). If we multiply the numerator and denominator of the fraction in equation (2.147) by

$$\frac{1}{\sqrt{x^\top \Sigma x}} \tag{2.151}$$

we can, in the spirit of Choueifaty et al. (2013), say that the portfolio weights of the MDP must satisfy

$$\Sigma x_{MDP} = \frac{\sqrt{x_{MDP}^\top \Sigma x_{MDP}}}{DR_{MDP}}\, v + \tilde{\lambda} \tag{2.152}$$

where $\tilde{\lambda}$ is such that $\min(\tilde{\lambda}, x_{MDP}) = 0$. As with the constrained global minimum variance portfolio, this problem does not have a closed-form solution. We will therefore use quadratic programming to find the MDP. The methodology for this is explained in the methodology chapter.
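Condition (2.152) can be illustrated in the special case where the long-only constraint does not bind, so that the multiplier on the weights is zero and $\Sigma x$ is proportional to $v$. The following sketch is not the quadratic programming approach used in this thesis; it only computes the candidate $x \propto \Sigma^{-1} v$ and refuses to return it when a weight would be negative. The covariance inputs and the helper name `mdp_if_interior` are hypothetical.

```python
import numpy as np

def diversification_ratio(x, cov):
    """DR = x'v / sqrt(x' Sigma x), equation (2.137)."""
    vol = np.sqrt(np.diag(cov))
    return (x @ vol) / np.sqrt(x @ cov @ x)

def mdp_if_interior(cov):
    """MDP weights when no non-negativity constraint binds (multiplier = 0).

    Illustrative helper only: raises if the unconstrained solution is not
    long-only, in which case a QP solver would be required.
    """
    vol = np.sqrt(np.diag(cov))
    raw = np.linalg.solve(cov, vol)           # Sigma^{-1} v
    if raw.sum() <= 0 or np.any(raw < 0):
        raise ValueError("long-only constraint binds; use a QP solver instead")
    return raw / raw.sum()                    # rescale so the weights sum to one

# Hypothetical inputs (assumptions for illustration only)
vol = np.array([0.10, 0.15, 0.20])
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.5],
                 [0.1, 0.5, 1.0]])
cov = corr * np.outer(vol, vol)

x_mdp = mdp_if_interior(cov)
ratio = (cov @ x_mdp) / vol                   # should be the same number for every asset
dr_mdp = diversification_ratio(x_mdp, cov)
dr_equal = diversification_ratio(np.full(3, 1.0 / 3.0), cov)
```

In this interior case every element of `ratio` should equal the portfolio volatility divided by the DR, which is the scalar multiplying $v$ in (2.152), and the DR of the candidate should be at least as high as that of the equally-weighted portfolio.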

2.6 The equally-weighted portfolio

When measuring the performance of the GMV, MDP, and ERC portfolios, we want to compare them with a strategy that requires no estimation of any kind. One popular strategy is the equally-weighted portfolio, 1/n. The 1/n strategy is referred to by DeMiguel et al. (2009) as a naive portfolio diversification rule. This portfolio is an interesting benchmark since many studies have found it to outperform more sophisticated strategies. DeMiguel et al. (2009) argue that the naive rule is a good benchmark because many investors use this strategy in practice. Investors following the 1/n strategy simply allocate a fraction 1/n of their initial wealth to each of the n available assets. They therefore invest evenly across all assets regardless of their sizes, sectors, risks or other characteristics. Even though various theoretical models have been developed since the mean-variance model, investors continue to use the simple naive allocation rule (Benartzi and Thaler, 2001). The simplicity of this strategy makes it easy to implement, and investors can disregard estimating both expected returns and volatilities. One consequence of this strategy is that investors sell winning stocks and buy losing stocks every time the portfolio is rebalanced. The portfolio weights in this strategy are defined as

$$x_i = \frac{1}{n} \tag{2.153}$$

and we define the vector consisting of all the portfolio weights as

$$x_{1/n} = (1/n, \ldots, 1/n)^\top \tag{2.154}$$

2.7 Risk parity

The risk parity investment strategy comes from the observation that traditional asset allocation strategies, such as investing 60% in stocks and 40% in bonds, are not sufficiently diversified when looking at risk contributions (Asness et al., 2012). The 60/40 strategy does not sufficiently take risk into account, and simply suggests investing 60% of the wealth in stocks and 40% of the wealth in bonds.
Since stocks are significantly more volatile than bonds, and the nominal stock investment makes up 60% of the portfolio in this strategy, the risk contribution from stocks dominates. The 60/40 portfolio will therefore follow the movements in the stock market, since this is the major source of risk in the portfolio. According to Ilmanen and Villalon (2012), the nominal 60/40 allocation in stocks and bonds often implies the risk allocation shown in figure 2.2. We see from figure 2.2 that the stock investment accounts for 90% of the risk in the portfolio, while the bond investment contributes only 10% of the risk. It is clear that investors who focus only on how to allocate wealth, without taking the contribution of risk into account, can end up with portfolios with undesired properties.

Figure 2.2: Nominal/risk allocation in the 60/40 strategy. Left: nominal allocation. Right: risk allocation.

Several suggestions on how to even out the risk in portfolios exist. Investors can, for example, allocate funds to a wider range of investment categories, such as commodities and real estate. In this thesis, however, we focus on how to allocate across stocks based on their contributions to risk. We look at the portfolio where all components contribute equally to the total risk of the portfolio. Maillard et al. (2008) call the portfolio achieving this the equally-weighted risk contributions (ERC) portfolio. Investing according to risk contributions is often referred to as risk budgeting (Maillard et al., 2008). The ERC portfolio discussed in this section has the risk characteristics illustrated in figure 2.3. Here we see that each component contributes the same amount of risk, in contrast with the uneven contribution of risk in figure 2.2. In this way the strategy shares its philosophy with the most diversified portfolio discussed earlier, as we can regard it as diversification of risk (Maillard et al., 2008).

Figure 2.3: Equal risk contributions

Kaya and Lee (2012) argue that risk parity is to date justified by its concept rather than its theoretical foundation, and can therefore be considered a heuristic asset allocation strategy. The strategy lies between the equally-weighted portfolio and the global minimum variance portfolio (Maillard et al., 2008). Even though it is argued that this model is based on intuition rather than theory, the mathematics behind the model is derived in this section. The following derivation of the risk contribution framework leading up to the ERC portfolio is inspired by Maillard et al. (2008) and Roncalli and Weisang (2012).

Risk contribution and marginal risk contribution

To understand the ERC portfolio we first need to formalize the marginal risk contribution and the risk contribution of assets. According to Maillard et al. (2008), the marginal risk contribution is defined as the change in the total risk of the portfolio caused by an infinitesimal increase in $x_i$. Since we know the total risk of a portfolio from earlier, we can write the marginal risk contribution of asset $i$ as

$$MRC_i = \frac{\partial \sqrt{x^\top \Sigma x}}{\partial x_i} = \frac{(\Sigma x)_i}{\sqrt{x^\top \Sigma x}} \tag{2.155}$$

where $(\Sigma x)_i$ denotes the $i$-th element of the vector obtained as the product of $\Sigma$ with $x$ (Maillard et al., 2008). We can collect all $n$ marginal risk contributions in the vector

$$MRC = \frac{\Sigma x}{\sqrt{x^\top \Sigma x}} \tag{2.156}$$

The total risk contribution from asset $i$ is computed as the product of the allocation in asset $i$ and its marginal risk contribution. Maillard et al. (2008) define the risk contribution of asset $i$ as the share of the total portfolio risk stemming from that asset

$$RC_i = x_i \frac{(\Sigma x)_i}{\sqrt{x^\top \Sigma x}} = x_i\, MRC_i \tag{2.157}$$

Since the volatility is a homogeneous function of degree 1, it satisfies Euler's theorem and we can write it as the sum of its arguments multiplied by their first partial derivatives (Maillard et al., 2008). We can therefore sum the risk contributions of all assets and obtain the total risk of the portfolio

$$TR = \sum_{i=1}^{n} RC_i = \frac{x^\top \Sigma x}{\sqrt{x^\top \Sigma x}} = \sqrt{x^\top \Sigma x} \tag{2.158}$$

The equally-weighted risk contributions portfolio

In this section we explain how to construct the ERC portfolio based on the previous definitions. The ERC portfolio is a portfolio in which no stock contributes more to the risk of the portfolio than any other component. The weight $x_i$ is therefore determined by how risky asset $i$ is. If all assets had the same volatility and the same pairwise correlations, the ERC portfolio would be equal to the equally-weighted portfolio (Maillard et al., 2008). It is important to notice that the ERC portfolio uses all of the available assets in the investment universe. This is in contrast to the MDP and GMV portfolios, which typically use only a fraction of the available assets (Clarke et al., 2013). The portfolio in which the contributions of risk from all assets are equal must satisfy

$$RC_i = RC_j \quad \forall\, i, j = 1, \ldots, n \tag{2.159}$$

We follow Maillard et al. (2008) in this thesis and impose the restriction of no short-sales and the restriction that all weights must sum to one. It should be noted that the restriction of weights summing to one is not a necessity, but it works as a normalizing restriction and makes the ERC weights easier to compare with the other portfolio weights. Under the above constraints, Maillard et al. (2008) formulate the problem of finding the risk-balanced portfolio, in which the risk contributions from all assets are equal, as

$$x_{ERC} = \left\{ x_i \ge 0 : x^\top \mathbf{1} = 1, \; RC(x_i) = RC(x_j) \;\; \forall\, i, j = 1, \ldots, n \right\} \tag{2.160}$$
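The definitions (2.155)-(2.158) can be sketched numerically as follows. To connect with the 60/40 discussion at the start of section 2.7, the example uses a two-asset stock/bond portfolio; the volatilities (15% and 5%) and the correlation (0.2) are assumptions made only for this illustration and are not estimates from the thesis.

```python
import numpy as np

def risk_contributions(x, cov):
    """Return (MRC, RC, total risk) following equations (2.155)-(2.158)."""
    sigma_p = np.sqrt(x @ cov @ x)        # portfolio volatility
    mrc = cov @ x / sigma_p               # marginal risk contributions, (2.156)
    rc = x * mrc                          # risk contributions, (2.157)
    return mrc, rc, sigma_p

# Hypothetical stock/bond inputs (assumptions for illustration only)
vol = np.array([0.15, 0.05])              # assumed stock and bond volatility
corr = np.array([[1.0, 0.2],
                 [0.2, 1.0]])
cov = corr * np.outer(vol, vol)
x = np.array([0.6, 0.4])                  # nominal 60/40 allocation

mrc, rc, sigma_p = risk_contributions(x, cov)
risk_share = rc / sigma_p                 # share of total risk per asset
```

By Euler's theorem the elements of `rc` sum to `sigma_p`. With these assumed inputs the stock position accounts for roughly 92% of the portfolio risk, which is of the same order as the split illustrated in figure 2.2.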

As shown by Maillard et al. (2008), $MRC_i$ is proportional to $(\Sigma x)_i$. We can therefore rewrite the above as

$$x_{ERC} = \left\{ x_i \ge 0 : x^\top \mathbf{1} = 1, \; x_i (\Sigma x)_i = x_j (\Sigma x)_j \;\; \forall\, i, j = 1, \ldots, n \right\} \tag{2.161}$$

When solving the above problem, our assumptions regarding volatilities and correlations are important. We consider three different situations, all with $n > 2$. The first is when all correlations are equal. The second is when all volatilities are equal. Finally, we look at the situation where both the volatilities and the correlations differ. First we assume that the correlation is the same for every pair of assets, i.e.

$$\rho_{i,j} = \rho \quad \forall\, i, j \tag{2.162}$$

Only under this assumption do we find an analytical solution for the ERC portfolio. Maillard et al. (2008) derive the analytical solution to be

$$x_i = \frac{\sigma_i^{-1}}{\sum_{j=1}^{n} \sigma_j^{-1}} \tag{2.163}$$

This solution is often used in practice since most risk parity providers ignore correlations, and the approach is sometimes referred to as a naive risk parity strategy (Jurczenko et al., 2013). The naive part is the assumption that the correlations between assets are equal for every pair. We can see from equation (2.163) that the more volatile a component is, the lower its weight in the ERC portfolio will be, and vice versa. The next situation we consider is where all volatilities are equal but the correlations differ, i.e.

$$\sigma_i = \sigma \quad \forall\, i \tag{2.164}$$

Maillard et al. (2008) show that the ERC portfolio under the assumption of equal volatilities satisfies

$$x_i = \frac{\left(\sum_{k=1}^{n} x_k \rho_{ik}\right)^{-1}}{\sum_{j=1}^{n} \left(\sum_{k=1}^{n} x_k \rho_{jk}\right)^{-1}} \tag{2.165}$$

Equation (2.165) cannot be solved explicitly, so there is no explicit solution for the ERC portfolio under the assumption of equal volatilities and different correlations. The reason is that $x_i$ appears on both sides of the equation, i.e. $x_i$ is a function of itself.
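The closed-form solution (2.163) can be checked with a short sketch: under a constant correlation, inverse-volatility weights should give identical risk contributions. The volatilities, the correlation level of 0.2 and the function name are assumptions chosen for illustration.

```python
import numpy as np

def erc_constant_correlation(vol):
    """Inverse-volatility weights, equation (2.163)."""
    inv = 1.0 / np.asarray(vol, dtype=float)
    return inv / inv.sum()

# Hypothetical inputs (assumptions for illustration only)
vol = np.array([0.10, 0.15, 0.20])
rho = 0.2
corr = np.full((3, 3), rho)
np.fill_diagonal(corr, 1.0)
cov = corr * np.outer(vol, vol)

x = erc_constant_correlation(vol)
rc = x * (cov @ x) / np.sqrt(x @ cov @ x)     # risk contributions, (2.157)
```

With these volatilities the weights are proportional to 6 : 4 : 3, and all elements of `rc` should coincide. Changing `rho` leaves `x` unchanged, since the correlation does not enter (2.163).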

The last case we look at is where both the correlations and the volatilities differ. As for equation (2.165), there is no closed-form solution in this case (Maillard et al., 2008). To find an expression for the ERC portfolio under this assumption, we follow the derivation in Maillard et al. (2008). The covariance between the return of asset $i$ and the return of the portfolio is

$$\sigma_{ix} = \operatorname{cov}\Big(r_i, \sum_{j} x_j r_j\Big) = \sum_{j} x_j \sigma_{ij} \tag{2.166}$$

We can then write the risk contribution of asset $i$ as

$$RC_i = \frac{x_i \sigma_{ix}}{\sqrt{x^\top \Sigma x}} \tag{2.167}$$

We now follow Maillard et al. (2008) and introduce the beta $\beta_i$ of asset $i$ with respect to the portfolio as

$$\beta_i = \frac{\sigma_{ix}}{x^\top \Sigma x} \tag{2.168}$$

and then combine equation (2.167) with equation (2.168) to obtain

$$RC_i = x_i \beta_i \sqrt{x^\top \Sigma x} \tag{2.169}$$

We know that the ERC portfolio must fulfil

$$RC_i = RC_j = \frac{\sqrt{x^\top \Sigma x}}{n} \quad \forall\, i, j \tag{2.170}$$

Following Maillard et al. (2008), we then get the expression for the ERC portfolio weights as

$$x_i = \frac{\beta_i^{-1}}{\sum_{j=1}^{n} \beta_j^{-1}} = \frac{\beta_i^{-1}}{n} \tag{2.171}$$

We can see from the above expression that the weight attributed to asset $i$ is inversely proportional to its beta (Maillard et al., 2008). High beta values result in low weights, and low beta values result in high weights. Maillard et al. (2008) explain that, because of this, portfolio components with high volatility or high correlation with the other components are penalized. The above expression for $x_i$ is not a closed-form solution, since $x_i$ is a function of $\beta_i$ and $\beta_i$ is a function of $x$. To find the weights of the ERC portfolio we need a numerical algorithm. Maillard et al. (2008) suggest solving the following optimization problem using a sequential quadratic programming algorithm

$$x_{ERC} = \arg\min_{x} f(x) \quad \text{s.t.} \quad x^\top \mathbf{1} = 1, \; x_i \ge 0 \tag{2.172}$$

where $f(x)$ is given by

$$f(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( x_i (\Sigma x)_i - x_j (\Sigma x)_j \right)^2 \tag{2.173}$$

As Maillard et al. (2008) explain, this program minimizes the variance of the risk contributions. The ERC portfolio is obtained only when $f(x) = 0$, meaning that

$$x_i (\Sigma x)_i - x_j (\Sigma x)_j = 0 \quad \forall\, i, j \tag{2.174}$$

It should be noted that multiple portfolios satisfy the ERC conditions discussed in this chapter if the constraint of weights greater than or equal to zero is omitted (Maillard et al., 2008).

2.8 Numerical examples

In this section we show two examples with the two global minimum variance portfolios, the equally-weighted risk contributions portfolio, and the most diversified portfolio. We do not include the 1/n portfolio in the examples because we know by definition that it invests equally across the assets regardless of volatilities, covariances, and correlations. The investment universe in all examples consists of three assets, namely asset A, asset B, and asset C. While the assets remain the same, we vary their volatilities and correlations. By doing this we can explore how the strategies allocate in some special cases. First, we look at the case where all assets have equal volatility but the correlations among them differ. Second, we consider the case where the correlation is constant for all assets but their volatilities differ. These special cases provide some interesting results and have been discussed earlier by Maillard et al. (2008), Jurczenko et al. (2013), or Choueifaty and Coignard (2008). The correlation matrices and volatilities used in the examples are given below. A correlation matrix can be converted into a covariance matrix using the formulas defined earlier. The values of $c$ and $\hat{c}$ are specified where $C_c$ and $\sigma_c$ are used.

$$C_c = \begin{pmatrix} 1.00 & c & c \\ c & 1.00 & c \\ c & c & 1.00 \end{pmatrix}, \quad C = \begin{pmatrix} \cdots \end{pmatrix}, \quad \sigma_c = \begin{pmatrix} \hat{c} \\ \hat{c} \\ \hat{c} \end{pmatrix}, \quad \sigma = \begin{pmatrix} 0.10 \\ 0.15 \\ 0.20 \end{pmatrix} \tag{2.175}$$
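The numerical ERC problem (2.172)-(2.174) can be sketched without an optimization library. Note that the code below does not use the sequential quadratic programming algorithm suggested by Maillard et al. (2008); it swaps in a cyclical coordinate-descent scheme on the problem $\min_y \frac{1}{2} y^\top \Sigma y - c \sum_i \ln y_i$, whose first-order condition $y_i (\Sigma y)_i = c$ is the equal-contribution condition (2.174), and then rescales $y$ to sum to one. The function names, the tolerance and the correlation inputs are assumptions for illustration; only the volatilities mirror $\sigma$ in (2.175).

```python
import numpy as np

def erc_weights(cov, tol=1e-12, max_iter=10_000):
    """ERC weights by cyclical coordinate descent (illustrative sketch)."""
    n = cov.shape[0]
    y = 1.0 / np.sqrt(np.diag(cov))       # start from inverse volatilities
    c = 1.0 / n                           # any positive constant; rescaled away below
    for _ in range(max_iter):
        y_old = y.copy()
        for i in range(n):
            b = cov[i] @ y - cov[i, i] * y[i]   # sum over j != i of sigma_ij * y_j
            # positive root of sigma_ii * y_i^2 + b * y_i - c = 0
            y[i] = (-b + np.sqrt(b * b + 4.0 * cov[i, i] * c)) / (2.0 * cov[i, i])
        if np.max(np.abs(y - y_old)) < tol:
            break
    return y / y.sum()                    # normalize so the weights sum to one

def f_objective(x, cov):
    """Objective (2.173): squared differences of x_i * (Sigma x)_i."""
    s = x * (cov @ x)
    return ((s[:, None] - s[None, :]) ** 2).sum()

vol = np.array([0.10, 0.15, 0.20])        # sigma from (2.175)

# Case 1: constant correlation (assumed c = 0.2), closed form (2.163) applies
corr_c = np.full((3, 3), 0.2)
np.fill_diagonal(corr_c, 1.0)
x_const = erc_weights(corr_c * np.outer(vol, vol))

# Case 2: hypothetical non-constant correlations
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.5],
                 [0.1, 0.5, 1.0]])
cov = corr * np.outer(vol, vol)
x_erc = erc_weights(cov)
f_value = f_objective(x_erc, cov)
```

In the constant-correlation case the numerical weights should match the inverse-volatility weights of (2.163), and in the general case `f_value` should be numerically zero.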

2.8.1 Equal volatility example

In this example we consider the case where all volatilities are equal, i.e.

$$\sigma_A = \sigma_B = \sigma_C = \hat{c} \tag{2.176}$$

The correlations among the assets, on the other hand, differ. In this example we therefore use $C$ and $\sigma_c$ defined earlier. We consider two different levels of volatility, $\hat{c} = 0.15$ and $\hat{c} = 0.45$. The same correlation matrix $C$ is used in both analyses. The portfolio weights constructed by the different strategies with the specified data can be seen in table 2.1. Two interesting results regarding the portfolio weights can be seen from the output of this example. The first is that the portfolio weights of the ERC, GMV, and MDP portfolios do not change when the volatility level changes while the correlations are held fixed. The reason the ERC portfolio weights are unchanged can be seen from equation (2.165), which is the solution for the ERC portfolio weights with equal volatilities. In equation (2.165) the volatilities drop out and only the correlations remain. Since we hold the correlations at the same level but vary the volatility, the portfolio weights do not change. The weights of the other portfolios are unchanged simply because two different levels of volatility with the same correlations result in a scaled covariance matrix, and therefore a scaled optimization problem with the same solution. The other interesting result is that with equal volatilities, the GMV portfolio and the MDP are equal. This special case is pointed out by Choueifaty and Coignard (2008), who explain that when stocks have equal volatility, the maximum diversification strategy minimizes the volatility and is therefore equal to the GMV.
Table 2.1: Equal volatility example

         Volatility=0.15                          Volatility=0.45
Asset    GMV   GMV (long-only)   ERC   MDP        GMV   GMV (long-only)   ERC   MDP
A
B
C

Note: The numbers under GMV, GMV (long-only), ERC, and MDP refer to the portfolio weight in the corresponding asset.
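The two observations from the equal volatility example can be reproduced qualitatively with a short sketch. It uses the analytical unconstrained solutions $x \propto \Sigma^{-1}\mathbf{1}$ for the GMV portfolio and $x \propto \Sigma^{-1} v$ for the MDP, which coincide with the long-only solutions whenever all resulting weights are positive, as is the case here. The correlation matrix is hypothetical and is not the matrix $C$ of the thesis, so the weights are not those of table 2.1.

```python
import numpy as np

def gmv_unconstrained(cov):
    """Global minimum variance weights, proportional to Sigma^{-1} 1."""
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()

def mdp_unconstrained(cov):
    """Most diversified portfolio weights, proportional to Sigma^{-1} v."""
    vol = np.sqrt(np.diag(cov))
    raw = np.linalg.solve(cov, vol)
    return raw / raw.sum()

# Hypothetical correlation matrix (assumption for illustration only)
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.5],
                 [0.1, 0.5, 1.0]])

weights = {}
for c_hat in (0.15, 0.45):                # the two volatility levels of the example
    vol = np.full(3, c_hat)               # equal volatilities, (2.176)
    cov = corr * np.outer(vol, vol)
    weights[c_hat] = (gmv_unconstrained(cov), mdp_unconstrained(cov))
```

Under these assumptions the GMV and MDP weights should be identical to each other, and identical across the two volatility levels, because the common volatility only scales the covariance matrix.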

2.8.2 Constant correlation example

In this example we consider the scenario where all the correlations are equal, i.e.

$$\rho_{i,j} = c \quad \forall\, i, j \tag{2.177}$$

In this example we allow the volatilities to differ. We therefore use $C_c$ and $\sigma$ defined earlier. As in the previous example we consider two cases, this time where the correlations are c = 0.80 and c =

Table 2.2: Constant correlation example

         Correlation=0.20                         Correlation=0.60
Asset    GMV   GMV (long-only)   ERC   MDP        GMV   GMV (long-only)   ERC   MDP
A
B
C

Note: The numbers under GMV, GMV (long-only), ERC, and MDP refer to the portfolio weight in the corresponding asset.

In this example the ERC portfolio has the closed-form solution in equation (2.163). This solution depends only on the volatilities. The portfolio weights are therefore the same when the correlation varies while the volatilities are kept fixed. The two GMV portfolios, on the other hand, depend on the correlation between the assets in this case. As we increase the correlation, the GMV portfolio becomes more and more concentrated in asset A since it has the lowest volatility. As this example shows, the ERC strategy and the MDP are more balanced in terms of portfolio weights than the two GMV portfolios. We also see that the ERC portfolio is equal to the MDP. These two portfolios are always equal when correlations are constant (Clarke et al., 2013).

2.9 Statistical measures

Selecting appropriate methods to measure the performance of the different strategies is an important part of the performance discussion. Even though returns are not included in the optimization problems for the strategies, it would not make sense to measure performance based solely on risk. If risk were the only measure of performance, all strategies would be inferior to the risk-free asset. Investors are not likely to be interested in products that are measured only in terms of standard deviation, with no information regarding returns. As part of the performance measurement we would like the portfolios to be measured in terms of both realized risk and realized returns.

To aid the discussion of the strategies' performance, we calculate three measures. The first is the Sharpe ratio introduced by Sharpe (1966), because it is the most popular performance measure in practice (Pedersen and Rudholm-Alfvin, 2003). Including this measure allows our results to be compared with other studies over the same time period, as well as with other investment products. The second is the maximum drawdown, which is commonly used in the financial industry (Pedersen and Rudholm-Alfvin, 2003). This measure is interesting because it provides information about how well the portfolios handle bear markets. In the spirit of downside risk, the third is the Sortino ratio introduced by Sortino and Price (1994).

Sharpe ratio

We now present the Sharpe ratio. The Sharpe ratio was introduced in Sharpe (1966) as a tool for measuring the risk-adjusted performance of funds. Today this measure is still very popular among practitioners and scholars. The ratio is based on the idea introduced in Treynor (1965) that the risk of the assets should be included when measuring performance. The extension introduced by Sharpe (1966) uses a wider risk measure. Sharpe used the standard deviation, although there were other suggestions. For example, Treynor proposed using the betas of the assets as a measure of risk. It is important to differentiate between the ex-ante and ex-post definitions of the ratio. The ex-ante ratio is the forward-looking measure, while the ex-post ratio is based on observed results. The two terms are Latin and mean before the event and after the event. Ex-ante is a forecast of future events, while ex-post refers to the results encountered in the past. In our framework, all of our strategies have taken expected returns out of the equation, because of the difficulty and uncertainty related to estimating these values. For this reason we want our Sharpe ratio to be ex-post. We now proceed to define the ex-post ratio. First we define the excess return in line with Sharpe (1994)

$$r_{ER} = r_P - r_{rf} \tag{2.178}$$

This gives us the excess return, $r_{ER}$, which is the return over the risk-free asset in our analysis. When measuring Sharpe ratios one often wishes to measure the return over a number of periods. Since the excess return defined above covers only one period, we need to define the return over a number of periods. When the investor is not re-investing returns, this is the simple average of the excess returns over the specified period. In mathematical terms this is defined as

$$\bar{r}_{ER} = \frac{1}{T} \sum_{t=1}^{T} r_{ER,t} \tag{2.179}$$

When the investor is re-investing returns in all periods we consider the geometric average

$$\bar{r}_{ER} = \left( \prod_{t=1}^{T} (1 + r_{ER,t}) \right)^{\frac{1}{T}} - 1 \tag{2.180}$$

Since we know how to calculate the standard deviation from the definition chapter, we have obtained the variables we need to define the Sharpe ratio. We present the expression as given by Sharpe (1994)

$$\text{Sharpe Ratio} = \frac{\bar{r}_{ER}}{\sigma_p} \tag{2.181}$$

i.e. the average excess return over the volatility of the portfolio. From the above equation we can see that the Sharpe ratio tells us how well the investor is rewarded for taking on risk. Portfolios with different levels of return can have the same Sharpe ratio, because equation (2.181) also depends on the volatility. It should be clear that the ratio depends on the frequency of the data used in the analysis. The ratio will not be the same using, for example, daily or monthly data. The ratio is also sensitive to the time period chosen.

Sortino ratio

In this section we present the Sortino ratio. The Sortino ratio uses downside risk as an alternative measure of risk. Measuring risk in the form of standard deviation can present some challenges for investors when evaluating performance. One of these challenges is that variability above the mean of an asset is often preferred to variability below the mean. As Estrada (2006) emphasizes, one of the main problems with standard deviation as a measure of risk is that it treats fluctuations below and above the mean equally. As an alternative way of regarding risk, Sortino and Price (1994) formally introduced downside risk into performance measures. In this thesis we derive the Sortino ratio based on the article by Van der Meer et al. (2001). To calculate this ratio the investor needs a minimal acceptable rate of return (MAR), which represents the investor's objective return (Van der Meer et al., 2001). The volatility above the MAR is then considered good volatility, and the volatility below the MAR is considered bad volatility. In this thesis we always set the MAR equal to the risk-free rate. Van der Meer et al. (2001) define the downside risk as the second partial moment with respect to the MAR.
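The ex-post Sharpe ratio of (2.178)-(2.181) can be sketched as below, with a flag that switches between the arithmetic average (2.179) and the geometric average (2.180). The function name, the return series and the use of the sample standard deviation of the portfolio returns (with `ddof=1`) for $\sigma_p$ are assumptions of this sketch, since the exact estimator of $\sigma_p$ is defined elsewhere in the thesis.

```python
import numpy as np

def sharpe_ratio(portfolio_returns, risk_free_returns, reinvest=False):
    """Ex-post Sharpe ratio, equation (2.181).

    reinvest=False uses the arithmetic mean excess return (2.179),
    reinvest=True uses the geometric mean excess return (2.180).
    """
    rp = np.asarray(portfolio_returns, dtype=float)
    excess = rp - np.asarray(risk_free_returns, dtype=float)   # (2.178)
    if reinvest:
        mean_excess = np.prod(1.0 + excess) ** (1.0 / len(excess)) - 1.0
    else:
        mean_excess = excess.mean()
    sigma_p = np.std(rp, ddof=1)          # assumed estimator of portfolio volatility
    return mean_excess / sigma_p

# Hypothetical periodic returns (assumptions for illustration only)
returns = np.array([0.02, -0.01, 0.03, 0.01])
rf = 0.001                                # assumed constant risk-free rate per period

sr_arithmetic = sharpe_ratio(returns, rf)
sr_geometric = sharpe_ratio(returns, rf, reinvest=True)
```

Because the geometric mean never exceeds the arithmetic mean, the Sharpe ratio with re-invested returns is never higher than the one without for the same series.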

Slightly rewriting the downside risk defined by Van der Meer et al. (2001), we get

$$\delta_{MAR} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \min(0, r_t - r_{MAR})^2} \tag{2.182}$$

where $t$ is the time of the observation, $r_t$ is the return at time $t$, and $r_{MAR}$ is the MAR. The minimum function ensures that returns above $r_{MAR}$ are not included in the downside risk calculation. The Sortino ratio is then defined by Van der Meer et al. (2001) as

$$\text{Sortino Ratio} = \frac{\bar{r} - r_{MAR}}{\delta_{MAR}} \tag{2.183}$$

Like the Sharpe ratio, this ratio depends on the time period, and ratios should therefore be calculated from the same data if they are to be compared.

Maximum drawdown

The maximum drawdown is a risk management measure used to evaluate the maximum loss in value of a portfolio over a specified time period. Chekhlov et al. (2005) define the maximum drawdown as the loss from a portfolio's highest peak in a chosen period of time to its minimum value after the peak. This measure is often used by portfolio managers in practice (Acar and James, 1997). Chekhlov et al. (2005) formulate the maximum drawdown of a portfolio mathematically as

$$\text{Maximum drawdown} = \max_{\tau \in [0,t]} (W_\tau - W_t) \tag{2.184}$$

where $W_t$ is the value of the portfolio at time $t$ in the interval $[0, T]$. It is important to notice that $\tau \le t$. This ensures that the peak occurred before the loss within the specified period. In this thesis we are interested in the maximum drawdown relative to the highest peak, i.e.

$$\text{Maximum relative drawdown} = \max_{\tau \in [0,t]} \frac{W_\tau - W_t}{W_\tau} \tag{2.185}$$

From now on we refer to the maximum relative drawdown as the maximum drawdown. Maximum drawdown is often related to the risk of a portfolio, because it tells us the maximum drop in the portfolio's value. It should be taken into account that it relates only to the size of the drop, not to the length of the drop or to how many drawdowns occurred. This should be kept in mind when the measure is used. Used in the right way, the measure can be a valuable tool because it gives information on the preservation of capital in the portfolio.
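The downside risk (2.182), the Sortino ratio (2.183) and the maximum relative drawdown (2.185) can be sketched as follows. The sketch assumes that the downside risk is the square root of the second lower partial moment and that the drawdown is measured against the running peak and maximized over all dates in the sample; the function names and the input series are hypothetical.

```python
import numpy as np

def downside_risk(returns, mar):
    """Downside risk with respect to the MAR, equation (2.182)."""
    shortfall = np.minimum(0.0, np.asarray(returns, dtype=float) - mar)
    return np.sqrt(np.mean(shortfall ** 2))

def sortino_ratio(returns, mar):
    """Average return in excess of the MAR over the downside risk, (2.183)."""
    returns = np.asarray(returns, dtype=float)
    return (returns.mean() - mar) / downside_risk(returns, mar)

def max_relative_drawdown(values):
    """Largest drop from a running peak, relative to that peak, (2.185)."""
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)   # highest value up to each date
    return np.max((running_peak - values) / running_peak)

# Hypothetical inputs (assumptions for illustration only)
returns = np.array([0.02, -0.01, 0.03, -0.02])
sortino = sortino_ratio(returns, mar=0.0)

wealth = np.array([100.0, 120.0, 90.0, 110.0, 80.0, 130.0])
mdd = max_relative_drawdown(wealth)
```

For the assumed wealth path the largest relative drop is from the peak of 120 to the later value of 80, i.e. one third. The function returns only the size of this drop, not its duration, which is the limitation discussed above.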

As an example of some of the challenges with this measure, consider two portfolios, portfolio A and portfolio B, for which we calculate the maximum drawdown over a three-year period. If portfolio A had a maximum drawdown of 40% with two years between the peak and the bottom, and portfolio B had the same maximum drawdown but over one month, the measure would not reveal the difference in the length of the drop. Looking at the drawdown value alone makes these two portfolios look equally risky. How fast drawdowns occur is something investors are interested in, and it is important to be aware of the length of the periods used in the calculations. If the two portfolios had multiple drawdowns close to the maximum drawdown, this could not be seen from the measure alone either. This example points out that when using the measure, one should be aware of the information it contains and the information it does not contain. In the article by Chekhlov et al. (2005) the maximum drawdown is also implemented as a constraint when optimizing portfolios. This is another application of the maximum drawdown, but it is not a constraint used in this thesis.

Skewness and kurtosis

Skewness and kurtosis are both important measures when describing the characteristics of a distribution. The distributions we discuss in this thesis are the return distributions of the different portfolios. Kurtosis and skewness are therefore explained in the context of investment and risk. The mean and standard deviation of a portfolio's returns do not alone provide information about fat tails and the symmetry of the return distribution. Fat tails are important to consider in portfolio management since they indicate a higher chance of extreme outcomes, which is undesirable in risk management. To measure fat tails we look at the kurtosis, since it indicates fat tails and peakedness relative to a normal distribution (DeCarlo, 1997). Distributions with negative skewness indicate a greater chance of large negative returns and should also be examined. There are many suggested definitions of these concepts, and we base our choice of formulas on the article by Joanes and Gill (1998). Different computer programs use different calculation methods for kurtosis and skewness. Because of these differences, we briefly discuss our choice of method. We find this important so that the results of the analysis can be compared with other findings. According to Joanes and Gill (1998), the common definition in many textbooks is to define skewness, $g_1$, and kurtosis, $g_2$, as in Cramer (1946)

$$g_1 = \frac{m_3}{m_2^{3/2}}, \qquad g_2 = \frac{m_4}{m_2^{2}} - 3 \tag{2.186}$$

The above kurtosis can be regarded as the excess kurtosis relative to a normal distribution, since the normal distribution's kurtosis of 3 is subtracted. It is important to notice that the mean does not affect the kurtosis, since the moments are centred. Because $g_2$ is normalized by $m_2^2$, a pure rescaling of the returns does not change it either; the kurtosis changes only when the shape of the distribution changes. The $k$th sample moment for a sample of $n$ returns is given by

$$m_k = \frac{1}{n} \sum_{i=1}^{n} (r_i - \bar{r})^k \tag{2.187}$$

where $\bar{r}$ is the mean sample return. As the above sample measures are not unbiased estimates of their population counterparts (Joanes and Gill, 1998), we use the bias-corrected estimates of sample skewness and kurtosis used by SAS, SPSS and Excel

$$G_1 = g_1 \frac{\sqrt{n(n-1)}}{n-2} \tag{2.188}$$

$$G_2 = \frac{n-1}{(n-2)(n-3)} \left( (n+1)\, g_2 + 6 \right) \tag{2.189}$$

The reason for this choice is that Joanes and Gill (1998) found in simulations that these estimates have the smallest mean squared errors in skewed distributions, and that they are unbiased in normal distributions.

Turnover and transaction costs

In this section we outline the measures and assumptions behind portfolio turnover and transaction costs. Both measures are important to include when discussing the performance of the strategies. If we disregarded the costs of transactions, the performance would look stronger than what an actual implementation of these strategies would achieve. Both of our measures are simplified compared to reality, but they still provide important information about the stability of the portfolio weights in addition to the approximate transaction costs. Studies in the U.S. have shown that, on average, active equity managers underperformed the S&P 500 because of transaction costs (Grinold and Kahn, 1999). Because of such findings, the transaction costs of the different risk-based investment strategies need to be assessed. Since the transaction cost is a function of turnover, we first define the turnover rate and then define the formula for the transaction costs.
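The sample skewness and excess kurtosis in (2.186)-(2.189) can be sketched in a few lines. The function name and the test sample are hypothetical; the formulas follow the equations above.

```python
import numpy as np

def sample_skew_kurt(returns):
    """Return (G1, G2): adjusted sample skewness and excess kurtosis."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    d = r - r.mean()                                   # deviations from the sample mean
    m2, m3, m4 = (np.mean(d ** k) for k in (2, 3, 4))  # sample moments, (2.187)
    g1 = m3 / m2 ** 1.5                                # skewness, (2.186)
    g2 = m4 / m2 ** 2 - 3.0                            # excess kurtosis, (2.186)
    G1 = g1 * np.sqrt(n * (n - 1)) / (n - 2)           # (2.188)
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)   # (2.189)
    return G1, G2

# Hypothetical symmetric sample (assumption for illustration only)
sample = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
G1, G2 = sample_skew_kurt(sample)
```

For this symmetric sample the skewness is zero and the adjusted excess kurtosis is -1.2. Note that the estimators require at least four observations, since (2.189) divides by $(n-2)(n-3)$.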
In this thesis, the requirement $x^\top \mathbf{1} = 1$ is included in each optimization problem for finding portfolio weights. This means that the weights must sum to one at each rebalancing. All proceeds from sales must therefore be invested among the other stocks when rebalancing, and vice versa. Under this constraint, sales and purchases must be equal.

The calculation of the turnover rate is a three-step process. First we must calculate the portfolio weights the moment before rebalancing. At period $t$ we have the portfolio $x_t$ of $n$ stocks where the portfolio weights sum to one, $x_t^\top \mathbf{1} = 1$. At the point of rebalancing, the values of the portfolio components have changed due to their returns over the invested period, $r_1, r_2, \dots, r_n$. The new weights of portfolio $x_t$ the moment before rebalancing, with returns included, are

$$\hat{x}_{t+1} = \begin{pmatrix} x_1(1 + r_1) \\ x_2(1 + r_2) \\ \vdots \\ x_n(1 + r_n) \end{pmatrix} \tag{2.190}$$

Note that the notation $\hat{x}_{t+1}$ is chosen so that it is not confused with the portfolio to rebalance into at $t + 1$, $x_{t+1}$. From the above it is clear that the constraint $x^\top \mathbf{1} = 1$ does not necessarily hold any more, since the portfolio can now be worth more or less than in the period before. We therefore rescale the weights of $\hat{x}_{t+1}$ to sum to one, in order to measure the individual weights relative to the overall portfolio value

$$\tilde{x}_{t+1} = \frac{\hat{x}_{t+1}}{\hat{x}_{t+1}^\top \mathbf{1}} \tag{2.191}$$

Since everything is invested in every period, the turnover is equivalent to the fraction of the stocks sold and bought between the two periods to achieve the desired portfolio weights at time $t + 1$

$$\zeta_{t+1} = \left| x_{t+1} - \tilde{x}_{t+1} \right|^\top \mathbf{1} \tag{2.192}$$

where the absolute value is taken element by element.

Once we know the turnover rate, we can use it to calculate transaction costs. By transaction costs we refer to the costs of rebalancing a portfolio from one period into the desired portfolio in the next period. The costs of rebalancing are simplified in many ways in this thesis. We assume that there is only a fixed cost, measured as a percentage of the traded value. The cost is the same for buying, selling and short-selling stocks. This fixed transaction cost percentage stays the same regardless of the amount traded, which means that unlimited trades are possible, as are potentially very small transactions. There is also no delay in purchases or sales regardless of order size, so we assume the portfolios can be rebalanced instantaneously. Finally, there is no tax on the gains before rebalancing. The simplified transaction cost as a proportion of the portfolio value at time $t$ is defined as

$$TC_t = \zeta_t\,\xi \tag{2.193}$$

where $\xi$ is the fee charged for the transaction. The transaction fee we use in this thesis is $\xi = 0.1\%$, as this is what Euronext charges for trades in tier 1 stocks (Euronext, 2015). The 0.1% is charged on the amount bought or sold. According to Euronext (2015), the tier 1 list consists of the Euronext 100 components, in addition to further indices. We therefore consider this fee a good benchmark for the costs associated with trading EURO STOXX 50 components. In our framework this cost is 0.1% of the number of shares bought or sold multiplied by the closing price at the time of the transaction.

Turnover and trading cost example

As an example we consider the 1/n portfolio with $n = 4$. At both time $t$ and $t + 1$ the portfolio weights should by definition be

$$x_t = x_{t+1} = (0.25,\ 0.25,\ 0.25,\ 0.25)^\top \tag{2.194}$$

If, for example, the returns from period $t$ to period $t + 1$ are

$$r_{t,t+1} = (0.2,\ 0.01,\ 0.02,\ -0.15)^\top \tag{2.195}$$

we get

$$\hat{x}_{t+1} = \begin{pmatrix} 0.25(1.2) \\ 0.25(1.01) \\ 0.25(1.02) \\ 0.25(0.85) \end{pmatrix} = \begin{pmatrix} 0.3 \\ 0.2525 \\ 0.255 \\ 0.2125 \end{pmatrix} \tag{2.196}$$

where we see that $\hat{x}_{t+1}^\top \mathbf{1} = 1.02$, meaning that the portfolio value increased by 2% between $t$ and $t + 1$. We can rescale $\hat{x}_{t+1}$ to

$$\tilde{x}_{t+1} = \frac{\hat{x}_{t+1}}{1.02} = (0.294,\ 0.248,\ 0.250,\ 0.208)^\top \tag{2.197}$$

The turnover at time $t + 1$ is then

$$\zeta_{t+1} = (0.044,\ 0.002,\ 0.000,\ 0.042)\,\mathbf{1} = 0.088 = 8.8\% \tag{2.198}$$

We can see that the 1/n strategy sells the winners and buys the losers at every rebalancing. If we set the transaction fee to 0.1% as discussed earlier, the transaction cost is

$$TC_{t+1} = 8.8\% \times 0.1\% = 0.0088\% \tag{2.199}$$
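The worked example above can be reproduced with a few lines of code. The following is a minimal sketch in Python rather than the R script used in the thesis; the function names `turnover` and `transaction_cost` are illustrative assumptions and do not come from the original code.

```python
def turnover(weights_before, returns, target_weights):
    """Turnover when drifted weights are rebalanced into target weights."""
    # Weights after one period of returns, before rescaling (equation 2.190).
    drifted = [w * (1.0 + r) for w, r in zip(weights_before, returns)]
    # Rescale so that the drifted weights sum to one (equation 2.191).
    total = sum(drifted)
    rescaled = [w / total for w in drifted]
    # Sum of absolute weight changes (equation 2.192).
    return sum(abs(target - w) for target, w in zip(target_weights, rescaled))


def transaction_cost(turnover_rate, fee=0.001):
    """Transaction cost as a fraction of portfolio value (equation 2.193)."""
    return turnover_rate * fee


# The 1/n example with n = 4 and the returns from equation (2.195).
weights = [0.25, 0.25, 0.25, 0.25]
returns = [0.20, 0.01, 0.02, -0.15]

zeta = turnover(weights, returns, weights)
cost = transaction_cost(zeta)
print(f"turnover = {zeta:.4f}, transaction cost = {cost:.6f}")
```

Without the intermediate rounding of the weights to three decimals, the turnover is slightly above the 8.8% reported in the example, at roughly 8.82%.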

It is clear that the choice of transaction fee and the way turnover is calculated can greatly affect the results of an analysis. Table 2.3 below shows the annualized costs for different levels of turnover with a 0.1% transaction fee.

Table 2.3: Transaction costs

Turnover   20%     40%     60%     80%     100%    120%    140%    160%    180%    200%
TC         0.24%   0.48%   0.72%   0.96%   1.20%   1.44%   1.68%   1.92%   2.16%   2.40%

Note: The transaction costs are annualized and calculated from the average monthly turnover in the first row and a transaction fee of 0.1%.

From table 2.3 it is clear that high levels of turnover can quickly reduce portfolio performance. This may be the reason why active managers have problems beating the market, as discussed in the beginning of the section.
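The entries in table 2.3 are consistent with multiplying the average monthly turnover by the fee and by 12 months, without compounding. The sketch below reproduces the table under that assumption; the names `annualized_cost`, `FEE` and `MONTHS_PER_YEAR` are illustrative and not taken from the thesis.

```python
MONTHS_PER_YEAR = 12
FEE = 0.001  # 0.1% transaction fee


def annualized_cost(monthly_turnover, fee=FEE):
    """Annualized transaction cost, assuming simple scaling by 12 months."""
    return MONTHS_PER_YEAR * monthly_turnover * fee


# Average monthly turnover levels from 20% to 200% in steps of 20%.
turnover_levels = [0.2 * k for k in range(1, 11)]
table = [(level, annualized_cost(level)) for level in turnover_levels]

for level, cost in table:
    print(f"{level:.0%} monthly turnover: {cost:.2%} per year")
```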

Chapter 3 Methodology

The methodology chapter is divided into two sections with subsections. In the first section we discuss the data, the data types, and the investment universe used in the analysis. In the second section we explain in detail how we implemented the theory from the theory chapter in the analysis of historical data.

3.1 Data

The way we have treated data in this thesis is important to explain in detail, because all our choices and assumptions can influence the results of the empirical analysis. In this section we therefore explain where we collected the data and which data types we used in the analysis. We justify the chosen time period and discuss our assumptions regarding the data. Important information about the EURO STOXX 50 is explained in detail, in addition to how we created the investment universe based on this index. We also discuss the risk-free rate and justify why we believe our choice is a good approximation of it in our analysis.

Data source and data type

STOXX Limited reports that Thomson Reuters provides all stock data used in their calculations (STOXX Ltd., 2015b). To create a dataset that matches a STOXX index as closely as possible, we use the same data source. All stock data is therefore gathered from Datastream (2015). In this way we minimize potential errors regarding stock prices when we compare our portfolio performances against the index. All stock data are based on the last trading day of each month, and the data type selected is Datastream's total return index. The total return index shows the theoretical growth in the value of a stock, assuming all dividends are re-invested in the stock at the adjusted closing price (Datastream, 2015). The adjusted closing prices are corrected for all subsequent capital actions, such as stock splits. According to Datastream (2015), for dates after 1988 the total return index is calculated as follows when there is no dividend payout

$$TRI_t = TRI_{t-1}\,\frac{S_t}{S_{t-1}} \tag{3.1}$$

When $t$ is the ex-date of the dividend payment $D_t$, $TRI_t$ is calculated as

$$TRI_t = TRI_{t-1}\,\frac{S_t + D_t}{S_{t-1}} \tag{3.2}$$

where $S_t$ is the adjusted stock price at time $t$ and $D_t$ is the dividend payment associated with ex-date $t$ (Datastream, 2015).

As we will discuss later, the risk-free rate is mostly defined as the Euribor 1-month rate in this thesis. However, in the period before the euro was introduced we use the LIBOR 1-month rate in Deutsche Mark. The Euribor 1-month rate was collected from Datastream (2015), and the LIBOR 1-month rate in Deutsche Mark was gathered from EconStats (2015). The interest rates are gathered as annualized rates, and we have transformed them to monthly rates according to the market convention. The formula for this can be seen in the basic calculations section.

The data used for the EURO STOXX 50 index was the net return from Datastream, which is calculated like the total return index defined above. It therefore reflects all gains from holding the components of the index with the same weights as the EURO STOXX 50. This data should not be confused with the price of the index.

The investment universe

In this section we explain how we created the investment universe used in the analysis. Because our choices and assumptions regarding the investment universe influence the results, this process is discussed in detail. The investment universe for the investors in this thesis consists of the EURO STOXX 50 components from February 27th 1998 to March 31st 2015. There are several reasons why we consider the components of this index to be a good investment universe in Europe. Firstly, all components of this index are listed in euro, so there is no exchange rate risk for an investor who only allocates wealth among these stocks.
Secondly, STOXX chooses the components of the index so that they are balanced among 12 eurozone countries and 19 supersectors. This ensures a universe that is diversified across sectors and markets. Finally, all stocks in the index must meet a minimum level of liquidity. This is beneficial because it ensures that all stocks are traded every day, so there are no dead periods in the data.

Our investment universe is created to be the 50 components of the EURO STOXX 50 at all times throughout the analysis. Every time there has been a change in the index, we have made the equivalent change. However, the timing of our changes differs from when the actual changes happened. The reason is that we use monthly data, while changes can occur on any day of the month. Since we use the data from the last trading day of each month, there is a delay if a change occurs earlier in the month. For example, if a component change occurred on March 14th, we keep the stock until March 31st and then make the change.

It should be noted that the information about component changes has been gathered from three different sources. STOXX provided us with this information from 2010 onwards. The second source is the Annual Blue Chip Index reviews from 1999 to 2014 from STOXX's news page. The third source is a component change list from Harvey (2014), taken from his lecture slides at Duke University. The information from these three sources agrees wherever they overlap in time. The complete component change list we have used, based on these three sources, can be seen in table 3.2 at the end of this section. Since the index was first launched in February 1998, there have been 41 component changes on 26 different dates.

Even though a stock is included in the index, the stock must fulfil our minimum data requirement to be included in the analysis. We have set this minimum data requirement to 24 months. That means we must have data for the previous 24 months at the point of portfolio rebalancing for the stock to be included. Since the index consists of 50 stocks at all times, insufficient data results in fewer than 50 investment candidates at times. Figure 3.1 summarizes the number of stocks in the index that fulfilled our data requirement over time.

Figure 3.1: Number of stocks fulfilling data requirement
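The 24-month data requirement described above can be illustrated with a small filter. This is a hedged sketch in Python, not the thesis's R code: the function `eligible_stocks`, the data layout (one list of monthly returns per stock, with `None` for missing months) and the three stocks are all hypothetical.

```python
WINDOW = 24  # months of return history required at each rebalancing


def eligible_stocks(returns, members, t, window=WINDOW):
    """Names of index members at month t with a complete return window."""
    selected = []
    for name, series in returns.items():
        if not members[name][t]:
            continue  # not a component of the index at month t
        if t + 1 < window:
            continue  # fewer than `window` months exist so far
        history = series[t + 1 - window : t + 1]
        if all(r is not None for r in history):
            selected.append(name)
    return selected


# Hypothetical data: 36 months, None marks a month without data.
months = 36
returns = {
    "A": [0.01] * months,                       # full history
    "B": [None] * 10 + [0.02] * (months - 10),  # data starts in month 10
    "C": [0.03] * months,                       # full history, never a member
}
members = {
    "A": [True] * months,
    "B": [True] * months,
    "C": [False] * months,
}

early = eligible_stocks(returns, members, t=29)
late = eligible_stocks(returns, members, t=35)
print(early, late)
```

In this toy setup stock B only becomes an investment candidate once 24 consecutive months of its returns are available, which mirrors why the number of candidates in figure 3.1 can fall below 50.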

Extra data for covariance matrix estimation

Our empirical analysis of the portfolio strategies started on February 27th 1998. To be able to start on this date, we needed a further 24 months of data to estimate the covariance matrix. We therefore expanded the dataset with data from March 31st 1996 to February 27th 1998. The index did not exist in this extended data period, but most of the stocks that were included from February 27th 1998 did. We therefore extended the data of all stocks by 24 months, so that we obtained sufficient data for the covariance matrix from February 27th 1998 and for the following 24 months. It should be noted that this data was only used in the covariance matrix estimation, and not in any other part of the analysis.

EURO STOXX 50 general information

In this section we provide information about the EURO STOXX 50 index. This includes the selection procedure for components, which countries and sectors are included, and other requirements for stocks to be selected as components. The EURO STOXX 50 index captures the performance of the 50 largest companies in the eurozone in terms of their free-float market capitalization (STOXX Ltd., 2015a). STOXX Ltd. (2015c) defines the free-float market capitalization as the share of a stock's total market capitalization that is available for trading. The index covers 19 supersectors, which are subsectors of STOXX's ten industries, and 12 of the eurozone countries. These supersectors and countries can be seen in table 3.1.
Table 3.1: Countries and supersectors covered by the EURO STOXX 50

Countries: Spain, Portugal, France, Germany, Italy, Luxembourg, Netherlands, Austria, Belgium, Ireland, Greece, and Finland

Supersectors: Oil & Gas, Chemicals, Basic Resources, Construction & Materials, Industrial Goods & Services, Automobiles & Parts, Food & Beverage, Personal & Household Goods, Healthcare, Retail, Media, Travel & Leisure, Telecommunications, Utilities, Banks, Insurance, Real Estate, Financial Services, and Technology

Source: STOXX Ltd. (2015c). Note: Not all countries and supersectors are represented in the index at all times. This depends on the companies' free-float market capitalizations.

According to STOXX Ltd. (2015a), the EURO STOXX 50 index was introduced on February 26th 1998, but historical prices and net returns are calculated further back in time. The base value for the net return and price was set to 1000 on December 31st of the base year. Figure 3.2 shows the historical net return in euro over the time period analysed in this thesis.

Figure 3.2: Net return EURO STOXX 50 in euro

For a stock to be selected as a component of the index, it needs to fulfil the minimum liquidity criteria from STOXX. In addition to fulfilling the liquidity requirement, the stock needs to be leading in its respective supersector. This ensures that the index represents some of the largest and most liquid companies in the eurozone across sectors and countries. As a result of the selection procedure, the EURO STOXX 50 captures approximately 60% of the free-float market capitalization of the EURO STOXX Total Market Index (STOXX Ltd., 2015a). According to STOXX Ltd. (2015c), component selection follows a three-step process:

1. All the stocks in the 19 EURO STOXX supersector indices are ranked by their free-float market capitalization. The largest stocks in terms of free-float market capitalization from each of the supersector indices are added to the selection list. This process continues until the stocks on the list make up close to, but less than, 60% of the free-float market capitalization of the EURO STOXX TMI supersector index. Then, all the current components of the EURO STOXX 50 are added to the selection list.

2. The final selection list is then created by ranking all the stocks from the above selection list by their free-float market capitalization.

3. The largest 40 stocks from the final selection list are then selected, and the remaining 10 stocks are selected from the stocks ranked next on the list.

The index is reviewed every year in September, but STOXX also has fast exit and fast entry rules. Replacements by fast exit and fast entry are based on the latest selection list, which is updated monthly (STOXX Ltd., 2015c). If a stock is ranked below 75 on the selection list for two consecutive months, it is deleted from the index and replaced by the highest-ranking stock that is not currently in the index (STOXX Ltd., 2015c). If a stock is ranked between 1 and 25 on the selection list at the end of February, May, August, or November, it is added as a fast entry. If added, the stock replaces the currently smallest stock in the index. When there are mergers and acquisitions, the original stock is replaced by the new stock (STOXX Ltd., 2015c).

Risk-free rates

In this section we argue for our choice of risk-free rates. The discussion mainly covers background information about the Euribor 1-month rate, because this rate is used in 196 out of 205 months in the analysis. We provide only a short discussion of the LIBOR 1-month rate in Deutsche Mark, as this rate is used in only 9 out of 205 months. The choice of monthly rates is natural because the portfolios are rebalanced every month. We are aware that there is a small credit risk associated with these rates, but we assume this risk to be small enough to ignore. We have not yet observed defaults on such short-term loans between highly rated banks. The discussion of whether risk-free rates actually exist in practice is not included in this thesis, and both rates are treated as risk-free.
We have chosen the Euro Interbank Offered Rate (Euribor) as the risk-free rate from December 31st 1998 in this thesis. We believe this rate is suitable as the risk-free rate because it is mainly calculated from the expectations of banks in the eurozone. The rate is defined in the same currency as the stocks used in the analysis. We can therefore disregard any form of exchange rate risk if we assume all investors invest in euro. According to the European Banking Federation (2014), the interest rate is set after a panel of banks submit their daily quotes. The rate each bank submits is the rate it believes other banks in the eurozone are quoting to each other, rounded to three decimals (European Banking Federation, 2014). The highest and lowest 15% of the quotes collected are omitted from the calculation of the rate. As of May 8th 2015, the panel consists of 22 large eurozone banks and two banks from outside the eurozone (European Money Markets Institute, 2014). Banks that have important eurozone operations can be included in addition to the eurozone banks.

The Euribor rate was first available for trading at the start of January 1999, at the same time as the euro was introduced. Because of this, data for the Euribor rate only exist from December 31st 1998 until the present day. To fill the data gap between February 28th and December 31st 1998 we use the LIBOR 1-month rate in Deutsche Mark. Boeckx et al. (2013) use German data for pre-euro periods in their research, and we therefore do the same. The historical annualized Euribor 1-month rate and LIBOR 1-month rate in Deutsche Mark are presented in figure 3.3. The smooth transition between the rates further supports our choice of replacement prior to the Euribor.

Figure 3.3: Risk-free rate
Source: Thomson Reuters Datastream and EconStats. Note: Euribor 1-month rate in blue, LIBOR 1-month in Deutsche Mark in red.

From figure 3.3 we see that the interest rate has recently reached negative values. Negative rates are an interesting case: they imply that there is a cost to lending money to other banks. This is the opposite of the usual situation, in which it costs money to borrow from banks.

Table 3.2: Component changes in the EURO STOXX 50

Date of change  Additions                  Deletions
22/09/1999      Banco Santander            Akzo Nobel
22/09/1999      BASF                       Allied Irish Banks
22/09/1999      Bayr. Hypo. Vereinsbank    Elsevier
22/09/1999      Dresdner Bank              Fiat
22/09/1999      Munich Re                  Lufthansa
22/09/1999      Sanofi-Synthelabo          Portugal Telecom
22/09/1999      Suez Lyonnaise des Eaux    Schneider Electric
01/11/1999      BNP                        Paribas
01/11/1999      Pinault Printemps Redoute  ELF Aquitaine
20/12/1999      Aventis                    Rhone-Poulenc
14/02/2000      Canal +                    Mannesmann
20/03/2000      ENEL                       St. Gobain
18/09/2000      Danone                     Electrabel
18/09/2000      Sao Paolo                  Metro
11/12/2000      Volkswagen                 Canal +
23/07/2001      Telecom Italia Mobile      Dresdner Bank
24/09/2001      St. Gobain                 KPN
23/09/2002      Lafarge                    Pinault Printemps Redoute
22/09/2003      Iberdrola                  Bayr. Hypo. Vereinsbank
28/07/2004      SAP                        Aventis
20/09/2004      Credit Agricole            Volkswagen
30/06/2005      Allied Irish Banks         Telecom Italia Mobile
20/07/2005      Renault                    Royal Dutch Shell A
24/09/2007      ArcelorMittal              Ahold
24/09/2007      Schneider Electric         Allied Irish Banks
24/09/2007      VINCI                      Lafarge
10/10/2007      Volkswagen                 Endesa
15/10/2007      Deutsche Boerse AG         ABN Amro
22/07/2008      Gaz de France              Suez
22/09/2008      Alstom                     Alcatel Lucent
21/09/2009      Anheuser-Busch Inbev       Fortis
21/09/2009      CRH PLC                    Renault S.S
08/02/2010      Unibail-Rodamco            Volkswagen
20/09/2010      BMW                        AEGON
19/09/2011      Volkswagen PREF            Alstom
19/09/2011      Inditex                    Credit Agricole
18/06/2012      ASML HLDG                  Deutsche Boerse
18/06/2012      Essilor International      Telecom Italia
18/03/2013      EADS (Airbus Group)        Nokia
18/03/2013      Deutsche Post              ArcelorMittal
22/09/2014      Nokia                      CRH

Source: The annual blue chip index reviews from 1999 to 2014 from STOXX's news page, URL: information.html?symbol=sx5e, STOXX index data provided by STOXX, and Harvey (2014).

3.2 Calculation methodology

In this section we discuss the methods used in all calculations in the analysis. This includes how we created the portfolios and the covariance matrices, in addition to the calculation methods for various ratios, transaction costs, and other values.

All portfolios were constructed on a rolling basis on the last trading day of each month. The first portfolios were constructed on February 27th 1998, and the final portfolios were constructed on the last trading day of February 2015. The covariance matrices were also calculated on a rolling basis on the last trading day of each month, based on the past 24 months of data. This resulted in 205 shrunk covariance matrices using the constant correlation model, and 205 shrunk covariance matrices using the single-index model. Since all calculations were performed with both covariance matrices as input, we constructed 410 portfolios for each risk-based asset allocation strategy. We constructed only 205 1/n portfolios, because this portfolio is independent of the covariance matrix and therefore identical in both situations. All covariance matrices were saved as matrices. All portfolio weights were saved in vectors every month, as were all corresponding component returns.

When we implemented the theory on the historical data, we used the statistical computing program R for all calculations and graphs (R Core Team, 2014). The entire script can be seen in the appendix. We used some R packages, or parts of R packages, in the code. Only well-documented packages that followed the theory discussed earlier were used. We state where we have used packages and refer to them. The package manuals can be found by following the references.

Basic calculations

When we calculated returns from holding a stock or the index from one period to the next, we calculated the percentage difference between the two periods in the total return index for stocks and in the net return for the index.
$$r_{i,t} = \frac{TRI_{i,t} - TRI_{i,t-1}}{TRI_{i,t-1}}, \qquad r_{EuroStoxx50,t} = \frac{NR_t - NR_{t-1}}{NR_{t-1}} \tag{3.3}$$

Here $r_{i,t}$ is the return from holding stock $i$ from $t - 1$ to $t$, and $r_{EuroStoxx50,t}$ is the return of the index from $t - 1$ to $t$. As we have monthly data, each $t$ represents one month. When we calculated mean returns, the calculation method depended on whether returns were re-invested or not. In the situation where investors re-invested the full portfolio value in each period, we calculated the geometric monthly mean for asset $i$ as

$$r_{i,GM,monthly} = \left(\prod_{t=1}^{T}(1 + r_{i,t})\right)^{1/T} - 1 \tag{3.4}$$

where $T$ was the total number of months. We annualized the above geometric mean for asset $i$ by

$$r_{i,GM,annualized} = (1 + r_{i,GM,monthly})^{12} - 1 \tag{3.5}$$

When the investor did not re-invest returns, we calculated the arithmetic monthly mean return for asset $i$ by

$$r_{i,AM,monthly} = \frac{1}{T}\sum_{t=1}^{T} r_{i,t} \tag{3.6}$$

where $T$ was the number of months. The arithmetic monthly mean return for asset $i$ was annualized by

$$r_{i,AM,annualized} = 12\, r_{i,AM,monthly} \tag{3.7}$$

For the geometric mean it is important to note that each asset return needs to enter in the form 1 + return, so that we do not encounter negative values in the formula. For the same reason, 1 must be subtracted from the result. When we annualized standard deviations, we followed the common practice in finance and multiplied the standard deviation of monthly data by $\sqrt{12}$

$$\sigma_{annualized} = \sqrt{12}\,\sigma_{monthly} \tag{3.8}$$

Since the risk-free data was annualized, we needed to convert it back to monthly rates. We did this according to the industry standard defined by Intercontinental Exchange, Inc. (2015)

$$r_{monthly} = \frac{r_{yearly}}{12}$$

When we calculated standard deviations and covariances in the program, we used the built-in functions in R. According to R Core Team (2014), these functions calculate the standard deviation and covariance as

$$\sigma(R) = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}(r_t - \bar{r})^2}, \qquad \sigma_{ij}(R) = \frac{\sum_{t=1}^{T}(r_{i,t} - \bar{r}_i)(r_{j,t} - \bar{r}_j)}{T-1} \tag{3.9}$$
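The return and annualization formulas in equations (3.4) to (3.9) translate directly into code. The sketch below uses Python's standard library instead of the R functions used in the thesis; all function names and the sample returns are illustrative assumptions.

```python
import math


def geometric_mean_monthly(returns):
    """Geometric monthly mean return (equation 3.4)."""
    growth = math.prod(1.0 + r for r in returns)
    return growth ** (1.0 / len(returns)) - 1.0


def annualize_geometric(monthly_mean):
    """Annualized geometric mean (equation 3.5)."""
    return (1.0 + monthly_mean) ** 12 - 1.0


def arithmetic_mean_monthly(returns):
    """Arithmetic monthly mean return (equation 3.6)."""
    return sum(returns) / len(returns)


def annualize_arithmetic(monthly_mean):
    """Annualized arithmetic mean (equation 3.7)."""
    return 12 * monthly_mean


def annualize_volatility(monthly_std):
    """Annualized standard deviation (equation 3.8)."""
    return math.sqrt(12) * monthly_std


def monthly_rate(yearly_rate):
    """Monthly risk-free rate from an annualized rate."""
    return yearly_rate / 12


def sample_std(returns):
    """Standard deviation with divisor T - 1, as in equation (3.9)."""
    mean = arithmetic_mean_monthly(returns)
    squared = sum((r - mean) ** 2 for r in returns)
    return math.sqrt(squared / (len(returns) - 1))


# Hypothetical monthly returns for one asset.
monthly_returns = [0.10, -0.10]
gm = geometric_mean_monthly(monthly_returns)
am = arithmetic_mean_monthly(monthly_returns)
print(f"geometric mean = {gm:.6f}, arithmetic mean = {am:.6f}")
```

In this two-month example the arithmetic mean is zero while the geometric mean is slightly negative, which illustrates why the choice between the two depends on whether returns are re-invested.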

Equation (3.9) shows that R divides by $T - 1$, where our earlier definitions divide by $T$. To follow our definitions from the theory chapter, we adjusted the values from R in the following way

$$\sigma = \sigma(R)\sqrt{\frac{T-1}{T}}, \qquad \sigma_{ij} = \sigma_{ij}(R)\,\frac{T-1}{T} \tag{3.10}$$

When we calculated the variance, we used the relation between standard deviation, covariance, and variance. We therefore did not need to correct the variance, since we did not use the built-in R function for this calculation.

Covariance matrix estimation

In this section we explain the method used to calculate the shrunk covariance matrix in this thesis. The calculations were done by writing an R script that implemented the theory from Ledoit and Wolf (2003a,b). The covariance matrix estimation in this thesis can be regarded as a six-step process, and explaining these six steps makes the methodology clear. We explain the method for one month, but it should be noted that the program repeated these steps 205 times, once for each month in the analysis. The first two steps were the process of selecting the correct data every month and making sure that only stocks fulfilling the data requirement were included. The steps that follow are the calculation methodology itself.

Before the steps are discussed, we go through some initial work needed for the program to run properly. As discussed earlier, we have chosen our investment universe to consist of the components of the EURO STOXX 50 at the point of each portfolio rebalancing. That means that the return data for the covariance matrix estimation does not always consist of the same 50 stocks, as some stocks enter and some stocks leave the index. The raw data consisted of 84 stocks over 205 months, starting with the first return data on March 31st 1998, which is the date of the return on the investments made the month before. Each row represented a date, and each column represented a stock. To always include the data of the correct stocks in the analysis, we created a matrix of 1s and 0s.
A 1 means that the stock was a component of the index at that time, and a 0 means that it was not. Using these two matrices, the first step was to sort out the correct data from the raw data every month.

Step 1 At March 31st 1998, the program selected the past 24 months of data for all 84 stocks from the raw data. Then the program checked the corresponding entry in the matrix of 1s and 0s. If the entry was 1 the stock was kept, and if the entry was 0 the stock was deleted. The updated raw data was then saved in a matrix numbered by the month. This step was repeated 205 times with rolling 24-month data. As a result, 205 matrices were created, containing the correct data for all periods.

Step 2 The second step was to ensure that all stocks fulfilled the 24-month data requirement. The program took all 205 matrices from step 1 and counted the number of observations for each stock. If this number was less than 24, the stock was deleted. The updated data was then saved in 205 new matrices. At any given time, this data consisted of the stocks that were included in the index and had a minimum of 24 months of historical data. One result of this step was that the dimensions of these matrices varied, which in turn led to covariance matrices with different dimensions. This should be noted, but it did not affect the results, because the entire program adjusted to the number of stocks at all times. The dimension of the covariance matrix corresponded to the number of stocks in figure 3.1.

Step 3 From the returns in step 2, we calculated the sample covariance matrix for each of the 205 periods. The sample covariance matrix was computed as in equation (2.32). This required the mean of the returns in each period, for which we used equation (2.31). The sample covariance matrices were symmetric and had dimensions $n \times n$, where $n$ corresponded to the number of stocks that satisfied step 2 in each period.

Step 4 The target of our shrinkage estimation needed to be chosen. Here our program had two different models to implement. The methodology of the two models differed not only in the target matrix, $F$, but also in the estimators $\hat{\rho}$, $\hat{\gamma}$ and $\hat{\kappa}$, as these depended on the input $F$. Since the estimators differed, so did the shrinkage intensity calculations of the single-index model and the constant correlation model.

Step 4.1 The constant correlation model was implemented by rewriting the MATLAB code written by Ledoit and Wolf (2004a), as suggested in Ledoit and Wolf (2003a).
We used the sample covariance matrix from each period found in step 3 to find the constant correlation matrix of the corresponding period. The sample covariance matrix was the only input needed to find the corresponding constant correlation matrix. The first step in finding the constant correlation matrix was to convert the entries of the sample covariance matrix to correlations using the standard R function cov2cor. This function transformed the covariances into correlations, in accordance with equation (2.15). With these values we implemented equation (2.39), which we in turn used to implement (2.43) to find $f_{ij}$. The diagonal of $F$ was replaced by the diagonal of the sample covariance matrix.
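Step 4.1 can be sketched as follows. The sketch assumes that equation (2.39) is the average of the pairwise sample correlations and that equation (2.43) sets $f_{ij} = \bar{r}\sqrt{s_{ii}s_{jj}}$, as in the constant correlation model of Ledoit and Wolf; neither equation is reproduced in this section. The code is Python rather than R, and the function name and the covariance matrix are hypothetical.

```python
import math


def constant_correlation_target(S):
    """Constant correlation target F and average correlation for covariance S."""
    n = len(S)
    std = [math.sqrt(S[i][i]) for i in range(n)]
    # Average of the n(n-1)/2 pairwise sample correlations.
    corr_sum = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            corr_sum += S[i][j] / (std[i] * std[j])
    r_bar = corr_sum / (n * (n - 1) / 2)
    # Diagonal: sample variances. Off-diagonal: common correlation
    # multiplied by the two sample standard deviations.
    F = [
        [S[i][i] if i == j else r_bar * std[i] * std[j] for j in range(n)]
        for i in range(n)
    ]
    return F, r_bar


# Hypothetical sample covariance matrix with correlations 0.5, 0.2 and 0.3.
S = [
    [0.040, 0.030, 0.016],
    [0.030, 0.090, 0.036],
    [0.016, 0.036, 0.160],
]
F, r_bar = constant_correlation_target(S)
print(f"average correlation = {r_bar:.4f}")
```

The target keeps every stock's own sample variance but replaces all pairwise correlations with their average, which is what makes it a structured, low-dimensional alternative to the sample covariance matrix.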

Step 4.2 The single-index model was implemented in this thesis based on the MATLAB code written by Ledoit and Wolf (2004a). This was the code Ledoit and Wolf (2003b) suggested using. The code was rewritten to work in R. As suggested by Ledoit and Wolf (2003b), the market was set to be the equally-weighted portfolio. We made this choice because they argued that this portfolio explains stock market variance better than the market capitalization index, which in our case would be the EURO STOXX 50. The target matrix in the single-index model was calculated as in theory section 2.3.3, i.e. we found $F$ by following equation (2.51) every month.

Step 5 We used the R package tawny, written by Rowe (2014), to calculate the estimators and the shrinkage intensity in line with the theory presented in the theory chapter. This package calculated the three estimators needed to find $\hat{\kappa}$. It estimated the sum of asymptotic variances, $\hat{\pi}$, in line with equation (2.70) for both models, and the sum of asymptotic covariances, $\hat{\rho}$, by implementing equation (2.77) for the constant correlation model and (2.84) for the single-index model. The misspecification of the target, $\hat{\gamma}$, was calculated by equation (2.85) for both models, with the input from the different targets. These estimators were the inputs needed to find $\hat{\kappa}$ with equation (2.86). The latter was substituted into equation (2.87) to find the two different optimal shrinkage intensities for the two models.

Step 6 From the previous steps we had all of the values and matrices needed to find the shrunk covariance matrix, $\hat{\Sigma}$, from equation (2.35). This matrix was calculated twice every month, once with each of the targets. These matrices were then used as inputs in the risk-based models.

Global minimum variance portfolio

The construction of the global minimum variance portfolio depended on our assumptions regarding short-sales. When short-sales were allowed, we could find the global minimum variance portfolio from equation (2.129) every month.
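For the case with short-sales, a closed-form solution can be sketched in a few lines. Equation (2.129) is not reproduced in this section, so the sketch assumes it is the standard closed form $x = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1}\mathbf{1})$. The code is Python rather than R, and the helper names and the two-asset covariance matrix are hypothetical.

```python
def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (M[i][n] - known) / M[i][i]
    return y


def gmv_weights(cov):
    """Minimum variance weights with short-sales allowed (assumed closed form)."""
    ones = [1.0] * len(cov)
    y = solve(cov, ones)  # y = cov^{-1} 1
    total = sum(y)        # 1' cov^{-1} 1
    return [v / total for v in y]


# Hypothetical two-asset covariance matrix.
cov = [[0.04, 0.01], [0.01, 0.09]]
weights = gmv_weights(cov)
print([round(w, 4) for w in weights])
```

The weights sum to one by construction, and the lower-variance asset receives the larger weight in this example.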
With the long-only restriction, there was no closed-form solution and we needed to use quadratic programming. We used the R package quadprog by Turlach and Weingessel (2013). This package implemented the dual method of Goldfarb and Idnani (1982, 1983). The quadprog package solved quadratic programming problems of the form

$$\min_x \; -d^\top x + \tfrac{1}{2}\,x^\top D x \qquad \text{s.t.} \quad Ax \ge b \tag{3.11}$$

Here A was an $m \times n$ matrix, D was an $n \times n$ symmetric matrix, b was an m-vector, and x and d were n-vectors. We picked suitable values for d, b, D, and A to rewrite the above into the global minimum variance problem. We did this by defining d and D as

$$d = (0, \ldots, 0)^\top \tag{3.12}$$

$$D = 2\Sigma \tag{3.13}$$

which transformed equation (3.11) into the desired optimization problem

$$\min_{x} \; x^\top \Sigma x \tag{3.14}$$

To ensure the long-only weight restrictions and that $x^\top \mathbf{1} = 1$, we defined A and b as

$$b = (1, 0, 0, \ldots, 0)^\top, \qquad A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \tag{3.15}$$

where the first row carries the budget constraint, imposed as an equality, and the remaining rows carry the non-negativity constraints. We had now transformed the standard quadratic problem in quadprog into the problem needed to find the global minimum variance portfolio with the specified constraints. The R program solved (3.11) every month to find the global minimum variance portfolio with long-only weights.

Most diversified portfolio

According to Choueifaty et al. (2013), the most diversified portfolio optimization is a quadratic programming problem because the diversification ratio (DR) is invariant under scalar multiplication. That means solving the problem below, and afterwards rescaling the portfolio weights to sum to one, is the same as maximizing the diversification ratio

$$\min_{x} \; \tfrac{1}{2}\, x^\top \Sigma x \quad \text{s.t.} \quad x^\top v = 1, \quad x_i \ge 0 \tag{3.16}$$

where $v = (\sigma_1, \ldots, \sigma_n)^\top$ is the vector of asset volatilities. Restating the problem in this way had some advantages. The problem was now of the same form as the global minimum variance problem, so we only needed to define A, D, b, and d to fit the quadprog package. We did this by again defining d and D as

$$d = (0, \ldots, 0)^\top \tag{3.17}$$

$$D = 2\Sigma \tag{3.18}$$

And by changing the A and b discussed earlier to

$$b = (1, 0, 0, \ldots, 0)^\top, \qquad A = \begin{pmatrix} \sigma_1 & \sigma_2 & \cdots & \sigma_n \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \tag{3.19}$$

By doing this, we converted the problem into the form required by the quadprog package. The R program solved (3.11) with the variables defined above every month to find the most diversified portfolio.

Equally-weighted risk contributions portfolio

The equally-weighted risk contributions portfolio can be found with several algorithms. When picking the fastest and most suitable package in R for the minimization problem, we followed the advice of Gerrard and Johnson (2015). If the weights are rescaled to sum to one after the minimization, the problem from the theory section can be written as

$$x_{ERC} = \arg\min_{x} f(x) \tag{3.20}$$

$$\text{s.t.} \quad 0 \le x_i \le 1 \tag{3.21}$$

This is a box-constrained minimization problem, and to solve it in R, Gerrard and Johnson (2015) suggest using the nlminb function. This function solves box-constrained optimization problems using PORT routines (R Core Team, 2014). The optimizer is part of the stats package, which is part of the core of R. We combined this optimizer with a slightly modified version of the FRAPO package by Pfaff (2013). Following Pfaff (2013), we defined f as

$$f(x) = \sigma\!\left( \frac{x \odot (\Sigma x)}{\sqrt{x^\top \Sigma x}} \right) \tag{3.22}$$

which is the standard deviation of the risk contributions, with $\odot$ denoting element-wise multiplication. It was important that the operations in equation (3.22) were carried out in the right order. The product $\Sigma x$ was computed as a matrix-vector product, which gave a vector. Each element was then divided by the standard deviation of the portfolio, the scalar $\sqrt{x^\top \Sigma x}$. The scaled vector was then multiplied element-wise by x. That meant that the function f(x) found the standard deviation of

$$\begin{pmatrix} x_1 (\Sigma x)_1 / \sqrt{x^\top \Sigma x} \\ x_2 (\Sigma x)_2 / \sqrt{x^\top \Sigma x} \\ \vdots \\ x_n (\Sigma x)_n / \sqrt{x^\top \Sigma x} \end{pmatrix} \tag{3.23}$$

This operation was similar to minimizing the standard deviation of the risk contributions from equation (2.157) in the theory chapter. When this standard deviation was zero, all the risk contributions were equal and f(x) was minimized. We chose the absolute tolerance to be 1e-15. We believed this was close enough to zero to say that we had found the ERC portfolio. The R program constructed the ERC portfolio every month, following the methodology explained above.

Equally-weighted portfolio

The 1/n strategy did not require any inputs other than the number of assets, n. We implemented this strategy simply by investing equally among all stocks in the investment universe at each rebalancing, as discussed in the theory section.

Sharpe ratio

When we computed the ex-post Sharpe ratios in our analysis we followed the theory discussed earlier. We calculated the Sharpe ratios over the entire period and over 24-month rolling windows. All Sharpe ratios were calculated by implementing equation (2.181) in R. All data used in the calculations were annualized. The risk-free rate used in the Sharpe ratios was the rate presented earlier in the thesis. Where investors re-invested returns, we calculated the average monthly returns with equation (3.4) and annualized them with equation (3.5). Where investors did not re-invest, we calculated the average monthly returns with equation (3.6) and annualized them with equation (3.7). This method was used for both the risk-free rate and the portfolio returns. The standard deviation was annualized with equation (3.8). The rolling Sharpe ratios were calculated as the performance over a rolling 24-month period.
This meant that the first rolling Sharpe ratio was calculated 24 months after the first portfolios were constructed. The first calculation window therefore ran from March 31st 1998 to March 31st two years later. We calculated the rolling ratios from this window onwards, up to the end of the sample period. This resulted in 181 rolling Sharpe ratios per portfolio for each covariance matrix.
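The rolling calculation for the re-investing case can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: it is written in Python rather than R, it assumes geometric annualization of returns and scaling of the monthly sample volatility by the square root of 12 (the conventions stated in the note to table 4.1), and it assumes the volatility is taken on raw rather than excess returns. Equations (2.181), (3.4), (3.5) and (3.8) are not reproduced here, so details may differ. All names and numbers are hypothetical.

```python
import math
import statistics


def annualized_sharpe(returns, risk_free):
    """Annualized Sharpe ratio from monthly portfolio and risk-free returns."""
    n = len(returns)
    # Compound the monthly returns, then convert to a one-year geometric rate.
    growth = math.prod(1.0 + r for r in returns)
    rf_growth = math.prod(1.0 + r for r in risk_free)
    ann_return = growth ** (12.0 / n) - 1.0
    ann_rf = rf_growth ** (12.0 / n) - 1.0
    # Monthly sample volatility scaled to one year (assumed convention).
    ann_vol = statistics.stdev(returns) * math.sqrt(12.0)
    return (ann_return - ann_rf) / ann_vol


def rolling_sharpe(returns, risk_free, window=24):
    """One Sharpe ratio per window of `window` consecutive months."""
    return [
        annualized_sharpe(returns[end - window:end], risk_free[end - window:end])
        for end in range(window, len(returns) + 1)
    ]


# Hypothetical data: 30 months of alternating returns and a flat risk-free rate.
example_returns = [0.02, -0.01] * 15
example_rf = [0.001] * 30
example_rolling = rolling_sharpe(example_returns, example_rf)
```

A series of T monthly returns yields T - 23 rolling values, so the 30 hypothetical months above give 7 ratios.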

The Sharpe ratios over the entire period were calculated with annualized values based on all available data. The risk-free rate was therefore a combination of the Euribor 1-month and the LIBOR 1-month in Deutsche Mark.

Sortino ratio

The Sortino ratios were calculated over a 24-month rolling window and over the entire period. The periods were the same as for the Sharpe ratio calculations, so we calculated the same number of Sortino ratios as Sharpe ratios to be able to compare them. We calculated returns as discussed in the Sharpe ratio methodology section. All Sortino ratios were calculated by implementing equation (2.183) in R. In this thesis we chose to set $R_{MAR} = r_{rf}$. That means that the numerator in the Sortino ratio was the same as in the Sharpe ratio, as can be seen from equation (2.178) and equation (2.183). The only difference between the Sortino ratio and the Sharpe ratio was then the downside standard deviation. To calculate the downside standard deviation, we used the package PerformanceAnalytics by Peterson and Carl (2014) to evaluate

$$\delta_{MAR} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \min(0,\, r_t - r_{MAR})^2} \tag{3.24}$$

Maximum drawdown

When we calculated the maximum drawdown in the analysis, we used the PerformanceAnalytics package in R by Peterson and Carl (2014). This package implements the maximum drawdown as in equation (2.185). To measure the drawdown both in the financial crises and over the full period, we split the maximum drawdown calculations into three periods. We calculated the maximum drawdown over the full period and over two non-overlapping sub-periods. The first sub-period ran from March 1998 to June 2006, and the second from July 2006 to the end of the sample period.

Skewness and kurtosis

To implement the previously discussed theory behind kurtosis and skewness in R, we used the e1071 package. This package was created by Meyer et al. (2014) from the Department of Statistics at TU Wien.
All the formulas from Joanes and Gill (1998) are implemented in this package. We selected the package to calculate the skewness and kurtosis according to equation (2.189) and equation (2.188). These measures were calculated on the portfolio returns over the complete analysed period, i.e. 205 observations for each portfolio.

Turnover and transaction costs

When we calculated the turnover and transaction costs we followed the process from the theory section. We calculated the turnover every month. The first turnovers were calculated at the first rebalancing, on March 31st of the first sample year. The last turnovers were calculated at the rebalancing on February 28th of the final sample year. We will now explain how we calculated the turnover between two periods, period t and period t + 1. This methodology was the same for all 204 periods with rebalancing.

Since the components of the index were not constant over time, we needed to be sure that turnover was calculated correctly. The program was created so that each vector of portfolio weights and each vector of returns was updated in each period. This framework worked correctly when the weights were multiplied by the corresponding returns. However, vectors of portfolio weights from different periods did not necessarily contain the same stocks. We therefore needed to make a few adjustments before following the procedure from the theory. We created helper variables for the portfolio weights and returns with one entry for each stock that had been included in the index at some point in the analysis. This resulted in new weight and return vectors with 84 entries each. We then matched the helper vectors with the return and portfolio weight vectors by stock name, and copied the return and weight values where the names matched. Where there was no match, the entries were set to zero. Since the helper variables always had the same stocks in the same order, we could perform the turnover calculation process from the theory.
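The name-matching step can be sketched as below. This is a minimal Python illustration of the idea rather than the R implementation used in the thesis; the function name, the tickers and the weights are hypothetical, and a four-stock universe stands in for the 84 stocks.

```python
def align_to_universe(values_by_name, universe):
    """Return values in the fixed order of `universe`, with zero where a stock is absent."""
    return [values_by_name.get(name, 0.0) for name in universe]


# Hypothetical universe of every stock that was ever an index member.
universe = ["AAA", "BBB", "CCC", "DDD"]

# Hypothetical weights in two consecutive periods with different members.
weights_t = {"AAA": 0.5, "BBB": 0.5}
weights_next = {"BBB": 0.4, "CCC": 0.6}

aligned_t = align_to_universe(weights_t, universe)
aligned_next = align_to_universe(weights_next, universe)
```

Because both aligned vectors share the same order and length, they can be compared entry by entry even though the index membership changed between the two periods.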
The following calculations were the same for all portfolio strategies. We implemented the turnover and transaction costs in accordance with the theory from section 2.10 as follows.

Step 1 At the moment of rebalancing we first multiplied the portfolio weights from period t by the returns each component had gained from t to t + 1. That means that we calculated $\hat{x}_t$ as in equation (2.190) every month

$$\hat{x}_t = \begin{pmatrix} x_1 (1 + r_1) \\ x_2 (1 + r_2) \\ \vdots \\ x_{84} (1 + r_{84}) \end{pmatrix} \tag{3.25}$$

Step 2 The next step was to normalize the weights to sum to 1 as in equation (2.191)

$$\tilde{x}_t = \begin{pmatrix} \hat{x}_1 / q \\ \hat{x}_2 / q \\ \vdots \\ \hat{x}_{84} / q \end{pmatrix} \tag{3.26}$$

where

$$q = \sum_{i=1}^{84} \hat{x}_i \tag{3.27}$$

Step 3 The portfolio turnover is then the sum of the component turnovers. This sum is calculated as

$$\text{Turnover} = \sum_{i=1}^{84} \left| \tilde{x}_i - x_{i,t+1} \right| \tag{3.28}$$

where $x_{t+1}$ is the portfolio to rebalance into at t + 1.

Step 4 The transaction cost was then calculated as

$$TC = c \cdot \text{Turnover} \tag{3.29}$$

where c is the proportional transaction cost rate. The averages reported in table 4.5 correspond to a rate of 0.1% of turnover.

When we provided numbers for annualized turnover, we used the following formula

$$\text{Turnover}_{Annualized} = \text{Turnover}_{January} + \text{Turnover}_{February} + \cdots + \text{Turnover}_{December} \tag{3.30}$$

That means that we calculated annualized turnover as the sum of all the monthly turnovers from January to December in a year. We therefore did not calculate the annualized turnover for 1998 and 2015, because we did not have all the data for these years. The monthly turnover and transaction costs were, however, calculated for all years.
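Steps 1 to 4 can be sketched in a few lines. The sketch is in Python rather than the R used in the thesis, the function names are hypothetical, and the cost rate is passed in as a parameter because its value is an assumption here. The inputs are assumed to be already aligned to a common stock order.

```python
def drifted_weights(weights, returns):
    """Steps 1 and 2: let the weights drift with returns, then renormalize."""
    grown = [w * (1.0 + r) for w, r in zip(weights, returns)]   # as in (3.25)
    q = sum(grown)                                              # as in (3.27)
    return [g / q for g in grown]                               # as in (3.26)


def turnover(weights_t, returns, weights_next):
    """Step 3: sum of absolute differences between target and drifted weights."""
    drifted = drifted_weights(weights_t, returns)
    return sum(abs(new - old) for new, old in zip(weights_next, drifted))


def transaction_cost(turnover_value, cost_rate):
    """Step 4: proportional cost; `cost_rate` is an assumed input."""
    return cost_rate * turnover_value


# Hypothetical two-stock example: equal weights, one stock up 10%, one down 10%.
example_turnover = turnover([0.5, 0.5], [0.10, -0.10], [0.5, 0.5])
example_cost = transaction_cost(example_turnover, cost_rate=0.001)
```

In the example the weights drift to 55% and 45%, so rebalancing back to equal weights trades 5 percentage points in each stock and gives a turnover of 10%.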

Constant risk targeting

In the analysis with the constant risk target we used all the portfolios from the previous calculations and included the Euribor 1-month as the risk-free asset. To maintain a constant expected risk, the new portfolios shifted between the risk-free asset and the risky asset, the latter being a scaled version of the previously found portfolios. The methodology is explained in general terms below.

From the previous calculations at time t we had the portfolios

$$x_{GMV,t}, \quad x_{GMV\,(\text{long-only}),t}, \quad x_{ERC,t}, \quad x_{MDP,t}, \quad x_{1/n,t} \tag{3.31}$$

and the risk-free asset, $rf_t$. The time index t represents months and runs from 1 to 205. We defined the target volatility we wanted to obtain as $\sigma_{Target}$. This target volatility was kept constant over time. The portfolios that had the goal of achieving the target volatility were defined as

$$y_{GMV,t}, \quad y_{GMV\,(\text{long-only}),t}, \quad y_{ERC,t}, \quad y_{MDP,t}, \quad y_{1/n,t} \tag{3.32}$$

Each target portfolio invested k in its respective risky portfolio and (1 - k) in the risk-free asset, so that its weights summed to one. We had no restrictions on leverage, so in theory k could take on any positive number. As the risk-free asset was risk-free, we assumed zero correlation between the portfolios and the risk-free asset. And because the risk-free asset by definition had no risk, we set its volatility to zero. For the two-asset portfolios described above, the volatility therefore had to be

$$\sigma_{pf} = \sqrt{x_{Risk}^2 \sigma_{Risk}^2 + x_{rf}^2 \sigma_{rf}^2 + 2 x_{Risk} x_{rf} \sigma_{Risk} \sigma_{rf} \rho_{Risk,rf}} \tag{3.33}$$

$$\sigma_{pf} = x_{Risk}\, \sigma_{Risk} \tag{3.34}$$

To ease the notation in the following calculations we define

$$a = \text{GMV} \tag{3.35}$$

$$b = \text{GMV (long-only)} \tag{3.36}$$

$$c = \text{MDP} \tag{3.37}$$

$$d = \text{ERC} \tag{3.38}$$

$$e = 1/n \tag{3.39}$$

To find the scalars at time t for each portfolio that fulfilled the volatility target, we solved equation (3.34) for all portfolios every month

$$k_{a,t} = \frac{\sigma_{Target}}{\sqrt{x_{a,t}^\top \Sigma\, x_{a,t}}} \tag{3.40}$$

$$k_{b,t} = \frac{\sigma_{Target}}{\sqrt{x_{b,t}^\top \Sigma\, x_{b,t}}} \tag{3.41}$$

$$k_{c,t} = \frac{\sigma_{Target}}{\sqrt{x_{c,t}^\top \Sigma\, x_{c,t}}} \tag{3.42}$$

$$k_{d,t} = \frac{\sigma_{Target}}{\sqrt{x_{d,t}^\top \Sigma\, x_{d,t}}} \tag{3.43}$$

$$k_{e,t} = \frac{\sigma_{Target}}{\sqrt{x_{e,t}^\top \Sigma\, x_{e,t}}} \tag{3.44}$$

The risk target portfolios were then the two-asset portfolios of the risk-free asset and the earlier calculated portfolios

$$y_{a,t} = \left(k_{a,t}\, x_{a,t},\; (1 - k_{a,t})\, rf\right) \tag{3.45}$$

$$y_{b,t} = \left(k_{b,t}\, x_{b,t},\; (1 - k_{b,t})\, rf\right) \tag{3.46}$$

$$y_{c,t} = \left(k_{c,t}\, x_{c,t},\; (1 - k_{c,t})\, rf\right) \tag{3.47}$$

$$y_{d,t} = \left(k_{d,t}\, x_{d,t},\; (1 - k_{d,t})\, rf\right) \tag{3.48}$$

$$y_{e,t} = \left(k_{e,t}\, x_{e,t},\; (1 - k_{e,t})\, rf\right) \tag{3.49}$$

In the empirical analysis we defined the target volatility to be 10%. The target portfolios were a combination of the risk-free asset and one of the risky portfolios. If the estimated volatility of a portfolio was above 10%, k was lower than 1. If the volatility was under 10%, k was over 1.
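The scaling in equations (3.40) to (3.49) can be sketched as follows. This Python sketch is illustrative only: the function names and the two-asset covariance matrix are hypothetical, and it assumes the covariance matrix and the target volatility are expressed on the same time scale (for example, both annualized).

```python
import math


def portfolio_volatility(weights, cov):
    """Estimated volatility sqrt(x' Sigma x) of a risky portfolio."""
    n = len(weights)
    variance = sum(
        weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return math.sqrt(variance)


def risk_target_allocation(weights, cov, target_vol):
    """Return (k, scaled risky weights, risk-free weight) for a volatility target."""
    k = target_vol / portfolio_volatility(weights, cov)
    risky = [k * w for w in weights]
    return k, risky, 1.0 - k


# Hypothetical example: two uncorrelated assets with 20% volatility each.
example_cov = [[0.04, 0.00],
               [0.00, 0.04]]
example_k, example_risky, example_cash = risk_target_allocation(
    [0.5, 0.5], example_cov, target_vol=0.10
)
```

The risky portfolio in the example has an estimated volatility of about 14.1%, so k is about 0.71 and the remainder is placed in the risk-free asset. A risky portfolio with estimated volatility below the target would give k above 1 and a negative risk-free weight, i.e. leverage.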

Chapter 4 Results

In this chapter we present the results of two different analyses. The results are based on the theories and methods derived earlier in this thesis. All results are presented in tables and figures.

In the first case, we analyzed a scenario with five investors. On February 27th 1998 they each invested 1 euro according to one of the five strategies derived in this thesis. This is of course not a realistic amount to invest in real life, but in our framework the initial amount invested does not change the results. On the last day of every month, the investors re-invested in accordance with the strategies. The investors re-invested the entire portfolio value every time, meaning all returns were re-invested and nothing was cashed out during the selected time period. This case is interesting because it reflects the way many investors invest in real life, for example investors saving for retirement. These investors invest the full portfolio value at each rebalancing, as the funds are not supposed to be spent before retirement. The period in which the portfolios are spent is not included in the analysis.

In the second case, we considered five new investors over the same time period. The difference from the previous case was that these investors invested according to a constant risk profile, set to a constant annualized volatility of 10%. The investors invested in a combination of the risk-free asset and one of the risky portfolios to maintain the constant risk profile. This case is as interesting as the previous one, as this strategy is commonly used in practice by pension funds. Pension funds often invest a fixed amount in the market because they pay out the same amount as their clients pay in.
We therefore considered the bonus reserve which was invested every month to be constant.

The results presented in sections 4.1 to 4.5 were found using the single-index model as the shrinkage target matrix, F. We also applied the constant correlation model as shrinkage target to compare the results. Some of these results can be found in section 4.6, and the rest can be found in the appendix. This was done to answer how the choice of covariance matrix affected the results.

We have organized the presentation of the results so that each sub-question is answered on its own, in the order the sub-questions are stated in the problem formulation section.

4.1 How well do these portfolios perform when re-investing returns?

In this section we present the portfolio performances when re-investing returns. The presentation includes portfolio returns, volatilities, Sortino ratios, Sharpe ratios and maximum drawdowns. We begin with the cumulative performance of the portfolios in figure 4.1.

Figure 4.1: Cumulative performance - single-index model

We can see that the terminal values of all the constructed portfolios are clearly larger than that of the market. Among the constructed portfolios, the unconstrained GMV portfolio obtained the highest terminal value and the 1/n portfolio the lowest. It should be noted that the terminal values of the ERC, MDP, and 1/n portfolios differ only marginally. We can see from figure 4.1 that all the portfolios in general followed the market movements, but to different degrees. This is natural because all the portfolios were constructed from the components of the market index. To present the portfolio performances more accurately, we consider the returns and standard deviations in table 4.1.

Table 4.1: Returns and standard deviations - single-index model

                  GMV      GMV (long-only)   1/n      MDP      ERC      EURO STOXX 50
Total return      %        %                 %        %        %        %
Monthly return    0.59%    0.55%             0.49%    0.51%    0.51%    0.33%
Ann. return       7.31%    6.76%             6.06%    6.27%    6.23%    4.08%
Monthly vol.      3.83%    3.91%             5.99%    4.07%    4.98%    5.60%
Ann. vol.         %        13.53%            20.74%   14.10%   17.25%   19.38%

Note: Monthly and annualized returns are geometric average returns, i.e. monthly return = (1 + total return)^(1/205) - 1, and annualized return = (1 + monthly return)^12 - 1. The risk-free rate in the same period had a total return of 47.16%, a monthly return of 0.19%, and an annualized return of 2.30%. Volatilities are annualized by scaling monthly volatilities by the square root of 12.

From table 4.1 we see that the GMV portfolio had an annualized return of 7.31%. In relative terms this was 79% more than the market, 21% more than the 1/n portfolio, 17% more than the ERC portfolio and the MDP, and 8% more than the GMV (long-only) portfolio. It is interesting to see that the four portfolios with the highest returns also had the lowest volatilities. The reasons for this are highlighted later in this section.

We can see from table 4.1 that the GMV portfolio and the GMV (long-only) portfolio had the lowest annualized volatilities, with 13.3% and 13.5%. The goal of these portfolios was to minimize the variance, and with or without short-sale restrictions they achieved the lowest volatilities among the constructed portfolios. The MDP was slightly more volatile with 14.1% annualized volatility. The most volatile portfolio was the 1/n portfolio with 20.7%. This was 56% more than the GMV portfolio.
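As a quick arithmetic check of the annualization conventions, the rounded GMV figures from table 4.1 give

$$(1 + 0.0059)^{12} - 1 \approx 0.0731, \qquad 0.0383 \times \sqrt{12} \approx 0.1327,$$

which agrees with the reported annualized return of 7.31% and the annualized volatility of 13.3% quoted in the text. The same scaling applied to the 1/n portfolio gives $0.0599 \times \sqrt{12} \approx 0.2075$, close to the reported 20.74%; the small gap is what one would expect from rounding of the monthly input.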
It is clear that all the risk-based asset allocation strategies were able to reduce the risk compared to investing in the market index or equally across assets. In addition to having lower volatility, all the risk-based strategies also had higher returns than the market index and the 1/n portfolio. The relations between the portfolios' risks and returns are summarized in the return-standard deviation space in figure 4.2.

The higher up a portfolio is placed in figure 4.2, the higher was its return. The further left a portfolio is located, the lower was its volatility. That means that portfolios situated towards the upper left corner had the strongest performance in terms of return and volatility. A portfolio placed diagonally up and to the left of another had both a higher return and a lower volatility. We can therefore say that the two GMV portfolios and the MDP showed the strongest performance in terms of return and volatility alone. We will now quantify the performances further with well-known ratios and measures.

Figure 4.2: Return-standard deviation - single-index model

In table 4.2 we present the Sharpe ratios, Sortino ratios, and maximum drawdowns of the portfolios. The ranking of the portfolios is the same whether we rank them by Sharpe ratio or by Sortino ratio. Either way, the GMV portfolio performed best, with a Sharpe ratio of 0.38 and the highest Sortino ratio. The GMV (long-only) portfolio performed second best, and the MDP third best. The 1/n portfolio had the worst performance among the constructed portfolios on these two measures. Even so, its ratios were twice as high as those of the market index.

We calculated the maximum drawdowns over three different periods in table 4.2. These periods were the full period and the two sub-periods capturing the drawdowns in the two financial crises. These values provide information about the most extreme movements in the portfolio values, and thereby more detail about the portfolio risks than the volatilities alone.
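The two downside risk measures behind these results can be illustrated with a minimal sketch. It is written in Python rather than R and does not reproduce the PerformanceAnalytics implementation. The maximum drawdown function assumes the usual peak-to-trough definition, since equation (2.185) is not reproduced here, and the downside deviation follows equation (3.24) with a constant minimum acceptable return for simplicity. Names and numbers are hypothetical.

```python
import math


def max_drawdown(returns):
    """Largest peak-to-trough loss of the cumulative portfolio value."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r                 # re-invest the full portfolio value
        peak = max(peak, value)          # highest value seen so far
        worst = max(worst, (peak - value) / peak)
    return worst


def downside_deviation(returns, mar):
    """Downside deviation as in (3.24): only returns below `mar` contribute."""
    shortfalls = [min(0.0, r - mar) ** 2 for r in returns]
    return math.sqrt(sum(shortfalls) / len(returns))


# Hypothetical monthly returns.
example_returns = [0.10, -0.20, 0.05]
example_mdd = max_drawdown(example_returns)
example_dd = downside_deviation([0.02, -0.04, 0.01], mar=0.0)
```

In the example the portfolio value rises to 1.10, falls to 0.88 and recovers to 0.924, so the maximum drawdown is 20%. Only the single negative return enters the downside deviation, which is why the Sortino ratio penalizes losses but not upside volatility.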

Table 4.2: Maximum drawdowns, Sharpe ratios, Sortino ratios - single-index model

                                    GMV      GMV (long-only)   1/n      MDP      ERC      EURO STOXX 50
Maximum drawdown (full period)      46.60%   43.98%            56.23%   48.89%   52.12%   59.90%
Max. drawdown (first sub-period)    39.64%   43.98%            54.43%   40.96%   47.39%   59.90%
Max. drawdown (second sub-period)   46.60%   39.77%            56.23%   48.89%   52.12%   54.08%
Sortino ratio
Sharpe ratio

Note: The maximum drawdown in the first sub-period captured the drawdown in the first financial crisis, and the maximum drawdown in the second sub-period captured the drawdown in the second financial crisis. The Sortino ratio and the Sharpe ratio were calculated with the annualized data from table 4.1.

We can see that the maximum drawdown for the GMV (long-only) portfolio and the market index occurred during the first financial crisis. For the other portfolios, the maximum drawdowns occurred during the second financial crisis. Figure 4.1 and table 4.2 show one of the reasons why the two GMV portfolios performed better than the other portfolios in this period: they preserved their portfolio values better than the others when the market was at its most volatile. In both financial crises the GMV portfolio and the GMV (long-only) portfolio had among the lowest drawdowns.

If we look at the maximum drawdowns over the entire period in table 4.2, we see that the GMV (long-only) portfolio had the lowest value with 43.98%. The GMV portfolio was close with 46.60%, followed by the MDP with 48.89%. The ERC portfolio was further behind with 52.12%, and the 1/n portfolio had the highest maximum drawdown with 56.23%. All the constructed portfolios had lower maximum drawdowns than the market index, which had a maximum drawdown of 59.90%. We notice that the GMV (long-only) portfolio performed better than the GMV portfolio in terms of drawdown. This is interesting, as this portfolio is subject to more constraints than the GMV portfolio.

To further discuss the risk of the portfolios we can consider their rolling volatilities. The 24-month rolling volatilities are presented below, in figure 4.3.

Figure 4.3: Rolling 24-month volatility - single-index model

The results from the rolling volatility figure are in line with our previous findings. We see that the two GMV portfolios and the MDP were able to keep volatilities lower in periods with extreme fluctuations. This explains why they had the lowest maximum drawdowns and the lowest overall volatilities. This is clear from figure 4.3, where the GMV portfolios and the MDP had the most stable rolling volatilities. The two GMV portfolios never exceeded 20% annualized rolling volatility, and the MDP exceeded 20% in a single period only. The 1/n portfolio had the largest fluctuations in rolling volatility and exceeded 30% annualized volatility over multiple periods. The ERC portfolio fluctuated more than the other risk-based strategies during the two crises, exceeding 25% annualized volatility in both. This explains the higher overall volatility and drawdowns of this portfolio.

As with the rolling volatilities, the rolling Sharpe ratios and rolling Sortino ratios are presented in figures below. The two ratios in table 4.2 were calculated over approximately 17 years. In the financial industry these ratios are often calculated over shorter periods, so we also calculated them over a rolling 24-month window. We first present the rolling Sharpe ratios.

We see that the GMV (long-only) portfolio achieved the highest rolling Sharpe ratio of all the portfolios, in a window ending in November. This was because of the long period of positive returns with low volatility during the preceding months. In addition to having the highest Sharpe ratio, the GMV (long-only) portfolio also had the lowest Sharpe ratio. In general, the MDP, GMV, 1/n, and ERC portfolios followed each other closely, with some exceptions. The GMV (long-only) portfolio fluctuated more extremely around the others. In the period between August 2011 and October 2013 the rolling Sharpe ratios of the portfolios were the most spread out. The same spread can be seen in the rolling volatilities in figure 4.3 and in the overall performances in figure 4.1.

Figure 4.4: Rolling 24-month Sharpe ratio - single-index model

The rolling Sortino ratios in figure 4.5 show similar results to the rolling Sharpe ratios. The GMV (long-only) portfolio had both the highest and the lowest rolling Sortino ratio, and its value spiked at the same point as its rolling Sharpe ratio. We see the same pattern of the other portfolios' Sortino ratios following each other closely, except during the period mentioned above.

Figure 4.5: Rolling 24-month Sortino ratio - single-index model

4.2 What are the characteristics of the portfolios' returns distribution?

In this section we present the return distribution characteristics of the different portfolios and the EURO STOXX 50. The presentation includes a graph of the return distributions in figure 4.6 and the key results in table 4.3. It should be noted that we used the arithmetic mean of returns in this section, because we analyzed the distributions of returns and not the cumulative performances as earlier. It should also be noted that the tails in the density plots extend beyond the maximum and minimum values in the table. This is because the plots are smoothed presentations of the densities, based on the bandwidth optimization performed by R.

Figure 4.6: Density of portfolio returns - single-index model

From table 4.3 we can see some similarities among the return distributions of the portfolios and the EURO STOXX 50. They all had similar characteristics regarding the shape of the return distribution. All portfolios had excess kurtosis and negative skewness. The kurtosis indicates that these distributions had fatter tails than a normal distribution, meaning that the portfolios had more extreme returns than we would expect from normally distributed returns. The negative skewness tells us that the distributions are not symmetric, but skewed to the left. This means that the portfolios had more extreme negative returns than extreme positive returns.

The 1/n portfolio and the ERC portfolio had the highest maximum values, with 19.99% and 15.15% respectively. The two portfolios with the lowest maximum values were the GMV portfolio and the MDP, with 9.08% and 9.74%. These values were under half the size of the maximum value of the 1/n portfolio. As with the most extreme positive returns, the 1/n portfolio and the ERC portfolio had the most negative minimum returns. The 1/n portfolio had a largest negative return of 21.61%, and the ERC portfolio had a largest negative return of 17.39%. The GMV portfolios and the MDP had the smallest maximum negative returns.

If we look at the top and bottom 5% of returns for each portfolio we can see some differences. The GMV (long-only) portfolio and the MDP had larger 95th percentiles than absolute 5th percentiles. The other portfolios had larger absolute 5th percentiles than 95th percentiles. The 1/n portfolio had the largest absolute percentiles, which means that it had more extreme values in both tails. The large negative extreme values and 5th percentiles can explain why the 1/n portfolio and the ERC portfolio had the largest maximum drawdowns. Among the constructed portfolios, the 1/n portfolio also had the largest share of negative returns, with 40.98% of the returns being below zero. The two GMV portfolios had the fewest returns below zero, with 34.15% each. We also notice that the MDP and the ERC portfolio had equally many returns below zero, with 37.56% each.

Figure 4.6 summarizes the presentation above. We see that the return densities of the two GMV portfolios and the MDP were more concentrated around their means, with smaller tails. They had higher peaks than the other portfolios because they had lower volatilities. The 1/n portfolio had the largest tails and the lowest peak among the constructed portfolios, meaning that its density in the extremes was larger than the others.

Table 4.3: Return distribution characteristics - single-index model

                   GMV      GMV (long-only)   1/n      MDP      ERC      EURO STOXX 50
Max. value         9.08%    11.05%            19.99%   9.74%    15.15%   15.49%
Min. value         %        %                 %        %        %        %
% Under zero       34.15%   34.15%            40.98%   37.56%   37.56%   41.95%
5th percentile     -5.90%   -6.04%            %        -6.33%   -8.46%   %
95th percentile    5.62%    6.09%             8.68%    6.46%    6.85%    8.30%
Kurtosis
Skewness
Mean return        0.66%    0.62%             0.67%    0.59%    0.63%    0.49%
Std. dev.          3.83%    3.91%             5.99%    4.07%    4.98%    5.60%

Note: The returns displayed in this table are the monthly arithmetic means. These returns should not be confused with the geometric returns from table 4.1. All data in this table is monthly data.

4.3 How concentrated are the portfolio weights?

To answer how concentrated the portfolio weights are, we look at maximum weights, minimum weights, and the number of stocks in the portfolios over time. These findings are summarized in table 4.4. We begin by considering the number of stocks in the portfolios over time. In figure 4.7 we can see the number of stocks in the MDP, the GMV (long-only) portfolio, and the GMV portfolio.
The 1/n portfolio and the ERC portfolio are not included as these portfolios per definition used all stocks available at all times. We notice that the GMV portfolio always used all the available stocks. The GMV (long-only) 81

90 Figure 4.7: Number of stocks invested in for each portfolio - single-index model portfolio on the other hand invested in the fewest stocks among the strategies. The minimum number of stocks the GMV (long-only) portfolio invested in was only 8 stocks. The most stocks this portfolio invested in was 24 stocks. On average, this portfolio invested in stocks. This made it the most concentrated portfolio among the strategies in terms of number of stocks included. The MDP was the second most concentrated in terms of number of stocks. This strategy s components ranged between 11 stocks and 28 stocks. The average number of stocks in this portfolio was stocks. Table 4.4: Portfolio concentrations - single-index model GMV GMV (long-only) 1/n MDP ERC Average number of stocks Min. number of stocks Max. number of stocks Average maximum weights 15.87% 25.31% 2.02% 20.44% 8.34% Average minimum weights 0.16% 0.83% 2.02% 0.42% 0.65% Maximum weight 43.65% 61.47% 2.33% 36.84% 19.91% Minimum weight 0.002% 0.004% 2.00% % 0.053% Note: Weights of zero were kept out of the minimum weight calculations. Maximum and minimum weights are absolute values In figure 4.8 we have presented the maximum absolute portfolio weights over time. This means that the maximum weights for the GMV portfolio can be long or short positions. From figure 82

91 4.8, we see that the GMV (long-only) portfolio was the most concentrated in a single stock over time. From table 4.4 we see that this portfolio had an average maximum weight 25.31%. At the most the GMV (long-only) portfolio invested 61.47% in a single stock. The MDP was the second most concentrated in terms of maximum weights. This portfolio had an average maximum weight of 20.44%, and a maximum weight of 36.84%. The GMV portfolio had incidents of larger maximum weights than the MDP, but on average it had a maximum concentration of 15.87%. The average maximum weight of the ERC portfolio was 8.34%. The maximum weight of this portfolio was 19.91%. This means that at times, some stocks had significantly lower volatility than the rest. These large concentrations in single stocks for all these strategies makes it clear why concentration limits are often imposed to such strategies. Figure 4.8: Maximum absolute values of portfolio weights - single-index model It is also interesting to consider the minimum weights. Some weights were very small because they were results of a pure mathematical optimization program. In practice this can make little sense for small scale investors. We see that the GMV portfolio, the GMV (long-only) portfolio, and the MDP constructed minimum weights of 0.002%, 0.004%, and %. The minimum weights over time can be seen in figure 4.9. The minimum weights also provided additional information about the concentration of the GMV (long-only) portfolio and the MDP. These two portfolios had on average few components in their portfolios. At the same time these two portfolios also had an average minimum weight of 0.83% and 0.42%. This means that these portfolios were even more concentrated in a few stocks than the average component numbers tell. 83
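The concentration measures of table 4.4 can be illustrated with a short Python sketch. It is not the code used in this study. It assumes that the weights are stored as an array with one row per rebalancing date, that every date holds at least one stock, and that a hypothetical tolerance `tol` decides when a weight counts as zero. As in the note to table 4.4, weights of zero are kept out of the minimum weight calculations and all weights are taken in absolute value.

```python
import numpy as np

def concentration_summary(weights, tol=1e-12):
    """Concentration measures for a (dates x stocks) array of portfolio weights.

    `tol` is an illustrative threshold below which a weight counts as zero.
    Assumes every date has at least one non-zero weight.
    """
    abs_w = np.abs(np.asarray(weights, dtype=float))
    held = abs_w > tol                         # positions that count as invested
    n_held = held.sum(axis=1)                  # number of stocks per date
    max_w = abs_w.max(axis=1)                  # largest absolute weight per date
    # smallest absolute weight per date, ignoring weights of zero
    min_w = np.where(held, abs_w, np.inf).min(axis=1)
    return {
        "avg_n_stocks": float(n_held.mean()),
        "min_n_stocks": int(n_held.min()),
        "max_n_stocks": int(n_held.max()),
        "avg_max_weight": float(max_w.mean()),
        "avg_min_weight": float(min_w.mean()),
        "max_weight": float(max_w.max()),
        "min_weight": float(min_w.min()),
    }

# Toy example with two rebalancing dates and four stocks (made-up numbers).
toy_weights = np.array([[0.5, 0.3, 0.2, 0.0],
                        [0.7, 0.0, 0.3, 0.0]])
summary = concentration_summary(toy_weights)
```

In the toy example the portfolio holds three stocks on the first date and two on the second, so a low stock count and a high minimum weight appear together, as they do for the GMV (long-only) portfolio and the MDP.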

Figure 4.9: Minimum absolute values of portfolio weights - single-index model

4.4 What are the levels of turnover and transaction costs?

To present the levels of turnover and transaction costs, we have calculated maximum annualized turnover, minimum annualized turnover, average annualized turnover, and average transaction costs. These values are summarized in table 4.5.

Table 4.5: Turnover and transaction costs - single-index model

                   GMV     GMV (long-only)   1/n     MDP     ERC     EURO STOXX 50
Max. turnover      757%    571%              114%    431%    261%    -
Min. turnover      226%    90%               38%     194%    80%     -
Avg. turnover      407%    282%              72%     282%    158%    -
Avg. trans. cost   0.41%   0.28%             0.07%   0.28%   0.16%   -

Note: The average annualized turnovers were calculated as the average monthly turnovers multiplied by 12. Maximum and minimum annualized turnovers were calculated from the data in figure We do not know the turnover of the EURO STOXX 50 since we do not know the historical component weights. Avg. trans. costs = Avg. turnover

From table 4.5 we can see that the 1/n portfolio had the lowest turnover and transaction costs. This is because this portfolio rebalanced by selling the winner stocks and buying the loser stocks at each rebalancing. This resulted in few large changes. The maximum annualized turnover for this portfolio was 114%, and the average annualized turnover was 72%. The ERC portfolio had the second lowest average turnover with 158%, which we consider to be a low turnover per year. The MDP and the GMV (long-only) portfolio had the same average annualized turnover, with 282% each. However, the GMV (long-only) portfolio had both the lowest and the highest turnover of the two. This means that the MDP had a more stable turnover over time. The GMV portfolio clearly had the highest average turnover, maximum turnover, and minimum turnover. The average turnover of the GMV portfolio was 407%. In figure 4.10 we can see the annualized turnover per year.

Figure 4.10: Annualized turnover - single-index model

Note: 1998 and 2015 are not included as we do not have data for all months in these two years.

In figure 4.11 we have calculated the cumulative performances again. The difference this time is that we have subtracted the transaction costs from the returns every month. We see that the gap between the terminal value of the GMV portfolio and the other portfolios was smaller when including the transaction costs. This is because the GMV portfolio's performance was reduced the most, as we could expect from the high turnover.
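The turnover and cost calculations of this section can be sketched in Python as follows. The sketch is an illustration under assumptions and not the code used in this study: turnover at a rebalancing is taken to be the sum of absolute differences between the new target weights and the drifted weights of the previous month, and transaction costs are taken to be proportional to turnover through a hypothetical parameter `cost_rate`. Only the annualization, average monthly turnover multiplied by 12, is taken directly from the note to table 4.5.

```python
import numpy as np

def rebalancing_turnover(w_target, w_prev, r_prev):
    """Sum of absolute trades at one rebalancing (assumed turnover definition).

    w_prev holds last month's target weights and r_prev the stock returns
    since then, so the weights before trading are the drifted weights.
    """
    w_prev = np.asarray(w_prev, dtype=float)
    drifted = w_prev * (1.0 + np.asarray(r_prev, dtype=float))
    drifted = drifted / drifted.sum()
    return float(np.abs(np.asarray(w_target, dtype=float) - drifted).sum())

def annualized_turnover(monthly_turnovers):
    """Average monthly turnover multiplied by 12."""
    return 12.0 * float(np.mean(monthly_turnovers))

def net_returns(gross_returns, monthly_turnovers, cost_rate):
    """Monthly returns after subtracting proportional transaction costs.

    cost_rate is a hypothetical cost per unit of turnover.
    """
    gross = np.asarray(gross_returns, dtype=float)
    return gross - cost_rate * np.asarray(monthly_turnovers, dtype=float)

# 1/n example with two stocks: the winner is sold and the loser is bought.
toy_turnover = rebalancing_turnover([0.5, 0.5], [0.5, 0.5], [0.10, -0.10])
toy_annual = annualized_turnover([0.1, 0.2, 0.3])
toy_net = net_returns([0.01], [0.1], cost_rate=0.001)
```

In the two-stock example the drifted weights are 55% and 45%, so restoring equal weights trades only 10% of the portfolio. This shows why the 1/n portfolio needs few large changes at each rebalancing.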

Figure 4.11: Performance after transaction costs - single-index model

We have calculated the Sharpe ratios to compare the performance after adjusting for transaction costs. These Sharpe ratios can be seen in table 4.6. After adjusting for transaction costs, the GMV portfolio still achieved the highest Sharpe ratio. The differences between the Sharpe ratios changed, but the order stayed the same: no portfolio had high enough transaction costs to change the ranking of the portfolios in terms of Sharpe ratio. We notice that the portfolios with the highest transaction costs also had the highest Sharpe ratios.

Table 4.6: Sharpe ratios after transaction costs - single-index model

                           GMV   GMV (long-only)   1/n      MDP      ERC      EURO STOXX 50
Sharpe ratio
% change in Sharpe ratio   %     -6.59%            -5.23%   -9.50%   -7.02%   -

Note: The percentage change in Sharpe ratio is the percentage difference from the ratios in table
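The percentage change in Sharpe ratio can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from this section: the Sharpe ratio is computed as the mean monthly excess return divided by its standard deviation and annualized with the square root of 12, and `monthly_rf` is a hypothetical risk-free rate that is set to zero in the example.

```python
import numpy as np

def sharpe_ratio(monthly_returns, monthly_rf=0.0):
    """Annualized Sharpe ratio from monthly returns (assumed definition)."""
    excess = np.asarray(monthly_returns, dtype=float) - monthly_rf
    return float(np.sqrt(12.0) * excess.mean() / excess.std(ddof=1))

def pct_change(before, after):
    """Percentage difference of `after` relative to `before`."""
    return 100.0 * (after - before) / before

# Made-up monthly returns before costs, and a constant monthly cost of 0.1%.
gross = np.array([0.02, -0.01, 0.03, 0.00])
net = gross - 0.001
change = pct_change(sharpe_ratio(gross), sharpe_ratio(net))
```

With a constant monthly cost the volatility is unchanged, so the Sharpe ratio falls by the same proportion as the mean return, 10% in this example. When costs vary from month to month, the volatility changes as well.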

4.5 To what degree can these portfolios be used to obtain a constant risk target?

In this section we will investigate to what degree the constructed portfolios can be used to obtain a constant risk target of 10%. The rolling 24-month volatilities with the risk target can be seen in figure The annualized volatility over the entire period with the risk target can be seen in table 4.7.

Figure 4.12: Rolling 24-month volatilities with volatility target 10% - single-index model

It is clear from figure 4.12 and table 4.7 that the GMV portfolio, the GMV (long-only) portfolio, and the MDP performed poorly. The two GMV portfolios achieved significantly higher volatilities in this analysis than in the cumulative performance analysis. The reason for this is that their ex-ante volatility estimates were consistently too low. If we look at figure 4.13, we see that the GMV portfolio had a leverage over 1 at all times. This leverage also reached values above 2.5. This means that the ex-ante volatility was at times less than half of the 10% target, which clearly underestimated the risk. Therefore, this portfolio missed the target significantly. The GMV (long-only) portfolio and the MDP also had high leverage values. This means they also underestimated the volatilities.

The ERC portfolio performed best for volatility targeting among the risk-based asset allocation strategies. This portfolio had a volatility of 12.91% over the period. The 1/n portfolio performed best among all the constructed portfolios. Over the entire period, this portfolio had a volatility of 11.53%. This can be considered fairly close to the target.
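The link between leverage and ex-ante volatility can be sketched in Python. The sketch is not the code used in this study. It assumes that the leverage constant equals the risk target divided by the annualized ex-ante portfolio volatility, and that a monthly covariance matrix is annualized by multiplying by 12. The function names are hypothetical.

```python
import numpy as np

TARGET_VOL = 0.10  # constant annual risk target of 10%

def ex_ante_vol(weights, cov_monthly):
    """Annualized ex-ante volatility from a monthly covariance matrix."""
    w = np.asarray(weights, dtype=float)
    cov = np.asarray(cov_monthly, dtype=float)
    return float(np.sqrt(12.0 * (w @ cov @ w)))

def leverage(weights, cov_monthly, target=TARGET_VOL):
    """Assumed leverage constant: target volatility over ex-ante volatility."""
    return target / ex_ante_vol(weights, cov_monthly)

def rolling_vol(monthly_returns, window=24):
    """Annualized volatility over each rolling window of monthly returns."""
    r = np.asarray(monthly_returns, dtype=float)
    return np.array([np.sqrt(12.0) * r[i - window:i].std(ddof=1)
                     for i in range(window, len(r) + 1)])

# One asset with an estimated annual volatility of 4% (made-up number).
toy_cov = np.array([[0.04 ** 2 / 12.0]])
toy_leverage = leverage([1.0], toy_cov)

# Thirty made-up monthly returns give seven rolling 24-month windows.
toy_returns = np.linspace(-0.03, 0.03, 30)
toy_rolling = rolling_vol(toy_returns)
```

An estimated volatility of 4% gives a leverage of 2.5. If the realized volatility of the unlevered portfolio is higher than the estimate, the levered portfolio ends up above the 10% target.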

Table 4.7: Volatilities with target - single-index model

                 GMV      GMV (long-only)   1/n      MDP      ERC      EURO STOXX 50
Annualized vol   24.10%   18.31%            11.53%   16.61%   12.91%   -

Figure 4.13: Leverage constant over time - single-index model

4.6 How can our choice of covariance matrix affect results?

In this section we will present how our choice of covariance matrix can affect results. This presentation is based on the results from the previous sections and on the results from the exact same analysis using the constant correlation model as shrinkage target. The differences we find the most important are presented in detail in this section. The rest of the findings can be seen in the appendix. In the end of this section we have summarized the key findings in table In table 4.12 we have calculated the percentage changes in the results using the two covariance matrices.

The first significant difference we will look at is the difference in the portfolios' risk targeting abilities. In figure 4.14 we have performed the risk targeting analysis when using the constant correlation model.

Figure 4.14: Rolling 24-month volatilities with volatility target 10% - constant correlation model

By using the constant correlation model as target, the results became more stable around the target volatility of 10%. The GMV portfolio was still far off target with a volatility of 18.38%. However, this was a decrease from the volatility of 24.10% in the analysis using the single-index model. The volatility of the MDP was 12.06% in this analysis, and the volatility of the ERC portfolio was 12.21%. The GMV (long-only) portfolio obtained a volatility of 14.18%, which was still quite far off target. Overall, we can say that the strategies underestimated the risk less with the constant correlation model.

Table 4.8: Volatilities with target - constant correlation model

                 GMV      GMV (long-only)   1/n      MDP      ERC      EURO STOXX 50
Annualized vol   18.38%   14.18%            11.89%   12.06%   12.21%   -

In figure 4.15 we have presented the leverage constant using the constant correlation model. We see that the leverage was lower for all strategies. The reason for this is that they underestimated the risk less than in the previous analysis.

Figure 4.15: Leverage constant over time - constant correlation model

We see that the performances had the same pattern regardless of which covariance matrix we used. After some periods with volatilities above target, we observe a drop in the leverage constant. In most periods this led to an overcompensation, leading to volatilities below target. This was of course not the case for the GMV portfolio, as this strategy was consistently above target in both analyses.

We will now look at how much the sample covariance matrix was included in each analysis. From figure 4.16 and figure 4.17, we can see the shrinkage intensities for the two targets over time. In table 4.9 we can see the maximum intensity, minimum intensity, and average intensity. We can see that the average intensities for the two targets were close. This means that the sample covariance matrix was on average used approximately to the same extent regardless of shrinkage target. However, we can see a large difference in how often the shrinkage intensity was 1. When the intensity was equal to 1, the covariance matrix used in the analysis was solely the shrinkage target. This means that the sample covariance matrix was kept out of the analysis. When we used the single-index matrix as target, the sample covariance matrix was not used at all in 66 months. When using the constant correlation model as target, this occurred in 16 months.
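The role of the shrinkage intensity can be illustrated with a short Python sketch. It is not the code used in this study and it does not estimate the intensity itself. It assumes the usual convex combination of target and sample covariance matrix, and a constant correlation target built from the sample variances and the average sample correlation. The function names are hypothetical.

```python
import numpy as np

def constant_correlation_target(sample_cov):
    """Target with the sample variances and the average sample correlation."""
    s = np.asarray(sample_cov, dtype=float)
    n = s.shape[0]
    std = np.sqrt(np.diag(s))
    corr = s / np.outer(std, std)
    avg_corr = (corr.sum() - n) / (n * (n - 1))  # mean off-diagonal correlation
    target = avg_corr * np.outer(std, std)
    np.fill_diagonal(target, np.diag(s))         # keep the sample variances
    return target

def shrink(sample_cov, target, intensity):
    """intensity * target + (1 - intensity) * sample, intensity kept in [0, 1]."""
    delta = min(max(float(intensity), 0.0), 1.0)
    s = np.asarray(sample_cov, dtype=float)
    f = np.asarray(target, dtype=float)
    return delta * f + (1.0 - delta) * s

# Made-up sample covariance matrix for three stocks.
toy_sample = np.array([[0.04, 0.01, 0.00],
                       [0.01, 0.09, 0.03],
                       [0.00, 0.03, 0.16]])
toy_target = constant_correlation_target(toy_sample)
only_target = shrink(toy_sample, toy_target, 1.0)   # sample matrix kept out
half_half = shrink(toy_sample, toy_target, 0.5)
```

With an intensity of 1 the result equals the target, which corresponds to the months in which the sample covariance matrix was not used at all.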

Figure 4.16: Shrinkage intensity - single-index model

Figure 4.17: Shrinkage intensity - constant correlation model
