Online Appendix to ESTIMATING MUTUAL FUND SKILL: A NEW APPROACH. August 2016


Angie Andrikogiannopoulou (London School of Economics) and Filippos Papakonstantinou (Imperial College London)

C.1 Hierarchical mixture model

In Figure C.1, we present the directed acyclic graph (DAG) representation of the hierarchical version of our baseline mixture model described in Section 2 of the paper. In this graph, squares represent quantities that are fixed or observed, e.g., prior parameters and data, while circles represent unknown model parameters that need to be estimated.

Figure C.1: Representation of the hierarchical mixture model of fund returns as a directed acyclic graph. Squares represent quantities that are fixed or observed, e.g., prior parameters and data, while circles represent unknown model parameters that need to be estimated.

Comparing the graph in this figure with the corresponding graph for the non-hierarchical mixture model in the paper, we see that the difference between the two versions is that the hierarchical one takes as given the prior distributions for the population parameters. First, this is necessary, since classical estimation would be intractable. Second, we generally use weak priors, and furthermore we perform a prior sensitivity analysis which shows that our posteriors are quite robust to varying the priors (see Section 8 in the paper for a brief summary and Section C.2 in this appendix for details).

C.2 Simulations

Here, we present additional results on the simulations we perform in Section 3 of the paper. First, we present results relating to the first set of our simulations (see the corresponding table in the paper), in which we generate alphas from mixed distributions and compare our estimated proportions of skill types with those obtained from fund-level hypothesis tests, with and without the false discovery rate (FDR) correction. In Table C.1, we present the true percentiles of each simulated alpha distribution, as well as point and interval estimates of these percentiles using our methodology: the posterior mean and the 90% Highest Posterior Density Interval (HPDI). These results show that our methodology is flexible enough to estimate well not only the proportions of skill types but also the entire alpha distribution, even in cases in which nonzero alphas are, e.g., discrete or normal.

Next, we present results relating to the second set of our simulations (see Figure 2 in the paper), in which we generate alphas from continuous distributions and compare our estimated distribution of alpha with that of the hierarchical normal model. In Table C.2, we present the true percentiles of each simulated alpha distribution, as well as the posterior mean and 90% HPDI of these percentiles using our methodology. The results in this table show that our model is flexible enough to estimate reasonably well the entire alpha distribution both when alphas are drawn from a normal distribution and when they are drawn from a skewed and fat-tailed distribution without a point mass at zero. In Table C.3, we present the true percentiles of each simulated alpha distribution, as well as the posterior mean and 90% HPDI of these percentiles using the hierarchical normal model. As expected, the results in this table show that the normal model accurately estimates the distribution if alphas are drawn from a normal, but grossly mis-estimates it if alphas are drawn from a skewed and fat-tailed distribution.
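The point and interval estimates used in Tables C.1 to C.3 (posterior mean and 90% HPDI) can be computed directly from retained MCMC draws. The following is a minimal sketch, assuming the draws of one scalar quantity (for example, one percentile of the alpha distribution, evaluated at each retained draw) are already available as a Python list; the function names are illustrative and not taken from the authors' code. It approximates the HPDI as the narrowest window of consecutive sorted draws that covers the required fraction of draws, a standard sample-based approximation that is appropriate when the posterior is unimodal.

```python
import math
from statistics import fmean


def hpdi(draws, mass=0.90):
    """Shortest interval containing a fraction `mass` of the draws.

    `draws` is a 1-D list of retained MCMC draws of a scalar quantity.
    """
    xs = sorted(draws)
    n = len(xs)
    # Number of draws the interval must cover; the small epsilon guards
    # against floating-point error in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def summarize(draws, mass=0.90):
    """Posterior mean and HPDI of a scalar quantity."""
    lo, hi = hpdi(draws, mass)
    return {"mean": fmean(draws), "hpdi": (lo, hi)}


if __name__ == "__main__":
    draws = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(summarize(draws))  # {'mean': 14.5, 'hpdi': (1, 9)}
```

The example shows why the HPDI differs from an equal-tailed interval: the single outlying draw pulls the posterior mean up but is excluded from the shortest interval covering 90% of the draws.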

Table C.1: Simulations with Mixed Distributions: True and Estimated Percentiles of Fund Skill Distribution

Results from simulations in which alphas (expressed as annualized percentages) are generated from mixed distributions with a point mass at zero and with nonzero alphas drawn from a discrete distribution (in Panel A), a normal distribution (in Panel B), and a log-normal distribution (in Panel C). The data generating processes (DGPs) within each panel differ in the proportions $\pi^0$, $\pi^-$, $\pi^+$ of funds with zero, negative, and positive alpha, respectively, and/or in the distance of nonzero alphas from zero. In Panel A, $\alpha \sim \pi^0 \delta_0 + \pi^- \delta_{x^-} + \pi^+ \delta_{x^+}$, with large nonzero alphas $x^- = -3.2$ and $x^+ = 3.8$ and unequal proportions $\pi^0 = .75$, $\pi^- = .23$, $\pi^+ = .02$ (DGP D-1), and with small nonzero alphas $x^- = -.2$ and $x^+ = .8$ and equal proportions $\pi^0 = .34$, $\pi^- = .33$, $\pi^+ = .33$ (DGP D-2). In Panel B, $\alpha \sim \pi^0 \delta_0 + \pi^{-,+} f_N(\alpha \mid .45, \sigma^2)$, with large variance $\sigma^2 = 72$ and a large point mass $\pi^0 = .9$ (DGP N-1), and with small variance $\sigma^2 = 7.2$ and a smaller point mass $\pi^0 = .35$ (DGP N-2). In Panel C, $\alpha \sim \pi^0 \delta_0 + \pi^- f_{\ln N}(-\alpha \mid \mu, \sigma^2) + \pi^+ f_{\ln N}(\alpha \mid \mu, \sigma^2)$, with nonzero alphas far from zero, i.e., $\mu = 2$ and $\sigma^2 = .2$ (DGP L-1), and close to zero, i.e., a smaller $\mu$ and $\sigma^2 = .35$ (DGP L-2), and with proportions $\pi^0 = .45$, $\pi^- = .28$, $\pi^+ = .27$ in both cases. For each DGP, we report the true percentiles of the alpha distribution and their posterior mean and 90% HPDI (rows labeled 5% and 95%) estimated using our methodology.

Panel A: Discrete nonzero alphas
Percentiles: 0.5th, 1st, 5th, 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th, 95th, 99th, 99.5th
DGP D-1: Large alphas
True: -3.2-3.2-3.2-3.2-3.2........ 3.8 3.8
Posterior Mean: -3.66-3.57-3.35-3.22-2.94........ 3.73 3.97
5%: -3.78-3.69-3.42-3.26-3.4........ 3.5 3.78
95%: -3.5-3.46-3.3-3.7-2.83........ 3.9 4.6
DGP D-2: Small alphas
True: -.2 -.2 -.2 -.2 -.2 -.2....8.8.8.8.8.8
Posterior Mean: -.52 -.46 -.3 -.23 -.2 -.97....58.75.88.97 2.4 2.2
5%: -.66 -.58 -.37 -.27 -.6 -.5....5.7.83.9 2.2 2.6
95%: -.37 -.34 -.24 -.8 -.6 -.84....66.79.92 2.3 2.24 2.32

Panel B: Normal nonzero alphas
DGP N-1: Large variance
True: -5.48-2.8 -.43.......... 9.65 2.8
Posterior Mean: -5.95 -.75 -.69.......... 9.3 2.5
5%: -8.9-3.24-3........... 7.85.92
95%: -3.97 -.43............2 4.39
DGP N-2: Small variance
True: -7.76-7.3-5.24-4.8-2.78 -.65 -.66.....39 2.42 4.5 5.25
Posterior Mean: -8.2-7.5-5. -4. -2.8 -.6 -.75....6.28 2.26 4.32 5.25
5%: -8.6-7.56-5.28-4.26-2.97 -.78 -.95.....6 2.5 3.98 4.76
95%: -7.54-6.8-4.93-3.95-2.65 -.43....4.38.45 2.46 4.73 5.89

Panel C: Log-normal nonzero alphas
DGP L-1: Far from zero
True: -9.92-7.76-2.38-9.8-6.32..... 5.87 9.3.97 8.5 2.65
Posterior Mean: -2.93-8.3-2.29-9.56-6.25..... 6. 9.46 2.8 8.8 2.8
5%: -22.4-9.4-2.7-9.92-6.6..... 5.75 9.5.78 7.36 9.76
95%: -9.87-7.56 -.87-9.25-5.94..... 6.4 9.77 2.63 9. 2.9
DGP L-2: Close to zero
True: -9.8-8.42-5.23-3.84-2.5......95 3.58 5. 8.67.95
Posterior Mean: -.9-8.59-5.7-3.75-2.5..... 2.5 3.68 5.6 8.78.5
5%: -. -9.22-5.43-3.94-2.3......88 3.5 4.92 8.24 9.78
95%: -9.45-8.6-4.95-3.57 -.97..... 2.23 3.86 5.39 9.34.3
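To make the Panel A data generating process concrete, here is a minimal sketch that draws alphas from $\pi^0 \delta_0 + \pi^- \delta_{x^-} + \pi^+ \delta_{x^+}$ and approximates the true percentiles reported in the tables from a large simulated sample. The parameter values in the example are hypothetical (chosen so the answer is easy to check), and the sketch leaves out the fund-return simulation and estimation steps of the simulation study; it only illustrates the DGP and the percentile calculation.

```python
import random
from statistics import quantiles

# Percentile levels (in percent) reported in Tables C.1 to C.3.
LEVELS = [0.5, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 99.5]


def simulate_discrete_mixture(n, pi0, pi_neg, pi_pos, x_neg, x_pos, seed=0):
    """Draw n alphas from pi0*delta_0 + pi_neg*delta_{x_neg} + pi_pos*delta_{x_pos}."""
    rng = random.Random(seed)
    return rng.choices([0.0, x_neg, x_pos], weights=[pi0, pi_neg, pi_pos], k=n)


def percentiles(sample, levels=LEVELS):
    """Empirical percentiles of `sample` at `levels` (given in percent)."""
    # 999 cut points at 0.1%, 0.2%, ..., 99.9%; every level above is a multiple of 0.1%.
    cuts = quantiles(sample, n=1000)
    return [cuts[round(level * 10) - 1] for level in levels]


if __name__ == "__main__":
    # Hypothetical DGP: equal proportions, nonzero alphas at -1 and +1 (% per year).
    alphas = simulate_discrete_mixture(100_000, 0.34, 0.33, 0.33, -1.0, 1.0, seed=1)
    print(dict(zip(LEVELS, percentiles(alphas))))
```

With roughly a third of the mass at each atom, the percentiles up to the 30th equal the negative atom, the 40th to 60th equal zero, and the 70th and above equal the positive atom, which mirrors the step pattern of the "True" rows for the discrete DGPs.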

Table C.2: Simulations with Continuous Distributions: True and Estimated Percentiles of Skill Distribution from Our Model

Results from simulations in which alphas (expressed as annualized percentages) are generated from continuous-distribution data generating processes (DGPs) and are estimated using our methodology. In Panel A we present results for alphas simulated from a normal distribution, and in Panel B we present results for alphas simulated from a negatively skewed and fat-tailed distribution. Specifically, in Panel A, we present results for DGP C-1, i.e., $\alpha \sim f_N(\mu, \sigma^2)$, with $\mu = -2.5$ and $\sigma^2 = 4$. In Panel B, we present results for DGP C-2, i.e., $\alpha \sim \pi^N f^N + \pi^- f^- + \pi^+ f^+$, with $\pi^N = .1$, $\pi^- = .8$, $\pi^+ = .1$, $f^N(\alpha) = f_N(\alpha \mid ., .)$, $f^-(\alpha) = f_{\ln N}(-\alpha \mid ., .5)$ for $\alpha < 0$, and $f^+(\alpha) = f_{\ln N}(\alpha \mid ., .5)$ for $\alpha > 0$. For each DGP, we report the true percentiles of the alpha distribution and their posterior mean and 90% HPDI estimated using our methodology. The 90% HPDI is the smallest interval such that the posterior probability that a parameter lies in it is .90.

Percentiles: 0.5th, 1st, 5th, 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th, 95th, 99th, 99.5th

Panel A: Normal alphas
DGP C-1
True: -7.56-6.95-5.7-5. -4.5-3.5-2.99-2.47 -.96 -.46 -.8..76.96 2.57
Posterior Mean: -8.34-7.55-5.74-4.92-4.2-3.4-2.9-2.44 -.94 -.4 -.94..4.94 2.3
5%: -9.2-8.7-5.93-5.6-4.4-3.52-3.2-2.56-2.7 -.59 -...86.73 2.
95%: -7.72-7.3-5.56-4.78-3.88-3.28-2.79-2.32 -.78 -.26 -.78.56.2 2.7 2.65

Panel B: Negatively skewed and fat-tailed alphas
DGP C-2
True: -6.53-5.43-3.2-2.47 -.74 -.35 -. -.88 -.67 -.5 -.27.42. 2.75 3.42
Posterior Mean: -6.3-5.2-3. -2.36 -.69 -.32 -.6 -.85 -.67 -.5 -.27.29.3 2.78 3.59
5%: -6.97-5.67-3.29-2.48 -.76 -.39 -.2 -.92 -.74 -.56 -.36..9 2.45 3.
95%: -5.69-4.78-2.92-2.24 -.6 -.25 -. -.79 -.62 -.43..57.3 3.7 4.23

Table C.3: Simulations with Continuous Distributions: True and Estimated Percentiles of Skill Distribution from Hierarchical Normal

Results from simulations in which alphas (expressed as annualized percentages) are generated from continuous-distribution data generating processes (DGPs) and are estimated using the hierarchical normal model. In Panel A we present results for alphas simulated from a normal distribution, and in Panel B we present results for alphas simulated from a negatively skewed and fat-tailed distribution. Specifically, in Panel A, we present results for DGP C-1, i.e., $\alpha \sim f_N(\mu, \sigma^2)$, with $\mu = -2.5$ and $\sigma^2 = 4$. In Panel B, we present results for DGP C-2, i.e., $\alpha \sim \pi^N f^N + \pi^- f^- + \pi^+ f^+$, with $\pi^N = .1$, $\pi^- = .8$, $\pi^+ = .1$, $f^N(\alpha) = f_N(\alpha \mid ., .)$, $f^-(\alpha) = f_{\ln N}(-\alpha \mid ., .5)$ for $\alpha < 0$, and $f^+(\alpha) = f_{\ln N}(\alpha \mid ., .5)$ for $\alpha > 0$. For each DGP, we report the true percentiles of the alpha distribution and their posterior mean and 90% HPDI estimated using the hierarchical normal model. The 90% HPDI is the smallest interval such that the posterior probability that a parameter lies in it is .90.

Percentiles: 0.5th, 1st, 5th, 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th, 95th, 99th, 99.5th

Panel A: Normal alphas
DGP C-1
True: -7.56-6.95-5.7-5. -4.5-3.5-2.99-2.47 -.96 -.46 -.8..76.96 2.57
Posterior Mean: -7.6-7. -5.73-5. -4.2-3.48-2.94-2.43 -.92 -.38 -.75.4.87 2.23 2.73
5%: -7.75-7.24-5.84-5. -4.2-3.56-3.2-2.5-2. -.46 -.83.4.74 2.7 2.56
95%: -7.43-6.95-5.62-4.9-4.4-3.4-2.87-2.37 -.86 -.3 -.67.23.97 2.38 2.89

Panel B: Negatively skewed and fat-tailed alphas
DGP C-2
True: -6.53-5.43-3.2-2.47 -.74 -.35 -. -.88 -.67 -.5 -.27.42. 2.75 3.42
Posterior Mean: -4.48-4.4-3.2-2.7-2. -.67 -.3 -.95 -.6 -.23.2.8.3 2.24 2.58
5%: -4.63-4.27-3.3-2.79-2.7 -.73 -.35 -. -.66 -.29.4.73.22 2.3 2.46
95%: -4.35-4.2-3. -2.62-2.4 -.6 -.24 -.89 -.55 -.7.27.89.4 2.36 2.7

C.3 Summary statistics

Here, we present summary information for the funds in the two samples of actively managed open-end US equity funds that we use in our analyses in Sections 5 through 8 of the paper: the baseline sample of 3,497 funds and the restricted sample of 1,865 funds with reliable investment objective data.

Table C.4: Summary Statistics of Fund Characteristics

Summary statistics of fund characteristics for the two samples of actively managed open-end US equity funds used in the empirical analyses in Sections 5 through 8 of the paper. In Panel A, we present summary statistics for the baseline sample of 3,497 funds, and in Panel B for the restricted sample of 1,865 funds with reliable investment objective information; both samples span the period January 1975 through December 2. Fund age is the number of years since the fund's establishment. Total net asset value (TNAV) is measured in millions of dollars. Expense ratio is defined as total annual management, administrative, and 12b-1 fees and expenses divided by year-end TNAV, and is expressed as a percentage. Turnover ratio is defined as the minimum of aggregate purchases and sales of securities divided by the average TNAV over the calendar year, and is expressed as a percentage. Fund inflows are defined as the net fund flows into the mutual fund over the calendar year, divided by the TNAV at the end of the previous calendar year, and are expressed as a percentage; negative values indicate net outflows. The summary statistics reported are calculated across all fund-months in each sample.

Columns: Mean, Std. Dev., and the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles

Panel A: Baseline sample
Fund age: 2.56 3.2 2 4 8 6 3 42
Total net asset value: 942 3,992 4 37 43 55,747 3,634
Expense ratio: .3% .% .27% .66% .95% .24% .58% .97% 2.24%
Turnover ratio: 96% 6% % 7% 34% 66% 6% 83% 249%
Fund inflows: 46% 59% 37% 27% 4% .4% 33% 24% 273%

Panel B: Restricted sample
Fund age: 5.25 4.26 2 5 2 35 47
Total net asset value: ,245 4,84 7 4 53 22 765 2,43 4,928
Expense ratio: .29% .96% .2% .65% .94% .22% .54% .95% 2.2%
Turnover ratio: 88% 9% % 7% 34% 66% 3% 77% 237%
Fund inflows: 37% 42% 36% 27% 4% .5% 27% % 27%

Table C.5: Assignment of Funds to Investment Strategies

The number and fraction of funds allocated to each investment objective (Growth & Income, Growth, Aggressive Growth) in the restricted sample of 1,865 funds with reliable investment objective information from the Thomson database.

Investment Objective | # of Funds | % of Funds
Growth & Income | 405 | 21.7%
Growth | 1,230 | 66.0%
Aggressive Growth | 230 | 12.3%
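The ratio definitions in the caption of Table C.4 translate directly into arithmetic. The sketch below assumes the annual inputs (fees and expenses, purchases, sales, net flows, and the relevant TNAV figures) are already available for one fund-year; the function names and the example numbers are hypothetical, and the step that maps annual values to fund-months for the summary statistics is not shown.

```python
def expense_ratio_pct(annual_fees_and_expenses, year_end_tnav):
    """Total annual management, administrative, and 12b-1 fees and expenses / year-end TNAV, in %."""
    return 100.0 * annual_fees_and_expenses / year_end_tnav


def turnover_ratio_pct(purchases, sales, average_tnav):
    """min(aggregate purchases, aggregate sales) / average TNAV over the calendar year, in %."""
    return 100.0 * min(purchases, sales) / average_tnav


def fund_inflows_pct(net_flows, previous_year_end_tnav):
    """Net flows over the calendar year / TNAV at the end of the previous year, in %.

    Negative values indicate net outflows.
    """
    return 100.0 * net_flows / previous_year_end_tnav


if __name__ == "__main__":
    # Hypothetical fund-year, all amounts in $ millions.
    print(turnover_ratio_pct(purchases=80.0, sales=60.0, average_tnav=75.0))  # 80.0
    print(fund_inflows_pct(net_flows=-5.0, previous_year_end_tnav=50.0))  # -10.0
```

Taking the minimum of purchases and sales in the turnover ratio means that trading driven purely by flows (for example, buying securities with new inflows) does not by itself raise measured turnover.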

C.4 Fund fees

Here, we present a plot of the empirical density of annual fund fees and expenses, expressed as a percent of total net asset value. This empirical density is constructed from the average (across the lifetime of each fund) annual fees and expenses for the 3,497 funds in our sample. Fees and expenses are reported annually in the CRSP Survivorship-Bias-Free US Mutual Fund Database, and they include annual management, administrative, and 12b-1 fees and expenses. The empirical density of fees and expenses shown in Figure C.2 has a mode at .95%. The mean, median, and standard deviation of fees and expenses are .6%, .9%, and .68%, respectively.

Figure C.2: Plot of the empirical density of annual fund fees and expenses (expressed as a percent of total net asset value) across 3,497 funds. Horizontal axis: Fund Fees, from 0% to 5%.
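The appendix does not say how the empirical density in Figure C.2 is smoothed. As one possible approach, here is a minimal sketch of a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth, evaluated on a grid to locate the mode. The kernel, bandwidth rule, grid size, and function names are assumptions; the input would be the per-fund lifetime-average fee ratios in percent.

```python
from statistics import NormalDist, quantiles, stdev


def silverman_bandwidth(xs):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    q1, _, q3 = quantiles(xs, n=4)
    spread = min(stdev(xs), (q3 - q1) / 1.34)
    return 0.9 * spread * len(xs) ** (-0.2)


def kde(xs, grid, bandwidth=None):
    """Gaussian kernel density estimate of the sample `xs`, evaluated at each grid point."""
    h = silverman_bandwidth(xs) if bandwidth is None else bandwidth
    kernel = NormalDist(0.0, 1.0)
    n = len(xs)
    return [sum(kernel.pdf((g - x) / h) for x in xs) / (n * h) for g in grid]


def density_mode(xs, points=501):
    """Grid point (between the min and max of xs) at which the estimated density peaks."""
    lo, hi = min(xs), max(xs)
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    dens = kde(xs, grid)
    return grid[max(range(points), key=dens.__getitem__)]


if __name__ == "__main__":
    fees = [0.5, 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.5]  # hypothetical fee ratios, in %
    print(round(density_mode(fees), 2))  # 1.0
```

The mode is found by a simple grid search, so its precision is limited by the grid spacing; a finer grid or a local optimizer would sharpen it.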

C.5 The distribution of skill

In this section, we present some additional figures and tables regarding the estimation of the baseline model presented in Section 2 of the paper using returns net of expenses for 3,497 funds. These results supplement those presented in Section 5 of the paper. First, we present results on the posterior distributions of the population mean and standard deviation of alpha and the factor loadings (in Table C.6), and of the population correlations between factor loadings (in Table C.7). We present these results conditional on $K^- = 2$, $K^+ = 1$, rather than presenting tables for each of the 6 possible models.

Table C.6: Population Mean and Standard Deviation of Alpha and Factor Loadings

Results on the posterior distributions of the population mean and standard deviation of annualized alpha (expressed as a percent) and the factor loadings, estimated using our baseline model presented in Section 2 with returns net of expenses, conditional on the model with the highest posterior probability, i.e., $K^- = 2$ and $K^+ = 1$. The 95% HPDI is the smallest interval such that the posterior probability that a parameter lies in it is .95. NSE stands for autocorrelation-adjusted numerical standard errors for the posterior mean estimate of each parameter. The population mean and variance of alpha for the zero-alpha funds are constrained to equal zero.

Columns, first for the population Means and then for the population Standard Deviations: Mean, Median, Std. Dev., 95% HPDI, NSE
$\alpha^0$: [0, 0] [0, 0] (constrained to zero)
$\alpha^-_1$: ...6 [.39,.75]..66.64.9 [.36,.6].
$\alpha^-_2$: 2..5.68 [ 7.39,.93].7 3.4 2.6 3.5 [.8, 9.9].8
$\alpha^+$: .4..3 [.54,.77].2.29.22.37 [.77, 2.8].2
$\beta_M$: .95.95. [.94,.96]..2.2. [.2,.22].
$\beta_{SMB}$: .9.9. [.8,.2]..3.3. [.3,.3].
$\beta_{HML}$: .2.2. [.,.3]..34.34. [.33,.35].
$\beta_{UMD}$: ... [.,.].... [.,.].

Table C.7: Population Correlation Matrix

Means and standard deviations (in parentheses) of the posteriors of population correlations between the factor loadings, for our baseline model presented in Section 2 with returns net of expenses.

Columns: $\beta_{SMB}$, $\beta_{HML}$, $\beta_{UMD}$
$\beta_M$: .3 (.), .47 (.), . (.)
$\beta_{SMB}$: .2 (.), .8 (.)
$\beta_{HML}$: .44 (.)
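Table C.6 reports autocorrelation-adjusted numerical standard errors (NSEs) but does not specify the estimator. One common choice is the method of non-overlapping batch means, sketched below under that assumption: split the retained draws into equal-length batches, and use the variance of the batch means to estimate the Monte Carlo variance of the overall posterior mean. The default number of batches and the function names are illustrative.

```python
import math
from statistics import fmean, variance


def batch_means_nse(draws, n_batches=20):
    """Numerical standard error of the posterior mean via non-overlapping batch means.

    With serially correlated draws, the naive SE (sd / sqrt(n)) is too small;
    batch means that are long relative to the autocorrelation are roughly
    independent, so their spread captures the extra Monte Carlo variance.
    Draws left over after the last full batch are ignored.
    """
    size = len(draws) // n_batches
    if n_batches < 2 or size < 1:
        raise ValueError("need at least two non-empty batches")
    means = [fmean(draws[i * size:(i + 1) * size]) for i in range(n_batches)]
    # Variance of the grand mean is approximately Var(batch mean) / n_batches.
    return math.sqrt(variance(means) / n_batches)


def naive_se(draws):
    """Standard error that ignores autocorrelation, for comparison."""
    return math.sqrt(variance(draws) / len(draws))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    x, chain = 0.0, []
    for _ in range(50_000):  # AR(1) chain with strong serial correlation
        x = 0.95 * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    # Expect the batch-means NSE to be noticeably larger than the naive SE.
    print(batch_means_nse(chain), naive_se(chain))
```

The batches need to be long compared with the chain's autocorrelation time; otherwise the batch means remain correlated and the NSE is understated.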

As explained in Section 2.4 of the paper, to estimate our model we need to derive the joint posterior distribution of the model parameters conditional on the data. Since this joint posterior cannot be calculated analytically, we obtain information about it by drawing from it using a Markov chain Monte Carlo (MCMC) algorithm. Section 2.4 of the paper and Section B of the paper's appendix provide details about the MCMC algorithm we employ. Using this algorithm, we make 5 million draws, from which we discard the first % as burn-in and retain every 5th draw after that to mitigate serial correlation. These draws form a Markov chain whose stationary distribution is the joint posterior. In Figure C.3, we present trace plots (plots of the retained draws against the iteration number) of the proportions of funds with zero, negative, and positive alpha. In Figure C.4, we present trace plots of the population means and variances of the distributions of alpha for negative- and positive-alpha funds, and of the factor loadings, conditional on the highest-posterior-probability model, i.e., $K^- = 2$, $K^+ = 1$. These plots indicate no convergence problems.

We note that, in mixture models, the posterior distribution of parameters is invariant to permutations of the components' labels. As a result, inference on parameters that are not invariant to component relabeling in the MCMC draws is problematic. We circumvent this issue in two ways. First, we focus on inferences that are invariant to label switching, i.e., inference on: the numbers of components $K^-$, $K^+$; the population proportions $\pi^0$, $\pi^-$, $\pi^+$; the population mean $\mu_\beta$ and variance $V_\beta$ of the loadings; the population shape $\kappa_h$ and scale $\lambda_h$ of the error distribution; the individual-level alpha $\alpha_i$, loadings $\beta_i$, and error precisions $h_i$; the individual-level latent allocations to groups $e^0_i$, $\sum_{k=1}^{K^-} e^-_{i,k}$, $\sum_{k=1}^{K^+} e^+_{i,k}$; and the density of alpha and the loadings. Second, to conduct inferences that are not invariant to label switching, i.e., on the component-specific probabilities $\{\pi^q_k\}_q$ and distribution parameters $\{(\mu^q_{\alpha,k}, V^q_{\alpha,k})\}_q$, we retrospectively relabel components in the MCMC draws so that the estimated marginal posteriors of the parameters of interest are close to unimodal (see Stephens, 1997). This achieves a unique labeling throughout the draws, so we obtain point estimates by averaging over the draws. We see in Figure C.4 that we have successfully removed the label-switching behavior from the means and variances. We do not impose artificial identifiability restrictions through the priors, because they do not guarantee a unique labeling and can produce biased estimates (see Celeux, 1998). See also Jasra, Holmes and Stephens (2005) for a review of the various methods that have been proposed to solve the label switching problem.
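The relabeling step can be illustrated with a simplified, k-means-style scheme in the spirit of Stephens (1997): starting from the labels as drawn, repeatedly compute the average component parameters across draws and permute each draw's components so that they lie as close as possible (in squared distance) to that average, until the labels stop changing. This is a minimal sketch, assuming each draw is stored as a list of K parameter tuples such as $(\mu^q_{\alpha,k}, V^q_{\alpha,k})$; it is not necessarily the exact criterion used in the paper, and because it enumerates all K! permutations it is only practical for small K.

```python
from itertools import permutations


def relabel(draws, max_iter=100):
    """Relabel mixture components across MCMC draws to undo label switching.

    draws: list of draws; each draw is a list of K tuples of component
    parameters, e.g. (mean, variance). Returns (relabeled_draws, perms), where
    perms[t][k] is the original index of the component given label k in draw t.
    """
    K, dim, T = len(draws[0]), len(draws[0][0]), len(draws)
    all_perms = list(permutations(range(K)))
    perms = [tuple(range(K))] * T  # start from the labels as drawn

    def cost(draw, perm, ref):
        # Squared distance between the permuted draw and the reference parameters.
        return sum((draw[perm[k]][j] - ref[k][j]) ** 2 for k in range(K) for j in range(dim))

    for _ in range(max_iter):
        # Reference: component-wise average of the currently relabeled draws.
        ref = [[sum(d[p[k]][j] for d, p in zip(draws, perms)) / T for j in range(dim)]
               for k in range(K)]
        new_perms = [min(all_perms, key=lambda p, d=d: cost(d, p, ref)) for d in draws]
        if new_perms == perms:
            break  # labels are stable
        perms = new_perms

    relabeled = [[d[p[k]] for k in range(K)] for d, p in zip(draws, perms)]
    return relabeled, perms
```

For example, with two hypothetical components A = (-1.0, 0.5) and B = (-3.0, 2.0) whose labels swap in some draws, `relabel([[A, B], [B, A], [A, B]])` returns every draw in the order [A, B], after which averaging over draws gives meaningful component-specific estimates.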

Figure C.3: Trace plots of the MCMC draws for the population proportions of zero-alpha funds (in the top panel, using black dots), negative-alpha funds (in the middle panel, using red dots), and positive-alpha funds (in the bottom panel, using blue dots). Panels: proportion $\pi^0$ of zero-alpha funds; proportion $\pi^-$ of negative-alpha funds; proportion $\pi^+$ of positive-alpha funds.

Figure C.4: Trace plots of the MCMC draws for the population means (purple dots toward the top of each panel, with values associated with the left vertical axes) and the population variances (green dots toward the bottom of each panel, with values associated with the right vertical axes) of alpha and the factor loadings, conditional on the model with the highest posterior probability, i.e., $K^- = 2$, $K^+ = 1$. Panels: $(\mu^-_{\alpha,1}, V^-_{\alpha,1})$; $(\mu^-_{\alpha,2}, V^-_{\alpha,2})$; $(\mu^+_{\alpha,1}, V^+_{\alpha,1})$; $(\mu_{\beta_M}, V_{\beta_M})$; $(\mu_{\beta_{SMB}}, V_{\beta_{SMB}})$; $(\mu_{\beta_{HML}}, V_{\beta_{HML}})$; $(\mu_{\beta_{UMD}}, V_{\beta_{UMD}})$. The mean and variance of alpha are those of the underlying normal distribution.

C.6 Robust skewness, tail weight, and distance for standard distributions

In this section, we present robust quantile-based measures of skewness and tail weight for various well-known distributions, to provide context for the measures we calculate for the alpha distribution we estimate in Section 5 of the paper. We also present distance measures between the standard normal and various well-known distributions, again to provide context for the distance we calculate between the alpha distributions estimated from our model and from the hierarchical normal model in Section 5 of the paper.

The robust measure of skewness for our estimated alpha distribution is .2, its left tail weight is .34, and its right tail weight is .27 (all quoted in excess of the values corresponding to the normal; see Table 5 in the paper). In Table C.8, we see that the robust skewness is similar (in absolute value) to that of a $\chi^2$ distribution with between 30 and 50 degrees of freedom (.22 and .17, respectively), and that the left and right tail weight measures are similar to those of a $t(2)$ and a $t(3)$ distribution (.36 and .28, respectively).

The Hellinger distance between our estimated alpha distribution and the one estimated from the normal model is reported in Table 6 in the paper. As we can see in Table C.9, this distance is close to (i) the Hellinger distance ($H^2 = .11$) between two normals that have the same mean but one of which has twice the standard deviation of the other, (ii) the Hellinger distance ($H^2 = .08$) between the standard normal $N(0, 1)$ and a $\chi^2(3)$ distribution that is scaled to have the same mean and variance as the standard normal, and (iii) the Hellinger distance between the standard normal $N(0, 1)$ and the $t(1)$ distribution. The Wasserstein distance between our estimated distribution and the one estimated from the normal model is $W = .22$ (see Table 6 in the paper). From Table C.9, we can also see that this distance is close to (i) the Wasserstein distance ($W = .19$) between a standard normal and a normal that has the same mean but a 25% smaller or greater standard deviation, (ii) the Wasserstein distance ($W = .2$) between the standard normal $N(0, 1)$ and a $\chi^2(4)$ distribution that is scaled to have the same mean and variance as the standard normal, and (iii) the Wasserstein distance ($W = .25$) between the standard normal $N(0, 1)$ and the $t(3)$ distribution.
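As a check on the reference values in Table C.8, the Groeneveld and Meeden skewness and the Brys, Hubert, and Struyf tail weights can be computed from any quantile function. The sketch below uses only Python's standard library, so it covers the normal (via `statistics.NormalDist.inv_cdf`) and the $t(1)$, i.e., standard Cauchy, distribution, whose quantile function is $\tan(\pi(u - 1/2))$. For $\chi^2$ and general $t$ quantiles one would substitute a library quantile function (not shown). The sign convention for LTW, which makes it positive (about .52) for the normal, is our reading of the garbled definition and is an assumption.

```python
import math
from statistics import NormalDist


def robust_skewness(Q, p=0.005):
    """Groeneveld and Meeden quantile skewness S from a quantile function Q."""
    return (Q(1 - p) + Q(p) - 2 * Q(0.5)) / (Q(1 - p) - Q(p))


def left_tail_weight(Q, p=0.005):
    """Brys, Hubert, and Struyf left tail weight LTW (positive for the normal)."""
    a, b = Q((1 - p) / 2), Q(p / 2)
    return -(a + b - 2 * Q(0.25)) / (a - b)


def right_tail_weight(Q, q=0.995):
    """Brys, Hubert, and Struyf right tail weight RTW."""
    a, b = Q((1 + q) / 2), Q(1 - q / 2)
    return (a + b - 2 * Q(0.75)) / (a - b)


normal_q = NormalDist().inv_cdf


def cauchy_q(u):
    """Quantile function of the t(1) (standard Cauchy) distribution."""
    return math.tan(math.pi * (u - 0.5))


if __name__ == "__main__":
    base = left_tail_weight(normal_q)
    print(round(base, 2))  # 0.52, the normal reference value
    # Excess left tail weight of t(1) over the normal.
    print(round(left_tail_weight(cauchy_q) - base, 2))  # 0.46
```

Because all three measures are ratios of quantile differences, they are invariant to location and scale, which is why a single row suffices for every $N(\mu, \sigma^2)$ in Panel A.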

Table C.8: Robust Measures of Skewness and Tail Weight for the Normal, $\chi^2$, and t Distributions

Robust quantile-based measures of skewness and tail weight that rely on 99% of the range of each distribution, for the normal distribution (in Panel A), and for the $\chi^2$ (in Panel B) and t distributions (in Panel C) with various degrees of freedom. The measure of skewness is as in Groeneveld and Meeden (1984),

$$S := \frac{Q(1-p) + Q(p) - 2Q(.5)}{Q(1-p) - Q(p)},$$

and the measures of left and right tail weight are as in Brys, Hubert, and Struyf (2006),

$$LTW := -\frac{Q\left(\frac{1-p}{2}\right) + Q\left(\frac{p}{2}\right) - 2Q(.25)}{Q\left(\frac{1-p}{2}\right) - Q\left(\frac{p}{2}\right)}, \qquad RTW := \frac{Q\left(\frac{1+q}{2}\right) + Q\left(1-\frac{q}{2}\right) - 2Q(.75)}{Q\left(\frac{1+q}{2}\right) - Q\left(1-\frac{q}{2}\right)},$$

where $Q(x)$ is the $x$th quantile of the distribution, and we use $p = .005$ and $q = .995$. The measures are reported as deviations from the corresponding values for the normal distribution (0 for the skewness and .52 for the left and right tail weight measures).

Columns: Quantile Skewness (S), Left Tail Weight (LTW), Right Tail Weight (RTW)

Panel A: $N(\mu, \sigma^2)$ distribution
any $\mu$, $\sigma^2$: 0, 0, 0

Panel B: $\chi^2(k)$ distributions
k = 3: .64 .5 .9
k = 4: .57 .4 .7
k = 5: .52 .34 .6
k = 10: .37 .2 .2
k = 20: .27 .3 .9
k = 30: .22 . .8
k = 50: .7 .8 .6

Panel C: $t(k)$ distributions
k = 1: 0 .46 .46
k = 2: 0 .36 .36
k = 3: 0 .28 .28
k = 4: 0 .22 .22
k = 5: 0 .8 .8
k = 10: 0 .9 .9
k = 50: 0 .2 .2

Table C.9: Measures of Distance Between the Standard Normal and Other Distributions

Distance measures between the standard normal distribution and various normal, $\chi^2$, and t distributions. The Hellinger distance between densities $f_X$, $f_Y$ is $H^2 := 1 - \int \sqrt{f_X(s) f_Y(s)}\, ds$, and takes values in $[0, 1]$. The Wasserstein distance between densities $f_X$, $f_Y$ is $W := \inf_{f_{XY}} E[|X - Y|]$, where $f_{XY}$ is any joint density with marginals $f_X$, $f_Y$, and takes values in $[0, +\infty)$. For the Wasserstein distance, we present values that rely on 99% of the range of the distribution, i.e., we exclude the extreme tails to make the distance measure robust. In Panel A, we present the distances between $N(0, 1)$ and normal distributions with the same mean but different standard deviation, as indicated in each row of the panel. In Panel B, we present the distances between $N(0, 1)$ and $\chi^2$ distributions with various degrees of freedom as indicated in each row of the panel; these distributions are scaled to have the same mean (0) and variance (1) as the standard normal. In Panel C, we present the distances between $N(0, 1)$ and t distributions with various degrees of freedom as indicated in each row of the panel.

Columns: Hellinger Distance ($H^2$), Wasserstein Distance ($W$)

Panel A: Distance between $N(0, 1)$ and $\sigma N(0, 1)$ distributions
σ = .25: .3 .58
σ = .5: . .39
σ = .75: .2 .9
σ = 1.25: .2 .9
σ = 1.5: .4 .39
σ = 1.75: .7 .58
σ = 2: . .78

Panel B: Distance between $N(0, 1)$ and $[\chi^2(k) - k]/\sqrt{2k}$ distributions
k = 3: .8 .24
k = 4: .6 .2
k = 5: .5 .9
k = 10: .2 .3
k = 20: . .9
k = 30: . .8
k = 50: . .6

Panel C: Distance between $N(0, 1)$ and $t(k)$ distributions
k = 1: . .89
k = 2: .5 .45
k = 3: .3 .25
k = 4: .2 .7
k = 5: . .3
k = 10: . .6
k = 50: . .
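The two distance measures in Table C.9 can be approximated numerically. The sketch below is a minimal, standard-library illustration: it integrates $\sqrt{f_X f_Y}$ with the midpoint rule to obtain $H^2$, and it uses the fact that in one dimension the Wasserstein distance equals $\int_0^1 |Q_X(u) - Q_Y(u)|\, du$, where $Q_X$, $Q_Y$ are the quantile functions. For the robust version we assume, as one possible reading of "rely on 99% of the range", that $|Q_X(u) - Q_Y(u)|$ is averaged over $u \in [.005, .995]$; the integration limits, grid sizes, and function names are also assumptions.

```python
import math
from statistics import NormalDist


def hellinger_sq(f, g, lo=-30.0, hi=30.0, n=60_000):
    """H^2 = 1 - integral of sqrt(f(s) g(s)) ds, by the midpoint rule on [lo, hi]."""
    step = (hi - lo) / n
    bc = 0.0  # Bhattacharyya coefficient: the integral of sqrt(f * g)
    for i in range(n):
        s = lo + (i + 0.5) * step
        bc += math.sqrt(f(s) * g(s)) * step
    return 1.0 - bc


def robust_wasserstein(qx, qy, p=0.005, n=20_000):
    """Average of |qx(u) - qy(u)| over u in [p, 1 - p], by the midpoint rule."""
    step = (1.0 - 2.0 * p) / n
    total = 0.0
    for i in range(n):
        u = p + (i + 0.5) * step
        total += abs(qx(u) - qy(u))
    return total / n


if __name__ == "__main__":
    z, wide = NormalDist(0.0, 1.0), NormalDist(0.0, 2.0)  # sigma = 2, as in Panel A
    print(round(hellinger_sq(z.pdf, wide.pdf), 2))  # 0.11
    print(round(robust_wasserstein(z.inv_cdf, wide.inv_cdf), 2))  # 0.78
```

For two normals with the same mean, $H^2$ has the closed form $1 - \sqrt{2\sigma_1\sigma_2/(\sigma_1^2 + \sigma_2^2)}$, which gives about .11 for $\sigma = 2$; under the averaging convention assumed here, the robust Wasserstein value of about .78 agrees with the .78 shown in the $\sigma = 2$ row of Panel A.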

C.7 Portfolio performance

In this section, we present additional results regarding the out-of-sample performance of portfolios that select top-performing funds using (i) the FDR methodology, (ii) a hierarchical model in which fund alphas are drawn from one normal, (iii) a hierarchical model in which fund alphas are drawn from two normals, and (iv) our estimation methodology. Our baseline portfolio formation rule, described in detail in Section 6. of the paper, is the following: At the beginning of each month in the period 98 2, we use the preceding 6 months of fund returns to estimate the 4-factor model using each methodology, and we form, and hold until the end of the month, a portfolio of funds with a high estimated probability of having a positive alpha; if all funds have a low probability of having a positive alpha, we select funds whose posterior mean alpha (for the Bayesian methodologies) or OLS t-statistic (for the FDR methodology) is in the top % among all funds in the data set for the preceding 5 years.

In Table C., we present results on portfolio performance under alternative portfolio formation rules: portfolios formed using a 36-month (instead of a 6-month) rolling estimation window; portfolios that, in months in which all funds have a low probability of having a positive alpha, are either left empty or keep the top 2% (instead of the top %) of funds; and portfolios that always keep the top % of funds sorted by their posterior mean alpha. For each portfolio we construct, we use its monthly returns for the period 98 2 to estimate its annualized OLS 4-factor alpha, α̂, and the associated t-statistic and residual standard deviation, its information ratio, the mean and standard deviation of its return in excess of the risk-free rate, and its Sharpe ratio. As with our baseline portfolio formation rule used in Section 6. in the paper, the portfolios constructed using these alternative formation rules perform better when they are based on our methodology than when they are based on the other methodologies. This is true not only for the estimated alpha, but also for the information ratio and even for the Sharpe ratio. The one exception is the conservative portfolio, for which our methodology yields a lower Sharpe ratio than the other methodologies (see Panel B of the table). However, by construction, the conservative portfolios are not active in all months, and the conservative portfolios based on different methodologies are active in different months, so their performance is not directly comparable.
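The baseline and alternative formation rules differ only in what happens when no fund clears the probability cut. The following is a minimal sketch of the monthly selection step, assuming each fund's estimated probability of a positive alpha and a ranking score (posterior mean alpha, or the OLS t-statistic for FDR) are already available; `prob_cut` and `fallback_frac` are hypothetical placeholders, not values taken from the paper.

```python
import numpy as np


def select_funds(prob_pos, score, prob_cut=0.9, fallback_frac=0.1, fallback=True):
    """Indices of the funds held for the coming month.

    prob_pos: estimated probability that each fund has a positive alpha.
    score: ranking variable used only in the fallback case.
    fallback=False gives the 'conservative' variant, which stays empty.
    """
    prob_pos = np.asarray(prob_pos, dtype=float)
    score = np.asarray(score, dtype=float)
    chosen = np.flatnonzero(prob_pos >= prob_cut)
    if chosen.size > 0 or not fallback:
        return chosen
    # Fallback ('aggressive' variant): keep the top fraction ranked by score
    n_keep = max(1, int(np.ceil(fallback_frac * score.size)))
    return np.argsort(score)[::-1][:n_keep]


def portfolio_return(fund_returns, chosen):
    """Equal-weighted return of the chosen funds; NaN when the portfolio is inactive."""
    if len(chosen) == 0:
        return float("nan")
    return float(np.mean(np.asarray(fund_returns, dtype=float)[chosen]))
```

Returning NaN for an empty portfolio keeps inactive months out of any later averages, which mirrors why the conservative portfolios are not directly comparable across methodologies.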

Furthermore, because our methodology is designed to estimate alpha, it is not surprising that its advantage is smaller for the Sharpe ratio than for the estimated alpha or the information ratio. In Table C., we present results on portfolio performance using the measures described above for each of the two halves of our sample period (98 995 and 995 2). We find that our portfolio exhibits superior performance in both subperiods. In Table C.2, we present results on the performance of quantile-based portfolios. That is, at the beginning of each month in the period 98 2, we use the preceding 6 months of fund returns to estimate the 4-factor model using each methodology (the hierarchical model in which fund alphas are drawn from one normal or from two normals, and our methodology); we then sort funds into ten quantiles based on their posterior mean alpha, and we hold these quantile-based portfolios until the end of the month. As before, for each portfolio we construct, we use its monthly returns for the period 98 2 to estimate its annualized OLS 4-factor alpha, α̂, and the associated t-statistic and residual standard deviation, its information ratio, the mean and standard deviation of its return in excess of the risk-free rate, and its Sharpe ratio. We see that the slope of returns from the bottom quantile to the top quantile is steeper for the portfolios constructed using our methodology than for those constructed using the alternatives. In particular, as the column labeled Q10 − Q1 shows, the portfolio that buys the funds in the top quantile and sells the funds in the bottom quantile has α̂ = 3.23% per year for the hierarchical model with one normal, α̂ = 3.6% for the hierarchical model with two normals, and α̂ = 4.29% for our methodology.
The difference in α̂ between the portfolio constructed using our methodology and the one constructed using the hierarchical model with one normal (two normals) is .6% (.68%) and is statistically significant at the % level, with a t-statistic of 4.54 (3.59). These results show that our methodology better identifies funds in both the right and the left tail of the skill distribution, which is consistent with our theoretical argument that our more flexible semi-parametric model better captures the tails of the distribution.
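The quantile-based portfolios and the long-short column can be sketched as follows, assuming one month's posterior mean alphas and the same month's fund returns are given as arrays; the function name is hypothetical, and ties in the posterior means are broken arbitrarily by the sort.

```python
import numpy as np


def quantile_portfolios(post_mean, fund_returns, n_q=10):
    """Equal-weighted returns of Q1..Q10 (sorted on posterior mean alpha) and of Q10 - Q1."""
    post_mean = np.asarray(post_mean, dtype=float)
    fund_returns = np.asarray(fund_returns, dtype=float)
    n = post_mean.size
    if n < n_q:
        raise ValueError("need at least one fund per quantile")
    ranks = np.argsort(np.argsort(post_mean))  # 0 = lowest posterior mean
    labels = ranks * n_q // n                  # 0 = Q1 (bottom), n_q - 1 = Q10 (top)
    q_returns = np.array([fund_returns[labels == k].mean() for k in range(n_q)])
    return q_returns, q_returns[-1] - q_returns[0]
```

Running this every month and stacking the Q10 − Q1 returns yields the monthly series whose 4-factor alpha appears in the long-short column.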

Table C.: Out-of-sample Portfolio Performance: Alternative Portfolio Construction Rules

Out-of-sample performance measures for portfolios that use alternative portfolio construction rules to select funds using the FDR methodology, hierarchical models in which fund αs are drawn from one normal or from two normals, and our estimation methodology. At the beginning of each month in the period 98 2, we use the preceding 36 or 6 months of fund returns to estimate the 4-factor model using each methodology, and we form, and hold until the end of the month, equal-weighted portfolios of funds that are estimated to have high performance. In Panels A and B, we construct portfolios using 6-month rolling estimation windows, and we select funds with a high estimated probability of having a positive α. During months in which all funds have a low probability of having a positive α, in Panel A we select funds whose posterior mean α or OLS α t-statistic is in the top 2% among all funds in the ranking period (the aggressive portfolio), while in Panel B we leave the portfolio empty (the conservative portfolio). In Panel C, we present results for the aggressive portfolio constructed using 36-month rolling estimation windows and keeping the top % instead of the top 2% during months in which all funds have a low probability of having a positive α. In Panel D, we use a 6-month rolling estimation window, and in all months we select funds whose posterior mean α is in the top % among all funds in the ranking period. For each portfolio we construct, we use its monthly returns from 98 through 2 to estimate its annualized OLS 4-factor alpha α̂ and residual standard deviation σ̂ε (both in percent), the α̂ t-statistic, the Information Ratio (α̂/σ̂ε), the mean and standard deviation (both in percent) of its return in excess of the risk-free rate, and its Sharpe Ratio (mean/std. dev. of excess return).
Panel A: Aggressive Portfolio with top 2% (columns: FDR, Normal, 2 Normals, Our Model)
α̂ .63.62.35 2.38
α̂ t-statistic 2.3.87.55 3.27
σ̂ε 4.4 4.42 4.6 3.93
Information Ratio .39.36.29.6
Mean Return 7.27 7.42 7.25 8.36
Std. dev. Return 6.69 3.64 3.8 5.
Sharpe Ratio .44.54.53.56

Panel B: Conservative Portfolio (columns: FDR, Normal, 2 Normals, Our Model)
α̂ .86 2..68 2.84
α̂ t-statistic 2.22 2..77 3.4
σ̂ε 3.27 4.58 4.75 3.8
Information Ratio .57.44.35.75
Mean Return 9.75 7.79 7.83 7.65
Std. dev. Return 5.8 3.95 4.9 6.3
Sharpe Ratio .62.56.55.47

Panel C: Aggressive Portfolio with 36-month window (columns: FDR, Normal, 2 Normals, Our Model)
α̂ .24.36.86 2.
α̂ t-statistic .58.68.89 2.2
σ̂ε 4.22 4.34 6.9 6.4
Information Ratio .29.3.3.33
Mean Return 7.49 7.6 8.7 9.7
Std. dev. Return 7.5 4.57 6.46 7.34
Sharpe Ratio .44.48.5.52

Panel D: Alpha-sorted Portfolio (columns: Normal, 2 Normals, Our Model)
α̂ .45.69 2.35
α̂ t-statistic 2. 2.9 2.87
σ̂ε 3.85 4.8 4.59
Information Ratio .38.4.5
Mean Return 7.9 7.62 8.55
Std. dev. Return 4.6 4.93 5.78
Sharpe Ratio .49.5.54
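Every performance measure in the tables of this section is a function of one portfolio's monthly excess returns and the four factor returns. The following is a minimal sketch using NumPy OLS; annualizing means by 12 and volatilities by √12, and expressing results in percent, are our assumptions, and the function name is hypothetical.

```python
import numpy as np


def performance_summary(excess_ret, factors):
    """Summary statistics for one portfolio.

    excess_ret: length-T array of monthly returns in excess of the risk-free rate (decimals).
    factors: T x 4 array of monthly factor returns for the 4-factor model.
    """
    y = np.asarray(excess_ret, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(factors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])          # monthly residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))  # OLS standard errors
    alpha = 12 * coef[0] * 100                           # % per year
    sigma_eps = np.sqrt(12 * s2) * 100                   # % per year
    mean = 12 * y.mean() * 100
    sd = np.sqrt(12) * y.std(ddof=1) * 100
    return {
        "alpha": alpha,
        "t_alpha": coef[0] / se[0],                      # unaffected by annualization
        "sigma_eps": sigma_eps,
        "information_ratio": alpha / sigma_eps,
        "mean": mean,
        "sd": sd,
        "sharpe": mean / sd,
    }
```

The information ratio and Sharpe ratio follow the table definitions (α̂/σ̂ε and mean/std. dev. of excess return), so any common annualization convention cancels only in the t-statistic, not in these ratios.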

Table C.: Out-of-sample Portfolio Performance: Sub-samples

Out-of-sample performance measures for two non-overlapping sub-samples, for portfolios that select funds using the FDR methodology, hierarchical models in which fund αs are drawn from one normal or from two normals, and our estimation methodology. At the beginning of each month in the period 98 2, we use the preceding 6 months of fund returns to estimate the 4-factor model using each methodology, and we form, and hold until the end of the month, equal-weighted portfolios of funds that are estimated to have a high probability of having a positive α (see Section 6. of the paper for more details). During months in which all funds have a low probability of having a positive α, we select funds whose posterior mean α (for the three hierarchical methodologies) or OLS t-statistic (for the FDR methodology) is in the top % among all funds in the data set for the preceding 6 months. For each portfolio we construct, we use its monthly returns from 98 to 995 (Panel A) and from 995 to 2 (Panel B) to estimate its annualized OLS 4-factor alpha α̂ and residual standard deviation σ̂ε (both in percent), the α̂ t-statistic, the Information Ratio (α̂/σ̂ε), the mean and standard deviation (both in percent) of its return in excess of the risk-free rate, and its Sharpe Ratio (mean/std. dev. of excess return).

Panel A: 1st half Sub-sample (columns: FDR, Normal, 2 Normals, Our Model)
α̂ 2.45 2.22.8 3.36
α̂ t-statistic 2.73 2.74 2.22 3.5
σ̂ε 3.5 3.3 3.7 3.73
Information Ratio .8.73.59.9
Mean Return 8.52 8.49 8.7.59
Std. dev. Return 5.9 4.83 4.9 5.47
Sharpe Ratio .54.57.54.68

Panel B: 2nd half Sub-sample (columns: FDR, Normal, 2 Normals, Our Model)
α̂ .54.92.62 2.24
α̂ t-statistic .27.7.33 2.28
σ̂ε 4.99 4.2 4.6 3.74
Information Ratio .3.46.35.6
Mean Return 7.4 6.5 6.47 7.2
Std. dev. Return 7.48 2.4 2.82 4.46
Sharpe Ratio .4.52.5.5

Table C.2: Out-of-sample Portfolio Performance: Quantile-based Portfolios

Out-of-sample performance measures for portfolios that select funds using hierarchical models in which fund αs are drawn from one normal (Panel A) or from two normals (Panel B), and our estimation methodology (Panel C). At the beginning of each month in the period 98 2, we use the preceding 6 months of fund returns to estimate the 4-factor model using each methodology, we sort funds into ten quantiles (Q1 through Q10) based on the posterior mean α, and we hold these quantile-based portfolios until the end of the month. We also form the portfolio labeled Q10 − Q1, which buys the funds belonging to the top quantile and sells the funds belonging to the bottom quantile. For each portfolio, we use its monthly returns from 98 through 2 to estimate its annualized OLS 4-factor alpha α̂ and residual standard deviation σ̂ε (both in percent), the α̂ t-statistic, the Information Ratio (α̂/σ̂ε), the mean and standard deviation (both in percent) of its return in excess of the risk-free rate, and its Sharpe Ratio (mean/std. dev. of excess return).

Panel A: Normal (columns: Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q10 − Q1)
α̂ 2.86 2.4.94.86.8.9.94.26.3.37 3.23
α̂ t-statistic 6.2 5.47 2.2.57.87.86.86.52.6.76 5.94
σ̂ε 2.48 2.2 2.45 2.94 3.3 3.7 2.6 2.5 2.62 2.69 2.84
Information Ratio .5.97.38.29.36.38.36...4.4
Mean Return 3.33 4.8 5.28 5.68 5.25 5.33 5.64 6.6 6.46 6.42 3.
Std. dev. Return 4.96 5.32 5.5 5.7 5.69 5.77 6.9 6.24 6.26 5.29 2.9
Sharpe Ratio .22.27.34.36.33.34.35.4.4.42.6

Panel B: 2 Normals (columns: Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q10 − Q1)
α̂ 3.9.84.2.9..23.49.9.24.52 3.6
α̂ t-statistic 6.35 4.63 2.24 2.29.64 2.9.96.7.49.2 6.42
σ̂ε 2.56 2.22 2.39 2.8 3.5 2.93 2.73 2.49 2.6 2.8 3.
Information Ratio .2.83.43.43.32.42.8.3.9.9.2
Mean Return 3.4 4.45 5.22 5.29 5.47 5.26 6.7 6.27 6.42 6.56 3.42
Std. dev. Return 5. 5.3 5.4 5.6 5.59 5.85 6.6 6.24 6.24 5.37 3.
Sharpe Ratio .2.29.34.34.35.33.38.39.4.43.
Panel C: Our Model (columns: Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q10 − Q1)
α̂ 3.48.62.65.5.85.6.78.6.4.8 4.29
α̂ t-statistic 7.7 3.9.22 2.85.63.79.66..7.48 6.89
σ̂ε 2.63 2.23 2.82 2.79 2.68 2.84 2.46 2.63 2.68 3.5 3.4
Information Ratio .32.72.23.54.32.37.32.2..26.26
Mean Return 2.83 4.68 5.69 5.4 5.48 5.37 5.83 6.25 6.9 6.77 3.96
Std. dev. Return 5.36 5.42 5.5 5.53 5.43 5.6 6. 6.5 6.2 5.73 3.53
Sharpe Ratio .8.3.37.32.36.34.36.39.38.43.2

In Tables C. and C. above, it is interesting that the portfolio constructed using the hierarchical model with two normals in some cases performs worse than the portfolio constructed using the hierarchical model with one normal. This may be due to noise, since funds are allocated to portfolios using only a few years of data, but it could also be explained as follows. The hierarchical model with two normals attempts to capture the fat tails of the alpha distribution and is therefore more aggressive about placing funds in the right tail, but, due to its limited flexibility, it may be unable to do so accurately. This intuition is consistent with the evidence in Figure C.5, which presents representative Quantile-Quantile plots of the posterior alphas from the hierarchical model with one normal (Panel a) and with two normals (Panel b) versus the posterior alphas from our model, for one of the estimation periods used in the construction of the portfolios. While the model with two normals does a better job than the model with one normal at estimating the largest alpha (more than 3% annualized), it also overestimates (relative to our model) a number of large alphas, and would therefore over-aggressively include them in the portfolio of the best-performing funds.

[Figure C.5: (a) 1-normal model vs. our model; (b) 2-normal model vs. our model. Each panel plots posterior α from the hierarchical model against posterior α from our model.]

Figure C.5: Representative Quantile-Quantile plots of posterior mean alphas estimated from the hierarchical model with one normal (Panel a) and with two normals (Panel b) versus posterior mean alphas estimated using our methodology, for one of the 6-month periods used in the rolling estimation employed to construct the portfolios in Section 6. of the paper. The blue cross marks plot the quantiles, and the solid red line plots the 45° line.
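A Q-Q plot such as Figure C.5 pairs the same quantiles of two samples. The following is a minimal sketch, assuming the posterior mean alphas from two models for the same estimation period are available as arrays; `share_above_45` is a hypothetical helper that measures how often, in the upper tail, the alternative model's quantile exceeds ours, which is the over-estimation pattern described for the two-normal model.

```python
import numpy as np


def qq_points(alpha_ours, alpha_other, probs=None):
    """Matching quantiles (x = our model, y = other model) for a Q-Q plot."""
    if probs is None:
        probs = np.linspace(0.01, 0.99, 99)
    x = np.quantile(np.asarray(alpha_ours, dtype=float), probs)
    y = np.quantile(np.asarray(alpha_other, dtype=float), probs)
    return x, y


def share_above_45(x, y, top=0.1):
    """Fraction of the top `top` share of plotted quantiles lying above the 45-degree line."""
    k = max(1, int(round(top * len(x))))
    return float(np.mean(y[-k:] > x[-k:]))
```

Points above the 45° line in the right tail correspond to funds that the alternative model would rank more aggressively than our model.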

C.8 Posterior predictive densities

In this section, we explain in detail how we make draws from the posterior predictive densities of benchmark portfolio returns and of fund returns. These draws are necessary for our calculation of optimal portfolios in Section 6.2 of the paper. The posterior predictive density $p(r_{i,T+1}, F_{T+1} \mid r, F)$ is

$$p(r_{i,T+1}, F_{T+1} \mid r, F) = \int\!\!\int p(r_{i,T+1}, F_{T+1}, \chi_i, \chi_F \mid r, F)\, d\chi_i\, d\chi_F,$$

where $\chi_i$ and $\chi_F$ denote the parameters of the distributions of $r_i$ and $F$, respectively, and where $r$ and $F$ collect all fund and benchmark portfolio returns, respectively. Using simple rules of probability, we can rewrite $p(r_{i,T+1}, F_{T+1}, \chi_i, \chi_F \mid r, F)$ as $p(r_{i,T+1}, F_{T+1} \mid \chi_i, \chi_F)$ times $p(\chi_i, \chi_F \mid r, F)$, and $p(r_{i,T+1}, F_{T+1} \mid \chi_i, \chi_F)$ as $p(r_{i,T+1} \mid F_{T+1}, \chi_i)\, p(F_{T+1} \mid \chi_F)$, while $p(\chi_i, \chi_F \mid r, F)$ is proportional to $p(\chi_i \mid r, F)\, p(\chi_F \mid F)$. That is,

$$p(r_{i,T+1}, F_{T+1} \mid r, F) \propto \int\!\!\int p(r_{i,T+1} \mid F_{T+1}, \chi_i)\, p(F_{T+1} \mid \chi_F)\, p(\chi_i \mid r, F)\, p(\chi_F \mid F)\, d\chi_i\, d\chi_F.$$

Thus, to make draws from the posterior predictive density $p(r_{i,T+1}, F_{T+1} \mid r, F)$, we make draws from $p(\chi_F \mid F)$ and $p(\chi_i \mid r, F)$, which we use to make draws from $p(F_{T+1} \mid \chi_F)$ and subsequently from $p(r_{i,T+1} \mid F_{T+1}, \chi_i)$. To make draws from $p(\chi_i \mid r, F)$, we work as in Section 2.4 of the paper, and to make draws from $p(r_{i,T+1} \mid F_{T+1}, \chi_i)$, we use the linear factor model in Equation of the paper. Below, we describe how we make draws from $p(\chi_F \mid F)$ and from $p(F_{T+1} \mid \chi_F)$.

For the factor returns, we assume that they are i.i.d. normal, that is, $F_t \mid \mu_F, \Sigma_F \sim N(\mu_F, \Sigma_F)$, and that the distribution parameters $(\mu_F, \Sigma_F)$ follow the conjugate Normal-inverse-Wishart prior

$$(\mu_F, \Sigma_F) \mid \mu_{F,0}, \kappa_{F,0}, \nu_{F,0}, \Sigma_{F,0} \sim NIW(\mu_{F,0}, \kappa_{F,0}, \nu_{F,0}, \Sigma_{F,0}),$$

i.e.,

$$\mu_F \mid \mu_{F,0}, \kappa_{F,0}, \Sigma_F \sim N\!\left(\mu_{F,0}, \frac{\Sigma_F}{\kappa_{F,0}}\right), \qquad \Sigma_F \mid \nu_{F,0}, \Sigma_{F,0} \sim W^{-1}\!\left(\nu_{F,0}, \Sigma_{F,0}\right),$$

where $\mu_{F,0}$, $\kappa_{F,0}$, $\nu_{F,0}$, and $\Sigma_{F,0}$ are prior parameters. In particular, using the Jeffreys prior $p(\mu_F, \Sigma_F) \propto |\Sigma_F|^{-(k_F+2)/2}$ (with $k_F$ the number of factors) and observing data $F := \{F_t\}_{t=1}^{T}$, the posterior of $(\mu_F, \Sigma_F)$ is Normal-inverse-Wishart, $(\mu_F, \Sigma_F) \mid F \sim NIW(\hat\mu_F, T, T, T\hat\Sigma_F)$, i.e.,

$$\mu_F \mid F, \Sigma_F \sim N\!\left(\hat\mu_F, \frac{\Sigma_F}{T}\right), \qquad \Sigma_F \mid F \sim W^{-1}\!\left(T, T\hat\Sigma_F\right),$$

where

$$\hat\mu_F := \frac{1}{T}\sum_{t=1}^{T} F_t, \qquad \hat\Sigma_F := \frac{1}{T}\sum_{t=1}^{T}\left(F_t - \hat\mu_F\right)\left(F_t - \hat\mu_F\right)'.$$

Thus, to generate $m = 1, \dots, M$ draws for the benchmark portfolio returns from the posterior predictive density, we generate draw $\Sigma_F^{(m)}$ from $\Sigma_F \mid F$ above, we use this to generate draw $\mu_F^{(m)}$ from $\mu_F \mid F, \Sigma_F$ above, and we use both to generate draw $F_{T+1}^{(m)}$ from $N(\mu_F^{(m)}, \Sigma_F^{(m)})$. To generate $m = 1, \dots, M$ draws for fund $i$'s returns from the posterior predictive density, we first randomly pick $M$ draws from our MCMC draws for $\alpha_i$, $\beta_i$, and $h_i$, whose joint distribution converges to their joint posterior distribution. Then, we generate $M$ draws $\varepsilon_{i,T+1}^{(m)} \sim N\!\left(0, 1/h_i^{(m)}\right)$. Finally, we combine draws $\alpha_i^{(m)}$, $\beta_i^{(m)}$, and $\varepsilon_{i,T+1}^{(m)}$ with draw $F_{T+1}^{(m)}$ for the benchmark portfolio returns, whose generation is described above, and substitute them into the linear factor model equation to calculate the draw

$$r_{i,T+1}^{(m)} = \alpha_i^{(m)} + F_{T+1}^{(m)\prime}\beta_i^{(m)} + \varepsilon_{i,T+1}^{(m)}.$$
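The two-stage sampling scheme above can be sketched with NumPy and SciPy. This sketch assumes `scipy.stats.invwishart` for the inverse-Wishart draw and that stored MCMC draws of αᵢ, βᵢ, and hᵢ are passed in as arrays; the function and argument names are hypothetical, and the MCMC step for the fund parameters is not shown.

```python
import numpy as np
from scipy.stats import invwishart


def draw_factor_predictive(F, M, rng):
    """M draws of F_{T+1} from the posterior predictive under the Jeffreys prior.

    F: T x k_F array of factor returns (k_F >= 2 so invwishart returns a matrix).
    """
    F = np.asarray(F, dtype=float)
    T, k = F.shape
    mu_hat = F.mean(axis=0)
    dev = F - mu_hat
    Sigma_hat = dev.T @ dev / T
    draws = np.empty((M, k))
    for m in range(M):
        # Sigma_F | F ~ W^{-1}(T, T * Sigma_hat)
        Sigma = invwishart.rvs(df=T, scale=T * Sigma_hat, random_state=rng)
        # mu_F | F, Sigma_F ~ N(mu_hat, Sigma_F / T)
        mu = rng.multivariate_normal(mu_hat, Sigma / T)
        # F_{T+1} | mu_F, Sigma_F ~ N(mu_F, Sigma_F)
        draws[m] = rng.multivariate_normal(mu, Sigma)
    return draws


def draw_fund_predictive(alpha, beta, h, F_draws, rng):
    """Draws of r_{i,T+1} = alpha_i + F_{T+1}' beta_i + eps, with eps ~ N(0, 1/h_i).

    alpha (n,), beta (n, k_F), h (n,): stored MCMC draws for fund i.
    """
    M = F_draws.shape[0]
    idx = rng.integers(0, len(alpha), size=M)      # randomly pick M stored draws
    eps = rng.normal(0.0, 1.0 / np.sqrt(h[idx]))   # h is a precision
    return alpha[idx] + np.einsum("mk,mk->m", F_draws, beta[idx]) + eps
```

Drawing Σ_F first and then μ_F given Σ_F follows the factorization of the Normal-inverse-Wishart posterior, so each F_{T+1} draw integrates over parameter uncertainty rather than plugging in μ̂_F and Σ̂_F.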

C.9 The distribution of skill by fund investment objective

In this section, we present additional results regarding the estimation of the K⁻ = 2, K⁺ = 1 model separately for funds classified to each of the three investment objectives (Growth & Income, Growth, and Aggressive Growth); see Section 7. of the paper for details. In particular, in Table C.3, we present the percentiles of the estimated distributions of alpha and the factor loadings.

Table C.3: Percentiles of Estimated Distributions by Investment Objective

Percentiles of the estimated population distributions of annualized alpha (in percent) and factor loadings, estimated with returns net of expenses using the K⁻ = 2, K⁺ = 1 model separately for funds classified to the three investment objective categories: Growth & Income (Panel A), Growth (Panel B), and Aggressive Growth (Panel C).

Columns (percentiles): 0.5th, 1st, 5th, 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th, 95th, 99th, 99.5th

Panel A: Growth & Income Objective
α -3.75-3.5-2.4 -.66 -.34 -.4 -.96 -.8 -.63 -.4...47.32.7
β_M .29.34.47.55.63.7.75.8.85.9.97.6.3.27.3
β_SMB -.46 -.42 -.3 -.24 -.7 -.2 -.7 -.3..6..8.24.36.4
β_HML -.34 -.29 -.6 -.9 -..6..6.2.26.32.4.47.6.65
β_UMD -.25 -.23 -.7 -.4 -. -.7 -.4 -.2..2.5.9.2.8.2

Panel B: Growth Objective
α -4.9-4. -2.56 -.99 -.45 -.2 -.87 -.64 -.4....48 2.28 3.62
β_M .53.58.69.75.83.88.93.97.2.6..9.25.36.4
β_SMB -.58 -.5 -.29 -.8 -.5.4.3.2.28.36.45.59.69.9.98
β_HML -.9 -.8 -.56 -.43 -.27 -.5 -.6.3.2.22.34.5.63.87.97
β_UMD -.28 -.25 -.8 -.4 -.8 -.5 -.2..4.7..6.2.28.3

Panel C: Aggressive Growth Objective
α -7.8-2.26-4.35-2.36 -.96 -.36.....3.2.99 4.63 6.25
β_M .65.69.8.86.93.98.3.7..5.2.27.33.44.48
β_SMB -.28 -.2 -.2.8.2.29.37.43.5.58.67.79.89.7.4
β_HML -.2 -.3 -.79 -.66 -.5 -.39 -.29 -.2 -. -...26.39.64.72
β_UMD -.3 -.27 -.7 -.2 -.6 -..3.6..4.9.25.3.4.43
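Because the population distribution of alpha mixes zero-, negative-, and positive-alpha funds, its percentiles can be read off a large Monte Carlo sample from the mixture. The following is a minimal sketch; the exponential components and the proportions used below are purely hypothetical stand-ins, not the model's estimated components. With a point mass at zero, every percentile that falls inside the zero-alpha block equals exactly zero.

```python
import numpy as np

# Percentiles reported in the table
PCTS = [0.5, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99, 99.5]


def mixture_percentiles(pi_neg, pi_zero, pi_pos, draw_neg, draw_pos, n=200_000, seed=0):
    """Monte Carlo percentiles of an alpha distribution with a point mass at zero.

    draw_neg / draw_pos: callables (rng, size) -> draws for negative/positive-alpha funds.
    """
    rng = np.random.default_rng(seed)
    kind = rng.choice(3, size=n, p=[pi_neg, pi_zero, pi_pos])  # 0 = neg, 1 = zero, 2 = pos
    alpha = np.zeros(n)
    alpha[kind == 0] = draw_neg(rng, int(np.sum(kind == 0)))
    alpha[kind == 2] = draw_pos(rng, int(np.sum(kind == 2)))
    return dict(zip(PCTS, np.percentile(alpha, PCTS)))


# Hypothetical components: exponential magnitudes (not estimated from data)
pct = mixture_percentiles(
    0.3, 0.5, 0.2,
    draw_neg=lambda rng, m: -rng.exponential(2.0, m),
    draw_pos=lambda rng, m: rng.exponential(1.0, m),
)
```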

C. The prevalence of short-term skill and its evolution

Here, we present additional tables pertaining to our analysis of short-term skill and its evolution over time (see Section 7.2 of the paper) using returns net of expenses for 3,497 funds. Table C.4 presents the evolution over time of the posterior means of the population proportions of zero-, negative-, and positive-alpha funds, while Table C.5 presents the evolution over time of the percentiles of the estimated distribution of alpha. Tables C.4 and C.5 correspond to Figures 9a and 9b of the paper, respectively.

Table C.4: Proportions of Fund Types: Short-term Skill

Evolution over time of the posterior means of the population proportions of zero-, negative-, and positive-alpha funds, in a model with short-term skill. Posterior means are estimated at the end of each year using data from the preceding 6 months. All estimations use returns net of expenses in the K⁻ = 2, K⁺ = 1 model, with two components for the alpha distribution of negative-alpha funds and one component for that of positive-alpha funds.

Columns: π⁰, π⁻, π⁺
1975–1979: .29.58.3
1976–1980: .4.29.32
1977–1981: .29.42.29
1978–1982: .37.22.4
1979–1983: .49.5.36
1980–1984: .33.9.48
1981–1985: .4.3.3
1982–1986: .34.7.48
1983–1987: .46.8.37
1984–1988: .43.26.3
1985–1989: .39.34.27
1986–1990: .54.4.32
1987–1991: .26.55.9
1988–1992: .23.58.9
1989–1993: .49.32.8
1990–1994: .42.36.23
1991–1995: .2.73.7
1992–1996: .22.7.8
1993–1997: .24.65.
1994–1998: .6.8.2
1995–1999: .5.9.4
1996–2000: .43.3.27
1997–2001: .4.39.2
1998–2002: .49.37.4
1999–2003: .5.33.7
2000–2004: .35.55.9
2001–2005: .9.88.3
2002–2006: .4.95.
2003–2007: .7.73.
2004–2008: .7.7.2
2005–2009: .22.57.2
2006–2010: .7.72.
2007–2011: .22.62.6
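Each row of the short-term skill tables comes from a separate estimation on a five-year window, from 1975–1979 through 2007–2011, and reports posterior means computed over MCMC output. The following is a minimal sketch of that bookkeeping, assuming the draws of the three proportions are stored as an array; the names are hypothetical and the estimation itself is omitted.

```python
import numpy as np


def rolling_windows(first_year, last_year, length=5):
    """Overlapping windows of `length` calendar years, advancing one year at a time."""
    return [(y, y + length - 1) for y in range(first_year, last_year - length + 2)]


def posterior_mean_proportions(pi_draws):
    """Posterior means of (pi_0, pi_minus, pi_plus) from MCMC draws of shape (n_draws, 3)."""
    return np.asarray(pi_draws, dtype=float).mean(axis=0)


windows = rolling_windows(1975, 2011)
```

Because each draw of the proportions sums to one, their posterior means also sum to one, which is a quick sanity check on each row of the table.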

Table C.5: Percentiles of Estimated Distribution of Alpha: Short-term Skill

Evolution over time of various percentiles of the estimated distribution of annualized alpha (in percent), in a model with short-term skill. The distributions are estimated at the end of each year using data from the preceding 6 months. All estimations use returns net of expenses in the K⁻ = 2, K⁺ = 1 model, with two components for the alpha distribution of negative-alpha funds and one component for that of positive-alpha funds.

Columns (percentiles): 5th, 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th, 95th
1975–1979: -2.59-2.7 -.55 -.22 -.96 -.7....92.8
1976–1980: -.7 -.7 -.63.....34.74.2.66
1977–1981: -2.7 -.64 -.8 -.87 -.49....4 2.29 3.7
1978–1982: -3.3 -.86 -.59....42.5.65 2.65 3.7
1979–1983: -3.7 -.49......64.3 2.45 3.78
1980–1984: -3.97-2.34.....74.2.79 2.8 3.89
1981–1985: -2.5 -.34 -.49......5 2.8 3.7
1982–1986: -4.58-2.4.....96.37.83 2.54 3.25
1983–1987: -4.64-2.43......2.83 2.68 3.47
1984–1988: -4.4-2.2 -.67.....49.35 2.42 3.5
1985–1989: -3.67-2. -.9 -.32.....24 2.35 3.39
1986–1990: -3.27 -.24......42.8 2.3 3.5
1987–1991: -2.84 -.86 -.8 -.68 -.42 -.2....73 2.76
1988–1992: -2.92 -.84 -. -.62 -.38 -.2....45 2.5
1989–1993: -2.54 -.47 -.63 -.8......36 2.44
1990–1994: -3.4-2.2 -.94 -.39.....45.42 2.4
1991–1995: -3.75-2.69 -.76 -.28 -.95 -.7 -.48 -.25...33
1992–1996: -3.93-2.8 -.84 -.33 -.98 -.7 -.46....3
1993–1997: -4.39-3.3-2.3 -.75 -.33 -.98 -.6...2.53
1994–1998: -4.29-3.5-2.73-2.26 -.9 -.62 -.35 -.6 -.6..
1995–1999: -3.47-3.5-2.6-2.32-2. -.9 -.73 -.54 -.32 -.88.
1996–2000: -2.85-2.4 -.2......68.2.67
1997–2001: -3.74-2.92-2.5 -.43.....5.48.76
1998–2002: -3.6-2.26 -.37 -.8......78 2.2
1999–2003: -2.36 -.75 -. -.59......56 2.44
2000–2004: -3.9-2.52 -.85 -.44 -. -.74.....88
2001–2005: -5. -4. -3.4-2.48-2.8 -.75 -.46 -.9 -.88..
2002–2006: -4.45-3.59-2.77-2.3 -.95 -.68 -.43 -.2 -.97 -.68.
2003–2007: -2.86-2.38 -.9 -.59 -.36 -.5 -.94 -.65.. 2.25
2004–2008: -2. -.74 -.44 -.25 -. -.95 -.8 -.54..3.85
2005–2009: -2.59-2.3 -.48 -.4 -.88 -.6...29.9.3
2006–2010: -2.64-2.22 -.8 -.52 -.3 -. -.9 -.6..3.64
2007–2011: -2.62-2.2 -.77 -.49 -.25 -.2 -.68...2.2

C. Fund flow analysis

In this section, we present additional results on our analysis in Section 7.4 of the paper of the relation between fund flows and past fund performance, as well as the relation between fund flows and subsequent fund performance. In Table C.6 (C.7), we present the average past (future) performance across all funds that belong to each flow quintile for each 5-year non-overlapping period in our sample. These tables are similar, respectively, to Panels A and B of Table 6, which presents results averaged across all 5-year non-overlapping periods in our sample, and they show the same effects as those shown in Table 6 and discussed in detail in Section 7.4 of the paper. In Table C.8, we analyze these effects in a regression framework. In Panel A of the table, we examine the relation between fund flows and past performance using the specification

$$F^q_y = \alpha_0 + \alpha_1 \mathrm{Perf}^q_{y-5,y-1} + \varepsilon^q_y,$$

where $F^q_y$ is the flow in year $y$ averaged across all funds in flow quintile $q$, and $\mathrm{Perf}^q_{y-5,y-1}$ is the posterior performance (alpha relative to the 4-factor model) estimated using our methodology over the previous 5 years (from $y-5$ to $y-1$), averaged across all funds in flow quintile $q$. In Panel B of the table, we examine the relation between fund flows and future performance using the specification

$$\mathrm{Perf}^q_{y+1,y+5} - \mathrm{Perf}^q_{y-5,y-1} = \beta_0 + \beta_1 F^q_y + u^q_y,$$

where $F^q_y$ is as above and the dependent variable is the difference between performance in the 5-year period after and the 5-year period before year $y$, averaged across all funds in flow quintile $q$. To eliminate the effect of time, performance measures in both specifications are de-meaned by subtracting the mean performance across all funds operating contemporaneously. The effects estimated from these regressions are consistent with those calculated from the quantile-based analysis.
For example, Panel A of Table C.8 shows that an increase of % in the annualized posterior mean of alpha in the 5-year period prior to flow measurement corresponds to an increase of 28% in measured flows. Panel B shows that an increase of % in capital flows corresponds to a decrease of .25% in the difference between the annualized posterior mean of alpha in the subsequent and in the preceding 5-year period.
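The two regressions in Table C.8 can be sketched with pandas and NumPy, assuming a fund-year DataFrame with hypothetical columns `year`, `flow`, `perf_past` (posterior alpha over y − 5 to y − 1), and `perf_future` (over y + 1 to y + 5). Forming flow quintiles with `pandas.qcut` within each year is our assumption, and flows are left in levels because the text only de-means performance.

```python
import numpy as np
import pandas as pd


def quintile_flow_regression(df):
    """Return OLS coefficients (a0, a1) for Panel A and (b0, b1) for Panel B."""
    df = df.copy()
    # De-mean performance across all funds operating in the same year
    for col in ("perf_past", "perf_future"):
        df[col] = df[col] - df.groupby("year")[col].transform("mean")
    # Flow quintile within each year: 0 = lowest flows, 4 = highest
    df["q"] = df.groupby("year")["flow"].transform(lambda x: pd.qcut(x, 5, labels=False))
    # Average flows and performance within each (year, quintile) cell
    cell = df.groupby(["year", "q"]).mean(numeric_only=True)
    # Panel A: F_y^q = a0 + a1 * Perf_{y-5,y-1}^q
    xa = np.column_stack([np.ones(len(cell)), cell["perf_past"].to_numpy()])
    a, *_ = np.linalg.lstsq(xa, cell["flow"].to_numpy(), rcond=None)
    # Panel B: Perf_{y+1,y+5}^q - Perf_{y-5,y-1}^q = b0 + b1 * F_y^q
    xb = np.column_stack([np.ones(len(cell)), cell["flow"].to_numpy()])
    yb = (cell["perf_future"] - cell["perf_past"]).to_numpy()
    b, *_ = np.linalg.lstsq(xb, yb, rcond=None)
    return a, b
```

Averaging within flow quintiles before regressing means each observation is a (year, quintile) cell, which smooths fund-level noise in the same way as the quantile-based tables.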