Five Things You Should Know About Quantile Regression

Robert N. Rodriguez and Yonggang Yao, SAS Institute
#analyticsx
Copyright 2016, SAS Institute Inc. All rights reserved.

Quantile regression brings the familiar concept of a percentile into the framework of linear models
Goal: interpretability and accurate prediction
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, \ldots, n$$
Outline
- Basic concepts
- Fitting and building quantile regression models
- Application to business performance ranking
- Application to risk management

Basic Concepts of Quantile Regression

How do you fit a predictive model when your data look like this?

Standard linear regression assumes a constant variance, which is often not the case

and applying a log transformation does not often stabilize the variance

Regression models for percentiles capture the entire conditional distribution
[Figure labels: 90th percentile, 50th percentile, 10th percentile]

The term quantile is used in place of percentile, but it has the same meaning
[Figure labels: 0.9 quantile, 0.5 quantile, 0.1 quantile]

and the Greek symbol τ is used for the probability level associated with the quantile
[Figure labels: 0.9 quantile (τ = 0.9), 0.5 quantile (τ = 0.5), 0.1 quantile (τ = 0.1)]
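As background (a standard textbook definition, not taken from the slides), the probability level τ can be made precise by defining the conditional quantile through the conditional distribution function $F$ of the response given the covariates $x$:

```latex
Q_\tau(y \mid x) = \inf \{\, q : F(q \mid x) \ge \tau \,\}, \qquad 0 < \tau < 1
```

With τ = 0.5 this is the conditional median, and τ = 0.9 gives the 90th percentile.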

How does quantile regression compare with standard linear regression?

Linear Regression                  Quantile Regression
Predicts conditional mean          Predicts conditional distribution
Applies with limited n             Needs sufficient data in tails
Assumes normality                  Is distribution agnostic
Is sensitive to outliers           Is robust to outliers
Is computationally inexpensive     Is computationally intensive

Fitting Quantile Regression Models

The coefficient estimates for standard regression minimize a sum of squares
The regression model for the average response is
$$E(y_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \quad i = 1, \ldots, n$$
and the $\beta_j$'s are estimated as
$$\arg\min_{\beta_0, \ldots, \beta_p} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \right)^2$$

In contrast, the coefficient estimates for quantile regression minimize a sum of check losses
The regression model for the τ-th quantile of the response is
$$Q_\tau(y_i) = \beta_0(\tau) + \beta_1(\tau) x_{i1} + \cdots + \beta_p(\tau) x_{ip}, \quad i = 1, \ldots, n$$
and the $\beta_j(\tau)$'s are estimated as
$$\arg\min_{\beta_0, \ldots, \beta_p} \sum_{i=1}^{n} \rho_\tau\!\left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \right)$$
where $\rho_\tau(r) = \tau \max(0, r) + (1 - \tau) \max(0, -r)$
For each quantile level τ, there is a distinct set of regression coefficients.
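To see what the check loss does, it may help to look at the simplest case, an intercept-only model. The sketch below is plain Python, not the QUANTREG procedure or its algorithms. It evaluates ρ_τ(r) = τ·max(0, r) + (1 − τ)·max(0, −r) and searches the observed values for the constant that minimizes the summed loss. The function names and the toy data are hypothetical, chosen only for illustration.

```python
def check_loss(r, tau):
    """Check (pinball) loss: tau*max(0, r) + (1 - tau)*max(0, -r)."""
    return tau * max(0.0, r) + (1.0 - tau) * max(0.0, -r)


def total_check_loss(y, q, tau):
    """Sum of check losses of the residuals y_i - q for a candidate constant q."""
    return sum(check_loss(yi - q, tau) for yi in y)


def best_constant(y, tau):
    """Intercept-only quantile fit by brute force.

    A minimizer of the summed check loss always exists among the observed
    values, so trying each data point is enough for this illustration.
    """
    return min(sorted(y), key=lambda q: total_check_loss(y, q, tau))


# Hypothetical toy data with one large outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(y, 0.5)  # 3.0, the sample median
upper_fit = best_constant(y, 0.9)   # 100.0, the largest observation
```

The outlier moves the mean of these data to 22 but leaves the τ = 0.5 fit at 3. This is the robustness property in miniature. Real quantile regression replaces the brute-force search with linear programming or related algorithms.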

The QUANTREG procedure fits quantile regression models and performs statistical inference
Example: Modeling the 10th, 50th, and 90th percentiles of customer lifetime value
Goal: Target customers with low, medium, and high value after adjusting for 15 covariates such as
- Maximum balance
- Average overdraft
- Total credit card amount used

    proc quantreg data=clv ci=sparsity;
       model CLV = X1-X15 / quantile = 0.1 0.5 0.9;
    run;

Quantile regression produces a distinct set of parameter estimates and predictions for each quantile level
[Figure panels: 10th Percentile, 90th Percentile]

The QUANTREG procedure provides extensive features for statistical inference
- Simplex, interior point, and smooth algorithms for estimation
- Sparsity and bootstrap resampling methods for confidence limits
- Wald, likelihood ratio, and rank-score tests
- Quantile process regression, which fits a model for all values of τ in (0,1)

Quantile process plots display the effects of predictors on different parts of the response distribution
- X15 positively affects the upper tail of the distribution
- X5 positively affects the lower tail of the distribution

Paneled process plots help you identify which predictors are associated with different parts of the response distribution

Building Quantile Regression Models

Example: Which variables differentiate high-performing stores from low-performing stores?
Response: close rate for 500 stores
Candidate predictors
- Store descriptors (X1-X20)
- Promotion (P1-P6)
- Layout (L1-L6)
Approach
1. Build parsimonious regression models for the 10th, 50th, and 90th percentiles
2. Compare the variables selected for each model

The QUANTSELECT procedure selects effects in quantile regression models
Features
- Provides forward, backward, stepwise, and lasso selection methods
- Provides extensive control over the selection
- Builds models for specified quantiles or the entire quantile process

    proc quantselect data=store plots=coefficients;
       model Close_Rate = X1-X20 L1-L6 P1-P6 / quantile=0.1 0.5 0.9
                          selection=lasso(sh=3);
       partition fraction(validate=0.3);
    run;

Coefficient progression plots show how the model fit evolves during variable selection

The layout variables L2, L3, and L5 are selected only in the model for the 90th percentile of close rates
[Figure panels: 10th Percentile, 50th Percentile, 90th Percentile]

Quantile regression gives you insights that would be difficult to obtain with standard regression methods
- P2 positively affects the lower half of the close rate distribution

The QUANTSELECT procedure is a versatile tool for model building
- Models can contain main effects consisting of continuous and classification variables, and their interactions.
- Models can contain constructed effects, such as splines.
- Each level of a CLASS variable can be treated as an individual effect.
- Data can be partitioned to avoid overfitting.
- Syntax and functionality resemble those of the GLMSELECT procedure.

Application to Business Performance Ranking

You can rank observations according to their percentile levels by using quantile regression to adjust for covariates
- Quantile regression can predict the conditional quantiles of a response distribution for a grid of quantile levels.
- From the predicted quantiles, you can compute the quantile (percentile) levels for specified observations.
- This works because the quantile and cumulative distribution functions are inverses of each other.
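The inversion step can be sketched in a few lines. The grid of predicted conditional quantiles for one covariate setting traces out the quantile function. Locating an observed value on that grid and interpolating gives its approximate quantile level. This is an illustrative Python sketch under stated assumptions, not the method used in the SAS example: the function name, the linear interpolation, the clamping at the grid ends, and the numbers are all hypothetical.

```python
from bisect import bisect_right


def quantile_level(value, taus, predicted_quantiles):
    """Estimate the quantile level of `value` from a grid of predicted quantiles.

    taus: increasing quantile levels in (0, 1).
    predicted_quantiles: predicted conditional quantiles at those levels,
        assumed nondecreasing (sort or rearrange first if they cross).
    Values outside the grid are clamped to the first or last level.
    """
    if value <= predicted_quantiles[0]:
        return taus[0]
    if value >= predicted_quantiles[-1]:
        return taus[-1]
    # Find the grid interval that contains `value`, then interpolate linearly.
    hi = bisect_right(predicted_quantiles, value)
    lo = hi - 1
    q_lo, q_hi = predicted_quantiles[lo], predicted_quantiles[hi]
    frac = (value - q_lo) / (q_hi - q_lo)
    return taus[lo] + frac * (taus[hi] - taus[lo])


# Hypothetical predicted quantiles of weekly sales for one advertising cost.
taus = [0.1, 0.25, 0.5, 0.75, 0.9]
predicted = [100.0, 150.0, 200.0, 260.0, 320.0]
level = quantile_level(230.0, taus, predicted)  # halfway between 0.5 and 0.75
```

A finer grid of quantile levels makes the interpolation error smaller, which is why the example that follows predicts quantiles on a fine grid.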

How do you rank the weekly sales for different stores, after adjusting for advertising cost?
Yao (2015)

Where do Stores 1, 1001, and 2001 fall within the distributions of stores with the same advertising costs?

What are the quantile levels of Stores 1, 1001, and 2001 within the distributions of stores with the same advertising costs?

You can estimate the conditional distributions using quantile regression

Begin by predicting the conditional quantiles of sales for a fine grid of quantile levels

Note that the conditional distributions of sales have different shapes, which depend on advertising cost

The distribution functions of sales for specified advertising costs can be computed from the predicted quantiles

You can obtain the quantile levels (ranks) for the three stores from the distributions for stores with their costs

How do the three stores rank before and after accounting for advertising cost?

Application to Risk Management

Quantile regression provides a robust approach for estimating value at risk (VaR)
- VaR measures market risk by how much a portfolio can lose within a given time period, for a confidence level (1 - τ)
- VaR is a conditional quantile of future portfolio values
  $$\Pr[\, y_t < -\mathrm{VaR}_t \mid \Omega_t \,] = \tau$$
  where $\Omega_t$ is the information at time $t$, and $\{y_t\}$ is the series of portfolio returns
- Methods for measuring VaR include GARCH models, which estimate the volatility of the portfolio and assume normality for financial returns
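The quantile reading of VaR is easiest to see in the unconditional case, where the information set Ω_t is ignored and VaR is the negated τ-th empirical quantile of past returns. The Python sketch below illustrates only that simplification. It is not the GARCH or quantile regression approach described here, and the return series is made up rather than real market data.

```python
import math


def empirical_quantile(values, tau):
    """Smallest order statistic whose empirical CDF is at least tau."""
    ordered = sorted(values)
    k = max(1, math.ceil(tau * len(ordered)))  # 1-based rank
    return ordered[k - 1]


def value_at_risk(returns, tau):
    """Unconditional VaR at level tau: the negated tau-th quantile of returns."""
    return -empirical_quantile(returns, tau)


# Hypothetical weekly returns in percent (illustrative values only).
returns = [-4.0, -2.5, -1.0, -0.5, 0.0, 0.3, 0.8, 1.2, 1.5, 2.0,
           -3.0, 0.6, 0.9, -0.2, 1.1, 0.4, -1.5, 0.7, 1.8, 0.1]
var_5 = value_at_risk(returns, 0.05)  # loss threshold at tau = 0.05
```

Conditioning on Ω_t replaces this single empirical quantile with a quantile that changes over time as a function of predictors.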

Example: weekly return rates of the S&P 500 Index

You can use PROC VARMAX to predict VaR with a GARCH(1,1) model, which assumes normality

Alternatively, you can use PROC QUANTREG to predict VaR by conditioning on lagged standard errors estimated by PROC VARMAX

    proc varmax data=sp500;
       model Rate / p=1;
       garch form=ccc subform=garch q=6;
       output out=stderr lead=1;
       id date interval=week;
    run;

    proc quantreg data=stderr;
       model Rate = std1-std7 / quantile=0.05;
       output out=qr p=var;
       id date;
    run;

Xiao, Guo, and Lam (2015)

Quantile regression offers robustness in situations where market returns display negative skewness and excess kurtosis
Xiao, Guo, and Lam (2015)

Wrap-Up

Five things you should remember about quantile regression
1. Quantile regression gives a complete picture of the conditional response distribution if there is sufficient data in the tails.
2. The QUANTREG and QUANTSELECT procedures provide versatile tools for fitting and building quantile regression models.
3. Quantile process plots reveal effects of predictors on different parts of the response distribution.
4. Quantile regression ranks observations by estimating their conditional percentile levels.
5. Quantile regression yields major insights when the most valuable information lies in the tails.

Learn more at http://support.sas.com/statistics
- Sign up for e-newsletter
- Watch short videos
- Download overview papers
