Quantile Regression due to Skewness and Outliers


Applied Mathematical Sciences, Vol. 5, 2011, no. 39, 1947-1951

Neda Jalali and Manoochehr Babanezhad
Department of Statistics, Faculty of Sciences, Golestan University, Gorgan, Golestan, Iran
m.babanezhad@gu.ac.ir

Abstract

Regression models explore the relationship between a response variable and some explanatory variables, often based on the conditional mean function. The mean framework is not always appropriate, for two reasons: first, when the distribution of the explanatory variable is highly skewed, and second, when severe outliers are observed in the analysis. In contrast, quantile regression, and in particular median regression, remains informative in such situations. In this paper, we briefly define quantile regression. We investigate the efficiency of this method by estimating the effect of age on satisfaction score by median regression.

Keywords: Linear regression, Skewness, Outliers, Quantile regression, Median

1 Introduction

Ordinary regression models explore the relationship between a response variable and some explanatory variables through a conditional mean function, $Y_i = E(Y_i \mid X_i) + \epsilon_i$ (e.g., $E(Y_i \mid X_i) = \alpha + \beta X_i$). If the regular assumptions are satisfied, such as uncorrelated error terms $\epsilon_i$ with mean zero and constant variance $\sigma^2$, then the least squares estimator $\hat{\beta}$ of $\beta$ is the best linear unbiased estimator. In situations where the regular assumptions are not met [1, 2], the conditional mean function characterizes the relationship between $Y$ and $X$ poorly [3, 4, 5]. In contrast, quantile regression, and in particular median regression, which extends the classical regression model, might lead to the best and unbiased estimator. In the next section, we briefly introduce quantile regression. In Section 3, we investigate the efficiency of quantile regression by estimating, through median regression, the effect of age on satisfaction score adjusted for sex, education and number of children. Section 4 concludes with a discussion.

2 Quantile regression

As stated before, quantile regression is built on the conditional quantile of the response given one or more explanatory variables. Following Koenker and Bassett [1], linear quantile regression can be modelled as

$$Q(\tau \mid X_i) = \alpha(\tau) + \beta(\tau) X_i,$$

where $\alpha(\tau)$ and $\beta(\tau)$ can be estimated by solving

$$\big(\hat{\alpha}(\tau), \hat{\beta}(\tau)\big) = \arg\min_{\alpha_\tau,\, \beta_\tau} \left[ \sum_{i:\, y_i > \alpha_\tau + \beta_\tau X_i} \tau \, \lvert y_i - \alpha_\tau - \beta_\tau X_i \rvert + \sum_{i:\, y_i \le \alpha_\tau + \beta_\tau X_i} (1 - \tau) \, \lvert y_i - \alpha_\tau - \beta_\tau X_i \rvert \right].$$

In the special case $\tau = 0.5$, this reduces to minimizing the sum of absolute deviations, which is median regression [1, 3]. The main advantage of median regression is its ability to estimate the effect of $X$ without making assumptions about the error term. In addition, estimating parameters by least absolute deviations is insensitive to outliers, because the estimate depends on the sign of the deviations rather than their magnitude. In contrast, ordinary least squares depends on the magnitude of the deviations and does not control the influence of outliers [5].

3 Case Study

A study was conducted based on a questionnaire comprising 20 multiple-choice questions. Samples were taken from the population of Gorgan (a city in the north of Iran). Using Cochran's formula, 406 questionnaires were prepared. Every question was ranked from 1 to 4, and the sum of ranks was taken as the satisfaction criterion for each participant. The aim of the study was to estimate, by quantile (median) regression, the effect of age on satisfaction score adjusted for covariates including sex (man or woman), education and number of children. The model for quantile (median) regression can be written as

$$Q_{0.5} = \beta_0 + \beta_1 \,\text{age} + \beta_2 \,\text{sex} + \beta_3 \,\text{education} + \beta_4 \,\text{children} \qquad (1)$$

where $\beta_1$ is the effect of age. Table I shows the parameter estimates and 95 % confidence intervals.

Table I. Parameter estimates and 95 % confidence intervals.

Covariate    Coefficient    95 % confidence interval    P-value
Age          -0.11          (-0.23, 0.04)               0.042
Education    -0.33          (-0.58, 0.31)               0.242
Sex           2.45          (1.14, 3.72)                0.007
Child         0.34          (-0.57, 0.83)               0.407

For instance, the analysis shows that age has a significant effect on life satisfaction score. In addition, the analysis shows that a 44-year-old woman has a life satisfaction score of less than 37.44 %. By contrast, the score for a man of the same age is less than 37.11 %. A histogram of the standardized residuals from the median regression, with a fitted normal density curve, shows that the residuals are normally distributed (Figure 1) with constant variance (Figure 2). This implies that median regression fits the data well.
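The weighted absolute-deviation criterion of Section 2, and the claim that the median is insensitive to outliers, can be illustrated with a small sketch. It is not the authors' analysis: it covers only the intercept-only case (no covariates), the scores are hypothetical values rather than the survey data, and the function names `check_loss`, `total_loss` and `best_constant` are introduced here for illustration.

```python
from statistics import mean


def check_loss(residual, tau):
    """Check loss: tau * |r| for r > 0, (1 - tau) * |r| otherwise."""
    return tau * residual if residual > 0 else (tau - 1.0) * residual


def total_loss(y, q, tau):
    """Sum of check losses when every observation is predicted by the constant q."""
    return sum(check_loss(yi - q, tau) for yi in y)


def best_constant(y, tau):
    """Return the observed value that minimizes the total check loss.

    A minimizer always exists among the observed values, so searching
    the sample is enough for this intercept-only illustration.
    """
    return min(sorted(y), key=lambda q: total_loss(y, q, tau))


# Hypothetical scores (assumption, not the paper's data).
scores = [31, 33, 34, 35, 36, 37, 38, 39, 40]
# Same sample with the largest value replaced by a severe outlier.
with_outlier = scores[:-1] + [400]

# tau = 0.5 gives the median, which the outlier does not move.
print(best_constant(scores, 0.5), best_constant(with_outlier, 0.5))
# The mean, in contrast, is pulled strongly toward the outlier.
print(mean(scores), mean(with_outlier))
# Other quantiles use the same criterion with a different tau.
print(best_constant(scores, 0.25))
```

In this toy sample the minimizer for $\tau = 0.5$ stays at 36 after the outlier is introduced, while the mean moves from about 35.9 to about 75.9. A full median regression with covariates minimizes the same criterion over the regression coefficients, which in practice is done with a linear programming solver rather than a search over the sample.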

Figure 1: Distribution of residuals (histogram; horizontal axis: residuals, vertical axis: frequency).

Figure 2: Quantile-quantile (QQ) plot of residuals (normal Q-Q plot; horizontal axis: theoretical quantiles, vertical axis: sample quantiles).

4 Discussion

Much of applied statistics can be formulated as a linear regression model, and the associated estimation method is often ordinary least squares. However, this requires that the regular assumptions about the data (normality) and the residuals can be checked. In addition, one may be interested in a location parameter other than the mean [4]. Quantile regression is therefore preferable in such situations. Quantile regression can analyse the whole distribution, whereas ordinary regression considers only the centre of the distribution [1, 4, 5, 6]. In our example, different covariates influence the response variable at different quantiles. Unreported analysis shows that the quantile regression results for quantiles 0.2 and 0.9 also perform better than the ordinary regression estimators.

References

[1] R. Koenker and G.W. Bassett, Regression Quantiles, Econometrica, 46 (1978), 33-50.

[2] R. Koenker and K. Hallock, Quantile Regression: An Introduction, Journal of Economic Perspectives, 15 (2001), 143-156.

[3] K. Yu, Z. Lu and J. Stander, Quantile regression: applications and current research areas, The Statistician, 52 (2003), 331-350.

[4] P. Cizek, Semiparametrically weighted robust estimation of regression models, Computational Statistics and Data Analysis, 55 (2011), 774-788.

[5] A. Gannoun, J. Saracco and K. Yu, Nonparametric prediction by conditional median and quantiles, Journal of Statistical Planning and Inference, 117 (2003), 207-223.

[6] L. Hao and D.Q. Naiman, Quantile regression. Press/CRC, 2007.

Received: December, 2010