QQ PLOT Yunsi Wang, Tyler Steele, Eva Zhang Spring 2016


QQ PLOT INTERPRETATION:

Quantiles: Quantiles are values that divide a probability distribution into intervals of equal probability, so that every interval contains the same fraction of the total population.

QQ-plot: The purpose of the quantile-quantile (QQ) plot is to show whether two data sets come from the same distribution. The plot is constructed by plotting the first data set's quantiles along the x-axis and the second data set's quantiles along the y-axis. In practice, many data sets are compared to the normal distribution. The normal distribution is then the base distribution: its quantiles are plotted along the x-axis as the Theoretical Quantiles, while the sample quantiles are plotted along the y-axis as the Sample Quantiles. A few examples are presented below.

The first sample is obtained by simulating a standard normal distribution with sample size 1000. We will compare this sample against the standard normal distribution to see whether the quantiles match. From the histogram we can see that the sample is bell shaped and centered around zero, which suggests that it comes from a standard normal distribution.

[Figure: Histogram of x (left); Normal Q-Q Plot of x (right)]
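The construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the plotting positions (i - 0.5)/n, the seed, and the helper names `normal_qq_points` and `pearson` are choices made here, not taken from the original analysis. The correlation of the plotted pairs is used as a rough numeric stand-in for "the points fall on a straight line".

```python
import random
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical, sample) quantile pairs against the standard normal."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions (i - 0.5) / n keep probabilities inside (0, 1).
    return [(std_normal.inv_cdf((i - 0.5) / n), value)
            for i, value in enumerate(ordered, start=1)]


def pearson(pairs):
    """Pearson correlation of the pairs; values near 1 mean a nearly straight plot."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2016)  # arbitrary seed, for reproducibility only
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # simulated standard normal sample
points = normal_qq_points(x)
straightness = pearson(points)
```

Plotting the first coordinate of each pair on the x-axis and the second on the y-axis gives the QQ plot; for a sample that really is normal, `straightness` should be very close to 1.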

When looking at the QQ plot, we see that the points fall along a straight line, which shows that the quantiles match. While the plotted line is not a necessary component of the QQ plot, it lets the reader see where the points would fall if the sample matched the base distribution.
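The text does not say how its reference line was drawn. One common convention (used, for instance, by R's qqline) is to draw the line through the points given by the first and third quartiles of the two distributions. The sketch below follows that convention as an assumption; the function name `quartile_reference_line` is hypothetical.

```python
import random
from statistics import NormalDist, quantiles


def quartile_reference_line(sample):
    """Slope and intercept of the line through the first- and third-quartile points."""
    q1, _median, q3 = quantiles(sample, n=4)  # sample quartiles
    std_normal = NormalDist()
    z1, z3 = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)  # normal quartiles
    slope = (q3 - q1) / (z3 - z1)
    intercept = q1 - slope * z1
    return slope, intercept


rng = random.Random(2016)  # arbitrary seed, for reproducibility only
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
slope, intercept = quartile_reference_line(x)
```

For a standard normal sample the slope should be close to 1 and the intercept close to 0, which is the line y = x that the points are expected to follow.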

The next examples show what various QQ plots look like when the two data sets do not come from the same distribution. This example shows a right skewed sample compared to the standard normal distribution. The sample is obtained by simulating a chi-squared distribution with 2 degrees of freedom and a sample size of 1000. From the histogram we can see that the distribution is right skewed: it contains many observations around zero, and the frequency of values declines rapidly as w increases. The QQ plot shows this sample's quantiles compared to the standard normal. Intuitively, the points should not align along a line, since the data sets are not from the same distribution.

[Figure: Histogram of w (left); Normal Q-Q Plot of w (right)]

From the QQ plot, we see that the sample has a high frequency of values between zero and five; therefore its quantiles increase more slowly in this region than the standard normal quantiles. Once the sample values exceed five, however, the sample quantiles increase faster than the standard normal quantiles.
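The change in growth rate described above can be checked numerically. The sketch below is illustrative only: it builds chi-squared values with 2 degrees of freedom as the sum of two squared standard normal draws, and the helper `qq_slope`, the split at the median, and the seed are assumptions made here.

```python
import random
from statistics import NormalDist


def qq_slope(ordered, p_low, p_high):
    """Average slope of the normal QQ plot between two probability levels."""
    n = len(ordered)
    std_normal = NormalDist()
    lo, hi = round(p_low * n), round(p_high * n) - 1  # order statistics used
    # Same assumed plotting positions as (i - 0.5) / n, written for 0-based indices.
    dz = std_normal.inv_cdf((hi + 0.5) / n) - std_normal.inv_cdf((lo + 0.5) / n)
    return (ordered[hi] - ordered[lo]) / dz


rng = random.Random(2016)  # arbitrary seed, for reproducibility only
# Chi-squared with 2 degrees of freedom: sum of two squared standard normals.
w = sorted(rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2 for _ in range(1000))

lower_slope = qq_slope(w, 0.0, 0.5)  # lower half: values crowded near zero
upper_slope = qq_slope(w, 0.5, 1.0)  # upper half: long right tail
```

A much larger `upper_slope` than `lower_slope` corresponds to the curve that starts shallow and then bends upward in the QQ plot of a right skewed sample.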

This next example shows a left skewed sample compared to the standard normal distribution. The sample is obtained from a chi-squared distribution with 2 degrees of freedom; however, each value is multiplied by (-1) in order to reflect the distribution about the y-axis. The histogram shows that this distribution is in fact left skewed. The QQ plot shows that the points do not align along a line, since the data sets come from different distributions.

[Figure: Histogram of u (left); Normal Q-Q Plot of u (right)]

From the QQ plot, we see that the sample has a low frequency of values between -15 and -5; therefore its quantiles increase rapidly in this region. Once the sample values are above -5, however, the sample quantiles increase slowly and level off at 0, since that is the upper bound of the sample.
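Because the standard normal is symmetric, multiplying a sample by (-1) simply mirrors its normal QQ plot through the origin. The following sketch checks this under assumptions made here: plotting positions (i - 0.5)/n, an arbitrary seed, and chi-squared values generated as the sum of two squared standard normal draws.

```python
import random
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical, sample) quantile pairs against the standard normal."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), value)
            for i, value in enumerate(ordered, start=1)]


rng = random.Random(2016)  # arbitrary seed, for reproducibility only
w = [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2 for _ in range(1000)]
u = [-value for value in w]  # reflected, left skewed sample

qq_w = normal_qq_points(w)
qq_u = normal_qq_points(u)

# Mirror the right skewed plot through the origin and compare point by point.
mirrored = [(-z, -value) for z, value in reversed(qq_w)]
max_gap = max(abs(a - c) + abs(b - d)
              for (a, b), (c, d) in zip(qq_u, mirrored))
```

A `max_gap` of essentially zero confirms that the left skewed plot is the right skewed plot rotated by 180 degrees about the origin.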

The next two examples show samples that come from light and heavy tailed distributions. The first example shows a sample taken from a uniform distribution on (-3, 3) compared to the standard normal distribution. Although comparing this sample to the standard normal is not entirely fair, since the sample is strictly bounded between -3 and 3, the results are worth mentioning. Looking at the histogram, the sample has no tails beyond -3 or 3. This produces the interesting QQ plot depicted below: light tailed distributions yield an s shape in the QQ plot.

[Figure: Histogram of v (left); Normal Q-Q Plot of v (right)]

Approximately over the theoretical quantiles (-3, -1.5), the sample grows slower than the standard normal distribution; therefore it takes longer for the sample quantiles to increase. This is shown by the concave up portion of the graph. Over (-1.5, 1.5), the sample quantiles increase at a roughly constant rate relative to the standard normal quantiles, so the points fall along an approximately straight line in this region. Lastly, over (1.5, 3), the sample is bounded above by 3, so the sample reaches its highest quantile before the standard normal distribution does. This is why the sample quantiles look flat at the top: the sample has reached its highest quantile, but the standard normal has not and still needs to increase a little to reach it.
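The s shape can also be seen without simulation, by mapping standard normal quantiles to the matching quantiles of a Uniform(-3, 3) distribution. This is a sketch of the idealized (population) curve rather than of the simulated sample in the text; the grid of z values and the name `uniform_quantile` are choices made here.

```python
from statistics import NormalDist

std_normal = NormalDist()


def uniform_quantile(p, low=-3.0, high=3.0):
    """Quantile function of the Uniform(low, high) distribution."""
    return low + (high - low) * p


# For each theoretical quantile z, find the uniform quantile at the same probability.
grid = [-3.0, -1.5, 0.0, 1.5, 3.0]
curve = [(z, uniform_quantile(std_normal.cdf(z))) for z in grid]

# Increase of the uniform quantile over each stretch of 1.5 units on the z-axis.
steps = [curve[k + 1][1] - curve[k][1] for k in range(len(curve) - 1)]
```

The outer entries of `steps` (roughly 0.4) are much smaller than the inner ones (roughly 2.6): the curve is nearly flat at both ends and steep in the middle, which is the s shape.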

The last example depicts a sample with heavy tails relative to the standard normal distribution. This sample is obtained by simulating a random sample from a Student's t distribution with 5 degrees of freedom. The histogram shows that the sample looks bell shaped; however, when looking at the QQ plot we see an inverted s shape.

[Figure: Histogram of z (left); Normal Q-Q Plot of z (right)]

Approximately over the theoretical quantiles (-3, -1.5), the sample grows faster than the standard normal distribution; therefore it takes a shorter time for the sample quantiles to increase. Over (-1.5, 1.5), the sample seems to grow at approximately the same pace as the standard normal distribution; therefore their quantiles match in this region. Lastly, over (1.5, 3), the sample again grows faster than the standard normal distribution; therefore the standard normal distribution reaches its highest quantile before the sample does. This is why the plot looks nearly vertical at the top: the standard normal distribution has reached its highest quantile, but the sample has not and still needs to increase to reach it.
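The inverted s shape can be quantified by comparing the slope of the QQ plot in the tails with the slope in the middle. The sketch below is an illustration under assumptions made here: t values are built as a standard normal divided by the square root of a scaled chi-squared variable, the sample size of 5000 is chosen larger than in the text so the tail slopes are stable, and the helper names and probability ranges are hypothetical.

```python
import math
import random
from statistics import NormalDist


def student_t(rng, df):
    """One draw from Student's t: Z / sqrt(chi-squared(df) / df)."""
    z = rng.gauss(0.0, 1.0)
    chi_sq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(chi_sq / df)


def qq_slope(ordered, p_low, p_high):
    """Average slope of the normal QQ plot between two probability levels."""
    n = len(ordered)
    std_normal = NormalDist()
    lo, hi = round(p_low * n), round(p_high * n) - 1  # order statistics used
    dz = std_normal.inv_cdf((hi + 0.5) / n) - std_normal.inv_cdf((lo + 0.5) / n)
    return (ordered[hi] - ordered[lo]) / dz


rng = random.Random(2016)  # arbitrary seed, for reproducibility only
z_sample = sorted(student_t(rng, 5) for _ in range(5000))  # assumed sample size

lower_tail = qq_slope(z_sample, 0.0, 0.05)
middle = qq_slope(z_sample, 0.25, 0.75)
upper_tail = qq_slope(z_sample, 0.95, 1.0)
```

Tail slopes that exceed the middle slope mean the plot is steep at both ends and shallower in the center, which is the inverted s shape associated with heavy tails.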

These different types of plots help us see how a sample compares to the base distribution. For example, if we have a sample and would like to see how it compares to the standard normal, we construct a QQ plot. If the QQ plot yields an inverted s shape, then we would reason that the sample probably does not come from the normal distribution. In addition, from our analysis of the different QQ plots, we would reason that the sample has heavy tails. We therefore have the option of comparing our sample to a heavy tailed distribution, such as a two parameter Pareto or a Weibull distribution. If we now construct a QQ plot of our sample against one of these heavy tailed distributions and the QQ plot yields a straight line, then we have reason to believe that our sample is consistent with the distribution that we tested.
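The workflow described above, comparing one sample against several candidate distributions, might be sketched as follows. Everything specific here is an assumption for illustration: the Weibull scale and shape values, the seed, the plotting positions, and the use of the correlation of the QQ points as a crude measure of straightness.

```python
import math
import random
from statistics import NormalDist


def qq_points(sample, quantile_function):
    """Pair each ordered observation with the candidate's quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    return [(quantile_function((i - 0.5) / n), value)
            for i, value in enumerate(ordered, start=1)]


def pearson(pairs):
    """Pearson correlation of the pairs; values near 1 mean a nearly straight plot."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


SCALE, SHAPE = 1.0, 0.7  # hypothetical Weibull parameters, not from the text


def weibull_quantile(p):
    """Quantile function of the Weibull distribution with the parameters above."""
    return SCALE * (-math.log(1.0 - p)) ** (1.0 / SHAPE)


rng = random.Random(7)  # arbitrary seed, for reproducibility only
sample = [rng.weibullvariate(SCALE, SHAPE) for _ in range(1000)]

r_normal = pearson(qq_points(sample, NormalDist().inv_cdf))
r_weibull = pearson(qq_points(sample, weibull_quantile))
```

The candidate whose QQ points are closest to a straight line (here, the larger correlation) is the more plausible base distribution, although the plot itself should still be inspected visually.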

QQ PLOT APPLICATION:

Part one of this document discusses an analysis of the extreme value theorem. Maximum likelihood estimates are calculated from simulating different random variables. In this first case, we will look at the sample taken from the Uniform simulation. According to the extreme value theorem (explained in greater detail in Part One), this sample should converge to a Weibull distribution as the sample size increases.

[Figure: MaxstarW vs Weibull QQ Plot]

The QQ plot is constructed by plotting the sample generated in part one (we will name it MaxstarW) against the Weibull distribution. The parameters of the Weibull distribution are the maximum likelihood estimates found in part one. As one can see, the plot shows a straight line, which shows that the quantiles match. Therefore, we have reason to believe that the extreme value theorem does hold in this case.
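The simulation from part one is not reproduced in this section, so the following is only a sketch of the same idea under different assumptions. It uses a standard fact: for the maximum M of n Uniform(0, 1) draws, n(1 - M) is approximately exponential with mean 1 (a Weibull distribution with shape 1) when n is large. The normalization, block size, number of blocks, and seed are all choices made here, and no maximum likelihood fitting is performed.

```python
import math
import random


def exponential_quantile(p):
    """Quantile function of the exponential distribution with mean 1."""
    return -math.log(1.0 - p)


def pearson(pairs):
    """Pearson correlation of the pairs; values near 1 mean a nearly straight plot."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


BLOCK_SIZE, N_BLOCKS = 200, 1000  # assumed simulation sizes
rng = random.Random(1)            # arbitrary seed, for reproducibility only

# One normalized maximum per block of uniform draws.
scaled = sorted(
    BLOCK_SIZE * (1.0 - max(rng.random() for _ in range(BLOCK_SIZE)))
    for _ in range(N_BLOCKS)
)

points = [(exponential_quantile((i - 0.5) / N_BLOCKS), value)
          for i, value in enumerate(scaled, start=1)]
straightness = pearson(points)
mean_scaled = sum(scaled) / N_BLOCKS
```

A `straightness` value close to 1 plays the role of the straight-line QQ plot reported in the text.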

In the next case, we will look at the sample taken from the Exponential simulation. According to the extreme value theorem (explained in greater detail in Part One), this sample should converge to a Gumbel distribution as the sample size increases.

[Figure: MaxstarG vs Gumbel QQ Plot]

The QQ plot is constructed by plotting the sample generated from the Exponential simulation (we will name it MaxstarG) against the Gumbel distribution. The parameters of the Gumbel distribution are the maximum likelihood estimates found in part one. As one can see, the plot shows a straight line, which shows that the quantiles match. Therefore, we have reason to believe that the extreme value theorem holds in this case as well.
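A comparable check for the Gumbel case can be sketched with the standard result that, for the maximum M of n exponential draws with mean 1, M - ln(n) is approximately standard Gumbel for large n. As before, this is not the simulation from part one: the centering, block size, number of blocks, and seed are assumptions, and the standard Gumbel is used in place of fitted parameters.

```python
import math
import random


def gumbel_quantile(p):
    """Quantile function of the standard Gumbel distribution."""
    return -math.log(-math.log(p))


def pearson(pairs):
    """Pearson correlation of the pairs; values near 1 mean a nearly straight plot."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


BLOCK_SIZE, N_BLOCKS = 200, 1000  # assumed simulation sizes
rng = random.Random(3)            # arbitrary seed, for reproducibility only

# One centered maximum per block of exponential draws.
centered = sorted(
    max(rng.expovariate(1.0) for _ in range(BLOCK_SIZE)) - math.log(BLOCK_SIZE)
    for _ in range(N_BLOCKS)
)

points = [(gumbel_quantile((i - 0.5) / N_BLOCKS), value)
          for i, value in enumerate(centered, start=1)]
straightness = pearson(points)
mean_centered = sum(centered) / N_BLOCKS
```

The mean of the centered maxima should be near 0.577 (the Euler-Mascheroni constant, which is the mean of the standard Gumbel), and `straightness` should be close to 1.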

In the final case, we will look at the sample taken from the Frechet simulation. According to the extreme value theorem (explained in greater detail in Part One), this sample should converge to a Frechet distribution as the sample size increases. Using QQ plots, we will show that, of the three distributions, the Frechet distribution is the best one to use for the Frechet simulation.

[Figure: MaxstarF vs Weibull QQ Plot]

This QQ plot is constructed by plotting the sample generated from the Frechet simulation (we will name it MaxstarF) against the Weibull distribution. The parameters of the Weibull distribution are the maximum likelihood estimates obtained by fitting the Weibull distribution to this sample. As one can see from the plot, the quantiles do not match, and according to our QQ plot interpretation, the sample seems to be skewed compared to the Weibull distribution. Therefore, we do not have reason to believe that this sample tends to a Weibull distribution.

[Figure: MaxstarF vs Gumbel QQ Plot]

This QQ plot is constructed by plotting the sample generated from the Frechet simulation (MaxstarF) against the Gumbel distribution. The parameters of the Gumbel distribution are the maximum likelihood estimates obtained by fitting the Gumbel distribution to this sample. As one can see from the plot, the quantiles do not match, and according to our QQ plot interpretation, the sample seems to be skewed compared to the Gumbel distribution. Therefore, we do not have reason to believe that this sample tends to a Gumbel distribution.

[Figure: MaxstarF vs Frechet QQ Plot]

The final QQ plot is constructed by plotting the sample generated from the Frechet simulation (MaxstarF) against the Frechet distribution. The parameters of the Frechet distribution are the maximum likelihood estimates obtained by fitting the Frechet distribution to this sample. As one can see from the plot, the quantiles do match, which leads us to believe that this sample does tend toward a Frechet distribution. This is what the extreme value theorem predicts, and therefore we reason that the theorem holds in all three cases.
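The comparison of candidate distributions for a Frechet sample can be sketched as below. This is a simplified illustration, not the analysis in the text: Frechet values are drawn directly by inverse transform with a hypothetical shape of 3, the true shape is used for the Frechet candidate instead of a maximum likelihood estimate, the Weibull candidate is omitted, and the Gumbel candidate needs no fitted parameters because the correlation of QQ points does not change with location or scale.

```python
import math
import random

ALPHA = 3.0  # hypothetical Frechet shape parameter, not from the text


def frechet_quantile(p):
    """Quantile function of the Frechet distribution with shape ALPHA."""
    return (-math.log(p)) ** (-1.0 / ALPHA)


def gumbel_quantile(p):
    """Quantile function of the standard Gumbel distribution."""
    return -math.log(-math.log(p))


def qq_correlation(sample, quantile_function):
    """Correlation between ordered observations and candidate quantiles."""
    ordered = sorted(sample)
    n = len(ordered)
    theo = [quantile_function((i - 0.5) / n) for i in range(1, n + 1)]
    mean_t, mean_s = sum(theo) / n, sum(ordered) / n
    sts = sum((t - mean_t) * (s - mean_s) for t, s in zip(theo, ordered))
    stt = sum((t - mean_t) ** 2 for t in theo)
    sss = sum((s - mean_s) ** 2 for s in ordered)
    return sts / (stt * sss) ** 0.5


rng = random.Random(11)  # arbitrary seed, for reproducibility only
# If E is exponential with mean 1, then E ** (-1 / ALPHA) is Frechet with shape ALPHA.
sample = [rng.expovariate(1.0) ** (-1.0 / ALPHA) for _ in range(1000)]

r_gumbel = qq_correlation(sample, gumbel_quantile)
r_frechet = qq_correlation(sample, frechet_quantile)
```

A higher correlation for the Frechet candidate than for the Gumbel candidate mirrors the conclusion drawn from the three plots, where only the Frechet QQ plot follows a straight line.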
