Fit. Lecture 3

A common problem in applications is to find a density which fits an experimental sample well. Given a sample x_1, ..., x_n, we look for a density f which may have generated that sample. There exist infinitely many such densities for a given sample (think of the case n = 1). For some of them, however, the sample is natural and typical; for others it is extreme and unusual, even if possible. We look for a density such that the sample is typical for it.

Let us treat two examples:
- the results of a test for future students in medicine (a large and regular sample);
- the intensity of the last 19 volcanic eruptions at Campi Flegrei (few data, one outlier).

Load in R the file dati_campi_flegrei.txt (first column, except the last component) and save the data in the vector Piro:

    A <- read.table(file = "dati_campi_flegrei.txt", header = TRUE)
    Piro <- A[1:19, 1]

Load also test_medicina.txt, saved in the vector Medi:

    B <- read.table(file = "test_medicina.txt", header = TRUE)
    Medi <- B[, 2]

These are the Piro data: 5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7, 7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1

Histograms and empirical cumulatives

A histogram is a kind of empirical density. But it is not uniquely determined by the data: it depends on the classes. Let us see two histograms of Piro, hist(Piro) and hist(Piro, 15):

[Figure: two histograms of Piro, with default classes and with 15 classes; x axis Piro, y axis Frequency]

They show absolute frequencies. If we want area one under the graph, we use hist(Piro, 15, freq = FALSE).
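The dependence of the histogram on the classes can be checked without the data file. The following is a minimal sketch that types in the 19 Piro values listed above; the object names h.default, h.fine, h.dens and area are introduced here only for illustration.

```r
# Piro data as listed in the text (19 eruption intensities).
Piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)

# Same data, different classes: only the second argument changes.
h.default <- hist(Piro)                    # default classes, absolute frequencies
h.fine    <- hist(Piro, 15)                # about 15 classes (R treats 15 as a suggestion)
h.dens    <- hist(Piro, 15, freq = FALSE)  # density scale

# On the density scale the areas of the bars add up to one.
area <- sum(h.dens$density * diff(h.dens$breaks))
print(length(h.default$counts))   # number of classes with default settings
print(length(h.fine$counts))      # number of classes when 15 are requested
print(area)
```

The number passed to hist is only a hint: R rounds the class boundaries to "pretty" values, so the number of classes actually drawn may differ from 15.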

[Figure: Histogram of Piro on the density scale; x axis Piro]

We get a first idea of the data and of the probability of different values. Due to the outlier 347.6, most of the histogram is squeezed to the left. We may expand it by

    Piro.cut <- c(Piro[1:11], Piro[13:19])
    hist(Piro.cut, 7, freq = FALSE)

[Figure: Histogram of Piro.cut; x axis Piro.cut]

From the expansion we note that there is no ascending part on the left, as we would have in Weibull or Gamma distributions with shape > 1. Thus, if we use Weibull, we choose shape ≤ 1. Much more regular is the histogram of Medi:

[Figure: Histogram of Medi; x axis Medi]

Let us plot the empirical cumulative with plot.ecdf(Piro). It is absolute: there is no choice of classes. For Piro and Medi:

[Figure: empirical cumulatives of Piro and of Medi; axes x and Fn(x)]
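As a small self-contained check of the two steps above (dropping the outlier and building the empirical cumulative), one may run the sketch below. It uses ecdf from base R, which returns the step function that plot.ecdf draws; the names h.cut and Fn are illustrative.

```r
Piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)

# The outlier 347.6 is the 12th observation: drop it to expand the histogram.
Piro.cut <- c(Piro[1:11], Piro[13:19])
h.cut <- hist(Piro.cut, 7, freq = FALSE)

# Empirical cumulative: a step function that jumps by 1/n at each
# observation (repeated values give larger jumps). No classes are involved.
Fn <- ecdf(Piro)
plot(Fn)

# Fraction of the sample not exceeding a few chosen values.
print(Fn(c(5.3, 29.5, 100, 347.6)))
```

Since Fn is an ordinary R function, it can be evaluated at any point, which is what makes the comparison with a theoretical cdf straightforward.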

Parametric and non parametric methods

Using a parametric method means choosing a class of distributions (Weibull, normal, etc.) characterized by few parameters (usually 2) and looking for the best parameters; then one compares the results of different classes. Non parametric methods search for a density in very large classes, having a very large number of degrees of freedom. Even such classes may be parametrized, but with very many parameters (sometimes infinitely many). Thus they are very flexible and fit data very closely.

The previous histograms help us in the choice of the parametric class. For instance, we shall exclude Gaussians for Piro, as well as Beta, but examine Weibull and possibly Gamma. Moreover, the decreasing shape of the histogram suggests shape ≤ 1. Vice versa, for Medi, Gaussians look suitable, although there is a mild asymmetry. Recall the way Gamma and Weibull are asymmetric; it is more natural to try Weibull. For the Piro data there is an outlier, so presumably a heavy (sub-exponential) tail. Gamma distributions are not sub-exponential; Weibull distributions are, if shape < 1. Another class offered by R is that of log-normals. Summarizing: Gaussian and Weibull for Medi, Weibull and log-normal for Piro.

One more distribution: log-normal

If X is Gaussian (normal), the random variable Y = e^X is called log-normal. Since X sits in the exponent, Y sometimes takes very large values. For instance, if X takes typical values in 2-4, but sometimes 5, the typical values of Y will be 7-55, but sometimes 150. This is exactly what happens with Piro. The parameters of a log-normal are the mean and standard deviation of the corresponding Gaussian. To mimic the numbers just given, take a Gaussian with mean 3 and standard deviation 1. We have:

    x <- 1:100
    y <- dlnorm(x, 3, 1)
    plot(x, y)
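The numerical example for the log-normal can be verified directly. The sketch below is only illustrative: the names p.typical and p.large, the threshold 150 and the seed are choices made here, not part of the lecture.

```r
# If X is normal with mean 3 and sd 1, then Y = exp(X) is log-normal(3, 1).
print(exp(c(2, 4, 5)))   # images of the values 2, 4 and 5 of X

# Density of Y on the grid used in the text.
x <- 1:100
y <- dlnorm(x, meanlog = 3, sdlog = 1)
plot(x, y, type = "l")

# The log-normal cdf is the normal cdf evaluated at log(y).
p.typical <- plnorm(exp(4), 3, 1) - plnorm(exp(2), 3, 1)  # P(e^2 < Y < e^4)
p.large   <- 1 - plnorm(150, 3, 1)                        # P(Y > 150)
print(c(p.typical, p.large))

# Simulation check: exponentiating normal draws gives log-normal draws.
set.seed(1)
Y <- exp(rnorm(19, mean = 3, sd = 1))
print(round(sort(Y), 1))
```

Repeating the last three lines with different seeds gives an impression of how often a sample of size 19 contains one value much larger than the others.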

[Figure: log-normal density with parameters 3 and 1; axes x and y]

The only qualitative drawback of this distribution, for Piro, is the ascending initial step. But it is very fast, so we may choose to ignore it. The heavy tail can be seen from the definition, the graph, or the density:

$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log x - \mu)^2}{2\sigma^2}\right)$ for $x > 0$.

Exponential and logarithm partly compensate: for $\mu = 0$, $\exp\left(-(\log x)^2/(2\sigma^2)\right) = x^{-\log x/(2\sigma^2)}$, so the decay is power-like, with an exponent that grows only logarithmically, and is far slower than any exponential decay.

A non parametric method

Let us run:

    require(KernSmooth)
    density <- bkde(Piro, kernel = "normal", bandwidth = 20)
    plot(density, type = "l")

[Figure: two kernel density estimates; axes density$x and density$y]

The package KernSmooth (kernel smoothing) is loaded explicitly, since it is not attached by default. The aim of this package is to find non parametric densities, using smoothing methods based on suitable kernels. There are several kernels. We try another one below. The feature of this method is that it fits our data very closely. Run:

    hist(Piro, 15, freq = FALSE)
    lines(density, type = "l")
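To see how closely a kernel estimate follows the data, here is a minimal sketch that swaps in density() from base R instead of bkde() from KernSmooth, so that it runs without extra packages. The Gaussian kernel and the bandwidth 20 are taken from the text; the threshold 250 and the names kd, step and mass.near.outlier are assumptions of this sketch.

```r
Piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)

# Gaussian kernel estimate; bw is the standard deviation of the kernel.
kd <- density(Piro, bw = 20, kernel = "gaussian")

hist(Piro, 15, freq = FALSE)
lines(kd)

# The estimate is an average of 19 Gaussian bumps, one per observation,
# so the single outlier gets its own bump with mass close to 1/19.
step <- diff(kd$x[1:2])
total.mass <- sum(kd$y) * step
mass.near.outlier <- sum(kd$y[kd$x > 250]) * step
print(c(total = total.mass, near.outlier = mass.near.outlier, one.over.n = 1 / 19))
```

The bump around 347.6 moves with the outlier: replacing 347.6 by another value in Piro moves the bump accordingly, which illustrates how strongly the estimate is tied to this particular sample.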

[Figure: Histogram of Piro with the kernel density estimate superimposed; x axis Piro]

The drawback of this method, for us, is its main feature: it is too close to these particular data. Does the precise value of the outlier, 347.6, have a physical meaning, or might we get 527 or 293 next time? In this example we think that 347.6 has no absolute meaning. Thus the density given by kernel smoothing is not physical.

Parameter estimate

Assume we have chosen a class and we want to find optimal parameters. Two classical approaches are the method of Maximum Likelihood (ML) and the method of moments. We may also find the parameters by optimizing other quantities, like the L^1 distance described below. Let us describe here only ML.

Given a density f and an experimental value x, the number f(x) is not the probability of x (which is zero). It is called, however, the likelihood of x. Given a sample x_1, ..., x_n, the product

$L(x_1,\dots,x_n) = f(x_1)\cdots f(x_n)$

is called the likelihood of x_1, ..., x_n. When the density depends on parameters, say a, s, we write $f(x \mid a,s)$ and $L(x_1,\dots,x_n \mid a,s)$. The ML method is: given a sample x_1, ..., x_n, find the pair (a, s) which maximizes $L(x_1,\dots,x_n \mid a,s)$. If it were a probability, we could say: which choice of parameters maximizes the probability of our sample?

Since most probability densities are related to exponentials and products, taking the logarithm is convenient: $\log L(x_1,\dots,x_n \mid a,s)$. Maximizing it is equivalent. If this function is differentiable in (a, s), and the maximum is attained inside the domain of definition, we must have

$\nabla_{a,s} \log L(x_1,\dots,x_n \mid a,s) = 0.$

These are the ML equations. Sometimes they can be solved explicitly. Otherwise, numerical optimization is needed.

R gives us a routine to compute ML estimates of parameters, for several classes of densities: fitdistr. In our cases:

    require(MASS)
    fitdistr(Piro, "weibull")
    fitdistr(Piro, "weibull", list(shape = 0.5, scale = 20))
    fitdistr(Piro, "weibull", list(shape = 2, scale = 100))
    fitdistr(Piro, "log-normal")
    fitdistr(Medi, "normal")

    mean(Medi)
    sd(Medi)

The call fitdistr(Medi, "weibull") gives an error because of negative values. We remove them, saving the result as Medi.plus, and run

    fitdistr(Medi.plus, "weibull")

We also changed the initial guesses of the parameters in fitdistr(Piro, "weibull") to check that the maximum did not change. We also checked that the Gaussian fit is made just by taking the empirical mean and standard deviation (the method of moments, in its simplest case). The results are:

- fitdistr(Piro, "weibull"): 0.85, [second value missing]
- fitdistr(Piro, "log-normal"): 3.09, 1.02
- fitdistr(Medi, "normal"): 34.97, [second value missing]
- fitdistr(Medi.plus, "weibull"): 3.58, [second value missing]

Comparison between density and histogram

The first idea is to compare density and histogram. Let us see Piro with Weibull and log-normal, with a and s set to the estimated parameters (the code shown is for Weibull):

    a <- ...
    s <- ...
    x <- (0:5000)/10
    hist(Piro, 15, freq = FALSE)
    y <- dweibull(x, a, s)
    lines(x, y)

[Figure: two histograms of Piro with the fitted Weibull and log-normal densities superimposed; x axis Piro]

Both look reasonable, but the comparison is very difficult. Not so different is the Weibull with parameters

    a <- 0.8
    s <- 100
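The ML estimates for Piro can also be obtained without fitdistr, which makes the role of the ML equations visible. The sketch below is one possible way to do it, not the routine used in the lecture: for the log-normal the equations have an explicit solution, while for the Weibull the scale can be eliminated and the remaining equation in the shape is solved numerically with uniroot. The names m.hat, s.hat, a.hat, b.hat and shape.eq are introduced here.

```r
Piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)

# Log-normal: explicit solution of the ML equations,
# mean and (1/n) standard deviation of log(Piro).
m.hat <- mean(log(Piro))
s.hat <- sqrt(mean((log(Piro) - m.hat)^2))

# Weibull: setting the derivative in the scale b to zero gives
# b^a = mean(x^a); substituting into the derivative in the shape a
# leaves one equation in a alone.
shape.eq <- function(a) {
  sum(Piro^a * log(Piro)) / sum(Piro^a) - 1 / a - mean(log(Piro))
}
a.hat <- uniroot(shape.eq, c(0.1, 5), tol = 1e-10)$root
b.hat <- mean(Piro^a.hat)^(1 / a.hat)

# Log-likelihoods at the two maxima.
ll.lnorm   <- sum(dlnorm(Piro, m.hat, s.hat, log = TRUE))
ll.weibull <- sum(dweibull(Piro, a.hat, b.hat, log = TRUE))
print(round(c(meanlog = m.hat, sdlog = s.hat), 2))
print(round(c(shape = a.hat, scale = b.hat), 2))
print(c(lnorm = ll.lnorm, weibull = ll.weibull))

# Both fitted densities over the histogram (grid starts above 0,
# where a Weibull density with shape < 1 is unbounded).
x <- (1:5000) / 10
hist(Piro, 15, freq = FALSE)
lines(x, dweibull(x, a.hat, b.hat))
lines(x, dlnorm(x, m.hat, s.hat), lty = 2)
```

Comparing the printed estimates with the output of fitdistr is a useful cross-check; small differences in the last digits are to be expected, since fitdistr uses a general-purpose optimizer for the Weibull.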

[Figure: Histogram of Piro with the Weibull density for a = 0.8, s = 100; x axis Piro]

The fit of the outlier looks improved, at the price of a slightly worse fit elsewhere. We do not say that this kind of comparison is useless, simply that it is neither trivial nor conclusive. Let us see Medi, Gaussian and Weibull:

[Figure: Histogram of Medi with the Gaussian density; Histogram of Medi.plus with the Weibull density]

Both are very good. There is no evidence of improvement by Weibull in coping with the asymmetry (Weibull, with those parameters, is almost symmetric).

We have seen one example, Medi, where the comparison density-histogram is convincing, and another where it is poor. The presence of an outlier will always deteriorate a comparison density-histogram. Indeed, to be physical, a density must be distributed over a wide range, not only around the outlier.

Comparison between cumulatives

Another comparison is that of cumulatives, empirical and theoretical. For Piro, Weibull and log-normal, we have (the code shown is for Weibull):

    a <- ...
    s <- ...
    x <- (0:5000)/10
    plot.ecdf(Piro)
    y <- pweibull(x, a, s)
    lines(x, y)
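A runnable version of the comparison between cumulatives, using only parameters quoted in the text (ML log-normal 3.09, 1.02 and the Weibull with a = 0.8, s = 100), might look as follows. The helper max.gap is hypothetical: it is one simple way to put a number on what the plot shows, namely the largest vertical gap between the empirical and the theoretical cdf.

```r
Piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)
x <- (0:5000) / 10

plot(ecdf(Piro))
lines(x, plnorm(x, 3.09, 1.02))            # ML log-normal from the text
lines(x, pweibull(x, 0.8, 100), lty = 2)   # Weibull with a = 0.8, s = 100

# Largest vertical gap between the step function and a theoretical cdf.
# The gap can only be largest at a jump, so it is enough to look
# just before and just after each ordered observation.
max.gap <- function(data, cdf) {
  xs <- sort(data)
  n  <- length(xs)
  Fx <- cdf(xs)
  max(abs(Fx - (1:n) / n), abs(Fx - (0:(n - 1)) / n))
}
gap.lnorm   <- max.gap(Piro, function(t) plnorm(t, 3.09, 1.02))
gap.weibull <- max.gap(Piro, function(t) pweibull(t, 0.8, 100))
print(c(lnorm = gap.lnorm, weibull = gap.weibull))
```

A single number like this cannot replace the plot, but it makes it easy to repeat the comparison for other candidate parameters.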

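[Figure: empirical cumulative of Piro with the ML Weibull cdf and with the ML log-normal cdf; axes x and Fn(x)]

Here, for the first time, we have a hint of the superiority of the log-normal. If we try again the Weibull with

    a <- 0.8
    s <- 100

we get

[Figure: empirical cumulative of Piro with the Weibull cdf for a = 0.8, s = 100; axes x and Fn(x)]

which is much worse. Thus the comparison of cumulatives is very informative. For Medi, Gaussian and Weibull:

[Figure: empirical cumulatives of Medi with the Gaussian cdf and with the Weibull cdf; axes x and Fn(x)]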

Both look perfect. However, we notice a very small discrepancy in the tails. The right tail is better fitted by Weibull, the left tail by Gaussian, and not by much. Recall that a Weibull of shape a decays as $\exp(-c\,x^{a})$, here $\exp(-c\,x^{3.58})$, while a Gaussian decays as $\exp(-c\,x^{2})$, with constants $c > 0$. The decay on the right is very strong (even stronger than Weibull with a = 3.58). The decay on the left is slower than Gaussian.

Comparison between samples

Another comparison, essentially heuristic, is based on generating a sample from the given distribution. Try, with a and s set to the fitted Weibull parameters,

    a <- ...
    s <- ...
    rweibull(19, a, s)
    Piro

If we repeat this a few times, we usually get numbers similar to those of Piro, except that most often we do not get numbers of the order of 300. The same holds for the log-normal. This is the only hint, until now, that we have under-estimated the outlier. Traditional methods of fit have this tendency. One can see that the parameters

    m <- ...
    s <- 1.3
    rlnorm(19, m, s)

give samples still similar to Piro, but most of the time with outliers of the right order. The comparison between cumulatives is good:

[Figure: empirical cumulative of Piro with the modified log-normal cdf; axes x and Fn(x)]

and we see why this is better for the outlier. Which case should we prefer?

Q-Q plot

To describe this method, we need the definition of quantile. It is the inverse of the cdf. In all our examples, the cdf F is continuous and strictly increasing (except maybe on half-lines). Therefore, given $\alpha \in (0,1)$, there exists one and only one number $q_\alpha$ such that $F(q_\alpha) = \alpha$. The number $q_\alpha$ is called the quantile of order $\alpha$. For instance, if $\alpha$ = 5%, it is also called the fifth percentile (if $\alpha$ = 25%, the 25th percentile, and so on). Moreover, the 25th, 50th and 75th percentiles are also called first, second and third quartiles.

The empirical cdf $\hat F$ is defined as follows: given a sample $x_1,\dots,x_n$, we order it; if $x_{(1)} \le \dots \le x_{(n)}$ is the result, we set

$\hat F(x_{(i)}) = \frac{i}{n}.$

Some people prefer

$\hat F(x_{(i)}) = \frac{i - 0.5}{n},$

which is more symmetric. If a sample comes from a cdf F, we have $\hat F(x_{(i)})$ nearly equal to $F(x_{(i)})$. Applying the inverse of F, the quantile, we get that $q_{\hat F(x_{(i)})}$ is roughly equal to $x_{(i)} = q_{F(x_{(i)})}$. But then the points $(x_{(i)}, q_{\hat F(x_{(i)})})$ will be close to the line y = x. We plot these points and get a feeling for the goodness of fit. For Piro, Weibull and log-normal (the code shown is for Weibull):

    Dati <- Piro
    a <- ...
    s <- ...
    quant <- function(x) {qweibull(x, a, s)}
    x <- 1:500
    L <- length(Dati)
    F.hat <- (1:L)/L - 0.5/L
    Dati.ord <- sort(Dati)
    plot(x, x, type = "l")
    q <- quant(F.hat)
    lines(Dati.ord, q, type = "b")
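The construction of the Q-Q plot can be tried on Piro with the ML log-normal, whose parameters 3.09 and 1.02 are quoted in the text. This is a minimal sketch following the steps just described; the names Piro.ord and q.lnorm are illustrative.

```r
Piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)

L <- length(Piro)
F.hat <- (1:L) / L - 0.5 / L     # (i - 0.5)/n at the i-th ordered value
Piro.ord <- sort(Piro)

# Theoretical quantiles at the empirical levels.
q.lnorm <- qlnorm(F.hat, 3.09, 1.02)

x <- 1:500
plot(x, x, type = "l")                   # reference line y = x
lines(Piro.ord, q.lnorm, type = "b")     # points (x_(i), q_{F.hat(x_(i))})

# A quantile inverts the cdf: F(q_alpha) = alpha.
print(max(abs(plnorm(q.lnorm, 3.09, 1.02) - F.hat)))

# Largest observation against the corresponding theoretical quantile.
print(c(observed = Piro.ord[L], theoretical = q.lnorm[L]))
```

The last line shows where the plot leaves the diagonal: the distance between the two numbers is what the text calls the under-estimation of the outlier.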

[Figure: Q-Q plots of Piro against the ML Weibull and the ML log-normal]

Let us add the modified log-normal (s = 1.3),

[Figure: Q-Q plot of Piro against the modified log-normal]

which clearly shows what happens: the fit of the outlier is improved, the fit of some other points is worse. ML log-normal is better than ML Weibull; our modified log-normal is good as well and improves the outlier. For Medi, Gaussian and Weibull (the code shown is for the Gaussian):

    L <- length(Medi)
    F.hat <- (1:L)/L - 0.5/L
    Medi.ord <- sort(Medi)
    m <- ...
    s <- ...
    q <- qnorm(F.hat, m, s)
    x <- (0:700)/10
    plot(x, x, type = "l")
    lines(Medi.ord, q, type = "b")

[Figure: Q-Q plots of Medi against the Gaussian and the Weibull]

The result is surprising! We expected a very strong fit, and on the contrary we see very clearly the shortcomings in the tails. The problem is only there; the body of the distribution is perfect. The pictures seen until now were dominated by the body. This Q-Q plot confirms what we saw previously: the decay on the right is very fast (a little faster than Weibull with a = 3.58, which however is very good) and slower than Gaussian on the left.

Numerical summaries, distances

After several graphical comparisons, let us see some numerical ones. Let us anticipate that they will not be much better than the graphical ones, but will add some information. One of the problems with them is that there are too many. If we use these indices to compare two given distributions, it may work: most of them will give the same order. If, on the contrary, we hope to use them to identify the optimal density in a class, or similarly to prove that the ML density is the best, we get into trouble. Usually, the optimal parameters depend on the index. To summarize, a certain degree of subjectivity remains and cannot be eliminated by the numerical indices.

A distance between cumulatives

Among many possible ones, particularly natural is the L^1 distance between empirical and theoretical cumulatives

$I := \int \left| \hat F(x) - F(x) \right| dx.$

It measures the distance between the probabilities of events of the form X ≤ t, averaged in t. For simple dimensional and expository reasons, it may be convenient to use the following small variant, that we may call error of fit:

$E = 100\,\frac{I}{x_{\max} - x_{\min}}$

where $x_{\max}$ and $x_{\min}$ refer to the sample x_1, ..., x_n. The results for Piro are:

- ML Weibull: E = 6.13
- ML log-normal: E = 5.39, the best between the two
- modified log-normal: E = 5.96, better than ML Weibull.

Exercise. Write R code which computes, for every positive number k, the index

$I_k := \int \left| \hat F(x) - F(x) \right|^k dx$

and the error of degree k:

$E_k = 100 \left( \frac{I_k}{x_{\max} - x_{\min}} \right)^{1/k}.$

Which discrepancies between the densities are captured as $k \to \infty$? (Pay attention to the typical dimensions of the numbers involved.)

Are these values typical?

We may use the error E to compare different densities, as above. We may use it to compute optimal parameters. But we may also use it as a statistical test, to understand, for instance, whether the ML log-normal is acceptable or not in itself (not whether it is better than another density). We do it in the following way. Consider the ML log-normal. Generate from it a sample of cardinality 19 and compute its error E with respect to our log-normal. Repeat 1000 times and get 1000 values of E: $e_1,\dots,e_{1000}$. A fraction k/1000 of them will be greater than the value e obtained by comparing the experimental sample with the log-normal. We interpret k/1000 as the probability that, at random, from that log-normal we may get a sample as extreme as the experimental one. Call empirical p-value the number k/1000, or k/10000 etc., depending on the number of trials.

If the p-value is small, e.g. below 0.05, it means that it was not easy to get such a sample at random. This indicates that such a log-normal is not natural enough. If, on the contrary, the p-value is not so small, even some 0.15, we cannot exclude that the sample comes from that distribution. In the end, we have a criterion to reject or not reject a distribution. Not rejecting does not mean confirming: several other distributions have the same property of non rejection.

The code gives us E, the p-value and the histogram of $e_1,\dots,e_{1000}$. For Piro, ML log-normal: E = 5.39, p-value = 0.214
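The code used in the lecture is not reproduced in the text. The following is a hypothetical sketch of how E and the empirical p-value could be computed for the ML log-normal (parameters 3.09 and 1.02 from the text). The integration grid, the upper limit 2000, the seed and all function names are assumptions of this sketch, and each simulated sample is normalised by its own range, which is one possible reading of the definition; the printed values therefore need not coincide with those quoted above.

```r
Piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)

# L1 distance between the empirical cdf of `data` and a theoretical cdf,
# approximated by a Riemann sum on a grid over [0, upper].
I.dist <- function(data, cdf, upper = 2000, step = 0.2) {
  t  <- seq(0, upper, by = step)
  Fn <- ecdf(data)
  sum(abs(Fn(t) - cdf(t))) * step
}

# Error of fit: E = 100 * I / (max - min), max and min of the sample.
E.fit <- function(data, cdf) {
  100 * I.dist(data, cdf) / (max(data) - min(data))
}

cdf.ln <- function(t) plnorm(t, 3.09, 1.02)   # ML log-normal from the text
E.obs  <- E.fit(Piro, cdf.ln)

# Empirical p-value: fraction of simulated samples of size 19
# whose error exceeds the observed one.
set.seed(1)
N <- 1000
e <- replicate(N, E.fit(rlnorm(19, 3.09, 1.02), cdf.ln))
p.value <- mean(e > E.obs)

hist(e)
print(c(E = E.obs, p.value = p.value))
```

Changing the seed or N shows how much the p-value fluctuates from run to run; replacing cdf.ln and rlnorm by their Weibull counterparts gives the analogous test for a Weibull fit.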

[Figure: Histogram of 100*I.rand/Range for the ML log-normal; y axis Frequency]

(the p-value varies a little from trial to trial). We cannot reject this distribution, although this is an indication that the fit is not so good. Much worse is the result for ML Weibull: E = 6.127, p-value [value missing]

[Figure: Histogram of 100*I.rand/Range for the ML Weibull; y axis Frequency]

All methods confirm the superiority of the log-normal fit.

Exercise. Find the p-value for the error of degree k introduced in the exercise above.

Exercise. Analyze the data of this lecture by means of the Gamma class. Remember to use dgamma(x, shape = a, scale = s), etc.


More information

ME3620. Theory of Engineering Experimentation. Spring Chapter III. Random Variables and Probability Distributions.

ME3620. Theory of Engineering Experimentation. Spring Chapter III. Random Variables and Probability Distributions. ME3620 Theory of Engineering Experimentation Chapter III. Random Variables and Probability Distributions Chapter III 1 3.2 Random Variables In an experiment, a measurement is usually denoted by a variable

More information

Mongolia s TOP-20 Index Risk Analysis, Pt. 3

Mongolia s TOP-20 Index Risk Analysis, Pt. 3 Mongolia s TOP-20 Index Risk Analysis, Pt. 3 Federico M. Massari March 12, 2017 In the third part of our risk report on TOP-20 Index, Mongolia s main stock market indicator, we focus on modelling the right

More information

Section 6-1 : Numerical Summaries

Section 6-1 : Numerical Summaries MAT 2377 (Winter 2012) Section 6-1 : Numerical Summaries With a random experiment comes data. In these notes, we learn techniques to describe the data. Data : We will denote the n observations of the random

More information

Characterization of the Optimum

Characterization of the Optimum ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 5. Portfolio Allocation with One Riskless, One Risky Asset Characterization of the Optimum Consider a risk-averse, expected-utility-maximizing

More information

DATA SUMMARIZATION AND VISUALIZATION

DATA SUMMARIZATION AND VISUALIZATION APPENDIX DATA SUMMARIZATION AND VISUALIZATION PART 1 SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS 294 PART 2 PART 3 PART 4 VISUALIZATION: GRAPHS AND TABLES FOR SUMMARIZING AND ORGANIZING DATA 296

More information

Economics 307: Intermediate Macroeconomic Theory A Brief Mathematical Primer

Economics 307: Intermediate Macroeconomic Theory A Brief Mathematical Primer Economics 07: Intermediate Macroeconomic Theory A Brief Mathematical Primer Calculus: Much of economics is based upon mathematical models that attempt to describe various economic relationships. You have

More information

AP Statistics Chapter 6 - Random Variables

AP Statistics Chapter 6 - Random Variables AP Statistics Chapter 6 - Random 6.1 Discrete and Continuous Random Objective: Recognize and define discrete random variables, and construct a probability distribution table and a probability histogram

More information

Cambridge University Press Risk Modelling in General Insurance: From Principles to Practice Roger J. Gray and Susan M.

Cambridge University Press Risk Modelling in General Insurance: From Principles to Practice Roger J. Gray and Susan M. adjustment coefficient, 272 and Cramér Lundberg approximation, 302 existence, 279 and Lundberg s inequality, 272 numerical methods for, 303 properties, 272 and reinsurance (case study), 348 statistical

More information

Probability and Statistics

Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 3: PARAMETRIC FAMILIES OF UNIVARIATE DISTRIBUTIONS 1 Why do we need distributions?

More information

Point Estimation. Some General Concepts of Point Estimation. Example. Estimator quality

Point Estimation. Some General Concepts of Point Estimation. Example. Estimator quality Point Estimation Some General Concepts of Point Estimation Statistical inference = conclusions about parameters Parameters == population characteristics A point estimate of a parameter is a value (based

More information

MODELLING OF INCOME AND WAGE DISTRIBUTION USING THE METHOD OF L-MOMENTS OF PARAMETER ESTIMATION

MODELLING OF INCOME AND WAGE DISTRIBUTION USING THE METHOD OF L-MOMENTS OF PARAMETER ESTIMATION International Days of Statistics and Economics, Prague, September -3, MODELLING OF INCOME AND WAGE DISTRIBUTION USING THE METHOD OF L-MOMENTS OF PARAMETER ESTIMATION Diana Bílková Abstract Using L-moments

More information

Statistics and Probability

Statistics and Probability Statistics and Probability Continuous RVs (Normal); Confidence Intervals Outline Continuous random variables Normal distribution CLT Point estimation Confidence intervals http://www.isrec.isb-sib.ch/~darlene/geneve/

More information

The following content is provided under a Creative Commons license. Your support

The following content is provided under a Creative Commons license. Your support MITOCW Recitation 6 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make

More information

Quality Digest Daily, March 2, 2015 Manuscript 279. Probability Limits. A long standing controversy. Donald J. Wheeler

Quality Digest Daily, March 2, 2015 Manuscript 279. Probability Limits. A long standing controversy. Donald J. Wheeler Quality Digest Daily, March 2, 2015 Manuscript 279 A long standing controversy Donald J. Wheeler Shewhart explored many ways of detecting process changes. Along the way he considered the analysis of variance,

More information

Chapter 4: Commonly Used Distributions. Statistics for Engineers and Scientists Fourth Edition William Navidi

Chapter 4: Commonly Used Distributions. Statistics for Engineers and Scientists Fourth Edition William Navidi Chapter 4: Commonly Used Distributions Statistics for Engineers and Scientists Fourth Edition William Navidi 2014 by Education. This is proprietary material solely for authorized instructor use. Not authorized

More information

THE USE OF THE LOGNORMAL DISTRIBUTION IN ANALYZING INCOMES

THE USE OF THE LOGNORMAL DISTRIBUTION IN ANALYZING INCOMES International Days of tatistics and Economics Prague eptember -3 011 THE UE OF THE LOGNORMAL DITRIBUTION IN ANALYZING INCOME Jakub Nedvěd Abstract Object of this paper is to examine the possibility of

More information

4.3 The money-making machine.

4.3 The money-making machine. . The money-making machine. You have access to a magical money making machine. You can put in any amount of money you want, between and $, and pull the big brass handle, and some payoff will come pouring

More information

QQ PLOT Yunsi Wang, Tyler Steele, Eva Zhang Spring 2016

QQ PLOT Yunsi Wang, Tyler Steele, Eva Zhang Spring 2016 QQ PLOT INTERPRETATION: Quantiles: QQ PLOT Yunsi Wang, Tyler Steele, Eva Zhang Spring 2016 The quantiles are values dividing a probability distribution into equal intervals, with every interval having

More information

Chapter 6. y y. Standardizing with z-scores. Standardizing with z-scores (cont.)

Chapter 6. y y. Standardizing with z-scores. Standardizing with z-scores (cont.) Starter Ch. 6: A z-score Analysis Starter Ch. 6 Your Statistics teacher has announced that the lower of your two tests will be dropped. You got a 90 on test 1 and an 85 on test 2. You re all set to drop

More information

Modeling Obesity and S&P500 Using Normal Inverse Gaussian

Modeling Obesity and S&P500 Using Normal Inverse Gaussian Modeling Obesity and S&P500 Using Normal Inverse Gaussian Presented by Keith Resendes and Jorge Fernandes University of Massachusetts, Dartmouth August 16, 2012 Diabetes and Obesity Data Data obtained

More information

EconS Income E ects

EconS Income E ects EconS 305 - Income E ects Eric Dunaway Washington State University eric.dunaway@wsu.edu September 23, 2015 Eric Dunaway (WSU) EconS 305 - Lecture 13 September 23, 2015 1 / 41 Introduction Over the net

More information

3. The Discount Factor

3. The Discount Factor 3. he Discount Factor Objectives Eplanation of - Eistence of Discount Factors: Necessary and Sufficient Conditions - Positive Discount Factors: Necessary and Sufficient Conditions Contents 3. he Discount

More information

9. Logit and Probit Models For Dichotomous Data

9. Logit and Probit Models For Dichotomous Data Sociology 740 John Fox Lecture Notes 9. Logit and Probit Models For Dichotomous Data Copyright 2014 by John Fox Logit and Probit Models for Dichotomous Responses 1 1. Goals: I To show how models similar

More information

1 Describing Distributions with numbers

1 Describing Distributions with numbers 1 Describing Distributions with numbers Only for quantitative variables!! 1.1 Describing the center of a data set The mean of a set of numerical observation is the familiar arithmetic average. To write

More information

Most of the transformations we will deal with will be in the families of powers and roots: p X -> (X -1)/-1.

Most of the transformations we will deal with will be in the families of powers and roots: p X -> (X -1)/-1. Powers and Roots Quite often when we re dealing with quantitative data, it turns out that for the purposes of analysis, it is useful to carry out a transformation of one of the variables of interest. This

More information

Appendix A. Selecting and Using Probability Distributions. In this appendix

Appendix A. Selecting and Using Probability Distributions. In this appendix Appendix A Selecting and Using Probability Distributions In this appendix Understanding probability distributions Selecting a probability distribution Using basic distributions Using continuous distributions

More information

The mean-variance portfolio choice framework and its generalizations

The mean-variance portfolio choice framework and its generalizations The mean-variance portfolio choice framework and its generalizations Prof. Massimo Guidolin 20135 Theory of Finance, Part I (Sept. October) Fall 2014 Outline and objectives The backward, three-step solution

More information

Class 13. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Class 13. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700 Class 13 Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science Copyright 017 by D.B. Rowe 1 Agenda: Recap Chapter 6.3 6.5 Lecture Chapter 7.1 7. Review Chapter 5 for Eam 3.

More information

Terms & Characteristics

Terms & Characteristics NORMAL CURVE Knowledge that a variable is distributed normally can be helpful in drawing inferences as to how frequently certain observations are likely to occur. NORMAL CURVE A Normal distribution: Distribution

More information

Fitting financial time series returns distributions: a mixture normality approach

Fitting financial time series returns distributions: a mixture normality approach Fitting financial time series returns distributions: a mixture normality approach Riccardo Bramante and Diego Zappa * Abstract Value at Risk has emerged as a useful tool to risk management. A relevant

More information

The Normal Distribution

The Normal Distribution Stat 6 Introduction to Business Statistics I Spring 009 Professor: Dr. Petrutza Caragea Section A Tuesdays and Thursdays 9:300:50 a.m. Chapter, Section.3 The Normal Distribution Density Curves So far we

More information

Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics.

Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Convergent validity: the degree to which results/evidence from different tests/sources, converge on the same conclusion.

More information

MEASURING PORTFOLIO RISKS USING CONDITIONAL COPULA-AR-GARCH MODEL

MEASURING PORTFOLIO RISKS USING CONDITIONAL COPULA-AR-GARCH MODEL MEASURING PORTFOLIO RISKS USING CONDITIONAL COPULA-AR-GARCH MODEL Isariya Suttakulpiboon MSc in Risk Management and Insurance Georgia State University, 30303 Atlanta, Georgia Email: suttakul.i@gmail.com,

More information

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing Examples

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing Examples Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing Examples M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

**BEGINNING OF EXAMINATION** A random sample of five observations from a population is:

**BEGINNING OF EXAMINATION** A random sample of five observations from a population is: **BEGINNING OF EXAMINATION** 1. You are given: (i) A random sample of five observations from a population is: 0.2 0.7 0.9 1.1 1.3 (ii) You use the Kolmogorov-Smirnov test for testing the null hypothesis,

More information

When we look at a random variable, such as Y, one of the first things we want to know, is what is it s distribution?

When we look at a random variable, such as Y, one of the first things we want to know, is what is it s distribution? Distributions 1. What are distributions? When we look at a random variable, such as Y, one of the first things we want to know, is what is it s distribution? In other words, if we have a large number of

More information

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE AP STATISTICS Name: FALL SEMESTSER FINAL EXAM STUDY GUIDE Period: *Go over Vocabulary Notecards! *This is not a comprehensive review you still should look over your past notes, homework/practice, Quizzes,

More information

Lecture 1: The Econometrics of Financial Returns

Lecture 1: The Econometrics of Financial Returns Lecture 1: The Econometrics of Financial Returns Prof. Massimo Guidolin 20192 Financial Econometrics Winter/Spring 2016 Overview General goals of the course and definition of risk(s) Predicting asset returns:

More information

The probability of having a very tall person in our sample. We look to see how this random variable is distributed.

The probability of having a very tall person in our sample. We look to see how this random variable is distributed. Distributions We're doing things a bit differently than in the text (it's very similar to BIOL 214/312 if you've had either of those courses). 1. What are distributions? When we look at a random variable,

More information

We use probability distributions to represent the distribution of a discrete random variable.

We use probability distributions to represent the distribution of a discrete random variable. Now we focus on discrete random variables. We will look at these in general, including calculating the mean and standard deviation. Then we will look more in depth at binomial random variables which are

More information

Sampling Distributions

Sampling Distributions AP Statistics Ch. 7 Notes Sampling Distributions A major field of statistics is statistical inference, which is using information from a sample to draw conclusions about a wider population. Parameter:

More information

The proof of Twin Primes Conjecture. Author: Ramón Ruiz Barcelona, Spain August 2014

The proof of Twin Primes Conjecture. Author: Ramón Ruiz Barcelona, Spain   August 2014 The proof of Twin Primes Conjecture Author: Ramón Ruiz Barcelona, Spain Email: ramonruiz1742@gmail.com August 2014 Abstract. Twin Primes Conjecture statement: There are infinitely many primes p such that

More information

3 ˆθ B = X 1 + X 2 + X 3. 7 a) Find the Bias, Variance and MSE of each estimator. Which estimator is the best according

3 ˆθ B = X 1 + X 2 + X 3. 7 a) Find the Bias, Variance and MSE of each estimator. Which estimator is the best according STAT 345 Spring 2018 Homework 9 - Point Estimation Name: Please adhere to the homework rules as given in the Syllabus. 1. Mean Squared Error. Suppose that X 1, X 2 and X 3 are independent random variables

More information

Statistical Analysis of Data from the Stock Markets. UiO-STK4510 Autumn 2015

Statistical Analysis of Data from the Stock Markets. UiO-STK4510 Autumn 2015 Statistical Analysis of Data from the Stock Markets UiO-STK4510 Autumn 2015 Sampling Conventions We observe the price process S of some stock (or stock index) at times ft i g i=0,...,n, we denote it by

More information

Chapter 8 Statistical Intervals for a Single Sample

Chapter 8 Statistical Intervals for a Single Sample Chapter 8 Statistical Intervals for a Single Sample Part 1: Confidence intervals (CI) for population mean µ Section 8-1: CI for µ when σ 2 known & drawing from normal distribution Section 8-1.2: Sample

More information

MA300.2 Game Theory 2005, LSE

MA300.2 Game Theory 2005, LSE MA300.2 Game Theory 2005, LSE Answers to Problem Set 2 [1] (a) This is standard (we have even done it in class). The one-shot Cournot outputs can be computed to be A/3, while the payoff to each firm can

More information

KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI

KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI 88 P a g e B S ( B B A ) S y l l a b u s KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI Course Title : STATISTICS Course Number : BA(BS) 532 Credit Hours : 03 Course 1. Statistical

More information

Approximate Revenue Maximization with Multiple Items

Approximate Revenue Maximization with Multiple Items Approximate Revenue Maximization with Multiple Items Nir Shabbat - 05305311 December 5, 2012 Introduction The paper I read is called Approximate Revenue Maximization with Multiple Items by Sergiu Hart

More information