- Kimberly Atkinson
- 5 years ago
Lecture 3: fit

A common problem in applications is to find a density that fits an experimental sample well. Given a sample $x_1, \dots, x_n$, we look for a density $f$ that may have generated it. For a given sample there exist infinitely many such densities (think of the case $n = 1$). For some of them the sample is natural and typical; for others it is extreme and unusual, even if possible. We look for a density for which the sample is typical.

Let us treat two examples:
- the results of a test for future students in medicine (a large and regular sample);
- the intensity of the last 19 volcanic eruptions at Campi Flegrei (few data, one outlier).

Load in R the file dati_campi_flegrei.txt (first column, except the last component) and save the data in the vector Piro:

    A <- read.table(file = "dati_campi_flegrei.txt", header = TRUE)
    Piro <- A[1:19, 1]

Load also test_medicina.txt, saved in the vector Medi:

    B <- read.table(file = "test_medicina.txt", header = TRUE)
    Medi <- B[, 2]

These are the Piro data:

    5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7, 7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1

Histograms and empirical cumulatives

A histogram is a kind of empirical density. It is not uniquely determined by the data, however: it depends on the choice of classes. Let us see two histograms of Piro, hist(Piro) and hist(Piro, 15):

[Figures: two histograms of Piro, with default classes and with 15 classes; x axis Piro, y axis Frequency]

They show absolute frequencies. If we want area one under the graph, we use hist(x, 15, freq = FALSE):
[Figure: histogram of Piro on the density scale; x axis Piro]

We get a first idea of the data and of the probability of different values. Because of the outlier 347.6, most of the histogram is squeezed to the left. We may expand it by removing the outlier:

    Piro.cut <- c(Piro[1:11], Piro[13:19])
    hist(Piro.cut, 7, freq = FALSE)

[Figure: histogram of Piro.cut; x axis Piro.cut]

From the expanded plot we note that there is no ascending part on the left, as there is in Weibull or Gamma distributions with shape $> 1$. Thus, if we use a Weibull, we choose shape $\le 1$.

The histogram of Medi is much more regular:

[Figure: histogram of Medi]

Let us plot the empirical cumulative distribution function with plot.ecdf(Piro). It is absolute: there is no choice of classes. For Piro and Medi:

[Figure: empirical cdf, first plot; y axis Fn(x)]
[Figure: empirical cdf, second plot; y axis Fn(x)]

Parametric and non-parametric methods

Using a parametric method means choosing a class of distributions (Weibull, normal, etc.) characterized by a few parameters (usually 2) and looking for the best parameters; one then compares the results of different classes. Non-parametric methods search for a density in very large classes, with a very large number of degrees of freedom. Such classes may also be parametrized, but with too many parameters (sometimes infinitely many). Thus they are very flexible and fit the data very closely.

The previous histograms help us choose the parametric class. For instance, we exclude Gaussians for Piro, as well as Beta, but examine Weibull and possibly Gamma. Moreover, the decreasing shape of the histogram suggests shape $\le 1$. For Medi, on the other hand, Gaussians look suitable, although there is a mild asymmetry. Recalling the way Gamma and Weibull are asymmetric, it is more natural to try Weibull.

For the Piro data there is an outlier, so presumably a heavy (sub-exponential) tail. Gamma distributions are not sub-exponential; Weibull distributions are, if shape $< 1$. Another class offered by R is the log-normal.

Summarizing: Gaussian and Weibull for Medi, Weibull and log-normal for Piro.

One more distribution: log-normal

If $X$ is Gaussian (normal), the random variable $Y = e^X$ is called log-normal. Because $X$ sits in the exponent, $Y$ sometimes takes very large values. For instance, if $X$ takes typical values in 2-4 but sometimes 5, the typical values of $Y$ will be 7-55, but sometimes 150. This is exactly what happens with Piro. The parameters of a log-normal are the mean and standard deviation of the corresponding Gaussian. To mimic the numbers just given, take a Gaussian with mean 3 and standard deviation 1. We have:

    x <- 1:100
    y <- dlnorm(x, 3, 1)
    plot(x, y)
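As a quick check of the numbers quoted above, the following sketch maps the values 2, 4 and 5 through the exponential and then simulates a log-normal sample. The sample size of 10000 and the seed are illustrative assumptions, not part of the lecture.

```r
# Values of X quoted in the text and the corresponding Y = exp(X)
x_values <- c(2, 4, 5)
y_values <- exp(x_values)
print(round(y_values, 1))   # 7.4 54.6 148.4

# Simulate Y = exp(X) with X ~ N(3, 1); sample size and seed are arbitrary choices
set.seed(1)
y_sample <- rlnorm(10000, meanlog = 3, sdlog = 1)

# Fraction of simulated values in the "typical" range 7-55 and above 150
frac_typical <- mean(y_sample >= 7 & y_sample <= 55)
frac_large <- mean(y_sample > 150)
print(c(typical = frac_typical, large = frac_large))
```

Most simulated values fall in the typical range, while a small fraction exceeds 150, which is the qualitative behaviour described for Piro.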
[Figure: plot of the log-normal density y against x]

The only qualitative drawback of this distribution, for Piro, is the initial ascending part. But it is very steep, so we may choose to ignore it. The heavy tail can be seen from the definition, from the graph, or from the density:

$$f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left( -\frac{(\log x - \mu)^2}{2\sigma^2} \right) \quad \text{for } x > 0.$$

Exponential and logarithm compensate: since $\exp\left(-(\log x)^2 / 2\sigma^2\right) = x^{-\log x / (2\sigma^2)}$, the decay is power-like, with a slowly growing exponent, and much slower than exponential.

A non-parametric method

Let us run:

    require(KernSmooth)
    density <- bkde(Piro, kernel = "normal", bandwidth = 20)
    plot(density, type = "l")

[Figures: two plots of the kernel density estimate; x axis density$x, y axis density$y]

The package KernSmooth (kernel smoothing) has to be loaded explicitly, since it is not loaded by default. Its aim is to find non-parametric densities, using smoothing methods based on suitable kernels. There are several kernels; we try another one below. The feature of this method is that it fits our data very closely. Run:

    hist(Piro, 15, freq = FALSE)
    lines(density, type = "l")
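To see what kernel smoothing does, the estimate can be written by hand as an average of Gaussian bumps centred at the data points. This is a minimal sketch in base R, not the binned algorithm used by bkde; it assumes that the bandwidth plays the role of the standard deviation of each bump, and the grid limits are arbitrary.

```r
# Piro data as listed in the lecture
piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)

h <- 20                          # bandwidth, as in the bkde call
grid <- seq(-100, 500, by = 1)   # evaluation grid (arbitrary limits)

# Kernel estimate: average of normal densities centred at each data point
kde <- sapply(grid, function(t) mean(dnorm(t, mean = piro, sd = h)))

# The estimate integrates to about one (Riemann sum with step 1)
area <- sum(kde) * 1

# The outlier produces its own isolated bump, with almost no mass in the gap
kde_at_outlier <- kde[which.min(abs(grid - 347.6))]
kde_at_gap <- kde[which.min(abs(grid - 200))]
print(c(area = area, at_outlier = kde_at_outlier, at_gap = kde_at_gap))
```

The isolated bump around 347.6 illustrates how closely such an estimate follows this particular sample.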
[Figure: histogram of Piro with the kernel density estimate superimposed; x axis Piro]

The drawback of this method, for us, is its main feature: it stays too close to these particular data. Does the precise value of the outlier have a physical meaning, or might we get 527 or 293 next time? In this example we think it has no absolute meaning. Thus the density given by kernel smoothing is not physical.

Parameter estimation

Assume we have chosen a class and want to find the optimal parameters. Two classical approaches are the method of Maximum Likelihood (ML) and the method of moments. We may also find the parameters by optimizing other quantities, like the $L^1$ distance described below. Here we describe only ML.

Given a density $f$ and an experimental value $x$, the number $f(x)$ is not the probability of $x$ (which is zero). It is called, however, the likelihood of $x$. Given a sample $x_1, \dots, x_n$, the product

$$L(x_1, \dots, x_n) = f(x_1) \cdots f(x_n)$$

is called the likelihood of $x_1, \dots, x_n$. When the density depends on parameters, say $a, s$, we write $f(x \mid a, s)$ and $L(x_1, \dots, x_n \mid a, s)$. The ML method is: given a sample $x_1, \dots, x_n$, find the pair $(a, s)$ which maximizes $L(x_1, \dots, x_n \mid a, s)$. If it were a probability, we could ask: which choice of parameters maximizes the probability of our sample?

Since most probability densities involve exponentials and products, taking the logarithm is convenient: $\log L(x_1, \dots, x_n \mid a, s)$. Maximizing it is equivalent. If this function is differentiable in $(a, s)$ and the maximum lies inside the domain of definition, we must have

$$\nabla_{a,s} \log L(x_1, \dots, x_n \mid a, s) = 0.$$

These are the ML equations. Sometimes they can be solved explicitly; otherwise numerical optimization is needed.

R provides a routine to compute ML estimates of the parameters for several classes of densities: fitdistr. In our cases:

    require(MASS)
    fitdistr(Piro, "weibull")
    fitdistr(Piro, "weibull", list(shape = 0.5, scale = 20))
    fitdistr(Piro, "weibull", list(shape = 2, scale = 100))
    fitdistr(Piro, "log-normal")
    fitdistr(Medi, "normal")
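To make the ML principle concrete, here is a hedged sketch that maximizes the log-normal log-likelihood of the Piro data numerically with optim and compares the result with the explicit solution of the ML equations. The function name neg_log_lik and the starting point are illustrative assumptions; this is not the internal procedure of fitdistr.

```r
# Piro data as listed in the lecture
piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)

# Negative log-likelihood of a log-normal; sdlog is parametrized on the
# log scale so that the optimizer cannot propose a negative value
neg_log_lik <- function(par, data) {
  -sum(dlnorm(data, meanlog = par[1], sdlog = exp(par[2]), log = TRUE))
}

# Numerical maximization of the likelihood (arbitrary starting point)
fit <- optim(c(2, 0), neg_log_lik, data = piro)
m_num <- fit$par[1]
s_num <- exp(fit$par[2])

# For the log-normal the ML equations can be solved explicitly:
# mean and (uncorrected) standard deviation of the logarithms
m_hat <- mean(log(piro))
s_hat <- sqrt(mean((log(piro) - m_hat)^2))

print(c(m_num = m_num, s_num = s_num, m_hat = m_hat, s_hat = s_hat))
```

Both routes should give values close to the log-normal estimates reported in the lecture (about 3.09 and 1.02).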
For comparison with the Gaussian fit, we also compute

    mean(Medi)
    sd(Medi)

The call fitdistr(Medi, "weibull") gives an error because of negative values. We remove them, saving the result in Medi.plus, and run

    fitdistr(Medi.plus, "weibull")

We also changed the initial guesses of the parameters in fitdistr(Piro, "weibull") to check that the maximum did not change. We also checked that the Gaussian fit simply takes the empirical mean and standard deviation (the method of moments, in its simplest case). The results are:

    fitdistr(Piro, "weibull"):      0.85, [...]
    fitdistr(Piro, "log-normal"):   3.09, 1.02
    fitdistr(Medi, "normal"):       34.97, [...]
    fitdistr(Medi, "weibull"):      3.58, [...]

Comparison between density and histogram

The first idea is to compare density and histogram. Let us see Piro with Weibull and log-normal:

    a <- ...    # shape (value missing in the source)
    s <- ...    # scale (value missing in the source)
    x <- (0:5000)/10
    hist(Piro, 15, freq = FALSE)
    y <- dweibull(x, a, s)
    lines(x, y)

[Figures: histogram of Piro with the fitted Weibull density and with the fitted log-normal density superimposed; x axis Piro]

Both look reasonable, but the comparison is very difficult. Not so different is the Weibull with parameters

    a <- 0.8
    s <- 100
[Figure: histogram of Piro with a Weibull density superimposed; x axis Piro]

The fit of the outlier looks improved, at the cost of a slightly worse fit elsewhere. We do not say that this kind of comparison is useless, only that it is neither trivial nor conclusive.

Let us see Medi, with Gaussian and Weibull:

[Figures: histogram of Medi and histogram of Medi.plus, each with a fitted density superimposed]

Both are very good. There is no evidence that the Weibull improves the fit by coping with the asymmetry (a Weibull with those parameters is almost symmetric).

We have seen one example, Medi, where the comparison between density and histogram is convincing, and another where it is poor. The presence of an outlier will always degrade such a comparison. Indeed, to be physical, a density must be spread over a wide range, not only around the outlier.

Comparison between cumulatives

Another comparison is between the empirical and theoretical cumulatives. For Piro, with Weibull and log-normal, we have

    a <- ...    # shape (value missing in the source)
    s <- ...    # scale (value missing in the source)
    x <- (0:5000)/10
    plot.ecdf(Piro)
    y <- pweibull(x, a, s)
    lines(x, y)
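As a numerical companion to these plots, the following sketch measures the largest vertical gap between the empirical cdf and a candidate cdf. This maximum-gap summary is not used in the lecture; it is introduced here only for illustration, and the helper name max_gap is hypothetical. The log-normal parameters (3.09, 1.02) are the ML values reported above, and the Weibull parameters (0.8, 100) are the hand-chosen ones.

```r
# Piro data as listed in the lecture
piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)

# Largest vertical gap between the empirical cdf and a theoretical cdf,
# checked just before and just after each jump of the empirical cdf
max_gap <- function(sample, cdf) {
  x <- sort(sample)
  n <- length(x)
  theo <- cdf(x)
  max(pmax(abs((1:n) / n - theo), abs((0:(n - 1)) / n - theo)))
}

d_lnorm <- max_gap(piro, function(t) plnorm(t, meanlog = 3.09, sdlog = 1.02))
d_weib <- max_gap(piro, function(t) pweibull(t, shape = 0.8, scale = 100))

print(c(lognormal = d_lnorm, weibull_0.8_100 = d_weib))
```

A smaller gap means the theoretical curve stays closer to the empirical steps over the whole range.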
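[Figures: empirical cdf of Piro with the fitted Weibull cdf and with the fitted log-normal cdf; y axis Fn(x)]

Here, for the first time, we have a hint of the superiority of the log-normal. If we try again the Weibull with

    a <- 0.8
    s <- 100

we get

[Figure: empirical cdf of Piro with this Weibull cdf; y axis Fn(x)]

which is much worse. Thus the comparison of cumulatives is very informative.

For Medi, with Gaussian and Weibull:

[Figure: empirical cdf of Medi with a fitted cdf, first plot; y axis Fn(x)]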
[Figure: empirical cdf of Medi with a fitted cdf, second plot; y axis Fn(x)]

Both look perfect. However, we notice a very small discrepancy in the tails. The right tail is fitted better by the Weibull and the left tail by the Gaussian, though not by much. Recall that a Weibull of shape $a$ decays as $\exp(-x^{3.58})$ (here $a = 3.58$, up to scale constants), while a Gaussian decays as $\exp(-x^2)$. The decay on the right is very strong (even stronger than that of a Weibull with $a = 3.58$). The decay on the left is slower than Gaussian.

Comparison between samples

Another comparison, essentially heuristic, is based on generating a sample from the given distribution. Try with

    a <- ...    # shape (value missing in the source)
    s <- ...    # scale (value missing in the source)
    rweibull(19, a, s)
    Piro

If we repeat this a few times, we usually get numbers similar to those of Piro, except that most often we do not get numbers of the order of 300. The same holds for the log-normal. This is the only hint, so far, that we have underestimated the outlier. Traditional fitting methods have this tendency. One can see that the parameters

    m <- ...    # value missing in the source
    s <- 1.3
    rlnorm(19, m, s)

give samples that are still similar to Piro but, most of the time, with outliers of the right order. The comparison between cumulatives is good:

[Figure: empirical cdf of Piro with the modified log-normal cdf; y axis Fn(x)]
From this plot we see why this choice is better for the outlier. Which case should we prefer?

Q-Q plot

To describe this method, we need the definition of quantile: it is the inverse of the cdf. In all our examples the cdf $F$ is continuous and strictly increasing (except maybe on half-lines). Therefore, given $\alpha \in (0,1)$, there exists one and only one number $q_\alpha$ such that $F(q_\alpha) = \alpha$. The number $q_\alpha$ is called the quantile of order $\alpha$. For instance, if $\alpha = 5\%$ it is also called the 5th percentile (if $\alpha = 25\%$, the 25th percentile, and so on). Moreover, the 25th, 50th and 75th percentiles are also called the first, second and third quartiles.

The empirical cdf $\hat F$ is defined as follows: given a sample $x_1, \dots, x_n$, we order it; if $x_{(1)} \le \dots \le x_{(n)}$ is the result, we set

$$\hat F(x_{(i)}) = \frac{i}{n}.$$

Some people prefer

$$\hat F(x_{(i)}) = \frac{i - 0.5}{n},$$

which is more symmetric. If the sample comes from a cdf $F$, then $\hat F(x_{(i)})$ is nearly equal to $F(x_{(i)})$. Applying the inverse of $F$ (the quantile), we get that $q_{\hat F(x_{(i)})}$ is roughly equal to $x_{(i)} = q_{F(x_{(i)})}$. But then the points $(x_{(i)}, q_{\hat F(x_{(i)})})$ will be close to the line $y = x$. We plot these points and get a feeling for the goodness of fit. For Piro, with Weibull and log-normal:

    Dati <- Piro
    a <- ...    # shape (value missing in the source)
    s <- ...    # scale (value missing in the source)
    quant <- function(x) {qweibull(x, a, s)}
    x <- 1:500
    L <- length(Dati)
    F.hat <- (1:L)/L - 0.5/L
    Dati.ord <- sort(Dati)
    plot(x, x, type = "l")
    q <- quant(F.hat)
    lines(Dati.ord, q, type = "b")
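The same construction can be sketched in self-contained form for the ML log-normal, using the parameter values 3.09 and 1.02 reported earlier. The variable names are illustrative, and the plot is drawn only in an interactive session so that the script also runs in batch mode.

```r
# Piro data as listed in the lecture
piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)

n <- length(piro)
f_hat <- ((1:n) - 0.5) / n      # symmetric version of the empirical cdf
piro_ord <- sort(piro)          # ordered sample x_(1) <= ... <= x_(n)

# Theoretical quantiles of the ML log-normal at the empirical cdf values
q <- qlnorm(f_hat, meanlog = 3.09, sdlog = 1.02)

# Points (x_(i), q_i) against the line y = x
if (interactive()) {
  plot(piro_ord, q, type = "b",
       xlab = "ordered sample", ylab = "theoretical quantile")
  abline(0, 1)
}

# Largest three sample values next to their theoretical quantiles
print(cbind(sample = piro_ord, quantile = round(q, 1))[(n - 2):n, ])
```

The last row shows that the largest theoretical quantile stays well below the observed outlier, which is the underestimation discussed in the comparison between samples.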
Let us add the modified log-normal (s = 1.3), which clearly shows what happens: the fit of the outlier improves, while the fit of some other points gets worse. The ML log-normal is better than the ML Weibull; our modified log-normal is good as well and improves the outlier.

For Medi, with Gaussian and Weibull:

    L <- length(Medi)
    F.hat <- (1:L)/L - 0.5/L
    Medi.ord <- sort(Medi)
    m <- ...    # mean (value missing in the source)
    s <- ...    # standard deviation (value missing in the source)
    q <- qnorm(F.hat, m, s)
    x <- (0:700)/10
    plot(x, x, type = "l")
    lines(Medi.ord, q, type = "b")
The result is surprising! We expected a very strong fit, and instead we see very clearly the shortcomings in the tails. The problem is only there: the body of the distribution is fitted perfectly. The pictures seen so far were dominated by the body. This Q-Q plot confirms what we saw previously: the decay on the right is very fast (a little faster than a Weibull with $a = 3.58$, which is nevertheless very good), while on the left it is slower than Gaussian.

Numerical summaries, distances

After several graphical comparisons, let us look at some numerical ones. We anticipate that they will not be much better than the graphical ones, but they will add some information. One of the problems with them is that there are too many. If we use these indexes to compare two given distributions, it may work: most of them will give the same ranking. If, on the contrary, we hope to use them to identify the optimal density in a class, or similarly to prove that the ML density is the best, we get into trouble: usually, the optimal parameters depend on the index. To summarize, a certain degree of subjectivity remains and cannot be eliminated by the numerical indexes.

A distance between cumulatives

Among the many possible ones, a particularly natural choice is the $L^1$ distance between the empirical and theoretical cumulatives:

$$I := \int |\hat F(x) - F(x)| \, dx.$$

It measures the distance between the empirical and theoretical probabilities of events of the form $X \le t$, averaged over $t$. For dimensional and expository reasons, it may be convenient to use the following small variant, which we may call the error of fit:
$$E = 100 \, \frac{I}{x_{\max} - x_{\min}}$$

where $x_{\max}$ and $x_{\min}$ refer to the sample $x_1, \dots, x_n$. The results for Piro are:

- ML Weibull: $E = 6.13$
- ML log-normal: $E = 5.39$, the best of the two
- modified log-normal: $E = 5.96$, better than ML Weibull.

Exercise. Write R code which computes, for every positive number $k$, the index

$$I_k := \int |\hat F(x) - F(x)|^k \, dx$$

and the error of degree $k$:

$$E_k = 100 \left( \frac{I_k}{x_{\max} - x_{\min}} \right)^{1/k}.$$

Which discrepancies between the densities are captured as $k$ varies? (Pay attention to the typical dimensions of the numbers involved.)

Are these values typical?

We may use the error $E$ to compare different densities, as above. We may use it to compute optimal parameters. But we may also use it as a statistical test, to understand, for instance, whether the ML log-normal is acceptable in itself (not whether it is better than another density). We do it in the following way. Consider the ML log-normal. Generate from it a sample of cardinality 19 and compute its error $E$ with respect to our log-normal. Repeat 1000 times, getting 1000 values of $E$: $e_1, \dots, e_{1000}$. A fraction $k/1000$ of them will be greater than the value $e$ obtained by comparing the experimental sample with the log-normal. We interpret $k/1000$ as the probability that, at random, from that log-normal we get a sample as extreme as the experimental one. We call $k/1000$ (or $k/10000$, etc., depending on the number of trials) the empirical p-value.

If the p-value is small, for example 0.05 or less, it means that it was not easy to get such a sample at random. This indicates that the log-normal is not natural enough. If, on the contrary, the p-value is not so small, even just 0.15, we cannot exclude that the sample comes from that distribution. In the end, we have a criterion to reject or not reject a distribution. Not rejecting does not mean confirming: several other distributions share the same property of non-rejection.

The code gives us $E$, the p-value and the histogram of $e_1, \dots, e_{1000}$. For Piro, ML log-normal: $E = 5.39$, p-value $= 0.214$.
[Figure: histogram of the simulated errors, plot title ending in "* I.rand / Range"; y axis Frequency]

(The p-value varies a little from trial to trial.) We cannot reject this distribution, although this is an indication that the fit is not so good. Much worse is the result for the ML Weibull: $E = 6.127$, p-value [...]

[Figure: histogram of the simulated errors for the ML Weibull, plot title ending in "* I.rand / Range"; y axis Frequency]

All methods confirm the superiority of the log-normal fit.

Exercise. Find the p-value for the error of degree $k$ introduced in the exercise above.

Exercise. Analyze the data of this lecture by means of the Gamma class. Remember to use dgamma(x, shape = a, scale = s), etc.
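The lecture's own code for the error of fit and the empirical p-value is not shown. A minimal sketch of the procedure described above, for the ML log-normal with parameters 3.09 and 1.02, could look as follows. The function names, the integration grid (0 to 5000 with step 0.5) and the seed are assumptions, so the printed values need not coincide with the figures quoted in the lecture.

```r
# Piro data as listed in the lecture
piro <- c(5.4, 9.3, 23.4, 10, 27.6, 29.5, 52.9, 44.3, 18.3, 38.7,
          7.4, 347.6, 5.3, 19.1, 44.3, 29.5, 71.2, 5.4, 18.1)

grid <- seq(0, 5000, by = 0.5)                       # assumed integration grid
cdf_grid <- plnorm(grid, meanlog = 3.09, sdlog = 1.02)

# L1 distance between empirical and theoretical cdf (Riemann sum on the grid)
l1_distance <- function(sample) {
  f_hat <- ecdf(sample)
  sum(abs(f_hat(grid) - cdf_grid)) * (grid[2] - grid[1])
}

# Error of fit: E = 100 * I / (max - min), with max and min of the sample
error_of_fit <- function(sample) {
  100 * l1_distance(sample) / (max(sample) - min(sample))
}

e_obs <- error_of_fit(piro)

# Monte Carlo: errors of samples of size 19 drawn from the same log-normal
set.seed(1)
n_rep <- 1000
e_rand <- replicate(n_rep, error_of_fit(rlnorm(length(piro), 3.09, 1.02)))

# Empirical p-value: fraction of simulated errors larger than the observed one
p_value <- mean(e_rand > e_obs)
print(c(E = e_obs, p_value = p_value))

if (interactive()) hist(e_rand)
```

Replacing the absolute difference by its k-th power and taking the k-th root of the normalized integral gives one possible starting point for the exercises on the error of degree $k$.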
MITOCW Recitation 6 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make
More informationQuality Digest Daily, March 2, 2015 Manuscript 279. Probability Limits. A long standing controversy. Donald J. Wheeler
Quality Digest Daily, March 2, 2015 Manuscript 279 A long standing controversy Donald J. Wheeler Shewhart explored many ways of detecting process changes. Along the way he considered the analysis of variance,
More informationChapter 4: Commonly Used Distributions. Statistics for Engineers and Scientists Fourth Edition William Navidi
Chapter 4: Commonly Used Distributions Statistics for Engineers and Scientists Fourth Edition William Navidi 2014 by Education. This is proprietary material solely for authorized instructor use. Not authorized
More informationTHE USE OF THE LOGNORMAL DISTRIBUTION IN ANALYZING INCOMES
International Days of tatistics and Economics Prague eptember -3 011 THE UE OF THE LOGNORMAL DITRIBUTION IN ANALYZING INCOME Jakub Nedvěd Abstract Object of this paper is to examine the possibility of
More information4.3 The money-making machine.
. The money-making machine. You have access to a magical money making machine. You can put in any amount of money you want, between and $, and pull the big brass handle, and some payoff will come pouring
More informationQQ PLOT Yunsi Wang, Tyler Steele, Eva Zhang Spring 2016
QQ PLOT INTERPRETATION: Quantiles: QQ PLOT Yunsi Wang, Tyler Steele, Eva Zhang Spring 2016 The quantiles are values dividing a probability distribution into equal intervals, with every interval having
More informationChapter 6. y y. Standardizing with z-scores. Standardizing with z-scores (cont.)
Starter Ch. 6: A z-score Analysis Starter Ch. 6 Your Statistics teacher has announced that the lower of your two tests will be dropped. You got a 90 on test 1 and an 85 on test 2. You re all set to drop
More informationModeling Obesity and S&P500 Using Normal Inverse Gaussian
Modeling Obesity and S&P500 Using Normal Inverse Gaussian Presented by Keith Resendes and Jorge Fernandes University of Massachusetts, Dartmouth August 16, 2012 Diabetes and Obesity Data Data obtained
More informationEconS Income E ects
EconS 305 - Income E ects Eric Dunaway Washington State University eric.dunaway@wsu.edu September 23, 2015 Eric Dunaway (WSU) EconS 305 - Lecture 13 September 23, 2015 1 / 41 Introduction Over the net
More information3. The Discount Factor
3. he Discount Factor Objectives Eplanation of - Eistence of Discount Factors: Necessary and Sufficient Conditions - Positive Discount Factors: Necessary and Sufficient Conditions Contents 3. he Discount
More information9. Logit and Probit Models For Dichotomous Data
Sociology 740 John Fox Lecture Notes 9. Logit and Probit Models For Dichotomous Data Copyright 2014 by John Fox Logit and Probit Models for Dichotomous Responses 1 1. Goals: I To show how models similar
More information1 Describing Distributions with numbers
1 Describing Distributions with numbers Only for quantitative variables!! 1.1 Describing the center of a data set The mean of a set of numerical observation is the familiar arithmetic average. To write
More informationMost of the transformations we will deal with will be in the families of powers and roots: p X -> (X -1)/-1.
Powers and Roots Quite often when we re dealing with quantitative data, it turns out that for the purposes of analysis, it is useful to carry out a transformation of one of the variables of interest. This
More informationAppendix A. Selecting and Using Probability Distributions. In this appendix
Appendix A Selecting and Using Probability Distributions In this appendix Understanding probability distributions Selecting a probability distribution Using basic distributions Using continuous distributions
More informationThe mean-variance portfolio choice framework and its generalizations
The mean-variance portfolio choice framework and its generalizations Prof. Massimo Guidolin 20135 Theory of Finance, Part I (Sept. October) Fall 2014 Outline and objectives The backward, three-step solution
More informationClass 13. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700
Class 13 Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science Copyright 017 by D.B. Rowe 1 Agenda: Recap Chapter 6.3 6.5 Lecture Chapter 7.1 7. Review Chapter 5 for Eam 3.
More informationTerms & Characteristics
NORMAL CURVE Knowledge that a variable is distributed normally can be helpful in drawing inferences as to how frequently certain observations are likely to occur. NORMAL CURVE A Normal distribution: Distribution
More informationFitting financial time series returns distributions: a mixture normality approach
Fitting financial time series returns distributions: a mixture normality approach Riccardo Bramante and Diego Zappa * Abstract Value at Risk has emerged as a useful tool to risk management. A relevant
More informationThe Normal Distribution
Stat 6 Introduction to Business Statistics I Spring 009 Professor: Dr. Petrutza Caragea Section A Tuesdays and Thursdays 9:300:50 a.m. Chapter, Section.3 The Normal Distribution Density Curves So far we
More informationWeek 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics.
Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Convergent validity: the degree to which results/evidence from different tests/sources, converge on the same conclusion.
More informationMEASURING PORTFOLIO RISKS USING CONDITIONAL COPULA-AR-GARCH MODEL
MEASURING PORTFOLIO RISKS USING CONDITIONAL COPULA-AR-GARCH MODEL Isariya Suttakulpiboon MSc in Risk Management and Insurance Georgia State University, 30303 Atlanta, Georgia Email: suttakul.i@gmail.com,
More informationQuantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing Examples
Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing Examples M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu
More information**BEGINNING OF EXAMINATION** A random sample of five observations from a population is:
**BEGINNING OF EXAMINATION** 1. You are given: (i) A random sample of five observations from a population is: 0.2 0.7 0.9 1.1 1.3 (ii) You use the Kolmogorov-Smirnov test for testing the null hypothesis,
More informationWhen we look at a random variable, such as Y, one of the first things we want to know, is what is it s distribution?
Distributions 1. What are distributions? When we look at a random variable, such as Y, one of the first things we want to know, is what is it s distribution? In other words, if we have a large number of
More informationAP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE
AP STATISTICS Name: FALL SEMESTSER FINAL EXAM STUDY GUIDE Period: *Go over Vocabulary Notecards! *This is not a comprehensive review you still should look over your past notes, homework/practice, Quizzes,
More informationLecture 1: The Econometrics of Financial Returns
Lecture 1: The Econometrics of Financial Returns Prof. Massimo Guidolin 20192 Financial Econometrics Winter/Spring 2016 Overview General goals of the course and definition of risk(s) Predicting asset returns:
More informationThe probability of having a very tall person in our sample. We look to see how this random variable is distributed.
Distributions We're doing things a bit differently than in the text (it's very similar to BIOL 214/312 if you've had either of those courses). 1. What are distributions? When we look at a random variable,
More informationWe use probability distributions to represent the distribution of a discrete random variable.
Now we focus on discrete random variables. We will look at these in general, including calculating the mean and standard deviation. Then we will look more in depth at binomial random variables which are
More informationSampling Distributions
AP Statistics Ch. 7 Notes Sampling Distributions A major field of statistics is statistical inference, which is using information from a sample to draw conclusions about a wider population. Parameter:
More informationThe proof of Twin Primes Conjecture. Author: Ramón Ruiz Barcelona, Spain August 2014
The proof of Twin Primes Conjecture Author: Ramón Ruiz Barcelona, Spain Email: ramonruiz1742@gmail.com August 2014 Abstract. Twin Primes Conjecture statement: There are infinitely many primes p such that
More information3 ˆθ B = X 1 + X 2 + X 3. 7 a) Find the Bias, Variance and MSE of each estimator. Which estimator is the best according
STAT 345 Spring 2018 Homework 9 - Point Estimation Name: Please adhere to the homework rules as given in the Syllabus. 1. Mean Squared Error. Suppose that X 1, X 2 and X 3 are independent random variables
More informationStatistical Analysis of Data from the Stock Markets. UiO-STK4510 Autumn 2015
Statistical Analysis of Data from the Stock Markets UiO-STK4510 Autumn 2015 Sampling Conventions We observe the price process S of some stock (or stock index) at times ft i g i=0,...,n, we denote it by
More informationChapter 8 Statistical Intervals for a Single Sample
Chapter 8 Statistical Intervals for a Single Sample Part 1: Confidence intervals (CI) for population mean µ Section 8-1: CI for µ when σ 2 known & drawing from normal distribution Section 8-1.2: Sample
More informationMA300.2 Game Theory 2005, LSE
MA300.2 Game Theory 2005, LSE Answers to Problem Set 2 [1] (a) This is standard (we have even done it in class). The one-shot Cournot outputs can be computed to be A/3, while the payoff to each firm can
More informationKARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI
88 P a g e B S ( B B A ) S y l l a b u s KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI Course Title : STATISTICS Course Number : BA(BS) 532 Credit Hours : 03 Course 1. Statistical
More informationApproximate Revenue Maximization with Multiple Items
Approximate Revenue Maximization with Multiple Items Nir Shabbat - 05305311 December 5, 2012 Introduction The paper I read is called Approximate Revenue Maximization with Multiple Items by Sergiu Hart
More information