Maximum Likelihood Estimates for Alpha and Beta With Zero SAIDI Days


Richard D. Christie
Department of Electrical Engineering
Box 352500
University of Washington
Seattle, WA 98195-2500
christie@ee.washington.edu
February 10, 2003

I. Introduction

[Chr03] explains how substituting the minimum SAIDI value for zero SAIDI days gave the most accurate, or least erroneous, results for Alpha and Beta compared with ignoring zero SAIDI days or using the median or average SAIDI value as a replacement. [Chr03] also stated that other statistical methods may be available. This document describes the statistically based maximum likelihood estimation (MLE) method of estimating the values of Alpha and Beta in data sets with zero SAIDI days. Two quantitative examples show that the MLE method is more accurate than minimum value substitution, which in turn is the most accurate of the proposed substitution methods.

The MLE method involves the iterative solution of a non-linear equation. This can be done interactively with a spreadsheet in a few minutes. The Working Group must determine whether the complexity of the method permits its adoption in P1366.

[Figure: pdf of the normal distribution of ln(x), plotted against ln(x). The left hand tail below the nominal minimum value is annotated "All these sample values become zeros".]

Figure 1 - Normal distribution showing location of zero samples in left hand tail

II. Background

In theory, an ideal log-normal distribution will never generate samples (daily SAIDI values) of zero. In practice, zero values appear because the real process is not exactly log-normal (it has some discrete components, since faults may or may not occur, as well as continuous ones) and because of the quantization of time (SAIDI per day). It may be useful to think of the sampling process as including a round-off step in which daily SAIDI values below some minimum are rounded to zero. These theoretical pre-round-off sample values, whose actual values cannot be measured, are all less than some minimum but greater than zero, and can be thought of as occupying the left hand tail of the normal distribution of the logs of the samples. This is shown in Figure 1. The value of the minimum shown there is somewhat large in order to emphasize the content of the tail of the distribution.

The question, then, is what to do about these zero values, which cannot be used to find the mean and standard deviation of the logs of the data (Alpha and Beta, respectively) because the log of a zero sample value is negative infinity.

The objective is to estimate values of Alpha and Beta. The actual values of Alpha and Beta are properties of the population of the values of SAIDI for all possible days (i.e. an infinite number of values) for a given utility, and are not knowable. The days are sampled, nominally five years' worth of samples, and computations are performed on this sample to estimate values for Alpha and Beta. The computations are called estimators. The commonly used and generally preferred estimators are called maximum likelihood estimators (MLEs) because they select the parameter values under which the observed sample is most probable.
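The round-off view of zero SAIDI days described in this section can be illustrated with a short simulation. The following Python sketch is not part of the original analysis: the function name, the minimum of 0.001 and the parameter values are assumptions chosen only for illustration. It draws log-normal daily values, rounds those below the minimum to zero, and shows that the logs of the surviving values give a mean that is too high and a standard deviation that is too low.

```python
import numpy as np


def simulate_rounded_saidi(alpha, beta, n_days, minimum, seed=0):
    """Draw log-normal daily SAIDI values and round those below `minimum` to zero.

    All arguments are illustrative assumptions, not values taken from utility data.
    """
    rng = np.random.default_rng(seed)
    saidi = np.exp(rng.normal(alpha, beta, size=n_days))
    saidi[saidi < minimum] = 0.0  # the hypothetical round-off step
    return saidi


minimum = 0.001  # hypothetical round-off level
saidi = simulate_rounded_saidi(alpha=-3.6, beta=2.0, n_days=5 * 365, minimum=minimum)

n_zero = int(np.sum(saidi == 0.0))          # zero SAIDI days created by round-off
nonzero_logs = np.log(saidi[saidi > 0.0])   # logs exist only for non-zero days

print("zero days:", n_zero)
print("mean of logs of non-zero days:", nonzero_logs.mean())
print("std of logs of non-zero days:", nonzero_logs.std())
```

Because only the left hand tail is lost, simply ignoring the zero days shifts the estimate of Alpha upward and shrinks the estimate of Beta.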
As it happens, for normal (Gaussian) distributions, the maximum likelihood estimator for the population mean is the average of the sample, and the maximum likelihood estimator for the variance is the square of the standard deviation of the sample. This is the method used to estimate Alpha and Beta from the natural logs of the daily SAIDI values when none of them are zero.

When some daily SAIDI values are zero, the problem becomes one of estimating Alpha and Beta from a censored sample, one that is missing values below a certain point. The sample is singly censored, because sample values are missing from only one side of the distribution. The maximum likelihood estimators for the mean and standard deviation of censored normal distributions are given in [Sch86], found via [Cro88].

III. Are Utility Reliability Distributions Really Log-Normal?

At this point it may be useful to revisit the issue of whether utility reliability distributions are log-normal, since some Working Group members have claimed they are not and provided graphical examples where the natural logs of the daily SAIDI values are, for example, somewhat bimodal.

The quick answer is that utility daily reliability distributions are not exactly log-normal, but log-normal is close enough to what they really are for all practical purposes.

Just as it is not possible to know the actual values of the mean and variance of the population of all possible daily reliability values, it is also not possible to formally state whether utility daily reliability values are or are not log-normally distributed. When someone makes such a statement, they are speaking informally. It is possible to make a statement about how close a given distribution is to log-normal, i.e. "is log-normal with p = ".

The process that generates daily reliability distributions is sufficiently complex, involving as it does seasonal weather patterns, animal migrations, several independent discrete event processes (fault causes) and continuously distributed response times that include a travel component, that it seems unlikely to be provably log-normal. What can be said is that all of the utility daily reliability distributions analyzed to date have fit the log-normal distribution better than several other likely distributions, including normal (Gaussian) and Weibull. The common sense test for this is to look at the histogram of the natural logs of the daily reliability data and see if it looks more like a normal (Gaussian) distribution than any other distribution. Even the bimodal distribution offered as evidence of non-log-normality can be seen to be Gaussian with some systematic error.

If log-normal is the closest distribution to what is actually observed, then methods based on the log-normal distribution can be used that generate results with the least error, even though that error is not zero. This is the case for all of the historical utility data reviewed at present, and this is the basis for assuming log-normality for the rest of the discussion in this paper.

IV. Maximum Likelihood Estimators

In [Sch86], Schneider describes maximum likelihood estimators for singly censored normal distributions, reporting that they were developed by Cohen in 1950. The presentation in [Sch86] is for right-censored samples, while the zero SAIDI day case has left-censored samples, that is, low values are missing instead of high ones. Therefore the equations given here are modified for left censoring. Schneider's notation has also been modified to use Alpha (α) for the mean and Beta (β) for the standard deviation being estimated.

Symbols used are as follows:

α - Mean of the natural log of daily reliability.
α̂ - Estimate of the mean of the natural log of daily reliability. The Alpha value used to compute the major event day threshold value T_MED.
β - Standard deviation of the natural log of daily reliability.
β̂ - Estimate of the standard deviation of the natural log of daily reliability. The Beta value used to compute the major event day threshold value T_MED.
φ - Probability density function (pdf) of the standard normal distribution.
Φ - Cumulative distribution function (cdf) of the standard normal distribution.

h - The amount of probability in the censored data, i.e. the fraction of the sample that is censored.
n - The total number of daily SAIDI values r_i, including zero values.
n_z - The number of zero daily SAIDI values r_i.
r_i - The value of SAIDI on day i.
s² - Sample variance, the square of the standard deviation s of the natural logs of the non-zero daily SAIDI values r_i.
T_MED - The major event day threshold value.
u - The estimated normalized value of the natural log of the smallest possible non-zero daily SAIDI value. With the equations below, u = (α̂ - x_min) / β̂, so u is positive when x_min lies below the estimated mean.
x̄ - Average value of the natural logs of the non-zero daily SAIDI values r_i.
x_min - Natural log of the smallest non-zero daily SAIDI value r_i.

The maximum likelihood estimators are

$$\hat{\alpha} = \bar{x} + \lambda(h,u)\,(x_{\min} - \bar{x}) \tag{1}$$

$$\hat{\beta}^2 = s^2 + \lambda(h,u)\,(\bar{x} - x_{\min})^2 \tag{2}$$

where

$$h = \frac{n_z}{n} \tag{3}$$

$$\lambda(h,u) = \frac{Y(h,u)}{Y(h,u) + u} \tag{4}$$

$$Y(h,u) = \frac{h}{1-h}\,\tilde{W}(u) \tag{5}$$

$$\tilde{W}(u) = \frac{\phi(u)}{1 - \Phi(u)} \tag{6}$$

and u is the solution to

$$\frac{1 - Y(h,u)\,[\,Y(h,u) + u\,]}{[\,Y(h,u) + u\,]^2} = \frac{s^2}{(\bar{x} - x_{\min})^2} \tag{7}$$

Equation (7) is a non-linear equation that is solved by iteration.

V. Estimation Process

The maximum likelihood estimators α̂ and β̂ can be computed from a set of daily SAIDI values as follows:

1. Sort the sample by value.
2. Count the number of zero values, n_z.
3. Take the natural log (ln) of all non-zero SAIDI values.

4. Find the average (x̄) and standard deviation (s) of the values computed in step 3.
5. If there are no zero SAIDI values (n_z = 0), then Alpha = x̄ and Beta = s, and the process ends here. Otherwise, continue with step 6.
6. Compute h = n_z / n.
7. Find x_min, the natural log of the minimum non-zero daily SAIDI value, min(r_i).
8. Solve equation (7) for u. See the discussion in section VI.
9. Find Alpha and Beta from equations (1) and (2).

Once Alpha and Beta are known, T_MED can be computed as usual using the estimates:

$$\ln(T_{MED}) = \hat{\alpha} + 2.5\,\hat{\beta} \tag{8}$$

T_MED itself is the exponential of this value, as in the ln(TMED) and TMED rows of the tables in section VII.

VI. Solving Equation (7) for u

A number of algorithms are available for solving non-linear equations such as equation (7). These could be automated in a spreadsheet macro or programmed into an analysis program. The following heuristic, interactive, iterative process is practical and convenient for spreadsheets that have functions giving the standard normal probability density function (pdf) and cumulative distribution function (cdf), φ and Φ respectively. One popular spreadsheet, Excel, implements these functions as follows:

φ(x): NORMDIST(x,0,1,FALSE)
Φ(x): NORMDIST(x,0,1,TRUE)

Using these, after computing the necessary constants h, x̄, x_min and s, the iterative process can be performed as follows.

1. Select a column in which guesses for u will be entered.
2. In the next column to the right enter the formula for W̃ from equation (6) as =NORMDIST(u,0,1,FALSE)/(1-NORMDIST(u,0,1,TRUE)) where u is the cell in the column selected in step 1.
3. In the next column to the right enter the formula for Y from equation (5).
4. In the next column to the right enter the formula for the left hand side (LHS) of equation (7).
5. In the next column to the right enter the formula for (or copy the value of) the right hand side (RHS) of equation (7). Note that x̄ is the average, and s is the standard deviation, of the natural logs of the non-zero daily SAIDI values.
6. Copy the row several times. Each row will be one iteration. Copy as many times as needed. Alternatively, reenter new values of u in the same cell.
7. Enter an initial guess for u. 1.0 is a reasonable value if no other information (like a previous result) is available.
8. Based on the mismatch between the LHS value computed by the spreadsheet for the most recent guess and the constant RHS value, make another guess at u. In general an increase in u results in a decrease in the LHS value. The amount of change to make is based on judgement. (Interpolation could be used, but heuristic guessing is faster than computing the interpolation unless this analysis is repeated many times, in which case a macro is recommended.) Repeat until a sufficiently accurate guess is obtained.

VII. Examples

Alpha and Beta were estimated for two example censored data sets using five different methods. One data set has five years of simulated daily SAIDI values. The advantage of simulation is that the actual values of Alpha and Beta are known. Daily SAIDI values were found by obtaining a uniform random variable between 0 and 1, finding the value of the inverse normal CDF for the random value, and then exponentiating. This gives an almost ideal log-normal distribution. The second data set is four years of real world daily SAIDI data for an anonymous Utility, provided by the Distribution Design Working Group.

Neither data set has zero SAIDI days. Both are censored by assuming that the 110 lowest SAIDI values have been rounded to zero, so that their natural logs are not available. This permits comparison of Alpha and Beta estimates from the uncensored data set with estimates calculated using the censored data set.

The five methods of estimating Alpha and Beta are:

1. Ignoring zero SAIDI days.
2. Replacing zero SAIDI days with the minimum non-zero SAIDI value.
3. Replacing zero SAIDI days with the median SAIDI value.
4. Replacing zero SAIDI days with the average SAIDI value.
5. Maximum likelihood estimators for censored samples (MLE).

The results for the simulated data set are given in Table 1.

Table 1 - Results for Simulated Data Set

                                     |--------------- 110 Censored Values ----------------|
Parameter   Actual    No Censored    Ignore      Minimum   Average   Median   Maximum
            Values    Values         Zero Days                                Likelihood
                                                                              Estimates
Alpha       -3.60     -3.59          -3.33       -3.54     -3.23     -3.35    -3.59
Beta         2.00      2.00           1.76        1.89      1.76      1.71     1.99
ln(TMED)     1.40      1.41           1.07        1.18      1.16      0.93     1.39
TMED         4.06      4.08           2.93        3.26      3.20      2.53     4.02

The values of Alpha and Beta estimated by taking the average and standard deviation of the complete set of simulated data (No Censored Values column) are very close to the actual values used to generate the data set.
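The estimation process of section V and the simulated example can also be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the spreadsheet used for the tables: the function name `censored_lognormal_mle` is hypothetical, SciPy's `brentq` root finder is substituted for the interactive guessing of section VI, the standard deviation uses the divisor equal to the number of non-zero days, and the search bracket [0, 10] for u assumes that x_min lies below the estimated mean (true when only a small fraction of days is zero). The random seed is arbitrary, so the printed numbers will not reproduce Table 1 exactly.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def censored_lognormal_mle(saidi):
    """Estimate Alpha and Beta from daily SAIDI values that may include zeros."""
    saidi = np.asarray(saidi, dtype=float)
    n = saidi.size
    n_z = int(np.sum(saidi == 0.0))            # step 2: count zero days
    logs = np.log(saidi[saidi > 0.0])          # step 3: logs of non-zero days
    x_bar, s = logs.mean(), logs.std()         # step 4 (divisor m is an assumption)
    if n_z == 0:                               # step 5: nothing is censored
        return x_bar, s
    h = n_z / n                                # step 6, equation (3)
    d = x_bar - logs.min()                     # step 7: x_bar - x_min

    def y(u):                                  # equations (5) and (6)
        return h / (1.0 - h) * norm.pdf(u) / norm.sf(u)

    def mismatch(u):                           # LHS minus RHS of equation (7)
        t = y(u) + u
        return (1.0 - y(u) * t) / t**2 - s**2 / d**2

    u = brentq(mismatch, 0.0, 10.0)            # step 8: assumed bracket for u
    lam = y(u) / (y(u) + u)                    # equation (4)
    alpha_hat = x_bar - lam * d                # equation (1)
    beta_hat = np.sqrt(s**2 + lam * d**2)      # equation (2)
    return alpha_hat, beta_hat


# Simulated data as described in section VII: uniform variate, inverse normal
# CDF, then exponentiate. Alpha = -3.6 and Beta = 2.0 are the actual values.
rng = np.random.default_rng(1)
z = norm.ppf(rng.uniform(size=5 * 365))
saidi = np.exp(-3.6 + 2.0 * z)

censored = saidi.copy()
censored[np.argsort(saidi)[:110]] = 0.0        # round the 110 lowest values to zero

alpha_full, beta_full = censored_lognormal_mle(saidi)     # no censored values
alpha_hat, beta_hat = censored_lognormal_mle(censored)    # MLE with 110 zeros
alpha_ignore = np.log(censored[censored > 0.0]).mean()    # ignoring zero days

print("uncensored:", alpha_full, beta_full)
print("MLE:       ", alpha_hat, beta_hat, "T_MED:", np.exp(alpha_hat + 2.5 * beta_hat))
print("ignore:    ", alpha_ignore)
```

A bracketing root finder is used here only because it needs no initial guess beyond the interval; any of the algorithms mentioned in section VI would serve equally well.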
As discussed qualitatively in [Chr03], the values of Alpha and Beta estimated by replacing the zero SAIDI days with the minimum non-zero SAIDI value are closer to the actual values than those found by ignoring zero SAIDI days or by replacing them with the average or median SAIDI value. However, the natural log of T_MED is significantly lower than the actual value, which would result in classifying more major event days than would be correct. The Maximum Likelihood Estimates (MLEs) are significantly more accurate than any of the replacement schemes and have about as much error as the uncensored value estimates. The MLE is clearly the preferable estimation technique.

Results for the Utility data set are given in Table 2. In this case, the actual values of the parameters are not known, and the estimates from the censored data must be compared with the estimates (average and standard deviation) from the uncensored data.

Table 2 - Results for the Utility Data Set

                           |--------------- 110 Censored Values ----------------|
Parameter   No Censored    Ignore      Minimum   Average   Median   Maximum
            Values         Zero Days                                Likelihood
                                                                    Estimates
Alpha       -3.53          -3.19       -3.42     -3.08     -3.20    -3.49
Beta         2.03           1.68        1.81      1.66      1.61     1.94
ln(TMED)     1.55           1.00        1.11      1.06      0.83     1.36
TMED         4.71           2.73        3.04      2.89      2.30     3.90

Using real utility data, the qualitative results are the same as before, i.e. MLEs are closer to the non-censored estimates than any of the replacement methods, and replacement with the minimum value is the best replacement method. However, errors are larger. This is probably because the real world data is only close to being log-normally distributed, not exactly log-normally distributed as is the case for the simulated data set. As explained in section III, assuming log-normality permits computation of the MLEs, and the error associated with differences from log-normality is small if the distributions are close to being log-normal.

VIII. Other Estimators

[Sch86] describes a number of other estimators for the mean and standard deviation of censored samples from normally distributed populations. Some of these estimators are described as "simplified" because they do not require an iterative solution of equation (7). However, the closed form solutions provided are more complicated to explain and implement than the use of MLEs. Many involve table look-ups, where the table values are more complicated to compute than the solution of (7). Furthermore, all of the additional estimators have lower efficiency than the MLEs, meaning that they produce a wider range of estimates, even if the average estimate is accurate. For these reasons the other estimators in [Sch86] are evaluated as unsuitable for the major event day identification problem.

IX. Conclusion

The Maximum Likelihood Estimators (MLEs) for censored samples, found in [Sch86], give better estimates of Alpha, Beta and T_MED for data sets with zero SAIDI days than replacement methods, a result backed by theory and illustrated by the examples in section VII. The MLEs can be computed using a standard spreadsheet. Computation involves solving a non-linear equation. Whether the accuracy of the results justifies the complexity of the solution is an issue for the Working Group to resolve. If use of MLEs is deemed too complex, the best replacement method is replacement with the minimum non-zero SAIDI value.

A spreadsheet with the example calculations is available at www.ee.washington.edu/people/faculty/christie/zerodayest.xls.

X. References

[Chr03] R.D. Christie, "Zero SAIDI Days Issue - Response to WMECO", www.ee.washington.edu/people/faculty/christie/zerodayissue.pdf, January 4, 2003.
[Cro88] E.L. Crow and K. Shimizu, Lognormal Distributions: Theory and Applications, Marcel Dekker, Inc., New York, 1988.
[Sch86] H. Schneider, Truncated and Censored Samples from Normal Populations, Marcel Dekker, Inc., New York, 1986.