Introduction. Why One-Pass Statistics?


BERKELEY RESEARCH GROUP WHITE PAPER

This manuscript is program documentation for three ways to calculate the mean, variance, skewness, kurtosis, covariance, correlation, regression parameters, and other regression statistics. Although information contained in this manuscript is believed to be accurate, the documentation is offered without warranty, and users agree to assume all responsibilities and consequences from using this documentation.

Introduction

Formulas for common statistics are generally well known, and users have access to native routines in Microsoft Excel and most programming languages to calculate many statistics. Under most circumstances and with most data, these routines produce identical results within the mathematical precision available in that environment. However, these algorithms can be constructed in at least three ways, and sometimes the results differ because the algorithms exceed the precision of the environment. Stated differently, the three methods place unequal demands on the precision available for the calculations. Some data also put more demands on the precision available for calculations. For most data, the choice involves convenience; for some data, choosing the right algorithm is important.

Why One-Pass Statistics?

The standard definitions of the statistical formulas described below require two passes through the data. At times it is impossible or inconvenient to wait until all data is available to make the calculations. This might occur because it is necessary to calculate a statistic with all available data up to a point and recalculate after receiving each additional data point. Some data sets are large enough that retaining the data to make two passes is either impractical or impossible. These conditions argue for using one of the one-pass methods described below.

One example in which using one-pass statistics may be valuable involves Monte Carlo simulation, where the number of samples can quickly become very large.
In this case, it is convenient to calculate distribution parameters such as the mean, standard deviation, sample skewness [1], or kurtosis [2] using a one-pass method to avoid having to retain all of the data for ex post analysis. For the same reason, it is convenient to embed the statistical calculations in line in the same code that generates the Monte Carlo test rather than to rely on native statistical routines.

A second example in which one-pass statistics may be valuable also involves Monte Carlo simulation, where the tests are repeated until a certain level of statistical confidence is achieved. For example, the standard error of a Monte Carlo result generally declines in inverse proportion to the square root of the number of trials. When the standard deviation of path results is known in advance, it is possible to also determine in advance how many trials are required. When the standard deviation of sample paths is not known in advance (for example, if this uncertainty depends on inputs to the simulation), it is convenient to run the test until the standard error of the estimate falls below a targeted level. A one-pass method that can be incrementally updated makes such a "smart stop" possible.

[1] For the rest of this manuscript, sample skewness will just be called skewness.
[2] In general, this manuscript will not assume that kurtosis means excess kurtosis unless labeled as such explicitly. In all cases, kurtosis will refer to the kurtosis of a sample.

IMPLEMENTING A TRINOMIAL CONVERTIBLE BOND PRICING MODEL
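The "smart stop" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `run_until_precise`, its parameters, and the normally distributed test draw are hypothetical, and the running mean and sum of squares use the incremental update attributed to Welford later in this manuscript.

```python
import math
import random


def run_until_precise(draw, target_se, min_trials=30, max_trials=1_000_000):
    """Call draw() until the standard error of the running mean is below target_se.

    Hypothetical helper: returns (trials used, mean estimate, standard error).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    while n < max_trials:
        x = draw()
        n += 1
        delta = x - mean          # deviation from the mean before the update
        mean += delta / n         # incremental mean
        m2 += delta * (x - mean)  # incremental sum of squares
        if n >= min_trials:
            std_err = math.sqrt(m2 / (n - 1) / n)  # sample SD / sqrt(n)
            if std_err < target_se:
                break
    return n, mean, math.sqrt(m2 / (n - 1) / n)


# Illustrative simulation: each "path result" is a normal draw (an assumption).
rng = random.Random(12345)
trials, estimate, std_err = run_until_precise(
    lambda: rng.gauss(10.0, 2.0), target_se=0.05
)
print(trials, estimate, std_err)
```

No sample is retained: the loop keeps only the count, the running mean, and the running sum of squares, which is what makes the stopping rule cheap to evaluate after every trial.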

Two-Pass Statistics

The standard definitions of variance, skewness, kurtosis, covariance, and simple linear regression begin by assuming that the mean of data to be analyzed is already known. An algorithm first calculates the mean. In the interest of completeness and to introduce the notation, that mean is shown in Equation 1:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} \tag{1}$$

To calculate the mean, $\mu$, of a vector $X$, add all the values for $X_i$ and divide by the number of observations $N$. The mean is sometimes called the first sample moment of a statistical distribution. The unit of measure that applies to $\mu$ is the same unit that applies to $X$. For example, if $X$ is measured in feet, the mean produced by Equation 1 will be in feet.

The standard definition of sample variance appears in Equation 2:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N - 1} \tag{2}$$

An algorithm that first calculates the results of Equation 1 and then Equation 2 is called a two-pass algorithm for calculating variance. The variance is sometimes called the second sample moment of a statistical distribution and the numerator is called the sum of squares. The unit of measure that applies to $\sigma^2$ is the square of the unit that applies to $X$.

The definition of sample standard deviation appears in Equation 3:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N - 1}} \tag{3}$$

Alternatively, the standard deviation can be described as the square root of variance, in which case the algorithm builder doesn't really need a separate formula as in Equation 3. Each of the one-pass methods described below follows that pattern: find the variance, then transform it into standard deviation if needed. The unit of measure that applies to $\sigma$ is the same unit that applies to $X$. For example, if $X$ is measured in feet, the standard deviation produced by Equation 3 will be in feet.

Equations 2 and 3 divide the sum of squares by the number of observations reduced by 1. This adjustment makes these sample statistics unbiased. A similar bias adjustment is required for skewness and kurtosis but often does not appear in published formulas. This manuscript will follow that convention and then discuss how to adjust the results to be unbiased.

The standard definition of skewness appears in Equation 4:

$$\text{Skew} = \frac{\sum_{i=1}^{N} (X_i - \mu)^3}{N * \sigma^3} \tag{4}$$

The denominator in Equation 4 can be described as either the standard deviation raised to the third power or the variance raised to the 1.5 power. Of course, that denominator requires a pass through the data, and the calculation of the denominator must be made before the skewness is calculated. However, the summations required to calculate the denominator using Equation 2 or Equation 3 can be tallied on the same pass through the data required to calculate the sum in the numerator of Equation 4. For this reason, it is still generally described as a two-pass formula. The skewness is sometimes called the third moment of a statistical distribution. The unit of measure that applies to Skew is independent of the unit that applies to $X$. For any data, a skewness of 0 is considered not skewed, while positive values are described as skewed right and negative values as skewed left.

The standard definition of kurtosis appears in Equation 5:

$$\text{Kurtosis} = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N * \sigma^4} \tag{5}$$

This kurtosis formula could be described as a two-pass formula, because it relies on a prior step to calculate the mean, then a second step that sums values for both the numerator and the denominator. The kurtosis is the fourth sample moment of a statistical distribution. The unit of measure that applies to Kurtosis is independent of the unit that applies to $X$. For any data, a kurtosis of about 3 is considered typical of normally distributed values and described as mesokurtic. Kurtosis larger than about 3 is described as leptokurtic ("fat tails"), and kurtosis smaller than about 3 is platykurtic (thinner tails). A measure called excess kurtosis subtracts approximately 3 [3] so that excess kurtosis is centered around 0. [4]

The standard definition for covariance appears as Equation 6:

$$\sigma_{X,Y} = \frac{\sum_{i=1}^{N} (X_i - \mu_X)(Y_i - \mu_Y)}{N} \tag{6}$$

Equation 6 closely resembles the definition of variance in Equation 2. In fact, Equation 6 becomes Equation 2 (except for the minor difference in the denominators) when Equation 6 is used to measure the covariance between a variable and itself. The covariance is not considered a moment. The units that apply to Equation 6 lack intuitive clarity. For this reason, correlation is calculated as a kind of standardized or normalized covariance. See Equation 7:

$$\rho_{X,Y} = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y} \tag{7}$$

[3] See the longer description of how the adjustment differs in the bias section.
[4] See Appendix D for a more precise definition of the excess kurtosis adjustment.
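The two-pass definitions in Equations 1 through 7 can be sketched as follows. The function names are illustrative only. Two assumptions are made and flagged in the comments: the skewness and kurtosis denominators use the unadjusted (divide-by-N) standard deviation, so that the values are the "biased" figures that the later bias adjustments start from, and the correlation is computed from raw sums so that the same denominator convention cancels in the numerator and denominator.

```python
import math


def two_pass_moments(xs):
    """Mean, sample variance, sample SD, skewness, kurtosis via two passes."""
    n = len(xs)
    mu = sum(xs) / n                      # pass 1: the mean (Equation 1)
    dev = [x - mu for x in xs]            # pass 2 works on deviations from the mean
    ss = sum(d * d for d in dev)          # sum of squares
    var = ss / (n - 1)                    # Equation 2
    sd = math.sqrt(var)                   # Equation 3
    sd_n = math.sqrt(ss / n)              # ASSUMPTION: unadjusted SD for Equations 4 and 5
    skew = sum(d ** 3 for d in dev) / (n * sd_n ** 3)
    kurt = sum(d ** 4 for d in dev) / (n * sd_n ** 4)
    return mu, var, sd, skew, kurt


def two_pass_cov_corr(xs, ys):
    """Covariance (Equation 6, divisor N) and correlation (Equation 7)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    cov = sxy / n
    corr = sxy / math.sqrt(sxx * syy)     # ASSUMPTION: same divisor in all three terms
    return cov, corr


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(two_pass_moments(data))
print(two_pass_cov_corr(data, [2.0 * x + 1.0 for x in data]))
```

Both functions need the full data set in memory (or a second read of it), which is the limitation the one-pass methods address.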

Textbook Formulas for One-Pass Statistics

Each of the four moments described above and the covariance can be restated in a format that is conducive to building a one-pass algorithm. Each of the formulas is algebraically equivalent to the standard formulas. This means that if mathematical routines could produce the exact values called for in the equations above and below, the results would identically match the results from the equations above.

The equivalent one-pass formula for the variance [5] appears as Equation 8:

$$\sigma^2 = \frac{\sum_{i=1}^{N} X_i^2 - N\mu^2}{N - 1} \tag{8}$$

Equation 8 is sometimes called the textbook formula because it is frequently included in statistical textbooks. [6] It permits construction of a one-pass algorithm because the mean is only needed at the end of a pass through the data. That algorithm sums both $X_i$ and $X_i^2$.

The equivalent one-pass textbook formula for the skewness [7] appears as Equation 9:

$$\text{Skew} = \frac{\sum X_i^3 - 3\mu \sum X_i^2 + 2N\mu^3}{N * \sigma^3} \tag{9}$$

Although this author has not seen Equation 9 published, it is convenient to describe it as the textbook formula for skewness. It permits construction of a one-pass algorithm because the mean is only needed at the end of a pass through the data. That algorithm must sum $X_i$, $X_i^2$, and $X_i^3$.

The equivalent textbook one-pass formula for the kurtosis [8] appears as Equation 10:

$$\text{Kurt} = \frac{\sum X_i^4 - 4\mu \sum X_i^3 + 6\mu^2 \sum X_i^2 - 3N\mu^4}{N * \sigma^4} \tag{10}$$

Although this author has not seen Equation 10 published, it is convenient to describe it as the textbook formula for kurtosis. It permits construction of a one-pass algorithm because the mean is only needed at the end of a pass through the data. That algorithm must sum $X_i$, $X_i^2$, $X_i^3$, and $X_i^4$.

[5] The derivation of Equation 8 appears in Appendix A.
[6] Chan, Tony F., Gene H. Golub, and Randall J. LeVeque, Algorithms for Computing the Sample Variance: Analysis and Recommendations, The American Statistician 37:3 (August 1983), 242-247.
[7] The derivation of Equation 9 appears in Appendix B.
[8] The derivation of Equation 10 appears in Appendix C.
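A minimal sketch of the textbook approach follows: accumulate the power sums on one pass, then combine them with the mean at the end. The function name `textbook_stats` is illustrative, and as an assumption the skewness and kurtosis denominators use the unadjusted (divide-by-N) variance so that the results are the unadjusted, "biased" values.

```python
import math


def textbook_stats(stream):
    """One pass over any iterable; only the count and four power sums are kept."""
    n = 0
    s1 = s2 = s3 = s4 = 0.0
    for x in stream:
        n += 1
        s1 += x
        s2 += x * x
        s3 += x ** 3
        s4 += x ** 4
    mu = s1 / n                                   # the mean is needed only at the end
    ss = s2 - n * mu * mu                         # textbook sum of squares
    var = ss / (n - 1)                            # sample variance
    sd_n = math.sqrt(ss / n)                      # ASSUMPTION: unadjusted SD in denominators
    skew_num = s3 - 3.0 * mu * s2 + 2.0 * n * mu ** 3
    kurt_num = s4 - 4.0 * mu * s3 + 6.0 * mu * mu * s2 - 3.0 * n * mu ** 4
    skew = skew_num / (n * sd_n ** 3)
    kurt = kurt_num / (n * sd_n ** 4)
    return mu, var, skew, kurt


# A generator works as input because the data are never revisited.
print(textbook_stats(float(k) for k in range(1, 6)))
```

The subtraction of two large, nearly equal quantities in `ss`, `skew_num`, and `kurt_num` is where precision can be lost when the data are large relative to their spread.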

The textbook methodology lends itself to a one-pass method for calculating the covariance. Equation 11 [9] follows the textbook strategy and requires the sums of $X_i$, $Y_i$, and $X_i Y_i$.

$$\sigma_{X,Y} = \frac{\sum_{i=1}^{N} X_i Y_i - N \mu_X \mu_Y}{N} \tag{11}$$

Of course, a one-pass textbook algorithm to calculate the correlation coefficient follows, using Equation 11 and Equation 7. Calculate the covariance of two vectors using Equation 11 and the square root of Equation 8 (variance) to calculate the standard deviation of each vector with a single pass.

It is also possible to develop a similar one-pass formula for a regression slope, $\beta$, for a single independent variable. [10] Equation 12 requires the sums of $X_i$, $Y_i$, $X_i Y_i$, and $X_i^2$ and requires knowledge of the mean, but that mean is not required to complete the other calculations, so the terms needed to evaluate Equation 12 can be accumulated on a single pass through the data.

$$\beta = \frac{\sum X_i Y_i - N \mu_X \mu_Y}{\sum X_i^2 - N \mu_X^2} \tag{12}$$

The intercept, $\alpha$, in Equation 14 can be found using terms evaluated for Equation 12. Equation 13 relies on the knowledge that the means of the $X$ and $Y$ values represent a point on the regression line.

$$\mu_Y = \alpha + \beta \mu_X \tag{13}$$

$$\alpha = \mu_Y - \beta \mu_X \tag{14}$$

Therefore, the intercept can be determined from a single pass if the slope is known. By relying on Equation 12, which permits a one-pass algorithm, the regression line can be determined with a one-pass methodology similar to the textbook algorithms above.

Numerical Precision of Textbook One-Pass Algorithms

The textbook algorithms are vulnerable to computational errors for certain types of data. For example, if the magnitude of the data is large, requiring much of the available precision of a computer system, and if the variance is small relative to the underlying data, it is not difficult to construct a hypothetical data set where algorithms based on the textbook formulas for variance, skewness, and kurtosis produce unreliable results. Some authors have advised against using the textbook algorithm because it is more prone to errors introduced by the computational limits of computer mathematical operations. [11]

Although two-pass methods are much less likely to exceed the computational precision of a computer, it is also possible to find data where the two-pass method can produce unreliable results. Some strategies can improve the accuracy of two-pass methods. For example, from all data, subtract a large number somewhat close to the expected mean. A third method introduced by Welford is generally least likely to require arithmetic operations that exceed the precision of the computer.

[9] The derivation of Equation 11 appears in Appendix D.
[10] The derivation of Equation 12 appears in Appendix E.
[11] See for example, Cook, John D., Comparing Three Methods of Computing Standard Deviation, John D. Cook blog (September 26, 2008), accessed at: http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/
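The precision problem is easy to reproduce in double-precision arithmetic. The sketch below is an illustration with a hypothetical data set (four values near one billion whose true sample variance is 30); the function names are illustrative. It compares the textbook formula, the two-pass definition, the incremental update attributed to Welford, and the textbook formula applied after subtracting a constant close to the mean.

```python
def textbook_variance(xs):
    """One-pass textbook form: (sum of squares of X - N * mean^2) / (N - 1)."""
    n = 0
    s1 = 0.0
    s2 = 0.0
    for x in xs:
        n += 1
        s1 += x
        s2 += x * x          # squares near 1e18 exhaust double precision
    mu = s1 / n
    return (s2 - n * mu * mu) / (n - 1)


def two_pass_variance(xs):
    """Standard definition: mean first, then squared deviations."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / (n - 1)


def welford_variance(xs):
    """Incremental update of the mean and the sum of squared deviations."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


# Hypothetical data: large magnitude, small spread. True sample variance is 30.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
shifted = [x - 1e9 for x in data]   # subtract a number close to the expected mean

print("textbook:          ", textbook_variance(data))
print("two-pass:          ", two_pass_variance(data))
print("Welford:           ", welford_variance(data))
print("textbook (shifted):", textbook_variance(shifted))
```

The textbook result on the raw data is not close to 30 because the squared values are rounded to multiples far larger than the sum of squares being sought, while the other three calculations work with small deviations and recover the variance.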

Online One-Pass Statistics

Welford [12] introduced a way to calculate variance without requiring the prior calculation of the mean. Knuth [13] designed a well-tested algorithm for calculating variance, relying on the Welford formulation. Welford defined the mean conventionally.

$$M_N = \frac{\sum_{i=1}^{N} X_i}{N} \tag{15}$$

A thoughtful algorithm accumulates the sum in the numerator. The mean is found by dividing this accumulation by the prevailing $N$. The Online variance accumulates the sum of squares and uses that sum to calculate variance. Equation 16 defines the sum of squares as an increment based on the previous sum of squares.

$$S_N = S_{N-1} + \frac{N-1}{N}\left(X_N - M_{N-1}\right)^2 \tag{16}$$

Calculate variance of the $N$ data points with the sum of squares using Equation 17:

$$\sigma^2 = \frac{S_N}{N - 1} \tag{17}$$

Note that this formulation supports an incremental algorithm that works as long as the latest sum of the $X_i$ values and the sum of squares are preserved. As with Equation 8, the standard deviation is calculated using Equation 17 then taking the square root of the variance.

Chan et al. [14] extended the Online methodology to allow the accumulated analysis from one block of data to be merged with the accumulated analysis of a second block of data. Timothy B Terriberry [15] extended the Chan methodology to permit merging of data sets used to calculate skewness and kurtosis. The Terriberry equations reduce to the following methodology when one additional data point is added to a series:

$$\delta = X_N - M_{1,N-1} \tag{18}$$

Equation 18 calculates the deviation of the next data point from the mean prevailing before updating the mean to reflect that data point. $M_{1,N}$ of course equals $X_1$ for $N = 1$.

$$M_{1,N} = M_{1,N-1} + \frac{\delta}{N} \tag{19}$$

Next, the mean is updated using Equation 19. Notice that Equations 18 through 22 introduce interim sums. $M_1$ is equal to the mean of $X$. $M_2$, $M_3$, and $M_4$ provide a convenient way to calculate standard deviation, skew, and kurtosis.

$$M_{2,N} = M_{2,N-1} + \delta^2 \frac{N-1}{N} \tag{20A}$$

$$\sigma^2 = \frac{M_{2,N}}{N - 1} \tag{20B}$$

[12] Welford, B. P., Note on a Method for Calculating Corrected Sums of Squares and Products, Technometrics 4:3 (August 1962), 419-420.
[13] Donald E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, third ed. (1998), 232.
[14] Chan et al. (1983).
[15] Terriberry, Timothy, Computing Higher-Order Moments Online, 2008, https://people.xiph.org/~tterribe/notes/homs.html, accessed August 2015.
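The single-point updates can be sketched as a small accumulator class. This is an illustration, not the author's program: the class name `OnlineMoments` and its method names are hypothetical, the third and fourth sums use the single-point form of the updates published by Terriberry, and the skewness and kurtosis are the unadjusted (biased) values, sqrt(N) * M3 / M2^1.5 and N * M4 / M2^2.

```python
import math


class OnlineMoments:
    """Running mean, variance, skewness, and kurtosis without storing the data."""

    def __init__(self):
        self.n = 0
        self.m1 = 0.0  # running mean
        self.m2 = 0.0  # running sum of squared deviations
        self.m3 = 0.0  # running sum of cubed deviations
        self.m4 = 0.0  # running sum of fourth-power deviations

    def add(self, x):
        n1 = self.n
        self.n += 1
        n = self.n
        delta = x - self.m1              # deviation from the mean before the update
        delta_n = delta / n
        term = delta * delta_n * n1      # delta^2 * (N - 1) / N
        self.m1 += delta_n               # updated mean
        # Higher sums are updated first because they need the previous m2 and m3.
        self.m4 += (term * delta_n * delta_n * (n * n - 3 * n + 3)
                    + 6.0 * delta_n * delta_n * self.m2
                    - 4.0 * delta_n * self.m3)
        self.m3 += term * delta_n * (n - 2) - 3.0 * delta_n * self.m2
        self.m2 += term

    def variance(self):
        return self.m2 / (self.n - 1)

    def skewness(self):
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    def kurtosis(self):
        return self.n * self.m4 / (self.m2 * self.m2)


acc = OnlineMoments()
for value in [1.0, 2.0, 3.0, 4.0, 5.0]:
    acc.add(value)
print(acc.m1, acc.variance(), acc.skewness(), acc.kurtosis())
```

Because the accumulator can be queried after any `add`, it supports the recalculate-after-each-point use case described in the introduction.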

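Use Equations 20A and 20B to update the standard deviation or to calculate variance. Notice that the value of $M_2$ is not altered and can be continually used to accumulate more data.

$$M_{3,N} = M_{3,N-1} + \delta^3 \frac{(N-1)(N-2)}{N^2} - \frac{3\,\delta * M_{2,N-1}}{N} \tag{21A}$$

$$\text{Skew} = \frac{\sqrt{N}\, M_{3,N}}{M_{2,N}^{3/2}} \tag{21B}$$

Update the skewness using Equations 21A and 21B.

$$M_{4,N} = M_{4,N-1} + \delta^4 \frac{(N-1)(N^2 - 3N + 3)}{N^3} + \frac{6\,\delta^2 M_{2,N-1}}{N^2} - \frac{4\,\delta * M_{3,N-1}}{N} \tag{22A}$$

$$\text{Kurt} = \frac{N\, M_{4,N}}{M_{2,N}^{2}} \tag{22B}$$

Finally, update the kurtosis using Equations 22A and 22B. Equations 21A and 22A use the values of $M_2$ and $M_3$ that prevailed before the new data point was added.

It is also possible to derive a Welford-like formula for covariance. [16] The algorithm used herein relies on Equation 23 [17]:

$$\sigma_{XY,N} = \frac{(N-1)\,\sigma_{XY,N-1} + \frac{N-1}{N}\left(X_N - \mu_{X,N-1}\right)\left(Y_N - \mu_{Y,N-1}\right)}{N} \tag{23}$$

Online Regression Parameters

It is possible to calculate a regression beta using a formula similar to the Online variance formula. The algorithm is summarized in Equation 24. [18] Here, values for the numerator rely on previous values, which follow a now-familiar pattern because the algorithm adapts the previous sum to the new mean. The denominator is the sum of squared deviations accumulated for the independent variable, shown as a variation on the sum of squares formulated in Equation 2.

$$\text{Beta}_N = \frac{\text{SumxY}_N}{\text{Sumx2}_N} = \frac{\text{SumxY}_{N-1} + \frac{N-1}{N}\left(X_N - \mu_{X,N-1}\right)\left(Y_N - \mu_{Y,N-1}\right)}{\text{Sumx2}_{N-1} + \frac{N-1}{N}\left(X_N - \mu_{X,N-1}\right)^2} \tag{24}$$

As described in Appendix G, the $x$ in Sumx2 refers to the data points minus the mean of $X$, but Equation 24 nevertheless presents a methodology that allows for incremental updating.

As before, calculate alpha using the means of both the independent and dependent variables. This relationship relies on the fact that a fit line passes through the coordinate equal to the means of $X$ and $Y$ in Equations 25 and 26.

$$\mu_Y = \alpha + \beta * \mu_X \tag{25}$$

$$\alpha = \mu_Y - \beta * \mu_X \tag{26}$$

[16] Pebay, Phillip, Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments, Sandia Report SAND2008-6212 (September 2008).
[17] The derivation of Equation 23 appears in Appendix F.
[18] The derivation of Equation 24 appears in Appendix G.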
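The incremental covariance and regression updates can be sketched with one accumulator. The class name `OnlineRegression` and its attribute names are hypothetical. As assumptions, the running cross-product and the running sum of squares are both updated with the deviations from the means that prevailed before the new point, weighted by (N - 1) / N, and the covariance divides by N to match the standard definition used earlier in this manuscript.

```python
class OnlineRegression:
    """Running covariance, slope, and intercept for one independent variable."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sum_xy = 0.0  # running sum of (X - mean_x) * (Y - mean_y)
        self.sum_x2 = 0.0  # running sum of (X - mean_x)^2

    def add(self, x, y):
        self.n += 1
        dx = x - self.mean_x             # deviations from the previous means
        dy = y - self.mean_y
        weight = (self.n - 1) / self.n
        self.sum_xy += weight * dx * dy
        self.sum_x2 += weight * dx * dx
        self.mean_x += dx / self.n       # means are updated after the sums
        self.mean_y += dy / self.n

    def covariance(self):
        return self.sum_xy / self.n      # ASSUMPTION: divisor N

    def beta(self):
        return self.sum_xy / self.sum_x2

    def alpha(self):
        # The fitted line passes through the point (mean_x, mean_y).
        return self.mean_y - self.beta() * self.mean_x


reg = OnlineRegression()
for x in [1.0, 2.0, 3.0, 4.0, 5.0]:
    reg.add(x, 3.0 + 2.0 * x)            # illustrative data on an exact line
print(reg.covariance(), reg.beta(), reg.alpha())
```

The slope and intercept are available after every new pair, so a fitted line can be refreshed as data arrive.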

These means or summations can be calculated using the algorithm in Equation 15, because no other part of the calculations depends on the prevailing mean. The $\beta$ is calculated with Equation 24.

Other Statistics

A large number of statistics involved with linear regression could potentially be calculated with a one-pass algorithm: total sum of squares, error sum of squares, regression sum of squares, R-square, F statistic, the standard error of estimate, the standard error of the slope, the standard error of the intercept, and t-tests of regression parameters. This manuscript will not seek to derive one-pass methods to calculate these additional statistical values.

Multiple regression is almost always conducted with the use of matrix operations: the inverse of a matrix, the transpose of a matrix, and matrix multiplication. The format does not appear to lend itself to one-pass algorithms. The formula for beta, for example, appears in Equation 27:

$$\beta = (X'X)^{-1} X'y \tag{27}$$

In Equation 27, $X$ refers to a matrix that contains two or more independent variables, multiplication refers to matrix multiplication, the symbol $'$ refers to the transpose of a matrix, the symbol $^{-1}$ refers to the inverse of a matrix, and the vector $y$ represents a vector of the deviations from the average of all the $Y$ values. This manuscript will not attempt to incrementally adapt to, for example, $N + 1$ data points following the analysis of $N$ data points.

Exponential Smoothing

Exponentially smoothed data is inherently one-pass in nature. A weighted average of previous values is described in Equation 28, where an updated average equals a combination of the latest sample and the previous estimated average:

$$\hat{X}_N = \alpha X_N + (1 - \alpha)\hat{X}_{N-1}, \quad \text{where } 0 \le \alpha \le 1 \tag{28}$$

Exponential smoothing may offer computational efficiencies over other one-pass methods. By picking a relatively low value for $\alpha$, the statistic should approximate an average of all data in the sample. Alternatively, by selecting a relatively high value for $\alpha$, the statistic can be calibrated to match recent observations.

It follows that another way to create a one-pass method of calculating the variance is to adopt the method into the standard definition of variance. One example of such a hybrid is Equation 29:

$$\hat{\sigma}^2_N = \alpha\left(X_N - \hat{X}_N\right)^2 + (1 - \alpha)\,\hat{\sigma}^2_{N-1} \tag{29}$$

The use of $\alpha$ and $1 - \alpha$ makes the result an expected value of the squared deviations, which is also the intent in Equation 2, where the sum of squares is divided by $N - 1$. To calculate the analogue to the standard deviation, take the square root of the statistic in Equation 29.

Equation 30 applies the exponential weighting to the elements in the numerator of the skewness formula in Equation 4:

$$\widehat{\text{Skew}}_N = \frac{\text{SumS}_N}{\hat{\sigma}_N^3}, \quad \text{where } \text{SumS}_N = \alpha\left(X_N - \hat{X}_N\right)^3 + (1 - \alpha)\,\text{SumS}_{N-1} \tag{30}$$

In Equation 30, the exponentially smoothed average in Equation 28 is substituted for the sample mean, and a power of the statistic calculated in Equation 29 substitutes for the standard deviation. Equation 31 applies the exponential weighting to elements in the numerator of the kurtosis formula in Equation 5:

$$\widehat{\text{Kurt}}_N = \frac{\text{SumK}_N}{\hat{\sigma}_N^4}, \quad \text{where } \text{SumK}_N = \alpha\left(X_N - \hat{X}_N\right)^4 + (1 - \alpha)\,\text{SumK}_{N-1} \tag{31}$$

As in Equation 30, the exponentially smoothed average in Equation 28 is substituted for the sample mean, and the statistic calculated in Equation 29 substitutes for the standard deviation.

Bias Adjustments

Taking the mean of a distribution removes a degree of freedom from a sample. This is why the formula for variance in Equation 2 and the formula for standard deviation in Equation 3 use $N - 1$ rather than $N$ in the denominator. A similar adjustment [19] is necessary to make the formulas for skewness (Equation 30) unbiased for samples of data. Equation 32 matches the value of skewness as calculated by Minitab.

$$\text{Skew}_{\text{Unbiased}} = \text{Skew}_{\text{Biased}} * \left(\frac{N-1}{N}\right)^{3/2} \tag{32}$$

Likewise, the sample kurtosis adjusted with Equation 33 should match the Minitab measure of unbiased excess kurtosis.

$$\text{Excess Kurt}_{\text{Unbiased}} = \text{Kurt}_{\text{Biased}} * \left(\frac{N-1}{N}\right)^{2} - 3 \tag{33}$$

A slightly different adjustment is required to match the skewness calculated by SAS, SPSS, and Excel:

$$\text{Skew}_{\text{Unbiased}} = \text{Skew}_{\text{Biased}} * \frac{\sqrt{N(N-1)}}{N-2} \tag{34}$$

Equation 35 shows the bias adjustment of kurtosis to match SAS, SPSS, and Excel:

$$\text{Kurt}_{\text{Unbiased}} = \text{Kurt}_{\text{Biased}} * \frac{N^2 - 1}{(N-2)(N-3)} \tag{35}$$

Finally, Equation 36 provides the adjustment needed to match the kurtosis calculated within SAS, SPSS, and Excel [20]:

$$\text{Kurt}_{\text{Unbiased}} = \frac{N-1}{(N-2)(N-3)}\left((N+1) * \text{Kurt}_{\text{Biased}} + 6\right) \tag{36}$$

In Equation 36, the biased kurtosis is measured as excess kurtosis, and the result is likewise an excess kurtosis.

These adjustments matter primarily for small sample sizes. This adjustment factor would equal 1.00 if the preliminary estimate (that is, before adjusting for the bias) is essentially unbiased. Figure 1 shows that these adjustments are negligible for larger sample sizes.

[19] Joanes, D. N., and C. A. Gill, Comparing Measures of Sample Skewness and Kurtosis, Journal of the Royal Statistical Society, Series D (The Statistician), 47:1 (1998), 183-189.
[20] Joanes, pp. 183-189.
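The adjustments can be sketched as four small functions. The function names are illustrative. The formulas follow the sample skewness and kurtosis measures compared by Joanes and Gill; as an assumption, `kurt_biased` is the unadjusted kurtosis (near 3 for normal data), and the SAS-style function subtracts 3 internally before applying the adjustment, so both kurtosis functions return excess kurtosis.

```python
import math


def minitab_skew(skew_biased, n):
    """Skewness as reported by Minitab."""
    return skew_biased * ((n - 1) / n) ** 1.5


def minitab_excess_kurt(kurt_biased, n):
    """Excess kurtosis as reported by Minitab."""
    return kurt_biased * ((n - 1) / n) ** 2 - 3.0


def sas_skew(skew_biased, n):
    """Skewness as reported by SAS, SPSS, and Excel."""
    return skew_biased * math.sqrt(n * (n - 1)) / (n - 2)


def sas_excess_kurt(kurt_biased, n):
    """Excess kurtosis as reported by SAS, SPSS, and Excel."""
    excess = kurt_biased - 3.0           # ASSUMPTION: input is non-excess kurtosis
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * excess + 6.0)


# For the five values 1, 2, 3, 4, 5 the unadjusted kurtosis is 1.7.
print(minitab_excess_kurt(1.7, 5))
print(sas_excess_kurt(1.7, 5))
# The multiplicative skewness factors approach 1 as the sample grows.
for n in (5, 20, 100):
    print(n, minitab_skew(1.0, n), sas_skew(1.0, n))
```

Because each adjustment depends only on the unadjusted statistic and the sample size, it can be applied at the end of any of the one-pass methods.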

[Figure 1. Bias adjustment. Adjustment factor (vertical axis) plotted against sample size (horizontal axis) for four series: Minitab Skew, Minitab Kurt, SAS Skew, and SAS Kurt.]

Standard Error of the Mean, Variance, Skewness, Kurtosis

Monte Carlo simulations frequently report the average of the outcomes. Because these are sampled estimates of the true mean, it is important to measure the standard error of the mean. The standard error is defined by Equation 37 [21]:

$$SE_{\mu} = \frac{\sigma}{\sqrt{N}} \tag{37}$$

The standard error can be derived from the standard deviation of the outcomes and is therefore available as a one-pass statistic. The standard error of variance is defined in Equation 38 [22]:

$$SE_{s^2} = s^2 \sqrt{\frac{2}{N-1}}, \quad \text{where } s^2 \text{ represents the variance of a sample} \tag{38}$$

The standard error of variance can be derived from the variance of a sample and is therefore available as a one-pass statistic. For large values of $N$, the standard error of the sample standard deviation is approximated by Equation 39 [23]:

$$SE_{s} \approx \frac{s}{\sqrt{2(N-1)}} \tag{39}$$

[21] Ahn, Sangtae, and Jeffrey Fessler, Standard Errors of Mean, Variance, and Standard Deviation Estimators, EECS Department, University of Michigan (2003). http://web.eecs.umich.edu/~fessler/papers/files/tr/stderr.pdf Accessed August 2015.
[22] Ahn and Fessler (2003).
[23] Ahn and Fessler (2003).
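Because these three standard errors depend only on the sample size and a variance or standard deviation that a one-pass method already tracks, they can be sketched as simple functions. The function names and the sample figures below are illustrative assumptions.

```python
import math


def se_mean(sd, n):
    """Standard error of the mean: SD / sqrt(N)."""
    return sd / math.sqrt(n)


def se_variance(var, n):
    """Standard error of the sample variance: variance * sqrt(2 / (N - 1))."""
    return var * math.sqrt(2.0 / (n - 1))


def se_std_dev(sd, n):
    """Large-sample approximation: SD / sqrt(2 * (N - 1))."""
    return sd / math.sqrt(2.0 * (n - 1))


# Illustrative inputs: a running sample SD of 2.0 after 101 trials.
sd, n = 2.0, 101
print(se_mean(sd, n), se_variance(sd * sd, n), se_std_dev(sd, n))
```

In a Monte Carlo loop these functions can be called after each trial with the running values, for example to decide when to stop.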

The standard error of the sample standard deviation can be approximated from the sample standard deviation and is therefore available as a one-pass statistic. The standard error of the skewness is defined in Equation 40 [24]:

$$SES = \sqrt{\frac{6N(N-1)}{(N-2)(N+1)(N+3)}} \tag{40}$$

The standard error of skewness is derived from the sample size and is therefore available as a one-pass statistic. The standard error of the sample kurtosis is defined in Equation 41 [25]:

$$SEK = 2 * SES \sqrt{\frac{N^2 - 1}{(N-3)(N+5)}} \tag{41}$$

The standard error of sample kurtosis is derived from the sample size and is therefore available as a one-pass statistic.

Conclusions

Statistical routines built into spreadsheets and statistical packages generally return numerically indistinguishable results for most data sets. Certain data sets create measurement problems using one or two of the methods described herein. For these data sets, the algorithms built around the methodology introduced by Welford may provide more accurate results.

This documentation primarily describes an application of one-pass methodologies to Monte Carlo trials. In these applications, a two-pass method may be impractical. Many such Monte Carlo samples are not problematic for either the textbook or Online method. Where the results are the same, it is difficult to argue that one method is better than the other. While the textbook method can produce accurate results most of the time, a level of uncertainty remains that perhaps a particular trial pushes into an area where the textbook method is inaccurate. One way to be more confident about statistical measurements is to perform them with two or three different algorithms and confirm that the results are equivalent for whatever precision is required.

[24] Cramer, Duncan, Fundamental Statistics for Social Research (1997), p. 85.
[25] Cramer, p. 89.

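Appendix A
Derivation of Textbook Variance Formula

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N - 1} \tag{A1}$$

$$\sigma^2 = \frac{(X_1 - \mu)^2 + (X_2 - \mu)^2 + \dots + (X_N - \mu)^2}{N - 1} \tag{A2}$$

$$\sigma^2 = \frac{(X_1^2 + X_2^2 + \dots + X_N^2) - 2\mu(X_1 + X_2 + \dots + X_N) + N\mu^2}{N - 1} \tag{A3}$$

Because $X_1 + X_2 + \dots + X_N = N\mu$, the last two terms of the numerator combine:

$$\sigma^2 = \frac{\sum_{i=1}^{N} X_i^2 - N * \mu^2}{N - 1} \tag{A4}$$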

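Appendix B
Derivation of Textbook Skewness Formula

$$\text{Skew} = \frac{\sum_{i=1}^{N} (X_i - \mu)^3}{N \sigma^3} \tag{B1}$$

$$\text{Skew} = \frac{(X_1 - \mu)(X_1 - \mu)(X_1 - \mu) + (X_2 - \mu)(X_2 - \mu)(X_2 - \mu) + \dots}{N \sigma^3} \tag{B2}$$

$$\text{Skew} = \frac{(X_1^2 - 2\mu X_1 + \mu^2)(X_1 - \mu) + (X_2^2 - 2\mu X_2 + \mu^2)(X_2 - \mu) + \dots}{N \sigma^3} \tag{B3}$$

$$\text{Skew} = \frac{(X_1^3 - 3\mu X_1^2 + 3\mu^2 X_1 - \mu^3) + (X_2^3 - 3\mu X_2^2 + 3\mu^2 X_2 - \mu^3) + \dots}{N \sigma^3} \tag{B4}$$

$$\text{Skew} = \frac{\sum X_i^3 - 3\mu \sum X_i^2 + 3\mu^2 \sum X_i - N\mu^3}{N \sigma^3} \tag{B5}$$

$$\text{Skew} = \frac{\sum X_i^3 - 3\mu \sum X_i^2 + 3N\mu^3 - N\mu^3}{N \sigma^3} \tag{B6}$$

$$\text{Skew} = \frac{\sum X_i^3 - 3\mu \sum X_i^2 + 2N\mu^3}{N \sigma^3} \tag{B7}$$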

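Appendix C
Derivation of Textbook Kurtosis Formula

$$\text{Kurt} = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N \sigma^4} \tag{C1}$$

$$\text{Num} = (X_1 - \mu)^4 + (X_2 - \mu)^4 + \dots \tag{C2}$$

$$\text{Num} = (X_1^2 - 2\mu X_1 + \mu^2)^2 + (X_2^2 - 2\mu X_2 + \mu^2)^2 + \dots \tag{C3}$$

$$\text{Num} = (X_1^4 - 4\mu X_1^3 + 6\mu^2 X_1^2 - 4\mu^3 X_1 + \mu^4) + \dots \tag{C4}$$

$$\text{Num} = \sum X_i^4 - 4\mu \sum X_i^3 + 6\mu^2 \sum X_i^2 - 4\mu^3 \sum X_i + N\mu^4 \tag{C5}$$

$$\text{Num} = \sum X_i^4 - 4\mu \sum X_i^3 + 6\mu^2 \sum X_i^2 - 3N\mu^4 \tag{C6}$$

Equation C6 replaces the numerator of Equation C1 and Equation A4 provides the denominator.

$$\text{Kurt} = \frac{\sum X_i^4 - 4\mu \sum X_i^3 + 6\mu^2 \sum X_i^2 - 3N\mu^4}{N \sigma^4} \tag{C7}$$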

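Appendix D
Derivation of Textbook Covariance Formula

$$\sigma_{X,Y} = \frac{\sum_{i=1}^{N} (X_i - \mu_X)(Y_i - \mu_Y)}{N} \tag{D1}$$

$$\sigma_{X,Y} = \frac{\sum_{i=1}^{N} (X_i Y_i - \mu_X Y_i - \mu_Y X_i + \mu_X \mu_Y)}{N} \tag{D2}$$

$$\sigma_{X,Y} = \frac{\sum X_i Y_i - \mu_X \sum Y_i - \mu_Y \sum X_i + N \mu_X \mu_Y}{N} \tag{D3}$$

$$\sigma_{X,Y} = \frac{\sum X_i Y_i - N \mu_X \mu_Y}{N} \tag{D4}$$

The same algebra applies to the numerator if the sums are divided by $N - 1$ instead of $N$.

Appendix E
Derivation of Textbook Regression Beta Formula

The slope of a linear regression line includes two terms. [26] The numerator equals the sum of the products of the $Y$ observations times the amount that the $X$ values deviate from their mean. The denominator is the sum of squared deviations.

$$\beta = \frac{\sum_{i=1}^{N} Y_i (X_i - \mu_X)}{\sum_{i=1}^{N} (X_i - \mu_X)^2} \tag{E1}$$

$$\beta = \frac{(X_1 Y_1 + X_2 Y_2 + \dots) - \mu_X (Y_1 + Y_2 + \dots)}{(X_1^2 + X_2^2 + \dots) - 2\mu_X (X_1 + X_2 + \dots) + N\mu_X^2} \tag{E2}$$

$$\beta = \frac{\sum X_i Y_i - N \mu_X \mu_Y}{\sum X_i^2 - N \mu_X^2} \tag{E3}$$

[26] Wonnacott, Thomas H., and Ronald J. Wonnacott, Introductory Statistics for Business and Economics, second ed. (1977).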

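Appendix F
Derivation of Online Covariance Formula

Applying Equation D4 to the first $N$ points and to the first $N - 1$ points gives:

$$N \sigma_{XY,N} = \sum_{i=1}^{N} X_i Y_i - N \mu_{X,N}\, \mu_{Y,N} \tag{F1}$$

$$(N-1)\, \sigma_{XY,N-1} = \sum_{i=1}^{N-1} X_i Y_i - (N-1)\, \mu_{X,N-1}\, \mu_{Y,N-1} \tag{F2}$$

Subtracting F2 from F1 and substituting $N \mu_{X,N} = (N-1)\mu_{X,N-1} + X_N$ and $N \mu_{Y,N} = (N-1)\mu_{Y,N-1} + Y_N$:

$$N \sigma_{XY,N} - (N-1)\, \sigma_{XY,N-1} = \frac{N-1}{N} * \left(X_N - \mu_{X,N-1}\right)\left(Y_N - \mu_{Y,N-1}\right) \tag{F3}$$

$$\sigma_{XY,N} = \frac{(N-1)\,\sigma_{XY,N-1} + \frac{N-1}{N}\left(X_N - \mu_{X,N-1}\right)\left(Y_N - \mu_{Y,N-1}\right)}{N} \tag{F4}$$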

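Appendix G
Derivation of Welford Regression Beta Formula

Let $x_i = X_i - \mu_X$ denote the deviation of each data point from the mean of $X$.

$$\text{Sumx2}_N = \sum_{i=1}^{N} \left(X_i - \mu_{X,N}\right)^2 \tag{G1}$$

$$\text{Sumx2}_N = \text{Sumx2}_{N-1} + \frac{N-1}{N}\left(X_N - \mu_{X,N-1}\right)^2 \tag{G2}$$

$$\text{SumxY}_N = \sum_{i=1}^{N} \left(X_i - \mu_{X,N}\right) Y_i = \sum_{i=1}^{N} \left(X_i - \mu_{X,N}\right)\left(Y_i - \mu_{Y,N}\right) \tag{G3}$$

$$\text{SumxY}_N = \text{SumxY}_{N-1} + \frac{N-1}{N}\left(X_N - \mu_{X,N-1}\right)\left(Y_N - \mu_{Y,N-1}\right) \tag{G4}$$

$$\beta = \frac{\sum_{i=1}^{N} x_i Y_i}{\sum_{i=1}^{N} x_i^2} \tag{G5}$$

$$\beta_N = \frac{\text{SumxY}_N}{\text{Sumx2}_N} \tag{G6}$$

$$\beta_N = \frac{\text{SumxY}_{N-1} + \frac{N-1}{N}\left(X_N - \mu_{X,N-1}\right)\left(Y_N - \mu_{Y,N-1}\right)}{\text{Sumx2}_{N-1} + \frac{N-1}{N}\left(X_N - \mu_{X,N-1}\right)^2} \tag{G7}$$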