ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA

Michael R. Middleton, McLaren School of Business, University of San Francisco, Fulton Street, San Francisco, CA; middleton@usfca.edu

ABSTRACT

Flight capacity can be modified by using different seating configurations or different aircraft. An airline planning department wants to estimate the distribution of demand for a specific flight as an aid for deciding on flight capacity. Historical sales data for the flight are available, but the number of passenger tickets sold is bounded by the number of seats available. The planners are willing to assume that underlying demand is normally distributed. This paper describes an estimation method using normal scores, a normal probability plot, and simple linear regression. Step-by-step details using Excel spreadsheet software are included. Finally, the results are compared with the hazard function method of censored data analysis.

BACKGROUND

The vice president for planning of a major international airline must decide on the type of aircraft and seating configuration for each scheduled flight [ ]. One of the inputs to this decision is the probability distribution of demand. The airline has historical data on the number of seats sold for each flight, but no information is kept about customer demand after all seats are sold. The planners believe that demand for each flight is normally distributed. They need a systematic method for estimating the mean and standard deviation of demand for all flights from historical sales data.

The historical data in this problem are an example of censored data, in which seat capacity places an upper bound on the recorded values. Many methods for analyzing censored data have been developed for lifetime and survival data, including the hazard function method of Nelson [3]. Kesling has described a variety of business applications of the hazard function method for censored data analysis [2].
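Recorded sales understate demand, and the following minimal Python sketch shows why by simulating the censoring described above. The sketch assumes hypothetical values that are not taken from the paper: mean demand 400, standard deviation 40, and 420 seats. Each recorded sale is the smaller of demand and capacity, so a sold-out trip hides how far demand exceeded the seats.

```python
import random
import statistics


def simulate_bounded_sales(n, mean, sd, capacity, seed=1):
    """Draw n normal demands and record sales censored at the seat capacity."""
    rng = random.Random(seed)
    demand = [rng.gauss(mean, sd) for _ in range(n)]
    # Tickets sold can never exceed the seats available.
    sales = [min(d, capacity) for d in demand]
    return demand, sales


# Hypothetical flight, for illustration only.
demand, sales = simulate_bounded_sales(20, mean=400.0, sd=40.0, capacity=420.0)
sold_out = sum(s == 420.0 for s in sales)
print(f"sold out on {sold_out} of {len(sales)} trips")
print(f"mean demand {statistics.mean(demand):.1f}, mean sales {statistics.mean(sales):.1f}")
print(f"SD of demand {statistics.stdev(demand):.1f}, SD of sales {statistics.stdev(sales):.1f}")
```

Every recorded sale is at most the corresponding demand, so the sample mean of sales can never exceed that of demand. Capping never increases the distance between two values, so the sample standard deviation of sales cannot exceed that of demand either. Whenever flights sell out, the naive sample statistics are therefore biased downward.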
This paper first describes an estimation method for unbounded data using normal scores, a normal probability plot, and simple linear regression. The same method is then modified for the bounded sales data. Last, the hazard function method is applied and compared. All of the methods described are supported by Excel spreadsheet software.

UNBOUNDED DATA

If there were unlimited seat capacity on a flight, the historical sales data would be the same as the demand data. The planners could check for normality in several ways:

- look for nonnormal patterns in a histogram;
- count the number of observations within 1, 2, and 3 standard deviations of the mean;
- conduct a chi-square goodness-of-fit test; or
- check for a straight-line pattern on a normal probability plot [ ].

Only the last of these can be extended to bounded data, so that approach is described here.

Normal Scores

Assume a random sample of n = 20 values from a normal distribution. Using a bracket median approach, we can regard the lowest sample value as an estimate from the lowest 5% = 1/n range of population values. More specifically, it estimates the median of that range, which is the .025 = 1/(2n) fractile [ ]. (A fractile is a value below which that proportion of a distribution's values lie.) On the standard normal distribution, the .025 fractile is located 1.96 standard deviations below the mean. This ideal location, -1.96, is the normal score for the lowest sample value.

Similarly, the second-lowest value in a sample of 20 is a general estimate of the next 5% range of population values and a specific estimate of the .075 = 3/(2n) fractile. That fractile lies 1.44 standard deviations below the mean, giving a normal score of -1.44. In general, let the n sample values have ranks i = 1, ..., n, with rank i = 1 for the lowest value and rank i = n for the highest. Under the bracket median method, the normal score for rank i is associated with the (i - 0.5)/n fractile.
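The bracket median rule can be checked numerically. Below is a minimal Python sketch in which `statistics.NormalDist().inv_cdf` plays the role of Excel's NORMSINV; the function name `bracket_median_scores` is hypothetical. For n = 20 the first two scores come out as -1.96 and -1.44, matching the text, and the scores are symmetric about zero.

```python
from statistics import NormalDist


def bracket_median_scores(n):
    """Return (fractiles, normal scores) for ranks 1..n using (i - 0.5)/n."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    fractiles = [(i - 0.5) / n for i in range(1, n + 1)]
    scores = [std_normal.inv_cdf(p) for p in fractiles]
    return fractiles, scores


fractiles, scores = bracket_median_scores(20)
for rank in (1, 2, 20):
    print(rank, fractiles[rank - 1], round(scores[rank - 1], 2))
```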
Figure 1 shows a normal cumulative probability curve. The locations of 20 equally likely values are marked on the horizontal axis as z-scores, that is, standard deviation units from the mean. If a random sample of 20 values is normally distributed, we expect a dot plot of those values to have approximately the same spacing as these z-scores. A scatter plot called a normal probability plot can be used to visually compare the spacing of the actual sample values with the ideal spacing of the z-scores. If the relative locations are similar, the pattern of the scatter plot will be approximately linear.

FIGURE 1. Normal Cumulative Distribution (cumulative probability, 0 to 1, versus z-score)

FIGURE 2. Worksheet for Unbounded Data (columns: A Rank, B Cumul. Prob., C Normal Score, D Sorted Data)

FIGURE 3. Normal Probability Plot for Unbounded Data (sorted data versus normal score, with linear trendline, trendline equation, and R-squared)

To develop a worksheet for a normal probability plot using Excel, enter the data in column D and sort them in ascending order, as shown in Figure 2. (Here the data are a simulated random sample from a normal distribution.) Enter the ranks 1 through 20 in column A. In column B, enter a formula that computes the cumulative probability (rank - 0.5)/20 from the rank in column A, and copy it down for all 20 rows. In column C, enter a formula that applies =NORMSINV() to the cumulative probability in column B, and copy it down.

To obtain the normal probability plot shown in Figure 3, select the normal scores and sorted data in columns C and D and use the Chart Wizard. The data are arranged with the x-axis data in the left column and the y-axis data in the right column so that the Chart Wizard makes appropriate assumptions. Then work through the wizard:

1. Verify the range.
2. Choose XY (Scatter).
3. Choose a format.
4. Verify the entries.
5. Click No for adding a legend, type the X axis title and the Y axis title, and click Finish.

To format the embedded chart, double-click it to activate it for editing. Select the horizontal axis, right-click, and choose Format Axis from the shortcut menu. On the Scale tab, enter suitable values for Minimum, Maximum, Major Unit, and Value (Y) Axis Crosses At; on the Number tab, set Decimal Places to 0; click OK. Select the vertical axis, right-click, and choose Format Axis from the shortcut menu. On the Scale tab, enter suitable values for Minimum, Maximum, and Value (X) Axis Crosses At; click OK.

To insert a trendline, select the data series by clicking on one of the points. From the Insert menu, choose Trendline. Click the Type tab of the Trendline dialog box, and click the Linear icon. Click the Options tab, and check Display Equation on Chart and Display R-squared Value on Chart. Click OK.

The results are shown in Figure 3. The normal probability plot allows visual verification of normality. The linear trendline produces two estimates:

- the mean, given by the intercept, where the normal score is zero; and
- the standard deviation, given by the slope, which is the change in seats for a unit change in the normal score.

For unbounded data these estimates could be obtained directly from the sample data. The advantage of the method shown in Figure 2 is that it can also be applied to bounded data.

BOUNDED DATA

Figure 4 shows a histogram of simulated data for 20 trips of the same flight. In actual practice, the observations should be restricted to indistinguishable, stationary situations [ ]. An example would be one Boeing aircraft type and coach-seat configuration on nonholiday Thursday afternoon flights from Hong Kong to Tokyo. The seating capacity obscures any demand that exceeds the number of seats. The sorted data are shown in column D of Figure 5.

FIGURE 4. Histogram of Bounded Data (frequency versus seats sold)

FIGURE 5. Worksheet for Bounded Data (columns: A Rank, B Cumul. Prob., C Normal Score, D Sorted Data; labels Mean/Intercept and StDev/Slope below the data)

FIGURE 6. Normal Probability Plot for Bounded Data (uncensored observations versus normal score, with linear trendline, trendline equation, and R-squared)

To analyze the bounded data using Excel, prepare the worksheet as previously described. The cumulative probabilities and normal scores are not needed for the censored observations; the results for the uncensored observations are shown in Figure 5. To obtain the normal probability plot shown in Figure 6, select the normal scores and sorted data of the uncensored observations in columns C and D, use the Chart Wizard, format the chart, and insert a trendline as previously described. For the unbounded data, the trendline used all observations. For these bounded data, the trendline is based only on the uncensored observations, that is, the trips that did not sell out.

To obtain the regression results on the worksheet, enter the labels Mean and Intercept, and StDev and Slope, in columns B and C below the data, as shown in Figure 5. In column D next to Intercept, enter =INTERCEPT() with the uncensored sorted data in column D as the known y's and their normal scores in column C as the known x's. Next to Slope, enter =SLOPE() with the same two ranges. The results may be interpreted like those for unbounded data. The estimate of normally distributed seat demand is Seats = Intercept + Slope × Z, where Z is the normal score. The mean demand (the intercept) corresponds to Z = 0. The standard deviation of demand (the slope) is the change in demand for a unit change in Z.

HAZARD FUNCTION

The hazard function method uses a different technique for determining the cumulative probabilities and a different orientation for the normal probability plot. Figure 7 arranges the calculations described by Kesling [2].

FIGURE 7. Worksheet for Hazard Function (columns: A Sorted Data, B Rank, C Hazard, D Cumul. Hazard, E Cumul. Prob., F Normal Score; labels Intercept/Mean and Slope/StDev below the data)

To develop the worksheet for the hazard function method, enter the sorted data and ranks in columns A and B. For each observed value, the hazard function in column C equals the reciprocal of the number of data values greater than or equal to the observed value. For suspended (censored) values, the hazard function is zero. With n = 20, the value of rank i has 21 - i values at or above it, so in column C enter a formula equal to 1/(21 - rank), referencing the rank in column B.

To obtain the cumulative hazard in column D, set the first cell equal to the first hazard in column C. Set each following cell equal to the cell above it plus the hazard in the same row, and copy down. To obtain the cumulative probability in column E, enter =1-EXP(-D), referencing the cumulative hazard in the same row, and copy down. To obtain the normal scores in column F, apply =NORMSINV() to the cumulative probability in column E, and copy down.

To obtain the regression results, enter the labels below the data as shown in Figure 7. In column B next to Intercept and Slope, enter =INTERCEPT() and =SLOPE() with the normal scores in column F as the known y's and the sorted data in column A as the known x's. These are the coefficients of the linear regression of fitted normal score on seats sold, Z = Intercept + Slope × Seats.

The mean of normally distributed demand corresponds to a normal score of zero, so the mean equals the negative of the intercept divided by the slope. The slope is the change in Z per seat, so the standard deviation of demand equals the reciprocal of the slope. In column E next to Mean and StDev, enter =-Intercept/Slope and =1/Slope, referencing the intercept and slope cells in column B. These formulas give the hazard function method's estimates of the mean and standard deviation of normally distributed demand.

To construct a normal probability plot, select the sorted data in column A, hold down the Control key while selecting the normal scores in column F, and use the Chart Wizard. After inserting a trendline and formatting, the results are shown in Figure 8.

FIGURE 8. Normal Probability Plot for Hazard Function (normal score versus seats sold, with linear trendline, trendline equation, and R-squared)

DISCUSSION

The two methods produce nearly identical estimates of the mean, but their estimates of the standard deviation differ more. Even though the hazard function method uses less extreme normal scores, its estimate of the standard deviation is larger. Future research could investigate the reasons for these different results.

One factor is the different orientation of the normal probability plots. The direct method plots seats against normal score, so the regression line minimizes the sum of squared deviations measured in seats. Conversely, the hazard function method plots normal score against seats, so the deviations being minimized are fitted normal score minus actual normal score. The rationale for choosing between these two orientations should be explored.

Another factor is the different methods for determining cumulative probabilities, which affect the normal scores used in the regression. The bracket median method is relatively easy to explain; the rationale for the hazard function method is more obscure. These methods and two others are shown in Figure 9:

- The method labeled Cryer uses i/(n + 1) as the cumulative probability for rank i [1].
- The method labeled Neter uses (i - 3/8)/(n + 1/4), which is described as yielding a good approximation based on statistical theory [4].

The Neter method produces normal scores almost as extreme as the bracket median method. The hazard method, with censored values in a sample of 20, generates the least extreme normal scores of the four methods.

FIGURE 9. Cumulative Probabilities and Normal Scores (for each rank: cumulative probability and normal score under the Bracket Median, Hazard (censored sample of 20), Cryer, and Neter methods)
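The three formula-based columns of the comparison can be regenerated with a few lines of Python. This minimal sketch computes the bracket median, Cryer, and Neter plotting positions and their normal scores for n = 20. The hazard column is omitted because it depends on how many observations are censored. The dictionary name `POSITIONS` and the helper `score_table` are hypothetical.

```python
from statistics import NormalDist

# Plotting-position formulas named in the discussion (rank i, sample size n).
POSITIONS = {
    "Bracket median": lambda i, n: (i - 0.5) / n,
    "Cryer": lambda i, n: i / (n + 1),
    "Neter": lambda i, n: (i - 3 / 8) / (n + 1 / 4),
}


def score_table(n):
    """Map each method to a list of (cumulative probability, normal score)."""
    std_normal = NormalDist()
    return {
        name: [(p, std_normal.inv_cdf(p)) for p in (pos(i, n) for i in range(1, n + 1))]
        for name, pos in POSITIONS.items()
    }


table = score_table(20)
for name, rows in table.items():
    prob, score = rows[0]  # rank 1, the most extreme score
    print(f"{name:<15} rank 1: prob {prob:.4f}, score {score:.3f}")
```

For rank 1, the bracket median score is the most extreme, Neter's is close to it, and Cryer's is the least extreme of the three. This matches the ordering described above.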
The direct method (using bracket median, Cryer, or Neter cumulative probabilities) and the hazard function method can be applied to data with missing values at the low end, interspersed throughout, or at the high end. These methods can also be applied to nonnormal distributions whose inverse cumulative distribution function is available. In addition to the standard normal inverse and the normal inverse, Excel has inverse functions for the beta, chi-square, F, gamma, lognormal, and t distributions. The exponential inverse can be obtained as the gamma inverse with shape parameter 1. Since the computations are easy to perform with spreadsheet software, it is likely that these methods will become more widely used in business decision analysis.

REFERENCES

[1] Cryer, J.D., and Miller, R.B. Statistics for business: Data analysis and modeling, 2nd ed. Belmont, CA: Duxbury.
[2] Kesling, G.D. Censored data analysis in business using the hazard function method. Proceedings of Western Decision Sciences Institute.
[3] Nelson, W. Theory and applications of hazard plotting for censored failure data. Technometrics.
[4] Neter, J., Kutner, M.H., Nachtsheim, C.J., and Wasserman, W. Applied linear statistical models. Chicago, IL: Irwin.
[5] Slosar, J. Personal communication.
[6] Vatter, P.A., Bradley, S.P., Frey, S.C., and Jackson, B.B. Quantitative methods in management: Text and cases. Homewood, IL: Irwin.

This paper was presented at the annual meeting of the Western Decision Sciences Institute and published in the conference Proceedings.