Steps for Using @RISK Software to Do Simulation Modeling (New Update 02/15/01)


Important! Before we get to the steps, we want to provide several notes to help you do the steps. Use the browser's Back button after you have completed the @RISK work to close this PDF file and return to the risk analysis discussion.

Adobe PDF Files: This file is an Adobe PDF file with its own navigation tools. Use the bookmark links to the left or the arrows at the top or bottom of the Adobe window.

Open @RISK: To open @RISK, go to Start > Programs > Palisade DecisionTools > @RISK 4.0 for Excel. Your computer screen will flicker as Excel opens; a dialog box will ask if you want to Enable Macros. Click/Select Yes. An Excel window will open with two additional sets of toolbars holding the DecisionTools icons and the @RISK icons.

Software Version Options: The images shown in these instructions are taken from version 4 of the @RISK software. They may vary from what your software produces because of ongoing updates; however, the basic instructions should be similar. We typically show one or two ways to do the steps; as you become familiar with the software, you will encounter additional options.

Cell References: Cell references such as Cell B3 refer to Column B, Row 3.

Command-Control Devices: Click, Select, and Enter all mean "use one of your command-control devices (mouse, keyboard, voice-activated software, stylus, and the like) to perform the action indicated." For example, you will be selecting cells, buttons, and icons; you will be entering text and numbers in cells and dialog boxes.

Miscellaneous Notes: The steps provide the basics of how to use Palisade Corporation's @RISK software to run a simulation. The estimated time to complete the steps is 90 minutes for those with Excel experience. Hyperlinks use formatting (bold, underscored text in a magenta color) like this example: Hyperlink. Points of emphasis use formatting (bold text in a magenta color) like this example: Points of emphasis. We may insert Notes icons in the document to provide additions and/or corrections. Double Click the icons to open them. Use the minus sign in the upper left corner to close them. The images below will not open or close: Notes icon sample (closed); Notes box sample (open) -- select the minus sign to close the box.

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 1A. Spreadsheet Showing Constants from the Naive Approach. Inputs (Cells B4:B10): Sell Price 200, Labor 30, Material 20, QC 2, R&D 80000, Facilities 156000, Equipment 60000. Result headings: Fixed Costs, Variable Costs Per Unit, Sell Price - Variable Costs Per Unit, Break Even.

Step 1: Set Up Your Excel Spreadsheet

1.1 Set up your spreadsheet using information from the naive approach for the original problem. The naive approach uses constants for all of the numbers. Refer to Formatting an Excel Spreadsheet if you need help formatting numbers, adjusting column width, formatting text, and so on.

1.2 Continue to Step 2 to enter formulas in your spreadsheet.

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 2A. Spreadsheet Showing Formulas. Inputs (Cells B4:B10): Sell Price 200, Labor 30, Material 20, QC 2, R&D 80000, Facilities 156000, Equipment 60000. Formulas: Fixed Costs =B8+B9+B10; Variable Costs Per Unit =B5+B6+B7; Sell Price - Variable Costs Per Unit =B4-D2; Break Even =C2/E2.

Step 2: Enter Formulas in Your Excel Spreadsheet

2.1 Enter formulas in Cells C2, D2, E2, and F2 exactly as they are shown in Figure 2A. Do not use spaces between the signs and cell references. Always begin a formula by typing the equals sign. To end the formula and show the result, Click/Select the Enter key on your keyboard. For example, for Fixed Costs in Cell C2:
- Type the equals sign in Cell C2.
- Click/Select Cell B8 or type B8 after the equals sign in Cell C2.
- Type the plus sign after B8.
- Click/Select Cell B9 or type B9 after the plus sign.
- Type the plus sign after B9.
- Click/Select Cell B10 or type B10 after the second plus sign.
- Click/Select the Enter key on your keyboard. At this point, the formula will change to the result (296000).

2.2 Format cells with numbers.
- Hold down the Control key (keyboard) and Click/Select each cell with a number.
- Go to the Format menu and Click/Select Cells.
- Click/Select the Number tab.
- Click/Select Currency.
- Change the number of decimal places to 0.
- Click/Select the Enter key (keyboard) or the OK button (dialog box) to close the dialog box.

2.3 Continue to Step 3 to review the results and formatting.
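For readers who want to sanity-check the Step 2 formulas outside Excel, the same arithmetic can be sketched in a few lines of Python (the variable names are ours; the cell references in the comments come from Figure 2A):

```python
# Naive-approach constants from Figure 1A.
sell_price = 200                                   # B4
labor, material, qc = 30, 20, 2                    # B5, B6, B7
rd, facilities, equipment = 80000, 156000, 60000   # B8, B9, B10

fixed_costs = rd + facilities + equipment          # C2: =B8+B9+B10
variable_per_unit = labor + material + qc          # D2: =B5+B6+B7
margin = sell_price - variable_per_unit            # E2: =B4-D2
break_even = fixed_costs / margin                  # F2: =C2/E2

print(fixed_costs, variable_per_unit, margin, break_even)
# 296000 52 148 2000.0
```

These match the values in Figure 3A: $296,000 in fixed costs, a $148 margin per widget, and a break-even quantity of 2,000 widgets.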

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 3A. Spreadsheet Showing Results and Formatting. Inputs: Sell Price $200, Labor $30, Material $20, QC $2, R&D $80,000, Facilities $156,000, Equipment $60,000. Results: Fixed Costs $296,000, Variable Costs Per Unit $52, Sell Price - Variable Costs Per Unit $148, Break Even $2,000.
Figure 3B. Cell C2 Formula in Excel's Formula Bar
Figure 3C. Check Formulas in the Options Dialog Box

Step 3: Review the Results and Formatting

3.1 To show one formula at a time, Click/Select a cell (C2, D2, E2, F2) and the formula will appear in the Formula Bar above the column headings. Figure 3B shows the formula for Cell C2.

3.2 To view all formulas at the same time, go to Excel's Tools menu. Scroll down and Click/Select Options. Click/Select the View tab. In Window Options, check the Formulas box (Figure 3C). Click/Select the Enter key (keyboard) or the OK button (dialog box) to close the Options dialog box.

3.3 The next page shows the spreadsheet both ways.

Observe your spreadsheet first with formulas (Figure 2A, Spreadsheet Showing Formulas), and second with results (Figure 3A, Spreadsheet Showing Results and Formatting). As you continue to Step 4, you will move from the naive approach (where constants were used) to a realistic approach (where constants are replaced with the distributions of random variables).

Naive Approach: constants.
Realistic Approach: distributions replace constants (where appropriate).

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 4A. Your Spreadsheet. Inputs: Sell Price $200, Labor $30, Material $20, QC $2, R&D $80,000, Facilities $156,000, Equipment $60,000. Results: Fixed Costs $296,000, Variable Costs Per Unit $52, Sell Price - Variable Costs Per Unit $148, Break Even $2,000.

Step 4: Identify Numbers as Constants, Variables, and Random Variables

4.1 Use "Paper and Pencil Thinking" and identify the highlighted numbers in your spreadsheet as constants, variables, or random variables.

4.2 Continue to Step 5 to identify distributions for the random variables. For this problem, we will show Equipment as a random variable so it will be included in the @RISK analysis. Later we will explain how to make a constant a random variable.

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 5A. Your Spreadsheet. Inputs: Sell Price $200, Labor $30, Material $20, QC $2, R&D $80,000, Facilities $156,000, Equipment $60,000. Results: Fixed Costs $296,000, Variable Costs Per Unit $52, Sell Price - Variable Costs Per Unit $148, Break Even $2,000.

Step 5: Replace Random Variables with Distributions

5.1 Identify distributions for the highlighted random variables in your spreadsheet ("Paper and Pencil Thinking").

5.2 Continue to Step 6 to identify @RISK formulas for the distributions.

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 6A. Your Spreadsheet. Inputs: Sell Price $200, Labor $30, Material $20, QC $2, R&D $80,000, Facilities $156,000, Equipment $60,000. Results: Fixed Costs $296,000, Variable Costs Per Unit $52, Sell Price - Variable Costs Per Unit $148, Break Even $2,000.

Step 6: Identify @RISK Formulas for Distributions

6.1 Identify @RISK formulas for the distributions in your spreadsheet ("Paper and Pencil Thinking").

6.2 The next page shows the distributions displayed visually.

Review the Logic of the @RISK Formulas

Quantitative Item | Constant, Variable, or Random Variable? | Distribution Type | @RISK Formula
Price      | Random Variable | Uniform    | =RiskUniform(200,400)
Labor      | Random Variable | Lognormal  | =RiskLognorm(30,15)
Materials  | Random Variable | Discrete   | =RiskDiscrete({14,20,27},{0.5,0.3,0.2})
QC         | Random Variable | Normal     | =RiskNormal(2,0.7)
R&D        | Random Variable | Triangular | =RiskTriang(60000,80000,120000)
Facilities | Random Variable | Uniform    | =RiskUniform(150000,160000)
Equipment  | Random Variable | Discrete   | =RiskDiscrete({60000},{1.0})

(The original figure also displays each distribution visually.) Continue to Step 7 to observe the distributions in your spreadsheet.
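The @RISK formulas in the table above can be approximated with NumPy for readers who want to experiment outside @RISK. This is our own sketch, not Palisade's implementation; note that RiskLognorm(30,15) specifies the mean and standard deviation of the lognormal variable itself, so we convert those to the underlying normal's parameters before sampling:

```python
import numpy as np

rng = np.random.default_rng()

def risk_lognorm(mean, std, size=None):
    # Convert the lognormal's own mean/std dev into the underlying
    # normal's mu/sigma, which is what NumPy's sampler expects.
    sigma2 = np.log(1.0 + (std / mean) ** 2)
    mu = np.log(mean) - sigma2 / 2.0
    return rng.lognormal(mu, np.sqrt(sigma2), size)

# One draw from each distribution in the table.
sell_price = rng.uniform(200, 400)                        # =RiskUniform(200,400)
labor      = risk_lognorm(30, 15)                         # =RiskLognorm(30,15)
material   = rng.choice([14, 20, 27], p=[0.5, 0.3, 0.2])  # =RiskDiscrete({14,20,27},{0.5,0.3,0.2})
qc         = rng.normal(2, 0.7)                           # =RiskNormal(2,0.7)
rd         = rng.triangular(60000, 80000, 120000)         # =RiskTriang(60000,80000,120000)
facilities = rng.uniform(150000, 160000)                  # =RiskUniform(150000,160000)
equipment  = 60000                                        # =RiskDiscrete({60000},{1.0})
```

Each call plays the role of one @RISK distribution cell: every evaluation produces a fresh random draw, just as @RISK resamples the input cells on every iteration.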

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 7A. Spreadsheet Showing Formulas. Distribution formulas (Cells B4:B10): Sell Price =RiskUniform(200,400); Labor =RiskLognorm(30,15); Material =RiskDiscrete({14,20,27},{0.5,0.3,0.2}); QC =RiskNormal(2,0.7); R&D =RiskTriang(60000,80000,120000); Facilities =RiskUniform(150000,160000); Equipment =RiskDiscrete({60000},{1}). Calculation formulas: Fixed Costs =B8+B9+B10; Variable Costs Per Unit =B5+B6+B7; Sell Price - Variable Costs Per Unit =B4-D2; Break Even =C2/E2.
Figure 7B. @RISK's Define Distributions Icon
Figure 7C. Excel's Function Wizard

Step 7: Enter Formulas in Your Spreadsheet

7.1 Enter @RISK formulas for the distributions in your spreadsheet (Figure 7A). There are several ways to enter formulas in your spreadsheet:
- Type formulas directly into your spreadsheet exactly as they are shown above. You must use proper syntax (arrangement of components such as commas, spaces, parentheses, and so on).
- Use @RISK's Define Distributions icon (Figure 7B) to see the shape of the distribution visually as values are entered for its parameters. Step-by-step instructions are shown on the next page.
- Use Excel's Function Wizard (fx) (Figure 7C) to show a list of possible distributions.

7.2 Continue to the next page to review step-by-step instructions for using @RISK's Define Distributions icon. Step 8 shows the results from when you entered formulas in your spreadsheet.

Using @RISK's Define Distributions Icon

- Click/Select Cell B7 to define a distribution for it (Figure 7D). You can do this with any cell in your spreadsheet.
- Click/Select @RISK's Define Distributions icon (Figure 7E).
- Work with the Define Distributions window (Figure 7F). In this example, the Source will be Function, the Distribution will be Normal, and the parameters µ and σ will be 2 and 0.7. As you enter data, the distribution's shape will change accordingly.

Continue to Step 8 to see the results of entering formulas in your spreadsheet.

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 8A. Spreadsheet Showing Results. Inputs: Sell Price $300, Labor $30, Material $20, QC $2, R&D $86,667, Facilities $155,000, Equipment $60,000. Results: Fixed Costs $301,667, Variable Costs Per Unit $52, Sell Price - Variable Costs Per Unit $248, Break Even $1,216.

Step 8: Review the Results

8.1 Review the results in Column B of your spreadsheet. They are the expected values of their respective distributions. As a default option, @RISK calculates the expected value. Step 12 will show you other options; however, we will use the expected value for the example problem. Once you exhibit the results in your spreadsheet (Figure 8A), you can view the formulas again. See Step 3 for a refresher.

8.2 Continue to the next page to view the spreadsheet with and without formulas.

Note: The upcoming steps will include only a few of @RISK's many options. We will present the basic steps needed to set up and run a simulation. In addition, we will include excerpts from Palisade's online manual as well as a link to the entire manual if you would like to explore @RISK's advanced steps.
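As a quick check, the numbers in Figure 8A can be reproduced by hand from the distribution parameters. This is our own sketch; note that the spreadsheet shows $20 for Material rather than its true mean of 18.4, which suggests the expected-value recalc snaps discrete distributions to a legal outcome:

```python
# Expected values of the distributions from the Step 6 table.
sell_price = (200 + 400) / 2                    # uniform mean -> 300
labor = 30.0                                    # lognormal mean (given directly)
material = 20                                   # displayed value; true mean is 18.4
qc = 2.0                                        # normal mean
rd = (60000 + 80000 + 120000) / 3               # triangular mean -> 86666.67
facilities = (150000 + 160000) / 2              # uniform mean -> 155000
equipment = 60000                               # single-point discrete

fixed = rd + facilities + equipment             # -> about 301,667
margin = sell_price - (labor + material + qc)   # -> 248
print(round(fixed), round(fixed / margin))      # 301667 1216
```

Both numbers agree with Figure 8A's Fixed Costs ($301,667) and Break Even ($1,216).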

Observe your spreadsheet first with formulas, and second with results.

Figure 7A. Spreadsheet Showing Formulas for @RISK Distributions and Miscellaneous Formulas for Calculations
Figure 8A. Spreadsheet Showing Results and Formatting (Note: these numbers represent the expected values of their respective distributions.)

Continue to Step 9 to run an @RISK simulation.

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 9A. Spreadsheet Showing Inputs and Output. Inputs: Sell Price $300, Labor $30, Material $20, QC $2, R&D $86,667, Facilities $155,000, Equipment $60,000. Results: Fixed Costs $301,667, Variable Costs Per Unit $52, Sell Price - Variable Costs Per Unit $248, Break Even $1,216.
Figure 9B. DecisionTools Icons
Figure 9C. @RISK Icons (Add Output; Simulation Settings; Display List of Outputs and Inputs; Start Simulation; Report Settings)

Step 9: Set Up a Simulation

9.1 Review Figures 9B and 9C to observe the DecisionTools and @RISK icons you will use. The active icons are located near the top of your screen. Hover your mouse over the icons to see their names.

9.2 Click/Select Cell F2. Cell F2 holds the number of widgets the company must sell to break even. This is the output you will simulate. Cell F2 is based on the inputs in Cells B4, B5, B6, B7, B8, B9, and B10.

9.3 Click/Select the @RISK icon Add Output. (Your screen will flicker briefly.)

9.4 Click/Select the @RISK icon Display List of Outputs and Inputs. An @RISK screen will open to show an @RISK Model (Figure 9D on the next page). It lists the inputs and outputs @RISK will include in its analysis. Earlier we mentioned that Equipment would be shown as a random variable even though technically it was a constant. By making it a random variable, it will show up in the list of inputs along with the other random variables.

Figure 9D. @RISK Model Showing Outputs and Inputs Continue to Step 10 to open the Simulation Settings dialog box.

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 10A. Spreadsheet Showing Inputs and Outputs (same values as Figure 9A)
Figure 10B. Simulation Settings Icon
Figure 10C. Simulation Settings Dialog Box

Step 10: Open the Simulation Settings Dialog Box

10.1 Click/Select the @RISK icon Simulation Settings (Figure 10B) to open the Simulation Settings dialog box (Figure 10C).

10.2 Continue to Step 11 to select the simulation settings.

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 11A. Simulation Settings: Iterations Tab
Figure 11B. Simulation Settings: Sampling Tab

Step 11: Select the Simulation Settings

11.1 Select the Iterations tab (Figure 11A) and change the # Iterations to 1000.

11.2 Select the Sampling tab (Figure 11B) and the following settings: Sampling Type - Monte Carlo; Standard Recalc - Expected Value; Random Generator Seed - Choose Randomly; Collect Distribution Samples - All.

11.3 Continue to Step 12 to run the simulation.

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 12A. Spreadsheet To Be Simulated
Figure 12B. DecisionTools Icons
Figure 12C. @RISK Icons (Start Simulation)
Figure 12D. Save @RISK Model

Step 12: Run the Simulation

12.1 Close other workbooks so that only the spreadsheet to be simulated is open.

12.2 Click/Select @RISK's Start Simulation icon (Figure 12C). Figure 12D will appear if you haven't already saved your @RISK model and results.

12.3 Click/Select Yes. The simulation will begin with the number of iterations flashing in the lower left of your screen.

12.4 Observe the @RISK results on the next page (Figure 12E).
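Conceptually, what Start Simulation does can be sketched as a vectorized NumPy Monte Carlo run: draw 1000 samples from each input distribution and recompute the output cell F2 for every iteration. This is our own illustrative code, not @RISK's internals; the seed is arbitrary, and the summary statistics will land near (but not exactly on) the values in the @RISK reports that follow:

```python
import numpy as np

N = 1000                          # matches the # Iterations setting in Step 11
rng = np.random.default_rng(1)    # any seed; the report below happened to use seed 1

# Sample every input distribution N times (RiskLognorm's mean/std dev are
# converted to the underlying normal's parameters).
sigma2 = np.log(1 + (15 / 30) ** 2)
sell_price = rng.uniform(200, 400, N)
labor      = rng.lognormal(np.log(30) - sigma2 / 2, np.sqrt(sigma2), N)
material   = rng.choice([14, 20, 27], size=N, p=[0.5, 0.3, 0.2])
qc         = rng.normal(2, 0.7, N)
rd         = rng.triangular(60000, 80000, 120000, N)
facilities = rng.uniform(150000, 160000, N)
equipment  = np.full(N, 60000.0)

# Recompute the output cell F2 (=C2/E2) for every iteration at once.
break_even = (rd + facilities + equipment) / (sell_price - (labor + material + qc))

print(f"mean={break_even.mean():.1f}  "
      f"5%={np.percentile(break_even, 5):.1f}  "
      f"95%={np.percentile(break_even, 95):.1f}")
```

With 1000 iterations, the mean and percentiles should come out in the neighborhood of the @RISK Summary Report values (mean about 1292, 5%/95% about 868/1961), varying from run to run because each run resamples the inputs.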

Figure 12E. @RISK Results Showing Summary Statistics Note: When @RISK produces models and reports, it places them in separate windows. To toggle between windows, go to the Excel menu bar at the top of your screen. Click/Select Window and scroll to the file you want to view. Figure 12F. Excel Menu Bar Review additional @RISK Results Windows in Figure 12G on the next page.

Click/Select these icons to open the @RISK Results Windows shown below. In Step 13, you will reproduce these reports in Excel so you can format and print them. Figure 12G. Miscellaneous @RISK Results Windows Continue to Step 13 to select the Report Settings dialog box.

Break-Even Analysis: How many widgets will have to be sold to break even?

Figure 13A. Report Settings
Figure 13B. @RISK Reports
Figure 13C. Worksheet Tabs Showing Several @RISK Report Tabs

Step 13: Select the Report Settings

13.1 Click/Select @RISK's Report Settings icon (Figure 13A).

13.2 Select the @RISK Reports shown in Figure 13B.

13.3 Click/Select the Generate Reports Now button. The @RISK screen will flash as reports are generated in a new Excel workbook. Excel stores each report in individual worksheets identified by the tabs at the bottom of your screen. Figure 13C shows examples of several report tabs. Several of these reports are formatted and shown on the following pages. They will be explained in Step 14.

13.4 Save your new workbook with its @RISK reports. You can format each report just as you format any Excel output.

13.5 Review several formatted reports; then continue to Step 14 to understand @RISK's terminology, reports, and graphs.

@RISK Summary Report

General Information
Workbook Name: Steps_Realistic_2a_RunSimulation.xls
Number of Simulations: 1
Number of Iterations: 1000
Number of Inputs: 7
Number of Outputs: 1
Sampling Type: Monte Carlo
Simulation Start Time: 2/8/01 23:51:49
Simulation Stop Time: 2/8/01 23:51:54
Simulation Duration: 0:00:05
Random Seed: 1

Output and Input Summary Statistics (Simulation 1)

Name | Cell | Minimum | Maximum | Mean | Std Dev | 5% | 95%
Break Even (output) | $F$2 | 784.5050659 | 3182.568848 | 1292.473697 | 355.7138697 | 868.1774902 | 1960.62439
Sell Price | $B$4 | 200.1490631 | 399.8982849 | 299.240982 | 58.2709679 | 209.239563 | 390.69397
Labor | $B$5 | 7.111316681 | 100.8699112 | 29.96570297 | 15.09726057 | 12.79477596 | 59.6433449
Material | $B$6 | 14 | 27 | 18.373 | 5.006689319 | 14 | 27
QC | $B$7 | -0.436765283 | 4.091255665 | 1.988328282 | 0.70235574 | 0.802909076 | 3.10146427
R&D | $B$8 | 60390.35938 | 119025.3203 | 86248.97443 | 12601.93164 | 66929.07031 | 109491.07
Facilities | $B$9 | 150009.7344 | 159997.5313 | 154962.5493 | 2896.524848 | 150470.9531 | 159389.25
Equipment | $B$10 | 60000 | 60000 | 60000 | 0 | 60000 | 60000

This section shows several of the reports you can produce with @RISK (version 4) and Excel. You can use Excel's formatting tools to format your @RISK reports.

@RISK Output Data Report

Output Data: Break Even (Cell $F$2), Simulation 1

Iteration | Break Even
1 | 909.9862061
2 | 916.6084595
3 | 1111.341309
4 | 878.1502686
5 | 1626.600952
6 | 951.9277954
7 | 1420.436279
8 | 1533.885376
9 | 1841.775513
10 | 1563.875122
11 | 1606.596924
12 | 1427.765503
13 | 1748.101074
14 | 1223.054932
15 | 966.0921021
16 | 1179.120361
...
498 | 1398.364868
499 | 2206.767822
500 | 1214.237793
501 | 2088.738037
502 | 1394.971558
503 | 1113.849121
504 | 1141.11731
505 | 1080.449219
506 | 975.128418
507 | 1529.310303
508 | 974.7440186
509 | 1106.268677
510 | 1723.029297
511 | 811.2329712
512 | 942.6470337
513 | 1094.567383
514 | 1624.741211
...
984 | 900.4551392
985 | 1474.117554
986 | 1422.085693
987 | 1655.520142
988 | 2543.87793
989 | 981.9345093
990 | 926.5761719
991 | 1417.977661
992 | 1299.375977
993 | 2104.631836
994 | 1236.593506
995 | 1513.125732
996 | 889.6911621
997 | 1517.675903
998 | 1291.281372
999 | 1377.985962
1000 | 1623.665283

These are some of the 1000 output results produced when you set the number of iterations at 1000.

@RISK Input Data Report

Input Data (Simulation 1)

Iteration | Sell Price $B$4 | Labor $B$5 | Material $B$6 | QC $B$7 | R&D $B$8 | Facilities $B$9 | Equipment $B$10
1 | 393.4192505 | 21.00489616 | 27 | 2.225671768 | 95739.75781 | 156557.2031 | 60000
2 | 371.8695374 | 23.82263184 | 20 | 2.30277276 | 80237.78906 | 158342.0469 | 60000
3 | 302.93573 | 16.72019386 | 14 | 1.500461817 | 85121.8125 | 155735.0469 | 60000
4 | 378.7402649 | 38.11558533 | 20 | 1.763113141 | 69798.36719 | 150210 | 60000
5 | 222.2989655 | 22.63306999 | 20 | 2.008236885 | 77635.98438 | 151342.125 | 60000
6 | 347.849884 | 27.64040565 | 14 | 2.351380348 | 77381.58594 | 151869.3906 | 60000
7 | 289.8354187 | 33.26564026 | 20 | 1.029378533 | 116162.8203 | 158407.2969 | 60000
8 | 258.912323 | 35.25548172 | 27 | 1.931726217 | 85117.28125 | 153568.75 | 60000
9 | 208.5817108 | 37.40048599 | 14 | 0.490382016 | 72523.27344 | 156066.0625 | 60000
10 | 229.759079 | 26.52225685 | 14 | 2.866286039 | 73803.64844 | 157656.6094 | 60000
11 | 232.8937988 | 17.34792328 | 20 | 1.407278895 | 101355.7422 | 150546.7031 | 60000
12 | 255.6546631 | 23.35033989 | 27 | 0.750274599 | 77732.82813 | 154322.375 | 60000
...
986 | 245.237793 | 14.58153152 | 14 | 2.075205564 | 89076.16406 | 156076.4688 | 60000
987 | 243.1249542 | 26.38834381 | 27 | 2.441208601 | 92369.11719 | 157702.1875 | 60000
988 | 203.7792206 | 57.54346848 | 27 | 2.298166752 | 77593.03906 | 159881.9219 | 60000
989 | 365.1177368 | 50.62573242 | 14 | 0.896715403 | 81125.83594 | 153057.1094 | 60000
990 | 359.1701965 | 23.9046154 | 14 | 2.455418587 | 81546.3125 | 153855.6094 | 60000
991 | 264.5511475 | 31.33803558 | 20 | 2.40879178 | 80900.67969 | 158015.125 | 60000
992 | 253.9598999 | 28.00313568 | 14 | 1.878269911 | 62697.42188 | 150273.5313 | 60000
993 | 204.5622711 | 31.42062569 | 27 | 0.689033806 | 94147.30469 | 151976.8906 | 60000
994 | 342.8905945 | 51.62947845 | 27 | 1.589504957 | 112882.1641 | 151935.8281 | 60000
995 | 224.5756683 | 12.0755558 | 20 | 1.844479203 | 76764.73438 | 151721.2188 | 60000
996 | 388.1571655 | 44.8118248 | 14 | 1.494532943 | 80726.54688 | 150959.4219 | 60000
997 | 237.7990112 | 17.17729759 | 20 | 2.280871868 | 81695.72656 | 159321.3906 | 60000
998 | 284.8969727 | 43.22843552 | 14 | 1.551819921 | 80202.88281 | 151777.4375 | 60000
999 | 278.9086914 | 26.25245285 | 14 | 1.862671733 | 108032.125 | 158266.1094 | 60000
1000 | 235.7346954 | 17.11428452 | 20 | 2.632974386 | 101334.1484 | 156883.8594 | 60000

These are some of the 1000 input results produced when you set the number of iterations at 1000.

@RISK Sensitivity Report

Sensitivity Ranking: Step-Wise Regression (Break Even at $F$2, for Simulation 1)

Rank | Name | Cell | Function | Regression | Correlation
1 | Sell Price | $B$4 | RiskUniform(200,400) | -0.898479451 | -0.955307383
2 | Labor | $B$5 | RiskLognorm(30,15) | 0.285607439 | 0.21780791
3 | R&D | $B$8 | RiskTriang(60000,80000,120000) | 0.153874152 | 0.148823345
4 | Material | $B$6 | RiskDiscrete({14,20,27},{0.5,0.3,0.2}) | 0.092849691 | 0.091961814
5 | Facilities | $B$9 | RiskUniform(150000,160000) | 0.02459837 | 0.050544831
6 | QC | $B$7 | RiskNormal(2,0.7) | 0 | 0.043464691
7 | Equipment | $B$10 | RiskDiscrete({60000},{1}) | 0 | 0

Sensitivity Ranking: Correlation Coefficient (Break Even at $F$2, for Simulation 1) -- the same seven inputs, with the same regression and correlation values as above.
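The correlation column in this report is a rank-order (Spearman) correlation between each input's samples and the output's samples. A minimal sketch of that calculation, using our own helper and assuming no tied values (which holds for continuous inputs):

```python
import numpy as np

def rank_correlation(x, y):
    # Spearman rank correlation: the Pearson correlation of the ranks.
    # (Double argsort converts values to 0-based ranks; assumes no ties.)
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))

# Toy check: break-even falls monotonically as sell price rises, so in a
# one-input version of the model the rank correlation is exactly -1.
rng = np.random.default_rng(0)
sell_price = rng.uniform(200, 400, 1000)
break_even = 296000 / (sell_price - 52)
print(round(rank_correlation(sell_price, break_even), 3))
# -1.0
```

In the full model the other six inputs add noise, which is why the report shows -0.955 for Sell Price rather than -1: the monotone relationship is strong but no longer exact.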

@RISK Input Details Report

Input Statistics (Simulation 1)

Statistic | Sell Price $B$4 | Labor $B$5 | Material $B$6 | QC $B$7 | R&D $B$8 | Facilities $B$9 | Equipment $B$10
Minimum | 200.1490631 | 7.111316681 | 14 | -0.436765283 | 60390.35938 | 150009.7344 | 60000
Maximum | 399.8982849 | 100.8699112 | 27 | 4.091255665 | 119025.3203 | 159997.5313 | 60000
Mean | 299.240982 | 29.96570297 | 18.373 | 1.988328282 | 86248.97443 | 154962.5493 | 60000
Standard Deviation | 58.2709679 | 15.09726057 | 5.006689319 | 0.70235574 | 12601.93164 | 2896.524848 | 0
Variance | 3395.5057 | 227.9272769 | 25.06693794 | 0.493303585 | 158808681.1 | 8389856.197 | 0
Skewness | 0.019076552 | 1.421558116 | 0.684379514 | -0.143487863 | 0.364727731 | 0.001311222 | Error
Kurtosis | 1.788072177 | 5.592283143 | 2.031556927 | 2.894682161 | 2.444202702 | 1.78193881 | Error
NumErrs | 0 | 0 | 0 | 0 | 0 | 0 | 0
Mode | 313.5385376 | 19.57271786 | 14 | 2.149728775 | 82413.59375 | 153850.3188 | 60000
5% | 209.2395631 | 12.79477599 | 14 | 0.802909078 | 66929.07041 | 150470.9531 | 60000
10% | 220.1912079 | 14.9077397 | 14 | 1.071162704 | 70366.57845 | 151000.1094 | 60000
15% | 227.8271647 | 16.57086198 | 14 | 1.225863937 | 73469.30513 | 151447.9844 | 60000
20% | 235.991808 | 17.71208584 | 14 | 1.359130505 | 75642.64067 | 151895.2345 | 60000
25% | 248.0038605 | 19.13677216 | 14 | 1.51730299 | 77016.58594 | 152460.9688 | 60000
30% | 257.9617616 | 20.23818787 | 14 | 1.626738469 | 78525.76571 | 152969.9845 | 60000
35% | 268.8215637 | 21.77901993 | 14 | 1.735821363 | 79865.10929 | 153441.6093 | 60000
40% | 278.5636297 | 23.2780819 | 14 | 1.844355703 | 81284.34395 | 153868.5469 | 60000
45% | 288.858393 | 24.83904061 | 14 | 1.942882652 | 82848.15551 | 154465.0937 | 60000
50% | 299.9686279 | 26.16023827 | 14 | 2.041386843 | 84383.15625 | 154958.5625 | 60000
55% | 308.8929754 | 27.7752121 | 20 | 2.132246262 | 86240.73495 | 155421.297 | 60000
60% | 318.7754926 | 29.63485391 | 20 | 2.204748666 | 88293.16483 | 156002.1875 | 60000
65% | 328.9841883 | 31.66231117 | 20 | 2.289775278 | 90332.62322 | 156493.4373 | 60000
70% | 338.8416366 | 33.91704161 | 20 | 2.353842491 | 92118.24945 | 156957.2812 | 60000
75% | 348.745697 | 37.1859436 | 20 | 2.442231894 | 94909.28906 | 157538.7344 | 60000
80% | 360.6994636 | 40.27426556 | 20 | 2.583830369 | 98245.35988 | 157950.6252 | 60000
85% | 369.3285023 | 44.40036862 | 27 | 2.683954484 | 101205.0234 | 158505.8594 | 60000
90% | 379.4453907 | 50.62573155 | 27 | 2.863582169 | 105055.3658 | 158925.9531 | 60000
95% | 390.693969 | 59.64334467 | 27 | 3.101464228 | 109491.0632 | 159389.2499 | 60000

@RISK Output Details Report
Output Statistics: Break Even ($F$2), Simulation 1

Statistic  Value
Minimum    784.5050659
Maximum    3182.568848
Mean       1292.473697
Std Dev    355.7138697
Variance   126532.3571
Skewness   0.940633456
Kurtosis   3.739647413
NumErrs    0
Mode       1001.550928
5%         868.1774907
10%        905.2075199
15%        937.376771
20%        974.7440197
25%        1010.166748
30%        1042.318608
35%        1078.872558
40%        1109.405762
45%        1154.329809
50%        1198.528564
55%        1268.274912
60%        1323.123228
65%        1387.289182
70%        1453.119093
75%        1514.461792
80%        1590.099383
85%        1692.199049
90%        1800.018619
95%        1960.624377

[Input graph: Distribution for Labor/B5, mean = 29.9657; 90% of the simulated values fall between 12.79 and 59.64]

[Output graph: Distribution for Break Even/F2 (values in thousands), mean = 1292.474; 90% of the simulated values fall between .87 and 1.96 thousand]

[Tornado graphs for Break Even/F2, one showing correlation and one showing regression. The regression graph plots standardized b coefficients: Sell Price/B4 -.898, Labor/B5 .286, R&D/B8 .154, Material/B6 .093, Facilities/B9 .025]

Step 13.3 showed you how to produce a Tornado Graph with the @RISK Reports icon. The next page shows you how to produce these Tornado Graphs with the Tornado Graph icon.

Using the Tornado Graph Icon

1. Click/Select the Tornado Graph icon to open the Sensitivity Analysis.
2. In the Display Significant Inputs Using area, Click the drop-down arrow and Select Correlation.
3. Right-Click on F2 - Break Even and Select Tornado Graph to open the Correlation Sensitivity Graph.
4. Repeat Steps 2 and 3 to produce the Regression Tornado Graph.

Continue to Step 14 for an explanation of some of @RISK's terminology, reports, and graphs.

Step 14: Understanding @RISK's Terminology, Reports, and Graphs

In Step 13 you generated the following reports:
- Sensitivities
- Detailed Statistics (Input Details, Output Details)
- Output Graphs
- Input Graphs
- Tornado Graphs
- Simulation Summary
- Output Data
- Input Data

This section includes the following to help you understand them:
- Definitions
- Links to videos that discuss @RISK reports and graphs and the logic behind them. Print the reports and graphs and review them as you watch the videos.

Playing S3C's Videos
To play S3C's videos, you will need RealPlayer 8 Basic on your computer. To download this plug-in, Click/Select RealPlayer 8 Basic; it is the free player. When the download is complete, Click/Select your browser's Back button to return to S3C.

Viewing Videos
Your options are Original Size, Double Size, or Full Screen. As screen size increases, distortion may occur. To increase the size of your video screen, go to RealPlayer's View menu and scroll to Zoom. Select Double Size or Full Screen. To return to Original Size from Double Size, go to the View menu and Select Original Size. To return to Original Size from Full Screen, right-click on the screen and Select Original Size.

Listening to Videos
Make sure your speakers are plugged in and the volume is turned up.

Definitions: Review these definitions and then watch the video at the end of this section.

Uncertainty
Uncertainty is synonymous with random variable. We face uncertainty when we cannot determine the exact outcome of an event.

Complexity
There is no agreed-upon definition for complexity theory. It is generally accepted that as the number of interacting factors grows, the complexity of the problem increases. Simulation is one of the basic tools used to analyze complex systems.

Internal Numbers
Internal numbers are numbers (factors) that we can control or influence.

External Numbers
External numbers are numbers (factors) that we cannot control or influence. Most numbers fall on a continuum (Internal <----> External), with external numbers being far more prevalent.

Closed System
A closed system is a system in which we can control all factors (numbers) that influence the system's performance.

Open System
An open system has factors (numbers) we cannot control that will influence the system's performance. Most systems fall on a continuum (Closed <----> Open), with open systems being far more prevalent.

Impact Causing Change Numbers (ICCNs)
ICCNs are numbers that, when they change, cause the number we are attempting to predict to change. For this example:
Number being predicted: Break Even Quantity
Possible ICCNs: Price, Labor, Material, QC, R&D, Facilities, and Equipment Costs
An ICCN's level of impact falls on a continuum: No Impact, Minor Impact, Medium Impact, Major Impact.

This video expands on the definitions presented above.

@RISK Input Details Report

The @RISK Input/Output Details Report and Graph provide you with three pieces of information you always want to know about a random variable:
- Mean (µ)
- Standard Deviation (σ)
- Distribution

This video presents several definitions, along with notes on input details for QC costs and R&D costs.

The @RISK Input Details Report on the next page gives the descriptive statistics for each input random variable. In this case, the sample size was 1,000, as we had the computer calculate 1,000 different break even values (iterations). Hence, the computer generated 1,000 prices, 1,000 labor costs, and so on.

For example, look at Quality Control (QC). We assumed the mean was $2 and the standard deviation was $0.70. The average of the simulated QC costs is 1.988328282; the standard deviation is 0.70235574. Both are very close to what we assumed.

Details from your spreadsheet show QC's mean is 2 and its standard deviation is 0.70: QC = RiskNormal(2,0.7)

Details from the simulation for QC ($B$7, Simulation 1):

Statistic  Value
Minimum    -0.436765283
Maximum    4.091255665
Mean       1.988328282
Std Dev    0.70235574
Variance   0.493303585
Skewness   -0.143487863
Kurtosis   2.894682161
NumErrs    0

Note: The spreadsheet details are very close to the simulated details.
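If you want to see this "simulated statistics land near the assumed parameters" behavior outside of Excel, here is a short Python sketch. It is not @RISK; it simply draws 1,000 values from the same Normal(2, 0.7) distribution the spreadsheet assumes for QC and computes the sample mean and standard deviation, as the Input Details Report does.

```python
import random

# Stand-in for the QC input: 1,000 draws from a Normal distribution with
# mean 2 and standard deviation 0.7, mirroring =RiskNormal(2,0.7).
random.seed(42)  # fixed seed so the sketch is repeatable
n = 1000
qc = [random.gauss(2, 0.7) for _ in range(n)]

sample_mean = sum(qc) / n
sample_var = sum((x - sample_mean) ** 2 for x in qc) / n
sample_sd = sample_var ** 0.5
# As in the report, the simulated mean and standard deviation come out
# close to the assumed parameters of 2 and 0.7.
```

With 1,000 iterations the sample mean is typically within a few hundredths of the assumed mean, which is why the report's figures (1.988... and 0.702...) track the spreadsheet assumptions so closely.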

@RISK Input Graphs

For each random variable input, the @RISK Input Graph draws a histogram showing the distribution of the 1,000 iterated values. Using R&D costs as an example, note how close the simulated distribution for R&D is to what you instructed the computer to do in your spreadsheet.

[Input graph: Distribution for R&D/B8 (values in thousands), mean = 86248.98; 90% of the simulated values fall between 66.93 and 109.49 thousand]

The simulated distribution is based on how you set up your spreadsheet: the histogram shows the simulated distribution for R&D, which follows the triangular distribution =RiskTriang(60000,80000,120000).
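The same check can be done for the triangular R&D input. The Python sketch below is an independent stand-in for @RISK: it draws 1,000 values from a triangular distribution with the same minimum, most likely value, and maximum, then compares the sample mean to the theoretical triangular mean (a + b + c)/3.

```python
import random

# Stand-in for the R&D input: 1,000 draws from a triangular distribution
# with minimum 60000, most likely 80000, maximum 120000, mirroring
# =RiskTriang(60000,80000,120000). Note random.triangular takes
# (low, high, mode).
random.seed(7)
rd = [random.triangular(60000, 120000, 80000) for _ in range(1000)]

sample_mean = sum(rd) / len(rd)
theoretical_mean = (60000 + 80000 + 120000) / 3  # mean of a triangular
```

The theoretical mean is about 86,667, and the simulated mean lands near it, just as the report's simulated mean of 86,248.97 sits near the same value.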

@RISK Output Details Report

The @RISK Output Details Report indicates the results of the 1,000 simulated break even values. The % column gives you the simulated cumulative probability distribution. Using this, we could say: "Based on our assumptions, the model indicates there is a 90% probability our break even value will be 1800 or less." This result comes from the following:

P(Q ≤ 1800) = (Number of simulated Q values at 1800 or less) / (Total number of Q values) = 900/1000 = 0.90

@RISK Output Details Report
Output Statistics: Break Even ($F$2), Simulation 1

Statistic  Value
Minimum    784.5050659
Maximum    3182.568848
Mean       1292.473697
Std Dev    355.7138697
Variance   126532.3571
Skewness   0.940633456
Kurtosis   3.739647413
NumErrs    0
Mode       1001.550928
5%         868.1774907
10%        905.2075199
15%        937.376771
20%        974.7440197
25%        1010.166748
30%        1042.318608
35%        1078.872558
40%        1109.405762
45%        1154.329809
50%        1198.528564
55%        1268.274912
60%        1323.123228
65%        1387.289182
70%        1453.119093
75%        1514.461792
80%        1590.099383
85%        1692.199049
90%        1800.018619
95%        1960.624377

Remember, simulated values are pretend experiences.
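The % column is nothing more than a counting exercise, and that can be sketched in a few lines of Python (the ten sample values below are illustrative stand-ins, not the actual 1,000 @RISK iterations):

```python
def empirical_prob(values, threshold):
    """P(Q <= threshold): the fraction of simulated values at or below threshold."""
    return sum(1 for v in values if v <= threshold) / len(values)

# Tiny illustrative sample: 9 of these 10 break-even values are 1800 or
# less, so the empirical cumulative probability at 1800 is 0.9, matching
# the idea behind the 90% row of the report.
sample = [900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1790, 1960]
p = empirical_prob(sample, 1800)
```

Scaling the same count up to 1,000 iterations gives the 900/1000 = 0.90 calculation shown above.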

@RISK Output Graph

The @RISK Output Graph for the break even quantities below is skewed to the right. It can be very difficult to predict the shape of this output distribution in advance, because the input random variables and their shapes (price and costs) control the output shape.

[Output graph: Distribution for Break Even/F2 (values in thousands), mean = 1292.474; 90% of the simulated values fall between .87 and 1.96 thousand]

The 90% bar is similar to a 90% confidence interval. We could say: "We are 90% confident the break even value for widgets will be between 870 (.87 x 1000) and 1960 (1.96 x 1000)." We prefer to say: "90% of the simulated break even points fall between 870 and 1960; 5% are below 870; and 5% are above 1960." The key difference is that these results are a summary of pretend experiences (simulated or virtual experiences). Never state that you are confident the future will behave a certain way based on simulated results.
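The 90% bar comes from the 5th and 95th percentiles of the simulated values. A minimal Python sketch of that idea, using 100 stand-in values rather than the real @RISK iterations, and a simple nearest-rank percentile (real packages use more refined interpolation):

```python
def percentile(values, q):
    """Nearest-rank style empirical percentile of the simulated values."""
    s = sorted(values)
    k = int(round(q * (len(s) - 1)))
    return s[k]

# Illustrative stand-in data: with 100 values, the 5th and 95th
# percentiles bracket the middle 90%, which is how the 90% bar on the
# output graph is read.
data = list(range(1, 101))
low, high = percentile(data, 0.05), percentile(data, 0.95)
inside = sum(1 for v in data if low <= v <= high)  # about 90% of the values
```

Note the phrasing this supports: the band describes where the simulated values fell, not a probability statement about the future.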

Tornado Graph and Sensitivity Analysis

The tornado graph and sensitivity analysis attempt to quantify the level of impact an Impact Causing Change Number can have on the number being predicted. The correlation coefficient r is one measure used to quantify the impact. If you need to review r, Click/Select Correlation Coefficient = r.

In modeling or analyzing a system, we attempt to identify the important factors: those factors that can have a major impact on what we are attempting to determine. For this example, which numbers (price and costs) have the greatest impact on the break even quantity? These numbers (factors) are often called drivers, as the direction they go will drive the number we are predicting. If you are able to identify the important factors, then ask: Are these internal or external numbers (factors)? We hope they are internal, because then we have a chance to influence them. If they are external, the problem has greater uncertainty, since the impact number is not under our control.

Review this video to help you understand the tornado graph.

The tornado graph indicates that price has by far the greatest impact on the break even quantity. This adds to the uncertainty, as we have classified price as an external number: the competition and customers can influence and/or control this number far more than we can. Ideally the Impact Causing Change Numbers would be internal numbers, because then we would be able to control and/or influence them.

As a decision maker, you may have limited resources. You will want to direct these resources to the factors that have the greatest IMPACT on your desired results. The simulation model helps identify key factors. For this problem, price is the key factor. Because we have little control over it, it may cause us to rethink how we will make our decision on what the break even value will be.

Excerpt from The Decision Making Book
Chapter 12: Regression Analysis

In Chapter 5 we developed the idea of the Best Fitting Least Squares Line; in this chapter we expand on this idea.

First: We look at quantifying the effectiveness of the Best Fitting Least Squares Model.
Second: We add probability to our model and make Confidence Interval estimates rather than point estimates.
Third: We look at models that use two or more pieces of information to make predictions. This is called Multiple Regression.
Fourth: We study fitting curved lines to our experiences. This is called Polynomial Regression.
Fifth: We study the issue of identifying variables that are correlated.

Reflection: Make a quick scan of Chapter 5 and read your reflections on this chapter.

12.1 Quantifying Effectiveness

In Chapter 5 we saw there were formulas that guaranteed a straight line where Σe = 0 and Σe² is a minimum. No other straight line would give a smaller Σe². We did not address the issue of: What is a good Σe²?

Question: How effective is the equation (straight line) in explaining the relationship that exists between X and Y?

If Σe² = 0, the line does a perfect job, as all the data points would fall on a straight line. As the points become more scattered away from a straight-line pattern, Σe² will become larger. But we can't say that when Σe² is large, we have a bad fit. As we saw in Chapter 4 when studying standard deviation, we must consider the magnitude of the numbers. The following example should explain why we can't use Σe² by itself. Assume we're estimating two Y values and are interested in the e² values.

         Actual Y    Estimated Ŷ   e = (Y - Ŷ)   e² = (Y - Ŷ)²
Point 1  8           2             6             36
Point 2  1,000,100   1,000,000     100           10,000

Notice what is happening: the estimate for point 2 seems to be doing a "better job" of estimating, but because of the large values of Y and Ŷ it is very easy to get large e² values. For this reason, the magnitude of Σe² isn't a good indicator of how well the line fits the data. The correlation coefficient and coefficient of determination, which we discuss in the next sections, prove to be better indicators of how well the line fits the data.

Reflection: When you looked at the e² for points 1 and 2, did you think about the standard deviation being meaningless without the mean?

12.2 Correlation Coefficient

Question: Can we fit a straight line to any set of data? For example, can we fit a straight line to the following pattern? Would this line help us make good predictions? Does it explain the relationship that exists between the two variables? NO! NO! NO! NO! NO! NO! NO! NO! NO!

But recognize that we can fit a straight line to any set of data, whether the pattern is linear or nonlinear. Unfortunately, we all can do "stupid things". The question should have been stated: Should we fit a straight line? The formulas tell us nothing about how well the straight line captures reality and summarizes our experiences. All the least squares formulas do is ensure that Σe² is a minimum and Σe = 0. Think about this.

At this point we have no way of "measuring" in a quantitative way whether the least squares technique is effective in explaining the relationship between X and Y. By looking at the graph, we can get a "visual measure" for what is happening. This is spatial thinking. To do symbolic thinking, we need a numeric statistic that gives a quantitative measure for the quality of the fit that exists between the straight line and the data. One measurement that does this is called the correlation coefficient. It is represented by the small letter r.

r = Correlation Coefficient

The graphs below show how the r value is used to explain how close the points are to the line.

[Graphs: if a perfect positive linear relationship exists, r = 1; if a perfect negative linear relationship exists, r = -1]

As the points become more spread out from the line, the r value gets closer to zero. The r value is negative when the slope is negative and positive when the slope is positive.

[Graphs: Low Negative Correlation, r = -.3; High Negative Correlation, r = -.8; No Linear Relationship, r = 0; High Positive Correlation, r = .9]

There are several formulas for the correlation coefficient (remember concept vs. calculation). They all give the same answer. One of them is listed below:

r = (Σxy - (Σx)(Σy)/n) / √[(Σx² - (Σx)²/n)(Σy² - (Σy)²/n)]

In looking at the formula, you are probably saying, "you've got to be kidding". It is hard to believe that it can be used to measure how well the linear model fits the data. It is!

Numerically Verifying the Formula

Instead of showing you the mathematics that gives us the formula, we'll do as we did with other formulas: we will verify that the formula works. The points (1,4), (2,7), (3,10) fall on a straight line; if you don't believe this, plot the points on a graph. If the formula for r works, we should get r = 1 for this data.

x     y     xy    x²    y²
1     4     4     1     16
2     7     14    4     49
3     10    30    9     100
Σx=6  Σy=21 Σxy=48 Σx²=14 Σy²=165

n = 3 (we have 3 points (x,y))

r = (48 - (6)(21)/3) / √[(14 - 6²/3)(165 - 21²/3)]
  = (48 - 42) / √[(2)(18)]
  = 6 / √36 = 6/6 = 1

Data that does not fall on a straight line will always give an r value strictly between -1 and +1. In calculating r, the result will always be: -1 ≤ r ≤ 1.

Revisiting Tree Diameter vs. Cost of Removal

The following is the correlation coefficient for the tree diameter vs. cost problem from Chapter 5. If one had to determine the correlation coefficient r by hand calculation, it is best to use a table to keep track of all the necessary sums: Σxy, Σx, Σy, Σx², and Σy².

x     y     xy     x²    y²
1     4     4      1     16
2     3     6      4     9
3     7     21     9     49
4     8     32     16    64
5     8     40     25    64
Σx=15 Σy=30 Σxy=103 Σx²=55 Σy²=202

n = 5 (we have 5 points (x,y))

r = (103 - (15)(30)/5) / √[(55 - 15²/5)(202 - 30²/5)]
  = 13 / √[(10)(22)]
  = 13 / √220 ≈ .877
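Both hand calculations can be checked in a few lines of Python. This sketch implements the computational formula for r exactly as written in the text and runs it on the two data sets above:

```python
from math import sqrt

def corr(xs, ys):
    """Correlation coefficient via the computational formula from the text."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = sxy - sx * sy / n
    den = sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return num / den

# Perfectly linear points give r = 1.
r_line = corr([1, 2, 3], [4, 7, 10])
# The tree diameter vs. removal cost data gives r close to .877.
r_tree = corr([1, 2, 3, 4, 5], [4, 3, 7, 8, 8])
```

Keeping track of the five sums (Σx, Σy, Σxy, Σx², Σy²) is all the function does, just as the hand-calculation table does.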

If we had a perfect positive relation, we would get an r value of positive one. It is difficult to say what a "good" or acceptable r value is. This is unfortunate, because one would like to use the r value as a means of determining whether or not the model is actually capturing reality (summarizing our experiences) and should be used for making predictions. The formula for r doesn't give one an intuitive feel for what the r value is measuring about the best fitting least squares line. It is often best to look at both the graph and the r value to get a sense of the fit.

The correlation coefficient measures the strength of the linear relation that may exist between the two variables. Just because the r value is close to zero, it doesn't mean there is no relationship between the two variables: it only means that a linear relation does not exist between the two variables. Reread this paragraph, as this is a frequently misunderstood concept.

The correlation coefficient for the following data would be very close to zero because the relationship is not linear. However, the picture indicates that the two variables are certainly related in a cyclic manner, and we could use a nonlinear (not a straight line) model to explain the relationship.

[Graph: a cyclic, nonlinear pattern of points in the X-Y plane]

Reflection: What does the correlation coefficient tell us? Find that last remaining friend and tell them about it.

12.3 Coefficient of Determination

In Chapter 5 we discussed building a model that summarizes our experiences so we can use information to make better predictions. Assume you're a counselor and a student asks what level of proficiency she can expect to achieve if she takes a specific math course. How would you answer her? You'd probably say you want to look at her record; you're gathering data (Information). You would mentally compare her present skills to other students you've known who have taken the math course, and try to make an educated guess based on your past experiences. By summarizing your experiences, you hope to have created Knowledge. The mathematics studied in Chapter 5 is doing exactly the same thing: summarizing experiences to create a model that helps one use information to make predictions. The mathematical model moves us from having Information to having Knowledge, knowing how to use the information. Our mind is constantly doing this, either formally or informally. Do you keep track of how accurate your answers and predictions are, and evaluate whether you have used your information wisely? This is what we're doing with the correlation coefficient and coefficient of determination: determining if we're summarizing the information wisely.

Assume you have no information on her or other students, and you ask her to return after you've gathered some facts. Chances are you won't look at all the students' records but will select a sample of the records. Assume you take a stratified sample and check the records of five "typical" students who have recently completed the course. You stratified by pretest scores because you wanted data on a range of math backgrounds. Their pre and final scores are given below. The pre-scores are from a math placement test with a maximum of 20 points, while the final score is the points earned out of 100 on the final of the course.

Pre Score = X    Final Score = Y
11               40
12               30
13               70
14               80
15               80

Predictions With No Information

To evaluate the effectiveness of a model that uses information, one compares the effectiveness of predictions made with no information to predictions made with Information and Knowledge.

Approach I: I have no information on this student but will still estimate her final test score.
Approach II: I have information on this student's pre-score, and I know how to use pre-scores to estimate her final test score.

Note how much more confident the person with knowledge looks as they make decisions. Deming and others have written about the need to have a method for creating knowledge from our experiences. In the process of creating this knowledge we must assess how much we know. The person with the smile knows she knows and has more confidence.

Quantifying Effectiveness

To quantify effectiveness we compare how effective Approach I is versus Approach II in estimating the final score. If there is little or no difference in the effectiveness of the two methods of estimating, it would be silly to spend the time and money to gather information and to create knowledge. If this is the case, the Approach II person shouldn't be smiling and is in that dangerous position of being Unconsciously Incompetent. The Coefficient of Determination, which we now study, measures how much more we will know by using the model and information.

In Chapter 5 we measured the effectiveness of the model by measuring the error in estimating. Ideally, the error is small or zero:

Error = Actual Final Test Score - Estimated Final Test Score

For this problem we assume you have information on the old students, but no information about the student who is asking you to predict her success. She did not take the pretest. How would you predict her success, having no information about her? Using the above information, what score would you predict for the student? It would probably be some score between 30 and 80.
If you have a very pessimistic outlook on student abilities you might estimate 30, or, being an optimist, an 80. However, to most people it would seem more pragmatic to select a score that is in the "middle," rather than at an extreme. We're assuming that you're trying to logically use your experiences. Unfortunately, in many problems this is not the case: personal biases may blind us from the truth, and we disregard or distort our experiences. We'll assume you are not influenced by the student's appearance, demeanor, etc. We recognize final scores are random variables, and using our knowledge of random variables
we use the average final score as an educated guess of the student's final score. Ȳ is 60; hence we estimate the student's final score would be 60. In fact, we would use the value 60 as an estimate for any student who wanted to know their potential final score, if we had no pre-score on them. We realize some students are going to be above and some below, but on average our overestimates and underestimates will balance out. Our estimate would be unbiased, since Σ(Y - Ȳ) is always equal to 0. The average is an unbiased estimator; it does not in total underestimate or overestimate. How close the average is to individual scores is estimated by the standard deviation.

[Graph: with no information about the student, the prediction model is the horizontal line Ŷ = Ȳ = 60, a straight line parallel to the X axis. The estimate is a constant; it uses no information about the current student.]

We see that there are errors in estimating when the mean is used as the estimate. The errors are e = Y - Ȳ. As we did in Chapter 4, we find the sum of the squared error terms.

Y       Ŷ = Ȳ    e = Y - Ȳ       e² = (Y - Ȳ)²
40      60       40 - 60 = -20   (-20)² = 400
30      60       30 - 60 = -30   900
70      60       70 - 60 = 10    100
80      60       80 - 60 = 20    400
80      60       80 - 60 = 20    400
ΣY=300  ΣŶ=300   Σ(Y - Ȳ) = 0    Σ(Y - Ȳ)² = 2200

We always estimate 60 and compare how close we are to each actual score. For the first student our estimate was too high by 20 points. The value Σ(Y - Ȳ)² is called the Total Variation from the Mean, or just the Total Variation. This is the same value we determined in Chapter 4 that led to the Standard Deviation formula.
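The no-information model and its Total Variation can be checked with a short Python sketch of the arithmetic above:

```python
# No-information model: every final score is predicted by the mean, and
# the Total Variation is the sum of squared deviations from that mean.
finals = [40, 30, 70, 80, 80]
mean = sum(finals) / len(finals)              # Y-bar = 60

errors = [y - mean for y in finals]           # these sum to 0: unbiased
total_variation = sum(e ** 2 for e in errors) # 400+900+100+400+400 = 2200
```

The errors summing to zero is the unbiasedness property, and the 2200 total is the same sum of squares that feeds the standard deviation formula.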
It is a measure of the amount of error we would have in predicting final scores if we have no information about the student we are making a prediction about. The Total Variation is an observed result, and we have no control over its size. If it is large, we know the individual values are probably fairly spread out from the mean; if it is small, the values are fairly close to the mean. Remember, we must consider the magnitude of the numbers involved.

Effectiveness of Models That Use Information

If we assume that the student takes the time for the pretest, how would this additional information affect our estimate? Looking at the graph that follows, we see that our evidence (sample results) indicates that a student whose pre-score is below average will probably have a final score below average, and vice versa. With a pre-score of 11, it wouldn't make sense to predict a final score of 60, since our experience indicates that it will be somewhere below 60. Our final score estimate for a pre-score of 11 will be dictated by where we place the line.

We could think of a ruler with a thumbtack at the point (X̄, Ȳ); how we rotate the ruler (line) is dictated by the least squares formulas and the experiences. The formulas listen to what each point is telling them and find a balance line, a line that makes Σe = 0 and Σe² a minimum. Each point is an experience, an observed fact; the BFLSL formulas listen to all the facts in a logical manner.

[Graph: the sample points around X̄ = 13 and Ȳ = 60; points below the mean are telling us to bring the line down, points above are telling us to bring the line up]

If we use the formulas, we get the Best Fitting Least Squares Line, which is:

Ŷ = 13X - 109

[Graph: the line Ŷ = 13X - 109, our new line for making estimates, passing through (X̄, Ȳ) = (13, 60)]

The point (X̄, Ȳ) will always fall on the BFLSL. We now determine how much the amount of error in estimating decreases when the least squares line is used instead of Ȳ as the estimate.
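The slope and intercept of the BFLSL can be verified from the least squares formulas themselves. A Python sketch of that check (the formulas are the standard Chapter 5 computational forms):

```python
# Least squares slope and intercept for the pre-score / final-score data:
# b = (Σxy - ΣxΣy/n) / (Σx² - (Σx)²/n),  a = Y-bar - b * X-bar
pre = [11, 12, 13, 14, 15]
finals = [40, 30, 70, 80, 80]
n = len(pre)

sx, sy = sum(pre), sum(finals)
sxy = sum(x * y for x, y in zip(pre, finals))
sxx = sum(x * x for x in pre)

b = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)  # slope: 13
a = sy / n - b * sx / n                        # intercept: -109
```

The computation reproduces Ŷ = 13X - 109, and since a = Ȳ - bX̄, the point (X̄, Ȳ) = (13, 60) is guaranteed to fall on the line.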

Error in Estimating Using Information

This is the error discussed in Chapter 5, the Σe². Study the graph and the numerical values and see how they are related.

[Graph: the residuals e1 through e5 between each sample point and the line Ŷ = 13X - 109, with the mean line Ȳ = 60 and X̄ = 13 shown; the residuals are the unexplained variation, or error in estimating]

X    Y    Ŷ = 13X - 109   e = Y - Ŷ      (Y - Ŷ)²
11   40   34              40 - 34 = 6    (6)² = 36
12   30   47              -17            289
13   70   60              10             100
14   80   73              7              49
15   80   86              -6             36
     ΣY = 300  ΣŶ = 300   Σ(Y - Ŷ) = 0   Σ(Y - Ŷ)² = 510

The sums ΣY and ΣŶ are always equal, and the sum Σ(Y - Ŷ) will always be zero.

In our vocabulary, Σ(Y - Ŷ)² is a measure of the amount of error we have in estimating when we have Information (the student's pre-score) and Knowledge of how to use this information.

Experience + Information + Education --> Knowledge

In this case Ŷ = 13X - 109 is our Knowledge. The Best Fitting Least Squares Line summarizes our experiences, and we used our algebra and statistics education to do this summarizing. We now know how to use our information: X = student's pre-score.

Using regression vocabulary, Σ(Y - Ŷ)² is called the Unexplained Variation. Even with the additional information we have on the pre-score, some error still exists in our predictions. Some of the variability in students' final scores is still unexplained (not predicted perfectly) by the model. Σ(Y - Ŷ)² is equal to Σe², and we know this sum is a minimum when we use the least squares formulas to determine the equation of the line. Thus the amount of unexplained variation is a minimum; the formulas squeeze out as much information as possible from the data. We Are Being Effective Data Miners.
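The residual table can be reproduced with a short Python sketch, confirming that the Unexplained Variation is 510:

```python
# With-information model: Y-hat = 13X - 109 predicts each final score
# from the pre-score; the leftover error is the Unexplained Variation.
pre = [11, 12, 13, 14, 15]
finals = [40, 30, 70, 80, 80]

predicted = [13 * x - 109 for x in pre]                       # [34, 47, 60, 73, 86]
residuals = [y - yhat for y, yhat in zip(finals, predicted)]  # sum to 0
unexplained = sum(e ** 2 for e in residuals)                  # 36+289+100+49+36 = 510
```

As the text notes, the residuals sum to zero (unbiased estimates) and 510 is the smallest sum of squared errors any straight line can achieve on this data.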

Comparing Effectiveness of Information and Knowledge vs. No Information

Using no information, our measure for the amount of error in estimating was:

Σ(Y - Ȳ)² = 2200

while the level of error when information and knowledge were used is:

Σ(Y - Ŷ)² = 510

The error in estimating has been reduced by 1690 units: 2200 - 510 = 1690.

What was the % change in error?

% Change in Error = (Error with No Info - Error with Info) / (Error with No Info) x 100%
% Change in Error = (2200 - 510)/2200 x 100% ≈ 76.8%

The level of error in estimating has been reduced by approximately 77%. The model is not perfect, as there is still about 23% of the "original error" that has not been explained.

Actual, Explained and Unexplained Variation

The vocabulary of regression is different from what we just used to quantify the effectiveness of the model that uses information. The discussion just given was to develop an intuitive feeling for what we now discuss. The numbers will be the same, but the vocabulary will be different.

The Actual Variation is how far the final scores varied from the mean: Σ(Y - Ȳ)². The Actual Variation is also called the Total Variation.

In recognizing a relationship between pre and final scores, we decided to use the Best Fitting Least Squares Line instead of the mean line to estimate. By using the line, Ŷ, to make estimates of final scores, we reduced the errors in estimating. We say that some of the variation in the final scores is explained by the model. The model explains how pre and final scores are related. The model is: Ŷ = 13X - 109.

The Explained Variation is the prediction of how far the scores would vary from the mean. We use the person's pre-score and Ŷ = 13X - 109 to make this prediction.

Profound Statement
Actual Variation is either Explained by the Model or it is Not Explained.

Most students indicate that this is not the most profound statement they have ever read. Study the following graph to see how the Actual, Explained and Unexplained Variations are related. Try to write a Symbolic Profound Statement for the Verbal Profound Statement. This can be a difficult concept for students to grasp, so don't be surprised if you're reaching that "frustration threshold."

[Graph: each point's deviation from the mean line Ȳ = 60 splits into the part explained by the line Ŷ = 13X - 109 and the leftover unexplained part; the arrows mark Actual Variation from the Mean, Explained Variation from the Mean, and Unexplained Variation, with X̄ = 13]

Symbolic: Y - Ȳ = (Ŷ - Ȳ) + (Y - Ŷ)
Verbal: Actual = Explained + Unexplained

For point 2:
Actual Variation from the Mean = -30
Explained Variation = -13 (model under-explained)
Unexplained Variation = -17
-30 = -13 + (-17)
Wow, the profound statement is true!

For point 5:
Actual Variation from the Mean = 20
Explained Variation = 26 (model over-explained)
Unexplained Variation = -6
20 = 26 + (-6)
(WOW)²

Note: (Wow)² = Wow Wow

Total Actual, Total Explained and Total Unexplained: Are They Related?

If you were asked to explain how much something would vary, you would either explain the variability perfectly, or you would make an error and underestimate or overestimate. Rather amazingly, what is true for the individual points is true when we look at the squared totals.

Total Actual Variation is either Explained or Unexplained:

Total Actual Variation = Total Explained Variation + Total Unexplained Variation
Σ(Y - Ȳ)² = Σ(Ŷ - Ȳ)² + Σ(Y - Ŷ)²

In looking at the previous graph, the explained variation for a point is Ŷ - Ȳ, and the Total Explained Variation is Σ(Ŷ - Ȳ)². Study how the numerical values and the graph are related.

Ŷ     Ŷ - Ȳ           (Ŷ - Ȳ)²
34    34 - 60 = -26   676
47    -13             169
60    0               0
73    13              169
86    26              676
      Σ(Ŷ - Ȳ) = 0    Σ(Ŷ - Ȳ)² = 1690

The value Σ(Ŷ - Ȳ)² is called the Explained Variation. You should recognize the 1690 value, as this was the amount our error in estimating decreased by when information and knowledge were used to estimate the final score.

Look at point 2 on the previous graph: the actual Y deviates from Ȳ by -30 units. The Ŷ line predicts it will deviate by -13 units. The line has explained part of the variation but has left part unexplained; in this case the Ŷ line "underestimated" the actual variation by -17 units. Whereas at point 5, the actual Y deviated from Ȳ by 20 units, but the Ŷ line predicted it would deviate by 26 units. In this case the line made an error by "overestimating" by 6 units.

In estimating, we either overestimate or we underestimate. For point 2, we underestimated, and for point 5, we overestimated. Remember that the least squares line gives unbiased estimates: the total amount we overestimate will equal the total amount we underestimate, Σe = 0.

For this example: 2200 = 1690 + 510. This result will hold for all Best Fitting Least Squares Lines.
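The decomposition 2200 = 1690 + 510 can be verified numerically with a short Python sketch:

```python
# Numeric check of "Total Actual = Total Explained + Total Unexplained"
# for the pre-score / final-score example.
pre = [11, 12, 13, 14, 15]
finals = [40, 30, 70, 80, 80]
mean = sum(finals) / len(finals)           # Y-bar = 60
predicted = [13 * x - 109 for x in pre]    # BFLSL estimates

actual = sum((y - mean) ** 2 for y in finals)                              # 2200
explained = sum((yhat - mean) ** 2 for yhat in predicted)                  # 1690
unexplained = sum((y - yhat) ** 2 for y, yhat in zip(finals, predicted))   # 510
```

The identity holds because the least squares line is unbiased; the cross terms in the squared decomposition cancel, which is why the three squared totals add up exactly.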
Total Actual = Total Explained + Total Unexplained.
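The three squared totals can be computed directly. As before, the Y values are a reconstruction (an assumption) consistent with the chapter's figures; the fitted line and totals are from the text.

```python
# Total Actual = Total Explained + Total Unexplained, on the chapter's example.
# Y values are reconstructed (an assumption) to match the chapter's totals.
X = [11, 12, 13, 14, 15]
Y = [40, 30, 70, 80, 80]
y_bar = sum(Y) / len(Y)            # 60
Y_hat = [13 * x - 109 for x in X]  # Y-hat = 13X - 109

sst = sum((y - y_bar) ** 2 for y in Y)               # Total Actual Variation
ssr = sum((yh - y_bar) ** 2 for yh in Y_hat)         # Total Explained Variation
sse = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))  # Total Unexplained Variation

print(sst, "=", ssr, "+", sse)  # 2200 = 1690 + 510
```

For any least squares line the cross terms cancel, which is why the identity holds for the squared totals and not just for the individual points.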

Coefficient of Determination

Ideally, when we fit a straight line to the data, the Unexplained Variation would equal 0 (all the points fall on a straight line), and hence the Total Variation would be completely explained by the mathematical relationship that exists between the two variables. In most practical problems the Unexplained Variation (error) is not zero. The following statistic, r², is used to quantify the percent of variability that has been explained:

Coefficient of Determination = r² = [Σ(Ŷ − Ȳ)² / Σ(Y − Ȳ)²] × 100%

For our problem, r² = (1690 / 2200) × 100% ≈ 76.9%.

This number should also look familiar: it is the same number we determined using an intuitive argument for measuring the effectiveness of using information and knowledge versus no information. In interpreting r², you would say that 76.9% of the final score variation from the average final score can be explained by having information on the student's pre-score and using the knowledge Ŷ = 13X − 109 to predict the final score. Or one could say the model is not perfect, as 23.1% of the variation in final scores cannot be explained using the students' pre-scores and the model.

It is completely WRONG to say there is a 76.9% chance the prediction is correct. r² Is Not A Probability.

The Coefficient of Determination can be determined for any model. If you fit curved lines to the data rather than straight lines, you are doing Polynomial Regression, and r² can be determined for curved lines as well. There are formulas for determining the best fitting curvilinear line where Σei = 0 and Σei² is a minimum. Because the unexplained variation for these models is a minimum, the explained variation is a maximum, and thus r² is a maximum: we cannot explain any more of the variation than we have, using the particular model we have selected. The r² statistic is a method of measuring the effectiveness of any prediction model we select to use. In section 12.9 we discuss Polynomial Regression.
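The ratio is one line of code once the two totals are in hand. Again, the Y values are a reconstruction (an assumption) consistent with the chapter's numbers.

```python
# r^2 as the ratio of explained to total variation.
# Y values are reconstructed (an assumption) to match the chapter's totals.
X = [11, 12, 13, 14, 15]
Y = [40, 30, 70, 80, 80]
y_bar = sum(Y) / len(Y)
Y_hat = [13 * x - 109 for x in X]

ssr = sum((yh - y_bar) ** 2 for yh in Y_hat)  # explained: 1690
sst = sum((y - y_bar) ** 2 for y in Y)        # total actual: 2200

r2 = ssr / sst
# ~0.768, i.e. about 77%; the chapter reports 76.9%, having rounded r = .877 first
print(round(r2, 3))
```

The tiny gap between 76.8% and the chapter's 76.9% is only rounding: squaring the rounded r = .877 gives .769, while the exact ratio 1690/2200 is .768.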
It is important to recognize that the Correlation Coefficient, r, can be used only for Simple Linear Regression: if we are fitting a straight line, Ŷ = bX + a, we can use r. The Coefficient of Determination takes values from 0% (none of the variability in Y is explained by the relationship with X) to 100% (all of the variability is explained). r² is a measure of how effective the model is in explaining (predicting) the variability in Y.

Sensitivity Analysis

An interesting use of r and r² is in determining which input factors have the biggest impact on a final answer, an output. In Chapter 11, we looked at doing simulations to help us make predictions. In the breakeven and profit problems in section 11.5, there were many factors that influenced the final answer: R&D cost, legal cost, price, demand, labor, etc. all contributed to the final answer of what profit would be made. As a decision maker it would be helpful to know which factors had the biggest impact on profitability. Is labor cost more important than R&D cost to profitability? To answer this question we can look at how these variables are correlated with the output. Both @RISK and Crystal Ball will do this and display the results in what is called a Tornado Graph. It is called this because the shape of the graph is often a tornado: by displaying the factor with the highest correlation on top, the next highest below it, and so on, we get a tornado shape.

Factors Affecting Breakeven

The chart below shows which factors have the greatest impact on the breakeven value.

Legal cost and breakeven have the highest positive correlation. What this tells us is that when legal costs go up, the breakeven value also goes up. We see that purchasing has a much smaller correlation with breakeven. If you were a manager of this project and had time to work on either controlling legal costs or controlling purchasing costs, which would you select? Hopefully you said legal costs, as it appears this will give you the biggest return. We must be very careful in this analysis, as we are assuming that there is a cause and effect relationship.

Factors Affecting Profit

The table shows the correlation between different factors (labor, R&D equipment cost, demand, etc.) and profit. Does it make sense that R&D equipment cost and profit have a negative relationship, and that demand and profit have a positive relationship? Yes, it does. When R&D costs go up, profit goes down: a negative relationship. When demand goes up, profit goes up: a positive relationship.

This type of analysis can be very powerful, as one can make new assumptions about the key factors (the highly correlated factors). Then, by playing what-if games, we can generate ideas that may improve the final decision. For example, since R&D equipment appears to be a key factor, we could go back to R&D and ask: can you reduce this expenditure by 30%? If you can, what happens to completion time, etc.? We then run another simulation with a different set of assumptions.

Reflection: What does the coefficient of determination tell us?
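The correlation-ranking idea behind a tornado graph can be sketched without @RISK or Crystal Ball. Everything below is an assumption for illustration: the factor names, their distributions, and the toy profit formula are invented, not taken from the section 11.5 model. The point is only the method: simulate, then rank inputs by the magnitude of their correlation with the output.

```python
import random

# Hypothetical sketch of tornado-graph ranking (factor names, distributions,
# and the profit formula are assumptions, not the chapter's model).
random.seed(1)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

trials = {"rd_cost": [], "legal_cost": [], "demand": [], "profit": []}
for _ in range(5000):
    rd = random.uniform(80, 120)        # R&D equipment cost (assumed range)
    legal = random.uniform(10, 50)      # legal cost (assumed range)
    demand = random.uniform(900, 1100)  # units demanded (assumed range)
    profit = 0.5 * demand - rd - legal  # toy profit model (assumption)
    for key, val in zip(("rd_cost", "legal_cost", "demand", "profit"),
                        (rd, legal, demand, profit)):
        trials[key].append(val)

# Rank inputs by |correlation with profit| -- the tornado ordering
corrs = {k: pearson(trials[k], trials["profit"])
         for k in ("rd_cost", "legal_cost", "demand")}
for k, r in sorted(corrs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{k:12s} r = {r:+.2f}")
```

In this toy model, demand lands on top with a large positive correlation while the two cost factors show smaller negative correlations, which is exactly the sign pattern the chapter discusses: costs pull profit down, demand pushes it up.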

12.4 True Correlation Coefficient

In Chapter 4 we saw that a Statistic is any result that is based on a sample.

r = .877 is a Statistic. ALL STATISTICS ARE RANDOM VARIABLES. Hence, if another sample were taken to study how pre-scores are related to final scores, a different r value would occur: r is a random variable.

A Parameter is any result that is based on a census. If we had data on every student who had taken the pretest and this final, we could then determine the true correlation coefficient. The Greek letter rho, which is the Greek r, is used:

ρ = True Correlation Coefficient. ρ is a Parameter. ρ is a Constant.

Statistics estimate Parameters: r estimates ρ.

True Coefficient of Determination

A similar argument can be made for the Coefficient of Determination. If we have a census, all the data, we can calculate:

ρ² = True Coefficient of Determination. ρ² is a Parameter. ρ² is a Constant.

r² estimates ρ². r² is a Random Variable.

Correlation Coefficient and Coefficient of Determination

Even though the formulas and reasoning for determining the coefficient of determination and the correlation coefficient are different, they are related when we are using the simple linear model, Ŷ = bX + a:

Coefficient of Determination = (Correlation Coefficient)², that is, r² = (r)².

For our example: .769 = (.877)².

If one is calculating these values by hand, only one value must be calculated and the other can easily be determined: r = √(r²) or r² = (r)².

In readings, one often sees the correlation coefficient given; this tends to overstate the strength of the relationship that exists between X and Y. For example, when r = .6, only (.6)² or 36% of the variation in Y can be explained by the relationship with X. This leaves 64% of the variation in Y unexplained.

Repeating ourselves, a common mistake with r² is to interpret it as a probability. If r² = 36%, we might be tempted to conclude that there is a 36% chance our estimate is correct. THIS INTERPRETATION IS WRONG! The r² statistic is not the probability of whether our estimate of Y is correct; it is simply a measure of the amount of variability in Y that we can explain by using Information and Knowledge. r² IS NOT A PROBABILITY!

"I have information on this student's pre-score and I know how to use pre-scores to estimate her final test score."

Reflection: Do you understand the difference between a statistic and a parameter?
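The identity r² = (r)² can be verified numerically by computing each quantity from its own definition. As in the earlier sketches, the Y values are a reconstruction (an assumption) consistent with the chapter's numbers.

```python
# Compute r from its definition and r^2 as explained/total variation,
# then check they agree. Y values are reconstructed (an assumption).
X = [11, 12, 13, 14, 15]
Y = [40, 30, 70, 80, 80]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n

sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))
sxx = sum((x - mx) ** 2 for x in X)
syy = sum((y - my) ** 2 for y in Y)

r = sxy / (sxx * syy) ** 0.5                    # correlation coefficient, ~ .877
Y_hat = [13 * x - 109 for x in X]               # the chapter's fitted line
r2 = sum((yh - my) ** 2 for yh in Y_hat) / syy  # coefficient of determination

assert abs(r ** 2 - r2) < 1e-9  # r squared equals the coefficient of determination
```

Note the two numbers are computed by completely different routes: r from the cross products of X and Y, and r² from the explained-over-total ratio. For the simple linear model they must agree, which is what the assertion confirms.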
