Rational Decision Making

Department of Decision Sciences
Rational Decision Making
Only study guide for DSC2602
University of South Africa, Pretoria

© 2010 University of South Africa. All rights reserved. Printed and published by the University of South Africa, Muckleneuk, Pretoria. DSC2602/1/2011

Cover: Eastern Transvaal, Lowveld (1928), J. H. Pierneef. J. H. Pierneef is one of South Africa's best-known artists. Permission for the use of this work was kindly granted by the Schweickerdt family. The tree structure is a recurring theme in various branches of the decision sciences.

Preface

Everyday life is full of decisions. What should I wear today? What should I eat? Should I buy the red or the blue shirt? Should I buy a specific house or buy a piece of land? What is the shortest route from my house to work? And many more. Some of these decisions can be made without thinking or by guesswork. Some can be settled by reasoning or emotion. Some are a bit more difficult and may need additional information.

People have been using mathematical tools to aid decision making for decades. During World War II many techniques were developed to assist the military in decision making. These developments were so successful that after World War II many companies used similar techniques in managerial decision making and planning. The decision making task of modern management is more demanding and more important than ever. Many organisations employ operations research or management science personnel or consultants to apply the principles of scientific management to problems and decision making.

In this module we focus on a number of useful models and techniques that can be used in the decision making process. Two important themes run through the study guide: data analysis and decision making techniques.

Firstly we look at data analysis. This approach starts with data that are manipulated or processed into information that is valuable to decision making. The processing and manipulation of raw data into meaningful information lie at the heart of data analysis. Data analysis includes data description, data inference, the search for relationships in data and dealing with uncertainty, which in turn includes measuring uncertainty and modelling uncertainty explicitly. In addition to data analysis, other decision making techniques are discussed. These techniques include decision analysis, project scheduling and network models.

Chapter 1 illustrates a number of ways to summarise the information in data sets, also known as descriptive statistics. It includes graphical and tabular summaries, as well as summary measures such as means, medians and standard deviations.

Uncertainty is a key aspect of most business problems. To deal with uncertainty, we need a basic understanding of probability. Chapter 2 covers the basic rules of probability and in Chapter 3 we discuss the important concept of probability distributions in some generality.

In Chapter 4 we discuss statistical inference (estimation), where the basic problem is to estimate one or more characteristics of a population. Since it is often too expensive to obtain the information for the whole population, we instead select a sample from the population and then use the information in the sample to infer the characteristics of the population.

In Chapter 5 we look at regression analysis, which is used to study relationships between variables.

In Chapter 6 we study another type of decision making called decision analysis, where costs and profits are considered to be important. The problem is not whether to accept or reject a statement but to select the best alternative from a list of several possible decisions. Usually no statistical data are available. Decision analysis is the study of how people make decisions, particularly when faced with imperfect information or uncertainty.

Chapter 7 deals with project management. Project management consists of planning projects, acquiring resources, scheduling activities and evaluating completed projects. Managers are responsible for project management. They must know how long a specific project will take to finish, what the critical tasks are and, very often, what the probability is of completing the project within a given time span.

In Chapter 8 the subject of network models is discussed. Network models consist of nodes and arcs. Many real-world problems have a network structure or can be modelled in network form. These include problems in areas such as production, distribution, project planning, facilities location, resource management and financial planning. The graphical network representation of problems provides a powerful visual and conceptual aid to indicate the relationship between components of a system.

DSC2602/1/2010

Contents

1 Descriptive statistics: Introduction; Data collection; Sampling methods; Simple random sampling; Stratified random sampling; Systematic sampling; Presentation of data; Types of data; The frequency table and histogram; The pie chart; The cumulative frequency polygon; The stem-and-leaf diagram; Descriptive measures; Measures of locality; The mean; The median; The mode; Measures of dispersion; The variance of a data set; The standard deviation of a data set; The quartile deviation; The coefficient of variation; 1.6 The box-and-whiskers diagram; Summary of descriptive measures; Measures of locality; Measures of dispersion; Exercises

2 Probability concepts: Introduction; Classical probability; Some rules in probability theory; Conditional probability; Joint probabilities: multiplication law; Summary of basic probability concepts; Exercises

3 Probability distributions: Introduction; Random variable; Discrete random variables; The probability distribution of a discrete random variable; Expected value of a discrete probability distribution; Variance of a discrete probability distribution; Discrete distribution function; Discrete random distributions; The binomial distribution; The Poisson distribution; Continuous random variables; The probability distribution of a continuous random variable; Continuous distribution function; Continuous distributions; The normal distribution; The Standard Normal distribution; The exponential distribution; The uniform distribution; Exercises

4 Estimation: Introduction; Types of estimators; Point estimation; Estimating the mean; Estimating the variance; Estimating proportions; Interval estimators; The standard error; Confidence intervals for means; Confidence intervals for proportions; Exercises

5 Correlation and regression: Introduction; Correlation analysis; The scatter diagram; The correlation coefficient; Pearson's correlation coefficient; Spearman's rank correlation coefficient; Simple linear regression; The estimated regression line; The method of least squares; Rules and assumptions underlying regression analysis; Residual plot analysis; The coefficient of determination; The F-test for overall significance; Forecasting accuracy; Fitting nonlinear relationships; The use of spreadsheets in simple linear regression; Exercises

6 Decision analysis: Introduction; Structuring decision problems; The basic steps in decision making; Payoff tables; Decision trees; Decision making without probabilities or under uncertainty; The optimistic approach: maximax criterion; The conservative approach: maximin criterion; Minimax regret approach; Decision making with probabilities or under risk; The expected value approach; Decision trees and the expected value approach; Sensitivity analysis; Expected value of perfect information; Decision analysis with sample information; Utility and decision making; Obtaining utility values for payoffs; Utility curve; The expected utility approach; Utility functions; Exercises

7 Project management: Introduction; PERT/CPM; Project scheduling with certain activity durations: single time estimates; Define the project; Project network diagrams; Conventions for constructing network diagrams; Drawing network diagrams; Network calculations; Calculating early and late event times; The duration of the project; The critical path; Total float; Other measures of float; Using linear programming to find a critical path; Formulating an LP model; Using LINDO or LINGO to solve the LP model; Project scheduling with uncertain activity durations: multiple time estimates; Probability of project completion time; Project scheduling with time-cost tradeoffs; Formulating an LP model to crash a project; Using LINGO to solve the LP model for crashing a project; Exercises

8 Network models: Introduction; The shortest-route problem; A shortest-route algorithm; The labelling phase; Backtracking phase; Formulating the shortest-route problem as an LP model; Solving shortest-path problems with LINGO; The maximum-flow problem; The minimum-spanning tree problem; Minimum-spanning tree algorithm; Exercises

A Solutions to exercises: A.1 Chapter 1: Descriptive statistics; A.2 Chapter 2: Probability concepts; A.3 Chapter 3: Probability distributions; A.4 Chapter 4: Estimation; A.5 Chapter 5: Correlation and regression; A.6 Chapter 6: Decision analysis; A.7 Chapter 7: Project management; A.8 Chapter 8: Network models

B Statistical tables: B.1 The cumulative Poisson distribution; B.2 The standard normal distribution; B.3 Student's t-distribution; B.4 The F-distribution; B.5 The cumulative binomial distribution

C Bibliography

CHAPTER 1
Descriptive statistics

1.1 Introduction

No business can exist without the information given by numbers. Managing numbers is an important part of understanding and solving problems. Numbers provide a universal language that can easily be understood and supply a description of some aspects of most problems.

The collection of numbers and other facts such as names, addresses and opinions provides data. The data only becomes information when it informs the user. Statistics is about changing data into information by analysing the data.

Statistical analysis can be divided into two main branches, namely descriptive statistics and inferential statistics. Descriptive statistics deals with methods of organising, summarising and representing data in a convenient and informative way by means of tabulation, graphical representation and the calculation of descriptive measures. Inferential statistics is a body of methods used to draw conclusions or inferences about characteristics of populations, based on sample data, by using the descriptive measures calculated.

In this chapter we discuss methods for collecting data and some descriptive statistics.

1.2 Data collection

Data can come from existing sources or may need to be collected. Technology makes it possible to collect huge amounts of data. For example, retailers collect point-of-sale data on products and customers, and credit agencies have all sorts of data on people who have or would like to obtain credit. Where data must be collected, it can be collected from a census, in which everybody or every item of interest is included, or from a sample of the population of interest.

First we look at an example to clarify some definitions.

A tyre company, Radial, advertises that its XXX tyres, generally known as Triple X, will complete at least [...] km before one of the four tyres will no longer meet the minimum safety requirements. Several complaints, however, have been received that the tyres completed only [...] km before they no longer met the minimum requirements. Radial sells directly to the public and it is company policy to keep a record of each customer. During the two years that XXX tyres have been manufactured, [...] sets have been sold. Radial feels that they just do not have the time, personnel or money to locate and question all of their customers. They feel that if they question 100 customers, they will get a good idea of the actual situation. In other words, they will take a sample of 100 from a population of [...].

Consider the following definitions:

VARIABLE: Any property or characteristic that can be measured or observed is called a variable. A variable can take on a range of different values. For example, the distance completed on a set of tyres differs for each customer and the observations therefore vary continually. In Radial's case, distance completed is a variable.

SAMPLE UNIT: The sample unit is the item that is measured or counted with regard to the variable being studied. Radial's sample unit is a set of tyres measured for minimum safety requirements.

POPULATION: A population is the set of all the elements or items being studied. In Radial's case, the sets of XXX tyres that have been sold form the population.

SAMPLE: A sample is a representative group or a subset of the population. The 100 sets of tyres that Radial will investigate form the sample.

Note: It is very important that the sample is always representative of the population. It should be designed and administered in such a way as to minimise the chance of bias (a sample outcome that does not represent the population of interest). If the sample is likely to leave out certain people or there is a relatively high level of non-response, it would probably not be representative of the population.

1.3 Sampling methods

Simple random sampling

A good sample requires that every item in the population has an equal and independent chance of being included in the sample. A simple random sample of n elements is a sample that is chosen in such a way that every combination of n elements has an equal chance of being the sample selected.

One method of drawing a simple random sample is to allocate a number to each item in the population. A computer is used to generate a sequence of random numbers. These numbers are then used to identify the items in the population to be included in the sample.

Example
Printapage, a printing company, has 30 clients with outstanding balances (in rand) as shown in Table 1.1.

[Table 1.1: Outstanding balances (in rand). Columns: Account number, Balance.]

The following random numbers are available: 22; 17; 83; 57; 27; 54; 19; 51; 39; 59; 84 and 20. Use these numbers to draw a random sample of 5 from the 30 customer accounts.

Solution
Since the total number of elements in the population is 30, a random number larger than 30 is of no use and is skipped. The sample units are the numbers of the accounts to be drawn. These are 22; 17; 27; 19 and 20. The corresponding outstanding balances are [...]; 102; 16; 429 and 197.
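The selection in the Printapage example can be sketched in Python as follows. This is a minimal illustration, not part of the study guide: the function name `draw_sample` is hypothetical, and skipping repeated numbers (so that no account is drawn twice) is an added assumption that the example does not need to invoke.

```python
def draw_sample(random_numbers, population_size, sample_size):
    """Pick the first `sample_size` usable numbers from a list of random numbers."""
    chosen = []
    for number in random_numbers:
        # Numbers outside 1..population_size match no account and are skipped.
        # Assumption: repeats are skipped too, so no account is drawn twice.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


# Random numbers and sizes taken from the Printapage example.
random_numbers = [22, 17, 83, 57, 27, 54, 19, 51, 39, 59, 84, 20]
sample = draw_sample(random_numbers, population_size=30, sample_size=5)
print(sample)  # [22, 17, 27, 19, 20]
```

The balances of the selected accounts would then be looked up in Table 1.1.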

Example
A political candidate wishes to determine the opinions of the voters in his ward. He decides on a sample size of 20. Using random numbers, he chooses 20 telephone numbers from the directory for a telephonic survey. Is this procedure correct? Give a reason for your answer.

Solution
Not all residents will have telephones, and the numbers of those who do have telephones may not all be included in the telephone directory. Such a sample can therefore not be considered random.

Stratified random sampling

Simple random sampling requires no prior (a priori) knowledge of the population and can therefore be done with relatively little effort. It could, however, happen that all the elements drawn for the sample are nearly homogeneous or alike. This may cause the conclusions about the population to be biased. If you have prior information about the population, you can reduce this problem to some degree and obtain more accurate information about the population by making use of stratified random sampling.

The population is divided into mutually exclusive sets or strata. This means that a specific element may belong to only one group or stratum. The strata must be chosen in such a way that there are large differences between the strata, but small differences between the elements within the same stratum. Simple random samples are then taken from each stratum. The number of elements taken from each stratum is often proportional to the size of that stratum.

Example
Divide Printapage's 30 customers into three strata as follows:

Stratum   Balance
1         < 200
2         200–600
3         > 600

A proportional sample of size 12 must be drawn from the population. How would you do it?

Solution
Printapage's customers are divided into three strata, as shown in Table 1.2.

             Stratum 1    Stratum 2     Stratum 3
Balance      < R200       R200–R600     > R600
Frequency    15           10            5

Table 1.2: Dividing Printapage's data into 3 strata (each stratum lists Account number and Balance)

To draw a proportional sample of size 12, the following number of items must be drawn from each stratum: $\frac{15}{30} \times 12 = 6$ elements from stratum 1, $\frac{10}{30} \times 12 = 4$ elements from stratum 2, and $\frac{5}{30} \times 12 = 2$ elements from stratum 3.

Lastly, a simple random sample as described in the previous section is drawn from each stratum.

Systematic sampling

Systematic sampling starts at a randomly selected starting point in the population. Every k-th element after that is then chosen.

Example
A political candidate wishes to determine the opinions of the voters in his ward. He has a list of voters available. One way of obtaining a systematic sample would be to start with voter number 6 and then select every tenth voter to complete a questionnaire. What are the advantages and disadvantages of such a method?

Solution
Advantage(s): Systematic sampling is convenient, especially when the size of the population is not known.
Disadvantage(s): If the variable being considered is periodic in nature, systematic sampling could produce misleading results. For example, if we were to estimate a shop's sales using a 1-in-7 systematic sampling design, it could happen that only sales figures for Saturdays were selected. Sales would then be overestimated.

1.4 Presentation of data

Once data has been collected, either by you or by someone else, the initial task is to obtain some overall impression of the findings. This can be done by visually representing the data using frequency tables, charts and diagrams. But before we try to visualise the data, let us first consider the different types of data one might get.

Types of data

There are two main groupings of data: qualitative and quantitative. Qualitative data is characterised by categorical answers such as yes or no, male or female, etc. Quantitative data is characterised by numerical values.

Quantitative data can be divided further into two groups: discrete data and continuous data. Discrete data include everything that can be considered as a separate unit because of its nature, for example number of units sold, number of consumers, number of job opportunities, etc., that is, everything that you can count on your fingers. Continuous data are usually the result of a measurement and do not consist of fixed, isolated points. There can be a whole range of values between any two values. Examples are length, mass, time and temperature measurements.

Example
Classify the data in each of the following questions:
(a) Do you own a TV set? (Yes / No)
(b) How many TV sets do you own?
(c) How many kilometres did you drive on your set of Radial tyres?
(d) What was your electricity bill last month?

Solution
(a) Qualitative
(b) Quantitative, discrete
(c) Quantitative, continuous
(d) Quantitative, continuous

Let's look at Radial's data again. Radial, the company introduced in Section 1.2, has taken a sample of 100 and is happy that it is representative. The sample elements (in thousands) are shown in Table 1.3.

[Table 1.3: Sample elements for Radial]

We identified Radial's data as quantitative and continuous. Perhaps if we could picture the data, we would be able to form a better idea of what is going on.

The frequency table and histogram

The histogram is one of the most common ways of visually representing data. It is a graphical representation of a frequency table. A frequency table is a table in which the data are grouped into intervals. To draw a histogram we must first set up a frequency table. The steps needed to set up a frequency table are as follows:

Step 1 Find the range (R) of the data, where R = maximum value of data set − minimum value of data set.

Step 2 Decide on the number of intervals. If too few or too many intervals are used, one cannot get a good idea of the distribution of the data. It is not always easy to decide how many intervals to use. $\frac{R}{10}$ is a good number if R is large, but any number between 5 and 8 is acceptable. Do not use fewer than 5.

Step 3 Determine the width of the intervals as $\frac{R}{\text{number of intervals}}$. The width must be a whole number; this makes it easier to determine the limits of the intervals.

Step 4 Determine the interval limits. The limits should be such that there is no doubt into which interval a value falls. For example, when we are working with Radial's data, we cannot choose intervals such as 55–65 and 65–75. Why not? Well, where would you place a value of 65? For the mathematical manipulations that we will be doing with grouped data, it is also necessary that we do not work with intervals such as "55 to just smaller than 65" and "65 to just smaller than 75". What does "just smaller than 65" mean? The rule that we will use is to take the lower limit of the first interval as half a unit less than the minimum value, so that there can be no confusion as to which interval a value belongs. The lower limit of the first interval must be smaller than the minimum data value, and the upper limit of each interval is the same as the lower limit of the succeeding interval.

Step 5 Tabulate the data.

Example
(a) Set up a frequency table for Radial's data shown in Table 1.3.
(b) What percentage of customers were able to do [...] km or more on a set of tyres?
(c) What percentage of customers were able to do [...] km or less on a set of tyres?
(d) Draw a histogram of the data using the frequency table.

Solution
(a) Set up the frequency table following these steps:

Step 1 The range of the data: The minimum value is 14 and the maximum value is 98. The range is therefore $R = 98 - 14 = 84$.

Step 2 The number of intervals: The number of intervals is calculated as $\frac{R}{10} = \frac{84}{10} = 8{,}4$. Therefore, use 8 intervals.

Step 3 The interval width: The interval width is calculated as $\frac{R}{8} = \frac{84}{8} = 10{,}5$. Therefore, use a width of 11.

Step 4 The interval limits: The interval limits are determined as follows: The minimum value is 14, so the lower limit of the first interval is half a unit less than the minimum, which is 13,5. The upper limit of the first interval is determined by adding the width to the lower limit, that is $13{,}5 + 11 = 24{,}5$. The first interval is therefore 13,5–24,5. The second interval starts at 24,5 and also has a width of 11. Its upper limit is therefore $24{,}5 + 11 = 35{,}5$. The last interval starts at 90,5 and its upper limit is 101,5, which is well above the largest element in the sample. The intervals, each of width 11, are:

13,5–24,5
24,5–35,5
35,5–46,5
46,5–57,5
57,5–68,5
68,5–79,5
79,5–90,5
90,5–101,5

Step 5 Tabulate the data: The only remaining thing to do is to group the data into the intervals. Now go back to the data set and consider the first four sample elements in the first row, which are 61, 38, 19 and 58. Our aim is to find in which one of the following intervals they belong:

Interval
13,5–24,5
24,5–35,5
35,5–46,5
46,5–57,5
57,5–68,5
68,5–79,5
79,5–90,5
90,5–101,5

19 fits in the first interval because it is greater than 13,5 and less than 24,5.
38 fits in the third interval because it is greater than 35,5 and less than 46,5.
61 and 58 fit in the fifth interval because they are greater than 57,5 and less than 68,5.

Instead of writing 19; 38; 61 and 58 in their corresponding intervals, we represent each of them with a tally mark, |, as follows:

Interval      Tally
13,5–24,5     |
24,5–35,5
35,5–46,5     |
46,5–57,5
57,5–68,5     ||
68,5–79,5
79,5–90,5
90,5–101,5

The fifth element in a group of tally marks is indicated by a line drawn across the group, so that each crossed bundle represents a group of five.

Note: The total number of elements falling into an interval is called the frequency.

The complete frequency table for Radial is given in Table 1.4.

Interval      Frequency
13,5–24,5     7
24,5–35,5     6
35,5–46,5     13
46,5–57,5     13
57,5–68,5     30
68,5–79,5     22
79,5–90,5     8
90,5–101,5    1
Total         100

Table 1.4: The frequency table for Radial

It is clear that the highest frequency occurs in the interval 57,5–68,5. This shows that the largest group of customers was able to do between 57,5 and 68,5 thousand kilometres on a set of tyres.
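The grouping step in part (a) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names `interval_index` and `frequency_table` are hypothetical, and only the four sample elements quoted in the text (61, 38, 19 and 58) are used, since those are the values worked through above.

```python
import math


def interval_index(value, lower_limit, width):
    """Return the 0-based index of the interval that contains `value`."""
    return math.floor((value - lower_limit) / width)


def frequency_table(data, lower_limit, width, n_intervals):
    """Count how many observations fall into each interval."""
    counts = [0] * n_intervals
    for value in data:
        index = interval_index(value, lower_limit, width)
        if not 0 <= index < n_intervals:
            raise ValueError(f"{value} lies outside the interval limits")
        counts[index] += 1
    limits = [lower_limit + i * width for i in range(n_intervals + 1)]
    return [(limits[i], limits[i + 1], counts[i]) for i in range(n_intervals)]


# The first four sample elements from the text; limits as in Step 4.
first_four = [61, 38, 19, 58]
table = frequency_table(first_four, lower_limit=13.5, width=11, n_intervals=8)
for low, high, count in table:
    print(f"{low:5.1f} - {high:5.1f}: {'|' * count}")
```

Because the limits end in ,5 and the observations are whole numbers, no observation can fall exactly on a limit, which is the point of the half-unit rule in Step 4.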

(b) The intervals 79,5–90,5 and 90,5–101,5 represent the customers who were able to drive [...] km or more on a set of tyres. The total number is thus the sum of the frequencies in these intervals, namely $8 + 1 = 9$. The percentage of customers who were able to do [...] km or more on a set of tyres is $\frac{9}{100} \times 100\% = 9\%$.

Note: The fraction $\frac{9}{100}$ is called the relative frequency.

(c) The first three intervals account for the customers who were able to drive [...] km or less. The sum of the frequencies in these intervals, namely $7 + 6 + 13 = 26$, is thus equal to the total number of such customers. The percentage of customers who were only able to do [...] km or less on a set of tyres is $\frac{26}{100} \times 100\% = 26\%$.

(d) Now we can represent the frequency table graphically by marking the interval limits on a horizontal axis and the frequencies on a vertical axis. This is called a histogram. The histogram for Radial is given in Figure 1.1.

[Figure 1.1: Histogram for Radial. Horizontal axis: Distance, with interval limits 13,5 to 101,5; vertical axis: Frequency.]

(Notice that the horizontal axis starts at 0 and that the zigzag line is there to break the line in order to prevent a huge space from appearing on the left of the actual graph.)

The pie chart

Another way of representing data is by means of a pie chart. A pie chart is drawn as a circle and the slices of the circle represent the relative frequencies expressed as percentages. It is often difficult to draw a pie chart by hand. In Radial's case one needs to divide the circle into 100 equal slices, which is not an easy task! The pie chart for Radial is more or less as shown in Figure 1.2.

[Figure 1.2: Pie chart representing Radial's data]

The cumulative frequency polygon

We calculated in the example above that 26% of the customers were only able to do [...] km or less on a set of tyres. Such information can be presented graphically if we first obtain the cumulative "less than" table. Such a table is set up from the frequency table by restating each upper limit as "less than ...". The cumulative frequency table for Radial is given in Table 1.5.

Upper limit   Frequency   Cumulative frequency
< 24,5        7           7
< 35,5        6           13   (7+6=13)
< 46,5        13          26   (7+6+13=26)
< 57,5        13          39   (7+6+13+13=39)
< 68,5        30          69   (7+6+13+13+30=69)
< 79,5        22          91   etc.
< 90,5        8           99
< 101,5       1           100

Table 1.5: The cumulative frequency table for Radial

You have probably realised that cumulative means "added up". This information can now be represented by a cumulative frequency polygon as shown in Figure 1.3.
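The running totals in Table 1.5 can be sketched in Python as follows. This is a minimal illustration: the variable names and the helper `percentage_below` are hypothetical, and the frequencies are assumed to be 7, 6, 13, 13, 30, 22, 8 and 1, the values consistent with the cumulative totals 7, 13, 26, 39 and 69 and the overall total of 100 given in the text.

```python
from itertools import accumulate

# Upper interval limits (in thousands of km) and assumed frequencies.
upper_limits = [24.5, 35.5, 46.5, 57.5, 68.5, 79.5, 90.5, 101.5]
frequencies = [7, 6, 13, 13, 30, 22, 8, 1]

# Each cumulative frequency is the sum of all frequencies up to that interval.
cumulative = list(accumulate(frequencies))


def percentage_below(limit):
    """Percentage of observations below a given upper interval limit."""
    position = upper_limits.index(limit)
    return 100 * cumulative[position] / cumulative[-1]


for limit, f, cf in zip(upper_limits, frequencies, cumulative):
    print(f"< {limit:5.1f}  {f:3d}  {cf:3d}")

print(percentage_below(46.5))  # 26.0
```

The points (upper limit, cumulative frequency) are the ones that would be joined to draw the cumulative frequency polygon.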

[Figure 1.3: The cumulative frequency polygon for Radial. Horizontal axis: Distance, with upper limits 24,5 to 101,5; vertical axis: Cumulative frequency.]

The stem-and-leaf diagram

The stem-and-leaf diagram is also a useful diagram and is easy to set up. The first step is to decide how to separate each observation into two parts: the stem and the leaf. Radial's data can be separated in such a way that the first digit of each number is the stem and the second digit is the leaf.

First we determine the biggest and smallest numbers in the data set and separate them into a stem and a leaf. The smallest number in the data, 14, has stem 1 and leaf 4. The largest number, 98, has stem 9 and leaf 8.

Next we fill in the rest of the data. All the other numbers lie between these two. We can therefore set up the stems from 1 to 9, with the second digit of each number being written next to its stem, as shown in Table 1.6.

[Table 1.6: Stem-and-leaf diagram for Radial (unsorted). Columns: Stem, Leaf, Frequency.]

However, the leaves of every stem MUST be in ascending order (from the smallest value to the largest). Radial's sorted stem-and-leaf diagram is given in Table 1.7.

[Table 1.7: Stem-and-leaf diagram for Radial. Columns: Stem, Leaf, Frequency.]

Now turn the page on its side. It is easy to see that most of the customers drove between [...] and [...] thousand kilometres on a set of tyres.

1.5 Descriptive measures

The presentation of charts and diagrams can be regarded as the first step in analysing data, but it is not sufficient for most purposes. Charts and diagrams provide an overall picture of the data but give only an approximate indication of specific properties such as the midpoint and spread of the data. Proper analysis requires a summary of the data in the form of descriptive statistical measures. Descriptive measures are single numerical values that indicate the shape or distribution of the data set. There are descriptive measures of location, spread, symmetry and kurtosis.

Measures of locality

A measure of location or position gives an indication of the midpoint or general size of the distribution. Examples are the mean, median and mode of a data set.

The mean

Radial advertises that its XXX tyres will last for at least [...] km before one of the four tyres will no longer meet the minimum safety requirements. What is the mean number of kilometres that can be driven on a set of XXX tyres?

Radial has only the sample of 100 observations available for estimating the mean. If we consider the sample as being representative of the population, we can use the sample mean as an estimator of the population mean. To obtain the sample mean we add up all the observations and divide the result by the number of observations. (This is called the arithmetic mean.) If we add up all the observations in Radial's sample and divide the sum by the number of observations, we get $\frac{5\,828}{100} = 58{,}28$.

We can therefore expect a set of tyres to last $58{,}28 \times 1\,000 = 58\,280$ km on average.

The formula for the mean is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where
$\bar{x}$ (read as x-bar) is the generally accepted symbol for the arithmetic mean,
$n$ is the number of observations,
$\sum$ is the Greek letter for S and means "sum",
$x_i$ represents the i-th observation, and
$\sum_{i=1}^{n} x_i$ is just another way of writing $x_1 + x_2 + \dots + x_n$.

Note: You may enter the data into the statistics mode of your calculator and find the value of the mean by pressing a button. This is much faster than doing the calculation by hand. See the manual of your calculator.

When the raw data, that is the values in the original data set, are available, it is easy to calculate the mean. Sometimes, however, the data is given in the form of a frequency table and the actual values are not known. Let's look at Radial's data again. Assume that Radial's sample data is available in the following form only:

Interval (distance in 1 000 km)   Frequency ($f_i$)
13,5–24,5                         7
24,5–35,5                         6
35,5–46,5                         13
46,5–57,5                         13
57,5–68,5                         30
68,5–79,5                         22
79,5–90,5                         8
90,5–101,5                        1
Total                             100

We do not know what the actual values in each interval are. For computational purposes we make the following assumption: all values in an interval are equal to the middle value of the interval. The middle value is calculated by adding the lower and the upper limits of the interval and dividing the result by two. The middle value of the first interval is $\frac{13{,}5 + 24{,}5}{2} = 19$.

We thus assume that all the observations in the first interval are equal to 19. The contribution of these seven observations to the grand total is therefore $7 \times 19 = 133$.

The formula for the mean of a frequency distribution is

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

where
$f_i$ = the frequency for the i-th interval,
$x_i$ = the middle value of the i-th interval, and
$k$ = the number of intervals.

Note: Data in a frequency table are often referred to as grouped data.

Example
Calculate the mean distance travelled on a set of XXX tyres using the frequency distribution in Table 1.4.

Solution
In Table 1.8 the frequency, $f_i$, the middle value, $x_i$, and the product of these are given for each interval.

Interval      $f_i$   $x_i$   $f_i x_i$
13,5–24,5     7       19      133
24,5–35,5     6       30      180
35,5–46,5     13      41      533
46,5–57,5     13      52      676
57,5–68,5     30      63      1 890
68,5–79,5     22      74      1 628
79,5–90,5     8       85      680
90,5–101,5    1       96      96
Total         100             5 816

Table 1.8: Calculating the mean from the frequency table

The mean is calculated as

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{5\,816}{100} = 58{,}16.$$

The mean distance is therefore $58{,}16 \times 1\,000 = 58\,160$ km.

Note: The intervals of the frequency distribution are all of equal width, that is 11. After you have calculated the middle value of the first interval, the successive middle values can be obtained by adding 11 to the previous middle value.
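The grouped-data calculation can be sketched in Python as follows. This is a minimal illustration: the function name `grouped_mean` is hypothetical, and the frequencies are assumed to be 7, 6, 13, 13, 30, 22, 8 and 1, the values consistent with the total of 100 and the mean of 58,16 stated in the text.

```python
def grouped_mean(intervals, frequencies):
    """Mean of grouped data, treating every value as its interval's middle value."""
    midpoints = [(low + high) / 2 for low, high in intervals]
    weighted_total = sum(f * x for f, x in zip(frequencies, midpoints))
    return weighted_total / sum(frequencies)


# Eight intervals of width 11 starting at 13,5 (in thousands of km).
lower_limit, width = 13.5, 11
intervals = [(lower_limit + i * width, lower_limit + (i + 1) * width) for i in range(8)]
frequencies = [7, 6, 13, 13, 30, 22, 8, 1]  # assumed, see the lead-in

mean_thousands = grouped_mean(intervals, frequencies)
print(round(mean_thousands, 2))      # 58.16
print(round(mean_thousands * 1000))  # 58160
```

The result is an approximation of the raw-data mean, because the middle-value assumption replaces the actual observations in each interval.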

The calculation based on the above assumption results in a total that differs from the actual total, so a mean calculated in this way will differ from the actual mean. For example, if the observations in the interval 79,5–90,5 are 88; 80; 83; 82; 90; 86; 86; 80, their sum is 675. If we instead use the middle value and the frequency of the interval to calculate the contribution of these observations to the total, we get 680.

Interval        Frequency $f_i$    Middle value $x_i$    $f_i x_i$
79,5–90,5       8                  85                    8 × 85 = 680

In our example the mean obtained using the original data values is […] km, while the mean obtained using the frequency distribution is […] km.

The mean is the measure of locality that is used most often. It can, however, be misleading. For example, if we calculate the mean of 2; 3; 5; 71, we see that

\[ \bar{x} = \frac{2 + 3 + 5 + 71}{4} = 20{,}25. \]

When a data set has a mean of about 20, one intuitively expects most of the values to lie in the vicinity of 20. In this instance, however, most of the values are less than 6, while one value, 71, is an outlier. The mean is sensitive to outliers, and on its own, without any additional information, it can lead to incorrect conclusions.

Another disadvantage of the mean is that it is difficult to calculate for an open frequency distribution, that is, one whose first or last interval has no stated limit, as the following example illustrates.

Example
Table 1.9 gives the property value distribution for ratepayers in the Steelcity Metropolitan Substructure.

Property value (in Rands)    Frequency (in thousands)
Less than […]                […]
[…]                          […]
More than […]                […]

Table 1.9: Property value distribution

What assumptions will we have to make about the middle values of the first and the last interval? What is an acceptable lower limit for the first interval? What is an acceptable upper limit for the last interval? If we have access to the original data, we may be able to make a good guess, but this is not always possible. The use of the mean is therefore restricted when there are open distributions.

A big advantage of the arithmetic mean is that it uses all the available data. Later we will see that this is not the case for the other measures of locality. Since the mean can be calculated exactly, it forms the basis for many advanced analyses and is not only descriptive in nature.

The median

Since the mean is sensitive to extreme values (outliers) and may result in misleading conclusions, the median is often preferred as a measure of locality. The median is the value that divides an ordered data set into two equal parts. If the data set is sorted in ascending order, 50% of the data values will lie below, or to the left of, the median, and 50% will lie above, or to the right of, the median.

The median is determined as follows: If a data set of size $n$ is sorted in ascending order, then the median (me) is the $\frac{n+1}{2}$-th value of the data set.

Example
Determine the median of the following data sets:
(a) 6, 9, 12, 12, 13, 15, 18, 24, 27
(b) 2, 3, 5, 71

Solution
(a) The data set is arranged in ascending order and $n = 9$. The median (me) is the $\frac{9+1}{2} = 5$-th value, that is, the median is 13.
(b) The data set is arranged in ascending order and $n = 4$. The median (me) is the $\frac{4+1}{2} = 2\frac{1}{2}$-th value. The $2\frac{1}{2}$-th value lies between the second and third values, that is, halfway between 3 and 5, or $\frac{3+5}{2} = 4$. The median is therefore 4, and we can say that 50% of the data lie to the left of 4 and 50% to the right of 4.

The median may also be calculated for a frequency table. The cumulative frequency table is used to identify the median interval.
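The positional rule for the median can be checked with a short Python sketch. The helper name `median_by_position` is invented for this illustration; it follows the $\frac{n+1}{2}$-th value rule described above and averages the two neighbouring values when the position falls halfway between them. The last line contrasts the median with the mean for the data set containing the outlier 71.

```python
import statistics


def median_by_position(values):
    """Median as the (n + 1) / 2-th value of the sorted data (1-based)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    position = (n + 1) / 2
    lower = data[int(position) - 1]
    if position == int(position):  # odd n: the position is a whole number
        return lower
    upper = data[int(position)]    # even n: halfway between two values
    return (lower + upper) / 2


set_a = [6, 9, 12, 12, 13, 15, 18, 24, 27]
set_b = [2, 3, 5, 71]

print(median_by_position(set_a))  # 13
print(median_by_position(set_b))  # 4.0
print(statistics.mean(set_b))     # 20.25
```

For set (b) the outlier pulls the mean up to 20,25 while the median stays at 4, which is the behaviour the text describes.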

Let's use Radial's data once again, as shown in Table 1.10.

Interval        Frequency ($f_i$)    Cumulative frequency
13,5–24,5       7                    7
24,5–35,5       6                    13
35,5–46,5       […]                  […]
46,5–57,5       […]                  […]
57,5–68,5       […]                  […]
68,5–79,5       […]                  […]
79,5–90,5       8                    […]
90,5–101,5      […]                  100

Table 1.10: Cumulative frequency table for Radial

To identify the median interval, find the interval within which the $\frac{n+1}{2} = \frac{100+1}{2} = 50{,}5$-th value occurs, that is, the first interval whose cumulative frequency reaches 50,5. The median interval is therefore 57,5–68,5.

The biggest advantages of the median are that open intervals pose no problem and that it is not affected by extreme values. However, it ignores the largest part of the data and cannot be manipulated mathematically.

The mode

The mode of a data set is the value that occurs most often. Consider the following example: A survey was conducted amongst couples who were married eight years ago. Table 1.11 gives the frequency table of the data obtained.

Number of children per couple ($x_i$)    Number of couples ($f_i$)
0                                        […]  (highest frequency)
[…]                                      […]

Table 1.11: Frequency table for number of children per couple

The largest group of couples has no children. The mode is therefore zero.

The mode, however, is not a good measure of locality. Sometimes there is no value that occurs more often than any other value, or there is more than one value with the same maximum number of occurrences. In addition, the mode has the same drawbacks as the median. The only thing in favour of the mode is that it is easy to understand.

For grouped data the modal interval is the interval with the highest frequency.
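As a rough sketch of how the median interval and the modal interval can be read off a frequency table, consider the Python below. The function names are invented for this illustration, and the frequencies in `toy_freqs` are hypothetical (they are not Radial's frequencies); only the interval limits are taken from the text. The last line uses the standard library's `statistics.multimode` to show a raw data set with more than one mode.

```python
from itertools import accumulate
from statistics import multimode


def median_interval(intervals, freqs):
    """First interval whose cumulative frequency reaches (n + 1) / 2."""
    n = sum(freqs)
    target = (n + 1) / 2
    for interval, cumulative in zip(intervals, accumulate(freqs)):
        if cumulative >= target:
            return interval
    raise ValueError("frequency table is empty")


def modal_interval(intervals, freqs):
    """Interval with the highest frequency (the first one if there is a tie)."""
    return intervals[freqs.index(max(freqs))]


# Interval limits as in the text: 13.5-24.5, 24.5-35.5, ..., 90.5-101.5.
intervals = [(13.5 + 11 * i, 24.5 + 11 * i) for i in range(8)]

# Hypothetical frequencies, for illustration only (NOT Radial's data).
toy_freqs = [1, 2, 3, 4, 5, 3, 1, 1]

print(median_interval(intervals, toy_freqs))  # (57.5, 68.5)
print(modal_interval(intervals, toy_freqs))   # (57.5, 68.5)

# A raw data set in which two values share the highest count.
print(multimode([0, 0, 1, 2, 2, 3]))  # [0, 2]
```

With these toy frequencies $n = 20$, so the target position is 10,5; the cumulative frequencies are 1, 3, 6, 10, 15, and the fifth interval is the first to reach the target.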

Measures of dispersion

The variance of a data set

In the previous section, we considered the problems that occur when working with a measure of locality only. Even though the arithmetic mean uses all the values in a data set, it does not give much information about what the data set really looks like. We also need information on the spread of the data around the mean.

Consider the data set 5; 3; 8; 4; 1; 5; 0; 6 with a mean of $\bar{x} = \frac{32}{8} = 4$. Let's plot the data points around the mean:

[Figure: the data points plotted on a number line around $\bar{x} = 4$.]

Now calculate the distance from $\bar{x} = 4$ to each value and then calculate the mean of these distances: $1 + 1 + 4 + 0 + 3 + 1 + 4 + 2 = 16$ and $\frac{16}{8} = 2$.

When we work with Radial's sample it may be quite difficult to calculate the mean distance. How long do you think it would take using 100 values?

An alternative is to calculate the deviation from the mean for each observation, that is $x - \bar{x}$: 1; −1; 4; 0; −3; 1; −4; 2. When these are added, you get 0, because the positive and negative deviations cancel, which tells you nothing!

However, the number crunchers of the old days did not become discouraged, and came up with the clever idea of squaring each deviation $(x - \bar{x})$. The square of any value is never negative, so the squared deviations cannot cancel one another. The mean of the squared deviations is called the variance. The positive square root of the variance is called the standard deviation, and we will use this measurement to give an indication of the spread of data around the mean.

The variance of a sample is defined as

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}. \]

Notice that we divide by $n - 1$ and not by $n$. The reason for this is that the sample variance ($s^2$) is used to estimate the population variance ($\sigma^2$). If we were to divide by $n$, it would give an underestimation of the population variance. Division by $n - 1$ therefore gives a better estimator. Calculators can normally calculate both $s^2$ and $\sigma^2$. Make sure which one you should use on your calculator.

The squared deviations of our sample are calculated in Table 1.12.

$x_i$     $(x_i - \bar{x})$    $(x_i - \bar{x})^2$
5         1                    1
3         −1                   1
8         4                    16
4         0                    0
1         −3                   9
5         1                    1
0         −4                   16
6         2                    4
Total 32  0                    48

Table 1.12: Calculations to find the variance and the standard deviation

The variance is $s^2 = \frac{48}{8 - 1} = 6{,}86$ and the standard deviation is $s = \sqrt{6{,}86} = 2{,}62$.

An alternative formula for computing the variance is

\[ s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n - 1)}. \]

Note: You may enter the data into the statistics mode of your calculator and find the value of the standard deviation by pressing a button. This is much faster than doing the calculation by hand. See the manual of your calculator.

The variance can also be calculated for grouped data. For a frequency table the sample variance is defined as follows:

\[ s^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - n \bar{x}^2}{n - 1} \]

where

\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i x_i}{n}, \]

$x_i$ = the middle value of the $i$-th interval, and $n = \sum_{i=1}^{k} f_i$.

Let's look at Radial's data again. The variance of Radial's frequency table is calculated in Table 1.13. (Remember that we have already calculated the mean as $\bar{x} = 58{,}16$.)

The variance is

\[ s^2 = \frac{371\,490 - 100\,(58{,}16)^2}{99} = \frac{33\,231{,}44}{99} = 335{,}67. \]
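The variance calculations for the small data set 5; 3; 8; 4; 1; 5; 0; 6 can be reproduced with the Python sketch below, which works through the definition and the alternative formula side by side. The variable and function names are chosen for this illustration only. The grouped-data function follows the frequency-table formula given above, and it is exercised on hypothetical frequencies (`toy_freqs`, not Radial's data) by comparing it with the ordinary sample variance of the same values written out one by one.

```python
import math
import statistics

data = [5, 3, 8, 4, 1, 5, 0, 6]
n = len(data)
mean = sum(data) / n                              # 4.0
deviations = [x - mean for x in data]

print(sum(deviations))                            # 0.0  (deviations cancel)
print(sum(abs(d) for d in deviations) / n)        # 2.0  (mean distance)

squared_total = sum(d * d for d in deviations)    # 48.0
var_definition = squared_total / (n - 1)
var_alternative = (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))
std = math.sqrt(var_definition)
print(round(var_definition, 2), round(var_alternative, 2), round(std, 2))
# 6.86 6.86 2.62

# The standard library's sample variance also divides by n - 1.
print(math.isclose(statistics.variance(data), var_definition))  # True


def grouped_variance(freqs, mids):
    """Sample variance of grouped data: (sum(f*x^2) - n*mean^2) / (n - 1)."""
    total = sum(freqs)
    grouped_mean = sum(f * x for f, x in zip(freqs, mids)) / total
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
    return (sum_fx2 - total * grouped_mean ** 2) / (total - 1)


mids = [19, 30, 41, 52, 63, 74, 85, 96]
toy_freqs = [1, 2, 3, 4, 5, 3, 1, 1]   # hypothetical, NOT Radial's data
expanded = [x for f, x in zip(toy_freqs, mids) for _ in range(f)]
print(math.isclose(grouped_variance(toy_freqs, mids),
                   statistics.variance(expanded)))  # True
```

The two formulas give the same value because the alternative formula is an algebraic rearrangement of the definition; it is mainly convenient when only the sums $\sum x_i$ and $\sum x_i^2$ are at hand.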

Interval        $f_i$    $x_i$    $f_i x_i^2$
13,5–24,5       7        19       2 527
24,5–35,5       6        30       5 400
35,5–46,5       […]      41       […]
46,5–57,5       […]      52       […]
57,5–68,5       […]      63       […]
68,5–79,5       […]      74       […]
79,5–90,5       8        85       57 800
90,5–101,5      […]      96       […]

Table 1.13: Calculations for the variance of Radial's frequency table

The standard deviation of a data set

The standard deviation is defined as the square root of the variance. The standard deviation of a sample is defined as

\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}. \]

But what does this tell us? It tells us how far the observations typically lie from the mean. The larger the standard deviation, the further away the data points are from the mean.

The following schematic representation shows how many of the data points lie within one standard deviation to the left and to the right of the mean, and within two standard deviations to the left and to the right of the mean.

[Figure: schematic with the points $-2s$, $-s$, $\bar{x}$, $+s$ and $+2s$ marked; 68% of the data points lie between $\bar{x} - s$ and $\bar{x} + s$, and 95% lie between $\bar{x} - 2s$ and $\bar{x} + 2s$.]

The standard deviation plays an important role in inferential statistics, that is, the field that deals with drawing scientifically based conclusions about populations from sample data.

The quartile deviation

The median is the value that separates a sorted data set into two equal parts.

[Figure: the median (me) divides the sorted data set into two parts of 50% each.]

If we divide a sorted data set into four equal parts we get the following three quartiles:

[Figure: the quartiles $q_1$, $q_2$ (= me) and $q_3$ divide the sorted data set into four parts of 25% each.]

$q_1$ represents the value that indicates the end of the first 25% of the data values; $q_2$ represents the value that indicates the end of the second 25% (or the value which divides the data set into two equal parts, that is, the median); and $q_3$ is the value indicating the end of the third 25%. The middle 50% of the data lies between $q_1$ and $q_3$.

The quartile deviation is

\[ q_D = \frac{q_3 - q_1}{2} \]

and is a measurement of the dispersion of the data around the median. As with the median, the quartile deviation does not use all the observations. It ignores outliers, since the top 25% and the bottom 25% of the data values are not taken into account.

Example
The purchasing manager of a group of clothing shops has recorded the following 15 observations on the number of days that pass between reordering items from a new range of children's clothing.

Reordering intervals (in days): […]

Calculate and interpret the quartile deviation of the reordering intervals.

Solution
The value of the median, or $q_2$, is the value of the $\frac{1}{2}(n+1)$-th observation in a ranked data set. Similarly, the value of $q_1$ is the value of the $\frac{1}{4}(n+1)$-th observation and the value of $q_3$ is the value of the $\frac{3}{4}(n+1)$-th observation in an ordered data set. (An ordered data set is a data set that is arranged in ascending order.)

The ordered data set is: 5; 12; 15; 17; 17; 18; 18; 22; 22; 23; 23; 26; 26; 28; 29.
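Applying the positional rules above to the ordered data set can be sketched in Python as follows. The helper names are invented for this illustration. Linear interpolation for fractional positions is an assumption of this sketch (the text only shows the halfway case for the median); it is not needed for this data set, because with $n = 15$ the positions $\frac{1}{4}(n+1) = 4$, $\frac{1}{2}(n+1) = 8$ and $\frac{3}{4}(n+1) = 12$ are whole numbers.

```python
def value_at_position(sorted_data, position):
    """Value at a 1-based position; fractional positions are interpolated
    linearly between neighbours (an assumption of this sketch)."""
    whole = int(position)
    fraction = position - whole
    lower = sorted_data[whole - 1]
    if fraction == 0:
        return lower
    return lower + fraction * (sorted_data[whole] - lower)


def quartiles(values):
    """q1, q2, q3 as the (n+1)/4-th, 2(n+1)/4-th and 3(n+1)/4-th values."""
    data = sorted(values)
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    return tuple(value_at_position(data, p * (n + 1) / 4) for p in (1, 2, 3))


reorder_days = [5, 12, 15, 17, 17, 18, 18, 22, 22, 23, 23, 26, 26, 28, 29]

q1, q2, q3 = quartiles(reorder_days)
print(q1, q2, q3)      # 17 22 26
print((q3 - q1) / 2)   # 4.5
```

Under these rules the middle 50% of the reordering intervals lies between 17 and 26 days, and the quartile deviation is 4,5 days. Statistical software may use slightly different positioning rules for quartiles, so results for other data sets can differ a little between tools.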


More information

NOTES: Chapter 4 Describing Data

NOTES: Chapter 4 Describing Data NOTES: Chapter 4 Describing Data Intro to Statistics COLYER Spring 2017 Student Name: Page 2 Section 4.1 ~ What is Average? Objective: In this section you will understand the difference between the three

More information

Module Tag PSY_P2_M 7. PAPER No.2: QUANTITATIVE METHODS MODULE No.7: NORMAL DISTRIBUTION

Module Tag PSY_P2_M 7. PAPER No.2: QUANTITATIVE METHODS MODULE No.7: NORMAL DISTRIBUTION Subject Paper No and Title Module No and Title Paper No.2: QUANTITATIVE METHODS Module No.7: NORMAL DISTRIBUTION Module Tag PSY_P2_M 7 TABLE OF CONTENTS 1. Learning Outcomes 2. Introduction 3. Properties

More information

Numerical Descriptions of Data

Numerical Descriptions of Data Numerical Descriptions of Data Measures of Center Mean x = x i n Excel: = average ( ) Weighted mean x = (x i w i ) w i x = data values x i = i th data value w i = weight of the i th data value Median =

More information

Lecture Week 4 Inspecting Data: Distributions

Lecture Week 4 Inspecting Data: Distributions Lecture Week 4 Inspecting Data: Distributions Introduction to Research Methods & Statistics 2013 2014 Hemmo Smit So next week No lecture & workgroups But Practice Test on-line (BB) Enter data for your

More information

Introduction. Introduction. Six Steps of PERT/CPM. Six Steps of PERT/CPM LEARNING OBJECTIVES

Introduction. Introduction. Six Steps of PERT/CPM. Six Steps of PERT/CPM LEARNING OBJECTIVES Valua%on and pricing (November 5, 2013) LEARNING OBJECTIVES Lecture 12 Project Management Olivier J. de Jong, LL.M., MM., MBA, CFD, CFFA, AA www.olivierdejong.com 1. Understand how to plan, monitor, and

More information

Optimization Prof. A. Goswami Department of Mathematics Indian Institute of Technology, Kharagpur. Lecture - 18 PERT

Optimization Prof. A. Goswami Department of Mathematics Indian Institute of Technology, Kharagpur. Lecture - 18 PERT Optimization Prof. A. Goswami Department of Mathematics Indian Institute of Technology, Kharagpur Lecture - 18 PERT (Refer Slide Time: 00:56) In the last class we completed the C P M critical path analysis

More information

Subject CS1 Actuarial Statistics 1 Core Principles. Syllabus. for the 2019 exams. 1 June 2018

Subject CS1 Actuarial Statistics 1 Core Principles. Syllabus. for the 2019 exams. 1 June 2018 ` Subject CS1 Actuarial Statistics 1 Core Principles Syllabus for the 2019 exams 1 June 2018 Copyright in this Core Reading is the property of the Institute and Faculty of Actuaries who are the sole distributors.

More information

Random variables The binomial distribution The normal distribution Sampling distributions. Distributions. Patrick Breheny.

Random variables The binomial distribution The normal distribution Sampling distributions. Distributions. Patrick Breheny. Distributions September 17 Random variables Anything that can be measured or categorized is called a variable If the value that a variable takes on is subject to variability, then it the variable is a

More information

FACULTY OF SCIENCE DEPARTMENT OF STATISTICS

FACULTY OF SCIENCE DEPARTMENT OF STATISTICS FACULTY OF SCIENCE DEPARTMENT OF STATISTICS MODULE ATE1A10 / ATE01A1 ANALYTICAL TECHNIQUES A CAMPUS APK, DFC & SWC SUPPLEMENTARY SUMMATIVE ASSESSMENT DATE 15 JULY 2014 SESSION 15:00 17:00 ASSESSOR MODERATOR

More information

Overview/Outline. Moving beyond raw data. PSY 464 Advanced Experimental Design. Describing and Exploring Data The Normal Distribution

Overview/Outline. Moving beyond raw data. PSY 464 Advanced Experimental Design. Describing and Exploring Data The Normal Distribution PSY 464 Advanced Experimental Design Describing and Exploring Data The Normal Distribution 1 Overview/Outline Questions-problems? Exploring/Describing data Organizing/summarizing data Graphical presentations

More information

Lecture 2 Describing Data

Lecture 2 Describing Data Lecture 2 Describing Data Thais Paiva STA 111 - Summer 2013 Term II July 2, 2013 Lecture Plan 1 Types of data 2 Describing the data with plots 3 Summary statistics for central tendency and spread 4 Histograms

More information

MATHEMATICS APPLIED TO BIOLOGICAL SCIENCES MVE PA 07. LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1)

MATHEMATICS APPLIED TO BIOLOGICAL SCIENCES MVE PA 07. LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1) LP07 DESCRIPTIVE STATISTICS - Calculating of statistical indicators (1) Descriptive statistics are ways of summarizing large sets of quantitative (numerical) information. The best way to reduce a set of

More information

1. A is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes,

1. A is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, 1. A is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. A) Decision tree B) Graphs

More information

STAT 157 HW1 Solutions

STAT 157 HW1 Solutions STAT 157 HW1 Solutions http://www.stat.ucla.edu/~dinov/courses_students.dir/10/spring/stats157.dir/ Problem 1. 1.a: (6 points) Determine the Relative Frequency and the Cumulative Relative Frequency (fill

More information

A LEVEL MATHEMATICS ANSWERS AND MARKSCHEMES SUMMARY STATISTICS AND DIAGRAMS. 1. a) 45 B1 [1] b) 7 th value 37 M1 A1 [2]

A LEVEL MATHEMATICS ANSWERS AND MARKSCHEMES SUMMARY STATISTICS AND DIAGRAMS. 1. a) 45 B1 [1] b) 7 th value 37 M1 A1 [2] 1. a) 45 [1] b) 7 th value 37 [] n c) LQ : 4 = 3.5 4 th value so LQ = 5 3 n UQ : 4 = 9.75 10 th value so UQ = 45 IQR = 0 f.t. d) Median is closer to upper quartile Hence negative skew [] Page 1 . a) Orders

More information

Chapter 6 Confidence Intervals

Chapter 6 Confidence Intervals Chapter 6 Confidence Intervals Section 6-1 Confidence Intervals for the Mean (Large Samples) VOCABULARY: Point Estimate A value for a parameter. The most point estimate of the population parameter is the

More information

Textbook: pp Chapter 11: Project Management

Textbook: pp Chapter 11: Project Management 1 Textbook: pp. 405-444 Chapter 11: Project Management 2 Learning Objectives After completing this chapter, students will be able to: Understand how to plan, monitor, and control projects with the use

More information

CH 5 Normal Probability Distributions Properties of the Normal Distribution

CH 5 Normal Probability Distributions Properties of the Normal Distribution Properties of the Normal Distribution Example A friend that is always late. Let X represent the amount of minutes that pass from the moment you are suppose to meet your friend until the moment your friend

More information

Numerical Descriptive Measures. Measures of Center: Mean and Median

Numerical Descriptive Measures. Measures of Center: Mean and Median Steve Sawin Statistics Numerical Descriptive Measures Having seen the shape of a distribution by looking at the histogram, the two most obvious questions to ask about the specific distribution is where

More information

Time allowed : 3 hours Maximum marks : 100. Total number of questions : 8 Total number of printed pages : 7 PART A

Time allowed : 3 hours Maximum marks : 100. Total number of questions : 8 Total number of printed pages : 7 PART A : 1 : Roll No... Time allowed : 3 hours Maximum marks : 100 Total number of questions : 8 Total number of printed pages : 7 PART A (Answer Question No.1 which is compulsory and any two of the rest from

More information

STA1510 (BASIC STATISTICS) AND STA1610 (INTRODUCTION TO STATISTICS) NOTES PART 1

STA1510 (BASIC STATISTICS) AND STA1610 (INTRODUCTION TO STATISTICS) NOTES PART 1 STA50 (BASIC STATISTICS) AND STA60 (INTRODUCTION TO STATISTICS) NOTES PART Dear student, I pray that this information finds you in good health. These notes are written an integral part of Unisa s student

More information

Key Objectives. Module 2: The Logic of Statistical Inference. Z-scores. SGSB Workshop: Using Statistical Data to Make Decisions

Key Objectives. Module 2: The Logic of Statistical Inference. Z-scores. SGSB Workshop: Using Statistical Data to Make Decisions SGSB Workshop: Using Statistical Data to Make Decisions Module 2: The Logic of Statistical Inference Dr. Tom Ilvento January 2006 Dr. Mugdim Pašić Key Objectives Understand the logic of statistical inference

More information

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1 Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 6 Normal Probability Distributions 6-1 Overview 6-2 The Standard Normal Distribution

More information

Edexcel past paper questions

Edexcel past paper questions Edexcel past paper questions Statistics 1 Chapters 2-4 (Continuous) S1 Chapters 2-4 Page 1 S1 Chapters 2-4 Page 2 S1 Chapters 2-4 Page 3 S1 Chapters 2-4 Page 4 Histograms When you are asked to draw a histogram

More information

2CORE. Summarising numerical data: the median, range, IQR and box plots

2CORE. Summarising numerical data: the median, range, IQR and box plots C H A P T E R 2CORE Summarising numerical data: the median, range, IQR and box plots How can we describe a distribution with just one or two statistics? What is the median, how is it calculated and what

More information

2011 Pearson Education, Inc

2011 Pearson Education, Inc Statistics for Business and Economics Chapter 4 Random Variables & Probability Distributions Content 1. Two Types of Random Variables 2. Probability Distributions for Discrete Random Variables 3. The Binomial

More information

Diploma in Business Administration Part 2. Quantitative Methods. Examiner s Suggested Answers

Diploma in Business Administration Part 2. Quantitative Methods. Examiner s Suggested Answers Cumulative frequency Diploma in Business Administration Part Quantitative Methods Examiner s Suggested Answers Question 1 Cumulative Frequency Curve 1 9 8 7 6 5 4 3 1 5 1 15 5 3 35 4 45 Weeks 1 (b) x f

More information

Standardized Data Percentiles, Quartiles and Box Plots Grouped Data Skewness and Kurtosis

Standardized Data Percentiles, Quartiles and Box Plots Grouped Data Skewness and Kurtosis Descriptive Statistics (Part 2) 4 Chapter Percentiles, Quartiles and Box Plots Grouped Data Skewness and Kurtosis McGraw-Hill/Irwin Copyright 2009 by The McGraw-Hill Companies, Inc. Chebyshev s Theorem

More information

4.1 Probability Distributions

4.1 Probability Distributions Probability and Statistics Mrs. Leahy Chapter 4: Discrete Probability Distribution ALWAYS KEEP IN MIND: The Probability of an event is ALWAYS between: and!!!! 4.1 Probability Distributions Random Variables

More information

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 3: April 25, Abstract

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 3: April 25, Abstract Basic Data Analysis Stephen Turnbull Business Administration and Public Policy Lecture 3: April 25, 2013 Abstract Review summary statistics and measures of location. Discuss the placement exam as an exercise

More information

Appendix A. Selecting and Using Probability Distributions. In this appendix

Appendix A. Selecting and Using Probability Distributions. In this appendix Appendix A Selecting and Using Probability Distributions In this appendix Understanding probability distributions Selecting a probability distribution Using basic distributions Using continuous distributions

More information

Measures of Central tendency

Measures of Central tendency Elementary Statistics Measures of Central tendency By Prof. Mirza Manzoor Ahmad In statistics, a central tendency (or, more commonly, a measure of central tendency) is a central or typical value for a

More information

Example: Histogram for US household incomes from 2015 Table:

Example: Histogram for US household incomes from 2015 Table: 1 Example: Histogram for US household incomes from 2015 Table: Income level Relative frequency $0 - $14,999 11.6% $15,000 - $24,999 10.5% $25,000 - $34,999 10% $35,000 - $49,999 12.7% $50,000 - $74,999

More information