Master of Science in Strategic Management Degree Master of Science in Strategic Supply Chain Management Degree

CHINHOYI UNIVERSITY OF TECHNOLOGY
SCHOOL OF BUSINESS SCIENCES AND MANAGEMENT
POST GRADUATE PROGRAMME

Master of Science in Strategic Management Degree
Master of Science in Strategic Supply Chain Management Degree

SUGGESTED SOLUTION EXAMINATION

Course and code: Quantitative Management - MSCSM 6-01
Course and code: Statistics and Data Analysis - MSCSCM 6-01
Date due: Wednesday 20 January 2016
Time:
Duration: 4 HOURS

Instructions to candidates:
1. Answer all questions.
2. Total marks: 185 marks.
3. A 30-minute tea break is allowed.

Materials:
Formulae booklet (1 copy provided per group)
Tables
Graph paper
Use of a scientific calculator is expected.

Section A (Total Marks: 35)
Instructions: Answer all questions.

1. Fill in the blank space with the appropriate response. (15 Marks)
a. There were 27,466,000 passenger cars and taxis registered in Zimbabwe in 1990; 40,339,000 in 2000; 61,671,000 in 2010, etc. The study of the statistical techniques which merely describe numerical data is called descriptive statistics.
b. Based on a survey of 219 regular church-goers in a remote village, it is estimated that 42 percent of all the church-goers will try attending our new community church called Glorious Life Ministries. When sample data of this nature are used to make statements about all church-goers, the facet of statistics is called statistical inference or inductive statistics.
c. The 219 church-goers in problem 1(b) above are just a small portion of all church-goers in Zimbabwe. The portion is referred to as a sample. All the church-goers in Zimbabwe are called the sampling frame or population.
d. The enrolment at selected universities and colleges is: University of Zimbabwe enrolled 18,701; NUST enrolled 16,411; Midlands State enrolled 9,460; Harare Polytechnic enrolled 8,208 and Chinhoyi University enrolled 10,271. Data of this type which are organized in categories are referred to as nominal-scaled level of measurement.
e. A cumulative frequency distribution enables us to see how many observations lie above or below certain values, rather than merely recording the numbers of items within intervals.
f. Certain numerical measures such as weight, length, and pressure are examples of ratio-scaled (continuous) data, since each has an absolute zero.
g. A random sample contains the relevant characteristics of the population in the same proportion as they are included in that population.
h. A collection of data is called a data set and a single observation a data element.

2. State whether the following statements are true or false and give reasons for your answers. All questions carry equal marks (20 marks):
a. Quantitative and qualitative are both random variables that generalize the process of statistical analysis in a similar manner.
FALSE. Quantitative random variables are interval- or ratio-scaled, while qualitative random variables are nominal- or ordinal-scaled, so they are not analysed in the same manner. For example, mean values cannot be found for nominal data types.
b. The mean may be a misleading measure of central location if the data form a skewed distribution.
TRUE. The mean may be distorted by the extreme values at either end of the distribution.
c. If the mean is greater than the median the distribution is skewed to the right.
TRUE. The distribution is positively skewed.
d. The mode is influenced by outliers.
FALSE. The mode (or the modal class) represents the most frequently occurring value of a random variable. It is a single value in an ungrouped data set, or the class with the highest count in a grouped data set, and it is identified from the counts of all observations. Outliers (i.e. a few extreme values) are rare observations that lie far from the rest of the data set, so they play no part in identifying the mode.
e. In a frequency distribution, the modal value need not necessarily lie in the interval with the highest frequency.
TRUE. The mode is the most frequently occurring value in a data set. A frequency distribution records how often the different values of a random variable occur in a data set, and intervals arise only when the data are grouped. Once data have been grouped, the individual observations are submerged into their class and take on the identity of that class. A value that had the highest count in the ungrouped data is therefore not guaranteed to fall in the class with the highest frequency.
f. The 50th percentile is another term to describe the mode.
FALSE. The 50th percentile describes the median, not the mode. The median is that value of a random variable which divides an ordered data set into two equal parts: half the observations fall below the median and the other half above it. The mode, by contrast, is the most frequently occurring observation in a data set. Furthermore, the median need not be an observed value; in a data set with an even number of observations it is computed from the two middle values.
g. The value which divides an ordered data set into 25% below it and 75% above it is called the upper quartile.
FALSE. It is the lower quartile.
h. The sum of the probabilities of a set of events equals 1 if the events are mutually exclusive and collectively exhaustive.
TRUE. Events are mutually exclusive when two or more of them cannot occur simultaneously in a single trial of a random experiment (i.e. not at the same point in time). Events are collectively exhaustive when the union of all possible events is equal to the sample space. This means that, in a single trial of a random experiment, at least one of these events is certain to occur.
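Statements 2(b) to 2(d) can be illustrated with a small numerical sketch. The sample below is hypothetical (it is not exam data) and was chosen only because it contains one extreme value; the script uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical right-skewed sample (not exam data): one extreme value at the top end.
values = [2, 3, 3, 3, 4, 5, 6, 40]

mean_value = statistics.mean(values)      # pulled upwards by the outlier
median_value = statistics.median(values)  # middle of the ordered data
mode_value = statistics.mode(values)      # most frequent value

# Remove the outlier and recompute to see which measures move.
trimmed = values[:-1]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)
mode_trimmed = statistics.mode(trimmed)

print(f"with outlier:    mean={mean_value:.2f} median={median_value} mode={mode_value}")
print(f"without outlier: mean={mean_trimmed:.2f} median={median_trimmed} mode={mode_trimmed}")
```

In this sample the mean exceeds the median (right skew), and removing the extreme value changes the mean far more than the median, while the mode stays the same.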

Section B (Total Marks: 150)
Instructions: Answer all SIX questions.

QUESTION 1 (ONE) [Total: 25 Marks]
Econet Wireless Limited has implemented a system of charging for telephone calls based on the length of a call. To find out how this new charging system would affect its telephone bill, a market research company which carries out extensive telephone interviews monitored the duration of 600 calls over a period of 3 days. The following frequency distribution was compiled:

Duration (minutes) | 2-4,9 | 5-7,9 | 8-10,9 | 11-13,9 | 14-16,9 | 17-19,9
Number of calls | 38 | 122 | 186 | 134 | 98 | 22

i. Find the mean and median of call lengths. (5 marks)
Suggested solution:

Classes (call durations) | No. of calls f_i | Midpoint x_i | f_i x_i | f_i x_i² | Ogive f(<)
2-4,9 | 38 | 3,5 | 133 | 465,5 | 38
5-7,9 | 122 | 6,5 | 793 | 5154,5 | 160
8-10,9 | 186 | 9,5 | 1767 | 16786,5 | 346
11-13,9 | 134 | 12,5 | 1675 | 20937,5 | 480
14-16,9 | 98 | 15,5 | 1519 | 23544,5 | 578
17-19,9 | 22 | 18,5 | 407 | 7529,5 | 600
Total | 600 | | 6294 | 74418,0 |

The mean duration of calls = 6294/600 = 10,49 minutes.
The median position: n/2 = 600/2 = 300th position. Hence the median interval (using the Ogive) is [8-10,9]. The class width is 3 (i.e. 5 − 2).
Median (Q2) = 8 + [3(300 − 160)/186] = 8 + 2,258 = 10,258
Half the calls lasted less than 10,258 minutes, while the other half lasted longer than 10,258 minutes.

ii. Find the standard deviation of call lengths and explain the significance of this measure. (5 marks)
Suggested solution:

Standard deviation of call durations:
S_x = √{[74418 − 600(10,49)²]/(600 − 1)} = √14,01326 = 3,7434 minutes.
The standard deviation is a statistical measure which expresses the average deviation about the mean in the original units of the random variable (i.e. un-squared units of measure). On average, call durations deviate by 3,7434 minutes from the mean call duration of 10,49 minutes.

iii. Find the inter-quartile range and quartile deviation. (5 marks)
Suggested solution:
Lower Quartile (Q1): Q1 position: n/4 = 600/4 = 150th position. Hence the Q1 interval (using the Ogive) is [5-7,9]. The class width is 3.
Q1 = 5 + [3(150 − 38)/122] = 5 + 2,7541 = 7,7541 minutes.
One-quarter of all calls lasted less than 7,7541 minutes, while 75% of calls lasted more than 7,7541 minutes.
Upper Quartile (Q3): Q3 position: 3n/4 = 3(600)/4 = 450th position. Hence the Q3 interval (using the Ogive) is [11-13,9].
Q3 = 11 + [3(450 − 346)/134] = 11 + 2,3284 = 13,3284 minutes.
Three-quarters of all calls lasted less than 13,3284 minutes, while only 25% of calls lasted more than 13,3284 minutes.
Inter-quartile Range (IR) = Q3 − Q1
IR = 13,3284 − 7,7541 = 5,5743 minutes.
Quartile Deviation (QD) = 5,5743/2 = 2,7872 minutes.

iv. Establish the value of skewness measure. (5 marks)
Suggested solution:
Pearson's coefficient of skewness: SK_p = 3(mean − median)/S_x = [3(10,49 − 10,258)]/3,7434 = 0,18593
This measure shows a slight degree of positive (right) skewness in the distribution of call durations. This means that there are a few large positive outliers, i.e. a few long-duration calls.
Bowley's skewness coefficient: SK_b = [(Q3 − Q2) − (Q2 − Q1)]/(Q3 − Q1) = [(13,3284 − 10,258) − (10,258 − 7,7541)]/(13,3284 − 7,7541) = 0,10163
This confirms the findings from Pearson's coefficient.

v. Which set of descriptive measures (mean and standard deviation or median and inter-quartile range) would you recommend to the management as the representative measures of call lengths? (5 marks)
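The figures in parts (i) to (iv) can be cross-checked with a short script. This is a minimal sketch, assuming the class lower limits, the class width of 3 and the frequencies from the solution table; the helper name `grouped_quantile` is illustrative and not part of the formulae booklet.

```python
import math

# Class lower limits, class width and frequencies from the call-duration table.
lower_limits = [2, 5, 8, 11, 14, 17]
width = 3
freqs = [38, 122, 186, 134, 98, 22]
midpoints = [3.5, 6.5, 9.5, 12.5, 15.5, 18.5]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
variance = (sum_fx2 - n * mean ** 2) / (n - 1)  # sample variance, grouped data
std_dev = math.sqrt(variance)


def grouped_quantile(position):
    """Interpolate the value at a cumulative position such as n/4 or n/2."""
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:
            return lower + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("position exceeds the number of observations")


q1 = grouped_quantile(n / 4)
median = grouped_quantile(n / 2)
q3 = grouped_quantile(3 * n / 4)

iqr = q3 - q1
quartile_deviation = iqr / 2
pearson = 3 * (mean - median) / std_dev
bowley = ((q3 - median) - (median - q1)) / iqr

print(f"mean={mean:.2f} median={median:.3f} s={std_dev:.4f}")
print(f"Q1={q1:.4f} Q3={q3:.4f} IR={iqr:.4f} QD={quartile_deviation:.4f}")
print(f"Pearson={pearson:.3f} Bowley={bowley:.3f}")
```

The script carries unrounded intermediate values, so its skewness coefficients agree with the hand calculations to about three decimal places rather than exactly.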

Suggested solution:
The slight positive skewness in the distribution of call durations may favour selecting the median and quartile deviation. However, since the skewness is marginal, the mean and standard deviation can still be considered a true representation of the behaviour of the random variable, call durations. These latter measures are generally preferred.

QUESTION 2 (TWO) [Total: 25 Marks]
In a comparative study of residence preferences between houses and flats, the following quantitative data were presented at a city council planning meeting.

Preferred type of accommodation
Residential area | House (%) | Flat (%)
Entumbane | 17 | 9
Mpopoma | 31 | 8
Mzilikazi | 13 | 11
Luveve | 2 | 9
Total | 63 | 37

a) Justify one method for collecting such data.
Suggested solution:
There are several data collection methods, ranging from surveys to scientific experiments in agricultural fields. Methods of gathering data for statistical analysis can be categorised into three approaches:
Observation (e.g. direct observation, desk research)
Interview methods (e.g. personal, postal, telephone)
Experimentation
The statistics textbook should be consulted for a fuller discussion of the advantages and disadvantages of the above methods/approaches as well as the use of these data collection techniques.

b) State, describe and justify the sampling technique used in the study.
Suggested solution:
The statistics textbook recommended for the course should be consulted for a fuller discussion of sampling techniques, their advantages and disadvantages, and their use.

c) Present the findings on a component bar-graph.

d) Write two interpretations of the findings from the component bar-graph.
Suggested solution:
Overall, house accommodation is the preferred residence type (63% against 37% for flats). In addition, the component bar-graph highlights the breakdown between house and flat preference within each residential area.
o Most respondents who prefer house accommodation reside in Mpopoma and Entumbane townships.
o Mzilikazi shows an approximately equal spread of house and flat preferences.
o The majority of Luveve respondents prefer flats.

e) Suggest two strategic uses of the findings to the City Council.
Suggested solution:
City Fathers/management should be cautious and not depend only on the information generated from this survey. Residents and other key stakeholders should be consulted for a fuller discussion of the advantages and disadvantages of the above preferences as well as the use of these findings. Possible strategic uses include:
Town planning
Regulatory and investment policies
Partnering with developers

QUESTION 3 (THREE)
a) A soft drink distributor does statistical analyses for an automobile racing team at Donnybrook race course in Mabvuku, Harare. Here are fuel consumption figures in kilometres (km) per litre for the team's cars in the recent races:

4.77 6.11 6.11 5.05 5.99 4.91 5.27 6.01 5.75 4.89 6.05 5.22 6.02 5.24 6.11 5.02

i. Is the fuel consumption classification discrete or continuous? Open or closed? Explain. [3 marks]
Suggested solution: A random variable whose observations can take on only specific values, usually only integer (whole number) values, is referred to as a discrete random variable. In such instances, certain values are valid, while others are invalid. A continuous random variable, by contrast, can assume any value in an interval. Fuel consumption can take on any value in an interval, so it generates continuous data. Thus, fuel consumption is an open, continuous random variable.

ii. Is the data qualitative or quantitative? Explain with reasons. [3 marks]
Suggested solution: Data type is determined by the nature of the random variable which the data represent, either quantitative or qualitative. Quantitative random variables are variables which yield numeric responses. The data generated for quantitative random variables can be meaningfully manipulated using conventional arithmetic operations. Quantitative variables are measured using interval-scaled data (i.e. implied ranking, but no absolute origin) and ratio-scaled data (i.e. with an absolute origin). Quantitative data are therefore either measured (e.g. heights, IQs, or speed) or counted (e.g. number of employees, phone calls per hour, or points scored in a soccer match). But not all the variables we encounter are quantitative. Variables such as marital status, heads or tails in a coin toss, or winning or losing a soccer game are categorical, or qualitative. Qualitative random variables are variables which yield categorical (non-numeric) responses.
The data generated by qualitative random variables are classified into one of a number of categories. Any numbers assigned to the categories are arbitrary codes: coded values cannot be manipulated arithmetically, as the result would not make sense. Such data are associated mainly with nominal-scaled and ordinal-scaled (i.e. implied ranking between categories) data measures. Fuel consumption in km per litre is numeric and has an absolute origin, so the data are quantitative (ratio-scaled).

iii. Calculate the median fuel consumption. [2 marks]
Suggested solution: Using the formulae in the provided booklet, the median fuel consumption is 5.51 km per litre.

iv. Compute the mean fuel consumption. [2 marks]
Suggested solution: Using the formulae in the provided booklet, the mean fuel consumption is 5.5325 km per litre, or 5.5 km per litre to 1 decimal place.

v. Group the data into five equally sized classes and construct a histogram. What is the fuel consumption value of the modal class? [5 marks]
Suggested solution: Below is the grouped data for the construction of a histogram.

Class (km/l) | 4.77-5.03 | 5.04-5.30 | 5.31-5.57 | 5.58-5.84 | 5.85-6.11
Frequency | 4 | 4 | 0 | 1 | 7

Using the provided graph paper, convert the table into a histogram. The modal class is the class interval from 5.85 to 6.11 km per litre, with a frequency of 7.

vi. Which of the three measures of central tendency is best for the soft drink distributor to use when she orders fuel? Explain. [2 marks]
Suggested solution: It depends. If she is ordering fuel for only one car, she should be cautious and use the modal value. If she is ordering fuel for several cars running in the same race, the mean or median is probably an acceptable guide.

vii. Construct the less-than ogive for the data set and determine the inter-quartile range. Explain the use of the inter-quartile range in decision-making. [4 marks]
Suggested solution: First convert the frequency table into a cumulative frequency table, then plot the cumulative frequencies against the upper limits of the five classes and join the points. The resulting line graph is the less-than ogive. Note that an ogive is a graph of a cumulative frequency distribution. From the ogive, read off the lower and upper quartiles; the inter-quartile range is the difference between them.
The inter-quartile range measures approximately how far from the median we go on either side before we can include one-half of the values of the data set. It is the range of the middle 50% of the observed values of the data set. As such, this rough measure of dispersion ignores the extreme outliers in the lowest 25% of the observations as well as those in the upper 25% of the observed values.

viii. Using the mean computed in (iv) above, calculate the variance and the standard deviation. Explain which of the two measures of dispersion, standard deviation or inter-quartile range (refer to your (vii) answer above), is better. [4 marks]
Suggested solution: Variance measures the average of all squared deviations from the mean of a data set. The calculation below uses the grouped data from (v).

Classes | No. of cars f | Midpoint x | f*x | x^2 | f*x^2
4.77-5.03 | 4 | 4.90 | 19.60 | 24.01 | 96.04
5.04-5.30 | 4 | 5.17 | 20.68 | 26.73 | 106.92
5.31-5.57 | 0 | 5.44 | 0.00 | 29.59 | 0.00
5.58-5.84 | 1 | 5.71 | 5.71 | 32.60 | 32.60
5.85-6.11 | 7 | 5.98 | 41.86 | 35.76 | 250.32
Total | 16 | | 87.85 | | 485.88

Grouped mean = sum(f*x)/sum(f) = 87.85/16 = 5.490625
Variance of fuel consumption = [sum(f*x^2) − n(mean^2)]/(n − 1) = [485.88 − 16(5.490625)^2]/(16 − 1) ≈ 0.235 (km/litre)^2
Standard deviation = √variance ≈ 0.485 km/litre
Note that this shortcut formula is sensitive to rounding: rounding the mean to 5.49 before squaring inflates the results to 0.2426 and 0.4925, so the unrounded mean should be carried through.

The standard deviation is the better measure because it takes every observation into account (here, every class midpoint weighted by its frequency), whereas the inter-quartile range uses only the middle 50% of the data. The sample standard score tells us how many standard deviations a particular sample observation lies below or above the sample mean.
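The ungrouped and grouped results for Question 3 can be checked with a short script. This is a minimal sketch that takes the 16 fuel figures and the five class limits from the question and solution; the variable names are illustrative. It carries the unrounded grouped mean through the shortcut variance formula, which gives a variance of about 0.235, whereas rounding the mean to 5.49 before squaring gives about 0.243.

```python
import math
import statistics

# Fuel consumption figures (km per litre) from the question.
fuel = [4.77, 6.11, 6.11, 5.05, 5.99, 4.91, 5.27, 6.01,
        5.75, 4.89, 6.05, 5.22, 6.02, 5.24, 6.11, 5.02]

raw_median = statistics.median(fuel)
raw_mean = statistics.mean(fuel)

# Five equally sized classes from the suggested solution.
class_limits = [(4.77, 5.03), (5.04, 5.30), (5.31, 5.57), (5.58, 5.84), (5.85, 6.11)]
freqs = [sum(1 for x in fuel if lo <= x <= hi) for lo, hi in class_limits]
midpoints = [round((lo + hi) / 2, 2) for lo, hi in class_limits]

n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

# Shortcut formula for the sample variance of grouped data, unrounded mean.
grouped_var = (sum_fx2 - n * grouped_mean ** 2) / (n - 1)
grouped_sd = math.sqrt(grouped_var)

# Same formula with the mean rounded to 5.49 first, to show the rounding effect.
rounded_var = (sum_fx2 - n * 5.49 ** 2) / (n - 1)

print(f"median={raw_median:.2f} mean={raw_mean:.4f} frequencies={freqs}")
print(f"grouped mean={grouped_mean:.6f} variance={grouped_var:.4f} sd={grouped_sd:.4f}")
print(f"variance with mean rounded to 5.49: {rounded_var:.4f}")
```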

QUESTION 4 (FOUR)
a) The following is the bivariate (joint) probability distribution for two random variables, age and traffic offences, of residents in Chinhoyi.

Offences over last 12 months
Age | None (F1) | One (F2) | Two or more (F3) | Total
E1: < 18 | 0,23 | 0,12 | 0,05 | 0,40
E2: ≥ 18 | 0,45 | 0,14 | 0,01 | 0,60
Total | 0,68 | 0,26 | 0,06 | 1,00

i. What is the probability that a randomly selected resident had no traffic offences in the last 12 months, given that he/she is 18 or older? [4 marks]
Suggested solution: Conditional probability:
P(F1 | E2) = P(F1 ∩ E2)/P(E2) = 0.45/0.60 = 0.75

ii. What is the probability that a randomly selected resident had two or more traffic offences in the last 12 months? [4 marks]
Suggested solution: Marginal probability: P(F3) = 0.06

iii. What is P(E1 ∪ F2) equal to? [4 marks]
Suggested solution: Addition rule for non-mutually exclusive events:
P(E1 ∪ F2) = P(E1) + P(F2) − P(E1 ∩ F2) = 0.40 + 0.26 − 0.12 = 0.54

iv. If event A is a randomly selected resident under 18 with fewer than two offences, find P(A′). [4 marks]
Suggested solution: Note that A′ is the complement of A, i.e. a resident who is 18 years or older, or who has two or more offences.
P(A) = P((F1 ∪ F2) ∩ E1) = P(F1 ∩ E1) + P(F2 ∩ E1) = 0.23 + 0.12 = 0.35
Then P(A′) = 1 − 0.35 = 0.65

b) An auditing firm has found from experience that 1 in 20 transactions are incorrectly processed in a client's financial records. An auditor randomly draws a sample of 8 transactions from this client's accounting records.

i. What is the probability that 3 of these transactions will be incorrectly processed? Interpret the results. [3 marks]
Suggested solution: Binomial distribution. If X has a binomial distribution with parameters n and p, then:
P(X = r) = [n!/(r!(n − r)!)] p^r (1 − p)^(n − r)
where r = number of successes, n − r = number of failures, p = P(success) and q = (1 − p) = P(failure).

Justification for using the binomial distribution: The random variable (transaction processing) is discrete and fits the binomial probability distribution because:
There are only 2 possible outcomes of this discrete random variable, namely incorrect processing (the success outcome) and correct processing (the failure outcome).
A probability can be assigned to the occurrence of each outcome for a single transaction inspected, namely:
o p (= probability of being incorrectly processed) = 0,05
o q (= probability of being correctly processed) = 0,95
Note that the success outcome is identified with an incorrect transaction, since the probability question relates to finding probabilities for incorrectly processed transactions.
8 transactions are observed. Each transaction observed represents a single trial of this study. Hence n = 8.
Each transaction (trial) can be regarded as independent of any other transaction in terms of its correctness.
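The binomial formula above can be sketched in a few lines of Python for the audit sample (n = 8, p = 0,05). This is a minimal illustration using `math.comb` from the standard library; the function name `binomial_pmf` and the variable names are illustrative, not part of the formulae booklet.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for a binomial random variable with parameters n and p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


n, p = 8, 0.05  # 8 sampled transactions, 1 in 20 processed incorrectly

p_exactly_3 = binomial_pmf(3, n, p)

# "No more than 2" sums the probabilities for r = 0, 1, 2.
p_at_most_2 = sum(binomial_pmf(r, n, p) for r in range(3))

# "At least 2" is easiest via the complement: 1 - [P(0) + P(1)].
p_at_least_2 = 1 - sum(binomial_pmf(r, n, p) for r in range(2))

print(f"P(r = 3)  = {p_exactly_3:.4f}")
print(f"P(r <= 2) = {p_at_most_2:.4f}")
print(f"P(r >= 2) = {p_at_least_2:.4f}")
```

Summing the probability function over r = 0 to 8 should return 1, which is a quick sanity check on any hand calculation.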

Suggested solution procedure: From the problem description the following values are assigned to the binomial terms: p = 0,05; q = 0,95 and n = 8. The domain, r (i.e. successes), of all possible outcomes = 0, 1, 2, 3, 4, 5, 6, 7, 8.
Find P(r = 3 incorrectly processed transactions), i.e.
P(r = 3) = [8!/(3!(8 − 3)!)] × 0.05^3 × 0.95^5 = 56 × 0.0000967 = 0.0054
Interpretation: The probability is 0.0054 (just over 0,5%) that exactly 3 out of the 8 transactions examined will have been incorrectly processed.

ii. What is the probability that no more than 2 of the 8 transactions will be wrongly processed? Explain the findings. [3 marks]
Suggested solution: The binomial approach is still appropriate. Find P(no more than 2 incorrect transactions out of 8). This translates mathematically into finding P(r ≤ 2). This implies that either 0 or 1 or 2 transactions can be incorrectly processed out of the sample of 8 transactions observed. Using the addition rule of probability for mutually exclusive events, the combined probability is:
P(r ≤ 2) = P(r = 0) + P(r = 1) + P(r = 2)
The three binomial probabilities must now be computed:
P(r = 0) = [8!/(0!(8 − 0)!)] × 0.05^0 × 0.95^8 = 0.6634 (recall: 0! = 1)
P(r = 1) = [8!/(1!(8 − 1)!)] × 0.05^1 × 0.95^7 = 0.2793
P(r = 2) = [8!/(2!(8 − 2)!)] × 0.05^2 × 0.95^6 = 0.0515
Finally, P(r ≤ 2) = 0.6634 + 0.2793 + 0.0515 = 0.9942
Interpretation: The probability is 0.9942 (almost complete certainty) that no more than 2 out of 8 randomly selected transactions will have been incorrectly processed.

iii. Find the probability that at least 2 out of 8 randomly selected transactions have been incorrectly processed. Interpret the results. [3 marks]
Suggested solution: Expressed mathematically, find:

P(r ≥ 2) = P(r = 2) + P(r = 3) + P(r = 4) + … + P(r = 8)
To avoid onerous calculations, the complementary law of probability can be used. Thus
P(r ≥ 2) = 1 − P(r ≤ 1)
= 1 − [P(r = 0) + P(r = 1)]
= 1 − [0.6634 + 0.2793] (from part (ii) above)
= 0.0573
Interpretation: The probability is only 0.0573 (almost a 6% chance) that at least 2 incorrect transactions will be found in a random sample of 8 transactions. It is highly unlikely.

General note: It should be noted that:
Keywords such as at least; no more than; at most; no less than; greater than; imply the summing of individual probabilities.
The complementary rule should be considered whenever possible to reduce the amount of calculation.

QUESTION 5 (FIVE) [Total: 25 Marks]
a) Identify and explain four differences between a bar graph and a histogram. [5 marks]

Table 1 shows the age distribution at which a sample of women had their first child.

Table 1: Women's age distribution at first child birth
Age | 10-20 | 21-25 | 26-30 | 31-50
Frequency | 33 | 30 | 45 | 10

i. Present the findings on a frequency density graph. [3 marks]

For this distribution, calculate and interpret the following:
i. Mean [3 marks] Mean = 24,15 years
ii. Median [3 marks] Median = 24,47 years
iii. Mode [3 marks] Mode = 27,2 years
iv. Standard deviation [3 marks] Standard deviation = 7,5 years

c) Suggest two population management strategies for this community. [3 marks]

e) Draw a probability tree diagram to show the probability distribution of 3 children in a family. [3 marks]

g) Calculate the probability of having 3 children of the same gender in a family. [3 marks]
Suggested solution: Of the 8 equally likely outcomes on the tree, 2 (all boys or all girls) have the same gender, so P(x) = 2/8 or 25%.

Suggested solution: Any of the statistics textbooks recommended for the course should be consulted for a fuller discussion of data presentation, computation of measures of central location and discrete probability distributions.

QUESTION 6 (SIX) [Total: 25 Marks]
The distances travelled (in kilometres) by a courier service motorcycle on 30 trips were recorded by the driver.

24; 19; 2; 27; 20; 17; 17; 32; 22; 26; 18; 13; 23; 30; 10; 13; 18; 22; 34; 16; 18; 23; 15; 19; 28; 25; 25; 20; 17; 15

a. Define the random variable, the data type, and the measurement scale. (5 marks)
Random variable: distance travelled (in kilometres)
Measurement scale: ratio-scaled
Data type: continuous

b. From the data set, prepare: (6 marks)
i. An absolute frequency distribution,
ii. A relative frequency distribution, and
iii. The (relative) less-than ogive.
Steps:
Range = maximum − minimum = 34 − 10 = 24
Number of classes (using Sturges' rule as a guide): 2^5 = 32 ≥ 30. Hence use 5 classes.
Class width: range/number of classes = 24/5, approximately 5
Choose the first class lower limit to be 10.

Classes (distance, km) | Absolute frequency (no. of trips) | Relative frequency (%) | Ogive: cumulative % below lower class limit
10 - < 15 | 3 | 10,0 | 0
15 - < 20 | 11 | 36,7 | 10,0
20 - < 25 | 8 | 26,7 | 46,7
25 - < 30 | 5 | 16,7 | 73,4
30 - < 35 | 3 | 10,0 | 90,0
(The ogive reaches 100% at 35 km.)

c. Construct the following graphs: (5 marks)
i. A histogram of the relative frequency distribution, and
ii. The cumulative frequency polygon.
Suggested solution: (refer to the Chapter 2 handout on the portal)

d. From the graphs, read off: (9 marks)
i. What percentage of trips were between 25 and 30 km long?
90% of trips were below 30 km and 73% of trips were below 25 km. Hence, approximately 17% of trips were between 25 km and 30 km.
ii. What percentage of trips were less than 25 km?
73% of trips appear to have been under 25 km.
iii. What percentage of trips were 22 km or more?
Approximately 58% of trips were less than 22 km. Hence, approximately 42% of trips were 22 km or more.
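The readings in part (d) can be approximated numerically rather than from the graph. This is a minimal sketch that works from the class frequencies in the solution table (3, 11, 8, 5, 3) and interpolates linearly along the less-than ogive; the function name `percent_below` is illustrative. Values read from a hand-drawn graph will differ slightly.

```python
# Class boundaries (km) and absolute frequencies from the solution table.
class_edges = [10, 15, 20, 25, 30, 35]
freqs = [3, 11, 8, 5, 3]

n = sum(freqs)
rel_pct = [100 * f / n for f in freqs]

# Cumulative % of trips below each class boundary, starting at 0 % below 10 km.
cum_pct = [0.0]
for pct in rel_pct:
    cum_pct.append(cum_pct[-1] + pct)


def percent_below(distance):
    """Linear interpolation along the less-than ogive."""
    for i in range(len(freqs)):
        lo, hi = class_edges[i], class_edges[i + 1]
        if lo <= distance <= hi:
            return cum_pct[i] + (distance - lo) / (hi - lo) * rel_pct[i]
    raise ValueError("distance lies outside the class range")


between_25_and_30 = percent_below(30) - percent_below(25)
below_25 = percent_below(25)
at_least_22 = 100 - percent_below(22)

print(f"25 to 30 km: {between_25_and_30:.1f}%")
print(f"under 25 km: {below_25:.1f}%")
print(f"22 km or more: {at_least_22:.1f}%")
```

Sturges' guide can be checked the same way: `2 ** 5` is 32, the smallest power of 2 that is at least the 30 observations, which is why 5 classes are used.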