DESCRIBING DATA: MEASURES OF LOCATION

Diane Heath
5 years ago
A. Measures of Central Tendency

Measures of central tendency are used to pinpoint the center or average of a data set, which can then be used to represent the typical data value.

1. The population mean ($\mu$). A population is the entire group of objects being studied. To find the population's mean, sum all the observed values in the population ($\sum X$) and divide this sum by the number of observations ($N$) in the population.

$\mu = \sum X / N$

Any measurable characteristic of the population, such as the mean, is called a parameter. $\sum X / N$ is commonly called the data's arithmetic mean, or simply the mean.

2. The sample mean ($\bar{X}$). The sample mean is the sum of all the values in the sample divided by the number of values in the sample. The sample mean is used to make inferences about the population mean.

$\bar{X} = \sum X / n$

The sample mean, or any other measure based on sample data, is called a statistic. Note: $n$ is the sample size while $N$ is the population size.

Example: There are 12 widget makers in the UK. The year 2000 sales for each widget maker are as follows (all data are in millions of British pounds): 1, 5, 34, 15, 19, 44, 54, 33,, 8, 17, 4. Your research partner is exceedingly lazy and has decided to collect data on only five firms in the industry. Given this data, calculate the population mean and the sample mean (your partner's data set is shown as the bold sales numbers in the original data series above).

$\mu$ = Population mean =

3. Properties of the arithmetic mean. The arithmetic mean is the sum of the observations divided by the number of observations. It is the most widely used measure of central tendency and has the following properties:

a. All interval and ratio data sets have an arithmetic mean.
b. All data values are considered and included in the arithmetic mean computation.
c. A data set has only one arithmetic mean. In other words, the mean is unique.
d. The arithmetic mean is a useful measure for comparing two or more populations.
e. The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero. That is:

$\sum (X - \bar{X}) = 0$

Example: A data set contains the following numbers: 5, 9, 4, and 10. The mean of these numbers is:

$\bar{X} = (5 + 9 + 4 + 10) / 4 = 28 / 4 = 7$

The sum of the deviations from the mean is:

$\sum (X - \bar{X}) = (5 - 7) + (9 - 7) + (4 - 7) + (10 - 7) = -2 + 2 - 3 + 3 = 0$

The arithmetic mean has the following disadvantages:

a. The mean can be affected by extremes, that is, unusually large or small values.
b. The mean cannot be determined for an open-ended data set.

4. The median is the midpoint of the data when the data are arranged in order of size. Half the observations are above the median and half are below it. To determine the median, arrange the data from highest to lowest (or lowest to highest) and find the middle observation.

The median is important when the arithmetic mean is affected by extremely large or small values. When this occurs, the median is a better measure of central tendency than the mean.

Properties of the median:

The median, like the mean, is unique.
The median is not affected by extreme values.

The median can be computed for an open-ended frequency distribution as long as the median does not lie in an open-ended class.

Example: The five-year annualized total returns for five investment managers are 30%, 15%, 25%, 21%, and 23%. Find the median return for the managers.

First, rearrange the returns from highest to lowest:

30% 25% 23% 21% 15%

The return observation halfway down from the top (or halfway up from the bottom) is 23%, so the median return is 23%.

Example: Now there is a sixth manager with a return of 28%. What is the median return?

Rearranging the returns gives:

30% 28% 25% 23% 21% 15%

Here the number of observations is even, so there is no single middle observation. To find the median, take the arithmetic mean of the two middle observations: (25 + 23)/2. Thus, the median of the data set is 24.0%.

5. The mode of a data set is the value of the observation that appears most frequently.

Advantages of using the mode:

a. The mode can be used for all types of data: nominal, ordinal, interval, and ratio.
b. The mode is not affected by extremely large or small values.
c. The mode can also be used to measure open-ended data sets.

Disadvantages of using the mode:

a. For many data sets there may be no value that appears more than once.
b. On the other hand, some data sets have more than one mode. When two dominant values appear in equal proportion in a distribution, the distribution is said to be bimodal.

[Figure: a bimodal distribution.]

6. Selecting an Average for Data in a Frequency Distribution. A distribution is symmetrical and bell-shaped when it has the same shape on either side of its center axis. When the distribution is symmetrical and bell-shaped, the mean, median, and mode are all located at the center of the distribution and are equal.

[Figure: symmetrical distribution, where mean = median = mode.]

A distribution is nonsymmetrical (skewed) when its observations trail off farther on one side of the center than on the other.

When a distribution is positively skewed, the right tail is longer than normal because of large outliers. The mean will exceed the median, and the median will generally exceed the mode. Why? Large outliers falling to the far right side of the distribution can dramatically influence the mean.

For negatively skewed distributions, which tail off to the left, the mean will be smaller than the median, and the median will generally be smaller than the mode, because of the influence of the few extremely small observations on the left.

Skewness measures the asymmetry of a distribution. A normal distribution has a skewness equal to zero because it is symmetric. Non-normal distributions that have an elongated right tail are said to be skewed to the right, or positively skewed. Distributions that have an elongated left tail are said to be skewed to the left, or negatively skewed. A lognormal distribution is an example of a positively skewed distribution.

Conclusion: When a distribution is strongly skewed, the mean is a poor representative of the data set, and the median is generally the better choice.

Kurtosis measures the amount of peakedness. Normal distributions have a kurtosis equal to three. Non-normal distributions with a high peak are said to be leptokurtic (kurtosis > 3), and distributions with a flat peak are said to be platykurtic (kurtosis < 3).
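The measures of location discussed in this chapter can be checked with a few lines of code. The following is a minimal sketch using Python's standard `statistics` module. The manager returns are taken to be 30, 15, 25, 21, and 23 percent (with a sixth return of 28 percent), which is an assumption about the worked median example; the `skewed` and bimodal lists are hypothetical data made up purely for illustration.

```python
from statistics import mean, median, multimode

# Manager returns in percent (assumed values for the median example).
returns = [30, 15, 25, 21, 23]
median_five = median(returns)          # odd count: the single middle value
median_six = median(returns + [28])    # even count: mean of the two middle values

# The deviations from the arithmetic mean always sum to zero.
data = [5, 9, 4, 10]
deviation_sum = sum(x - mean(data) for x in data)

# Hypothetical right-skewed data: one large outlier pulls the mean
# above the median, as described for positively skewed distributions.
skewed = [10, 12, 13, 14, 95]

# Hypothetical bimodal data: multimode() returns every most-frequent value.
modes = multimode([3, 5, 5, 7, 9, 9, 11])

print("median (5 managers):", median_five)
print("median (6 managers):", median_six)
print("sum of deviations:", deviation_sum)
print("skewed mean vs. median:", mean(skewed), median(skewed))
print("modes:", modes)
```

The skewed list shows why the median is preferred when outliers are present: the single value of 95 moves the mean far from the bulk of the data while the median stays with the typical observations.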

DESCRIBING DATA: MEASURES OF DISPERSION

A. Why Study Dispersion?

1. Measures of central tendency, such as the mean or median, pinpoint the center of the data set but do not explain anything about the variability of the individual observations within the data set.

2. Studying dispersion enables the analyst to better compare data sets.

B. Measures of Dispersion for Ungrouped Data

1. The range is the distance between the largest and the smallest value in the data set.

Range = Highest Value - Lowest Value

Example: The five-year annualized total returns for five investment managers are 30%, 12%, 25%, 20%, and 23%. What is the range of the data?

Range = 30 - 12 = 18%

2. The mean deviation is the average of the absolute values of the deviations from the arithmetic mean.

$MD = \sum |X - \bar{X}| / n$

Recall that the sum of all the deviations from the mean is equal to zero. To get around this zeroing-out problem, the mean deviation uses the absolute value of each deviation. For this reason, the mean deviation is also called the mean absolute deviation, or MAD.

Example continued: What is the mean deviation of the investment returns, and how is it interpreted?

$\bar{X} = [30 + 12 + 25 + 20 + 23] / 5 = 22\%$

$MD = [|30 - 22| + |12 - 22| + |25 - 22| + |20 - 22| + |23 - 22|] / 5$

$MD = [8 + 10 + 3 + 2 + 1] / 5 = 4.8\%$

On average, an individual return will deviate 4.8% from the mean return of 22%.
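The range and mean deviation are simple enough to verify directly. The sketch below assumes the five manager returns are 30, 12, 25, 20, and 23 percent (values consistent with the stated range of 18% and mean deviation of 4.8%); the variable names are illustrative only.

```python
# Manager returns in percent (assumed values for the dispersion example).
returns = [30, 12, 25, 20, 23]

# Range: distance between the largest and smallest value.
data_range = max(returns) - min(returns)

# Mean deviation (MAD): average absolute deviation from the arithmetic mean.
mean_return = sum(returns) / len(returns)
mad = sum(abs(x - mean_return) for x in returns) / len(returns)

print("range:", data_range)
print("mean:", mean_return)
print("mean absolute deviation:", round(mad, 2))
```

Taking absolute values is what keeps the positive and negative deviations from cancelling; without `abs`, the numerator would be zero for any data set.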

3. The variance and standard deviation are measures of dispersion based on the deviations from the mean. The variance is defined as the mean of the squared deviations from the mean, while the standard deviation is the positive square root of the variance.

Population variance: The population variance for ungrouped data is given by:

$\sigma^2 = \sum (X - \mu)^2 / N$

Or, in computational form:

$\sigma^2 = \sum X^2 / N - (\sum X / N)^2$

Note that $(\sum X / N)^2$ is just the mean squared.

Example continued: Find the population variance of the five investment managers' returns (30%, 12%, 25%, 20%, and 23%).

$\mu = [30 + 12 + 25 + 20 + 23] / 5 = 22$ percent

$\sigma^2 = [(30 - 22)^2 + (12 - 22)^2 + (25 - 22)^2 + (20 - 22)^2 + (23 - 22)^2] / 5$

$\sigma^2 = 178 / 5 = 35.6$

Using the alternative computational formula:

$\sigma^2 = [(30^2 + 12^2 + 25^2 + 20^2 + 23^2) / 5] - (22)^2$

$\sigma^2 = 2598 / 5 - 484 = 35.6$

Note: The major problem with using the variance is the difficulty of interpreting it. Why? The variance, unlike the mean, is in terms of units squared. How does one interpret squared percents or squared dollars? The solution to this dilemma is to use the standard deviation.

4. The population standard deviation is the positive square root of the population variance and is found by:

$\sigma = \sqrt{\sum (X - \mu)^2 / N}$

Continuing with the example, $\sigma = \sqrt{35.6} = 5.97\%$.
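As a check on the arithmetic, the definitional and computational forms of the population variance can be compared side by side. This is a minimal sketch that assumes the five returns are 30, 12, 25, 20, and 23 percent; the variable names are illustrative.

```python
from math import sqrt

# Manager returns in percent, treated here as the entire population (assumed values).
returns = [30, 12, 25, 20, 23]
n = len(returns)
mu = sum(returns) / n

# Definitional form: mean of the squared deviations from the mean.
var_definition = sum((x - mu) ** 2 for x in returns) / n

# Computational form: mean of the squares minus the square of the mean.
var_computational = sum(x * x for x in returns) / n - mu ** 2

# Standard deviation: positive square root of the variance.
sigma = sqrt(var_definition)

print("variance (definition):   ", round(var_definition, 2))
print("variance (computational):", round(var_computational, 2))
print("standard deviation:      ", round(sigma, 2))
```

Both forms give the same variance; the computational form simply avoids computing each deviation separately, which mattered more when the work was done by hand.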

5. The sample variance differs from the population variance. The population mean, $\sum X / N$, and the sample mean, $\sum X / n$, are computed in the same way. Unfortunately, this is not the case for the variance. The sample variance is calculated using the following formula:

$s^2 = \sum (X - \bar{X})^2 / (n - 1)$

or, in computational form:

$s^2 = [\sum X^2 - (\sum X)^2 / n] / (n - 1)$

The difference between the calculation of the population and sample variances is in the denominator. When a sample is used to represent its population, dividing by $n$ results in underestimating the population variance, especially for small sample sizes. This systematic understatement would make the sample variance a biased estimator of the population variance. Using $(n - 1)$ instead of $n$ in the denominator compensates for this underestimation. Thus, by using $n - 1$, the sample variance ($s^2$) will be an unbiased estimator of the population variance ($\sigma^2$).

6. The sample standard deviation for ungrouped data is given by:

$s = \sqrt{\sum (X - \bar{X})^2 / (n - 1)} = \sqrt{[\sum X^2 - (\sum X)^2 / n] / (n - 1)}$

Example continued: You are now told that the five managers are just a sample of all the firm's investment managers. What are the data's sample variance and sample standard deviation?

$\bar{X} = \sum X / n = (30 + 12 + 25 + 20 + 23) / 5 = 110 / 5 = 22$

$s^2 = \sum (X - \bar{X})^2 / (n - 1)$

$s^2 = [(30 - 22)^2 + (12 - 22)^2 + (25 - 22)^2 + (20 - 22)^2 + (23 - 22)^2] / 4 = 178 / 4 = 44.5$

$s = \sqrt{44.5} = 6.67\%$
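The claim that dividing by $n - 1$ removes the bias can be illustrated on a tiny case by enumerating every possible sample. The sketch below is only an illustration under stated assumptions: the three-value population and the choice of samples of size 2 drawn with replacement are hypothetical, and `variance_with_divisor` is a helper name invented for this example. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

def variance_with_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

# Hypothetical population; its variance uses N in the denominator.
population = [1, 2, 3]
pop_variance = variance_with_divisor(population, len(population))

# Every possible ordered sample of size 2, drawn with replacement.
n = 2
samples = list(product(population, repeat=n))

# Average the two competing estimators over all possible samples.
avg_divide_by_n = sum(variance_with_divisor(s, n) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(variance_with_divisor(s, n - 1) for s in samples) / len(samples)

print("population variance:      ", pop_variance)
print("average using n:          ", avg_divide_by_n)
print("average using n - 1:      ", avg_divide_by_n_minus_1)
```

In this small case the estimator that divides by $n$ averages to half the population variance, while the estimator that divides by $n - 1$ averages to exactly the population variance, which is what "unbiased" means.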

C. Interpretation and Use of the Standard Deviation

1. Chebyshev's Theorem states that for any set of observations (sample or population, regardless of the shape of the distribution), the minimum proportion of observations falling within $k$ standard deviations of the mean is $1 - 1/k^2$. The number of standard deviations ($k$) in the equation has to be greater than 1.

Example: What minimum percent of a distribution will lie within +/- two standard deviations of the mean?

From Chebyshev's Theorem:

$1 - (1/k^2) = 1 - (1/2^2) = .75$ or 75%

Thus, Chebyshev's Theorem states that for any distribution, at least:

75% of observations lie within +/- 2 standard deviations of the mean
88.9% of observations lie within +/- 3 standard deviations of the mean
93.75% of observations lie within +/- 4 standard deviations of the mean
96% of observations lie within +/- 5 standard deviations of the mean
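The Chebyshev bounds listed above follow from a one-line formula. The sketch below tabulates them and then checks the bound against a small data set; `chebyshev_lower_bound` is a helper name chosen for this illustration, and the five returns (30, 12, 25, 20, 23) are assumed values.

```python
from math import sqrt

def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2

# Tabulate the bound for k = 2, 3, 4, 5.
bounds = {k: chebyshev_lower_bound(k) for k in (2, 3, 4, 5)}
for k, bound in bounds.items():
    print(f"k = {k}: at least {bound:.2%}")

# Empirical check on a small data set (assumed values, in percent).
returns = [30, 12, 25, 20, 23]
mu = sum(returns) / len(returns)
sigma = sqrt(sum((x - mu) ** 2 for x in returns) / len(returns))
within_two = sum(abs(x - mu) <= 2 * sigma for x in returns) / len(returns)
print("observed share within 2 standard deviations:", within_two)
```

The theorem gives a floor, not an estimate: the observed share within two standard deviations can be much higher than 75%, but it can never be lower.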

A SURVEY OF PROBABILITY CONCEPTS

A. What is a Probability?

A probability is the chance that something will happen. It measures the likelihood that an event will happen in the future. A probability can only have a value between 0 (no chance at all) and 1 (one hundred percent chance).

Some other concepts and key words:

1. An experiment is an observation of some activity or the act of taking a measurement.
2. An outcome is a particular or possible result of an experiment.
3. An event is the collection of one or more actual outcomes from an experiment.

Example: Suppose you roll a single die. That would be an experiment. There are six possible outcomes from this experiment (you could roll a 1, 2, 3, 4, 5, or 6). The outcome that actually occurs (like rolling a 4) is an event.

B. Approaches to Determining Probabilities

There are two objective approaches to determining probabilities (classical and empirical) and one subjective approach.

1. Classical probability is based on the assumption that the possible outcomes of an experiment are equally likely. This method is presumptive; thus it is called a priori.

$P(A) = \frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}}$

where $P$ is the probability of an event occurring and $A$ is the specific event.

Example: What is the probability of rolling a 4 with the throw of a six-sided die?

Probability of observing a four = (number of favorable outcomes) / (total number of possible outcomes) = 1/6

The term mutually exclusive means that the occurrence of any one event precludes any other event from occurring at the same time. The term collectively exhaustive means that at least one of the possible events must occur when an experiment is conducted.

2. Empirical probabilities are based on the relative frequency of events, that is, the number of times an event happened under similar circumstances in the past. Since this approach is based on actual observations, it is called a posteriori.

$P(A) = \frac{\text{number of times the event occurred in the past}}{\text{total number of observations}}$

Example: Ten of 500 randomly selected cars manufactured at a certain auto factory are found to be lemons. Assuming that lemons occur randomly, what is the probability that the next car manufactured is a lemon?

P(A) = 10 / 500 = .02 or 2%

where P stands for probability and A is the outcome that the next car is a lemon.

3. A subjective probability is the probability that a particular event will happen, based on a subjective evaluation of all the available information: an educated guess. An example of a subjective probability is guessing that there is a 5% chance that the Iowa Hawkeye football team will play in the Rose Bowl next year.

C. The Basic Rules of Probability

Here we will deal with multiple events.

1. The special rule of addition is applicable only to mutually exclusive events. If two or more events are mutually exclusive, then:

P(A or B) = the probability of A or B happening = P(A) + P(B)

Or, in general terms:

P(A or B or C ...) = P(A) + P(B) + P(C) + ...

Example: You go to rent a car and are told that the agency only has cars in three colors (green, blue, and yellow).

Car color    Green (G)   Blue (B)   Yellow (Y)   Total
Total        135         125        40           300

What is the probability that a randomly selected car from this group of 300 cars is green or yellow? Define the following events:

G = the car selected is green.
Y = the car selected is yellow.

From the given information:

P(G) = 135/300 = .450
P(Y) = 40/300 = .133

Hence: P(G or Y) = P(G) + P(Y) = .450 + .133 = .583

The complement rule follows from the fact that an event A and its complement (A not happening) are mutually exclusive and collectively exhaustive: if A happens, then "not A" cannot happen, and one of the two must happen. Therefore the probability of A happening plus the probability of A not happening is 1.

P(A happening) + P(A not happening) = 1

So, P(A not happening) = 1 - P(A happening)

Example continued: What is the probability that a randomly selected car will be blue?

P(B) + P(G or Y) = 1

So, P(B) = 1 - P(G or Y) = 1 - .583 = .417

This is also 125/300 = .417.

2. The general rule of addition states that if two events, A and B, are NOT mutually exclusive, then you must account for the joint probability of the events, that is, the possibility that the two events will occur at the same time. In a traditional Venn diagram, the joint probability is shown by the overlap of the two event circles.

P(A or B) = P(A) + P(B) - P(A and B)

where P(A and B) is the joint probability of A and B.

The joint probability [P(A and B)] is defined as the probability that measures the likelihood that two or more events will happen concurrently.

P(A and B) = P(A) * P(B) for independent events; or
P(A and B) = P(A) * P(B given that A occurs) for dependent events.

3. The special rule of multiplication requires that two events A and B are independent. Two events are independent if the occurrence of one event has no effect on the probability of the other event occurring. For two independent events A and B, the probability that A and B will both occur is given by:

P(A and B) = P(A) * P(B)

Example continued: Suppose you were told that 23.3% of the rental agency's cars have CD players, and that the presence of a CD player is independent of the car's color.

The joint probability of getting a green car and a CD player is:

P(G and CD) = P(G) * P(CD) = (.45)(.233) = .105

The probability of getting a green car OR a car with a CD player is:

P(G or CD) = P(G) + P(CD) - P(G and CD) = .45 + .233 - .105 = .578

A conditional probability is the probability of a particular event occurring given that another event has occurred.

4. The general rule of multiplication is used to find the joint probability when one event (B) is conditional on the occurrence of another event (A).

P(A and B) = P(A) * P(B | A)

where the conditional probability P(B | A) stands for the probability that B will occur given that A has already occurred. The vertical line means "given that."

Continuing with the earlier example, assume that you discover that the CD players are not evenly distributed among the rental agency's 300 cars. Of the green cars, 33.3% have CD players; 12% of the blue cars have CD players; and 25% of the yellow cars have CD players.

Car color            Green (G)       Blue (B)       Yellow (Y)     Total
Has a CD player      45              15             10             70
Has no CD player     90              110            30             230
Total                135             125            40             300
% with CD players    45/135 = .333   15/125 = .12   10/40 = .25    70/300 = .233

Since you now know that the car color and the presence of a CD player are not independent, you must use conditional probabilities to find the joint probability of getting a green car with a CD player. The presence of a CD player is contingent on the color of the car chosen. P(CD | G) states: given that the car is green, the probability of it having a CD player (in this case .333).

P(G and CD) = P(G) * P(CD | G)

P(G) = 135/300 = .45 (there are 135 green cars among the 300 cars)

P(CD | G) = 45/135 = .333 (45 of the 135 green cars have CD players)

P(G and CD) = P(G) * P(CD | G) = (135/300)(45/135) = .15

The probability of getting a green car OR a car of any color with a CD player is:

P(G or CD) = P(G) + P(CD) - P(G and CD) = .45 + .233 - .15 = .533

To summarize, the general rule of addition states that the probability of event A or B happening is:

P(A or B) = P(A) + P(B) - P(A and B)

If A and B are independent events, then P(A and B) = P(A) * P(B). This is called the special rule of multiplication.

If A and B are dependent events, then P(A and B) = P(A) * P(B | A). This is called the general rule of multiplication.
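The addition, complement, and multiplication rules summarized above can be traced with exact fractions. The sketch below assumes the car counts implied by the fractions quoted in the example (135 green, 125 blue, and 40 yellow cars, of which 45, 15, and 10 have CD players); the dictionary layout and variable names are illustrative choices, not part of the original example.

```python
from fractions import Fraction

# Assumed counts: cars by color, and cars with a CD player by color.
cars = {"green": 135, "blue": 125, "yellow": 40}
with_cd = {"green": 45, "blue": 15, "yellow": 10}
total = sum(cars.values())

p = {color: Fraction(count, total) for color, count in cars.items()}
p_cd = Fraction(sum(with_cd.values()), total)

# Special rule of addition (mutually exclusive events).
p_green_or_yellow = p["green"] + p["yellow"]

# Complement rule.
p_blue = 1 - p_green_or_yellow

# General rule of multiplication: P(G and CD) = P(G) * P(CD | G).
p_cd_given_green = Fraction(with_cd["green"], cars["green"])
p_green_and_cd = p["green"] * p_cd_given_green

# General rule of addition: subtract the joint probability once.
p_green_or_cd = p["green"] + p_cd - p_green_and_cd

# Independence would require P(G and CD) == P(G) * P(CD); here it does not hold.
independent = p_green_and_cd == p["green"] * p_cd

print("P(G or Y)  =", float(p_green_or_yellow))
print("P(B)       =", float(p_blue))
print("P(G and CD) =", float(p_green_and_cd))
print("P(G or CD) =", float(p_green_or_cd))
print("color and CD player independent?", independent)
```

Using `Fraction` keeps the results exact, so the complement check `P(B) == 125/300` holds without rounding concerns.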

DISCRETE PROBABILITY DISTRIBUTIONS

A. Probability Distribution

1. What is a Probability Distribution? A probability distribution is a listing of all the outcomes of an experiment and the probability associated with each of these outcomes.

Example: Suppose you are interested in determining the number of heads you would get on two tosses of a coin. This is the experiment. The possible outcomes are: no heads, one head, or two heads. What is the probability distribution for the number of heads?

Two Coin Tosses

Possible result   First toss   Second toss   Number of heads
1                 T            T             0
2                 T            H             1
3                 H            T             1
4                 H            H             2

Probability Distribution for the Outcomes of Zero, One, and Two Heads on Two Tosses of a Coin

Number of heads (X)   Probability of outcome P(X)
0                     1/4 = .25
1                     2/4 = .50
2                     1/4 = .25
Total                 4/4 = 1.00

Note: The following are three important characteristics of a probability distribution:

a. The probability of an outcome is always between 0 (no chance) and 1 (100% chance).
b. The sum of the probabilities of all (mutually exclusive) outcomes is 1.
c. You can add probabilities together. The probability of getting at least one head (1 or 2 heads) is .50 + .25 = .75. This is called a cumulative probability.
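The coin-toss distribution can be built by listing every equally likely outcome and counting heads. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: TT, TH, HT, HH.
outcomes = list(product("TH", repeat=2))

# Count how many outcomes produce each number of heads.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability distribution: favorable outcomes / total outcomes.
distribution = {
    heads: Fraction(count, len(outcomes))
    for heads, count in sorted(heads_counts.items())
}

# Cumulative probability of at least one head.
p_at_least_one = sum(prob for heads, prob in distribution.items() if heads >= 1)

for heads, prob in distribution.items():
    print(f"P(X = {heads}) = {prob} = {float(prob):.2f}")
print("total probability:", sum(distribution.values()))
print("P(at least one head):", float(p_at_least_one))
```

Changing `repeat=2` to a larger number of tosses produces the corresponding distribution without any other change to the code.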

2. Discrete and Continuous Random Variables: A random variable is a quantity resulting from a random experiment that, by chance, can assume different values.

A discrete random variable is a variable that can assume only certain clearly separated values resulting from a count of some item of interest. When a variable can assume one of an infinitely large number of values within a given range, the variable is called a continuous random variable.

When you organize the values of a discrete random variable into a probability distribution, the distribution is called a discrete probability distribution. The binomial probability distribution is an example of a discrete probability distribution.

If you organize the values of a continuous random variable into a probability distribution, the distribution is called a continuous probability distribution. The normal probability distribution is an example of a continuous probability distribution.

3. The Mean, Variance, and Standard Deviation of a Probability Distribution

Mean: The mean of a discrete probability distribution is given by:

$\mu = \sum [X P(X)]$

Example: Kelly Smith sells TVs for Big Mart. She has established the following probability distribution for the number of TVs she expects to sell on a particular Sunday.

Number of TVs sold (X)   Probability P(X)
0                        .10
1                        .20
2                        .30
3                        .30
4                        .10
Total                    1.00

a. What type of distribution is this?

This is an example of a discrete probability distribution. Note that Kelly expects to sell only within a certain range of TVs. She does not expect to sell 5 or more TVs. Further, she cannot sell half a TV. She will only sell 0, 1, 2, 3, or 4 TVs. Also, the outcomes are mutually exclusive: she cannot sell a total of both 3 and 4 TVs on the same Sunday.

The probability of Kelly selling no more than 2 TV sets is (.10 + .20 + .30) = 60%.

The probability of Kelly selling at least 2 sets is (.30 + .30 + .10) = 70%.

b. On a typical Sunday, how many TVs should Kelly expect to sell?

The mean number of TVs sold is computed by weighting the number of TVs sold by the probability of selling that number and totaling the products, using the formula:

$\mu = \sum [X P(X)] = 0(.10) + 1(.20) + 2(.30) + 3(.30) + 4(.10) = 2.1$

Number of TVs sold (X)   Probability P(X)   X P(X)
0                        .10                .00
1                        .20                .20
2                        .30                .60
3                        .30                .90
4                        .10                .40
                         $\sum P(X) = 1.00$   $\sum X P(X) = 2.10$

The mean indicates that over a large number of Sundays, Kelly expects to sell, on average, 2.1 TVs every Sunday.

c. What is the variance of the distribution? What is the standard deviation?

The variance of a discrete probability distribution is $\sigma^2 = \sum (X - \mu)^2 P(X)$.

Number of TVs sold (X)   Probability P(X)   $(X - \mu)$   $(X - \mu)^2$   $(X - \mu)^2 P(X)$
0                        .10                -2.1          4.41            .441
1                        .20                -1.1          1.21            .242
2                        .30                -0.1          .01             .003
3                        .30                0.9           .81             .243
4                        .10                1.9           3.61            .361

Variance = $\sum (X - \mu)^2 P(X) = \sigma^2 = 1.290$

The standard deviation is the square root of the variance: $\sigma = \sqrt{1.29} = 1.136$ TVs.
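The mean and variance of a discrete probability distribution are probability-weighted sums, which makes them easy to compute in code. The sketch below assumes Kelly's distribution assigns probabilities .10, .20, .30, .30, and .10 to selling 0 through 4 TVs; the dictionary representation and variable names are illustrative choices.

```python
from math import sqrt

# Assumed distribution: number of TVs sold -> probability.
tv_distribution = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.30, 4: 0.10}

# A valid distribution must have probabilities that sum to 1.
total_probability = sum(tv_distribution.values())

# Mean: each value weighted by its probability.
mu = sum(x * p for x, p in tv_distribution.items())

# Variance: squared deviations from the mean, weighted by probability.
variance = sum((x - mu) ** 2 * p for x, p in tv_distribution.items())
sigma = sqrt(variance)

# Cumulative probabilities are sums over the relevant outcomes.
p_no_more_than_2 = sum(p for x, p in tv_distribution.items() if x <= 2)
p_at_least_2 = sum(p for x, p in tv_distribution.items() if x >= 2)

print("mean:", round(mu, 2))
print("variance:", round(variance, 3))
print("standard deviation:", round(sigma, 3))
print("P(X <= 2):", round(p_no_more_than_2, 2))
print("P(X >= 2):", round(p_at_least_2, 2))
```

Note that the mean of 2.1 is not itself a possible outcome; it is a long-run average over many Sundays.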

NORMAL PROBABILITY DISTRIBUTION

A. Normal Probability Distributions

Normal probability distributions and normal curves have the following characteristics:

1. The normal curve is bell-shaped with a single peak at the exact center of the distribution. The arithmetic mean, median, and mode are equal. Half of the area under a normal curve lies above the mean and half below the mean. The normal curve can be completely defined by its mean and standard deviation.
2. The normal probability distribution is symmetrical about its mean.
3. The normal curve falls off smoothly in either direction from the central value.
4. The normal curve is asymptotic to the X-axis in both directions. This means that the tails of the distribution extend indefinitely in both directions, getting closer and closer to the horizontal axis without ever touching it.

[Figure: the normal curve. The curve is symmetrical and its two halves are identical; the mean, median, and mode are equal; theoretically, the curve extends to $-\infty$ on the left and to $+\infty$ on the right.]

Normal probability distributions come in many sizes and shapes. For example:

1. Normal curves can have the same mean but different standard deviations.

[Figure: three normal curves with $\sigma_A > \sigma_B > \sigma_C$ and Mean A = Mean B = Mean C.]

2. Normal curves can have the same standard deviation but different means.

[Figure: two normal curves with $\sigma_A = \sigma_B$ and Mean A < Mean B.]

The number of different normal curves is unlimited.

B. The Standard Normal Probability Distribution

Even though normal curves have different sizes, they all have identical shape characteristics. So when discussing or comparing normal curves, it is customary to compare them to a standardized normal curve with a mean of 0 (zero) and a standard deviation of 1. This standardized normal curve represents the standard normal distribution and is used for all problems relating to normal distributions.

To standardize an observation from any given normal curve, you must calculate the observation's Z value. The Z value tells you how far the given observation is from the population mean, in units of standard deviation.

$Z = \frac{\text{observation} - \text{population mean}}{\text{standard deviation}} = \frac{X - \mu}{\sigma}$

Example: The EPS figures for a large group of firms are normally distributed with a mean of $5 and a standard deviation of $1. What is the Z value given an EPS (X) of $6? How about $4?

Z for an X of $6 = $(X - \mu) / \sigma$ = ($6 - $5)/$1 = +1

Z for an X of $4 = $(X - \mu) / \sigma$ = ($4 - $5)/$1 = -1

The Z of +1.00 indicates that an EPS of $6 is one standard deviation above the mean, and a Z of -1.00 shows that the EPS of $4 is one standard deviation below the mean. Note that both EPSs ($6 and $4) are the same distance ($1) from the mean.

C. Areas under the Normal Curve

The area under a standardized curve represents probability. The area under the curve between two values gives the probability that an observation falls between those two values. Tables have been constructed for the standardized normal curve that tabulate these probabilities.

An example of the use of the normal tables would be to answer the question: what is the probability that an observation will fall between the mean and 1.25 standard deviations above the mean? This situation is shown in regular scale and in standardized scale.

[Figure: two normal curves. Regular normal curve: mean = 60 and $\sigma$ = 8, with X = 70 lying (70 - 60)/8 = 1.25 standard deviations above the mean. Standardized normal curve: mean = 0 and $\sigma$ = 1, with Z = (1.25 - 0)/1 = 1.25.]

Z-scores are used to look up areas under the normal curve, which are tabulated in standard normal tables. Verify from a standard normal table that a Z of 1.25 means that there is a 39.44% chance of an observation falling between 60 and 70.

Example: Returning to the distribution of EPS figures ($\mu$ = $5, $\sigma$ = $1), what is the area under the normal curve between $3.40 and $7?

This problem takes two steps. First, calculate the area between $3.40 and the mean of $5:

Z = ($3.40 - $5)/$1 = -1.60

Now calculate the area between the mean of $5 and $7:

Z = ($7 - $5)/$1 = 2.00

The area under the curve for a Z of 1.60 is .4452. The area under the curve for a Z of 2.00 is .4772. Adding the two areas: .4452 + .4772 = .9224. Thus, the probability of observing an EPS between $3.40 and $7 is .9224. In other words, 92.24 percent of the EPSs are between $3.40 and $7.

Example: What is the probability that an EPS figure will fall between the mean and one standard deviation in either direction from the mean? That is, what is the probability that the EPS figure will be between $4 and $6?

A calculated Z of 1.00 gives .3413 in a standard normal table. So the chance of an EPS figure being between the mean and one standard deviation from the mean is 34.13%. The chance that the EPS is within plus or minus one standard deviation of the mean is 68.26%.

Knowing the probability that an observation will fall between the mean and one, two, and three standard deviations from the mean is important. You should also know what proportion of observations fall within plus or minus one, two, or three standard deviations of the mean. These figures are called confidence intervals and are used in hypothesis testing. You must remember the following probabilities and be able to verify them in a standard normal table.

34% of the area falls between 0 and +1 standard deviation from the mean. So, 68% of the observations fall within ± one standard deviation of the mean.

45% of the area falls between 0 and 1.65 standard deviations from the mean. So, 90% of the observations fall within ± 1.65 standard deviations of the mean.

47.5% of the area falls between 0 and 1.96 standard deviations from the mean. So, 95% of the observations fall within ± 1.96 standard deviations of the mean.

49.5% of the area falls between 0 and 2.58 standard deviations from the mean. So, 99% of the observations fall within ± 2.58 standard deviations of the mean.

Example: Returning again to the EPS figures ($\mu$ = $5, $\sigma$ = $1), what percent of the EPSs are $7.45 or more?

We first find the area between the mean of $5 and an X of $7.45:

$Z = (X - \mu) / \sigma$ = ($7.45 - $5)/$1 = 2.45

The area associated with a Z of 2.45 is .4929. This is the area between $5 and $7.45. Logically, the area for $7.45 and beyond is found by subtracting .4929 from .5000. This area is .0071, indicating that only 0.71 percent of the EPS figures are $7.45 or more.

Example: What percent of the EPS figures is less than $3.50?

The problem is again separated into two parts.

Z = ($3.50 - $5)/$1 = -1.50

The area associated with a Z value of 1.50 is .4332, so the probability of an EPS between $3.50 and $5 is .4332. Thus the probability of an EPS less than $3.50 is .5000 - .4332 = .0668.

Example: What percent of the EPS figures is greater than $3.50?

Again this is a two-part problem. You add the .4332 for the EPS figures between $3.50 and $5 to .5000, which represents the probability of EPS values above $5. The percent of EPSs above $3.50 is .9332, or 93.32%.
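The table lookups in the EPS examples can be reproduced with the cumulative distribution function of the normal distribution. The sketch below uses `statistics.NormalDist` from Python's standard library (available from Python 3.8) in place of a printed table; the helper name `area_between` is invented for this illustration, and the inputs are the EPS mean of $5 and standard deviation of $1.

```python
from statistics import NormalDist

# EPS figures: normally distributed with mean $5 and standard deviation $1.
eps = NormalDist(mu=5, sigma=1)

def area_between(dist, low, high):
    """Probability that an observation falls between low and high."""
    return dist.cdf(high) - dist.cdf(low)

# Z value: distance from the mean in units of standard deviation.
z_of_6 = (6 - eps.mean) / eps.stdev

# Areas worked out in the examples.
p_340_to_7 = area_between(eps, 3.40, 7.00)
p_within_one_sd = area_between(eps, 4.00, 6.00)
p_745_or_more = 1 - eps.cdf(7.45)
p_below_350 = eps.cdf(3.50)
p_above_350 = 1 - eps.cdf(3.50)

print("Z for $6:", z_of_6)
print("P($3.40 < EPS < $7):", round(p_340_to_7, 4))
print("P($4 < EPS < $6):   ", round(p_within_one_sd, 4))
print("P(EPS >= $7.45):    ", round(p_745_or_more, 4))
print("P(EPS < $3.50):     ", round(p_below_350, 4))
print("P(EPS > $3.50):     ", round(p_above_350, 4))
```

Results from the function may differ from table-based answers in the last digit, because printed tables round each area to four decimal places before the areas are added or subtracted.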

SAMPLING METHODS AND SAMPLING DISTRIBUTIONS

It is not always feasible to study the entire population. Hence a small subgroup of the population, called a sample, is drawn from the population. Based upon this sample, conclusions can be drawn about the entire population.

A. What is a Probability Sample?

A probability sample is one selected in such a way that each item or person in the population being studied has the same (non-zero) likelihood of being included in the sample. If all the members of the population do not have an equal chance of being included in the sample, it is a nonprobability sample and may be biased. A sample is biased if it may not be truly representative of the population.

Simple random sampling is where the observations are drawn randomly from the population. In a random sample, each observation must have the same chance of being drawn from the population. This is the standard sampling design. For example, assume you want to draw a sample of five items from a population. Assign each item a number, place the numbers in a hat, and shake it up. Next, draw one number randomly from the hat. Repeat this process (experiment) four more times. The five drawn numbers (items) comprise a simple random sample from the population.

An easier way to select a simple random sample is to use a random number table. You just pick the numbers from the table rather than shaking the hat and drawing.

B. Sampling Error

Sampling error is the difference between a sample statistic (the mean, variance, or standard deviation of the sample) and its corresponding population parameter (the mean, variance, or standard deviation of the population).

Sampling error of the mean = sample mean - population mean = $\bar{X} - \mu$

C. Sampling Distribution

Learning the relationship between a population's parameters and the corresponding sample statistics requires some testing. To do this, a sampling distribution of the sample means is drawn.
The sampling distribution of the sample means is a probability distribution made up of all possible sample means (of the same sample size) selected from a population, along with their associated probabilities. In other words, you create a distribution where the reported observations are sample means (the average price of a sample of 5 stocks) rather than the individual observations (each stock's actual price). You should know that the mean of the sample means is always equal to the mean of the population:

$\mu_{\bar{x}} = \mu$

This is always true because the sampling distribution contains the means of all possible samples of a given size selected from the population.

D. Central Limit Theorem

The central limit theorem tells us that for a population with a mean $\mu$ and a variance $\sigma^2$, the sampling distribution of the sample means of all possible samples of size $n$ will be approximately normally distributed with a mean equal to $\mu$ and a variance equal to $\sigma^2 / n$. (Note that this assumes a large sample size, $n \geq 30$.)

So you should know:

1. If the sample size $n$ is sufficiently large, the sampling distribution of the sample means will be approximately normal.
2. The mean of the population, $\mu$, and the mean of all possible sample means, $\mu_{\bar{x}}$, are equal.
3. The variance of the distribution of sample means is $\sigma^2 / n$.

E. Point Estimates and Interval Estimates

Point estimates are single (sample) values used to estimate population parameters. The sample mean, $\bar{X}$, is the best estimator of the population mean $\mu$ and is calculated using the formula:

$\bar{X} = (\sum X) / n$

Example: Assume that you are studying the relationship between stock price and investment returns for small firms. To estimate the mean stock price for small-sized firms, you draw a sample of 40 randomly selected firms and present them in the table below. Given the following sample of 40 stock prices:

[Table: sample of 40 stock prices.]

24 What is the best estimate of the population's mean stock price? The sum of the 40 observations is $600. The mean stock price of $15 is found by:

\[ \bar{X} = \frac{\sum X}{n} = \frac{\$600}{40} = \$15 \]

Once you have the point estimate, you need to calculate the interval estimate. The interval estimate states the range within which a population parameter will probably fall. In interval estimation, this interval is constructed around the point estimate, \(\bar{X}\). The presumption is that this interval will likely contain the population parameter, \(\mu\). The interval within which a population parameter is expected to fall is called the confidence interval.

You would expect with a confidence level of 95% that the population mean (\(\mu\)) will fall within ±1.96 standard errors of the point estimate, the sample mean (\(\bar{X}\)). To construct this interval, you need to calculate the standard error of the sample means.

F. Standard Error of the Sample Means

The Standard Error of the Sample Means is the standard deviation of the distribution of the sample means. When the standard deviation of the population is known, the standard error of the sample means is calculated by:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

where:
\(\sigma_{\bar{x}}\) = the standard error of the sample means
\(\sigma\) = the standard deviation of the population
n = the size of the sample

If you don't know the population's standard deviation, you can estimate the standard error of the sample means by dividing the standard deviation of your sample by \(\sqrt{n}\). This estimate assumes that you are working with a large sample size (\(n \ge 30\)).

\[ s_{\bar{x}} = \frac{s}{\sqrt{n}} \]

where:
\(s_{\bar{x}}\) = the standard error of the sample means based on sample statistics
s = the standard deviation of the sample
n = the size of the sample (assuming a large sample size)
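The two formulas above differ only in which standard deviation goes in the numerator, so one small function covers both. The following is a minimal Python sketch, not part of the original notes; the function name `standard_error` and the illustrative inputs (a standard deviation of 2.90 and sample sizes of 30, 75, and 200) are assumptions chosen for demonstration.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of the sample means: std_dev / sqrt(n).

    Pass the population standard deviation (sigma) when it is known,
    or the sample standard deviation (s) as an estimate when n >= 30.
    """
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return std_dev / math.sqrt(n)


# Illustrative values (assumed): the standard error shrinks as n grows.
for n in (30, 75, 200):
    print(f"n = {n:3d}: standard error = {standard_error(2.90, n):.2f}")
```

Running the loop shows the standard error falling as the sample size rises, which is the behavior the formula predicts because n sits in the denominator.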

25 Example: The mean hourly wage for Iowa farm workers is $13.50 with a standard deviation of $2.90. Let \(\bar{x}\) be the mean wage per hour for a random sample of Iowa farm workers. Find the mean and standard error of the sample means, \(\bar{x}\), for a sample size of: a. 30; b. 75; and c. 200.

a. The mean \(\mu_{\bar{x}}\) of the sampling distribution of \(\bar{x}\) is:

\[ \mu_{\bar{x}} = \mu = \$13.50 \]

Since \(\sigma\) is known, the standard error of the sample means is:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.90}{\sqrt{30}} = \$.53 \]

In conclusion, if you were to take all possible samples of size 30 from the Iowa farm worker population and prepare a sampling distribution of the sample means, you would get a mean of $13.50 and a standard error of $.53.

b. When the sample size is 75, the mean and standard error of the sample means are:

\[ \mu_{\bar{x}} = \mu = \$13.50 \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.90}{\sqrt{75}} = \$.33 \]

c. When the sample size is 200, the mean and standard error of the sample means are:

\[ \mu_{\bar{x}} = \mu = \$13.50 \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.90}{\sqrt{200}} = \$.21 \]

Conclusion: From the calculations above you should observe that the mean of the sampling distribution of \(\bar{x}\) is always equal to the mean of the population, whatever the size of the sample. However, the value of the standard error (standard deviation) of the sample means decreases from $.53 to $.33 and then to $.21 as the sample size increases from 30 to 75 and then to 200. Why? As the sample size gets larger (and thus the size of your sample approaches the size of the entire population), the spread of the sample means about the population mean gets smaller and smaller, causing the standard error of the sample means to get smaller.

26 [Figure: Distribution of Sample Means, n = 30 and n = 200; \(\mu_{sample} = \mu_{population}\)]

G. Constructing the 95 Percent and the 99 Percent Confidence Intervals

For large samples (where \(n \ge 30\)), the confidence intervals for the population mean are:

95 percent confidence interval: \(\bar{X} \pm 1.96\,\frac{s}{\sqrt{n}}\)
99 percent confidence interval: \(\bar{X} \pm 2.58\,\frac{s}{\sqrt{n}}\)

In general, a confidence interval for the mean is computed by:

\[ \bar{X} \pm Z\,\frac{s}{\sqrt{n}} \]

where Z reflects the statistic associated with the level of confidence, that is, 1.96 or 2.58.

Example: You select at random 81 earnings reports in the Widget Industry. You are interested in the EPS figure for the entire widget population. The sample mean EPS is computed to be $3.50 and the sample standard deviation is $1.80.
1. What is the estimated mean EPS in the widget industry (the population)?
2. What is the 95 percent confidence interval?
3. What are the 95 percent confidence limits?
4. Interpret your findings.

27 Solution:
1. The point estimate of the population mean is $3.50.
2. The confidence interval is between $3.11 and $3.89, found by:

\[ \bar{X} \pm 1.96\,\frac{s}{\sqrt{n}} = \$3.50 \pm 1.96\,\frac{\$1.80}{\sqrt{81}} = \$3.50 \pm \$0.392 = \$3.108 \text{ and } \$3.892 \]

The endpoints are frequently rounded and, in this case, would be recorded as $3.11 and $3.89.
3. The endpoints of the confidence interval are called the confidence limits. In this example $3.11 and $3.89 are the confidence limits.
4. Interpretation: If you had time to select 100 samples of size 81 from the population and compute the sample means and confidence intervals, the population mean EPS would be found within the confidence intervals about 95 out of the 100 times. About 5 out of the 100 confidence intervals would not contain the population mean EPS, \(\mu\). Any single interval either contains the population mean or it does not.
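The widget EPS solution can be checked with a few lines of Python. This is a minimal sketch rather than part of the original notes: the function name `confidence_interval` is an assumption, and the inputs (mean $3.50, standard deviation $1.80, n = 81) are taken from the example above.

```python
import math


def confidence_interval(sample_mean: float, sample_sd: float, n: int,
                        z: float = 1.96) -> tuple[float, float]:
    """Large-sample confidence interval: mean +/- z * s / sqrt(n)."""
    margin = z * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# 95 percent interval for the widget EPS example (z = 1.96).
lower, upper = confidence_interval(3.50, 1.80, 81)
print(f"95% confidence limits: ${lower:.2f} and ${upper:.2f}")

# 99 percent interval for the same sample (z = 2.58) is wider.
lower99, upper99 = confidence_interval(3.50, 1.80, 81, z=2.58)
print(f"99% confidence limits: ${lower99:.2f} and ${upper99:.2f}")
```

Passing z as a parameter keeps the same function usable for both the 95 percent and the 99 percent interval.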

28 Binomial Probability Distribution

1. The binomial probability distribution is a discrete probability distribution. Characteristics of the binomial distribution are:
a. There are only two possible outcomes of each trial. They are mutually exclusive. For example, in a coin toss the outcome is either heads or tails. Mutually exclusive means you cannot get heads and tails at the same time.
b. The random variable is the result of counts. It is the number of successes out of the total number of trials you are looking for. For example, flip a coin 10 times and count the number of times heads appears.
c. The probability of success remains constant from one trial to another. For example, no matter how many times you flip a coin, the probability of getting a head is still 50% on each flip of the coin.
d. The trials are independent of each other. This means that one trial does not affect the outcome of any other trial.

The binomial probability distribution can be described using the formula:

\[ P(x) = \frac{n!}{x!\,(n-x)!}\; p^{x} (1-p)^{n-x} \]

where:
n is the number of trials.
x is the number of observed successes.
p is the probability of success on each trial.
(1-p) is the probability of a failure.

This equation uses n! (n factorial). What you need to know about factorials is: n! stands for (n)(n-1)(n-2) and so on down to 1.
3! = (3)(2)(1) = 6
5! = (5)(4)(3)(2)(1) = 120
You also need to know that 0! = 1.

Example: Five percent of all VCRs manufactured by a large electronics firm are defective. A quality control inspector randomly selects three VCRs from the production line. What is the probability that exactly one of these VCRs is defective?

29 Call the selection of a defective VCR a success and the selection of a good VCR a failure. The reason for calling a defective VCR a success is that the question asks for the probability of selecting exactly one defective VCR. The definitions of the terms are:

n = total number of trials = 3 VCRs
x = number of successes = number of defective VCRs = 1
n-x = number of failures = number of good VCRs = 3-1 = 2
p = P(success) = .05, so (1-p) = P(failure) = .95

\[ P(1) = \frac{n!}{x!\,(n-x)!}\; p^{x}(1-p)^{n-x} = \frac{(3)(2)(1)}{(1)\,(2)(1)}\,(.05)^{1}(.95)^{2} = (3)(.05)(.9025) = .1354 \]

Using Binomial Probability Tables: For problems involving a small n (say when n = 3 or 4), the probabilities of successes can be quickly generated mathematically using the earlier formula. However, for large n the math can be tedious. Tables giving the probability for the binomial distribution for different n-sizes can be used in those cases. The n = 6 table is provided below.

[Table: Binomial Probabilities for n = 6. Rows: x = the number of observed successes. Columns: p = the probability of success.]

Example: Suppose you toss a coin six times. What is the probability of zero heads? Exactly one heads? Exactly two heads? The conditions for a binomial distribution are met:
There is a constant probability of success (call heads a success), p = 50% = 0.5.
There is a fixed number of trials (n = 6).
The trials are independent.
There are only two possible outcomes (heads or tails).
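The binomial formula is short enough to evaluate directly, which can stand in for a printed table. The following minimal Python sketch is not from the original notes; the function name `binomial_pmf` is an assumption, and it relies on `math.comb` (Python 3.8 or later) for the n!/(x!(n-x)!) term.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x) = [n! / (x! (n-x)!)] * p**x * (1-p)**(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# VCR example: exactly 1 defective out of 3, with p = .05.
print(f"P(1 defective of 3) = {binomial_pmf(1, 3, 0.05):.4f}")

# Coin example: every outcome for n = 6 tosses with p = 0.5.
for x in range(7):
    print(f"P({x}) = {binomial_pmf(x, 6, 0.5):.4f}")
```

The second loop prints the full set of probabilities for six tosses of a fair coin, so the values for zero, one, and two heads can be read off directly.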

30 Refer to the table under column 0.5. The probabilities for the different x's are:

P(0) = 0.016
P(1) = 0.094
P(2) = 0.234

The mean and variance of a binomial distribution are given by:

\[ \mu = np \qquad \sigma^{2} = np(1-p) \]

Example continued: The mean \(\mu = np = (6)(0.5) = 3\), and the variance \(\sigma^{2} = np(1-p) = (6)(.5)(.5) = 1.5\). The standard deviation \(\sigma = \sqrt{1.5} = 1.22\). If you assume that the binomial distribution is symmetrical, then the empirical rule shows that about 95 percent of all the observations will lie within ±2 standard deviations of the mean.

Cumulative Probability Distributions

What if you want to find the probability of multiple outcomes in the coin flipping example?

Example: You want to find the probability of observing 5 or fewer heads in 6 tosses. In this case, you are interested in finding the cumulative probability. The probability of getting 5 or fewer heads given 6 coin tosses is written \(P(x \le 5 \mid n = 6)\).

\[ P(x \le 5) = P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3) + P(x = 4) + P(x = 5) \]
\[ P(x \le 5) = 0.016 + 0.094 + 0.234 + 0.313 + 0.234 + 0.094 = 0.985 \]

Or using the complement rule:

\[ P(x \le 5) = 1 - P(x > 5) = 1 - P(x = 6) = 1 - 0.016 = 0.984 \]

(Note the slight rounding error.)

So what is a cumulative probability distribution? It is the sum of the individual event probabilities. In the binomial formula, the calculated probability is for a specific outcome, like getting 5 successes. If you want to know the probability of getting a range of successes, you must cumulate (add up) the individual probabilities. Examples:

31 What is the probability of getting exactly 5 heads? This is the basic binomial table figure.
What is the probability of getting 4 or fewer? Sum the probabilities of getting 0, 1, 2, 3 or 4.
What is the probability of getting 3 or more? Sum the probabilities of getting 3, 4, 5 or 6.
What is the probability of getting more than 3? Sum the probabilities of getting 4, 5 or 6.
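The four questions above can be answered by summing individual binomial probabilities. The sketch below is a minimal Python illustration under stated assumptions: the helper names `binomial_pmf` and `binomial_cdf` are hypothetical, and the setting is the six-toss fair coin example (n = 6, p = 0.5). Because it works with unrounded probabilities, it avoids the rounding discrepancy that appears when summing table values.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Cumulative probability of k or fewer successes in n trials."""
    return sum(binomial_pmf(x, n, p) for x in range(k + 1))


n, p = 6, 0.5
p_exactly_5 = binomial_pmf(5, n, p)        # exactly 5 heads
p_at_most_4 = binomial_cdf(4, n, p)        # 4 or fewer: x = 0..4
p_at_least_3 = 1 - binomial_cdf(2, n, p)   # 3 or more: x = 3..6
p_more_than_3 = 1 - binomial_cdf(3, n, p)  # more than 3: x = 4..6
p_at_most_5 = 1 - binomial_pmf(6, n, p)    # complement rule for 5 or fewer

for label, value in [("exactly 5", p_exactly_5), ("4 or fewer", p_at_most_4),
                     ("3 or more", p_at_least_3), ("more than 3", p_more_than_3),
                     ("5 or fewer", p_at_most_5)]:
    print(f"P({label}) = {value:.4f}")
```

Using the complement (1 minus the cumulative probability of the excluded outcomes) is often shorter than summing the upper tail term by term.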

32 TESTS OF HYPOTHESIS

(A) Introduction

The purpose of this chapter is to discuss another aspect of statistical inference, hypothesis testing.

(B) What is a Hypothesis?

A hypothesis is a statement about the value of a population parameter developed for the purpose of testing. Some examples of hypotheses are:
The mean monthly income for financial analysts is $5,65.
The mean average return for an index fund is 14%.
Ninety percent of all federal income tax forms are filled out correctly.

(C) What is Hypothesis Testing?

Hypothesis Testing is a procedure based on evidence from samples and probability theory used to determine whether a hypothesis is a reasonable statement and should not be rejected, or is an unreasonable statement and should be rejected.

(D) Testing a Hypothesis

The five-step procedure for testing a hypothesis:

Step 1: Write the Null Hypothesis and the Alternate Hypothesis

The null hypothesis (H0) is a statement about the value of a population parameter. The alternative (or research) hypothesis (H1) is the statement that will be accepted if the sample data provide evidence repudiating the null hypothesis. You should remember that it is usually the alternative hypothesis that you are really trying to support. Why? Since you can never really prove anything with statistics, when you discredit the null hypothesis you are implying that the alternative is valid.

Example: A person is being tried in court. Based on the available evidence, the jury must make one of two decisions: guilty or not guilty. At the beginning of the trial the person is assumed to be not guilty. In statistical terms, the assumption that the person is not guilty is called the null hypothesis; the alternate hypothesis would be that the person is guilty.

33 Step 2: Select the Level of Significance

The level of significance is defined as the probability of rejecting the null hypothesis when it is actually true. The significance level is called the level of risk or α and is usually set at 0.05 (5%) or 0.01 (1%).

A Type I Error is defined as rejecting the null hypothesis, H0, when it is actually true. The probability of committing a Type I Error is the risk level or alpha risk.

A Type II Error is defined as accepting (failing to reject) the null hypothesis when it is actually false. The chance of making a Type II error is called beta risk.

The following table summarizes the possible jury verdicts discussed above:

THE JURY VERDICT

The hypothesis revealed                    Accepts H0: Find not guilty                Rejects H0: Find guilty
H0 is true: The defendant is not guilty    Correct decision                           Convict an innocent person (Type I error)
H0 is false: The defendant is guilty       Release a guilty person (Type II error)    Correct decision

A test with one rejection region is called a one-tailed test and a test with two rejection regions is called a two-tailed test. In general, a test is one-tailed when the alternate hypothesis states a direction like greater than or less than. The two-tailed test is usually stated as not being equal to some value.

A one-tailed test:
H0: MBA starting salaries are equal to or greater than (≥) $75,000.
H1: MBA starting salaries are less than (<) $75,000.

A two-tailed test:
H0: The return on the portfolio equals (=) 1%.
H1: The return on the portfolio does not equal (≠) 1%.

Step 3: Calculate the Test Statistic

The test statistic, Z, tells you how far the sample mean, \(\bar{X}\), is away from the center of the distribution in standard deviation units. Z is used to determine whether or not to reject the null hypothesis. For large samples (30 or more observations), the test statistic is computed with the following formula:

34 \[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]

Step 4: Establish the Decision Rule

The decision rule states when to reject the null hypothesis. It indicates how many standard deviation units (Z) a sample mean, \(\bar{X}\), has to be away from the center of the distribution before it is considered not part of the distribution.

Step 5: Make the Decision

The final step is to decide whether or not to reject the null hypothesis. The decision will depend on the test of significance. If the calculated Z statistic is greater in absolute value than the decision rule Z statistic, then the sample mean, \(\bar{X}\), is significantly far away from the center of the distribution and thus isn't associated with the distribution. So, reject the null hypothesis and accept the alternative hypothesis when the calculated test Z statistic exceeds the critical decision rule Z statistic.

(E) Examples of the Five-step Procedure

Example 1

When the gizmo machine is working properly, the mean length of gizmos is .5 inches. However, from time to time the machine gets out of alignment and produces gizmos that are either too long or too short. When this happens, production is stopped and the machine is adjusted. To check the machine, the quality control department takes a gizmo sample each day. Today a random sample of 49 gizmos showed a mean length of .49 inches. The population standard deviation is .021 inches. Using a 5% significance level, should the machine be shut down and adjusted?

Let \(\mu\) be the mean length of all gizmos made by this machine and \(\bar{X}\) the corresponding mean for the sample. From the given information:

Step 1: State the Null and Alternative Hypothesis

H0: \(\mu = .5\) (The machine does not need an adjustment)
H1: \(\mu \ne .5\) (The machine needs an adjustment)
This is a two-tailed test.

Step 2: Set the Significance Level

You are willing to make a Type I error 5% of the time, so the level of significance is .05. The ≠ sign in the alternative hypothesis indicates that the test is two-tailed with two rejection regions, one in each tail of the normal distribution curve. Because the total area of both rejection regions is .05 (the significance level), the area of the rejection region in each tail is .025.

35 Step 3: Calculate the Test Statistic

The value of \(\bar{X}\) from the sample is .49. Since \(\sigma\) is given as .021, we calculate the Z test statistic using \(\sigma\) as follows:

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{.49 - .5}{.021 / \sqrt{49}} = \frac{-.01}{.003} = -3.33 \]

Step 4: State the Decision Rule

The calculated value of Z = -3.33 is called the computed or observed test statistic Z. This Z value indicates the location of the observed sample mean relative to the population mean (3.33 size-adjusted standard deviations to the left of the mean). You must now compare the computed Z to the critical Z value found in the normal curve table in Chapter 7. Since this is a two-tailed test with a significance level of .05, you are expected to know that the table value is ±1.96 size-adjusted standard deviations from the mean. This means that the null hypothesis should be accepted if the computed Z value lies between -1.96 and +1.96 and rejected if it lies outside of these critical values.

Step 5: Make the Decision

The decision is based on the location of the calculated test statistic, Z, computed in Step 3. This value, Z = -3.33, is less than the critical value, Z = -1.96, and it falls in the rejection region in the left tail. Hence, reject H0 and conclude that, based on the sample information, the machine is out of adjustment.

Example 2

You want to know if small capitalization stock performance equals or exceeds the Market Index. You take a sample of 64 small cap stocks and get a mean return of 1%. During the same time period the population mean of the stocks in the index is 11% with a standard deviation of 4%. Your desired level of significance is .01.

Step 1: State the Null and Alternative Hypothesis

H0: Small cap returns ≥ 11%
H1: Small cap returns < 11%
This is a one-tailed test.

Step 2: Set the Significance Level

[Table: Critical Z values, 1 Tail and 2 Tail, 1% significance level]
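The five-step procedure from Example 1 (the gizmo machine) reduces to one formula and one comparison. The following is a minimal Python sketch, not part of the original notes. The function names are hypothetical, and the inputs are assumptions read from Example 1: a hypothesized mean of .5, a sample mean of .49, n = 49, and a population standard deviation of .021, the value consistent with the Z of -3.33 reported in Step 5.

```python
import math


def z_statistic(sample_mean: float, pop_mean: float,
                pop_sd: float, n: int) -> float:
    """Z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


def two_tailed_decision(z: float, critical_z: float = 1.96) -> str:
    """Reject H0 when Z falls outside +/- critical_z (5% level by default)."""
    return "reject H0" if abs(z) > critical_z else "do not reject H0"


# Example 1 inputs (assumed as described above).
z = z_statistic(sample_mean=0.49, pop_mean=0.5, pop_sd=0.021, n=49)
decision = two_tailed_decision(z)
print(f"Z = {z:.2f}: {decision}")
```

Comparing the absolute value of Z with the critical value handles both tails at once; a one-tailed test would instead compare the signed Z with a single critical value.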
