DESCRIBING DATA: MEASURES OF LOCATION


A. Measures of Central Tendency

Measures of central tendency pinpoint the center, or average, of a data set. That value can then be used to represent the typical data value.

1. The Population Mean (µ). A population is the entire group of objects being studied. To find the population's mean, sum all the observed values in the population (\(\sum X\)) and divide this sum by the number of observations (N) in the population.

\[ \mu = \frac{\sum X}{N} \]

Any measurable characteristic of the population, such as the mean, is called a parameter. \(\sum X / N\) is commonly called the data's arithmetic mean, or just plain mean.

2. The Sample Mean (\(\bar{X}\)). The sample mean is the sum of all the values in the sample divided by the number of values in the sample. The sample mean is used to make inferences about the population mean.

\[ \bar{X} = \frac{\sum X}{n} \]

The sample mean, or any other measure based on sample data, is called a statistic.

Note: n is the sample size, while N is the population size.

Example: There are 12 widget makers in the UK. The year 2000 sales for each widget maker are as follows (all data are in millions of British pounds): 1, 5, 34, 15, 19, 44, 54, 33,, 8, 17, 4. Your research partner is exceedingly lazy and has decided to collect data on only five firms in the industry. Given this data, calculate the population mean and calculate the sample mean (your partner's data set is shown as the bold sales numbers in the original data series above).

µ = Population Mean =

3. Properties of the Arithmetic Mean. The arithmetic mean is the sum of the observations divided by the number of observations. It is the most widely used measure of central tendency and has the following properties:

a. All interval and ratio data sets have an arithmetic mean.
b. All data values are considered and included in the arithmetic mean computation.
c. A data set has only one arithmetic mean. That is, the mean is unique.
d. The arithmetic mean is a useful measure for comparing two or more populations.
e. The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero. That is:

\[ \sum (X - \bar{X}) = 0 \]

Example: A data set contains the following numbers: 5, 9, 4, and 10. The mean of these numbers is:

\[ \bar{X} = (5 + 9 + 4 + 10) / 4 = 7 \]

The sum of the deviations from the mean is:

\[ \sum (X - \bar{X}) = (5 - 7) + (9 - 7) + (4 - 7) + (10 - 7) = -2 + 2 - 3 + 3 = 0 \]

The arithmetic mean has the following disadvantages:

a. The mean can be affected by extremes, that is, unusually large or small values.
b. The mean cannot be determined for an open-ended data set.

4. The Median. The median is the midpoint of the data when the data are arranged from the largest to the smallest values. Half the observations are above the median and half are below it. To determine the median, arrange the data from highest to lowest (or lowest to highest) and find the middle observation.

The median is important when the arithmetic mean is affected by extremely large or small values. When this occurs, the median is a better measure of central tendency than the mean.

Properties of the Median:

The median, like the mean, is unique.
The median is not affected by extreme values.

The median can be computed for an open-ended frequency distribution as long as the median does not lie in an open-ended class.

Example: The five-year annualized total returns for five investment managers are 30%, 15%, 25%, 21% and 23%. Find the median return for the managers.

First, rearrange the returns from highest to lowest: 30%, 25%, 23%, 21%, 15%. The observation halfway down from the top, or halfway up from the bottom, is 23%, so the median return is 23%.

Example: Now there is a sixth manager with a return of 28%. What is the median return?

Rearranging the returns gives: 30%, 28%, 25%, 23%, 21%, 15%. Here the number of observations is even, so there is no single middle observation. To find the median, take the arithmetic mean of the two middle observations: (25 + 23) / 2 = 24. Thus, the median of the data set is 24.0%.

5. The Mode. The mode of a data set is the value of the observation that appears most frequently.

Advantages of using the mode:

a. The mode can be used for all types of data: nominal, ordinal, interval, and ratio.
b. The mode is not affected by extremely large or small values.
c. The mode can also be used to measure open-ended data sets.

Disadvantages of using the mode:

a. For many data sets there may be no value that appears more than once.
b. On the other hand, some data sets have more than one mode. When two dominant numbers appear in equal proportion in a distribution, the distribution is said to be bimodal.

[Figure: a bimodal distribution.]

6. Selecting an Average for Data in a Frequency Distribution. A distribution is symmetrical and bell-shaped when it has the same shape on either side of its center axis. When the distribution is symmetrical and bell-shaped, the mean, median, and mode are located at the center of the distribution and are always equal.

Mean = median = mode

A distribution is nonsymmetrical (skewed) when there are larger numbers on one side of the distribution than on the other. When a distribution is positively skewed, the right-side tail is longer than normal due to outliers. The mean will exceed the median, and the median will generally exceed the mode. Why? Large outliers falling to the far right side of the distribution can dramatically influence the mean.

For negatively skewed distributions, which tail off to the left, the mean will be smaller than the median, and the median will generally be smaller than the mode, because of the influence of the few extremely small observations on the left.

Skewness measures the asymmetry of a distribution. A normal distribution has a skewness equal to zero because it is symmetric. Non-normal distributions that have an elongated right tail are said to be skewed to the right, or positively skewed. Distributions that have an elongated left tail are said to be skewed to the left, or negatively skewed. A lognormal distribution is an example of a positively skewed distribution.

Conclusion: When a distribution is skewed, the mean should not be used to represent the data set.

Kurtosis measures the amount of peakedness. Normal distributions have a kurtosis equal to three. Non-normal distributions with a high peak are leptokurtic (kurtosis > 3), and distributions with a flat peak are platykurtic (kurtosis < 3).
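The mean, median, and mode calculations in this section can be checked with a short Python sketch. It uses the standard-library `statistics` module, the data set 5, 9, 4, 10 from the mean example, and the manager returns 30%, 15%, 25%, 21%, 23% (plus the sixth return of 28%) from the median examples. The list `bimodal_sample` is a made-up illustration and does not come from the notes.

```python
from statistics import mean, median, multimode

# Mean example: the deviations from the mean always sum to zero.
values = [5, 9, 4, 10]
values_mean = mean(values)
deviation_sum = sum(x - values_mean for x in values)

# Median examples: five managers (odd count), then six (even count).
returns_five = [30, 15, 25, 21, 23]
returns_six = returns_five + [28]
median_five = median(returns_five)   # middle observation
median_six = median(returns_six)     # mean of the two middle observations

# Mode example on hypothetical data with two equally frequent values.
bimodal_sample = [1, 2, 2, 3, 3, 4]
modes = multimode(bimodal_sample)

print(values_mean, deviation_sum, median_five, median_six, modes)
```

`multimode` returns every value tied for the highest frequency, which is why it is used here rather than `mode` for a bimodal data set.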

DESCRIBING DATA: MEASURES OF DISPERSION

A. Why Study Dispersion?

1. Measures of central tendency, such as the mean or median, pinpoint the center of the data set but don't explain anything about the variability of the individual observations within the data set.
2. Studying dispersion enables the analyst to better compare data sets.

B. Measures of Dispersion for Ungrouped Data

1. The range is the distance between the largest and the smallest value in the data set.

Range = Highest Value - Lowest Value

Example: The five-year annualized total returns for five investment managers are 30%, 12%, 25%, 20% and 23%. What is the range of the data?

Range = 30 - 12 = 18%

2. The mean deviation is the average of the absolute values of the deviations from the arithmetic mean.

\[ MD = \frac{\sum |X - \bar{X}|}{n} \]

You should remember that the sum of all the deviations from the mean is equal to zero. To get around this zeroing-out problem, the mean deviation uses the absolute value of each deviation. For this reason, the mean deviation is also called the mean absolute deviation, or MAD.

Example continued: What is the mean deviation of the investment returns, and how is it interpreted?

\[ \bar{X} = [30 + 12 + 25 + 20 + 23] / 5 = 22\% \]
\[ MD = [\,|30 - 22| + |12 - 22| + |25 - 22| + |20 - 22| + |23 - 22|\,] / 5 \]
\[ MD = [8 + 10 + 3 + 2 + 1] / 5 = 4.8\% \]

On average, an individual return will deviate 4.8 percentage points from the mean return of 22%.
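As a quick check on the range and mean deviation example, here is a minimal Python sketch using the five manager returns 30, 12, 25, 20, and 23 (in percent). The variable names are illustrative only.

```python
# Five-year annualized returns for the five managers, in percent.
returns = [30, 12, 25, 20, 23]

# Range: highest value minus lowest value.
data_range = max(returns) - min(returns)

# Mean deviation (MAD): average absolute deviation from the mean.
mean_return = sum(returns) / len(returns)
mean_deviation = sum(abs(x - mean_return) for x in returns) / len(returns)

print(data_range, mean_return, round(mean_deviation, 2))
```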

3. The variance and standard deviation are measures of dispersion based on the deviations from the mean. The variance is defined as the mean of the squared deviations from the mean, while the standard deviation is the positive square root of the variance.

Population Variance: The population variance for ungrouped data is given by:

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

Or, in computational form:

\[ \sigma^2 = \frac{\sum X^2}{N} - \left( \frac{\sum X}{N} \right)^2 \]

Note that \((\sum X / N)^2\) is just the mean squared.

Example continued: Find the population variance of the five investment managers' returns (30%, 12%, 25%, 20%, and 23%).

\[ \mu = [30 + 12 + 25 + 20 + 23] / 5 = 22 \text{ percent} \]
\[ \sigma^2 = [(30 - 22)^2 + (12 - 22)^2 + (25 - 22)^2 + (20 - 22)^2 + (23 - 22)^2] / 5 \]
\[ \sigma^2 = 178 / 5 = 35.6 \]

Using the alternative computational formula:

\[ \sigma^2 = [30^2 + 12^2 + 25^2 + 20^2 + 23^2] / 5 - (22)^2 \]
\[ \sigma^2 = 2598 / 5 - 484 = 35.6 \]

Note: The major problem with using the variance is the difficulty of interpreting it. Why? The variance, unlike the mean, is in terms of units squared. How does one interpret squared percents or squared dollars? The solution to this dilemma is to use the standard deviation.

4. The population standard deviation is the positive square root of the population variance and is found by:

\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]

Continuing with the example, \(\sigma = \sqrt{35.6} = 5.97\%\).
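The two population-variance formulas can be compared side by side in a short Python sketch. It assumes the same five returns (30, 12, 25, 20, 23) and simply confirms that the definitional and computational forms agree.

```python
import math

returns = [30, 12, 25, 20, 23]
n = len(returns)
mu = sum(returns) / n

# Definitional form: mean of the squared deviations from the mean.
variance_definitional = sum((x - mu) ** 2 for x in returns) / n

# Computational form: mean of the squares minus the square of the mean.
variance_computational = sum(x * x for x in returns) / n - mu ** 2

# Population standard deviation: positive square root of the variance.
population_sd = math.sqrt(variance_definitional)

print(round(variance_definitional, 2), round(variance_computational, 2),
      round(population_sd, 2))
```

The computational form avoids a second pass over the data, but the two forms can differ slightly in floating point, so the comparison is made after rounding.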

5. The sample variance differs from the population variance. The population mean, \(\sum X / N\), and the sample mean, \(\sum X / n\), are computed in the same way. Unfortunately, this is not the case for the variance. The sample variance is calculated using the following formulas:

\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]

or, in computational form:

\[ s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1} \]

The difference between the calculation of the population and sample variances is in the denominator of the equation. Using only n in the denominator when a sample is used to represent its population will result in underestimating the population variance, especially for small sample sizes. This systematic understatement would make the sample variance a biased estimator of the population variance. By using (n - 1) instead of n in the denominator, we compensate for this underestimation. Thus, by using n - 1, the sample variance (\(s^2\)) will be an unbiased estimator of the population variance (\(\sigma^2\)).

6. The sample standard deviation for ungrouped data is given by:

\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}} \]

Example continued: You are now told that the five managers are just a sample of all the firm's investment managers. What are the data's sample variance and sample standard deviation?

\[ \bar{X} = \sum X / n = (30 + 12 + 25 + 20 + 23) / 5 = 110 / 5 = 22 \]
\[ s^2 = \sum (X - \bar{X})^2 / (n - 1) \]
\[ s^2 = [(30 - 22)^2 + (12 - 22)^2 + (25 - 22)^2 + (20 - 22)^2 + (23 - 22)^2] / 4 = 178 / 4 = 44.5 \]
\[ s = \sqrt{44.5} = 6.67\% \]
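The effect of the n - 1 denominator can be seen directly with Python's standard-library `statistics` module, where `pvariance` divides by the number of observations and `variance` divides by n - 1. This is a sketch on the same five returns, not a prescribed method.

```python
from statistics import pvariance, stdev, variance

returns = [30, 12, 25, 20, 23]

# Treating the five managers as the whole population (divide by N).
population_variance = pvariance(returns)

# Treating them as a sample from a larger population (divide by n - 1).
sample_variance = variance(returns)
sample_sd = stdev(returns)

print(population_variance, sample_variance, round(sample_sd, 2))
```

The sum of squared deviations (178) is the same in both cases. Only the divisor changes, which is why the sample variance is the larger of the two.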

C. Interpretation and Use of the Standard Deviation

1. Chebyshev's Theorem states that for any set of observations (sample or population, regardless of the shape of the distribution), the minimum proportion of observations falling within k standard deviations of the mean is \(1 - 1/k^2\). The number of standard deviations (k) in the equation has to be greater than 1.

Example: What minimum percent of a distribution will lie within ±2 standard deviations of the mean?

From Chebyshev's Theorem:

\[ 1 - (1/k^2) = 1 - (1/2^2) = 0.75 \text{ or } 75\% \]

Thus, Chebyshev's Theorem states that for any distribution, at least:

75% of observations lie within ±2 standard deviations of the mean
88.9% of observations lie within ±3 standard deviations of the mean
93.75% of observations lie within ±4 standard deviations of the mean
96% of observations lie within ±5 standard deviations of the mean
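The Chebyshev percentages listed above follow from one line of arithmetic. The helper below is a minimal sketch. The function name `chebyshev_min_proportion` is made up for illustration.

```python
def chebyshev_min_proportion(k: float) -> float:
    """Minimum proportion of observations within k standard deviations
    of the mean, for any distribution (requires k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


# Reproduce the bounds for k = 2, 3, 4, 5.
bounds = {k: chebyshev_min_proportion(k) for k in (2, 3, 4, 5)}
for k, proportion in bounds.items():
    print(f"within +/-{k} standard deviations: at least {proportion:.2%}")
```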

A SURVEY OF PROBABILITY CONCEPTS

A. What is a Probability?

A probability is the chance that something will happen. It measures the likelihood that an event will happen in the future. A probability can only have a value between 0 (no chance at all) and 1 (one hundred percent chance).

Some other concepts and key words:

1. An experiment is an observation of some activity or the act of taking a measurement.
2. An outcome is a particular or possible result of an experiment.
3. An event is the collection of one or more outcomes from an experiment.

Example: Suppose you roll a single die. That would be an experiment. There are six possible outcomes from this experiment (you could roll a 1, 2, 3, 4, 5, or 6). The outcome that actually occurs (like rolling a 4) is an event.

B. Approaches to Determining Probabilities

There are two objective approaches to determining probabilities (classical and empirical) and the subjective approach to setting probabilities.

1. Classical probability is based on the assumption that the possible outcomes of an experiment are equally likely. This method is presumptive, thus it is called a priori.

\[ P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}} \]

where P is the probability of an event occurring and A is the specific event.

Example: What is the probability of rolling a 4 with the throw of a six-sided die?

\[ P(\text{four}) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}} = \frac{1}{6} \]

The term mutually exclusive means the occurrence of any one event precludes any other event from occurring at the same time. The term collectively exhaustive means that at least one of the possible events must occur when an experiment is conducted.

2. Empirical probabilities are based on the relative frequency of events, or the number of times an event happened during similar circumstances in the past. Since this approach is based on actual observations, it is called a posteriori.

\[ P(A) = \frac{\text{Number of times the event occurred in the past}}{\text{Total number of observations}} \]

Example: Ten of the 500 randomly selected cars manufactured at a certain auto factory are found to be lemons. Assuming that the lemons are manufactured randomly, what is the probability that the next car manufactured is a lemon?

P(A) = 10 / 500 = .02 or 2%

where P stands for probability and A is the outcome that the next car is a lemon.

3. A subjective probability is the probability that a particular event will happen based on a subjective evaluation of all the available information: an educated guess. An example of a subjective probability is guessing that there is a 5% chance that the Iowa Hawkeye football team will play in the Rose Bowl next year.

C. The Basic Rules of Probability

Here we will deal with multiple events.

1. The special rule of addition is applicable only to mutually exclusive events. If two or more events are mutually exclusive, then:

P(A or B) = the probability of A or B happening = P(A) + P(B)

Or, in general terms:

P(A or B or C ...) = P(A) + P(B) + P(C) + ...

Example: You go to rent a car and are told that the agency only has cars in three colors (green, blue and yellow).

Car Options     Green (G)     Blue (B)     Yellow (Y)     Total
Total           135           125          40             300

What is the probability that a randomly selected car from this group of 300 cars is green or yellow? Let's define the following events. G = the car selected is green. Y = the car selected is yellow. From the given information:

P(G) = 135/300 = .450
P(Y) = 40/300 = .133

Hence: P(G or Y) = P(G) + P(Y) = .450 + .133 = .583

The complement rule follows from the special rule of addition. An event A and its complement (A not happening) are mutually exclusive and collectively exhaustive: if A happens, then its complement cannot, and one of the two must occur. The probability of A happening plus the probability of A not happening is therefore 1.

P(A happening) + P(A not happening) = 1

So, P(A not happening) = 1 - P(A happening)

Example continued: What is the probability that a randomly selected car will be blue?

P(B) + P(G or Y) = 1
So, P(B) = 1 - P(G or Y) = 1 - .583 = .417
This is also 125/300 = .417.

2. The general rule of addition states that if two events, A and B, are NOT mutually exclusive, then you must account for the joint probability of the events, that is, the possibility that the two events will occur at exactly the same time. Joint probability is shown by the overlap of the two circles in the traditional Venn diagram.

P(A or B) = P(A) + P(B) - P(A and B)

where P(A and B) is the joint probability of A and B.

The joint probability [P(A and B)] is defined as the probability that measures the likelihood that two or more events will happen concurrently.

P(A and B) = P(A) * P(B) for independent events; or
P(A and B) = P(A) * P(B given that A occurs) for conditional events.

3. The special rule of multiplication requires that two events A and B are independent. Two events are independent if the occurrence of one event has no effect on the occurrence of the other event. For two independent events A and B, the probability that A and B will both occur is given by:

P(A and B) = P(A) * P(B)

Example continued: Suppose you were told that 23.3% of the rental agency's cars had CD players, and that the occurrence of a CD player is independent of the car's color. The joint probability of getting a green car and a CD player is:

P(G and CD) = P(G) * P(CD) = (.45)(.233) = .105

The probability of getting a green car OR a car with a CD player is:

P(G or CD) = P(G) + P(CD) - P(G and CD) = .45 + .233 - .105 = .578

A conditional probability is the probability of a particular event occurring given that another event has occurred.

4. The general rule of multiplication is used to find the joint probability when one event (B) is conditional on the occurrence of another event (A).

P(A and B) = P(A) * P(B | A)

where the conditional probability P(B | A) stands for the probability that B will occur given that A has already occurred. The vertical line means "given that."

Continuing with the earlier example, assume that you discover that the CD players aren't evenly distributed among the rental agency's 300 cars. Of the green cars, 33.3% have CD players, 12% of the blue cars have CD players, and 25% of the yellow cars have CD players.

Car Options           Green (G)        Blue (B)        Yellow (Y)      Total
Has a CD player       45               15              10              70
Has no CD player      90               110             30              230
Total                 135              125             40              300
% with CD players     45/135 = .333    15/125 = .12    10/40 = .25     70/300 = .233

Since you now know that the car color and the presence of a CD player are not independent, you want to know the joint probability of getting a green car and a CD player. Now you must consider the conditional probabilities. The presence of a CD player is contingent on the color of the car chosen. P(CD | G) states: given that the car is green, the probability of it having a CD player (in this case 33.3%).

P(G and CD) = P(G) * P(CD | G)
P(G) = 135/300 = .45 (there are 135 green cars among the 300 cars)
P(CD | G) = 45/135 = .333 (45 of the 135 green cars have CD players)
P(G and CD) = P(G) * P(CD | G) = (135/300)(45/135) = .15

The probability of getting a green car OR any color car with a CD player is:

P(G or CD) = P(G) + P(CD) - P(G and CD) = .45 + .233 - .15 = .533

To summarize, the general rule of addition states that the probability of event A or B happening is:

P(A or B) = P(A) + P(B) - P(A and B)

If A and B are independent events, then P(A and B) = P(A) * P(B). This is called the special rule of multiplication.

If A and B are conditional events, then P(A and B) = P(A) * P(B | A). This is called the general rule of multiplication.
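The addition, complement, and multiplication rules summarized above can be traced through the rental-car example with exact fractions. The sketch below assumes the counts from the example (135 green, 125 blue, and 40 yellow cars, of which 45, 15, and 10 have CD players). The dictionary and variable names are illustrative.

```python
from fractions import Fraction

cars = {"green": 135, "blue": 125, "yellow": 40}
cars_with_cd = {"green": 45, "blue": 15, "yellow": 10}
total = sum(cars.values())  # 300 cars

# Special rule of addition (mutually exclusive colors).
p_green = Fraction(cars["green"], total)
p_yellow = Fraction(cars["yellow"], total)
p_green_or_yellow = p_green + p_yellow

# Complement rule: blue is "not green and not yellow".
p_blue = 1 - p_green_or_yellow

# General rule of multiplication: P(G and CD) = P(G) * P(CD | G).
p_cd = Fraction(sum(cars_with_cd.values()), total)
p_cd_given_green = Fraction(cars_with_cd["green"], cars["green"])
p_green_and_cd = p_green * p_cd_given_green

# General rule of addition: subtract the joint probability once.
p_green_or_cd = p_green + p_cd - p_green_and_cd

print(float(p_green_or_yellow), float(p_blue),
      float(p_green_and_cd), float(p_green_or_cd))
```

Using `Fraction` keeps every intermediate value exact, so the rounded decimals in the text (.583, .417, .15, .533) can be compared against the unrounded results.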

DISCRETE PROBABILITY DISTRIBUTIONS

A. Probability Distribution

1. What is a Probability Distribution? A probability distribution is a listing of all the outcomes of an experiment and the probability associated with each of these outcomes.

Example: Suppose you are interested in determining the number of heads you would get on two tosses of a coin. This is the experiment. The possible outcomes are: no heads, one head, or two heads. What is the probability distribution for the number of heads?

Two Coin Tosses

Possible result     First toss     Second toss     Number of heads
1                   T              T               0
2                   T              H               1
3                   H              T               1
4                   H              H               2

Probability Distribution for the Outcomes of Zero, One, and Two Heads on Two Tosses of a Coin

Number of heads (X)     Probability of outcome P(X)
0                       1/4 = .25
1                       2/4 = .50
2                       1/4 = .25
Total                   4/4 = 1.00

Note: The following are three important characteristics of a probability distribution:

a. The probability of an outcome is always between 0 (no chance) and 1 (100% chance).
b. The sum of the probabilities of all (mutually exclusive) outcomes is 1.00.
c. You can add probabilities together. The probability of getting at least one head (1 or 2 heads) is .50 + .25 = .75. This is called a cumulative probability.
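The two-toss distribution can also be built by enumerating the four equally likely results, which is a useful check on the table. This is a minimal sketch using only the Python standard library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely results of two tosses: (H,H), (H,T), (T,H), (T,T).
results = list(product("HT", repeat=2))

# Count how many results give 0, 1, or 2 heads.
head_counts = Counter(result.count("H") for result in results)

# Classical probability: favorable results / total results.
distribution = {
    heads: Fraction(count, len(results))
    for heads, count in sorted(head_counts.items())
}

# Cumulative probability of at least one head.
p_at_least_one_head = distribution[1] + distribution[2]

print(distribution, p_at_least_one_head)
```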

2. Continuous Random Variables: A random variable is a quantity resulting from a random experiment that, by chance, can assume different values. A discrete random variable is a variable that can assume only certain clearly separated values resulting from a count of some item of interest. When a variable can assume one of an infinitely large number of values, the variable is called a continuous random variable.

When you organize a set of discrete random variables into a probability distribution, the distribution is called a discrete probability distribution. The binomial probability distribution is an example of a discrete probability distribution.

If you organize a set of continuous random variables in a probability distribution, the distribution is called a continuous probability distribution. The normal probability distribution is an example of a continuous probability distribution.

3. The Mean, Variance, and Standard Deviation of a Probability Distribution

Mean: The mean of a discrete probability distribution is given by:

\[ \mu = \sum [X \, P(X)] \]

Example: Kelly Smith sells TVs for Big Mart. She has established the following probability distribution for the number of TVs she expects to sell on a particular Sunday.

Number of TVs sold (X)     Probability P(X)
0                          .10
1                          .20
2                          .30
3                          .30
4                          .10
Total                      1.00

a. What type of distribution is this?

This is an example of a discrete probability distribution. Note that Kelly expects to sell only within a certain range of TVs. She does not expect to sell 5 or more TVs. Further, she cannot sell half a TV. She will only sell 0, 1, 2, 3, or 4 TVs. Also, the outcomes are mutually exclusive. She cannot sell a total of both 3 and 4 TVs on the same Sunday.

The probability of Kelly selling no more than 2 TV sets is (.10 + .20 + .30) = 60%.
The probability of Kelly selling at least 2 sets is (.30 + .30 + .10) = 70%.

b. On a typical Sunday, how many TVs should Kelly expect to sell?

The mean number of TVs sold is computed by weighting the number of TVs sold by the probability of selling that number and totaling the products, using the formula:

\[ \mu = \sum [X \, P(X)] = 0(.10) + 1(.20) + 2(.30) + 3(.30) + 4(.10) = 2.1 \]

Number of TVs sold (X)     Probability P(X)     X * P(X)
0                          .10                  .00
1                          .20                  .20
2                          .30                  .60
3                          .30                  .90
4                          .10                  .40
Total                      1.00                 2.10

The mean indicates that over a large number of Sundays, Kelly expects to sell, on average, 2.1 TVs every Sunday.

c. What is the variance of the distribution? What is the standard deviation?

The variance of a discrete probability distribution is:

\[ \sigma^2 = \sum [(X - \mu)^2 \, P(X)] \]

Number of TVs sold (X)     Probability P(X)     (X - µ)     (X - µ)^2     (X - µ)^2 * P(X)
0                          .10                  -2.1        4.41          .441
1                          .20                  -1.1        1.21          .242
2                          .30                  -0.1        0.01          .003
3                          .30                  0.9         0.81          .243
4                          .10                  1.9         3.61          .361
Total                                                                     1.290

\[ \sigma^2 = \sum [(X - \mu)^2 \, P(X)] = 1.290 \]

The standard deviation is the square root of the variance: \(\sigma = \sqrt{1.29} = 1.136\) TVs.
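The probability-weighted mean and variance in Kelly's example can be sketched in a few lines of Python. The probabilities are the ones used in the mean calculation above (.10, .20, .30, .30, .10 for 0 through 4 TVs), and the variable names are illustrative.

```python
import math

# Probability distribution for the number of TVs sold on a Sunday.
tv_distribution = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.30, 4: 0.10}

# Mean: each outcome weighted by its probability.
expected_tvs = sum(x * p for x, p in tv_distribution.items())

# Variance: squared deviations from the mean, weighted by probability.
tv_variance = sum((x - expected_tvs) ** 2 * p
                  for x, p in tv_distribution.items())
tv_sd = math.sqrt(tv_variance)

# Cumulative probabilities from part (a).
p_no_more_than_two = sum(p for x, p in tv_distribution.items() if x <= 2)
p_at_least_two = sum(p for x, p in tv_distribution.items() if x >= 2)

print(round(expected_tvs, 2), round(tv_variance, 3), round(tv_sd, 3))
```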

NORMAL PROBABILITY DISTRIBUTION

A. Normal Probability Distributions

Normal probability distributions and normal curves have the following characteristics:

1. The normal curve is bell-shaped with a single peak at the exact center of the distribution. The arithmetic mean, median, and mode are equal. Half of the area under a normal curve lies above the mean and half below the mean. The normal curve can be completely defined by its mean and standard deviation.
2. The normal probability distribution is symmetrical about its mean.
3. The normal curve falls off smoothly in either direction from the central value.
4. The normal curve is asymptotic to the X-axis in both directions. This means that the tails of the distribution extend to infinity in both directions, getting closer and closer to the horizontal axis without ever touching it.

[Figure: normal curve. The curve is symmetrical and its two halves are identical. Theoretically, the curve extends to -∞ on the left and +∞ on the right. The mean, median, and mode are equal.]

Normal probability distributions come in many sizes and shapes. For example:

1. The three normal curves below have the same mean but different standard deviations.

[Figure: three normal curves with \(\sigma_A > \sigma_B > \sigma_C\) and Mean A = Mean B = Mean C.]

2. The two normal curves below have the same standard deviations but different means.

[Figure: two normal curves with \(\sigma_A = \sigma_B\) and Mean A < Mean B.]

The number of different normal curves is unlimited.

B. The Standard Normal Probability Distribution

Even though normal curves have different sizes, they all have identical shape characteristics. So when discussing or comparing normal curves, it is customary to compare them to a standardized normal curve with a mean of 0 (zero) and a standard deviation of 1. This standardized normal curve represents the standard normal distribution and is used for all problems relating to normal distributions.

To standardize an observation from any given normal curve, you must calculate the observation's Z value. The Z value tells you how far the given observation is from the population mean, in units of standard deviation.

\[ Z = \frac{\text{observation} - \text{population mean}}{\text{standard deviation}} = \frac{X - \mu}{\sigma} \]

Example: The EPS figures for a large group of firms are normally distributed with a mean of $5 and a standard deviation of $1. What is the Z value given an EPS (X) of $6? How about $4?

Z for an X of $6 = (X - µ) / σ = ($6 - $5) / $1 = +1
Z for an X of $4 = (X - µ) / σ = ($4 - $5) / $1 = -1

The Z of +1.00 indicates that an EPS of $6 is one standard deviation above the mean, and a Z of -1.00 shows that the EPS of $4 is one standard deviation below the mean. Note that both EPSs ($6 and $4) are the same distance ($1) from the mean.

C. Areas under the Normal Curve

The area under a standardized curve represents probabilities. The area under the curve between two values gives the probability that an observation will fall between those two values. Tables have been constructed for the standardized normal curve that tabulate these probabilities.

An example of the use of the normal tables would be to answer the question: what is the probability that an observation will fall between the mean and 1.25 standard deviations above the mean? This is shown below on the regular scale and on the standardized scale.

[Figure: the same area shown on two scales.]
Regular normal curve: mean = 60 and σ = 8, so Z = (70 - 60) / 8 = 1.25
Standardized normal curve: mean = 0 and σ = 1, so Z = (1.25 - 0) / 1 = 1.25

Areas under the normal curve are tabulated by Z-score. Verify from the table that a Z of 1.25 means that there is a 39.44% chance of an observation falling between 60 and 70.

Example: Returning to the distribution of EPS figures (µ = $5, σ = $1), what is the area under the normal curve between $3.40 and $7?

This problem takes two steps. First calculate the area between $3.40 and the mean of $5:

Z = ($3.40 - $5) / $1 = -1.60

Now calculate the area between the mean of $5 and $7:

Z = ($7 - $5) / $1 = 2.00

The area under the curve for a Z of 1.60 is .4452. The area under the curve for a Z of 2.00 is .4772. Adding the two areas: .4452 + .4772 = .9224. Thus, the probability of observing an EPS between $3.40 and $7 is .9224. In other words, 92.24 percent of the EPSs are between $3.40 and $7.

Example: What is the probability that an EPS figure will fall between the mean and one standard deviation in either direction from the mean? That is, what is the probability that the EPS figure will be between $6 and $4?

A calculated Z of 1.00 gives .3413 in the table. So the chance of an EPS figure being between the mean and one standard deviation from the mean is 34.13%. The chance that the EPS is within plus or minus one standard deviation of the mean is 2 × 34.13% = 68.26%.

Knowing the probability that an observation will fall between the mean and one, two, and three standard deviations from the mean is important. You should also know what proportion of observations fall within plus or minus one, two, or three standard deviations of the mean. These ranges are called confidence intervals and are used in hypothesis testing. You must remember the following probabilities and be able to verify them in the table.

34% of the area falls between 0 and +1 standard deviation from the mean. So, 68% of the observations fall within ±1 standard deviation of the mean.

45% of the area falls between 0 and 1.65 standard deviations from the mean. So, 90% of the observations fall within ±1.65 standard deviations of the mean.

47.5% of the area falls between 0 and 1.96 standard deviations from the mean. So, 95% of the observations fall within ±1.96 standard deviations of the mean.

49.5% of the area falls between 0 and 2.58 standard deviations from the mean. So, 99% of the observations fall within ±2.58 standard deviations of the mean.

Example: Returning again to the EPS figures (µ = $5, σ = $1), what percent of the EPSs are $7.45 or more?

We first find the area between the mean of $5 and an X of $7.45:

z = (X - µ) / σ = ($7.45 - $5) / $1 = 2.45

The area associated with a Z of 2.45 is .4929. This is the area between $5 and $7.45. Logically, the area for $7.45 and beyond is found by subtracting .4929 from .5000. This area is .0071, indicating that only 0.71 percent of the EPS figures are $7.45 or more.

Example: What percent of the EPS figures is less than $3.50?

The problem is again separated into two parts.

z = ($3.50 - $5) / $1 = -1.50

The area associated with a Z value of 1.50 is .4332, so the probability of an EPS between $3.50 and $5 is .4332. Thus the probability of an EPS less than $3.50 is .5000 - .4332 = .0668.

Example: What percent of the EPS figures is greater than $3.50?

Again this is a two-part problem. You add the .4332 for the EPS figures between $3.50 and $5 to .5000, which represents the probability of EPS values above $5. The proportion of EPSs above $3.50 is .9332, or 93.32%.
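The table lookups in the EPS examples can be reproduced without a printed table. The sketch below assumes the standard identity that the area between the mean and Z equals 0.5 * erf(|Z| / sqrt(2)), and uses `math.erf` from the Python standard library. The function names are illustrative, and the results agree with the four-decimal table values only up to rounding.

```python
import math


def z_score(x: float, mu: float, sigma: float) -> float:
    """Distance of x from the mean, in units of standard deviation."""
    return (x - mu) / sigma


def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and |z|."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


mu, sigma = 5.0, 1.0  # EPS distribution from the examples

# Area between $3.40 and $7.00: add the areas on each side of the mean.
area_between = (area_mean_to_z(z_score(3.40, mu, sigma))
                + area_mean_to_z(z_score(7.00, mu, sigma)))

# Area at or above $7.45: half the curve minus the area from mean to Z.
area_above_745 = 0.5 - area_mean_to_z(z_score(7.45, mu, sigma))

# Areas below and above $3.50.
area_below_350 = 0.5 - area_mean_to_z(z_score(3.50, mu, sigma))
area_above_350 = 0.5 + area_mean_to_z(z_score(3.50, mu, sigma))

print(round(area_between, 4), round(area_above_745, 4),
      round(area_below_350, 4), round(area_above_350, 4))
```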

SAMPLING METHODS AND SAMPLING DISTRIBUTIONS

It is not always feasible to study the entire population. Hence a small sub-group of the population, called a sample, is drawn from the population. Based upon this sample, conclusions can be drawn about the entire population.

A. What is a Probability Sample?

A probability sample is one selected in such a way that each item or person in the population being studied has the same (non-zero) likelihood of being included in the sample. If all the members of the population do not have an equal chance of being included in the sample, it is a nonprobability sample and may be biased. A sample is biased if it may not be truly representative of the population.

Simple random sampling is where the observations are drawn randomly from the population. In a random sample, each observation must have the same chance of being drawn from the population. This is the standard sampling design.

For example, assume you want to draw a sample of five items. Assign each item in the population a number, place the numbers in a hat, and shake it up. Next, draw one number randomly from the hat. Repeat this process (experiment) four more times. The five drawn numbers (items) comprise a simple random sample from the population.

An easier way to select a simple random sample is to use a random number table. You just pick the numbers from the table rather than shaking the hat and drawing.

B. Sampling Error

Sampling error is the difference between a sample statistic (the mean, variance, or standard deviation of the sample) and its corresponding population parameter (the mean, variance, or standard deviation of the population).

\[ \text{Sampling error} = \text{sample mean} - \text{population mean} = \bar{X} - \mu \]

C. Sampling Distribution

To learn the relationship between a population's parameters and the corresponding sample statistics requires some testing. To do this, a sampling distribution of the sample means is drawn.
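For a very small population, a sampling distribution of the sample means can be written out in full. The sketch below uses a made-up population of five values (not from the notes) and every possible sample of size 2 drawn without replacement. It shows the sampling error of each sample and checks that the mean of all the sample means equals the population mean.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population, chosen only for illustration.
population = [2, 4, 6, 8, 10]
population_mean = Fraction(sum(population), len(population))

sample_size = 2

# Every possible sample of the chosen size and its mean.
samples = list(combinations(population, sample_size))
sample_means = [Fraction(sum(s), len(s)) for s in samples]

# Sampling error of each sample: sample mean minus population mean.
sampling_errors = [m - population_mean for m in sample_means]

# Mean of the sampling distribution of the sample means.
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(len(samples), population_mean, mean_of_sample_means)
```

Individual samples miss the population mean by as much as 3 in either direction, yet the errors cancel exactly once all possible samples are included.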
The sampling distribution of the sample means is a probability distribution made up of all possible sample means (of the same sample size) selected from a population, along with their associated probabilities. In other words, you create a distribution where the reported observations are sample means (the average price of a sample of 5 stocks) rather than the individual observations (each stock's actual price).

You should know that the mean of the sample means is always equal to the mean of the population:

\[ \mu_{\bar{x}} = \mu \]

This is always true because the sampling distribution contains all possible samples of a given size selected from the population.

D. Central Limit Theorem

The central limit theorem tells us that for a population with a mean µ and a variance \(\sigma^2\), the sampling distribution of the sample means of all possible samples of size n will be approximately normally distributed with a mean equal to µ and a variance equal to \(\sigma^2 / n\). (Note: this assumes a large sample size, n ≥ 30.)

So you should know:

1. If the sample size n is sufficiently large, the sampling distribution of the sample means will be approximately normal.
2. The mean of the population, µ, and the mean of all possible sample means, \(\mu_{\bar{x}}\), are equal.
3. The variance of the distribution of sample means is \(\sigma^2 / n\).

E. Point Estimates and Interval Estimates

Point estimates are single (sample) values used to estimate population parameters. The sample mean, \(\bar{X}\), is the best estimator of the population mean µ and is calculated using the formula:

\[ \bar{X} = \frac{\sum X}{n} \]

Example: Assume that you are studying the relationship between stock price and investment returns for small firms. To estimate the mean stock price for small-sized firms, you draw a sample of 40 randomly selected firms and present them in the table below.

Given the following sample of 40 stock prices:

What is the best estimate of the population's mean stock price? The sum of the 40 observations is $600. The mean stock price of $15 is found by:

\[ \bar{X} = \frac{\sum X}{n} = \frac{\$600}{40} = \$15 \]

Once you have the point estimate, you need to calculate the interval estimate. The interval estimate states the range within which a population parameter will probably fall. In interval estimation, this interval is constructed around the point estimate, \( \bar{X} \). The presumption is that this interval will likely contain the population parameter, \( \mu \). The interval within which a population parameter is expected to fall is called the confidence interval.

You would expect with a confidence level of 95% that the population mean (\( \mu \)) will fall within ±1.96 standard deviations of the sampling distribution from the point estimate, the sample mean (\( \bar{X} \)). To construct this interval, you need to calculate the standard error of the sample means.

F. Standard Error of the Sample Means

Standard Error of the Sample Means is the standard deviation of the distribution of the sample means. The standard error of the sample means when the standard deviation of the population is known is calculated by:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

where:
\( \sigma_{\bar{x}} \) = the standard error of the sample means
\( \sigma \) = the standard deviation of the population
n = the size of the sample

If you don't know the population's standard deviation, you can estimate the standard error of the sample means by dividing the standard deviation of your sample by \( \sqrt{n} \). This estimate assumes that you are working with a large sample size (\( n \ge 30 \)).

\[ s_{\bar{x}} = \frac{s}{\sqrt{n}} \]

where:
\( s_{\bar{x}} \) = the standard error of the sample means based on sample statistics.
s = the standard deviation of the sample.
n = the size of the sample (assuming a large sample size).
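The two standard error formulas above can be sketched in Python as follows. The function name `standard_error`, the value sigma = 12, and the generated sample are all hypothetical choices made for illustration; only the formulas themselves come from the text.

```python
from math import sqrt
from statistics import stdev

def standard_error(sd, n):
    """Standard error of the sample means: standard deviation / sqrt(n)."""
    return sd / sqrt(n)

# Case 1: the population standard deviation is known (illustrative value).
sigma = 12.0
se_36 = standard_error(sigma, 36)    # 12 / 6
se_144 = standard_error(sigma, 144)  # 12 / 12

# Case 2: sigma is unknown, so the sample standard deviation s is used.
# A hypothetical sample of 36 observations keeps n >= 30.
sample = [50 + (i % 9) for i in range(36)]
s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
se_est = standard_error(s, len(sample))

print("known sigma, n = 36: ", se_36)
print("known sigma, n = 144:", se_144)
print("estimated from sample:", round(se_est, 3))
```

In this sketch, quadrupling the sample size from 36 to 144 halves the standard error, which is the square-root relationship in the formula.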

Example: The mean hourly wage for Iowa farm workers is $13.50 with a standard deviation of $2.90. Let \( \bar{x} \) be the mean wage per hour for a random sample of Iowa farm workers. Find the mean and standard error of the sample means, \( \bar{x} \), for a sample size of: a. 30; b. 75; and c. 200.

1. The mean \( \mu_{\bar{x}} \) of the sampling distribution of \( \bar{x} \) is:

\[ \mu_{\bar{x}} = \mu = \$13.50 \]

Since \( \sigma \) is known, the standard error of the sample means is:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.9}{\sqrt{30}} = \$.53 \]

In conclusion, if you were to take all possible samples of size 30 from the Iowa farm worker population and prepare a sampling distribution of the sample means, you would get a mean of $13.50 and a standard error of $.53.

2. When the sample size is 75, the mean and standard error of the sample means are:

\[ \mu_{\bar{x}} = \mu = \$13.50 \]

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.9}{\sqrt{75}} = \$.33 \]

3. When the sample size is 200, the mean and standard error of the sample means are:

\[ \mu_{\bar{x}} = \mu = \$13.50 \]

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.9}{\sqrt{200}} = \$.21 \]

Conclusion: From the calculations above you should observe that the mean of the sampling distribution of \( \bar{x} \) is always equal to the mean of the population whatever the size of the sample. However, the value of the standard error (standard deviation) of the sample means decreases from $.53 to $.33 and then to $.21 as the sample size increases from 30 to 75 and then to 200. Why? As the sample size gets larger (and thus the size of your sample approaches the size of the entire population) the dispersion of the sample means about the population mean gets smaller and smaller, causing the

standard error of the sample means to get smaller.

[Figure: Distribution of Sample Means for n = 30 and n = 200, each centered where the mean of the sample means equals the population mean]

G. Constructing the 95 Percent and the 99 Percent Confidence Intervals

For large samples (where \( n \ge 30 \)) the confidence intervals for the population mean are:

95 percent confidence interval: \( \bar{X} \pm 1.96 \frac{s}{\sqrt{n}} \)
99 percent confidence interval: \( \bar{X} \pm 2.58 \frac{s}{\sqrt{n}} \)

In general, a confidence interval for the mean is computed by:

\[ \bar{X} \pm Z \frac{s}{\sqrt{n}} \]

where Z reflects the statistic associated with the level of confidence, that is, 1.96 or 2.58.

Example: You select at random 81 earnings reports in the Widget Industry. You are interested in the EPS figure for the entire widget population. The sample mean EPS is computed to be $3.50 and the sample standard deviation is $1.80.

1. What is the estimated mean EPS in the widget industry (the population)?
2. What is the 95 percent confidence interval?
3. What are the 95 percent confidence limits?
4. Interpret your findings.

Solution:

1. The point estimate of the population mean is $3.50.

2. The confidence interval is between $3.11 and $3.89, found by:

\[ \bar{X} \pm 1.96 \frac{s}{\sqrt{n}} = \$3.50 \pm 1.96 \frac{\$1.80}{\sqrt{81}} = \$3.50 \pm \$0.392 = \$3.108 \text{ and } \$3.892 \]

The endpoints are frequently rounded and, in this case, would be recorded as $3.11 and $3.89.

3. The endpoints of the confidence interval are called the confidence limits. In this example $3.11 and $3.89 are the confidence limits.

4. Interpretation: If you had time to select 100 samples of size 81 from the population and compute the sample means and confidence intervals, the population mean EPS would be found within the confidence intervals about 95 out of the 100 times. The interval either contains the population mean or it does not. About 5 out of the 100 confidence intervals would not contain the population mean EPS, \( \mu \).
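The interval arithmetic in the solution can be checked with a short Python sketch. The function name `confidence_interval` is a hypothetical helper; the inputs (mean $3.50, standard deviation $1.80, n = 81) and the Z values 1.96 and 2.58 are taken from the example and from section G.

```python
from math import sqrt

def confidence_interval(sample_mean, sample_sd, n, z):
    """Large-sample confidence interval: sample_mean +/- z * s / sqrt(n)."""
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Widget EPS example: sample mean 3.50, sample standard deviation 1.80, n = 81.
lower_95, upper_95 = confidence_interval(3.50, 1.80, 81, 1.96)
lower_99, upper_99 = confidence_interval(3.50, 1.80, 81, 2.58)

print(f"95% confidence limits: ${lower_95:.2f} and ${upper_95:.2f}")
print(f"99% confidence limits: ${lower_99:.2f} and ${upper_99:.2f}")
```

The 99 percent interval is wider than the 95 percent interval because a larger Z value multiplies the same standard error.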

Binomial Probability Distribution

1. The binomial probability distribution is a discrete probability distribution. Characteristics of the binomial distribution are:

a. There are only two possible outcomes of each trial. They are mutually exclusive. For example, in a coin toss the outcome is either heads or tails. Mutually exclusive means you cannot get heads and tails at the same time.
b. The random variable is the result of counts. It's the number of successes out of the total number of trials you are looking for. For example, flip a coin 10 times and count the number of times heads appears.
c. The probability of success remains constant from one trial to another. For example, no matter how many times you flip a coin, the probability of getting a head is still 50% on each flip of the coin.
d. The trials are independent of each other. This means that one trial does not affect the outcome of any other trial.

The binomial probability distribution can be described using the formula:

\[ P(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x} \]

where:
n is the number of trials.
x is the number of observed successes.
p is the probability of success on each trial.
(1-p) is the probability of a failure.

This equation uses n! (n factorial). What you need to know about factorials is:
n! stands for (n)(n-1)(n-2) and so on down to 1.
3! = (3)(2)(1) = 6
5! = (5)(4)(3)(2)(1) = 120
You also need to know that 0! = 1.

Example: Five percent of all VCRs manufactured by a large electronics firm are defective. A quality control inspector randomly selects three VCRs from the production line. What is the probability that exactly one of these VCRs is defective?

Call the selection of a defective VCR a success and the selection of a good VCR a failure. The reason for calling a defective VCR a success is that the question asks for the probability of selecting exactly one defective VCR. The definitions of the terms are:

n = total number of trials = 3 VCRs
x = number of successes = number of defective VCRs = 1
n-x = number of failures = number of good VCRs = 3-1 = 2
p = P(success) = .05, so (1-p) = P(failure) = .95

\[ P(1) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x} = \frac{(3)(2)(1)}{(1)(2)(1)} (.05)^1 (.95)^2 \]

\[ P(1) = (3)(.05)(.9025) = .1354 \]

Using Binomial Probability Tables: For problems involving small n (say when n = 3 or 4) the probabilities of successes can be quickly generated mathematically using the earlier formula. However, for large n the math can be tedious. Tables giving the probability for the binomial distribution for different n-sizes can be used in those cases. The n = 6 table is provided below.

[Table: Binomial Probabilities for n = 6. Note: x = the # of observed successes. p = the prob. of success.]

Example: Suppose you toss a coin six times. What is the probability of zero heads? Exactly one heads? Exactly two heads? The conditions for a binomial distribution are met:

There is a constant probability of success (call heads a success), p = 50% = 0.5.
There are a fixed number of trials (n = 6).
The trials are independent.
There are only two possible outcomes (heads or tails).
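Where a printed table is not at hand, the binomial formula can be evaluated directly. The sketch below is a minimal Python version; the function name `binomial_pmf` is a hypothetical helper, while the inputs come from the VCR example (n = 3, p = .05) and the coin example (n = 6, p = 0.5).

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = n! / (x! (n-x)!) * p**x * (1-p)**(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# VCR example: probability of exactly one defective unit out of three.
p_one_defective = binomial_pmf(1, 3, 0.05)
print("P(1 defective in 3):", round(p_one_defective, 4))

# Coin example: the p = 0.5 column of a binomial table for n = 6.
coin_column = [binomial_pmf(x, 6, 0.5) for x in range(7)]
for x, prob in enumerate(coin_column):
    print(f"P({x}) = {prob:.4f}")
```

The seven probabilities in the coin column sum to 1, which is a quick sanity check on any binomial table.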

Refer to the table under column 0.5. The probabilities for the different x's are:

P(0) = 0.016
P(1) = 0.094
P(2) = 0.234

The mean and variance of a binomial distribution are given by:

\[ \mu = np \]

\[ \sigma^2 = np(1-p) \]

Example continued: The mean \( \mu = np = (6)(0.5) = 3 \), and the variance \( \sigma^2 = np(1-p) = (6)(.5)(.5) = 1.5 \). The standard deviation \( \sigma = \sqrt{1.5} = 1.22 \). If you assume that the binomial distribution is symmetrical, then the empirical rule shows that about 95 percent of all the trial observations will lie within ±2 standard deviations of the mean.

Cumulative Probability Distributions

What if you want to find the probability of multiple outcomes in the coin flipping example?

Example: You want to find the probability of observing 5 or fewer heads in 6 tosses. In this case, you are interested in finding the cumulative probability. The probability of getting 5 or fewer heads given 6 coin tosses is written \( P(x \le 5 \mid n = 6) \).

\[ P(x \le 5) = P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3) + P(x = 4) + P(x = 5) \]

\[ P(x \le 5) = 0.016 + 0.094 + 0.234 + 0.313 + 0.234 + 0.094 = 0.985 \]

Or using the complement rule:

\[ P(x \le 5) = 1 - P(x > 5) = 1 - P(x = 6) = 1 - 0.016 = 0.984 \]

(Note the slight rounding error.)

So what is a cumulative probability distribution? It is the sum of the individual event probabilities. In the binomial, the calculated probability is for a specific outcome, like getting 5 successes. If you want to know the probability of getting a range of successes, you must cumulate (add up) the individual probabilities. Examples:

What is the probability of getting exactly 5 heads? This is the basic binomial table figure.
What is the probability of getting 4 or fewer? Sum the probability of getting 0, 1, 2, 3 or 4.
What is the probability of getting 3 or more? Sum the probability of getting 3, 4, 5 or 6.
What is the probability of getting more than 3? Sum the probability of getting 4, 5 or 6.
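The cumulative questions above, together with the mean and standard deviation of the six-toss coin example, can be worked through in a short Python sketch. The helper names `binomial_pmf` and `binomial_cdf` are hypothetical; the probabilities are exact values rather than rounded table entries, so no rounding discrepancy appears.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(k, n, p):
    """Cumulative probability P(X <= k): add up the individual probabilities."""
    return sum(binomial_pmf(x, n, p) for x in range(k + 1))

n, p = 6, 0.5

at_most_5 = binomial_cdf(5, n, p)           # P(0) + P(1) + ... + P(5)
via_complement = 1 - binomial_pmf(6, n, p)  # complement rule: 1 - P(6)
at_most_4 = binomial_cdf(4, n, p)           # 4 or fewer
at_least_3 = 1 - binomial_cdf(2, n, p)      # 3 or more
more_than_3 = 1 - binomial_cdf(3, n, p)     # 4, 5 or 6

mean = n * p
variance = n * p * (1 - p)
std_dev = sqrt(variance)

print(f"P(x <= 5) = {at_most_5:.4f} (complement rule: {via_complement:.4f})")
print(f"P(x <= 4) = {at_most_4:.4f}")
print(f"P(x >= 3) = {at_least_3:.4f}")
print(f"P(x > 3)  = {more_than_3:.4f}")
print(f"mean = {mean}, variance = {variance}, std dev = {std_dev:.2f}")
```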

TESTS OF HYPOTHESIS

(A) Introduction

The purpose of this chapter is to discuss another aspect of statistical inference, hypothesis testing.

(B) What is a Hypothesis?

A hypothesis is a statement about the value of a population parameter developed for the purpose of testing. Some examples of hypotheses are:

The mean monthly income for financial analysts is $5,65.
The mean average return for an index fund is 14%.
Ninety percent of all federal income tax forms are filled out correctly.

(C) What is Hypothesis Testing?

Hypothesis Testing is a procedure based on evidence from samples and probability theory used to determine whether a hypothesis is a reasonable statement and should not be rejected, or is an unreasonable statement and should be rejected.

(D) Testing a Hypothesis

The five-step procedure for testing a hypothesis:

Step 1: Write the Null Hypothesis and the Alternate Hypothesis

The null hypothesis (\( H_0 \)) is a statement about the value of a population parameter. The alternative (or research) hypothesis (\( H_1 \)) is the statement that will be accepted if the sample data provide evidence repudiating the null hypothesis. You should remember that it is usually the alternative hypothesis that you are really trying to support. Why? Since you can never really prove anything with statistics, when you discredit the null hypothesis you are implying that the alternative is valid.

Example: A person is being tried in court. Based on the available evidence, the jury must make one of two decisions: guilty or not guilty. At the beginning of the trial the person is assumed to be not guilty. In statistical terms, the assumption that the person is not guilty is called the null hypothesis; the alternate hypothesis would be that the person is guilty.

Step 2: Select the Level of Significance

The level of significance is defined as the probability of rejecting the null hypothesis when it is actually true. The significance level is called the level of risk or \( \alpha \) and is usually set at 0.05 (5%) or 0.01 (1%).

A Type I Error is defined as rejecting the null hypothesis, \( H_0 \), when it is actually true. The probability of committing a Type I Error is the risk level or alpha risk.

A Type II Error is defined as accepting the null hypothesis when it is actually false. The chance of making a Type II error is called beta risk.

The following table summarizes the possible jury verdicts discussed above:

THE JURY VERDICT

The hypothesis revealed | Accepts \( H_0 \): Find not guilty | Rejects \( H_0 \): Find guilty
\( H_0 \) is true: The defendant is not guilty | Correct decision | Convict an innocent person. Type I error
\( H_0 \) is false: The defendant is guilty | Release a guilty person. Type II error | Correct decision

A test with one rejection region is called a one-tailed test and a test with two rejection regions is called a two-tailed test. In general, a test is one-tailed when the alternate hypothesis states a direction like greater than or less than. The two-tailed test is usually stated as not being equal to some value.

A one-tailed test:
\( H_0 \): MBA starting salaries are equal to or greater than (\( \ge \)) $75,000.
\( H_1 \): MBA starting salaries are less than (<) $75,000.

A two-tailed test:
\( H_0 \): The return on the portfolio equals (=) 1%.
\( H_1 \): The return on the portfolio does not equal (\( \ne \)) 1%.

Step 3: Calculate the Test Statistic

The test statistic, Z, tells you how far the sample mean, \( \bar{X} \), is away from the center of the distribution in standard deviation units. Z is used to determine whether or not to reject the null hypothesis. For large samples (30 or more observations), the test statistic is computed with the following formula:

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]

Step 4: Establish the Decision Rule

The decision rule states when to reject the null hypothesis. It indicates how many standard deviation units (Z) a sample mean, \( \bar{X} \), has to be away from the center of the distribution before it is considered not part of the distribution.

Step 5: Make the Decision

The final step is to decide whether or not to reject the null hypothesis. The decision will depend on the test of significance. If the calculated Z statistic lies beyond the decision rule (critical) Z statistic, that is, in a rejection region, then the sample mean, \( \bar{X} \), is significantly far away from the center of the distribution and thus isn't associated with the distribution. So, reject the null hypothesis and accept the alternative hypothesis when the calculated test Z statistic exceeds the critical decision rule Z statistic.

(E) Examples of the Five-step Procedure

Example 1

When the gizmo machine is working properly, the mean length of gizmos is 2.5 inches. However, from time to time the machine gets out of alignment and produces gizmos that are either too long or too short. When this happens, production is stopped and the machine is adjusted. To check the machine, the quality control department takes a gizmo sample each day. Today a random sample of 49 gizmos showed a mean length of 2.49 inches. The population standard deviation is .021 inches. Using a 5% significance level, should the machine be shut down and adjusted?

Let \( \mu \) be the mean length of all gizmos made by this machine and \( \bar{X} \) the corresponding mean for the sample. From the given information,

Step 1: State the Null and Alternative Hypothesis

\( H_0 \): \( \mu = 2.5 \) (The machine does not need an adjustment)
\( H_1 \): \( \mu \ne 2.5 \) (The machine needs an adjustment)

This is a two-tailed test.

Step 2: Set the Significance Level

You are willing to make a Type I error 5% of the time so the level of significance is .05. The \( \ne \) sign in the alternative hypothesis indicates that the test is two-tailed with two rejection regions, one in each tail of the normal distribution curve.
Because the total area of both rejection regions is .05 (the significance level), the area of the rejection region in each tail is .025.

Step 3: Calculate the Test Statistic

The value of \( \bar{X} \) from the sample is 2.49. Since \( \sigma \) is given as .021, we calculate the Z test statistic using \( \sigma \) as follows.

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{2.49 - 2.5}{.021 / \sqrt{49}} = \frac{-.01}{.003} = -3.33 \]

Step 4: State the Decision Rule

The calculated value of Z = -3.33 is called the computed or observed test statistic Z. This Z value indicates the location of the observed sample mean relative to the population mean (3.33 size adjusted standard deviations to the left of the mean). You must now compare the computed Z to the critical Z value found in the normal curve table in Chapter 7. Since this is a two-tailed test with a significance level of .05, you are expected to know that the table value is ±1.96 size adjusted standard deviations from the mean. This means that the null hypothesis should be accepted if the computed Z value lies between -1.96 and +1.96 and rejected if it lies outside of these critical values.

Step 5: Make the Decision

The decision is based on the location of the calculated test statistic, Z, computed in Step 3. This value, Z = -3.33, is less than the critical value, Z = -1.96, and it falls in the rejection region in the left tail. Hence, reject \( H_0 \) and conclude that, based on the sample information, the machine is out of adjustment.

Example 2

You want to know if small capitalization stock performance equals or exceeds the Market Index. You take a sample of 64 small cap stocks and get a mean return of 1%. During the same time period the population mean of the stocks in the index is 11% with a standard deviation of 4%. Your desired level of significance is .01.

Step 1: State the Null and Alternative Hypothesis

\( H_0 \): Small cap returns \( \ge \) 11%
\( H_1 \): Small cap returns < 11%

This is a one-tailed test.

Step 2: Set the Significance Level

[Table: Critical Z values for 1 Tail and 2 Tail tests; the only entries recovered are the 1% level and ±2.58]
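The two-tailed Z test of Example 1 (the gizmo machine) can be sketched in Python as follows. The function name `z_test_two_tailed` is a hypothetical helper, and the inputs 2.5, 2.49 and .021 are the values implied by the worked arithmetic of that example (-.01 / .003 = -3.33). The critical value is taken from the standard normal distribution in Python's `statistics` module instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha):
    """Return (z, critical_z, reject) for a two-tailed large-sample Z test."""
    # Step 3: test statistic, in units of the standard error sigma / sqrt(n).
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: decision rule, with alpha split evenly between the two tails.
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: reject H0 when z falls in either rejection region.
    return z, critical_z, abs(z) > critical_z

# Gizmo example: H0: mu = 2.5, sample mean 2.49, sigma = .021, n = 49, alpha = .05.
z, critical_z, reject = z_test_two_tailed(2.49, 2.5, 0.021, 49, 0.05)
print(f"Z = {z:.2f}, critical Z = +/-{critical_z:.2f}, reject H0: {reject}")
```

A one-tailed test would instead place the whole of alpha in a single tail and compare Z with the critical value on that side only.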


More information

Module Tag PSY_P2_M 7. PAPER No.2: QUANTITATIVE METHODS MODULE No.7: NORMAL DISTRIBUTION

Module Tag PSY_P2_M 7. PAPER No.2: QUANTITATIVE METHODS MODULE No.7: NORMAL DISTRIBUTION Subject Paper No and Title Module No and Title Paper No.2: QUANTITATIVE METHODS Module No.7: NORMAL DISTRIBUTION Module Tag PSY_P2_M 7 TABLE OF CONTENTS 1. Learning Outcomes 2. Introduction 3. Properties

More information

1/2 2. Mean & variance. Mean & standard deviation

1/2 2. Mean & variance. Mean & standard deviation Question # 1 of 10 ( Start time: 09:46:03 PM ) Total Marks: 1 The probability distribution of X is given below. x: 0 1 2 3 4 p(x): 0.73? 0.06 0.04 0.01 What is the value of missing probability? 0.54 0.16

More information

Discrete Probability Distributions

Discrete Probability Distributions Page 1 of 6 Discrete Probability Distributions In order to study inferential statistics, we need to combine the concepts from descriptive statistics and probability. This combination makes up the basics

More information

Consider the following examples: ex: let X = tossing a coin three times and counting the number of heads

Consider the following examples: ex: let X = tossing a coin three times and counting the number of heads Overview Both chapters and 6 deal with a similar concept probability distributions. The difference is that chapter concerns itself with discrete probability distribution while chapter 6 covers continuous

More information

Review: Population, sample, and sampling distributions

Review: Population, sample, and sampling distributions Review: Population, sample, and sampling distributions A population with mean µ and standard deviation σ For instance, µ = 0, σ = 1 0 1 Sample 1, N=30 Sample 2, N=30 Sample 100000000000 InterquartileRange

More information

1) 3 points Which of the following is NOT a measure of central tendency? a) Median b) Mode c) Mean d) Range

1) 3 points Which of the following is NOT a measure of central tendency? a) Median b) Mode c) Mean d) Range February 19, 2004 EXAM 1 : Page 1 All sections : Geaghan Read Carefully. Give an answer in the form of a number or numeric expression where possible. Show all calculations. Use a value of 0.05 for any

More information

Chapter 5 Basic Probability

Chapter 5 Basic Probability Chapter 5 Basic Probability Probability is determining the probability that a particular event will occur. Probability of occurrence = / T where = the number of ways in which a particular event occurs

More information

Population Mean GOALS. Characteristics of the Mean. EXAMPLE Population Mean. Parameter Versus Statistics. Describing Data: Numerical Measures

Population Mean GOALS. Characteristics of the Mean. EXAMPLE Population Mean. Parameter Versus Statistics. Describing Data: Numerical Measures GOALS Describing Data: Numerical Measures Chapter 3 McGraw-Hill/Irwin Copyright 010 by The McGraw-Hill Companies, Inc. All rights reserved. 3-1. Calculate the arithmetic mean, weighted mean, median, mode,

More information

STA 6166 Fall 2007 Web-based Course. Notes 10: Probability Models

STA 6166 Fall 2007 Web-based Course. Notes 10: Probability Models STA 6166 Fall 2007 Web-based Course 1 Notes 10: Probability Models We first saw the normal model as a useful model for the distribution of some quantitative variables. We ve also seen that if we make a

More information

IOP 201-Q (Industrial Psychological Research) Tutorial 5

IOP 201-Q (Industrial Psychological Research) Tutorial 5 IOP 201-Q (Industrial Psychological Research) Tutorial 5 TRUE/FALSE [1 point each] Indicate whether the sentence or statement is true or false. 1. To establish a cause-and-effect relation between two variables,

More information

Statistics 6 th Edition

Statistics 6 th Edition Statistics 6 th Edition Chapter 5 Discrete Probability Distributions Chap 5-1 Definitions Random Variables Random Variables Discrete Random Variable Continuous Random Variable Ch. 5 Ch. 6 Chap 5-2 Discrete

More information

Business Statistics. Chapter 5 Discrete Probability Distributions QMIS 120. Dr. Mohammad Zainal

Business Statistics. Chapter 5 Discrete Probability Distributions QMIS 120. Dr. Mohammad Zainal Department of Quantitative Methods & Information Systems Business Statistics Chapter 5 Discrete Probability Distributions QMIS 120 Dr. Mohammad Zainal Chapter Goals After completing this chapter, you should

More information

Counting Basics. Venn diagrams

Counting Basics. Venn diagrams Counting Basics Sets Ways of specifying sets Union and intersection Universal set and complements Empty set and disjoint sets Venn diagrams Counting Inclusion-exclusion Multiplication principle Addition

More information

Chapter 5 Student Lecture Notes 5-1. Department of Quantitative Methods & Information Systems. Business Statistics

Chapter 5 Student Lecture Notes 5-1. Department of Quantitative Methods & Information Systems. Business Statistics Chapter 5 Student Lecture Notes 5-1 Department of Quantitative Methods & Information Systems Business Statistics Chapter 5 Discrete Probability Distributions QMIS 120 Dr. Mohammad Zainal Chapter Goals

More information

5.2 Random Variables, Probability Histograms and Probability Distributions

5.2 Random Variables, Probability Histograms and Probability Distributions Chapter 5 5.2 Random Variables, Probability Histograms and Probability Distributions A random variable (r.v.) can be either continuous or discrete. It takes on the possible values of an experiment. It

More information

Chapter 6: Random Variables. Ch. 6-3: Binomial and Geometric Random Variables

Chapter 6: Random Variables. Ch. 6-3: Binomial and Geometric Random Variables Chapter : Random Variables Ch. -3: Binomial and Geometric Random Variables X 0 2 3 4 5 7 8 9 0 0 P(X) 3???????? 4 4 When the same chance process is repeated several times, we are often interested in whether

More information

Part 1 In which we meet the law of averages. The Law of Averages. The Expected Value & The Standard Error. Where Are We Going?

Part 1 In which we meet the law of averages. The Law of Averages. The Expected Value & The Standard Error. Where Are We Going? 1 The Law of Averages The Expected Value & The Standard Error Where Are We Going? Sums of random numbers The law of averages Box models for generating random numbers Sums of draws: the Expected Value Standard

More information

Midterm Test 1 (Sample) Student Name (PRINT):... Student Signature:... Use pencil, so that you can erase and rewrite if necessary.

Midterm Test 1 (Sample) Student Name (PRINT):... Student Signature:... Use pencil, so that you can erase and rewrite if necessary. MA 180/418 Midterm Test 1 (Sample) Student Name (PRINT):............................................. Student Signature:................................................... Use pencil, so that you can erase

More information

Chapter 3-Describing Data: Numerical Measures

Chapter 3-Describing Data: Numerical Measures Chapter 3-Describing Data: Numerical Measures Jie Zhang Account and Information Systems Department College of Business Administration The University of Texas at El Paso jzhang6@utep.edu Jie Zhang, QMB

More information

We use probability distributions to represent the distribution of a discrete random variable.

We use probability distributions to represent the distribution of a discrete random variable. Now we focus on discrete random variables. We will look at these in general, including calculating the mean and standard deviation. Then we will look more in depth at binomial random variables which are

More information

Econ 6900: Statistical Problems. Instructor: Yogesh Uppal

Econ 6900: Statistical Problems. Instructor: Yogesh Uppal Econ 6900: Statistical Problems Instructor: Yogesh Uppal Email: yuppal@ysu.edu Lecture Slides 4 Random Variables Probability Distributions Discrete Distributions Discrete Uniform Probability Distribution

More information

Hypothesis Tests: One Sample Mean Cal State Northridge Ψ320 Andrew Ainsworth PhD

Hypothesis Tests: One Sample Mean Cal State Northridge Ψ320 Andrew Ainsworth PhD Hypothesis Tests: One Sample Mean Cal State Northridge Ψ320 Andrew Ainsworth PhD MAJOR POINTS Sampling distribution of the mean revisited Testing hypotheses: sigma known An example Testing hypotheses:

More information

Statistics and Probability

Statistics and Probability Statistics and Probability Continuous RVs (Normal); Confidence Intervals Outline Continuous random variables Normal distribution CLT Point estimation Confidence intervals http://www.isrec.isb-sib.ch/~darlene/geneve/

More information

PROBABILITY DISTRIBUTIONS

PROBABILITY DISTRIBUTIONS CHAPTER 3 PROBABILITY DISTRIBUTIONS Page Contents 3.1 Introduction to Probability Distributions 51 3.2 The Normal Distribution 56 3.3 The Binomial Distribution 60 3.4 The Poisson Distribution 64 Exercise

More information

In a binomial experiment of n trials, where p = probability of success and q = probability of failure. mean variance standard deviation

In a binomial experiment of n trials, where p = probability of success and q = probability of failure. mean variance standard deviation Name In a binomial experiment of n trials, where p = probability of success and q = probability of failure mean variance standard deviation µ = n p σ = n p q σ = n p q Notation X ~ B(n, p) The probability

More information

Fundamentals of Statistics

Fundamentals of Statistics CHAPTER 4 Fundamentals of Statistics Expected Outcomes Know the difference between a variable and an attribute. Perform mathematical calculations to the correct number of significant figures. Construct

More information

Statistics, Measures of Central Tendency I

Statistics, Measures of Central Tendency I Statistics, Measures of Central Tendency I We are considering a random variable X with a probability distribution which has some parameters. We want to get an idea what these parameters are. We perfom

More information

LESSON 7 INTERVAL ESTIMATION SAMIE L.S. LY

LESSON 7 INTERVAL ESTIMATION SAMIE L.S. LY LESSON 7 INTERVAL ESTIMATION SAMIE L.S. LY 1 THIS WEEK S PLAN Part I: Theory + Practice ( Interval Estimation ) Part II: Theory + Practice ( Interval Estimation ) z-based Confidence Intervals for a Population

More information

Math 227 (Statistics) Chapter 6 Practice Test MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Math 227 (Statistics) Chapter 6 Practice Test MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Math 227 (Statistics) Chapter 6 Practice Test MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Using the following uniform density curve, answer the

More information

Probability Models.S2 Discrete Random Variables

Probability Models.S2 Discrete Random Variables Probability Models.S2 Discrete Random Variables Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard Results of an experiment involving uncertainty are described by one or more random

More information

FEEG6017 lecture: The normal distribution, estimation, confidence intervals. Markus Brede,

FEEG6017 lecture: The normal distribution, estimation, confidence intervals. Markus Brede, FEEG6017 lecture: The normal distribution, estimation, confidence intervals. Markus Brede, mb8@ecs.soton.ac.uk The normal distribution The normal distribution is the classic "bell curve". We've seen that

More information

Section Random Variables and Histograms

Section Random Variables and Histograms Section 3.1 - Random Variables and Histograms Definition: A random variable is a rule that assigns a number to each outcome of an experiment. Example 1: Suppose we toss a coin three times. Then we could

More information

3.1 Measures of Central Tendency

3.1 Measures of Central Tendency 3.1 Measures of Central Tendency n Summation Notation x i or x Sum observation on the variable that appears to the right of the summation symbol. Example 1 Suppose the variable x i is used to represent

More information

Lecture 2. Probability Distributions Theophanis Tsandilas

Lecture 2. Probability Distributions Theophanis Tsandilas Lecture 2 Probability Distributions Theophanis Tsandilas Comment on measures of dispersion Why do common measures of dispersion (variance and standard deviation) use sums of squares: nx (x i ˆµ) 2 i=1

More information

Probability & Sampling The Practice of Statistics 4e Mostly Chpts 5 7

Probability & Sampling The Practice of Statistics 4e Mostly Chpts 5 7 Probability & Sampling The Practice of Statistics 4e Mostly Chpts 5 7 Lew Davidson (Dr.D.) Mallard Creek High School Lewis.Davidson@cms.k12.nc.us 704-786-0470 Probability & Sampling The Practice of Statistics

More information

Statistics 431 Spring 2007 P. Shaman. Preliminaries

Statistics 431 Spring 2007 P. Shaman. Preliminaries Statistics 4 Spring 007 P. Shaman The Binomial Distribution Preliminaries A binomial experiment is defined by the following conditions: A sequence of n trials is conducted, with each trial having two possible

More information

FINAL REVIEW W/ANSWERS

FINAL REVIEW W/ANSWERS FINAL REVIEW W/ANSWERS ( 03/15/08 - Sharon Coates) Concepts to review before answering the questions: A population consists of the entire group of people or objects of interest to an investigator, while

More information

CHAPTER 4 DISCRETE PROBABILITY DISTRIBUTIONS

CHAPTER 4 DISCRETE PROBABILITY DISTRIBUTIONS CHAPTER 4 DISCRETE PROBABILITY DISTRIBUTIONS A random variable is the description of the outcome of an experiment in words. The verbal description of a random variable tells you how to find or calculate

More information

Binomial Random Variables. Binomial Random Variables

Binomial Random Variables. Binomial Random Variables Bernoulli Trials Definition A Bernoulli trial is a random experiment in which there are only two possible outcomes - success and failure. 1 Tossing a coin and considering heads as success and tails as

More information

ECON 214 Elements of Statistics for Economists 2016/2017

ECON 214 Elements of Statistics for Economists 2016/2017 ECON 214 Elements of Statistics for Economists 2016/2017 Topic The Normal Distribution Lecturer: Dr. Bernardin Senadza, Dept. of Economics bsenadza@ug.edu.gh College of Education School of Continuing and

More information

Probability is the tool used for anticipating what the distribution of data should look like under a given model.

Probability is the tool used for anticipating what the distribution of data should look like under a given model. AP Statistics NAME: Exam Review: Strand 3: Anticipating Patterns Date: Block: III. Anticipating Patterns: Exploring random phenomena using probability and simulation (20%-30%) Probability is the tool used

More information

Measures of Dispersion (Range, standard deviation, standard error) Introduction

Measures of Dispersion (Range, standard deviation, standard error) Introduction Measures of Dispersion (Range, standard deviation, standard error) Introduction We have already learnt that frequency distribution table gives a rough idea of the distribution of the variables in a sample

More information

The Two-Sample Independent Sample t Test

The Two-Sample Independent Sample t Test Department of Psychology and Human Development Vanderbilt University 1 Introduction 2 3 The General Formula The Equal-n Formula 4 5 6 Independence Normality Homogeneity of Variances 7 Non-Normality Unequal

More information

Chapter 9: Sampling Distributions

Chapter 9: Sampling Distributions Chapter 9: Sampling Distributions 9. Introduction This chapter connects the material in Chapters 4 through 8 (numerical descriptive statistics, sampling, and probability distributions, in particular) with

More information

19. CONFIDENCE INTERVALS FOR THE MEAN; KNOWN VARIANCE

19. CONFIDENCE INTERVALS FOR THE MEAN; KNOWN VARIANCE 19. CONFIDENCE INTERVALS FOR THE MEAN; KNOWN VARIANCE We assume here that the population variance σ 2 is known. This is an unrealistic assumption, but it allows us to give a simplified presentation which

More information

A useful modeling tricks.

A useful modeling tricks. .7 Joint models for more than two outcomes We saw that we could write joint models for a pair of variables by specifying the joint probabilities over all pairs of outcomes. In principal, we could do this

More information

Review of the Topics for Midterm I

Review of the Topics for Midterm I Review of the Topics for Midterm I STA 100 Lecture 9 I. Introduction The objective of statistics is to make inferences about a population based on information contained in a sample. A population is the

More information

Sampling Distributions For Counts and Proportions

Sampling Distributions For Counts and Proportions Sampling Distributions For Counts and Proportions IPS Chapter 5.1 2009 W. H. Freeman and Company Objectives (IPS Chapter 5.1) Sampling distributions for counts and proportions Binomial distributions for

More information

Chapter 6 Probability

Chapter 6 Probability Chapter 6 Probability Learning Objectives 1. Simulate simple experiments and compute empirical probabilities. 2. Compute both theoretical and empirical probabilities. 3. Apply the rules of probability

More information

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 4: May 2, Abstract

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 4: May 2, Abstract Basic Data Analysis Stephen Turnbull Business Administration and Public Policy Lecture 4: May 2, 2013 Abstract Introduct the normal distribution. Introduce basic notions of uncertainty, probability, events,

More information

Module 4: Probability

Module 4: Probability Module 4: Probability 1 / 22 Probability concepts in statistical inference Probability is a way of quantifying uncertainty associated with random events and is the basis for statistical inference. Inference

More information