Statistics (This summary is for chapters 18, 29 and section H of chapter 19)

Mean, Median, Mode

Mode: the most common value.
Median: the middle value (when the values are in order).
Mean = total ÷ how many: $\bar{x} = \frac{\sum x}{n} = \frac{\sum fx}{\sum f}$ [the second form is for use on a frequency table].

Note: the position of the median can be found using $\frac{n+1}{2}$, where n is how many numbers there are.

The mean uses all the data, so is normally the most useful average, but if there are extreme values then the median is far more useful.

Find the mode, median and mean of 1, 1, 3, 5, 6, 7, 11, 20.

Mode = 1
Median = 5.5 (halfway between 5 and 6)
Mean = $\frac{1 + 1 + 3 + 5 + 6 + 7 + 11 + 20}{8} = \frac{54}{8} = 6.75$

Find the median of the data given in the following table.

Number of arms    0   1   2   3   4   5   6   7
Frequency         3   7   9  10  11   5   3   1

The total frequency is 49, so the position of the median is $\frac{n+1}{2} = \frac{49+1}{2} = 25$. So we look for the 25th number. This is done by adding up the frequencies until we have gone past the 25th number.

Number of arms    0   1   2   3   4   5   6   7
Frequency         3   7   9  10  11   5   3   1
Running total     3  10  19  29

We went past the 25th item whilst in the 3 category. Therefore the median is 3.
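The two worked examples above can be cross-checked with a short Python sketch. This is only an illustration using the standard `statistics` module; the helper name `median_from_table` is made up for this note and is not part of any syllabus or calculator.

```python
from math import ceil, floor
from statistics import mean, median, mode

data = [1, 1, 3, 5, 6, 7, 11, 20]
print(mode(data), median(data), mean(data))  # 1 5.5 6.75


def median_from_table(values, freqs):
    """Median of data given as a frequency table (values in ascending order)."""
    n = sum(freqs)
    position = (n + 1) / 2  # ends in .5 when n is even

    def value_at(k):
        # Add up the frequencies until the running total reaches position k.
        running = 0
        for value, freq in zip(values, freqs):
            running += freq
            if running >= k:
                return value

    # Average the two middle items (they are the same item when n is odd).
    return (value_at(floor(position)) + value_at(ceil(position))) / 2


arms = [0, 1, 2, 3, 4, 5, 6, 7]
frequency = [3, 7, 9, 10, 11, 5, 3, 1]
print(median_from_table(arms, frequency))  # 3.0
```

The helper mirrors the hand method: it keeps a running total and stops in the category where the median position is passed.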

Calculate an estimate of the mean of the following.

Weight (w kg)    Frequency
0 < w ≤ 10       7
10 < w ≤ 20      9
20 < w ≤ 30      12
30 < w ≤ 40      8
40 < w ≤ 50      6
50 < w ≤ 60      3

To find the mean, we need to use the midpoint of each interval in order to calculate the total.

Weight (w kg)    Frequency (f)    Midpoint (x)    f × x
0 < w ≤ 10       7                5               35
10 < w ≤ 20      9                15              135
20 < w ≤ 30      12               25              300
30 < w ≤ 40      8                35              280
40 < w ≤ 50      6                45              270
50 < w ≤ 60      3                55              165

How many = Σf = 45
Total = Σfx = 1185

So the estimate of the mean = total ÷ how many: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1185}{45} = 26.3$

Finding the median and mean using the TI-84

Find the median and mean of 3, 6, 7, 9, 9, 11.

1.) Press the STAT button, then select the edit option.
2.) If necessary clear list 1 (L1), then enter the data into list 1.
3.) Press the STAT button again, then choose the calc option, then choose 1-Var Stats. Then press enter.
4.) The mean is given as $\bar{x}$ = 7.5.
5.) Scroll down to find the median (Med = 8).

Find the median and mean of the following data.

Number of cats    Frequency
0                 7
1                 9
2                 11
3                 8
4                 6
5                 2

1.) Press the STAT button, then select the edit option.
2.) If necessary clear list 1 and list 2 (L1 and L2), then enter the first column of the table into list 1, and the frequencies into list 2.
3.) Press the STAT button again, then choose the calc option, then choose 1-Var Stats, but do not press enter.
4.) The screen should now say 1-Var Stats. Now select L1 by pressing 2ND and 1, then press the comma button, then select L2 by pressing 2ND and 2. Press enter.
5.) The mean is given as $\bar{x}$ = 2.07 (the table gives Σfx = 89 and Σf = 43).
6.) Scroll down to find the median (Med = 2).

Measuring the spread of data

Range = largest − smallest
Interquartile Range = Upper Quartile − Lower Quartile
Standard Deviation = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$
[The second version of the formula is easier to use, but only the first is given in the IB formula book!!]

Note: the interquartile range may look straightforward, but it's not. How to find a quartile is not clearly defined. (Also, this is not the only definition of standard deviation.)

The standard deviation is the most useful measure of spread as it uses all the values. But if there are extreme values, then the interquartile range is more useful.

Find the range, interquartile range and standard deviation of 8, 9, 10, 11, 12, 13, 15.

The range is 15 − 8 = 7.

To find the interquartile range, we first find the quartiles:

8, 9, 10, 11, 12, 13, 15
Lower Quartile = 9, Median = 11, Upper Quartile = 13

So Interquartile range = 13 − 9 = 4.

To find the standard deviation, we first find the mean:

$\bar{x} = \frac{8 + 9 + 10 + 11 + 12 + 13 + 15}{7} = 11.143$

So: $\sum (x - \bar{x})^2 = (8 - 11.143)^2 + \dots + (15 - 11.143)^2 = 34.86$

Therefore: standard deviation = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{34.86}{7}} = 2.23$
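As a cross-check of the example above, here is a small Python sketch. It assumes one particular quartile convention (the median of each half of the ordered data, leaving out the middle value when the count is odd); other conventions exist and can give different answers. The function name `quartiles` is just for illustration.

```python
from statistics import median, pstdev


def quartiles(values):
    """Lower and upper quartile as the medians of the two halves of the data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # values below the median position
    upper_half = ordered[-half:]   # values above the median position
    return median(lower_half), median(upper_half)


data = [8, 9, 10, 11, 12, 13, 15]
data_range = max(data) - min(data)
lower_q, upper_q = quartiles(data)
iqr = upper_q - lower_q
sd = pstdev(data)  # population standard deviation: divides by n, as in the formula above

print(data_range, iqr, round(sd, 2))  # 7 4 2.23
```

`pstdev` is used rather than `stdev` because the formula in these notes divides by n, not n − 1.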

Find the standard deviation of the data presented in the following table.

Number of eggs    Frequency
0-4               2
5-9               6
10-14             9
15-19             3

We need lots of extra columns to work out the standard deviation. It can be helpful to use a slightly different version of the formula:

Standard deviation = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$

Number of eggs    Frequency (f)    Midpoint (x)    fx     (x − x̄)²    f(x − x̄)²
0-4               2                2               4      68.0625      136.125
5-9               6                7               42     10.5625      63.375
10-14             9                12              108    3.0625       27.5625
15-19             3                17              51     45.5625      136.6875

n = Σf = 20
Σfx = 205
Σf(x − x̄)² = 363.75

The mean ($\bar{x}$) is 205 ÷ 20 = 10.25, using the technique from the earlier grouped-data example.

So now we can work out the standard deviation:

Standard deviation = $\sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\frac{363.75}{20}} = 4.26$

(This is why we must be thankful for the statistical functions on calculators.)

Finding the standard deviation using the TI-84

Follow the steps outlined in the section "Finding the median and mean using the TI-84". The standard deviation is the value given as σx.
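The column-by-column method above can be written as a short Python sketch. It is an illustration only: the function name `grouped_mean_sd` is invented here, and like the hand method it treats every value in a class as sitting at the class midpoint, so the results are estimates.

```python
from math import sqrt


def grouped_mean_sd(midpoints, freqs):
    """Estimate the mean and standard deviation from class midpoints and frequencies."""
    n = sum(freqs)                                              # sum of f
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n     # sum of fx / sum of f
    variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
    return mean, sqrt(variance)


# Eggs example: classes 0-4, 5-9, 10-14, 15-19
egg_mean, egg_sd = grouped_mean_sd([2, 7, 12, 17], [2, 6, 9, 3])
print(round(egg_mean, 2), round(egg_sd, 2))  # 10.25 4.26

# Weights example from the earlier estimate-of-the-mean question
weight_mean, _ = grouped_mean_sd([5, 15, 25, 35, 45, 55], [7, 9, 12, 8, 6, 3])
print(round(weight_mean, 1))  # 26.3
```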

Statistical Graphs

The histogram shown below shows that, for example, eight chickens weighed between 15 kg and 20 kg.

[Histogram: "Weight of some very fat chickens". Vertical axis: Frequency (0 to 10). Horizontal axis: Weight (kg), from 10 to 35 in classes of width 5.]

Note: Histograms in IB are simpler than at IGCSE; all class widths are equal and you put frequency up the side (not frequency density).

The stem-and-leaf diagram shows you the original data.

 7 | 1 3 5 6
 8 | 0 2 2 4 6 8      (the 6 in this row means 8.6)
 9 | 5 5 7 9
10 | 1 7
11 | 0 3

Key: 7 | 1 means 7.1

This stem-and-leaf diagram shows you, for example, that 11.3 is the largest number.

A cumulative frequency diagram can be used to find the median, quartiles and percentiles of grouped data.

The table below shows the length of some frogs caught at Barra Honda National Park.

Length (l cm)    Frequency
0 < l ≤ 1        4
1 < l ≤ 2        9
2 < l ≤ 3        10
3 < l ≤ 4        8
4 < l ≤ 5        5
5 < l ≤ 6        2

To write down the cumulative frequency table we keep adding up the frequency column as a running total.

Length (l cm)    Cumulative Frequency
0 < l ≤ 1        4
0 < l ≤ 2        13
0 < l ≤ 3        23
0 < l ≤ 4        31
0 < l ≤ 5        36
0 < l ≤ 6        38

We can then plot these cumulative frequencies on a cumulative frequency diagram, plotting each cumulative frequency above the end of the interval.

[Cumulative frequency diagram: "Frog Length". Vertical axis: Cumulative Frequency (0 to 40). Horizontal axis: Length of frog (cm), 0 to 6. Dotted lines mark the median and the 80th percentile.]

Dotted lines have been added to the cumulative frequency graph to show how to find the median and the 80th percentile.

The horizontal line for the median is positioned at 19, as ½ of 38 (the total) is 19. Following the line along and down gives a median of approximately 2.62.

The horizontal line for the 80th percentile is positioned at 30.4, as 80% of 38 = 30.4. Following the line along and down gives an 80th percentile of approximately 3.85.
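Reading values off the graph can be imitated with a rough Python sketch. It assumes the plotted points are joined by straight lines, which is not the same as the smooth hand-drawn curve used above, so the answers differ slightly from 2.62 and 3.85. The function name `percentile_from_cumulative` and the `start` parameter are made up for this illustration.

```python
def percentile_from_cumulative(upper_bounds, cum_freqs, fraction, start=0.0):
    """Estimate a percentile by straight-line interpolation on cumulative frequencies.

    upper_bounds: upper end of each class; cum_freqs: running totals;
    fraction: e.g. 0.5 for the median; start: lower end of the first class.
    """
    target = fraction * cum_freqs[-1]   # height of the horizontal line
    prev_bound, prev_cum = start, 0
    for bound, cum in zip(upper_bounds, cum_freqs):
        if cum >= target:
            # The target lies in this class: move across in proportion.
            share = (target - prev_cum) / (cum - prev_cum)
            return prev_bound + share * (bound - prev_bound)
        prev_bound, prev_cum = bound, cum


bounds = [1, 2, 3, 4, 5, 6]
cumulative = [4, 13, 23, 31, 36, 38]

frog_median = percentile_from_cumulative(bounds, cumulative, 0.5)
frog_p80 = percentile_from_cumulative(bounds, cumulative, 0.8)
print(round(frog_median, 3), round(frog_p80, 3))  # 2.6 3.925
```

Both answers are estimates either way, because the original lengths inside each class are not known.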

Box plots are a way of showing the spread of the data in terms of quartiles.

Draw a box plot for the following data:
5, 9, 10, 11, 11, 13, 16, 20, 24, 30, 31, 31, 33, 40, 44, 47, 60

First we find the median and the quartiles:

Lower Quartile = 11 (halfway between the 4th and 5th values)
Median = 24 (the 9th value)
Upper Quartile = 36.5 (halfway between the 13th and 14th values)

Also note that the minimum value is 5 and the maximum value is 60.

[Box plot drawn above a scale from 0 to 60: whiskers from 5 to 60, box from 11 to 36.5, median line at 24.]

Skew

The term skew is used to describe the shape of data.

Shape            Relationship between the averages
Symmetrical      Mode = Median = Mean
Positive Skew    Mode < Median < Mean
Negative Skew    Mode > Median > Mean
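The five numbers needed for the box plot above can be checked with Python's `statistics.quantiles` (Python 3.8 or later). This is a sketch under one assumption: the `"exclusive"` method happens to reproduce the hand values for this data set, but quartile conventions vary, so other data or other tools may disagree.

```python
from statistics import quantiles

data = [5, 9, 10, 11, 11, 13, 16, 20, 24, 30, 31, 31, 33, 40, 44, 47, 60]

# n=4 splits the data into quarters and returns the three cut points.
lower_q, med, upper_q = quantiles(data, n=4, method="exclusive")

five_number_summary = (min(data), lower_q, med, upper_q, max(data))
print(five_number_summary)  # (5, 11.0, 24.0, 36.5, 60)
```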

Random Variables

A random variable represents, in number form, the possible outcomes that could occur from some random experiment. A random variable can be either discrete or continuous.

Discrete random variables can only have certain possible values (usually integers). For example: the number of dogs on one street; the number of music tracks on a laptop; etc.

Continuous random variables can have all possible values on some interval (so this includes decimals). Continuous random variables are usually measurements. For example: the heights of girls in a class; the area of leaves on a tree.

For any random variable there is a probability distribution associated with it; this gives the probability of each outcome (or set of outcomes).

Discrete Probability Distributions

The number of carrots that Cristina eats every day has the following probability distribution.

x       0      1      2      3      4    5
P(x)    0.05   0.1    0.18   0.07   y    y

Find the value of y, and hence find the probability that Cristina will eat 3 or 4 carrots.

The probabilities must add up to 1, therefore:

0.05 + 0.1 + 0.18 + 0.07 + y + y = 1
0.4 + 2y = 1
y = 0.3

Hence the probability that Cristina eats 3 or 4 carrots = 0.07 + 0.3 = 0.37.

A random variable X has the probability distribution function:

P(X = r) = k(2r − 1) for r = 1, 2, 3, 4, 5, 6
P(X = r) = 0 otherwise

Find the value of k and hence find the probability that X = 3.

We know that the only possible values that X can take are 1, 2, 3, 4, 5 and 6, so we substitute these into k(2r − 1).

r = 1:  k(2r − 1) = k(2 × 1 − 1) = 1k
r = 2:  k(2r − 1) = k(2 × 2 − 1) = 3k
r = 3:  k(2r − 1) = k(2 × 3 − 1) = 5k
r = 4:  k(2r − 1) = k(2 × 4 − 1) = 7k
r = 5:  k(2r − 1) = k(2 × 5 − 1) = 9k
r = 6:  k(2r − 1) = k(2 × 6 − 1) = 11k

But all these probabilities must add up to 1, so:

1k + 3k + 5k + 7k + 9k + 11k = 1
36k = 1
k = 1/36

Substituting this value for k back into the probability distribution function gives us the full probability distribution for each value that X can take.

r           1      2      3      4      5      6
P(X = r)    1/36   3/36   5/36   7/36   9/36   11/36

So the probability that X = 3 is 5/36.

Expectation of a Discrete Probability Distribution

The expected value (mean) of a discrete probability distribution is given by:

$E(X) = \mu = \sum_x x \, P(X = x)$

The formula is saying to multiply each possible value of X by its probability, then add the results together.

(Note: The textbook also covers the standard deviation and variance of a probability distribution of a discrete random variable, but this is not required by the SL syllabus.)

The table below shows the probability distribution for X, the number of times Andrea annoys her teacher in one lesson.

x           0      1      2      3      4      5      6      7      8      9
P(X = x)    0.01   0.01   0.02   0.04   0.10   0.11   0.15   0.16   0.17   0.23

Find the value of E(X).

It can help to write this out as a table.

x             0      1      2      3      4      5      6      7      8      9
P(X = x)      0.01   0.01   0.02   0.04   0.10   0.11   0.15   0.16   0.17   0.23
x P(X = x)    0      0.01   0.04   0.12   0.40   0.55   0.90   1.12   1.36   2.07

Adding up the bottom row gives the mean:

$E(X) = \mu = \sum_x x \, P(X = x) = 0 + 0.01 + 0.04 + \dots + 1.36 + 2.07 = 6.57$

So, on average, you would expect Andrea to annoy her lovely mathematics teacher 6.57 times per lesson!

The Binomial Distribution

Conditions for the Binomial Distribution:
A fixed number of independent trials;
Each trial has only two possible results ("success" and "failure");
The probability of success in each trial is fixed.

Notation:
n = number of trials
p = probability of success
q = probability of failure

If X is a random variable with a binomial distribution with n trials and probability of success p, then we write:

X ~ B(n, p)

The probability of r successes is found by the formula:

$P(X = r) = {}^{n}C_{r} \, p^{r} q^{n-r}$

which can also be written as:

$P(X = r) = \binom{n}{r} p^{r} q^{n-r}$

The mean of a binomial distribution is given by µ = E(X) = np.
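The discrete-distribution results above, and the binomial formula, can be checked with a short Python sketch. The function names `expectation`, `binom_pmf` and `binom_cdf` are invented for this illustration, and n = 10, p = 0.3 are just illustrative values for the binomial part.

```python
from fractions import Fraction
from math import comb

# P(X = r) = k(2r - 1) for r = 1..6: the probabilities must add up to 1.
weights = [2 * r - 1 for r in range(1, 7)]   # 1, 3, 5, 7, 9, 11
k = Fraction(1, sum(weights))
print(k, k * weights[2])  # 1/36 5/36


def expectation(values, probs):
    """E(X): multiply each value by its probability, then add."""
    return sum(x * p for x, p in zip(values, probs))


andrea = [0.01, 0.01, 0.02, 0.04, 0.10, 0.11, 0.15, 0.16, 0.17, 0.23]
print(round(expectation(range(10), andrea), 2))  # 6.57


def binom_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p), using nCr * p^r * q^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def binom_cdf(n, p, r):
    """P(X <= r): add up P(X = 0), ..., P(X = r)."""
    return sum(binom_pmf(n, p, i) for i in range(r + 1))


n, p = 10, 0.3                                # illustrative values only
print(round(binom_pmf(n, p, 2), 3))           # 0.233
at_least_3 = 1 - binom_cdf(n, p, 2)           # P(X >= 3) = 1 - P(X <= 2)
binomial_mean = expectation(range(n + 1), [binom_pmf(n, p, r) for r in range(n + 1)])
```

Here `binomial_mean` comes out equal to np (up to floating-point rounding), which agrees with the formula for the mean.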

José likes to smile at cats. Every time he smiles at a cat, the probability it will scratch him is 0.3. On a given day José smiles at 10 cats.
(i) What is the probability that José will be scratched by 2 cats?
(ii) Find the expected number of cats that scratch José.

This is a binomial distribution. Let X = number of cats that scratch José. We know that:

p = probability that José is scratched = 0.3
q = probability that José is not scratched = 0.7
n = 10

So X ~ B(10, 0.3).

(i) $P(X = 2) = {}^{10}C_{2} \times 0.3^{2} \times 0.7^{8} = 0.233$ to 3 significant figures.
(ii) E(X) = np = 10 × 0.3 = 3

Finding binomial probabilities using the TI-84

Let X ~ B(20, 0.6). Find:
(i) P(X = 13)
(ii) P(X ≤ 10)
(iii) P(X < 10)
(iv) P(X ≥ 7)
(v) P(X > 7)
(vi) P(5 ≤ X ≤ 14)

To do all these calculations we only need two functions on the TI-84, both of which are found by pressing 2ND then VARS. The two functions are binompdf and binomcdf.

binompdf: for finding the probability, on a binomial distribution, of the outcome being equal to a given value.
binomcdf: for finding the probability, on a binomial distribution, of the outcome being less than or equal to (≤) a given value.

(i) This is the only part of this example where we use binompdf.

P(X = 13) = binompdf(20, 0.6, 13) = 0.166 to 3 s.f.
(the three numbers typed in are n, p and r, in that order)

Note: If you use the Catalog Help on your TI-84 Plus, it will remind you in which order to type in these numbers when using binompdf or binomcdf.

(ii) We now use binomcdf.

P(X ≤ 10) = binomcdf(20, 0.6, 10) = 0.245 to 3 s.f.
(the three numbers typed in are n, p and r, in that order)

(iii) Note that, as the binomial is discrete, P(X < 10) = P(X ≤ 9), so we use:

P(X < 10) = P(X ≤ 9) = binomcdf(20, 0.6, 9) = 0.128

(iv) P(X ≥ 7) = 1 − P(X ≤ 6) = 1 − binomcdf(20, 0.6, 6) = 1 − 0.006 = 0.994

(v) P(X > 7) = 1 − P(X ≤ 7) = 1 − binomcdf(20, 0.6, 7) = 1 − 0.021 = 0.979

(vi) P(5 ≤ X ≤ 14) = P(X ≤ 14) − P(X ≤ 4) = binomcdf(20, 0.6, 14) − binomcdf(20, 0.6, 4) = 0.8744 − 0.000317 = 0.8741 to 4 s.f.

The Normal Distribution

The normal distribution is a probability distribution for continuous random variables. It is used for many naturally occurring phenomena, including the height of trees, foot length and pregnancy time. It can also often be used for non-natural phenomena, such as test marks.

If X is normally distributed with mean µ and standard deviation σ then we write X ~ N(µ, σ²).

The standard normal distribution has a mean of 0 and a standard deviation of 1 (i.e. Z ~ N(0, 1²)).

The transformation $z = \frac{x - \mu}{\sigma}$ is used to transform any normal distribution into the standard normal distribution.

Note: As the normal distribution is continuous, this means that P(X < a) = P(X ≤ a), where X is a normally distributed random variable.

The normal distribution graph (also known as the bell curve) is shown below.

[Bell curve with the horizontal axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ and µ + 3σ.]

68% of all measurements lie between µ − σ and µ + σ.
95% of all measurements lie between µ − 2σ and µ + 2σ.
99.7% of all measurements lie between µ − 3σ and µ + 3σ.

A certain type of sheep has a mean weight of 40 kg and a standard deviation of 7 kg. Use the normal distribution tables to find the probability that a randomly chosen sheep has:
(i) a weight less than 48.9 kg
(ii) a weight less than 30 kg

If, in a fit of madness, we decide to use the tables, then the values 48.9 and 30 need to be converted using the formula $z = \frac{x - \mu}{\sigma}$, where µ = 40 and σ = 7.

(i) $P(X < 48.9) = P\left(Z < \frac{48.9 - 40}{7}\right) = P(Z < 1.27) = 0.8980$

(The value of 0.8980 is taken from the table on page 7 of the SL formula book.)

(ii) $P(X < 30) = P\left(Z < \frac{30 - 40}{7}\right) = P(Z < -1.43)$

However, the value of P(Z < −1.43) is not on the table, but as the normal distribution is symmetrical we can use the value for 1.43 and subtract it from 1:

P(Z < −1.43) = 1 − P(Z < 1.43) = 1 − 0.9236 = 0.0764

(0.9236 is from the table on page 7 of the formula booklet.)

Normal Distribution Probabilities on the TI-84

1. A machine produces screws. The lengths of these screws have a normal distribution with a mean of 19.8 cm and a standard deviation of 0.3 cm. A screw is selected at random from the machine. Find the probability that the screw is:
(i) between 19.7 cm and 20 cm;
(ii) less than 20.1 cm;
(iii) more than 19.3 cm.

To do all three of these calculations we use normalcdf, which is found on the DISTR menu (press 2ND then VARS).

Note: Never use the normalpdf option; it is utterly useless.

The structure of normalcdf is as follows:

normalcdf(lower-bound, upper-bound, mean, standard deviation)

(i) P(19.7 ≤ X ≤ 20). So here the lower-bound is 19.7, the upper-bound is 20, the mean is 19.8 and the standard deviation is 0.3. So we type:

normalcdf(19.7, 20, 19.8, 0.3) = 0.378 to 3 s.f.

(ii) We have no lower-bound for this question, so we have to use a large negative value; usually −1000 will suffice.

P(X < 20.1) = normalcdf(−1000, 20.1, 19.8, 0.3) = 0.841

(iii) On this question we have no upper-bound, so we use a large positive value; usually 1000 will suffice.

P(X > 19.3) = normalcdf(19.3, 1000, 19.8, 0.3) = 0.9522
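For anyone without the calculator to hand, the three screw probabilities can be reproduced with Python's standard-library `statistics.NormalDist` (Python 3.8 or later). This is only a cross-check sketch; it works directly with the cumulative probability P(X ≤ x), so no artificial bounds such as ±1000 are needed.

```python
from statistics import NormalDist

# Screw lengths: mean 19.8 cm, standard deviation 0.3 cm
screws = NormalDist(mu=19.8, sigma=0.3)

# (i) P(19.7 <= X <= 20) = P(X <= 20) - P(X <= 19.7)
p_between = screws.cdf(20) - screws.cdf(19.7)

# (ii) P(X < 20.1)
p_less = screws.cdf(20.1)

# (iii) P(X > 19.3) = 1 - P(X <= 19.3)
p_more = 1 - screws.cdf(19.3)

print(round(p_between, 3), round(p_less, 3), round(p_more, 4))
```

The printed values should agree with the normalcdf answers above to the number of figures quoted.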

2. A population of adult alligators has a mean length of 3.5 m, with a standard deviation of 0.4 m. What length are 95% of the alligators less than or equal to?

Here we use the invNorm command in the DISTR menu. This function starts with a probability and works backward to find the value of the random variable which gives that probability, i.e. for which k does P(X ≤ k) = 0.95 when X ~ N(3.5, 0.4²)?

Type invNorm(0.95, 3.5, 0.4) into the calculator (the three numbers are the probability, the mean and the standard deviation, in that order). This gives us 4.16 as an answer.

So 95% of alligators are less than or equal to 4.16 m in length.
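The same inverse calculation can be sketched in Python with `statistics.NormalDist.inv_cdf`, which plays the role of the calculator's inverse-normal command here. This is an illustration only, not part of the calculator method.

```python
from statistics import NormalDist

# Alligator lengths: mean 3.5 m, standard deviation 0.4 m
alligators = NormalDist(mu=3.5, sigma=0.4)

# Find k such that P(X <= k) = 0.95
k = alligators.inv_cdf(0.95)
print(round(k, 2))  # 4.16

# Going forwards again should return the probability we started with.
check = alligators.cdf(k)
```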