Statistics (This summary is for chapters 17, 28, 29 and section G of chapter 19)

Mean, Median, Mode

Mode: the most common value
Median: the middle value (when the values are in order)
Mean: $\bar{x} = \dfrac{\text{total}}{\text{how many}} = \dfrac{\sum x}{n} = \dfrac{\sum fx}{\sum f}$ [the last form is for use on a frequency table]

Note: the position of the median can be found using $\dfrac{n+1}{2}$, where $n$ is how many numbers there are.

The mean uses all the data, so it is normally the most useful average, but if there are extreme values then the median is far more useful.

Find the mode, median and mean of 1, 1, 3, 5, 6, 7, 11, 20

Mode = 1
Median = 5.5 (halfway between 5 and 6)
Mean = $\dfrac{1+1+3+5+6+7+11+20}{8} = \dfrac{54}{8} = 6.75$

Find the median of the data given in the following table.

Number of arms   0   1   2   3   4   5   6   7
Frequency        3   7   9  10  11   5   3   1

The total frequency is 49, so the position of the median is $\dfrac{n+1}{2} = \dfrac{49+1}{2} = 25$. We look for the 25th number by adding up the frequencies until we have reached or gone past the 25th number.

Number of arms         0   1   2   3   4   5   6   7
Frequency              3   7   9  10  11   5   3   1
Cumulative frequency   3  10  19  29

We went past the 25th item whilst in the 3 category. Therefore the median is 3.
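As a cross-check on the two worked examples above, here is a minimal Python sketch using the standard `statistics` module. The variable names are illustrative only, and expanding the frequency table into a full list is just one convenient way to mirror "counting along to the 25th item"; it is not the only approach.

```python
from statistics import mean, median, mode

# Raw data from the first worked example.
data = [1, 1, 3, 5, 6, 7, 11, 20]
data_mode = mode(data)      # most common value
data_median = median(data)  # sorts first, then averages the two middle values
data_mean = mean(data)      # total divided by how many

# Frequency table from the second example: number of arms and frequency.
arms = [0, 1, 2, 3, 4, 5, 6, 7]
frequencies = [3, 7, 9, 10, 11, 5, 3, 1]

# Expand the table into a list (each value repeated f times), then take
# the median of that list; with 49 items this picks out the 25th item.
expanded = [value for value, f in zip(arms, frequencies) for _ in range(f)]
table_median = median(expanded)

print(data_mode, data_median, data_mean)
print(table_median)
```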

Calculate an estimate of the mean of the following.

Weight (w kg)   Frequency
0 < w ≤ 10      7
10 < w ≤ 20     9
20 < w ≤ 30     12
30 < w ≤ 40     8
40 < w ≤ 50     6
50 < w ≤ 60     3

To find the mean, we need to use the midpoint of each interval in order to calculate the total.

Weight (w kg)   Frequency (f)   Midpoint (x)   fx
0 < w ≤ 10      7               5              35
10 < w ≤ 20     9               15             135
20 < w ≤ 30     12              25             300
30 < w ≤ 40     8               35             280
40 < w ≤ 50     6               45             270
50 < w ≤ 60     3               55             165

How many = $\sum f = 45$
Total = $\sum fx = 1185$

So the estimate of the mean is $\bar{x} = \dfrac{\text{total}}{\text{how many}} = \dfrac{\sum fx}{\sum f} = \dfrac{1185}{45} = 26.3$ (to 3 s.f.)

Finding the median and mean using the TI-84

Find the median and mean of 3, 6, 7, 9, 9, 11

1.) Press the STAT button, then select the Edit option.
2.) If necessary clear list 1 (L1), then enter the data into list 1.
3.) Press the STAT button again, then choose the CALC option, then choose 1-Var Stats. Then press ENTER.
4.) The mean is given as $\bar{x} = 7.5$.
5.) Scroll down to find the median (Med = 8).

Find the median and mean of the following data.

Number of cats   Frequency
0                7
1                9
2                11
3                8
4                6
5                2

1.) Press the STAT button, then select the Edit option.
2.) If necessary clear list 1 and list 2 (L1 and L2), then enter the first column of the table into list 1 and the frequencies into list 2.
3.) Press the STAT button again, then choose the CALC option, then choose 1-Var Stats, but do not press ENTER.
4.) The screen should now say 1-Var Stats. Now select L1 by pressing 2ND and 1, then press the comma button, then select L2 by pressing 2ND and 2. Press ENTER.
5.) The mean is given as $\bar{x} = 2.07$ (to 3 s.f.).
6.) Scroll down to find the median (Med = 2).

Measuring the spread of data

Range = largest − smallest
Interquartile Range = Upper Quartile − Lower Quartile
Standard Deviation = $\sqrt{\dfrac{\sum (x-\bar{x})^2}{n}} = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$
[The second version of the formula is easier to use, but only the first is given in the IB formula book!!]

Note: the interquartile range may look straightforward, but it's not. How to find a quartile is not clearly defined. (Also, this is not the only definition of standard deviation.)

The standard deviation is the most useful measure of spread as it uses all the values. But if there are extreme values, then the interquartile range is more useful.

Find the range, interquartile range and standard deviation of 8, 9, 10, 11, 12, 13, 15.

The range is 15 − 8 = 7.

To find the interquartile range, we first find the quartiles:

8   9   10   11   12   13   15
Lower Quartile = 9, Median = 11, Upper Quartile = 13

So Interquartile range = 13 − 9 = 4

To find the standard deviation, we first find the mean:

$\bar{x} = \dfrac{8+9+10+11+12+13+15}{7} = 11.142$

So: $\sum (x-\bar{x})^2 = (8-11.142)^2 + \dots + (15-11.142)^2 = 34.86$

Therefore: standard deviation = $\sqrt{\dfrac{\sum (x-\bar{x})^2}{n}} = \sqrt{\dfrac{34.86}{7}} = 2.23$
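The spread calculations in the example above can be checked with a short Python sketch. Because the notes point out that quartiles are not uniquely defined, the helper below makes one explicit assumption: each quartile is the median of one half of the ordered data, leaving out the middle value when there is an odd number of values. The function name `quartiles_by_halves` is illustrative only.

```python
from statistics import mean, median, pstdev

data = sorted([8, 9, 10, 11, 12, 13, 15])
data_range = max(data) - min(data)

def quartiles_by_halves(sorted_values):
    """Quartiles as medians of the lower and upper halves.

    Assumption: when n is odd, the middle value belongs to neither half.
    Other conventions exist and can give different answers.
    """
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]
    upper = sorted_values[half + 1:] if n % 2 else sorted_values[half:]
    return median(lower), median(upper)

lower_q, upper_q = quartiles_by_halves(data)
iqr = upper_q - lower_q

# Both versions of the standard deviation formula from the notes.
n = len(data)
x_bar = mean(data)
sd_first = (sum((x - x_bar) ** 2 for x in data) / n) ** 0.5
sd_second = (sum(x * x for x in data) / n - x_bar ** 2) ** 0.5

print(data_range, lower_q, upper_q, iqr)
print(round(sd_first, 2), round(sd_second, 2), round(pstdev(data), 2))
```

`pstdev` is the population standard deviation (dividing by n), which is the definition used in these notes.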

Find the standard deviation of the data presented in the following table.

Number of eggs   Frequency
0-4              2
5-9              6
10-14            9
15-19            3

We need lots of extra columns to work out the standard deviation. It can be helpful to use a slightly different version of the formula:

Standard deviation = $\sqrt{\dfrac{\sum (x-\bar{x})^2}{n}} = \sqrt{\dfrac{\sum f(x-\bar{x})^2}{\sum f}}$

Number of eggs   Frequency (f)   Midpoint (x)   fx    $(x-\bar{x})^2$   $f(x-\bar{x})^2$
0-4              2               2              4     68.0625           136.125
5-9              6               7              42    10.5625           63.375
10-14            9               12             108   3.0625            27.5625
15-19            3               17             51    45.5625           136.6875

$n = \sum f = 20$,  $\sum fx = 205$,  $\sum f(x-\bar{x})^2 = 363.75$

The mean ($\bar{x}$) is $205 \div 20 = 10.25$, using the midpoint technique from the earlier grouped-data example.

So now we can work out the standard deviation:

Standard deviation = $\sqrt{\dfrac{\sum f(x-\bar{x})^2}{\sum f}} = \sqrt{\dfrac{363.75}{20}} = 4.26$

(This is why we must be thankful for the statistical functions on calculators.)

Finding the standard deviation using the TI-84

Follow the steps outlined in the section "Finding the median and mean using the TI-84". The standard deviation is the value labelled $\sigma_x$.

Sampling from a population

If $\bar{x}$ is the mean of a sample, then $\bar{x}$ is an unbiased estimate of the mean of the population ($\mu$).

If $s_n^2$ is the variance of a sample, then $s_{n-1}^2 = \dfrac{n}{n-1}\, s_n^2$ is an unbiased estimate of the variance of the population ($\sigma^2$).
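The grouped-data calculation for the eggs table, and the unbiased variance estimate, can be sketched in Python as follows. Treating the eggs table as a sample from a larger population is an assumption made here purely to illustrate the $\frac{n}{n-1}$ correction; the notes do not say this. Variable names are illustrative.

```python
from math import sqrt

# Class midpoints and frequencies for the classes 0-4, 5-9, 10-14, 15-19.
midpoints = [2, 7, 12, 17]
frequencies = [2, 6, 9, 3]

n = sum(frequencies)
mean_estimate = sum(f * x for x, f in zip(midpoints, frequencies)) / n

# Sum of f * (x - mean)^2, the last column of the working table.
sum_sq_dev = sum(f * (x - mean_estimate) ** 2
                 for x, f in zip(midpoints, frequencies))
sd_estimate = sqrt(sum_sq_dev / n)

# Assumption: the table is a sample. s_n^2 divides by n; multiplying by
# n / (n - 1) gives the unbiased estimate of the population variance.
sample_variance = sum_sq_dev / n
unbiased_variance = n / (n - 1) * sample_variance

print(n, mean_estimate, sum_sq_dev)
print(round(sd_estimate, 2), round(unbiased_variance, 2))
```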

Statistical Graphs

The histogram described below shows that, for example, eight chickens weighed between 15 kg and 20 kg.

[Histogram: "Weight of some very fat chickens". Horizontal axis: Weight (kg), from 10 to 35 in classes of width 5. Vertical axis: Frequency, from 0 to 10.]

Note: Histograms in IB are simpler than at IGCSE; all class widths are equal and you put frequency up the side (not frequency density).

A stem-and-leaf diagram shows you the original data.

 7 | 1 3 5 6
 8 | 2 2 4 6 8
 9 | 5 5 7 9
10 | 1 7
11 | 3

Key: 7 | 1 means 7.1

For example, the 6 in the 8 row means 8.6. This stem-and-leaf diagram also shows you that 11.3 is the largest number.

A cumulative frequency diagram can be used to find the median, quartiles and percentiles of grouped data.

The table below shows the lengths of some frogs caught at Barra Honda National Park.

Length (l cm)   Frequency
0 < l ≤ 1       4
1 < l ≤ 2       9
2 < l ≤ 3       10
3 < l ≤ 4       8
4 < l ≤ 5       5
5 < l ≤ 6       2

To write down the cumulative frequency table we keep adding up the frequency column as a running total.

Length (l cm)   Cumulative Frequency
0 < l ≤ 1       4
0 < l ≤ 2       13
0 < l ≤ 3       23
0 < l ≤ 4       31
0 < l ≤ 5       36
0 < l ≤ 6       38

We can then plot these cumulative frequencies on a cumulative frequency diagram, plotting each cumulative frequency above the end of its interval.

[Cumulative frequency diagram: "Frog Length". Horizontal axis: Length of frog (cm), from 0 to 6. Vertical axis: Cumulative Frequency, from 0 to 40. Dotted lines mark the median and the 80th percentile.]

Dotted lines have been added to the cumulative frequency graph to show how to find the median and the 80th percentile.

The horizontal line for the median is positioned at 19, as ½ of 38 (the total) is 19. Following the line along and down gives a median of approximately 2.62.

The horizontal line for the 80th percentile is positioned at 30.4, as 80% of 38 = 30.4. Following the line along and down gives an 80th percentile of approximately 3.85.
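The running total and the "along and down" reading can be imitated in Python. This is only a sketch under one stated assumption: the plotted points are joined with straight lines, starting from (0, 0). A hand-drawn smooth curve, as used in the notes, will give slightly different readings (about 2.62 and 3.85 there, against 2.6 and 3.925 from straight lines). The helper name `value_at` is illustrative.

```python
from itertools import accumulate

upper_bounds = [1, 2, 3, 4, 5, 6]   # end of each length interval (cm)
frequencies = [4, 9, 10, 8, 5, 2]
cumulative = list(accumulate(frequencies))  # running total of frequencies

def value_at(target, bounds, cum_freq, start=0):
    """Length at which the cumulative frequency reaches `target`.

    Assumption: straight lines join the plotted points, beginning at
    (start, 0), and every class has a non-zero frequency.
    """
    prev_bound, prev_cum = start, 0
    for bound, cum in zip(bounds, cum_freq):
        if target <= cum:
            fraction = (target - prev_cum) / (cum - prev_cum)
            return prev_bound + fraction * (bound - prev_bound)
        prev_bound, prev_cum = bound, cum
    raise ValueError("target exceeds the total frequency")

total = cumulative[-1]
median_estimate = value_at(0.5 * total, upper_bounds, cumulative)
p80_estimate = value_at(0.8 * total, upper_bounds, cumulative)

print(cumulative)
print(median_estimate, p80_estimate)
```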

Box plots are a way of showing the spread of the data in terms of quartiles.

Draw a box plot for the following data:
5, 9, 10, 11, 11, 13, 16, 20, 24, 30, 31, 31, 33, 40, 44, 47, 60

First we find the median and the quartiles:

Lower Quartile = 11
Median = 24
Upper Quartile = 36.5

Also note that the minimum value is 5 and the maximum value is 60.

[Box plot drawn above a scale running from 0 to 60.]

Skew

The term skew is used to describe the shape of data.

Symmetrical:     Mode = Median = Mean
Positive Skew:   Mode < Median < Mean
Negative Skew:   Mode > Median > Mean
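The five values needed for the box plot example above (minimum, lower quartile, median, upper quartile, maximum) can be collected in a few lines of Python. As a stated assumption, the quartiles are taken as the medians of the lower and upper halves of the ordered data, excluding the overall median when the number of values is odd; this convention reproduces the values in the notes.

```python
from statistics import median

data = sorted([5, 9, 10, 11, 11, 13, 16, 20, 24,
               30, 31, 31, 33, 40, 44, 47, 60])

half = len(data) // 2
lower_half = data[:half]
# Skip the middle value when the number of values is odd.
upper_half = data[half + 1:] if len(data) % 2 else data[half:]

five_number_summary = (
    min(data),
    median(lower_half),
    median(data),
    median(upper_half),
    max(data),
)
print(five_number_summary)
```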

Random Variables

A random variable represents, in number form, the possible outcomes that could occur from some random experiment. A random variable can be either discrete or continuous.

Discrete random variables can only have certain possible values (usually integers). For example: the number of dogs on one street; the number of music tracks on a laptop.

Continuous random variables can take all possible values on some interval (so this includes decimals). Continuous random variables are usually measurements. For example: the heights of girls in a class; the area of leaves on a tree.

Every random variable has a probability distribution associated with it; this gives the probability of each outcome (or set of outcomes).

Discrete Probability Distributions

The number of carrots that Cristina eats every day has the following probability distribution.

x      0      1     2      3      4   5
P(x)   0.05   0.1   0.18   0.07   y   y

Find the value of y, and hence find the probability that Cristina will eat 3 or 4 carrots.

The probabilities must add up to 1, therefore:
0.05 + 0.1 + 0.18 + 0.07 + y + y = 1
0.4 + 2y = 1
y = 0.3

Hence the probability that Cristina eats 3 or 4 carrots = 0.07 + 0.3 = 0.37

A random variable X has the probability distribution function:

$P(X = r) = \begin{cases} k(2r-1) & r = 1, 2, 3, 4, 5, 6 \\ 0 & \text{otherwise} \end{cases}$

Find the value of k and hence find the probability that X = 3.

We know that the only possible values that X can take are 1, 2, 3, 4, 5 and 6, so we substitute these into $k(2r-1)$.

r = 1:  $k(2r-1) = k(2 \times 1 - 1) = 1k$
r = 2:  $k(2r-1) = k(2 \times 2 - 1) = 3k$
r = 3:  $k(2r-1) = k(2 \times 3 - 1) = 5k$
r = 4:  $k(2r-1) = k(2 \times 4 - 1) = 7k$
r = 5:  $k(2r-1) = k(2 \times 5 - 1) = 9k$
r = 6:  $k(2r-1) = k(2 \times 6 - 1) = 11k$

But all these probabilities must add up to 1, so
1k + 3k + 5k + 7k + 9k + 11k = 1
36k = 1
$k = \dfrac{1}{36}$

Substituting this value for k back into the probability distribution function gives us the full probability distribution for each value that X can take.

r          1      2      3      4      5      6
P(X = r)   1/36   3/36   5/36   7/36   9/36   11/36

So the probability that X = 3 is $\dfrac{5}{36}$.

Expectation and Variance of a Discrete Probability Distribution

The expected value (mean) of a discrete probability distribution is given by:

$E(X) = \mu = \sum_x x\,P(X = x)$

The formula is saying to multiply each possible value of X by its probability, then add the results together.

The variance is given by:

$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$

Remember, to get from the variance to the standard deviation you square root the variance.

[It is possible to calculate E(X) and Var(X) using the TI-84.]
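The steps for finding k, and the formulas for E(X) and Var(X), can be sketched in Python with exact fractions. The notes do not work out the mean and variance of this particular distribution; they are computed here only to illustrate the two formulas, and the variable names are illustrative.

```python
from fractions import Fraction

# Unscaled weights (2r - 1) for r = 1, ..., 6.
weights = {r: 2 * r - 1 for r in range(1, 7)}

# The probabilities k(2r - 1) must add up to 1, so k = 1 / (sum of weights).
k = Fraction(1, sum(weights.values()))
distribution = {r: k * w for r, w in weights.items()}

# E(X) = sum of x * P(X = x);  Var(X) = E(X^2) - [E(X)]^2
expected_value = sum(r * p for r, p in distribution.items())
expected_square = sum(r * r * p for r, p in distribution.items())
variance = expected_square - expected_value ** 2

print(k, distribution[3])
print(expected_value, variance)
```

Using `Fraction` keeps every probability exact, so the check that the probabilities sum to 1 involves no rounding.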

Linear Coding

If a, b are real numbers then:

$E(aX + b) = aE(X) + b$
and
$\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$

The table below shows the probability distribution for X, the number of times Anna annoys her teacher in one lesson.

x          0      1      2      3      4     5      6      7      8      9
P(X = x)   0.01   0.01   0.02   0.04   0.1   0.11   0.15   0.16   0.17   0.23

Find the value of E(X) and Var(X).

It can help to write this out as a table.

x            0      1      2      3      4     5      6      7      8      9
P(X = x)     0.01   0.01   0.02   0.04   0.1   0.11   0.15   0.16   0.17   0.23
x P(X = x)   0      0.01   0.04   0.12   0.4   0.55   0.9    1.12   1.36   2.07

Adding up the bottom row gives the mean:

$E(X) = \mu = \sum_x x\,P(X = x) = 0 + 0.01 + 0.04 + \dots + 1.36 + 2.07 = 6.57$

So, on average, you would expect Anna to annoy her lovely mathematics teacher 6.57 times per lesson.

To find the variance, we first need to find the expected value of $X^2$.

x              0      1      2      3      4     5      6      7      8       9
P(X = x)       0.01   0.01   0.02   0.04   0.1   0.11   0.15   0.16   0.17    0.23
x²             0      1      4      9      16    25     36     49     64      81
x² P(X = x)    0      0.01   0.08   0.36   1.6   2.75   5.4    7.84   10.88   18.63

Adding up the bottom row gives:

$E(X^2) = \sum_x x^2\,P(X = x) = 0 + 0.01 + \dots + 10.88 + 18.63 = 47.55$

So: $\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 47.55 - 6.57^2 = 4.3851$
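The table method above translates directly into Python, and the same code can be used to check the linear coding rules numerically. The constants `a = 2` and `b = 3` are hypothetical values chosen only for the check; they do not come from the notes.

```python
values = range(10)
probabilities = [0.01, 0.01, 0.02, 0.04, 0.1, 0.11, 0.15, 0.16, 0.17, 0.23]

# E(X): multiply each value by its probability, then add.
expected_value = sum(x * p for x, p in zip(values, probabilities))
# E(X^2): same idea with x squared.
expected_square = sum(x * x * p for x, p in zip(values, probabilities))
variance = expected_square - expected_value ** 2

# Linear coding check with hypothetical constants a and b.
a, b = 2, 3
coded_mean = sum((a * x + b) * p for x, p in zip(values, probabilities))
coded_square = sum((a * x + b) ** 2 * p for x, p in zip(values, probabilities))
coded_variance = coded_square - coded_mean ** 2

print(round(expected_value, 2), round(variance, 4))
print(round(coded_mean, 2), round(coded_variance, 4))
```

The coded mean should agree with `a * expected_value + b`, and the coded variance with `a ** 2 * variance`, up to floating-point rounding.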

The Binomial Distribution

Conditions for the Binomial Distribution:
A fixed number of independent trials;
Each trial has only two possible results ("success" and "failure");
The probability of success on each trial is fixed.

Notation:
n = number of trials
p = probability of success
q = probability of failure

If X is a random variable with a binomial distribution with n trials and probability of success p, then we write:

X ~ B(n, p)

The probability of r successes is given by the formula:

$P(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}$

which can also be written as:

$P(X = r) = \dbinom{n}{r} p^{r} q^{n-r}$

The mean of a binomial distribution is given by $\mu = E(X) = np$
The variance of a binomial distribution is given by $\mathrm{Var}(X) = np(1-p)$

José likes to smile at cats. Every time he smiles at a cat the probability it will scratch him is 0.3. On a given day José smiles at 10 cats.
(i) What is the probability that José will be scratched by 2 cats?
(ii) Find the expected number of cats that scratch José.

This is a binomial distribution. Let X = number of cats that scratch José. We know that:
p = probability that José is scratched = 0.3
q = probability that José is not scratched = 0.7
n = 10

So X ~ B(10, 0.3)

(i) $P(X = 2) = {}^{10}C_{2} \times 0.3^{2} \times 0.7^{8} = 0.233$ to 3 significant figures.
(ii) $E(X) = np = 10 \times 0.3 = 3$
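The binomial formula can be evaluated directly in Python using `math.comb` for ${}^{n}C_{r}$. This is a minimal sketch for the José example; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p): nCr * p^r * q^(n - r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)

n, p = 10, 0.3
prob_two_scratches = binomial_pmf(n, p, 2)
expected_scratches = n * p           # mean = np
variance_scratches = n * p * (1 - p)  # variance = np(1 - p)

print(round(prob_two_scratches, 3), expected_scratches)
```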

Finding binomial probabilities using the TI-84

Let X ~ B(20, 0.6). Find:
(i) P(X = 13)
(ii) P(X ≤ 10)
(iii) P(X < 10)
(iv) P(X ≥ 7)
(v) P(X > 7)
(vi) P(5 ≤ X ≤ 14)

To do all these calculations we only need two functions on the TI-84, both of which are found by pressing 2ND then VARS. The two functions are binompdf and binomcdf.

binompdf: for finding the probability, on a binomial distribution, of the outcome being equal to a given value.
binomcdf: for finding the probability, on a binomial distribution, of the outcome being less than or equal to (≤) a given value.

(i) This is the only part of this example where we use binompdf.
P(X = 13) = binompdf(20, 0.6, 13) = 0.166 to 3 s.f.
(The three numbers are n, p and r, in that order.)

Note: If you use the Catalog Help on your TI-84 Plus it will remind you in which order to type in these numbers when using binompdf or binomcdf.

(ii) We now use binomcdf.
P(X ≤ 10) = binomcdf(20, 0.6, 10) = 0.245 to 3 s.f.
(Again the three numbers are n, p and r.)

(iii) Note that, as the binomial is discrete, P(X < 10) = P(X ≤ 9), so we use:
P(X < 10) = P(X ≤ 9) = binomcdf(20, 0.6, 9) = 0.128

(iv) P(X ≥ 7) = 1 − P(X ≤ 6) = 1 − binomcdf(20, 0.6, 6) = 1 − 0.006 = 0.994

(v) P(X > 7) = 1 − P(X ≤ 7) = 1 − binomcdf(20, 0.6, 7) = 1 − 0.021 = 0.979

(vi) P(5 ≤ X ≤ 14) = P(X ≤ 14) − P(X ≤ 4)
= binomcdf(20, 0.6, 14) − binomcdf(20, 0.6, 4)
= 0.8744 − 0.000317
= 0.8741 to 4 s.f.
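For readers without the calculator to hand, the six parts above can be reproduced with two small Python functions. They are named `binompdf` and `binomcdf` only to mirror the calculator's argument order (n, p, r); they are plain Python written for this sketch, not a TI-84 interface.

```python
from math import comb

def binompdf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def binomcdf(n, p, r):
    """P(X <= r) for X ~ B(n, p), summing the individual probabilities."""
    return sum(binompdf(n, p, k) for k in range(r + 1))

n, p = 20, 0.6
answers = {
    "(i) P(X = 13)": binompdf(n, p, 13),
    "(ii) P(X <= 10)": binomcdf(n, p, 10),
    "(iii) P(X < 10)": binomcdf(n, p, 9),        # same as P(X <= 9)
    "(iv) P(X >= 7)": 1 - binomcdf(n, p, 6),     # complement of P(X <= 6)
    "(v) P(X > 7)": 1 - binomcdf(n, p, 7),       # complement of P(X <= 7)
    "(vi) P(5 <= X <= 14)": binomcdf(n, p, 14) - binomcdf(n, p, 4),
}
for label, value in answers.items():
    print(label, round(value, 4))
```

The pattern to notice is that every inequality is first rewritten in terms of "less than or equal to", because that is the only cumulative form available.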

The Poisson Distribution

Conditions for the Poisson Distribution:
Events are randomly scattered in time;
The events have a mean (m) number of occurrences in a given interval of time (or space);
The mean remains constant and the number of events is a whole number;
The probability of more than one occurrence in a given interval is small;
The numbers of occurrences in two separate intervals are independent of each other.

If X is a random variable with a Poisson distribution with mean m, we write

X ~ Po(m)

and we can find the probability of any value of X by using the formula:

$P(X = x) = \dfrac{m^{x} e^{-m}}{x!}$  for x = 0, 1, 2, 3, 4, …

Note: E(X) = m and Var(X) = m

Juan likes to watch traffic. He stands outside his house and records the time of each yellow car that passes by. In any one-hour period, the mean number of cars is 4.5. Find the probability that the number of cars that pass by in a given hour will be exactly 3.

Let X = number of cars passing by in a given hour. So X ~ Po(4.5)

$P(X = 3) = \dfrac{m^{x} e^{-m}}{x!} = \dfrac{4.5^{3}\, e^{-4.5}}{3!} = 0.169$  (m = mean = 4.5)

Finding Poisson Probabilities on the TI-84

Like the Binomial, there are two functions for finding probabilities on the Poisson distribution (both in the DISTR menu).

poissonpdf(mean, r): to find the probability that X = r
poissoncdf(mean, r): to find the probability that X ≤ r
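The Poisson formula is short enough to code directly. The sketch below names its functions `poissonpdf` and `poissoncdf` only to mirror the calculator's argument order (mean, r); they are ordinary Python functions written for illustration.

```python
from math import exp, factorial

def poissonpdf(mean, r):
    """P(X = r) for X ~ Po(mean): mean^r * e^(-mean) / r!"""
    return mean ** r * exp(-mean) / factorial(r)

def poissoncdf(mean, r):
    """P(X <= r) for X ~ Po(mean), summing from 0 up to r."""
    return sum(poissonpdf(mean, k) for k in range(r + 1))

# Juan's yellow cars: mean of 4.5 per hour.
prob_exactly_three = poissonpdf(4.5, 3)
prob_at_most_three = poissoncdf(4.5, 3)

print(round(prob_exactly_three, 3), round(prob_at_most_three, 3))
```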

Continuous Probability Density Functions

A continuous probability density function (pdf) is a function f(x) such that $f(x) \ge 0$ on a given interval [a, b] and $\int_a^b f(x)\,dx = 1$.

For a continuous probability density function:

Mode: the value of x when f(x) is at a maximum
Median: the solution for m of the equation $\int_a^m f(x)\,dx = \dfrac{1}{2}$
Mean: $\mu = E(X) = \int_a^b x f(x)\,dx$
Variance: $E(X^2) - [E(X)]^2 = \int_a^b x^2 f(x)\,dx - \mu^2$

Let $f(x) = \begin{cases} ax(x-4) & 0 \le x \le 4 \\ 0 & \text{otherwise} \end{cases}$

Find: (i) the value of a  (ii) the median  (iii) the mean.

(i) The total probability is 1, therefore

$\int_0^4 ax(x-4)\,dx = 1$

$\int_0^4 (ax^2 - 4ax)\,dx = 1$

$\left[ \dfrac{ax^3}{3} - 2ax^2 \right]_0^4 = 1$

$\dfrac{64a}{3} - 32a = 1$

$-\dfrac{32a}{3} = 1$

$a = -\dfrac{3}{32}$

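(ii) Part (i) tells us that the pdf is $f(x) = -\dfrac{3}{32}x(x-4) = -\dfrac{3}{32}x^2 + \dfrac{3}{8}x$

To find the median we solve:

$\int_0^m \left( -\dfrac{3}{32}x^2 + \dfrac{3}{8}x \right) dx = \dfrac{1}{2}$

$\left[ -\dfrac{x^3}{32} + \dfrac{3x^2}{16} \right]_0^m = \dfrac{1}{2}$

$-\dfrac{m^3}{32} + \dfrac{3m^2}{16} = \dfrac{1}{2}$

$-\dfrac{m^3}{32} + \dfrac{6m^2}{32} = \dfrac{1}{2}$

$\dfrac{-m^3 + 6m^2}{32} = \dfrac{1}{2}$

$-2m^3 + 12m^2 = 32$  [Use a TI-84 to solve]

m = 2, 5.464, −1.464

Only m = 2 is in the interval $0 \le m \le 4$, so the median must be 2.

(iii) $E(X) = \int_0^4 x f(x)\,dx = \int_0^4 x \left( -\dfrac{3}{32}x^2 + \dfrac{3}{8}x \right) dx$

$= \int_0^4 \left( -\dfrac{3x^3}{32} + \dfrac{3x^2}{8} \right) dx$

$= \left[ -\dfrac{3x^4}{128} + \dfrac{x^3}{8} \right]_0^4$

$= -\dfrac{3 \times 4^4}{128} + \dfrac{4^3}{8} = -6 + 8 = 2$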
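The three answers in this example can be checked numerically. The sketch below is one possible approach, not the method used in the notes: it integrates with the composite Simpson's rule and finds the median by bisection instead of solving the cubic. The function names and the choice of 1000 steps are assumptions made for the illustration.

```python
def f(x):
    """pdf from the example: f(x) = -(3/32) x (x - 4) on [0, 4], else 0."""
    return -3 / 32 * x * (x - 4) if 0 <= x <= 4 else 0.0

def integrate(g, a, b, steps=1000):
    """Composite Simpson's rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def solve_median(lo=0.0, hi=4.0, tol=1e-9):
    """Bisection: find m in [lo, hi] where the area from 0 to m is 1/2."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if integrate(f, 0, mid) < 0.5:
            lo = mid   # not enough area yet, so the median is further right
        else:
            hi = mid
    return (lo + hi) / 2

total_area = integrate(f, 0, 4)                    # should be 1
median_value = solve_median()                      # should be close to 2
mean_value = integrate(lambda x: x * f(x), 0, 4)   # E(X)

print(round(total_area, 6), round(median_value, 6), round(mean_value, 6))
```

Bisection works here because the area from 0 to m can only increase as m increases across [0, 4], so there is exactly one crossing of 1/2 in that interval.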

The Normal Distribution

The normal distribution is a probability distribution for continuous random variables. It is used for many naturally occurring phenomena, including height of trees, foot length and pregnancy time. It can also often be used for non-natural phenomena, such as test marks.

If X is normally distributed with mean $\mu$ and standard deviation $\sigma$ then we write $X \sim N(\mu, \sigma^2)$.

The standard normal distribution has a mean of 0 and a standard deviation of 1 (i.e. $Z \sim N(0, 1^2)$).

The transformation $z = \dfrac{x - \mu}{\sigma}$ is used to transform any normal distribution into the standard normal distribution.

Note: As the normal distribution is continuous, P(X < a) = P(X ≤ a), where X is a normally distributed random variable.

The normal distribution graph (also known as the bell curve) is described below.

[Figure: bell curve with the horizontal axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ and µ + 3σ.]

68% of all measurements lie between µ − σ and µ + σ
95% of all measurements lie between µ − 2σ and µ + 2σ
99.7% of all measurements lie between µ − 3σ and µ + 3σ
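The 68%, 95% and 99.7% figures, and the effect of the z transformation, can be checked with Python's standard `statistics.NormalDist` class. The values µ = 50, σ = 10 and x = 65 in the second half are hypothetical numbers chosen only to demonstrate the transformation.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of measurements within 1, 2 and 3 standard deviations of the mean.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

def standardise(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma

# Hypothetical example: X ~ N(50, 10^2), find P(X < 65) two ways.
mu, sigma, x = 50, 10, 65
direct = NormalDist(mu, sigma).cdf(x)
via_z = standard_normal.cdf(standardise(x, mu, sigma))

for k, proportion in within.items():
    print(k, round(proportion, 4))
print(round(direct, 4), round(via_z, 4))
```

The two routes to P(X < 65) agree, which is why tables only ever need to be printed for the standard normal distribution.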

A certain type of sheep has a mean weight of 40 kg and a standard deviation of 7 kg. Use the normal distribution tables to find the probability that a randomly chosen sheep has:
(i) a weight less than 48.9 kg
(ii) a weight less than 30 kg

If, in a fit of madness, we decide to use the tables, then the values 48.9 and 30 need to be converted using the formula $z = \dfrac{x - \mu}{\sigma}$, where $\mu = 40$ and $\sigma = 7$.

(i) $P(X < 48.9) = P\left(Z < \dfrac{48.9 - 40}{7}\right) = P(Z < 1.27) = 0.8980$
(The value of 0.8980 is taken from the table on page 13 of the HL formula book.)

(ii) $P(X < 30) = P\left(Z < \dfrac{30 - 40}{7}\right) = P(Z < -1.43)$

However, the value of P(Z < −1.43) is not on the table, but as the normal distribution is symmetrical we can use the value for 1.43 and subtract it from 1:

P(Z < −1.43) = 1 − P(Z < 1.43) = 1 − 0.9236 = 0.0764
(0.9236 is from the table on page 13 of the formula booklet.)

Normal Distribution Probabilities on the TI-84

1. A machine produces screws. The lengths of these screws have a normal distribution with a mean of 19.8 cm and a standard deviation of 0.3 cm. A screw is selected at random from the machine. Find the probability that the screw is:
(i) between 19.7 cm and 20 cm;
(ii) less than 20.1 cm;
(iii) more than 19.3 cm.

To do all three of these calculations we use normalcdf, which is found on the DISTR menu (press 2ND then VARS).

Note: Never use the normalpdf option here; it gives the height of the curve, not a probability.

The structure of normalcdf is as follows:

normalcdf(lower-bound, upper-bound, mean, standard deviation)

(i) P(19.7 ≤ X ≤ 20). So here the lower-bound is 19.7, the upper-bound is 20, the mean is 19.8 and the standard deviation is 0.3. So we type:

normalcdf(19.7, 20, 19.8, 0.3) = 0.378 to 3 s.f.

(ii) We have no lower-bound for this question, so we have to use a large negative value; usually −1000 will suffice.
P(X < 20.1) = normalcdf(−1000, 20.1, 19.8, 0.3) = 0.841

(iii) On this question we have no upper-bound, so we use a large positive value; usually 1000 will suffice.
P(X > 19.3) = normalcdf(19.3, 1000, 19.8, 0.3) = 0.9522

2. A population of adult alligators has a mean length of 3.5 m, with a standard deviation of 0.4 m. Find the length that an alligator will, with 95% probability, not exceed.

Here we use the invNorm command in the DISTR menu. This function starts with a probability and works backward to find the value of the random variable which gives that probability, i.e. for which k does P(X ≤ k) = 0.95 when $X \sim N(3.5, 0.4^2)$?

Type invNorm(0.95, 3.5, 0.4) into the calculator. (The three numbers are the probability, the mean and the standard deviation, in that order.) This gives us 4.16 as an answer.

So 95% of alligators are less than or equal to 4.16 m in length.
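The sheep, screw and alligator questions can all be reproduced with Python's standard `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play roughly the roles of normalcdf and invNorm. This is a sketch for checking answers, not a description of how the calculator works internally. No "large negative" or "large positive" bound is needed, because `cdf` already gives the probability of being below a value. Answers may differ in the third decimal place from table-based working, since the tables require z rounded to two decimal places.

```python
from statistics import NormalDist

# Sheep: mean 40 kg, standard deviation 7 kg.
sheep = NormalDist(mu=40, sigma=7)
p_sheep_under_48_9 = sheep.cdf(48.9)
p_sheep_under_30 = sheep.cdf(30)

# Screws: mean 19.8 cm, standard deviation 0.3 cm.
screws = NormalDist(mu=19.8, sigma=0.3)
p_between = screws.cdf(20) - screws.cdf(19.7)   # P(19.7 <= X <= 20)
p_less = screws.cdf(20.1)                       # P(X < 20.1)
p_more = 1 - screws.cdf(19.3)                   # P(X > 19.3)

# Alligators: mean 3.5 m, standard deviation 0.4 m.
alligators = NormalDist(mu=3.5, sigma=0.4)
length_95 = alligators.inv_cdf(0.95)            # k with P(X <= k) = 0.95

print(round(p_sheep_under_48_9, 3), round(p_sheep_under_30, 3))
print(round(p_between, 3), round(p_less, 3), round(p_more, 4))
print(round(length_95, 2))
```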