Chapter 7 Random Variables
Making quantifiable meaning out of categorical data
Toss three coins. What does the sample space consist of? HHH, HHT, HTH, HTT, TTT, TTH, THT, THH
In statistics, we are most interested in numerical outcomes. I don't see any numbers here. How could we explore patterns and make predictions when the sample space doesn't seem to be numerical?
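Here is a minimal sketch of one answer. It assumes we choose X = the number of heads, which is an illustrative choice: any rule that assigns a number to each outcome defines a random variable. The Python below lists the eight equally likely outcomes and turns them into a probability distribution for X. The names `outcomes` and `distribution` are illustrative only.

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# All 8 equally likely outcomes of tossing three coins, e.g. ('H', 'T', 'H')
outcomes = list(product("HT", repeat=3))

# Random variable X = number of heads in each outcome (an illustrative choice)
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability distribution of X: P(X = x) = (outcomes with x heads) / 8
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```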
Two types of Random Variable Probability Models
Every random variable probability model must satisfy two rules:
1. Each probability is between 0 and 1.
2. The sum of the probabilities of all possible outcomes is 1.
Discrete Variable: has a countable number of possible values (often whole-number counts such as 0, 1, 2, ...)
Continuous Variable: takes all values in an interval of numbers (X can be any value in that interval)
First type of random variable probability model: Discrete variables
The probability distribution is organized in a table: the first row (X) lists every possible outcome, and the second row (P(X)) lists the corresponding probability of each outcome.
The probability of each outcome is plotted in a histogram, with the outcomes on the x-axis and the probabilities on the y-axis.
Example 1: Create the probability distribution for the sum when two dice are tossed. Create a histogram of these results.
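One way to check your table is to let Python enumerate all 36 equally likely (die 1, die 2) pairs, assuming two fair dice. This sketch builds the distribution of the sum with exact fractions and prints a rough text histogram (one # per way of rolling each sum) rather than a real graph.

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Each of the 36 ordered pairs (die1, die2) is equally likely for fair dice
sums = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

# Probability distribution of X = sum of the two dice
distribution = {s: Fraction(n, 36) for s, n in sorted(sums.items())}

# Text "histogram": one '#' per way of rolling each sum, then P(X = s)
for s, p in distribution.items():
    print(f"{s:>2} | {'#' * sums[s]:<6} {p}")
```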
Example 2: NHL Goals
In 2010, there were 1319 games played in the National Hockey League's regular season. Imagine selecting one of these games at random and then randomly selecting one of the two teams that played in the game. Define the random variable X = number of goals scored by a randomly selected team in a randomly selected game. The table below gives the probability distribution of X:
Goals:        0      1      2      3      4      5      6      7      8      9
Probability:  0.061  0.154  0.228  0.229  0.173  0.094  0.041  0.015  0.004  0.001
a. Show that the probability distribution for X is legitimate.
b. Make a histogram of the probability distribution. Describe what you see.
c. What is the probability that the number of goals scored by a randomly selected team in a randomly selected game is at least 6?
Example 2 Answers
(a) Show that the probability distribution for X is legitimate.
Every probability is between 0 and 1, and the probabilities sum to 1: 0.061 + 0.154 + 0.228 + 0.229 + 0.173 + 0.094 + 0.041 + 0.015 + 0.004 + 0.001 = 1.
(b) Make a histogram of the probability distribution. Describe what you see.
The distribution is skewed right with a single peak at 2 to 3 goals. A team most often scores 2 or 3 goals in a game; scoring 8 or 9 goals is rare.
(c) What is the probability that the number of goals scored by a randomly selected team in a randomly selected game is at least 6?
P(X ≥ 6) = 0.041 + 0.015 + 0.004 + 0.001 = 0.061
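The checks in (a) and (c) can be sketched in Python as below, with the table copied from Example 2. `math.isclose` is used for the sum because floating-point addition of decimals like 0.061 may not give exactly 1.0.

```python
import math

goals = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
probs = [0.061, 0.154, 0.228, 0.229, 0.173, 0.094, 0.041, 0.015, 0.004, 0.001]

# (a) Legitimate model: every probability in [0, 1] and the total is 1
# (math.isclose guards against floating-point round-off in the sum)
is_legit = all(0 <= p <= 1 for p in probs) and math.isclose(sum(probs), 1.0)

# (c) P(X >= 6): add the probabilities of 6, 7, 8, and 9 goals
p_at_least_6 = sum(p for x, p in zip(goals, probs) if x >= 6)

print(is_legit)
print(round(p_at_least_6, 3))
```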
Mean, or expected value, of Discrete Random Variables
The mean of a sample is written $\bar{x}$; the mean of a probability distribution is written $\mu_X$.
The mean of a random variable X has to take into account that the outcomes are not all equally likely: each outcome is weighted by its probability, and the weighted outcomes are added together.
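In symbols, if X takes the values $x_1, x_2, \ldots, x_k$ with probabilities $p_1, p_2, \ldots, p_k$, weighting each outcome by its probability gives:

```latex
\mu_X = E(X) = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k = \sum_{i=1}^{k} x_i p_i
```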
Example 3: Mean of rolling a pair of dice
Based on the probability distribution from Example 1, calculate the mean sum when rolling a pair of dice.
$\mu_X = 2\cdot\frac{1}{36} + 3\cdot\frac{2}{36} + \cdots + 12\cdot\frac{1}{36} = \frac{252}{36} = 7$
The mean of a random variable is the long-run average value of the variable after many repetitions of the chance process. So in this case, if two dice are tossed over and over, the average of the sums gets closer and closer to 7.
http://digitalfirst.bfwpub.com/stats_applet/stats_applet_11_largenums.html
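As a cross-check, the short Python sketch below (assuming two fair dice) weights each of the 36 equally likely pairs by its probability 1/36; this is the same as summing x · P(x) over the table from Example 1. Using `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

# Sum of two fair dice: each of the 36 ordered pairs has probability 1/36
pairs = list(product(range(1, 7), repeat=2))
p_each = Fraction(1, len(pairs))

# Expected value: weight every outcome (the sum) by its probability and add
mu = sum((d1 + d2) * p_each for d1, d2 in pairs)

print(mu)  # 7
```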
Law of Large Numbers
The law of large numbers says that as the number of trials increases, the mean of the observed outcomes gets closer and closer to μ. The law doesn't say how many trials are needed; that depends on the variability of the random outcomes. The more variable the outcomes, the more trials are needed to ensure the observed mean is close to the theoretical mean μ.
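The law of large numbers can be watched in a simulation. This sketch assumes fair dice, uses Python's `random` module in place of real tosses (with a fixed seed so reruns match), and reports the running mean of the sums, which should drift toward 7. `running_means` is a hypothetical helper name.

```python
import random

def running_means(n_trials, seed=1):
    """Simulate n_trials tosses of two fair dice; return the running mean of the sums."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    means = []
    for i in range(1, n_trials + 1):
        total += rng.randint(1, 6) + rng.randint(1, 6)  # randint includes both ends
        means.append(total / i)
    return means

means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} tosses: mean sum = {means[n - 1]:.3f}")
```

Early running means can wander well away from 7; the later ones settle close to it, which is the point of the law.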
In Gambling
It is because of the law of large numbers that the house always wins in the long run. The outcome seems unpredictable to a gambler, who can afford to play only a handful of bets, and the thrill of the unexpected can be enticing. The house, on the other hand, handles thousands of bets every day and knows the expected value of each game's distribution. Though individual games vary, over the course of these thousands of bets the house can predict how much money it will make (and gamblers will lose).
Example 4: Winning and Losing at Roulette
A roulette wheel has 38 slots: the numbers 1 through 36, plus 0 and 00. Of the 36 numbered slots, half are red and half are black; 0 and 00 are green. A player places a $1 bet on red. If the ball lands in a red slot, the player gets the original dollar back plus an additional dollar. If the ball lands in a slot of a different color, the player loses the dollar bet. What is the player's average gain?
What's the player's average gain?
a. What are the possible outcomes for this scenario? Lose a dollar (X = -1) or win a dollar (X = +1).
b. What are the probabilities of these outcomes? P(lose a dollar) = 20/38 and P(win a dollar) = 18/38.
c. The average gain is not just the average of the two outcomes, since a player is more likely to lose a dollar than to win one. In the long run, the player wins a dollar in 18 out of every 38 games played and loses a dollar in the other 20. So to calculate the average gain, find the expected value:
$\mu_X = (1)\cdot\frac{18}{38} + (-1)\cdot\frac{20}{38} = -\frac{2}{38} \approx -0.053$ dollars
In the long run, the player loses (and the casino gains) about five cents per bet.
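The expected-value calculation in (c) can be sketched in Python with exact fractions, followed by a seeded simulation of many one-dollar bets on red from the house's point of view. This assumes a fair wheel where each of the 38 slots is equally likely; values 0 to 17 from `randrange(38)` stand in for the 18 red slots.

```python
from fractions import Fraction
import random

# Gain on a $1 bet on red and its probability (18 red slots out of 38)
outcomes = {+1: Fraction(18, 38), -1: Fraction(20, 38)}

# Expected gain per bet: sum of (gain x probability)
expected_gain = sum(gain * p for gain, p in outcomes.items())
print(expected_gain, float(expected_gain))  # -1/19 and about -0.0526

# The house's view: average result of many simulated bets (seeded for reproducibility)
rng = random.Random(2024)
n_bets = 200_000
total = sum(1 if rng.randrange(38) < 18 else -1 for _ in range(n_bets))
print(f"average gain over {n_bets} bets: {total / n_bets:.4f}")
```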
Variance of a discrete random variable
The variance of a random variable X is written $\sigma_X^2$ (different from the variance of a sample, which is written $s^2$). Just as with a set of data, the variance is built from squared deviations: subtract the mean from each outcome and square that difference. In a probability distribution, each squared difference is then multiplied by its corresponding probability, and the products are added. To find the standard deviation $\sigma_X$, take the square root of the variance.
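Written as a formula, with $\mu_X$ the mean of X and $p_i$ the probability of outcome $x_i$:

```latex
\sigma_X^2 = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + \cdots + (x_k - \mu_X)^2 p_k
           = \sum_{i=1}^{k} (x_i - \mu_X)^2 p_i,
\qquad \sigma_X = \sqrt{\sigma_X^2}
```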
Using your TI calculator to find the mean and variance of a probability distribution
1. Enter the values of the random variable in L1.
2. Enter the corresponding probabilities in L2.
3. You can make a histogram with Xlist: L1 and Freq: L2.
4. To calculate the mean and standard deviation, use the command 1-Var Stats L1, L2. Read the mean from x̄ and the standard deviation from σx; square σx to get the variance.
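For readers without a TI, here is a rough Python analogue of 1-Var Stats L1, L2, assuming the probabilities in L2 play the role of frequencies. `one_var_stats` is a hypothetical helper, not a calculator function, and it returns only the mean and the standard deviation that the TI labels σx. The demo lists hold the two-dice distribution from Example 1.

```python
import math

def one_var_stats(values, probs):
    """Mean and standard deviation of a discrete random variable,
    analogous to 1-Var Stats L1, L2 with probabilities as frequencies."""
    mean = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, math.sqrt(variance)

# L1: sums of two dice; L2: their probabilities (ways out of 36)
L1 = list(range(2, 13))
L2 = [w / 36 for w in (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)]

mean, sd = one_var_stats(L1, L2)
print(f"mean = {mean:.3f}, sd = {sd:.3f}")  # mean = 7.000, sd = 2.415
```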
Practice! Example 5: NHL Goals
In 2010, there were 1319 games played in the National Hockey League's regular season. Imagine selecting one of these games at random and then randomly selecting one of the two teams that played in the game. Define the random variable X = number of goals scored by a randomly selected team in a randomly selected game. The table below gives the probability distribution of X:
Goals:        0      1      2      3      4      5      6      7      8      9
Probability:  0.061  0.154  0.228  0.229  0.173  0.094  0.041  0.015  0.004  0.001
d. Find and interpret the mean of the random variable X in context.
e. Find and interpret the standard deviation of the random variable X in context.
Practice Solutions
d. Find and interpret the mean of the random variable X in context.
$\mu_X = 2.851$. The mean number of goals for a randomly selected team in a randomly selected game is 2.851. (If you were to repeat the random selection over and over again, the mean number of goals scored would be about 2.851 in the long run.)
e. Find and interpret the standard deviation of the random variable X in context.
$\sigma_X \approx 1.63$. On average, a randomly selected team's number of goals in a randomly selected game will differ from the mean (2.851) by about 1.63 goals.
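The values in (d) and (e) can be reproduced with a short Python sketch using the NHL table: the mean weights each goal count by its probability, and the standard deviation is the square root of the probability-weighted squared deviations from that mean.

```python
import math

goals = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
probs = [0.061, 0.154, 0.228, 0.229, 0.173, 0.094, 0.041, 0.015, 0.004, 0.001]

# (d) Mean: each number of goals weighted by its probability
mu = sum(x * p for x, p in zip(goals, probs))

# (e) Variance: probability-weighted squared deviations; SD is its square root
var = sum((x - mu) ** 2 * p for x, p in zip(goals, probs))
sigma = math.sqrt(var)

print(f"mean = {mu:.3f} goals, sd = {sigma:.2f} goals")  # mean = 2.851 goals, sd = 1.63 goals
```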
Second type of random variable probability model: Continuous variables
Continuous Variable: takes all values in an interval of numbers. Because it covers an interval of numbers, X can assume any value within those boundaries, so its possible values cannot be listed in a table.
The probability distribution of X is described by a density curve; the probability of any event is the area under the density curve and above the values of X that make up the event.
Using Density Curves to calculate probability
The shaded area in the diagram is the probability of the displayed event when a random number is selected between 0 and 1.
Example 6: Draw a similar diagram to accompany finding P(0.3 ≤ X < 0.7).
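Assuming the random number is chosen uniformly between 0 and 1 (a density curve that is a flat line at height 1), the area for Example 6 is a rectangle of width 0.7 - 0.3 = 0.4 and height 1, so the probability is 0.4. The Python sketch below compares that exact area with a seeded simulation using `random.random()`, which returns values in [0, 1).

```python
import random

# Exact area under a uniform density of height 1 between 0.3 and 0.7
exact = (0.7 - 0.3) * 1.0

# Simulation: fraction of random numbers in [0, 1) that land in [0.3, 0.7)
rng = random.Random(7)
n = 100_000
hits = sum(0.3 <= rng.random() < 0.7 for _ in range(n))
estimate = hits / n

print(f"exact = {exact:.1f}, simulated = {estimate:.3f}")
```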
Differences between discrete and continuous variables
1. The probability model for a continuous variable assigns probabilities to INTERVALS of outcomes, not to individual outcomes! For a continuous random variable, you can find P(X > 8) or P(X ≥ 8), but P(X = 8) is always zero. This is because probability for a continuous random variable is area under the density curve, and a single value has no width, so there is no area above it.
2. We can ignore the distinction between > and ≥ for continuous variables, but not for discrete variables. Because a single value contributes no probability in a continuous model, P(X > 8) and P(X ≥ 8) are equal.
Normal Distributions
The density curve we are most familiar with is the Normal curve.
Remember: $z = \frac{x - \mu}{\sigma}$.
Remember: a z-score expresses a value as the number of standard deviations above or below the mean.
Remember: by converting values to z-scores, you can find the area under a Normal curve between them, which equals the probability that an observation falls between those values.
Remember: N(μ, σ) is our notation for a Normal distribution with mean μ and standard deviation σ.
Remember: the mean is the point at which the density curve would balance if it were made of solid material (the center, if the curve is symmetric).
Example 6: Cheating in School
Suppose the true population proportion of students who would report another student for cheating is 0.12 (12%). If a survey of 400 randomly selected students were done many times, the sample proportions would follow an approximately Normal distribution centered on the true value 0.12, with a standard deviation of 0.016. What is the probability that a single survey gives a result within 2 percentage points (0.02) of the true proportion?
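Solution Example 6
Draw a diagram. Use the given values (μ = 0.12, σ = 0.016, and x = 0.10 and 0.14) to convert the endpoints to z-scores:
$z = \frac{0.10 - 0.12}{0.016} = -1.25$ and $z = \frac{0.14 - 0.12}{0.016} = 1.25$
Table A will help you find the areas associated with each z-score. Based on the areas you find, determine the desired area to answer the question. (The survey misses by more than 2 percentage points if the result is less than 0.10 or greater than 0.14.)
From Table A, the area to the left of z = 1.25 is 0.8944 and the area to the left of z = -1.25 is 0.1056. Subtract the areas:
P(-1.25 ≤ Z ≤ 1.25) = 0.8944 - 0.1056 = 0.7888
There is about a 79% chance that a survey will produce a result within 2 percentage points of the true proportion.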
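The same calculation can be sketched with Python's `statistics.NormalDist` (Python 3.8 or later) in place of Table A. The exact areas are not rounded to four decimals, so the result may differ from 0.7888 in the last digit.

```python
from statistics import NormalDist

mu, sigma = 0.12, 0.016
low, high = 0.10, 0.14

# Standardize the bounds: z = (x - mu) / sigma
z_low = (low - mu) / sigma
z_high = (high - mu) / sigma

# Area between the z-scores under the standard Normal curve
std_normal = NormalDist()  # mean 0, standard deviation 1
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"z = {z_low:.2f} to {z_high:.2f}, probability = {prob:.4f}")
```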
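Example 7: Heights of women
The heights of young women closely follow a Normal distribution with mean μ = 64 inches and standard deviation σ = 2.7 inches. If a young woman is chosen at random, find the probability that she is between 68 and 70 inches tall.
Solution: Draw a diagram. Standardize the endpoints: $z = \frac{68 - 64}{2.7} \approx 1.48$ and $z = \frac{70 - 64}{2.7} \approx 2.22$. From Table A, the areas to the left of these z-scores are 0.9306 and 0.9868, so P(68 < X < 70) ≈ 0.9868 - 0.9306 = 0.0562.
There's about a 5.6% chance that a randomly chosen young woman has a height between 68 and 70 inches.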
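A sketch of Example 7 with `statistics.NormalDist` (Python 3.8 or later), which finds the area directly from the N(64, 2.7) curve without standardizing by hand. Because it uses exact z-scores rather than ones rounded to two decimals, the answer may differ slightly from the Table A value.

```python
from statistics import NormalDist

# Heights of young women modeled as N(64, 2.7), in inches
heights = NormalDist(mu=64, sigma=2.7)

# P(68 < X < 70) is the area between 68 and 70 under the Normal curve
prob = heights.cdf(70) - heights.cdf(68)

print(f"P(68 < X < 70) = {prob:.4f}")
```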