Statistics 511 Additional Materials

Discrete Random Variables

In this section, we introduce the concept of a random variable, or RV. A random variable is a model that helps us describe the state of the world around us. Roughly, an RV can be thought of as the value that is assigned to the outcome of an experiment. There are different RVs depending on the type of quantity that we are trying to model. In this section we will introduce some random variables for experiments whose outcome is a whole number.

Sample and Population Notation

Before we begin with random variables, we need some new notation that distinguishes the data analysis we were doing earlier from what we will do throughout the rest of the course. We are going to differentiate between the population mean and the sample mean. The population mean is the mean, or average, of all the elements of the population. The sample mean is the average of a subset of that population. Throughout the rest of the course we will be explicit about which one we mean, so we need notation that distinguishes the two.

For a variable X, the population mean will be denoted by the Greek letter mu with a subscript of x, $\mu_x$. For the sample mean, we have already seen that we use the symbol x with a bar over it, $\bar{x}$. We will do the same for the standard deviation. For a variable X, the population standard deviation will be the Greek letter sigma with a subscript of x, $\sigma_x$. The sample standard deviation will be denoted by the letter s with a subscript of x, $s_x$.

Quantity              Sample Statistic                Population Parameter
Mean                  $\bar{x}$         estimates     $\mu_x$
Standard Deviation    $s_x$             estimates     $\sigma_x$

Example

Suppose that the population is made up of the 25 students in a Stat 215 lab and the variable that we are interested in is the number of siblings for each student; call it Y.

y: 4, 3, 1, 1, 5, 2, 2, 4, 0, 2, 1, 2, 3, 0, 2, 4, 4, 6, 0, 0, 1, 3, 3, 1, 3

The population mean is $\mu_y = 2.28$ and the population standard deviation is $\sigma_y = 1.646$ (this value uses the divisor N − 1 = 24; dividing by N = 25 instead gives 1.613).
If we take a sample of 5 individuals from this population, we might get the observed data 1, 2, 2, 3, 0. The mean of this sample is $\bar{y} = 1.6$ and the sample standard deviation is $s_y = 1.14$. If another sample is drawn, we might get the observed data 4, 1, 0, 2, 4. The mean of this sample is $\bar{y} = 2.2$ and the sample standard deviation is $s_y = 1.79$.
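The sibling example can be reproduced with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are illustrative and not part of the course notation. It computes the standard deviation of the full data set with both divisors (N and N − 1) so the two conventions can be compared, and then the mean and standard deviation of each sample.

```python
import statistics

# Number of siblings for all 25 students (the whole population in this example).
population = [4, 3, 1, 1, 5, 2, 2, 4, 0, 2, 1, 2, 3, 0, 2,
              4, 4, 6, 0, 0, 1, 3, 3, 1, 3]

mu_y = statistics.mean(population)               # population mean, 2.28
sigma_y_div_n = statistics.pstdev(population)    # divisor N, about 1.613
sigma_y_div_n1 = statistics.stdev(population)    # divisor N - 1, about 1.646

# The two samples of size 5 quoted in the example.
samples = [[1, 2, 2, 3, 0], [4, 1, 0, 2, 4]]

# Sample mean and sample standard deviation (divisor n - 1) for each sample.
sample_stats = [(statistics.mean(s), statistics.stdev(s)) for s in samples]

print("population mean:", mu_y)
print("population SD (divisor N):", round(sigma_y_div_n, 3))
print("population SD (divisor N - 1):", round(sigma_y_div_n1, 3))
for data, (ybar, s_y) in zip(samples, sample_stats):
    print(data, "mean =", ybar, "SD =", round(s_y, 2))
```

Each sample gives a different estimate of the same fixed population values, which is why the notation keeps the sample statistics and the population parameters apart.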

Random Variables

Definition: A random variable, or RV, is a variable whose value is determined by the outcome of an experiment.

Examples:
W = the number of tomatoes that a plant will produce
Y = the inflation rate for the U.S. in the fourth quarter of this year
Z = the number of people who vote in the next presidential election
T = the number of tablets in a bottle of aspirin
K = the number of potatoes in a 5 pound bag of potatoes
B = the time between phone calls for a computer helpline

There are two types of random variables: discrete and continuous.

Definition: A random variable is called discrete if it can only take a countable number of values. In the list of RVs above, W, Z, T, and K are all discrete random variables.

Definition: A random variable is called continuous if it can take any value inside an interval. In the list of RVs above, Y and B are continuous random variables.

Definition: A probability distribution (sometimes simply called a distribution) is the mechanism that assigns probabilities to each value that the random variable can assume.

Discrete Random Variables

In this section we will focus on discrete random variables.

Notation: For a discrete RV X, we will use P(X=x) as shorthand for the probability that the random variable X takes the value x. Often we will make this even shorter and simply write P(x). For a six-sided die with the values {1, 2, 3, 4, 5, 6} on its faces, let X be the value that appears on the upward face when the die is rolled. Then P(2), or P(X=2), represents the probability that X will be 2.

Rules

There are two rules for all discrete probability distributions:

1. $0 \le P(X=x) \le 1$ for each value of x.
2. $\sum P(X=x) = 1$

These rules say that each outcome has a probability between 0 and 1, and that if we add up the probabilities of all the possible outcomes, their sum must be 1.

We can use several methods for describing the distribution of a random variable X:

1. Table

In a table, each possible value that the RV may take is listed, and beside it is listed the probability associated with that value.

y         2     3     4
P(Y=y)    0.3   0.2   0.5

From this table we can see that the probability that the RV Y takes the value 3 is 0.2, or P(Y=3) = 0.2. We can also note that P(Y=2) = 0.3.

2. Line graph

The second way to represent a distribution is through a line graph. In a line graph, each possible value that the RV can take is listed on the x-axis. A vertical line is drawn up from each possible value so that the height of the line corresponds to the probability of that value. For the RV W,

w         2     3     4     5
P(W=w)    0.4   0.1   0.2   0.3

The line graph for this distribution of W looks like the following:

[Figure: line graph of the distribution of W, with w on the horizontal axis and P(W=w) on the vertical axis]
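As a sketch of how the two rules and the table representation fit together, the Python snippet below stores a distribution as a dictionary mapping each value to its probability, checks both rules, and prints a rough text version of a line graph. The function names and the dictionary layout are illustrative assumptions, not notation from the course.

```python
import math

def is_valid_distribution(dist):
    """Check the two rules for a discrete distribution given as {value: probability}."""
    # Rule 1: every probability lies between 0 and 1.
    each_in_range = all(0 <= p <= 1 for p in dist.values())
    # Rule 2: the probabilities add up to 1 (allowing for floating-point rounding).
    sums_to_one = math.isclose(sum(dist.values()), 1.0)
    return each_in_range and sums_to_one

def text_line_graph(dist, scale=20):
    """Return a sideways text line graph: one bar per value, length proportional to probability."""
    lines = []
    for value in sorted(dist):
        bar = "|" * round(dist[value] * scale)
        lines.append(f"{value}: {bar} {dist[value]}")
    return "\n".join(lines)

# The two tables from the text.
dist_y = {2: 0.3, 3: 0.2, 4: 0.5}
dist_w = {2: 0.4, 3: 0.1, 4: 0.2, 5: 0.3}

print(is_valid_distribution(dist_y))   # both rules hold for Y
print(is_valid_distribution(dist_w))   # both rules hold for W
print(text_line_graph(dist_w))
```

A table whose probabilities summed to, say, 0.9 would fail the second rule and so could not be a probability distribution.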

3. Formula

Lastly, we can use a formula to describe the distribution of a random variable. Under this method, if we know certain quantities, usually called parameters, then we can calculate the probability of a particular value for the RV.

$$P(X=x) = v(1-v)^{x-1} \quad \text{for } x = 1, 2, 3, \ldots$$

If we know what v is, we can then calculate the probability of each value of x. In this example, the parameter we would need to know is v.

Calculating combined probabilities

We are often interested in calculating probabilities other than those of the simple events of the two preceding sections. There we focused on calculating the probability of a single value. There are two possibilities. First, for something like P(X<3), we could add up the probabilities of the simple events that make up this probability:

$$P(X<3) = P(X=0) + P(X=1) + P(X=2).$$

Each one of these would have to be calculated using the appropriate formula. It gets worse for something like

$$P(X \le 15) = P(X=0) + P(X=1) + P(X=2) + \cdots + P(X=15).$$

Theoretical Mean and Standard Deviation of a Discrete Probability Distribution

One of the important properties of a probability distribution is this: if we know that a discrete RV possesses a particular probability distribution, we no longer need to estimate the theoretical mean $\mu_x$ by calculating the sample mean $\bar{x}$. We can calculate the exact value of $\mu_x$ using the information in the probability distribution of X. Similarly, we no longer need to estimate the theoretical standard deviation $\sigma_x$ by calculating the sample standard deviation $s_x$. We can calculate the exact value of $\sigma_x$ using the information in the probability distribution of X.
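Returning to the formula method and the combined probabilities described above, the following Python sketch evaluates the formula with parameter v and adds up single-value probabilities. The value v = 0.25 is a hypothetical choice for illustration only, and the function names are not from the course. Because this particular formula is defined for x = 1, 2, 3, ..., the term P(X=0) is 0 here.

```python
def pmf(x, v):
    """P(X = x) = v * (1 - v)**(x - 1) for x = 1, 2, 3, ...; 0 for any smaller x."""
    if x < 1:
        return 0.0
    return v * (1 - v) ** (x - 1)

def prob_at_most(k, v):
    """P(X <= k), found by adding the single-value probabilities P(X=1), ..., P(X=k)."""
    return sum(pmf(x, v) for x in range(1, k + 1))

v = 0.25  # hypothetical parameter value, chosen only for illustration

p_less_than_3 = prob_at_most(2, v)    # P(X < 3) = P(X=1) + P(X=2)
p_at_most_15 = prob_at_most(15, v)    # P(X <= 15), a sum of 15 terms

print("P(X=2)   =", pmf(2, v))
print("P(X<3)   =", p_less_than_3)
print("P(X<=15) =", round(p_at_most_15, 6))
```

Adding fifteen terms by hand is tedious, but a loop makes the sum no harder than the two-term case.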

Formulas

Theoretical Mean:

$$\mu_x = \sum x \, p(x)$$

Theoretical Standard Deviation:

$$\sigma_x = \sqrt{\sum (x - \mu_x)^2 \, p(x)} \quad \text{or} \quad \sigma_x = \sqrt{\left[\sum x^2 \, p(x)\right] - \mu_x^2}$$

Suppose that the RV X has the probability distribution

x         1     2     3     4
P(X=x)    0.1   0.4   0.2   0.3

The theoretical mean is

$$\mu_x = \sum x \, p(x) = (1 \cdot 0.1) + (2 \cdot 0.4) + (3 \cdot 0.2) + (4 \cdot 0.3) = 0.1 + 0.8 + 0.6 + 1.2 = 2.7$$

The theoretical variance is

$$\sigma_x^2 = \sum (x - \mu_x)^2 \, p(x) = (1 - 2.7)^2 \cdot 0.1 + (2 - 2.7)^2 \cdot 0.4 + (3 - 2.7)^2 \cdot 0.2 + (4 - 2.7)^2 \cdot 0.3 = 0.289 + 0.196 + 0.018 + 0.507 = 1.01$$

The theoretical standard deviation is

$$\sigma_x = \sqrt{\sigma_x^2} = \sqrt{1.01} = 1.00498756$$

Note that we could have computed the theoretical variance as

$$\sigma_x^2 = \left[\sum x^2 \, p(x)\right] - \mu_x^2 = \left[1^2 \cdot 0.1 + 2^2 \cdot 0.4 + 3^2 \cdot 0.2 + 4^2 \cdot 0.3\right] - 2.7^2 = [0.1 + 1.6 + 1.8 + 4.8] - 7.29 = 8.3 - 7.29 = 1.01$$
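The worked example can be checked numerically. The Python sketch below stores the distribution of X as a dictionary and computes the theoretical mean, the variance by both formulas, and the standard deviation. The function names are illustrative and not part of the course notation.

```python
import math

# Probability distribution of X from the worked example: {value: probability}.
dist_x = {1: 0.1, 2: 0.4, 3: 0.2, 4: 0.3}

def theoretical_mean(dist):
    """mu = sum of x * p(x)."""
    return sum(x * p for x, p in dist.items())

def theoretical_variance(dist):
    """sigma^2 = sum of (x - mu)^2 * p(x)."""
    mu = theoretical_mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def theoretical_variance_shortcut(dist):
    """sigma^2 = [sum of x^2 * p(x)] - mu^2, the alternative formula."""
    mu = theoretical_mean(dist)
    return sum(x ** 2 * p for x, p in dist.items()) - mu ** 2

mu_x = theoretical_mean(dist_x)
var_x = theoretical_variance(dist_x)
var_x_shortcut = theoretical_variance_shortcut(dist_x)
sd_x = math.sqrt(var_x)

print("mean     =", round(mu_x, 4))
print("variance =", round(var_x, 4), "and by the shortcut:", round(var_x_shortcut, 4))
print("SD       =", round(sd_x, 8))
```

Both variance formulas agree up to floating-point rounding, which is why the printed values are rounded before display.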