Probability

Introduction

Shifting our focus We were studying statistics (data, displays, sampling...). The next few lectures focus on probability (randomness). Why?

What is Probability? Probability is used to quantify uncertainty about the future. Probability is a way to model randomness.

Why Study Probability in a Statistics Course? Probability is the language with which we express and interpret assessments of uncertainty in a formal statistical analysis. We take a random sample from a population. Board: Statistical Cartoon (random sample).

3.2: Random Sampling

Random Sampling Most of the formal methods of statistical inference we will use in this class are based on the assumption that the individual units in the sample are sampled at random from the population of interest. (Ignore for the present that in practice, individuals are almost never sampled at random, in a very formal sense, from the population of interest.)

Simple Random Sampling Taking a simple random sample of size n is equivalent to the process of: (1) representing every individual from a population with a single ticket; (2) putting the tickets into a large box; (3) mixing the tickets thoroughly; (4) drawing out n tickets without replacement.
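The ticket-in-a-box process can be sketched in Python. This is a minimal illustration rather than part of the course materials: the population of 20 numbered tickets and the name `draw_simple_random_sample` are hypothetical, and `random.sample` from the standard library stands in for mixing the box and drawing without replacement.

```python
import random

def draw_simple_random_sample(population, n, seed=None):
    """Draw n tickets without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(population, n)

# Hypothetical population: one ticket per individual, labeled 1..20.
population = list(range(1, 21))
sample = draw_simple_random_sample(population, n=5, seed=1)
print(sample)
```

Because the draw is without replacement, no ticket can appear twice in the sample.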

Simple Random Sampling The defining characteristic of the process of simple random sampling is that every possible sample of size n has the same chance of being selected. In particular, this means that (a) every individual has the same chance of being included in the sample; and that (b) members of the sample are chosen independently of each other.
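For a very small population, the defining characteristic can be checked by listing every possible sample. The sketch below assumes a hypothetical population of five individuals and samples of size 2; it gives each possible sample the same chance and then adds up the chance that each individual is included.

```python
from fractions import Fraction
from itertools import combinations

population = ["a", "b", "c", "d", "e"]  # hypothetical individuals
n = 2

# Every possible sample of size n, each equally likely under simple random sampling.
all_samples = list(combinations(population, n))
chance_per_sample = Fraction(1, len(all_samples))

# Chance that a given individual is in the sample: add up the samples containing it.
inclusion = {
    person: sum(chance_per_sample for s in all_samples if person in s)
    for person in population
}
print(len(all_samples), chance_per_sample)
print(inclusion)
```

Every individual ends up with the same inclusion chance, n/N, which is point (a) above.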

Other Random Sampling Strategies Stratified random sampling and cluster sampling are examples of random sampling processes that are not simple. Data analysis for these types of sampling strategies goes beyond the scope of this course.

Inference from Samples to Populations Statistical inference involves making statements about populations on the basis of analysis of sampled data. The simple random sampling model is useful because it allows a precise mathematical description of the random distribution of the discrepancy between statistical estimates and population parameters. This discrepancy is known as chance error due to random sampling. When using the random sampling model, it is important to ask: what is the population to which the results will be generalized? Using statistical methods that assume random sampling on data that were not collected as a random sample is prone to sampling bias, in which individuals do not all have the same chance of being sampled. Sampling bias can lead to incorrect statistical inferences because the sample is unrepresentative of the population in important ways.
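Chance error due to random sampling can be seen in a small simulation. The sketch below is an illustration under assumptions: the population (the numbers 1 to 100), the sample size of 10, and the 2000 repetitions are all hypothetical choices, and the sample mean is used as the estimate of the population mean.

```python
import random
import statistics

rng = random.Random(0)            # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical population values 1..100
mu = statistics.mean(population)  # population parameter

# Repeat the sampling process many times and record estimate minus parameter.
errors = []
for _ in range(2000):
    sample = rng.sample(population, 10)
    errors.append(statistics.mean(sample) - mu)

print("average chance error:", statistics.mean(errors))
print("typical size of chance error:", statistics.pstdev(errors))
```

Individual samples miss the population mean, sometimes by a lot, but the errors center near zero. A biased sampling process would not have that property.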

Examples for You You should read Example 3.2 through Example 3.7 (pp. 75-76), which give further examples of sampling procedures and sampling bias.

3.3: Introduction to Probability

Outcome Space Definition The outcome space is the set of possible simple outcomes from a random experiment (often denoted by Ω). Example: In a single die roll, the set of possible outcomes is: Ω = {1, 2, 3, 4, 5, 6}

Events Definition An event is a set of possible outcomes. Example: In a single die roll, possible events include: A = the die roll is even; B = the die roll is a 6; C = the die roll is 4 or less.
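Since an event is a set of outcomes, the die-roll example maps directly onto Python sets. This is only a sketch; the variable names are chosen for illustration.

```python
# Outcome space for a single die roll.
omega = {1, 2, 3, 4, 5, 6}

# Events are subsets of the outcome space.
A = {x for x in omega if x % 2 == 0}   # the die roll is even
B = {6}                                # the die roll is a 6
C = {x for x in omega if x <= 4}       # the die roll is 4 or less

print(A, B, C)
print(A & C)      # outcomes that are even AND 4 or less
print(omega - A)  # complement of A: the roll is odd
```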

Probability Definition The probability of an event E, denoted P{E}, is a numerical measure between 0 and 1 that represents the likelihood of the event E in some probability model. Probabilities assigned to events must follow a number of rules. Note: (1) Probability is a numerical measure of the likelihood of an event. (2) Probabilities are always between 0 and 1, inclusive. (3) Notation: The probability of an event E is written P{E}. Example: The probability P{the die roll is a 6} equals 1/6 under a probability model that gives equal probability to each possible result, but could be different under a different model.

Examples (1) If a fair coin is tossed, the probability of a head is P{Heads} = 0.5. (2) If a bucket contains 34 white balls and 66 red balls and a ball is drawn at random, the probability that the drawn ball is white is P{white} = 34/100 = 0.34.

Probability Rules Non-negativity: For any event E, $0 \le P\{E\} \le 1$. Outcome space: The probability of the event of all possible outcomes is 1. Complements: $P\{E^c\} = 1 - P\{E\}$.
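The three rules can be checked numerically for the die-roll example. The sketch below assumes the equally likely model, in which the probability of an event is the number of outcomes in it divided by the number of outcomes in the outcome space. The helper name `prob` is hypothetical.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability under the equally likely model: favorable outcomes / all outcomes."""
    return Fraction(len(event & omega), len(omega))

E = {6}                   # the die roll is a 6
E_complement = omega - E  # the die roll is not a 6

assert 0 <= prob(E) <= 1                  # probabilities lie between 0 and 1
assert prob(omega) == 1                   # the whole outcome space has probability 1
assert prob(E_complement) == 1 - prob(E)  # complement rule
print(prob(E), prob(E_complement))
```

Under a different probability model (for example, a loaded die) `prob` would have to be defined differently, but the same three rules would still have to hold.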

3.7: Random Variables

Introduction to Random Variables Definition A random variable is a variable that depends on an outcome of a chance (random) experiment.

Examples Variable versus a Random Variable Examples: (1) Bret's height: not random. (2) Select a student in class at random and record the student's height: random. (3) The total height of everyone in 331 SMI (right now)

Random Variables A random variable is a rule that attaches a numerical value to a chance outcome. Associated with each possible value of the random variable is a probability, a number between 0 and 1 that represents the long-run relative frequency of observing the given value. The sum of the probabilities for all possible values is one.

Probability Distributions A probability distribution gives a menu for the random variable. It lists the possible values, and the probability of each value. To describe the distribution, it is sufficient to provide a list of all possible values and the probability associated with each value. The sum of these probabilities is one. Frequently (as with the binomial distribution), there is a formula that specifies the probability for each possible value.

Example The probability distribution for a random variable X. Here is the menu, which lists the possible values and the probability of each value:

k          0     1     5     10
P{X = k}   0.1   0.5   0.1   0.3
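One simple way to hold such a menu in code is a dictionary from each possible value to its probability. The sketch below uses the table above; the helper `is_valid_distribution` is a hypothetical name, and the small tolerance is there only because decimal probabilities are stored as floating-point numbers.

```python
import math

# The distribution from the table: value k -> P{X = k}.
dist = {0: 0.1, 1: 0.5, 5: 0.1, 10: 0.3}

def is_valid_distribution(d, tol=1e-9):
    """Each probability lies in [0, 1] and the probabilities sum to one."""
    in_range = all(0 <= p <= 1 for p in d.values())
    return in_range and math.isclose(sum(d.values()), 1.0, abs_tol=tol)

print(is_valid_distribution(dist))
print(dist[5])   # P{X = 5}
```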

Example Write down the probability distribution for the random variable which describes the following chance outcome. I roll a die and then record the number on its face.

The Mean (Expected Value) The mean or expected value of a random variable X is written as E(X). For discrete random variables, $E(X) = \sum_i x_i \, P\{X = x_i\}$, where the $x_i$'s are the values that the variable takes on and the sum is taken over all possible values. Notation: $\mu_X = E(X)$. Note that the expected value of a random variable is a weighted average of the possible values of the random variable, weighted by the probabilities.
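As a worked illustration, take X to be the result of a single die roll and assume the equally likely model, so each face has probability 1/6. The weighted average is then:

```latex
E(X) = \sum_{k=1}^{6} k \cdot P\{X = k\}
     = \frac{1 + 2 + 3 + 4 + 5 + 6}{6}
     = \frac{21}{6}
     = 3.5
```

The expected value need not be a value the random variable can actually take: no die roll shows 3.5.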

The Variance The variance of a random variable X is written as Var(X). For discrete random variables, $\mathrm{Var}(X) = E\big((X - \mu_X)^2\big) = \sum_i (x_i - \mu_X)^2 \, P\{X = x_i\}$, where the $x_i$'s are the values that the variable takes on and the sum is taken over all possible values. Notation: $\sigma_X^2 = \mathrm{Var}(X)$. The variance is a weighted average of the squared deviations between the possible values of the random variable and its mean. If a random variable has units, the units of the variance are those units squared, which is hard to interpret.

Alternate Variance Formula $\mathrm{Var}(X) = E(X^2) - (E(X))^2$. This formula is often easier to compute by hand. It suggests three steps: (1) Compute $E(X^2)$. (2) Compute $E(X)$ (then square this answer). (3) Subtract the second quantity from the first.
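The two variance formulas can be compared side by side. The sketch below assumes a fair die (each face with probability 1/6) and uses exact fractions so that the two results can be compared without rounding. The helper `expect` is a hypothetical name for "sum of g(x) times its probability".

```python
from fractions import Fraction

# Assumed model: a fair die, each face with probability 1/6.
dist = {k: Fraction(1, 6) for k in range(1, 7)}

def expect(g, d):
    """E(g(X)) = sum of g(x) * P{X = x} over all possible values x."""
    return sum(g(x) * p for x, p in d.items())

mean = expect(lambda x: x, dist)
var_definition = expect(lambda x: (x - mean) ** 2, dist)   # E((X - mu)^2)
var_shortcut = expect(lambda x: x ** 2, dist) - mean ** 2  # E(X^2) - (E(X))^2

print(mean, var_definition, var_shortcut)
```

Both routes give the same variance. The shortcut avoids computing a deviation for every value, which is why it is usually quicker by hand.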

Standard Deviation We also define the standard deviation to be the square root of the variance, so it has the same units as the random variable. The notation is $\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$. Also, $\sigma_X = \mathrm{SD}(X)$.

Standard Problem Consider a random variable X defined by the following distribution:

k          0     1     5     10
P{X = k}   0.1   0.5   0.1   0.3

(1) Compute P(X 5). (2) Compute P(.5 X 6). (3) Compute E(X), Var(X), and SD(X).
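One way to check hand calculations for a problem of this type is a short script. The sketch below is an illustration, not an official solution: the helper `prob` is a hypothetical name, and the comparison used in `p_between` (0.5 <= X <= 6) is an assumed reading of part 2. With these possible values, strict inequalities would give the same answer.

```python
import math

dist = {0: 0.1, 1: 0.5, 5: 0.1, 10: 0.3}   # value k -> P{X = k}

def prob(event, d):
    """P{event}: add the probabilities of the values k for which event(k) is true."""
    return sum(p for k, p in d.items() if event(k))

mean = sum(k * p for k, p in dist.items())               # E(X)
var = sum((k - mean) ** 2 * p for k, p in dist.items())  # Var(X), from the definition
sd = math.sqrt(var)                                      # SD(X)

# Assumed reading of part 2: 0.5 <= X <= 6.
p_between = prob(lambda k: 0.5 <= k <= 6, dist)

print(p_between, mean, var, sd)
```

Any other event about X can be handled the same way by passing a different condition to `prob`.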