STAT 251 - Chapter 4/6: Random Variables and Probability Distributions


We use random variables (RVs) to represent the numerical features of a random experiment. In Chapter 3, we defined a random experiment as one where the set of possible outcomes (S) is known before the experiment is performed, but the actual outcome is not. A random variable is usually denoted by a capital letter, e.g. X, Y, Z, and its possible realized values by the same letter in lowercase, e.g. x, y, z. So X is a random quantity before the experiment is performed, and x is the value of that quantity after the experiment has been performed.

Examples:
X = # of defective items in a sample
Y = time till failure of a mechanical part
Z = % yield of a chemical process
X2 = # rolled on a die

Notation: In this course, we will write things like P(X = x) to mean the probability that the RV X takes on the value x.

The Range of a RV is the set of all possible values it can take on, e.g.
X ∈ {0, 1, ..., n}
Y ∈ [0, ∞)
Z ∈ [0, 100]
X2 ∈ {1, 2, 3, 4, 5, 6}

First we will talk about discrete RVs and then continuous RVs. A discrete RV has a finite or countable range; a continuous RV is defined on a continuous range (which may be bounded).

Discrete Random Variables

A discrete RV is a RV that has a finite or countable range, e.g. # of defective items, # rolled with a die, # of sales for a store, # of heads on n coin tosses.

Example: Let X = the number of 1's rolled when rolling a die 4 times...

Probability Mass Function (f(x)): The probability mass function of a discrete RV, denoted by a lowercase f(x), is a function that gives the probability of occurrence for each possible realized value x of the RV X. For a discrete RV, f(x) places probability only on the discrete possible values x.

f(x) = P(X = x), for all possible values x of X. (for a discrete RV only!)

Properties of f(x):
1. 0 ≤ f(x) ≤ 1, for all x
2. Σ over all x of f(x) = 1 (probabilities sum to 100%)

Probability mass functions can be represented in a frequency table, a histogram, or as a mathematical function. Examples below...

Combinatorial and Factorial Notation: Suppose we have n distinct objects and we would like to choose k of them. If we can only choose each object once (no replacement), and choosing the objects (a, b) is the same as choosing (b, a) (the order of selection does not matter), then how many distinct sets of k objects can we select from the n of them?

(e.g.) Select k = 10 students from this class of n = 110, or a poker hand: the deck has n = 52 cards, we select k = 5 of them, we cannot get the same card twice, and the hand K-A-A-A-A is the same as A-A-A-K-A (the same five cards in a different order).

How many ways can we choose k objects from n distinct objects?

C(n, k) = n! / ((n − k)! k!) = [n(n − 1)···(n − k + 1)] / k!, where k! = k(k − 1)(k − 2)···(2)(1)

(e.g.) Poker hand: C(52, 5) = 52! / (47! 5!) = (52 · 51 · 50 · 49 · 48) / (5 · 4 · 3 · 2 · 1) = 2,598,960

Examples:
1. If we have {1, 2, 3, 4} and we choose 2 of them without replacement, how many different combinations can we get?
2. What is the probability of winning the jackpot in Lotto 6/49?

Note: We define 0! = 1. (e.g.) C(n, n) = n! / ((n − n)! n!) = n! / (0! n!) = 1, as it should be! (A computational check of these examples is sketched below.)
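The examples above can be checked numerically. A minimal sketch using Python's standard library (math.comb, available from Python 3.8 on, computes C(n, k)):

```python
# A quick check of the combinatorics examples above using math.comb.
from math import comb

print(comb(52, 5))        # poker hands: 2598960
print(comb(4, 2))         # choosing 2 of {1, 2, 3, 4}: 6
print(1 / comb(49, 6))    # Lotto 6/49 jackpot: 1 in 13,983,816 (about 7.15e-08)
```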

Some Important Discrete Random Variables: First, we will talk about the Bernoulli process, which gives rise to the binomial, the geometric, and the negative binomial random variables. Later in the course (in Chapter 6), we will talk about the Poisson process, which gives rise to the Poisson random variable (as well as the exponential RV).

Bernoulli Process: A Bernoulli process is a random experiment with the following features:
1) The experiment consists of n independent steps called trials.
2) Each trial results in either a success or a failure, i.e. S = {S, F}.
3) The probability of a success is constant across trials: P(S) = p.

You should be able to recognize when an experiment is a Bernoulli process:
- There is an event that will/will not occur (success/failure).
- Successes/failures are measured over a number of trials.
- The probability of a success is the same for every trial.

The terms "success" and "failure" are used because the theory has its roots in gambling. It can sometimes be odd to think of a success as finding a defective item, but this is the terminology we use.

Examples:
(1) Testing randomly selected items for defects:
event: find a defective item
trial: inspection of each selected item
outcomes: defective (success) or not defective (failure)
(2) The number of heads on repeated coin tosses:
event: get an H
trial: each coin toss
outcomes: get an H (success) or a T (failure)

As we mentioned, the Bernoulli process gives rise to:
Binomial RV: the # of successes in a fixed # of trials
Geometric RV: the # of trials completed until the first success is observed

Examples:
(1) Testing randomly selected items for defects:
Binomial RV: the # of defects out of n = 50 sampled and inspected items
Geometric RV: the # of items inspected until the first defective item is found
(2) The number of heads on repeated coin tosses:
Binomial RV: the # of H out of n = 100 coin tosses
Geometric RV: the # of tosses until you get the first head

Binomial Random Variables:
Experiment: counting the # of occurrences of an event (success) over n independent trials (a fixed number of trials).
Random Variable: X = # of successes

Range: {0, 1, 2, ..., n}
Parameters: n = # of trials, p = probability of success
Notation: If X is a binomial RV, we write X ~ BIN(n, p), which indicates that X follows a binomial distribution with n trials that each have probability of success p.
Prob. Mass Function: f(x) = P(X = x) = C(n, x) p^x (1 − p)^(n−x), for x = 0, 1, ..., n

Example: If we toss a coin 10 times, what is the probability of getting exactly 2 heads?

Solution: We have n = 10, p = 0.5. Let X = the # of heads in 10 tosses. Then X ~ BIN(10, 0.5) and

f(2) = P(X = 2) = C(10, 2) (0.5)^2 (1 − 0.5)^(10−2) = 0.044,

or a 4.4% chance of tossing exactly 2 heads in 10 tosses.

What is the probability of getting two or more heads on 10 tosses? What is the probability of getting 40 or more heads on 100 tosses? ...In Chapter 7, we will see how to get an approximate answer for a question like this. (A computational sketch of these questions follows.)
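Since the pmf is explicit, the coin questions above can be answered by direct summation. A minimal sketch (binom_pmf is a helper defined here, not a library function):

```python
# The binomial pmf f(x) = C(n, x) p^x (1-p)^(n-x), applied to the coin examples.
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ BIN(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(2, 10, 0.5))                               # 0.0439..., the 4.4% above
print(1 - binom_pmf(0, 10, 0.5) - binom_pmf(1, 10, 0.5))   # P(X >= 2): 0.989...
print(sum(binom_pmf(x, 100, 0.5) for x in range(40, 101))) # P(X >= 40 of 100): 0.982...
```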

Let's think intuitively about why the probability mass function of a binomial RV makes sense, using the last example. First, how many ways can we get 2 H in 10 tosses? There are C(10, 2) = 45, since the 2 H can occur on any 2 of the 10 tosses. (i.e.) A head can occur on the 1st or 2nd or ... or 10th toss, and we choose 2 of the 10 possible locations for the heads.

Recall: We are assuming that the probability of success is the same for each trial and that the trials are independent. Take one of the 45 sequences with 2 H, say (H, H, T, T, T, T, T, T, T, T), or (S, S, F, F, F, F, F, F, F, F). Because of independence,

P(S S F F F F F F F F) = P(S)P(S)P(F)P(F)P(F)P(F)P(F)P(F)P(F)P(F) = p·p·(1 − p)·(1 − p)·...·(1 − p) = p^2 (1 − p)^8

The above is the probability of getting 2 H and then 8 T. All sequences of 2 H and 8 T have the same probability, so the probability of getting exactly 2 H (in any order) is

(# of possible 2H sequences) × (probability of one such sequence) = C(10, 2) p^2 (1 − p)^8

Example: Suppose that 5% of the computer chips made by a company are defective. If you randomly select and inspect 8 chips, what is the probability that you find at least one defective one? What assumption have you made when solving this problem?

Example: Suppose you go to the casino and make 5 bets of $1 each on the colour of the number coming up in the casino game roulette. What is the probability that you leave the casino with more money than you came with?

Geometric Random Variables:
Experiment: counting the # of trials until the first success.
Random Variable: Y = # of trials until the first success
Range: {1, 2, ...} (infinite but countable)
Parameters: p = P(S) = probability of success
Notation: If Y is a geometric RV, we write Y ~ GEO(p), which indicates that Y follows a geometric distribution with probability of success p.
Prob. Mass Function: f(y) = P(Y = y) = p(1 − p)^(y−1), for y = 1, 2, ...

Example: If we toss a coin, what is the probability that we have to toss it exactly 5 times to see the first H show up?

Solution: We have p = 0.5. Let Y = the # of tosses until the first H. Then Y ~ GEO(0.5) and

f(5) = P(Y = 5) = 0.5 (1 − 0.5)^(5−1) = 0.031
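A minimal sketch of the geometric pmf, checking the coin example just worked (and that the pmf sums to 1 over its countable range):

```python
# The geometric pmf f(y) = p(1-p)^(y-1): first success on trial y.
def geom_pmf(y, p):
    """P(Y = y) for Y ~ GEO(p)."""
    return p * (1 - p)**(y - 1)

print(geom_pmf(5, 0.5))                             # 0.03125, the 0.031 above
print(sum(geom_pmf(y, 0.5) for y in range(1, 51)))  # partial sums approach 1
```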

Let's think intuitively about why the prob. mass function of a geometric RV makes sense, using the last example. To have the first success on the 5th trial, we must have the first 4 trials be failures: (F, F, F, F, S).

Recall: We assume a constant probability of success and independence of trials. So,

P(first success on 5th trial) = P(F F F F S) = (by independence) = P(F)P(F)P(F)P(F)P(S) = (1 − p)(1 − p)(1 − p)(1 − p)p = p(1 − p)^4

Examples:
1. If 5% of the computer chips manufactured by a company are defective, what is the probability that we have to check exactly 8 chips before we find a defective one?
2. You go to the casino and play roulette, betting on the colour of the number. What is the probability that the first bet you win is the fourth bet you make?

Sum of Independent Binomial Random Variables: If X1, X2, ..., Xk are k independent binomial RVs, where Xi ~ BIN(ni, p) (same p, but possibly different n), then

Y = X1 + X2 + ... + Xk ~ BIN(n1 + n2 + ... + nk, p)

(e.g.) If X1 ~ BIN(10, 0.25) and X2 ~ BIN(25, 0.25), and X1 and X2 are independent, then Y = X1 + X2 ~ BIN(35, 0.25).

(i.e.) A BIN(n, p) RV is the sum of n independent BIN(1, p) trials (a binomial RV is the sum of Bernoulli RVs). (A quick simulation check of this fact is sketched below.)
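A quick simulation check of the sum rule (an illustration, not a proof), using the BIN(10, 0.25) + BIN(25, 0.25) example above:

```python
# If X1 ~ BIN(10, 0.25) and X2 ~ BIN(25, 0.25) are independent, X1 + X2 should
# behave like BIN(35, 0.25), whose mean is np = 8.75 and variance np(1-p) = 6.5625.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.binomial(10, 0.25, size=100_000)
x2 = rng.binomial(25, 0.25, size=100_000)
y = x1 + x2

print(y.mean(), y.var())   # both close to 8.75 and 6.5625
```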

Sum of Independent Geometric Random Variables: Summing more than one geometric RV results in a new type of random variable: the negative binomial RV! If X1, X2, ..., Xk are k independent geometric RVs, where Xi ~ GEO(p) (same p), then

Y = X1 + X2 + ... + Xk ~ NEGBIN(k, p)
Y = # of trials until the kth success

Negative Bin. Prob. Mass Fcn: f(y) = P(Y = y) = C(y − 1, k − 1) p^k (1 − p)^(y−k), where y = the # of trials and k = the # of successes.

Note: We now have the binomial, geometric and negative binomial random variables (for discrete RVs):
binomial: # of successes in a fixed # of trials
geometric: # of trials until the 1st success
negative binomial: # of trials until the kth success

Example: We will toss a coin until we get 2 heads. Suppose that the coin is biased and flips heads 40% of the time. What is the probability that the coin will be tossed exactly 3 times? What is the probability that it will be tossed at least 4 times?

Solution: Let Y = # of tosses until we get 2 heads, so Y ~ NEGBIN(2, 0.4).

For the first question, we have y = 3 and k = 2:

P(Y = 3) = f(3) = C(3 − 1, 2 − 1) (0.4)^2 (1 − 0.4)^(3−2) = 0.192

For the second question,

P(Y ≥ 4) = 1 − P(Y ≤ 3) = 1 − P(Y = 3) − P(Y = 2) − P(Y = 1)
= 1 − C(2, 1)(0.4)^2(0.6)^1 − C(1, 1)(0.4)^2(0.6)^0 − 0
= 1 − 0.192 − 0.16 = 0.648
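A minimal sketch checking the biased-coin example with the negative binomial pmf:

```python
# The negative binomial pmf f(y) = C(y-1, k-1) p^k (1-p)^(y-k).
from math import comb

def negbin_pmf(y, k, p):
    """P(Y = y) for Y ~ NEGBIN(k, p): k-th success on trial y."""
    return comb(y - 1, k - 1) * p**k * (1 - p)**(y - k)

print(negbin_pmf(3, 2, 0.4))                           # exactly 3 tosses: 0.192
print(1 - sum(negbin_pmf(y, 2, 0.4) for y in (2, 3)))  # at least 4 tosses: 0.648
```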

So far, we have introduced 3 different types of discrete RVs. In Chapter 6, we will introduce one more (the Poisson RV). We will now introduce the cumulative distribution function (cdf, F(x)), and then begin talking about the mean and standard deviation for these discrete RVs. Then we will do the same things for a few continuous RVs.

Cumulative Distribution Functions (CDF): The cumulative distribution function F(x) is defined as

F(x) = P(X ≤ x) = Σ over k ≤ x of f(k)

Let's consider the example of rolling a die 4 times, with X = the # of 1's rolled.

x    f(x) = P(X = x)    F(x) = P(X ≤ x)
0    0.4823             0.4823
1    0.3858             0.8681
2    0.1157             0.9838
3    0.0154             0.9992
4    0.0008             1.0000

Properties of f(x) and F(x): In principle, a discrete RV can take its value on any countable or finite set of real numbers. In this course, we will only deal with integer-valued discrete RVs.

(1) 0 ≤ f(x) ≤ 1 and 0 ≤ F(x) ≤ 1
(2) F(x) is non-decreasing and right-continuous
(3) Σ over all x of f(x) = 1, F(−∞) = 0 and F(∞) = 1
(4) f(x) = F(x) − F(x − 1), for all integer-valued x
    ( = P(X ≤ x) − P(X ≤ x − 1) = P(x − 1 < X ≤ x) )
(5) P(a < X ≤ b) = F(b) − F(a), for all integers a and b
(6) P(X < a) = P(X ≤ a − 1) = F(a − 1), for all integers a

Note: In the discrete case, P(X < x) ≠ P(X ≤ x).
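A minimal sketch building F(x) from f(x) by cumulative summation, using the die-rolling table above (X = # of 1's in 4 rolls, so X ~ BIN(4, 1/6)):

```python
# Reproduce the f(x)/F(x) table and use property (5) for an interval probability.
from itertools import accumulate
from math import comb

f = [comb(4, x) * (1/6)**x * (5/6)**(4 - x) for x in range(5)]
F = list(accumulate(f))                     # running sums give the cdf

for x, (fx, Fx) in enumerate(zip(f, F)):
    print(x, round(fx, 4), round(Fx, 4))    # 0.4823, 0.8681, 0.9838, ...

print(F[3] - F[1])                          # property (5): P(1 < X <= 3)
```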

The Geometric Distribution Function (F(y)):

Example: Derive the cumulative distribution function F(y) of a geometric RV.

Fact: For a geometric series, 1 + r + r^2 + ... + r^m = (1 − r^(m+1)) / (1 − r)

Recall: For Y ~ GEO(p), f(y) = P(Y = y) = p(1 − p)^(y−1). Therefore,

F(y) = Σ from k=1 to y of p(1 − p)^(k−1)
= p(1 − p)^0 + p(1 − p)^1 + ... + p(1 − p)^(y−1)
= p( 1 + (1 − p)^1 + (1 − p)^2 + ... + (1 − p)^(y−1) )

which is a geometric series with r = (1 − p) and m = (y − 1), so

F(y) = p · [1 − (1 − p)^((y−1)+1)] / [1 − (1 − p)] = p · [1 − (1 − p)^y] / p = 1 − (1 − p)^y

Therefore, if Y ~ GEO(p), then F(y) = P(Y ≤ y) = 1 − (1 − p)^y.

The Binomial Distribution Function (F(x)): There is no closed-form expression for F(x) when X ~ BIN(n, p). Instead, one can
(a) use the binomial table in the course notes... we won't do this!
(b) calculate F(x) = P(X ≤ x) = f(0) + f(1) + ... + f(x)
(c) use the approximation methods from Chapter 7 when the number of trials n is large.

Examples: Let X ~ BIN(18, 0.4). Find...
(a) P(5 < X ≤ 10)
(b) P(5 ≤ X < 10)
(c) P(X > 4), and P(X ≤ 4)
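A minimal sketch computing the BIN(18, 0.4) queries above by direct summation (option (b)), and checking the geometric closed form just derived:

```python
# Binomial cdf by summation, plus a check of F(y) = 1 - (1-p)^y for GEO(p).
from math import comb

def binom_cdf(x, n, p):
    """F(x) = P(X <= x) for X ~ BIN(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p = 18, 0.4
print(binom_cdf(10, n, p) - binom_cdf(5, n, p))    # (a) P(5 < X <= 10)
print(binom_cdf(9, n, p) - binom_cdf(4, n, p))     # (b) P(5 <= X < 10) = P(4 < X <= 9)
print(1 - binom_cdf(4, n, p), binom_cdf(4, n, p))  # (c) P(X > 4) and P(X <= 4)

# Geometric check: P(Y <= 5) for p = 0.5, by summation vs. the closed form
print(sum(0.5 * 0.5**(y - 1) for y in range(1, 6)), 1 - 0.5**5)
```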

Expected Values: In this section, we will talk about how to find the expected value of a random variable. By "expected," we mean the most representative value: the value we would see on average in the long run. Expected values make sense in the long run, but often not in the short run... let's discuss this...

Let X be a discrete RV with prob. mass function f(x) and range R, and let g(x) be a function of x, e.g. g(x) = x, g(x) = x^2, or g(x) = e^x. Then the expected value of g(X) is

E(g(X)) = Σ over x in R of g(x) f(x)

Note: Often we are interested in the expected value of g(x) = x, but we leave it general, as there are instances where we want to find the expected value of things such as g(x) = x^2 and so on...

Note: If g(x) = x, then the expected value E(X) is the long-run average value of the random variable X. (e.g.) If X = # of defects and we use g(x) = x, then E(g(X)) = E(X) = the expected # of defects.

Examples:
1. Consider the following distribution. (a) Find the expected value of X. (b) Find the expected value of X^2.

x    f(x)
0    0.10
1    0.20
2    0.50
3    0.15
4    0.05

2. Consider rolling a standard die one time. Let X be the number rolled. Find E(X) and E(X^2).

3. Suppose you will make a $1 bet on the colour of the number in roulette. What is the expected gain/loss for the bet?

Note: In general, E(X^2) ≠ E(X)^2... check this for the previous example! (i.e.) The expected value of the square ≠ the square of the expected value!

Mean of X = μx = E(X) = Σ over x in R of x f(x) = Σ over x in R of x P(X = x)

Note: The expected value/mean of X is simply a weighted average of the possible values X may take on, weighted by how often each value is observed. (A computational sketch of examples 1-3 follows.)
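A minimal sketch of E(X) and E(X^2) as probability-weighted sums, for the table in example 1 and the die and roulette examples (the roulette win probability 18/38 assumes a double-zero wheel, an assumption for illustration):

```python
# Expected values as probability-weighted sums over the range.
table = {0: 0.10, 1: 0.20, 2: 0.50, 3: 0.15, 4: 0.05}

EX  = sum(x * f for x, f in table.items())      # 1.85
EX2 = sum(x**2 * f for x, f in table.items())   # 4.35 (note: not 1.85**2)
print(EX, EX2)

die = {x: 1/6 for x in range(1, 7)}
print(sum(x * f for x, f in die.items()))       # E(X) = 3.5
print(sum(x**2 * f for x, f in die.items()))    # E(X^2) = 91/6, about 15.17

# $1 colour bet: win $1 with prob 18/38, lose $1 with prob 20/38 (assumed wheel)
print(1 * 18/38 + (-1) * 20/38)                 # about -0.0526 per bet
```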

We use the mean as our measure of center.

Variance of X = E((X − μx)^2) = σx² = Σ over x in R of (x − μx)² f(x)

Here, we are using g(x) = (x − μx)²: we are finding the expected squared deviation from the mean, which measures spread. YES, this is the same as the variance we saw in Chapters 1/2... the only difference is that there it was for a SAMPLE of data; here it is the POPULATION/true variance.

The definition above is in a form that helps us understand what the variance measures. Below is an equivalent form that is more useful when actually calculating a variance:

σx² = E(X²) − E(X)² = E(X²) − μx²

Standard Deviation of X = σx = sqrt(σx²) = the square root of the variance (we work with this one most often).

Properties of the Mean, Variance and Standard Deviation: If you recall, in our review of Chapters 1/2 we discussed a few rules for what happens to the mean/variance/SD when you add a constant to a variable or multiply it by a constant. The rules are exactly the same here, only stated in notational form.

Let X be a random variable with mean μx and variance σx², and let a and b be any real-valued constants.

(1) If Y = aX + b, then
μy = E(Y) = E(aX + b) = E(aX) + E(b) = aE(X) + b = aμx + b
σy² = VAR(Y) = VAR(aX + b) = VAR(aX) + VAR(b) = a²VAR(X) = a²σx²
σy = |a| σx

Recall: Adding a constant shifts the mean by that constant, but the SD remains the same. Multiplying by a constant scales both the mean and the SD by that constant (the SD by its absolute value).

Let X and Y be random variables with means μx, μy and variances σx², σy².

(2) If we add/subtract the two random variables X and Y, then...
Mean of X + Y: μ(x+y) = E(X + Y) = E(X) + E(Y) = μx + μy
Mean of X − Y: μ(x−y) = E(X − Y) = E(X) − E(Y) = μx − μy
In general, E(aX + bY) = aμx + bμy.

(3) If X and Y are independent, then...
VAR(aX + bY) = a²σx² + b²σy²
VAR(aX − bY) = a²σx² + b²σy²
When X and Y are independent, the variance of the sum/difference is equal to the sum of the variances. (i.e.) VAR(X ± Y) = VAR(X) + VAR(Y)

(4) If X and Y are independent, then E(XY) = E(X)E(Y).

Examples:
1. You own 4 machines and you want to have them inspected. It costs $50 for all 4 inspections, and each defective machine found is repaired at a cost of $25. The probabilities of finding 0, 1, 2, 3, 4 defective machines are summarized in the table below.

x     f(x)   x f(x)   x² f(x)
0     0.10   0.00     0.00
1     0.20   0.20     0.20
2     0.50   1.00     2.00
3     0.15   0.45     1.35
4     0.05   0.20     0.80
sum   1.00   1.85     4.35

(a) What is the expected # of defective machines?
(b) What is the expected total cost?
(c) What is the standard deviation of the total cost?

2. Suppose that in the month of July in Vancouver, the temperature has μ = 25°C with σ² = 4. What are the mean and SD of the temperature in degrees Fahrenheit? Hint: Fahrenheit = Celsius × 9/5 + 32

(A sketch working through example 1 follows.)
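A minimal sketch of example 1: the total cost is C = 50 + 25X, so by rule (1), E(C) = 50 + 25 E(X) and SD(C) = 25 SD(X).

```python
# Working through the machine-inspection example from the table above.
from math import sqrt

table = {0: 0.10, 1: 0.20, 2: 0.50, 3: 0.15, 4: 0.05}
EX  = sum(x * f for x, f in table.items())      # (a) 1.85 defectives expected
EX2 = sum(x**2 * f for x, f in table.items())   # 4.35
var_x = EX2 - EX**2                             # 0.9275

print(50 + 25 * EX)       # (b) expected total cost: $96.25
print(25 * sqrt(var_x))   # (c) SD of total cost: about $24.08
```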

Now that we know how to find means and variances, let's go through and find the means and variances of some distributions we already know.

Mean and Variance of Binomial Random Variables: If X ~ BIN(n, p), then...
Mean of X = E(X) = μx = np
Variance of X = VAR(X) = σx² = np(1 − p)

Proof: Let X = X1 + X2 + ... + Xn, where the Xi are iid BIN(1, p); then X ~ BIN(n, p).

E(Xi) = Σ x f(x) = Σ x P(X = x) = 1·p + 0·(1 − p) = p
E(Xi²) = Σ x² f(x) = Σ x² P(X = x) = 1²·p + 0²·(1 − p) = p

Therefore, σ²(Xi) = VAR(Xi) = E(Xi²) − E(Xi)² = p − p² = p(1 − p).

So,
E(X) = E(X1 + X2 + ... + Xn) = E(X1) + E(X2) + ... + E(Xn) = p + p + ... + p = np
σx² = VAR(X) = VAR(X1 + X2 + ... + Xn) = (because of independence) = VAR(X1) + VAR(X2) + ... + VAR(Xn) = p(1 − p) + p(1 − p) + ... + p(1 − p) = np(1 − p)
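A quick numerical check (not a proof) of E(X) = np and VAR(X) = np(1 − p), by direct summation over the pmf; n = 75, p = 0.5 anticipates the coin example below:

```python
# Verify np and np(1-p) by brute-force summation over the BIN(n, p) pmf.
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 75, 0.5
EX  = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
EX2 = sum(x**2 * binom_pmf(x, n, p) for x in range(n + 1))
print(EX, EX2 - EX**2)   # 37.5 and 18.75, matching np and np(1-p)
```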

Example: You toss a coin 75 times. If we let X be the # of heads in the 75 tosses, what is the expected # of heads, and what is its variance?

Example: You make 80 bets on the colour of the number in roulette. What are the mean and variance of the number of bets won?

Mean and Variance of Geometric Random Variables: If Y ~ GEO(p), then...
Mean of Y = E(Y) = μy = 1/p
Variance of Y = VAR(Y) = σy² = (1 − p)/p²
** See course notes for the proof!

Example: A telemarketer makes successive independent phone calls, each resulting in a sale with probability 5%. Each phone call costs 25 cents. Find:
(a) What is the expected cost of making one sale?
(b) What is the corresponding standard deviation?

Mean and Variance of Negative Binomial RVs: If Y ~ NEGBIN(k, p), then... (recall: k = # of successes, p = prob. of success)
Mean of Y = E(Y) = μy = k/p
Variance of Y = VAR(Y) = σy² = k(1 − p)/p²

Proof: Recall that if Y = Y1 + Y2 + ... + Yk, where the Yi are iid GEO(p), then Y ~ NEGBIN(k, p). So,

E(Y) = E(Y1 + Y2 + ... + Yk) = E(Y1) + E(Y2) + ... + E(Yk) = 1/p + 1/p + ... + 1/p = k/p

VAR(Y) = VAR(Y1 + Y2 + ... + Yk) = (by independence) = VAR(Y1) + VAR(Y2) + ... + VAR(Yk) = (1 − p)/p² + (1 − p)/p² + ... + (1 − p)/p² = k(1 − p)/p²

Example Continued: (c) What is the expected cost of making 3 sales? (d) What is the SD of this?

Example: A hockey team needs to sign two free agents before the season starts. Suppose that each player they speak with joins the team with probability 20%, and assume that the players' decisions to join are independent of each other.
(a) What is the expected # of players they talk to before signing 2 players?
(b) What is the SD of this?
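A minimal sketch of the telemarketer and hockey examples, applying the geometric and negative binomial mean/variance formulas above (cost scales by $0.25 per call, so the SD scales by $0.25 as well):

```python
# Geometric/negative binomial means and SDs applied to the two examples.
from math import sqrt

# Telemarketer: p = 0.05 per call, $0.25 per call.
p, cost = 0.05, 0.25
print(cost * 1 / p)                      # (a) expected cost of one sale: $5.00
print(cost * sqrt((1 - p) / p**2))       # (b) SD: about $4.87
print(cost * 3 / p)                      # (c) expected cost of three sales: $15.00
print(cost * sqrt(3 * (1 - p) / p**2))   # (d) SD: about $8.44

# Hockey: k = 2 signings, p = 0.2 per player.
k, p = 2, 0.2
print(k / p)                             # (a) expected # of players: 10
print(sqrt(k * (1 - p) / p**2))          # (b) SD: about 6.32
```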

Examples:

1. Suppose that it is known that the probability of being able to log on to a computer from a remote terminal is 0.7 at any given time.
(a) What is the probability that out of 10 attempts, you are able to log on exactly 6 times?
(b) What is the probability that out of 8 attempts, you are able to log on at least 6 times?
(c) How many times would you expect to have to try before successfully logging on?
(d) What is the probability that you must attempt to log on at least 3 times before successfully logging on? *Note that we can solve this by defining a BIN or GEO RV.
(e) What is the probability that it takes you more than 4 attempts to successfully log on twice? *Note that we can solve this by defining a BIN or NEGBIN RV.
(A computational sketch of this example follows the list.)

2. Let's play a game. You have to pay me $60 to play. You then roll 2 dice and look at the sum of those 2 dice. I will give you back the square of the sum of the 2 dice.
(a) What is the expected gain/loss for playing this game?
(b) What is the variance of the gain/loss when playing this game?
(c) What is the probability that you make money on a single play of the game?

3. The probability that a wildcat well will be productive is 1/13, regardless of the location being drilled. We will assume that the locations being drilled are far enough from each other that productivity is independent across wells.
(a) How many wells do they expect to have to drill before finding 1 that is productive?
(b) How many wells do they expect to have to drill before finding 3 that are productive?
(c) What is the probability that the first productive well they find is the 13th well they drill?
(d) If they drill 20 wells, what is the probability that at least 2 are found to be productive?
(e) If it costs the company $1000 in start-up for drilling plus $100 for each well drilled, what are the expected cost and the variance of the cost of finding 3 productive wells?
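A minimal sketch of example 1, mixing the binomial, geometric and negative binomial results from this chapter:

```python
# Log-on example: success probability p = 0.7 per attempt.
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

p = 0.7
print(binom_pmf(6, 10, p))                           # (a) about 0.200
print(sum(binom_pmf(x, 8, p) for x in range(6, 9)))  # (b) about 0.552
print(1 / p)                                         # (c) about 1.43 attempts
print((1 - p)**2)                                    # (d) first 2 attempts fail: 0.09

# (e) more than 4 trials for 2 successes: 1 - P(Y=2) - P(Y=3) - P(Y=4)
negbin = lambda y, k: comb(y - 1, k - 1) * p**k * (1 - p)**(y - k)
print(1 - sum(negbin(y, 2) for y in (2, 3, 4)))      # about 0.084
```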

Continuous Random Variables

Things in this section will look very similar to the section on discrete RVs. Conceptually, all the ideas are the same.

A continuous random variable is one that has an infinite, uncountable range. (e.g.) weight of an item, time until failure of a mechanical component, length of an object, ...

(Probability) Density Function (f(x)): The density function is a function that allows us to work out the probability of occurrence over a range of x-values. It does not have the same definition as in the discrete case. (That is, f(x) ≠ P(X = x) for a CONTINUOUS RV.)

P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx

Note: In the discrete case, P(a ≤ X ≤ b) = Σ from a to b of f(x). Now that X is continuous, we must integrate instead of sum.

Note: In the continuous case, P(X = a) = 0 for any a. (i.e.) The probability of X taking on any particular value is 0.

Note: ∫ of f(x) dx over the whole range = 1. This is similar to what we saw in the discrete case, except there we summed over all values X could take on.

(Cumulative) Distribution Function (F(x)): The c.d.f. gives the probability of being less than or equal to a particular value:

F(x) = P(X ≤ x) = ∫ from −∞ to x of f(t) dt

Note: Again, this is what we saw in the discrete case, except that now we have an integral instead of a summation.

Properties of F(x):
1. 0 ≤ F(x) ≤ 1
2. F(−∞) = 0 and F(∞) = 1
3. F(x) is continuous and non-decreasing
4. P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b) = F(b) − F(a)
   Including or excluding the endpoints does not matter for continuous RVs, as the probability of X taking on any given value is 0.
5. F'(x) = f(x). The derivative of the distribution function gives the density: f(x) → F(x) by integration, and F(x) → f(x) by differentiation.

Now we will introduce two continuous RVs: the uniform RV and the exponential RV. The next chapter will be devoted to the normal RV (bell curve), which is the most useful and most common continuous distribution in statistics.

Uniform Random Variables:
Notation: If X is a uniform RV, we write X ~ UNI(a, b), indicating that X is uniformly (or evenly) distributed over the interval [a, b].
Density Function: f(x) = 1/(b − a), for a ≤ x ≤ b
Distribution Function: F(x) = ∫ from a to x of f(u) du = (x − a)/(b − a)
Mean of X = E(X) = μx = (a + b)/2
Variance of X = VAR(X) = σx² = (b − a)²/12

Exercise: On your own, show that the above are true... you will be able to do this after the section on expected values below!

Example: Let X ~ UNI(0, 10). Find...
(a) P(X = 5)
(b) P(X ≤ 2)
(c) P(3 ≤ X ≤ 7)
(d) P(X < 2)
(A short sketch follows.)
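A minimal sketch of the UNI(0, 10) example using F(x) = (x − a)/(b − a):

```python
# Uniform cdf on [0, 10]; interval probabilities come from F(b) - F(a).
a, b = 0, 10
F = lambda x: (x - a) / (b - a)

print(0.0)           # (a) P(X = 5) = 0 for a continuous RV
print(F(2))          # (b) P(X <= 2) = 0.2
print(F(7) - F(3))   # (c) P(3 <= X <= 7) = 0.4
print(F(2))          # (d) P(X < 2) = P(X <= 2): endpoints don't matter
```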

Exponential Random Variables: The exponential RV is often used to model the time until an event occurs. As a consequence, it takes on values ≥ 0. It arises from something called the Poisson process, which we will come back to and discuss in more depth in Chapter 6.

Notation: If X is an exponential RV with a failure rate of λ, we write X ~ EXP(λ). Here λ is a positive constant equal to the reciprocal of the mean lifetime. (i.e.) If a lightbulb has a mean lifetime of 5 years, then λ = 1/5.

Density Function: f(x) = λe^(−λx), for x ≥ 0
Distribution Function: F(x) = 1 − e^(−λx), for x ≥ 0
Mean of X = E(X) = μx = 1/λ
Variance of X = VAR(X) = σx² = 1/λ²

Example: Suppose that a lightbulb has an expected lifetime of 10 years. Find...
(a) The failure rate of the lightbulb
(b) The probability that a bulb lasts more than 15 years
(c) The probability that a bulb lasts 4-6 years
(A short sketch follows.)
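A minimal sketch of the lightbulb example using F(x) = 1 − e^(−λx), with λ = 1/mean = 0.1 per year:

```python
# Exponential lifetime with mean 10 years, so failure rate lam = 0.1.
from math import exp

lam = 1 / 10                # (a) failure rate: 0.1 per year
F = lambda x: 1 - exp(-lam * x)

print(1 - F(15))            # (b) P(X > 15) = e^-1.5, about 0.223
print(F(6) - F(4))          # (c) P(4 <= X <= 6), about 0.121
```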

Expected Values: Again, finding expected values is very similar to the discrete case, except now we use integrals instead of summations.

Let X be a continuous RV with density function f(x), and let g(x) be a given function of x, e.g. g(x) = x or g(x) = x², ... Then the expected value is

E(g(X)) = ∫ from −∞ to ∞ of g(x) f(x) dx

We integrate over the range of X. We write (−∞, ∞) to be general... what we mean is the range of the RV.

Note: If we use g(x) = x, then we are finding the expected value of X.

Mean of X: μx = E(X) = ∫ x f(x) dx
Variance of X: σx² = VAR(X) = E(X²) − E(X)²

Note: The properties of μx, σx² and σx are the same as in the discrete case (outlined earlier).

Example: Suppose that X ~ UNI(0, 5). Find...
(a) The expected value of X
(b) The expected value of X²
(c) The SD of X

Example: The density function for X, the lead concentration in gasoline in grams per liter, is given by f(x) = 12.5x − 1.25, for 0.1 ≤ x ≤ 0.5.
(a) What is the expected concentration of lead?
(b) What is the variance of the concentration of lead?
(A numeric-integration sketch follows.)
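Both examples have closed forms, but a sketch that simply mirrors the integral definitions, using scipy.integrate.quad for the numeric integration:

```python
# E(g(X)) = integral of g(x) f(x) dx over the range of X.
from math import sqrt
from scipy.integrate import quad

# X ~ UNI(0, 5): f(x) = 1/5 on [0, 5]
EX,  _ = quad(lambda x: x * (1/5), 0, 5)       # (a) 2.5
EX2, _ = quad(lambda x: x**2 * (1/5), 0, 5)    # (b) 25/3
print(EX, EX2, sqrt(EX2 - EX**2))              # (c) SD = sqrt(25/12), about 1.44

# Lead concentration: f(x) = 12.5x - 1.25 on [0.1, 0.5] (check: integrates to 1)
f = lambda x: 12.5 * x - 1.25
EX,  _ = quad(lambda x: x * f(x), 0.1, 0.5)    # (a) about 0.367 g/L
EX2, _ = quad(lambda x: x**2 * f(x), 0.1, 0.5)
print(EX, EX2 - EX**2)                         # (b) variance, about 0.0089
```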

The Median/Half-Life of a Continuous RV: The median m of a continuous RV is defined by F(m) = 1/2. (i.e.) The probability that X is above or below the median m is 1/2:

P(X < m) = P(X > m) = 1/2

When X is the measure of a lifetime, we often refer to the median m as the half-life.

Example: Suppose that the lifetime of your TV (in years) follows an exponential distribution with a failure rate of 0.04. Find...
(a) The expected lifetime of the TV
(b) The SD of the lifetime of the TV
(c) The half-life of your TV
(d) What % of TVs like yours will exceed their expected lifetime?
(e) What % of TVs like yours will exceed their half-life?
(A short sketch follows.)
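A minimal sketch of the TV example: for an exponential RV, solving F(m) = 1 − e^(−λm) = 1/2 gives the half-life m = ln(2)/λ.

```python
# Exponential TV lifetime with failure rate lam = 0.04.
from math import exp, log

lam = 0.04
print(1 / lam)                # (a) expected lifetime: 25 years
print(1 / lam)                # (b) SD is also 25 years (sigma = 1/lam)
m = log(2) / lam
print(m)                      # (c) half-life: about 17.3 years
print(exp(-lam * (1 / lam)))  # (d) P(X > mean) = e^-1, about 36.8%
print(exp(-lam * m))          # (e) P(X > median) = 50%, by definition
```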

Sum and Average of Independent RVs:

Note: We will use many of the ideas in this section once we come to Chapter 7.

Random experiments are often independently repeated, creating a sequence X1, X2, ..., Xn of n independent RVs. Typically these n RVs Xi, i = 1, 2, ..., n, will have a common mean μ and a common variance σ². In this case, {X1, X2, ..., Xn} is called an independent random sample. (e.g.) Roll a die repeatedly, measure the lifetime of a type of component repeatedly, crash test a type of car repeatedly, ...

So, we have independent RVs X1, X2, ..., Xn, where E(Xi) = μ and VAR(Xi) = σ² (common mean/variance). Often, we are interested in the sum or the mean of the sample values:

S = Σ from i=1 to n of Xi = X1 + X2 + ... + Xn
X̄ = (Σ from i=1 to n of Xi) / n = (X1 + X2 + ... + Xn) / n

Example: Suppose we have a series of 9 independent measurements of the compressive strength of steel fibre reinforced concrete. Let Xi ~ UNI(500, 1500), where Xi is the compressive strength of block i, i = 1, 2, ..., 9. Question: do you think the breaking strength would really follow a uniform distribution?

Recall: E(Xi) = (a + b)/2 = (500 + 1500)/2 = 1000 and VAR(Xi) = (b − a)²/12 = (1500 − 500)²/12 = 83,333.33

BUT, a more accurate measure of the true compressive strength of the steel fibre reinforced concrete would be the average/mean strength of the 9 blocks, X̄. So, when we have independent RVs X1, X2, ..., Xn with common mean and variance, then...

Sum: μS = E(S) = nμ; σS² = VAR(S) = nσ²; σS = SD(S) = √n · σ
Mean: μX̄ = E(X̄) = μ; σX̄² = VAR(X̄) = σ²/n; σX̄ = SD(X̄) = σ/√n

Square Root Rule: The SD of the sum is proportional to the square root of n. The SD of the average is proportional to the inverse of the square root of n.

Example (continued): Recall our example of measurements of the compressive strength of 9 concrete blocks. Find...
(a) The expected value of the mean of the 9 blocks
(b) The SD of the mean of the 9 blocks (SD(X̄))

Note: We can see that the mean is unchanged, but the SD is 3 times smaller: the square root rule!

Note: As the number of measurements or observations grows large (n → ∞), the variability of the mean of the measurements gets very small (SD(X̄) = σX̄ → 0). More measurements means greater accuracy!

Example: 20 randomly selected UBC students are asked if they smoke; 6 say that they do and the other 14 do not. Find...
(a) The estimated proportion of smokers among UBC students. Is this a valid estimate?
(b) The SD of this estimate

In general, when we are working with proportions (which arise from categorical variables), we estimate SD(p̂) = sqrt( p̂(1 − p̂)/n ).

(A simulation sketch of the square root rule follows.)
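A quick simulation (an illustration, not a proof) of the square root rule, using the concrete example: 9 iid UNI(500, 1500) strengths, so SD(X̄) should be sqrt(83,333.33/9), about 96.2, one third of σ ≈ 288.7.

```python
# Simulate many samples of 9 uniform strengths and check mean/SD of the average.
import numpy as np

rng = np.random.default_rng(2)
samples = rng.uniform(500, 1500, size=(100_000, 9))
xbar = samples.mean(axis=1)

print(xbar.mean())   # close to 1000
print(xbar.std())    # close to 96.2 (sigma / sqrt(9))

# Smoking proportion: phat = 6/20, estimated SD = sqrt(phat(1-phat)/n)
phat, n = 6 / 20, 20
print(phat, np.sqrt(phat * (1 - phat) / n))   # 0.3 and about 0.102
```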

Max/Min of Independent RVs

As we just discussed, we often conduct a series of random experiments to get a sequence of independent RVs {X1, X2, ..., Xn}, known as a random sample. Often we are interested in the sum or average of all the observations, but there are also instances where we are interested in the maximum or minimum value in this random sample.

If we let W = max{X1, X2, ..., Xn}, then W can be used to model:
1) The lifetime of a system of n independent parallel components, where Xi = the lifetime of component i.
2) The completion time of a project of n independent sub-projects that can be done simultaneously, where Xi = the completion time of sub-project i.

If we let V = min{X1, X2, ..., Xn}, then V can be used to model:
1) The lifetime of a system of n independent components in series, where Xi = the lifetime of component i.
2) The completion time of a project pursued by n independent competing teams, where Xi = the completion time of team i.

The Maximum: Suppose that we have an independent random sample X1, X2, ..., Xn, and we wish to find the distribution of the maximum, W = max{X1, X2, ..., Xn}. Suppose that F_Xi(x) and f_Xi(x) are the distribution and density functions of the RV Xi, and define F_W(w) and f_W(w) to be the distribution and density functions of the maximum W. Then

F_W(w) = P(W ≤ w) = P( (X1 ≤ w) ∩ (X2 ≤ w) ∩ ... ∩ (Xn ≤ w) )
= (by independence) = P(X1 ≤ w) · P(X2 ≤ w) · ... · P(Xn ≤ w)
= F_X1(w) · F_X2(w) · ... · F_Xn(w)

Here, we will only discuss the case where the Xi are iid (independent and identically distributed), as this is most often the case.

Distribution: F_W(w) = ( F_Xi(w) )^n
Density: f_W(w) = F_W'(w) = n ( F_Xi(w) )^(n−1) f_Xi(w)

Example: A system consists of 3 components arranged in parallel. Components are independent with a mean lifetime of 5 years, and the lifetimes are thought to follow an exponential distribution. Find...
(a) The median/half-life of component 1
(b) The probability that the first component fails before 5.5 years have passed
(c) The probability that the system fails before 5.5 years have passed
(d) The median/half-life of the system

Note how we can solve some of these problems using the simple rules of probability that we learned back in Chapter 3. (A short sketch follows.)
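A minimal sketch of the parallel-system example: 3 iid EXP(λ) lifetimes with mean 5 years (λ = 0.2), system lifetime W = max of the three, so F_W(w) = (1 − e^(−λw))³.

```python
# Parallel system of 3 iid exponential components, mean lifetime 5 years.
from math import exp, log

lam = 1 / 5
F  = lambda x: 1 - exp(-lam * x)   # one component
FW = lambda w: F(w)**3             # the system (fails when all 3 have failed)

print(log(2) / lam)                # (a) component half-life: about 3.47 years
print(F(5.5))                      # (b) about 0.667
print(FW(5.5))                     # (c) about 0.297
# (d) solve FW(m) = 1/2:  F(m) = 0.5**(1/3),  so m = -ln(1 - 0.5**(1/3)) / lam
print(-log(1 - 0.5**(1/3)) / lam)  # about 7.89 years
```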

The Minimum: As before, suppose that we have a random sample X1, X2, ..., Xn, and we wish to find the distribution of the minimum, V = min{X1, X2, ..., Xn}. Again, let F_Xi(x) and f_Xi(x) be the distribution and density functions of the RV Xi, and define F_V(v) and f_V(v) to be the distribution and density functions of the minimum V. Then

F_V(v) = P(V ≤ v) = 1 − P(V > v) = 1 − P( (X1 > v) ∩ (X2 > v) ∩ ... ∩ (Xn > v) )
= (by independence) = 1 − P(X1 > v) · P(X2 > v) · ... · P(Xn > v)
= 1 − (1 − F_X1(v))(1 − F_X2(v))···(1 − F_Xn(v))

For iid Xi:
Distribution: F_V(v) = 1 − ( 1 − F_Xi(v) )^n
Density: f_V(v) = F_V'(v) = n ( 1 − F_Xi(v) )^(n−1) f_Xi(v)

Example: A system consists of 3 components in series. Components are independent with a mean lifetime of 5 years, and the lifetimes are thought to follow an exponential distribution. Find...
(a) The probability that component 1 fails before 5.5 years have passed
(b) The probability that the system fails before 5.5 years have passed
(c) The median/half-life of the system
(A short sketch follows.)
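A minimal sketch of the series-system example: 3 iid EXP(λ) lifetimes with mean 5 years (λ = 0.2), system lifetime V = min of the three, so F_V(v) = 1 − (1 − F(v))³ = 1 − e^(−3λv). Note the design fact this reveals: the minimum of n iid exponentials is itself exponential, with rate nλ.

```python
# Series system of 3 iid exponential components, mean lifetime 5 years.
from math import exp, log

lam = 1 / 5
F  = lambda x: 1 - exp(-lam * x)   # one component
FV = lambda v: 1 - (1 - F(v))**3   # the system (fails when any component fails)

print(F(5.5))                      # (a) about 0.667
print(FV(5.5))                     # (b) about 0.963
print(log(2) / (3 * lam))          # (c) system half-life: about 1.16 years
```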