Statistics/BioSci 141, Fall 2006 Lab 2: Probability and Probability Distributions October 13, 2006

1 Using random samples to estimate a probability

Suppose that you are stuck on the following problem:

(SW 3.33) If two carriers of the gene for albinism marry, each of their children has probability 1/4 of being albino. If such a couple has six children, what is the probability that
(a) none will be albino?
(b) at least one will be albino?

One way to solve this problem would be to use the material of section 3.4 or section 3.8, but say that you didn't read these sections. What do you do? One strategy is to estimate the probabilities by drawing random samples. Let's say that 0 represents non-albino and 1 represents albino, and let's create a population vector that has 3 non-albino children for every 1 albino:

> population <- c(0,0,0,1)

Each random sample of six draws from this population then represents one family of six children:

> sample(population, size=6, replace=TRUE)
[1] 0 0 0 1 0 0
> sample(population, size=6, replace=TRUE)
[1] 0 0 0 0 0 1
> sample(population, size=6, replace=TRUE)
[1] 0 1 1 0 0 1

For event (b), count the samples that have at least one albino child and divide by the number of samples you've taken. Obviously, the more samples you take, the more accurate your answer will be; in this case, try at least 20 samples to get a rough idea.

Question: why do we need replace=TRUE?

This strategy is called the Monte Carlo method for estimating probabilities. [To get an estimate that's close to the true probability, you need to do at least 500 repetitions, which can get tedious if you're doing it manually.]
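The manual repetitions described above can also be automated. The following is a minimal sketch, not part of the original lab: it assumes the base R functions replicate and colSums, and the seed, the number of simulated families, and the variable names are arbitrary choices made for illustration.

```r
# Illustrative sketch: the names, seed, and repetition count are assumptions.
set.seed(141)                       # arbitrary seed, only for reproducibility
population <- c(0, 0, 0, 1)         # 3 non-albino (0) for every albino (1)
n_families <- 10000                 # number of simulated six-child families

# Each column of `families` is one simulated family of six children.
families <- replicate(n_families, sample(population, size = 6, replace = TRUE))
albino_counts <- colSums(families)  # number of albino children in each family

p_none         <- mean(albino_counts == 0)   # Monte Carlo estimate for event (a)
p_at_least_one <- mean(albino_counts >= 1)   # Monte Carlo estimate for event (b)
print(c(none = p_none, at_least_one = p_at_least_one))
```

Because events (a) and (b) are complements, the two estimates always add up to 1, whatever the seed.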

2 Random sampling: named distributions

Now consider the following problem:

SW Example 4.2: Eggshell Thickness. In the commercial production of eggs, breakage is a major problem. Consequently the thickness of the eggshell is an important variable. We approximate the thickness of any given egg to be a normal random variable with mean µ = 0.38 mm and standard deviation σ = 0.03 mm. What is the probability that an egg has thickness 0.35 mm or less?

We can again estimate this probability using random sampling. Let's draw many (say ten thousand) normal random samples with mean µ = 0.38 and standard deviation σ = 0.03:

> simulations <- rnorm(n=10000, mean=0.38, sd=0.03)
> events <- (simulations <= 0.35)
> sum(events)
[1] 1558

Out of our ten thousand random eggshells, 1558 were less than or equal to 0.35 mm thick, so we estimate that the probability is about 0.1558.

You can draw random samples from many other distributions. Some of them are: rbinom for binomial, rpois for Poisson, runif for uniform, and rexp for exponential.

Question: can you think of a better way to solve the albinism problem 3.33 using the function rbinom? Look up the help entry for rbinom and estimate the probability in 3.33 (a) using 10000 random draws as we did in the eggshell problem. Make sure to use == for equality tests: the symbol = is for variable assignment.

3 Densities

Look again at the albinism problem. Can we calculate exactly the probability of having zero albino children? The method detailed in section 3.8 is to use the binomial distribution with 6 draws and success probability 0.25. If N is the number of albino children, then we can use the function dbinom to calculate P(N = 0):

> dbinom(0, size=6, prob=0.25)
[1] 0.1779785

Is this close to the Monte Carlo estimate you made in part 1?
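As a cross-check on the dbinom value, the sketch below compares it with the closed form for "no albino child among six", which is (3/4)^6, and uses the complement rule for "at least one". The variable names are illustrative and not part of the lab.

```r
# Illustrative sketch: variable names are assumptions, not from the lab.
p_none_dbinom  <- dbinom(0, size = 6, prob = 0.25)  # P(N = 0) from the binomial density
p_none_formula <- 0.75^6                            # six independent non-albino children

# "At least one albino child" is the complement of "none".
p_at_least_one <- 1 - p_none_dbinom

print(c(dbinom = p_none_dbinom, formula = p_none_formula,
        at_least_one = p_at_least_one))
```

The first two numbers should agree up to floating-point rounding, which is a quick way to confirm that size and prob were passed in the intended order.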
The R functions for densities are all prefixed with "d": examples are dbinom, dpois, dnorm, dunif, dexp.

4 Cumulative distribution functions

How about the eggshell problem: how do we calculate the exact probability of finding an eggshell that's 0.35 mm or thinner? We use the cumulative distribution function (cdf). Let T be a random variable denoting the thickness of a particular egg's shell. The pnorm function gives us P(T <= 0.35):

> pnorm(0.35, mean=0.38, sd=0.03)
[1] 0.1586553

The R functions for cdfs are all prefixed with "p" (for probability): examples are pbinom, ppois, pnorm, punif, pexp.

Problem: we can model the count of earthworms in a one-square-meter plot of soil as a Poisson random variable with mean parameter λ = 130 earthworms. What is the probability of finding 70 or fewer earthworms in a given plot of soil? How about 110 or more?

5 Plotting densities and cdfs

In section 4, we calculated the probability that an eggshell was 0.35 mm thick or less by using the pnorm function. A more visual strategy is to plot the random variable's density function and infer probabilities from the graph.

5.1 Method 1: random histogram

We can draw random samples as we did in sections 1 and 2, and then draw the histogram of the random samples (here we treat the eggshells example):

> eggshells <- rnorm(1000, mean=0.38, sd=0.03)
> hist(eggshells)

Exercise. Marshall et al. (1990) studied the opening and closing of the nicotinic receptor of frog muscle. They modeled the number of times that the receptor opens per millisecond as an exponential random variable with rate parameter rate = 118 openings/ms. Draw 1000 random opening times according to this model and plot the histogram. From this diagram, can you roughly estimate the probability that a receptor opens two or fewer times per millisecond? Between 2 and 4 times?

5.2 Method 2: plotting the exact density

We can also graph the random variable's density function, which is an exact version of the histograms above. The eggshell example:

> range <- seq(from=0.25, to=0.51, by=0.001)
> dens <- dnorm(x=range, mean=0.38, sd=0.03)
> plot(x=range, y=dens, xlab="eggshell thickness",
+ main="Density function for eggshell thickness",
+ type="l")

Approximately what percentage of the area of the graph is to the left of 0.35?

Exercise. Can you draw the density function for the nicotinic receptor example?

Question. What does graphing the histogram or density give you that the probability calculation in section 4 does not? What does the probability calculation give you that the density does not?
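One way to connect the density graph with the cdf is to approximate the area under the eggshell density numerically and compare it with pnorm. The sketch below uses the trapezoid rule on a grid from 0.25 to 0.35; the grid size and variable names are assumptions made for illustration, and the area below 0.25 is ignored because the density is negligible there.

```r
# Illustrative sketch: grid size and names are assumptions, not from the lab.
# Grid from the left edge of the plotted range up to the cutoff of 0.35 mm.
thickness <- seq(from = 0.25, to = 0.35, length.out = 101)
dens <- dnorm(thickness, mean = 0.38, sd = 0.03)

# Trapezoid rule: average height of neighbouring grid points times their spacing.
area_left <- sum((dens[-1] + dens[-length(dens)]) / 2 * diff(thickness))

# Exact cdf value for comparison.
exact <- pnorm(0.35, mean = 0.38, sd = 0.03)
print(c(area_from_density = area_left, pnorm_value = exact))
```

The two numbers should be close, which illustrates that the cdf is the accumulated area under the density curve.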

6 Probability calculations with the normal curve

Exercise. Graph the density function of the eggshell thickness again. Say we want the probability that an eggshell is between 0.40 and 0.45 mm thick.

1. How do you estimate this probability from the density function?
2. (Thought question: you don't actually have to do it in R.) How would you estimate this probability using random sampling?
3. How could you use pnorm to calculate this probability exactly?
4. (Thought question) If you don't have a computer available, you may have only a piece of paper that contains the quantiles of N(0, 1). How would you do this calculation using the standardized scale method described in section 4.3?

Exercise. Use the pnorm function to determine the area under the normal curve within one standard deviation of the mean.

7 Reading files from disk

We've received many questions about reading files from disk, and from Excel. Download the following file from the web onto your Desktop:

www.stanford.edu/~gtchang/lentil.xls

Open the file in Excel, then save the file to a comma-separated values (CSV) file so that R can read it (File, Save As, then select the CSV file type in the "save as type" dropdown box). Load it into R:

lentil <- read.table("C:/Documents and Settings/stat141/Desktop/lentil.csv", header=TRUE, sep=",")

8 Assessing normality

The lentils data that you just loaded represent the growth rate, in cm per day, for a sample of 47 lentil plants (SW example 4.9 on p. 138). Are these data normally distributed?

Question: what's a good way to tell if the data are normally distributed?
One way to test normality is to see if the quantiles of the data match the corresponding quantiles of a normal random variable with the same mean and standard deviation. The function qnorm returns quantiles of a normal random variable with given mean and standard deviation. You can get quantiles for many other distributions: qbinom is for binomial, qpois is for Poisson, qunif is for uniform, and qexp is for exponential.

> mean(lentil$growth)
[1] 0.502766
> sd(lentil$growth)
[1] 0.4398602
> lentilquantiles <- quantile(lentil$growth, probs=seq(from=0.1, to=0.9, by=0.1))
> normalquantiles <- qnorm(seq(from=0.1, to=0.9, by=0.1), mean=0.502766, sd=0.4397602)
> lentilquantiles
  10%   20%   30%   40%   50%   60%   70%   80%   90%
0.112 0.130 0.240 0.350 0.400 0.460 0.500 0.670 1.200
> normalquantiles
[1] -0.06080937 0.13265448 0.27215553 0.39135403 0.50276600 0.61417797 0.73337647 0.87287752 1.0

The quantiles look kind of different, so the lentils data are probably not normal. However, it's difficult to tell for sure just by looking at the numbers. Let's visualize the quantiles by plotting them against each other:

plot(normalquantiles,lentilquantiles)

They deviate quite a bit from the line y = x, so the lentils data are definitely not normal. This plot is a simplified Q-Q plot (SW section 4.4). R has a function for drawing nice Q-Q plots: use qqnorm(lentil$growth), and notice the similarities and differences with the simplified plot that we graphed manually.

Question: How close is the binomial(n = 100, p = 0.5) distribution to normality? How about the exponential(rate = 5) distribution? How would you use a Q-Q plot to answer this question?
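For a point of reference, it can help to see what the same quantile comparison looks like when the data really are normal. The sketch below does not use the lentil file; it simulates a stand-in sample with rnorm, and the seed, sample size, parameters, and variable names are all assumptions chosen for illustration.

```r
# Illustrative sketch on simulated data: nothing here comes from the lentil file.
set.seed(141)                                # arbitrary seed, only for reproducibility
x <- rnorm(47, mean = 0.5, sd = 0.44)        # simulated stand-in for a normal sample

probs <- seq(from = 0.1, to = 0.9, by = 0.1)
sample_q <- quantile(x, probs = probs)                   # deciles of the simulated data
normal_q <- qnorm(probs, mean = mean(x), sd = sd(x))     # matching normal deciles

# Side-by-side table of the two sets of quantiles.
print(round(cbind(sample_q, normal_q), 3))

# Correlation near 1 means the points lie close to a straight line.
print(cor(sample_q, normal_q))
```

Calling plot(normal_q, sample_q) followed by abline(0, 1) draws this comparison with the line y = x, which gives a baseline to hold the lentil plot against.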