When we look at a random variable, such as Y, one of the first things we want to know is: what is its distribution?

Distributions

1. What are distributions?

When we look at a random variable, such as Y, one of the first things we want to know is: what is its distribution? In other words, if we have a large number of Y's, what kind of shape does the frequency histogram have? We talked about some of these shapes already (shapes of distributions, etc.).

The basic idea (simplified): We take a sample and measure some random variable (e.g. blood oxygen levels of bats). We look to see how this random variable is distributed. Based on this distribution, we then make estimates and/or perform tests that might reveal interesting information about the population.

But how we proceed is based on how the random variable is distributed. Not only that, but many of our analyses and tests rely on particular kinds of distributions.

Why is this so important? Because the probability of getting a particular result depends on the distribution. For example, consider the following two distributions for the length of an insect:

Obviously, the probability of our insect being less than 10 cm long depends a lot on the shape of the distribution.

So here are some examples of distributions:

If we toss a coin 25 times, and if Y = number of heads, then Y will have a binomial distribution. We write Y ~ Binomial. The ~ symbol means "distributed as". Often we put the parameters (more on this soon) of our distribution in parentheses after the type of distribution, for example: Y ~ Binom(25, 0.5). Notice we abbreviated the distribution (we usually do). What are 25 and 0.5? They are parameters (n = 25, and p = 0.5), but we'll save the details for another page or two.

If we measure heights of a sample of men on campus (Y = heights of men on campus), we can be pretty sure that Y will have a normal distribution. We write Y ~ N (the normal distribution is almost always abbreviated N).

2. The binomial distribution

We already used this distribution when we did probability. Here it is again:

$$\Pr\{Y = y\} = \binom{n}{y} p^{y} (1 - p)^{n - y}$$

From now on we will definitely be using y instead of j.

So what makes this a distribution? Because we can use it to calculate the probability of every possible outcome and then see what the distribution of Y looks like. Here's an example using our coin. We toss it 10 times and note that n = 10, p = 0.5 (these are the parameters of our distribution, but more soon).

We get:

Heads   Tails   Probability
10      0       0.00098
 9      1       0.00977
 8      2       0.04395
 7      3       0.11719
 6      4       0.20508
 5      5       0.24609
 4      6       0.20508
 3      7       0.11719
 2      8       0.04395
 1      9       0.00977
 0      10      0.00098
Sum:            1.00000

(You should recognize some of these numbers.)

A summary like this can be very useful. For example, we can now easily calculate the probability that Y = 0, 1 or 2 (where Y = number of heads):

Pr{0 ≤ Y ≤ 2} = 0.00098 + 0.00977 + 0.04395 = 0.05470

Also notice that if we add up the probabilities of all the possible outcomes we get 1.0:

Pr{0 ≤ Y ≤ 10} = 1.0

This is important but ought to be obvious: if we toss a coin, something has to happen, and the above list is every single possibility! If we want to see what the distribution of Y looks like we can plot it:
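The table above can be reproduced with a few lines of code. The following is a minimal sketch, not part of the original notes; the function name binom_pmf is made up for illustration, and it simply applies the binomial formula with n = 10 and p = 0.5.

```python
from math import comb

def binom_pmf(y, n, p):
    """Pr{Y = y} for Y ~ Binom(n, p) (illustrative helper, not from the notes)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 10, 0.5
table = {y: binom_pmf(y, n, p) for y in range(n + 1)}

# Print heads, tails and probability, from 10 heads down to 0 heads.
print("Heads Tails Probability")
for y in sorted(table, reverse=True):
    print(f"{y:>5} {n - y:>5} {table[y]:.5f}")

total = sum(table.values())                   # all outcomes together
pr_0_to_2 = sum(table[y] for y in range(3))   # Pr{0 <= Y <= 2}
print(f"Sum: {total:.5f}")
print(f"Pr{{0 <= Y <= 2}} = {pr_0_to_2:.5f}")
```

The exact value of Pr{0 ≤ Y ≤ 2} is 56/1024, which differs from 0.05470 in the last digit only because the table entries were rounded before being added.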

So what (finally) are parameters? Parameters determine what our distribution looks like. For a random variable, Y, we need to know two things to figure out what the distribution of Y looks like:

1) What kind of distribution Y has.
2) What the parameters of this distribution are.

Since we're looking at the binomial distribution, let's change the parameters and see what happens to Y. Instead of n = 10 and p = 0.5, let's use n = 3 and p = 0.2. Notice that now Y can go from 0 to 3. So let's again calculate all the probabilities for Y:

Y     Probability
0     0.512
1     0.384
2     0.096
3     0.008
Sum   1.000

And if we plot Y, this time our distribution looks rather different:
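To see how the parameters change the distribution, here is a small sketch (again not from the original notes, with a made-up helper name binom_probs) that computes the probabilities for any n and p and draws a rough text histogram. It is run with n = 3 and p = 0.2, the values used above.

```python
from math import comb

def binom_probs(n, p):
    """List of Pr{Y = y} for y = 0..n, with Y ~ Binom(n, p) (illustrative helper)."""
    return [comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]

probs = binom_probs(3, 0.2)

# One row per value of Y, with a crude bar showing the shape of the distribution.
for y, pr in enumerate(probs):
    print(f"{y}  {pr:.3f}  {'#' * round(pr * 50)}")

print(f"Sum {sum(probs):.3f}")
```

Calling binom_probs(10, 0.5) instead gives the symmetric shape from the coin example, which is one way to see that the parameters, not just the type of distribution, determine the picture.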

(See also figure 3.13 p. 109 [3.15, p. 106] {3.6.4, p. 111} in your text using n = 5 and p = 0.39 {remember that the 4th edition uses p = .37}.)

Again, to emphasize this: the parameters determine what our particular distribution looks like!

3. About distributions in general:

We've learned several things about distributions:

1) The shape of a distribution can vary based on the parameters.
2) The probabilities of all possible outcomes must add up to one.
a) If Y is discrete, this is easy. For example, with the binomial what we are saying is:

$$\sum_{y=0}^{n} \binom{n}{y} p^{y} (1 - p)^{n - y} = 1$$

In other words, take all possible values of y, put them into the binomial distribution formula, add these up, and you'll get 1.00.
b) If Y is continuous, then the area under the curve formed by our distribution will add up to one. How do we add up all possible outcomes if our distribution is continuous? We need calculus. Don't worry, you're not responsible for anything involving calculus. But what we're saying is:

$$\int_{-\infty}^{+\infty} (\text{continuous distribution of } y)\, dy = 1$$

Historical note: the symbol ∫ is short for "sum" (same word in Latin). In calculus we can add up a sequence of infinitely small things, which, in this case, must add up to one. Let's use the normal distribution as an example.

4. The normal distribution

The importance of the normal distribution to statistics cannot be overemphasized. The Germans even put it on the old 10 DM bill! It is sometimes also known as the Gaussian distribution.

So what is it?

$$f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^{2}}$$

Good! Now you know everything, right? Seriously, here are a couple of examples from your text:

Example 4.2, p. 122-123 [120-121] {4.1.3, p. 122}: We're looking at the thickness of eggshells from hens, and somehow we know that:

μ = 0.38 mm, and σ = 0.03 mm

This gives us the following picture (note the scale on the x-axis):

Example 4.4, p. 124 [p. 121] {4.1.4, p. 123}: This time we're looking at the number of white blood cells per cubic mm, and again we somehow know that:

μ = 7,000 cells/mm³, and σ = 100 cells/mm³

(By the way, are these data really normally distributed?)

Some comments on the normal distribution:

The curve peaks at the mean (μ).
The inflection points (where the curve changes the direction in which it bends) are at μ ± σ. (See also fig. 4.6, p. 125 [122] {fig. 4.2.1, p. 124}.)
The parameters for the normal distribution are μ and σ. If I know what these are, I know what my normal distribution looks like.
The curve for the normal distribution actually goes from -∞ to +∞.
The area under the curve will add up to 1, or using calculus we can say:

$$\int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^{2}}\, dy = 1$$

(Again, since this is calculus you are not responsible for the above equation.)

So now we know what the normal distribution is and what it looks like. Why is it so important?

1) Because many things, particularly in biology, have a normal, or approximately normal, distribution: heights, weights, IQ, many blood hormone levels, etc.
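For the curious, the claims about the peak and the total area can be checked numerically without any calculus. This is only a rough sketch under stated assumptions: it uses the eggshell values μ = 0.38 and σ = 0.03 from the example above, the helper names normal_pdf and area_under_curve are invented here, and the area is approximated with the trapezoid rule over μ ± 8σ rather than over the whole real line.

```python
from math import exp, pi, sqrt

def normal_pdf(y, mu, sigma):
    """Height of the normal curve at y (the formula for f(y))."""
    return exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def area_under_curve(mu, sigma, lo, hi, steps=20000):
    """Trapezoid-rule approximation of the area under the normal curve on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * width, mu, sigma)
    return total * width

mu, sigma = 0.38, 0.03  # eggshell thickness example (mm)

# Integrating over mu +/- 8 sigma stands in for -infinity to +infinity (an approximation).
total_area = area_under_curve(mu, sigma, mu - 8 * sigma, mu + 8 * sigma)

# The curve should be higher at the mean than a little to either side of it.
peak_at_mean = (normal_pdf(mu, mu, sigma) > normal_pdf(mu - 0.01, mu, sigma)
                and normal_pdf(mu, mu, sigma) > normal_pdf(mu + 0.01, mu, sigma))

print(f"approximate total area: {total_area:.6f}")
print(f"peak at the mean: {peak_at_mean}")
```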

2) Because of something called the Central Limit Theorem. We'll get back to this. If you're really curious, see section 5.4 in your text. (Basically it implies that even if Y is not normally distributed, we can often still use a normal distribution in statistics.) (It is one of the most important theorems/results in statistics.)

5. The normal distribution and probability:

Since many things in biology (and elsewhere) have a normal distribution, we need to learn how to answer probability questions using the normal distribution. For example, suppose Y = height of male basketball players, and we want to know:

Pr{Y < 6'}

Incidentally, notice that Pr{Y < 6'} = Pr{Y ≤ 6'}. Why?

What we're asking is, what's the probability a male basketball player is less than 6 feet tall? If you know calculus, then you might think we could do:

$$\int_{-\infty}^{6} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^{2}}\, dy$$

Unfortunately this integral can't be worked out by hand except for a few special values of y (notice also that we need to know μ and σ). Instead, we need to use normal distribution tables that list probabilities. If we know μ and σ we find a table for those values of μ and σ, and then find Pr{Y < 6'}. The obvious problem is that we would need an infinite number of normal tables, one for every possible combination of μ and σ. This is obviously impossible, so we need to do something else.

The standard normal distribution.

Instead, we use one normal distribution to calculate all our probabilities. This is called the standard normal distribution and has:

μ = 0, and σ = 1 (= σ²)

Here's how we use this distribution:

Subtract the mean from the distribution you're studying. This will obviously give you μ = 0.
Divide by the standard deviation of the distribution you're studying. This will give you σ = 1.

We call this new number Z, for z-score. We say Z ~ N(0, 1). Here's the formula:

$$Z = \frac{Y - \mu}{\sigma}$$

So if we use Z instead of our original Y, we only need to list our areas in one table and then use the standard normal (or sometimes "z") curve.

Comment: usually we let a computer (or fancy calculator) spit out the answer.
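The two steps (subtract the mean, divide by the standard deviation) can be sketched in code as follows. This is an illustration only: the list of heights is hypothetical data invented for the example, and z_score is a made-up helper name. It checks that the standardized values end up with mean 0 and standard deviation 1.

```python
from statistics import mean, pstdev

def z_score(y, mu, sigma):
    """Standardize y: subtract the mean, then divide by the standard deviation."""
    return (y - mu) / sigma

# Hypothetical heights in cm, invented purely for illustration.
heights = [170.0, 175.5, 168.2, 181.0, 177.3, 172.9]

mu = mean(heights)
sigma = pstdev(heights)  # treats the list as the whole population

z_values = [z_score(y, mu, sigma) for y in heights]
z_mean = mean(z_values)
z_sd = pstdev(z_values)

print([round(z, 2) for z in z_values])
print(f"mean of z-scores: {z_mean:.6f}, sd of z-scores: {z_sd:.6f}")
```

Standardizing only shifts and rescales the values. It does not make a non-normal variable normal, so Z ~ N(0, 1) holds only when Y was normal to begin with.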

So here's how we can calculate some probabilities using the standard normal (or z) curve/tables:

Pr{Z > 1.53}:

Let's look at what we want first (it's always a good idea to sketch/draw pictures of what you want):

Table 3 in your text will give you the area less than a particular value of Z.

Go to table 3 in your text.
Find 1.53: read 1.5 off the column on the left side going down, and read the .03 off the top row going across.
Now read across and down until these two values (1.5 and .03 in our example) intersect, and write down that number. This is the area of the normal curve that is below 1.53. You should see 0.9370.

So we can write Pr{Z < 1.53} = 0.9370.

But we want the area above 1.53. We remember that the total area under the curve = 1.0, so we can do:

1 - 0.9370 = 0.0630

And finally we can say: Pr{Z > 1.53} = 0.0630
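When a computer "spits out the answer", it is doing something like the following. This is a minimal sketch, not the method used in your text: it computes the area below z from the error function erf in Python's math module, using the standard identity Pr{Z < z} = (1 + erf(z / √2)) / 2, and the name std_normal_cdf is made up for illustration.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Pr{Z < z} for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

below = std_normal_cdf(1.53)  # the number you would read from the table
above = 1.0 - below           # total area is 1, so subtract to get the upper tail

print(f"Pr{{Z < 1.53}} = {below:.4f}")
print(f"Pr{{Z > 1.53}} = {above:.4f}")
```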

Comment: since the standard normal distribution is symmetrical around 0, you could also do the following: Change the sign of the value we're interested in: instead of 1.53, use -1.53. Now we can just look up Pr{Z < -1.53} and we get 0.0630. This is a little bit of a shortcut. If it's confusing, don't worry about it and stick to the method presented above.

Let's try Pr{-1.2 < Z < 0.8}:

Again, let's look at our area first:

Look up the values in the z-table for 0.8 and -1.2:

Pr{Z < -1.2} = 0.1151
Pr{Z < 0.8} = 0.7881

And since we want the area between these two z-values, we can subtract one from the other:

Pr{-1.2 < Z < 0.8} = Pr{Z < 0.8} - Pr{Z < -1.2} = 0.7881 - 0.1151 = 0.6730

Look in your text on p. 127 [125] {126} for this example.
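Both the symmetry shortcut and the "area between two z-values" calculation can be checked with Python's standard library. The sketch below assumes Python 3.8 or later, where statistics.NormalDist is available. It is an illustration, not part of the original notes.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution

# Symmetry shortcut: the area above 1.53 equals the area below -1.53.
upper_tail = 1.0 - Z.cdf(1.53)
mirrored = Z.cdf(-1.53)

# Area between two z-values: subtract the smaller "area below" from the larger.
lower = Z.cdf(-1.2)
upper = Z.cdf(0.8)
between = upper - lower

print(f"Pr{{Z > 1.53}} = {upper_tail:.4f}, Pr{{Z < -1.53}} = {mirrored:.4f}")
print(f"Pr{{-1.2 < Z < 0.8}} = {upper:.4f} - {lower:.4f} = {between:.4f}")
```

The computed difference can disagree with the table answer in the fourth decimal place, because the table values are rounded before they are subtracted.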

But of course, we usually deal with Y, not Z. So let's do a practical example.

Exercise 4.3, p. 133 [131] (we're only doing select parts of the exercise):

For the brain weights of Swedish men, we somehow know that μ = 1,400 gm, and σ = 100 gm.

a) Find the probability that a (random) brain is 1,500 gm or less (note that your text asks the question just a little differently, but it works out the same):

Pr{Y < 1,500}:

Convert to Z:

$$Z = \frac{1500 - 1400}{100} = 1$$

Very convenient! Note that Pr{Y < 1,500} ≡ Pr{Z < 1.0}. (The symbol ≡ means "exactly equivalent to".)

Before we go on, let's look at some pictures:

Notice that the areas are identical. Look up 1.00 in table 3 and get 0.8413.

So Pr{Y < 1,500} = Pr{Z < 1.0} = 0.8413

c) Find the probability that a brain is 1,325 gm or more:

Pr{Y > 1,325}:

$$Z = \frac{1325 - 1400}{100} = -0.75$$

Again, remember that Pr{Y > 1,325} ≡ Pr{Z > -0.75}

Just one picture this time:

And, of course, we want the area in gray. Look up -0.75 in table 3 and get 0.2266. Remember that this time we need to subtract this result from 1:

So Pr{Y > 1,325} = Pr{Z > -0.75} = 1 - 0.2266 = 0.7734

f) (last one) Find the probability that a brain is between 1,200 and 1,325 gm:

Pr{1,200 < Y < 1,325}:

This time we need two values of Z:

$$Z_1 = \frac{1200 - 1400}{100} = -2.0$$

$$Z_2 = \frac{1325 - 1400}{100} = -0.75$$

Look up Z_1 to get 0.0228. Look up Z_2 to get 0.2266. So we have:

Pr{1,200 < Y < 1,325} = Pr{-2.0 < Z < -0.75} = 0.2266 - 0.0228 = 0.2038
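The three parts of this exercise can be double-checked by computer. The sketch below is not from your text. It assumes Python 3.8 or later (for statistics.NormalDist) and works directly with Y ~ N(1400, 100), so no conversion to Z is needed. The variable names are chosen for illustration.

```python
from statistics import NormalDist

brain = NormalDist(mu=1400, sigma=100)  # brain weight in gm, from the exercise

# a) 1,500 gm or less
pr_a = brain.cdf(1500)

# c) 1,325 gm or more: one minus the area below 1,325
pr_c = 1.0 - brain.cdf(1325)

# f) between 1,200 and 1,325 gm: difference of the two areas below
pr_f = brain.cdf(1325) - brain.cdf(1200)

print(f"a) Pr{{Y < 1500}} = {pr_a:.4f}")
print(f"c) Pr{{Y > 1325}} = {pr_c:.4f}")
print(f"f) Pr{{1200 < Y < 1325}} = {pr_f:.4f}")
```

Because the computer does not round the intermediate areas, the answer to f) may differ from the table-based 0.2038 in the last decimal place.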