Data Analysis and Statistical Methods Statistics 651

http://www.stat.tamu.edu/~suhasini/teaching.html
Suhasini Subba Rao

The binomial: mean and variance

Recall that the number of successes out of $n$ trials, denoted by $S_n$, is a random variable taking values in $\{0, 1, \ldots, n\}$ (e.g. $S_4$ is the number of successes out of 4 and has the outcomes $\{0, 1, 2, 3, 4\}$).

$S_n$ has all the properties of a random variable: we can associate a probability with each outcome (the binomial distribution) and it has a probability plot. Since it has a probability plot, it has a center and a spread, and therefore a mean and a variance.

The mean of a binomial is $n\pi$.
The variance of a binomial is $n\pi(1 - \pi)$.

Example 1

Suppose we make 4 independent trials, each of which takes a value in $\{0, 1\}$. The probability of a success is $P(X = 1) = 0.4$ and the probability of a failure is $P(X = 0) = 0.6$. We are interested in the number of successes out of 4; this is the random variable $S_4$. Using the arguments we gave earlier we can show that:

$P(S_4 = 0) = (0.6)^4$
$P(S_4 = 1) = 4\,(0.6)^3(0.4)$
$P(S_4 = 2) = 6\,(0.6)^2(0.4)^2$
$P(S_4 = 3) = 4\,(0.6)^1(0.4)^3$
$P(S_4 = 4) = (0.4)^4$

Hence we can plot the histogram, which has a center and a spread. The mean of $S_4$ is $4 \times 0.4 = 1.6$ and the variance is $4 \times 0.4 \times 0.6 = 0.96$.

Example 2

Suppose we make 50 independent trials, each of which takes a value in $\{0, 1\}$. The probability of a success is $P(X = 1) = 0.5$ and the probability of a failure is $P(X = 0) = 0.5$. We are interested in $S_{50}$ (the number of successes out of 50). The average number of successes is $50 \times 0.5 = 25$. Because $S_{50}$ is a random variable, it has a histogram (distribution) and thus a variance (a measure of spread). Its variance is $50 \times 0.5 \times 0.5 = 12.5$. This measures how spread out the distribution is about the mean.

Make a rough sketch of the histogram. We see that it is symmetric about 25.
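The formulas $n\pi$ and $n\pi(1-\pi)$ can be checked numerically against Examples 1 and 2. The sketch below is not part of the original notes: the function names `binomial_pmf` and `mean_and_variance` are made up for illustration, and the mean and variance are computed directly from the probabilities rather than from the formulas.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(S_n = 0), ..., P(S_n = n)] for n independent trials
    with success probability p (illustrative helper, not from the notes)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def mean_and_variance(pmf):
    """Mean and variance computed directly from the list of probabilities."""
    mean = sum(k * prob for k, prob in enumerate(pmf))
    variance = sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))
    return mean, variance

# Example 1: n = 4, pi = 0.4.  Example 2: n = 50, pi = 0.5.
for n, p in [(4, 0.4), (50, 0.5)]:
    mean, variance = mean_and_variance(binomial_pmf(n, p))
    print(f"n={n}, pi={p}: mean={mean:.4f} (n*pi={n * p:.4f}), "
          f"variance={variance:.4f} (n*pi*(1-pi)={n * p * (1 - p):.4f})")
```

The two numbers printed on each line should agree up to floating-point rounding.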

Example 3

Suppose we make 50 independent trials, each of which takes a value in $\{0, 1\}$. The probability of a success is $P(X = 1) = 0.8$ and the probability of a failure is $P(X = 0) = 0.2$. We are interested in $S_{50}$ (the number of successes out of 50). The average number of successes is $50 \times 0.8 = 40$. Its variance is $50 \times 0.8 \times 0.2 = 8$.

Observe that the variance is smaller than in the previous example (same number of trials, just different probabilities). Thus the distribution is more concentrated about the mean of 40.

Make a rough sketch of the histogram. The distribution is not symmetric, but it is close to symmetric locally about 40.

Observations on the binomial distribution

We showed that if $P(X = 1) = 0.8$ and $P(X = 0) = 0.2$, then for $n = 4$ the mean is $4 \times 0.8 = 3.2$ (the variance is $4 \times 0.2 \times 0.8 = 0.64$) and the histogram leans towards the right: most of the probability sits on large values. (In standard terminology this shape is called left skewed, because the longer tail is on the left.) This means we are more likely to observe large values of $S_n$ (in terms of surveys, a lot of people say yes).

On the other hand, if $P(X = 1) = 0.2$ and $P(X = 0) = 0.8$, then for $n = 4$ the mean is $4 \times 0.2 = 0.8$ (the variance is again $4 \times 0.2 \times 0.8 = 0.64$) and the histogram leans towards the left: most of the probability sits on small values (in standard terminology, right skewed). This means we are more likely to observe small values of $S_n$ (in terms of surveys, a lot of people say no).

If $P(X = 1) = P(X = 0) = 1/2$, then for $n = 4$ the mean is $4 \times 0.5 = 2$ and we are most likely to observe values in the middle of the interval $[0, 4]$. This time the histogram is symmetric (about 2).

Now suppose the number of people we sample increases (we go from $n = 4$ to $n = 100$). The above observations still hold, but what we observe is that around the peak of the histogram there is a symmetry (regardless of whether there is overall symmetry or not). In other words, regardless of the overall skew, about the peak the histogram is close to symmetric and, as we shall demonstrate, is almost normal (as in the normal distribution).
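One way to put a number on "leaning" is the skewness coefficient (the third standardised moment), which the notes themselves do not use. It is zero for a symmetric histogram, negative when the longer tail is on the left (probability piled up on large values) and positive when the longer tail is on the right. The sketch below, with made-up helper names, computes it for $n = 4$ and $n = 100$; it is only meant to illustrate that the asymmetry shrinks as $n$ grows.

```python
from math import comb

def binomial_pmf(n, p):
    """[P(S_n = 0), ..., P(S_n = n)] (illustrative helper, not from the notes)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def skewness(pmf):
    """Third standardised moment: negative when the longer tail is on the left."""
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    third = sum((k - mean) ** 3 * q for k, q in enumerate(pmf))
    return third / var**1.5

for n in (4, 100):
    for p in (0.2, 0.5, 0.8):
        print(f"n={n:3d}, pi={p}: skewness = {skewness(binomial_pmf(n, p)):+.3f}")
```

For a fixed $\pi$, the value for $n = 100$ should be closer to zero than the value for $n = 4$, which matches the observation that the histogram becomes nearly symmetric about its peak.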

Approximations of the binomial distribution

Suppose that the number of trials, $n$, is quite large and we want to evaluate the probability that 20 or fewer people out of 100 preferred apple juice to orange. This means calculating the probability

$P(S_{100} \le 20) = P(S_{100} = 0) + P(S_{100} = 1) + \ldots + P(S_{100} = 20)$.

Calculating this by hand is cumbersome! We would like to have a quick and dirty way of calculating this probability.

Look at the handout approximation binomial lecture7.pdf to see what happens if $n$ (the number of trials) is large and the probabilities $\pi$ and $1 - \pi$ are not too small. We see that in this case the distribution of $S_n$ looks rather like a bell shape.
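For comparison with any approximation, the exact sum can be evaluated on a computer. The notes do not state the probability $\pi$ that a person prefers apple juice, so the value `pi_assumed = 0.3` below is purely hypothetical, as is the function name `binomial_cdf`.

```python
from math import comb

def binomial_cdf(k_max, n, p):
    """P(S_n <= k_max): add up the individual binomial probabilities."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))

# ASSUMPTION: pi = 0.3 is a made-up value; the notes do not give it.
pi_assumed = 0.3
prob = binomial_cdf(20, 100, pi_assumed)
print(f"P(S_100 <= 20) with pi = {pi_assumed}: {prob:.4f}")
```

This adds 21 terms, each involving a binomial coefficient, which is exactly the work an approximation is meant to avoid when no computer is at hand.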

But you cannot say that the distribution curves get closer and closer, because as $n$ grows the mean of $S_n$ gets larger (recall that with $\pi = 1/2$ the mean is $n/2$) and the variance also grows (recall the variance is $n/4$). So the distribution curve keeps moving to the right (because the mean is moving to the right) and, because the variance is getting larger (notice that the range of $S_n$ is $[0, n]$), the distribution gets more and more stretched (look at the plots).

To stop the distributions from shifting and getting stretched, we transform the x-axis (normalise) but keep the probabilities almost the same as before; in fact they need to be multiplied by the standard deviation $\sqrt{n/4}$ (see the example of $Y_4$ in the handout). Subtracting the mean shifts the range from $[0, n]$ (centered at $n/2$) to $[-n/2, n/2]$ (centered at 0). Dividing by the standard deviation squashes the range from $[-n/2, n/2]$ to $[-\sqrt{n}, \sqrt{n}]$. Indeed you will see that most of the values of $Y_n$ lie in the interval $[-3, 3]$.

This leads to the normalisation

$$Y_n = \frac{S_n - n/2}{\sqrt{n/4}},$$

which is discussed in approximation binomial lecture7.pdf.

In the general case, where the success and failure probabilities are not the same and the probability of a success is $\pi$, we have the normalisation

$$Y_n = \frac{S_n - n\pi}{\sqrt{n\pi(1 - \pi)}}.$$

We normalise (or standardise) the distribution by subtracting the mean from $S_n$ (this centers it about zero) and squashing it (stopping it from spreading out) by dividing by the standard deviation.

The distribution of $Y_n = (S_n - n\pi)/\sqrt{n\pi(1 - \pi)}$

Suppose we plot the distribution of $Y_n$ (that is, $Y_n$ against the probabilities, like the plots in approximation binomial lecture7.pdf).

Aside: In the plots you need to multiply the probabilities by the standard deviation and plot this value against $Y_n$, but don't worry too much about this.

What we see is: when $n$ is large, the plots have a very distinctive bell shape.
(i) It is centered about zero and about 68% of the values of $Y_n$ lie in the interval $[-1, 1]$.
(ii) It closely approximates the standard normal distribution (which we define below).
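The claim that about 68% of the standardised values fall in $[-1, 1]$ can be checked by adding up the exact binomial probabilities of every outcome whose standardised value lies in that interval. This is a minimal sketch with made-up function names; the probabilities are computed on the log scale (an implementation choice, not something from the notes) so that large $n$ causes no overflow.

```python
from math import erf, exp, lgamma, log, sqrt

def binomial_pmf(k, n, p):
    """P(S_n = k), computed on the log scale to stay stable for large n."""
    log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return exp(log_prob)

def prob_within_one_sd(n, p):
    """P(-1 <= Y_n <= 1), where Y_n = (S_n - n*p) / sqrt(n*p*(1-p))."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return sum(binomial_pmf(k, n, p)
               for k in range(n + 1)
               if abs((k - mean) / sd) <= 1 + 1e-12)  # tiny tolerance for rounding

normal_value = erf(1 / sqrt(2))  # P(-1 <= Z <= 1) for a standard normal Z
for n in (100, 1000, 10000):
    print(f"n={n:5d}: P(|Y_n| <= 1) = {prob_within_one_sd(n, 0.5):.4f}")
print(f"standard normal: {normal_value:.4f}")
```

For large $n$ the binomial figure should sit close to the standard normal one. For small $n$ the match is rougher, because $S_n$ only takes whole-number values and the endpoints of the interval may or may not coincide with one of them.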

Aside: convergence of the distributions

What is convergence? Suppose I walk one mile on day one, 1/2 a mile on day two (in total I have walked 1.5 miles), 1/4 mile on day three (in total 1.75 miles), 1/8 mile on day four (in total 1.875 miles), 1/16 mile on day five (in total 1.9375 miles), etc. As the days pass the total distance travelled changes less and less, and it gets closer and closer to two. We say that the total distance travelled converges to two.

The same idea is true for the plots of $Y_n = (S_n - n\pi)/\sqrt{n\pi(1 - \pi)}$ against the probabilities $P(Y_n)$. As $n$ gets large the density plots do not change very much, and in the limit they converge to the normal distribution.

The normal distribution

We often find that the distributions of random variables that arise in nature have a distinctive shape. This distinctive bell-shaped curve is called a normal distribution. It arises all over the place, for example in the distribution of bullets fired at a target and in the outcomes of social surveys.

The normal distribution is a family of densities which are different but have certain characteristics in common.

The normal distribution (sometimes called the Gaussian) is the most commonly used distribution in statistics.

The normal distribution (cont.)

It is completely defined by two parameters, the mean $\mu$ and the variance $\sigma^2$.

Formally, the density function of the normal distribution looks like:

$$y = f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

(you don't have to remember this!)

This is a symmetric curve which is centered about $\mu$ and has spread $\sigma$. See handout: normal distribution introduction.pdf.

The standard normal - page 1090 of Longnecker and Ott

The normal tables give the probabilities $P(Z < z)$ in the special case $Z \sim N(0, 1)$ (the so-called standard normal): the mean is zero ($\mu = 0$) and the variance is one ($\sigma^2 = 1$).

Look at the normal tables. Suppose we want to use them to evaluate $P(Z < b)$. The row and column headings of the table together give $b$; the inside of the table gives the probability $P(Z < b)$.

Suppose we want to evaluate $P(Z \le 1.23)$. Since $1.23 = 1.2 + 0.03$, the first column gives the 1.2 value and the first row gives the 0.03 value. We find the 1.2 row and the 0.03 column and locate the entry inside the table where this row and column intersect.
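The table lookup can be cross-checked on a computer. The sketch below is an illustration rather than part of the notes: it uses the standard identity $P(Z \le z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$ and Python's built-in `math.erf`, and the function name `standard_normal_cdf` is made up.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

row, column = 1.2, 0.03   # the row and column headings used in the table
b = row + column          # 1.23
print(f"P(Z <= {b:.2f}) = {standard_normal_cdf(b):.4f}")
```

Rounded to four decimal places, the computed value should match the entry found where the 1.2 row and the 0.03 column of the table meet.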

This intersection point is the probability, that is $P(Z \le 1.23) = 0.8907$.

[Figure: the standard normal curve, with $P(Z < b)$ shown as the area under the curve to the left of $b$.]

The area under the graph is the probability, which corresponds to the value given in the table.

Examples - standard normal

(a) Evaluate $P(0.6 < Z \le 1.3)$.
(b) (i) $P(Z \le 1.1)$, (ii) $P(Z \le 0.6)$, (iii) $P(Z \le 3.0)$, (iv) $P(Z \le 2.12)$.
(c) How do we interpret $P(Z \le 1.1)$ and $P(Z \le 3.0)$?
(d) (i) $P(Z > 1.1)$, (ii) $P(Z > 0.6)$, (iii) $P(Z > 3.0)$, (iv) $P(Z > 2.12)$.
(e) (i) $P(-1.1 < Z \le 0.6)$, (ii) $P(-2.12 < Z \le 3.0)$, (iii) $P(-2.12 < Z \le 0)$.

Look at the handout standard normal tables.pdf for the solutions.
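All of these exercises reduce to table values of $P(Z \le z)$: an interval probability is a difference of two table values, and an upper-tail probability is one minus a table value. The sketch below shows the pattern for three of the exercises, reading the lower limit in (e)(i) as $-1.1$. It is not the handout's solution, and the helper names are made up.

```python
from math import erf, sqrt

def phi(z):
    """P(Z <= z) for a standard normal Z, i.e. the value the table gives."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def prob_between(a, b):
    """P(a < Z <= b) = P(Z <= b) - P(Z <= a)."""
    return phi(b) - phi(a)

def prob_above(a):
    """P(Z > a) = 1 - P(Z <= a)."""
    return 1 - phi(a)

print(f"(a)    P(0.6 < Z <= 1.3)  = {prob_between(0.6, 1.3):.4f}")
print(f"(d)(i) P(Z > 1.1)         = {prob_above(1.1):.4f}")
print(f"(e)(i) P(-1.1 < Z <= 0.6) = {prob_between(-1.1, 0.6):.4f}")
```

Because the standard normal curve is symmetric about zero, $P(Z \le -a) = P(Z > a)$, which is how a table listing only positive values of $z$ can still be used for negative limits.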