3. Probability Distributions and Sampling

Size: px
Start display at page:

Download "3. Probability Distributions and Sampling"

Transcription

1 3. Probability Distributions and Sampling 3.1 Introduction: the US Presidential Race Appendix 2 shows a page from the Gallup WWW site. As you probably know, Gallup is an opinion poll company. The page reports the results of a poll, taken during the period October It shows 46% of likely voters supporting George Bush rather than one of the three other candidates (Al Gore, Ralph Nader and Pat Buchanan) or being undecided in how they will eventually vote. The poll was based on a sample of 769 people likely to vote, who were contacted by telephone. It is typical of polls carried out in the run up to an election, or of a survey on customer opinions of a product. Such polls are very useful, but we must be careful how we interpret them, since they always contain an error. This is especially important when they claim to show small changes in the underlying popularity of some person or item, or differences between the popularity of people or items when that difference is small. The page notes the existence of the error. Towards the bottom of the page it says "one can say with 95% confidence that the margin of sampling error is +/ 4 percentage points". Our aim is to understand: why the error occurs what the phrase "95% confidence" means how this value of 4 percentage points is calculated how it depends on the sample size used in the poll Though it is not our main topic here, you may also be interested in the detail of how Gallup carry out their polls, and can read about it in Appendix 1, which is also taken from the Gallup WWW site. 3.2 The Unreliability of a Sample Whenever we use sampling with a relatively small number of people to infer something about a very large number of people, we will make an error. To see this, imagine that we could repeat the poll of 769 people with a different set of 769 people. We take care that the second set is chosen in a similar way to the first set, and there is therefore no reason why it should produce a different result on average. But it almost certainly will in a single case. This is because the randomness in how a randomly-chosen person votes gets only partly averaged out by the sample. CE8: Quantitative Research Methods Ian Rudy 1

2 We can discover how much the percentage of people voting for George Bush varies from poll to poll even if there is no underlying change in his popularity by simulating polls using a computer. We will assume that the percentage of voters in the population who will vote for George Bush is 46%. If we simulate 100 polls, each involving 769 people, we find that the percentage of people in each sample who say they will vote for him is close to 46%, but fluctuates around it. We can draw a histogram to show this: Percent Voting Bush in a Sample Poll Figure 3.1: Histogram of the Percentage of People Voting for Bush in Polls of 769 People (Source: simulated by computer, assuming 46% of the population will vote Bush) The half-width of the histogram is about 5 percentage points. This is close to the value of "+/ 4 percentage points" that Gallup mentioned. So we can understand why the error occurs: it is just because we cannot predict in advance how a randomly chosen individual votes, and that randomness only gets partly averaged out when we average over a sample of 769 people. The only way to average out the randomness is to take a very, very large sample: in fact, we need to take the entire population. But we can see from our graphs what spread of poll results is likely, and hence from this get a rough idea of how large the error in a single poll could be. 3.3 How the Unreliability of a Sample Depends on Its Size It is interesting to ask how the width of the histogram depends on the size of the poll. We can simulate polls of any size using the computer, and if we do so, we get the results shown in Figure 3.2. The obvious point from these graphs is that they get narrower as the sample size increases. This is quite reasonable: the randomness in individuals is more effectively averaged out as the sample size increases. What is less obvious is how the graphs get narrower. It turns out that the width of the graph is approximately proportional to 1/ n, where n is the sample size. This can be proven mathematically: we do not need to know how, but we do need to know that it is true. It is an CE8: Quantitative Research Methods Ian Rudy 2

3 immensely valuable rule, since it allows us to predict the width of the graph without having to carry out many polls, or do the simulations with a computer. CE8: Quantitative Research Methods Ian Rudy 3

4 CE8: Quantitative Research Methods Ian Rudy 4

5 3.4 The Shape of the Graph It is also very useful to know what shape the graph has. The shapes of the graphs above are rather irregular. This is because we had only 100 polls. If we have a larger number, then the curve becomes smoother. Figure 3.3 shows the results of 5000 polls, with 769 people per poll Fraction Voting Bush in a Sample Poll Figure 3.3: Histogram of the Fraction of People Voting for Bush in 5000 Polls of 769 People (Source: simulated by computer, assuming 46% of the population will vote Bush) A problem about drawing these histograms with frequencies on the y-axis is that the values depend on the number of polls we ran. If we are only interested in the shape of the of histogram, we should do something to fix this. It is conventional to divide the height of each of the bars in such a way that the area under the graph becomes equal to 1.0. We then get what is known as a probability distribution: see Figure Fraction Voting Bush in a Sample Poll Figure 3.4: Probability Distribution of the Percentage of People Voting for Bush in Polls of 769 People (Source: simulated by computer, assuming 46% of the population will vote Bush) CE8: Quantitative Research Methods Ian Rudy 5

6 The reason it is known as a probability distribution is that the area of any one bar is equal to the probability that a randomly chosen poll lies within the range of the bar. The total area of any successive set of bars is equal to the probability that a randomly chosen poll lies within the range of the bars. Although it doesn't sound an especially useful property, it turns out that it is. The vertical axis does not any longer represent a frequency. Technically, it is called a probability density, but you do not need to worry about this. Note that it is common for people to talk about a "distribution", regardless of whether the area underneath it is equal to 1.0. You will also hear people talk of "normalising" the distribution. This just means altering the heights of the bars so that the total area underneath is equal to 1.0. It turns out that the shape of our probability distribution is approximately the same as that of an idealised distribution, called the Normal distribution. (This word Normal is nothing to do with the process of normalising the distribution, described above.) Figure 3.5 shows the appropriate Normal distribution for this case superimposed on our distribution of poll results. The Normal distribution has a smooth bell shape, and is symmetrical about its middle Fraction Voting Bush in a Sample Poll Figure 3.5: Probability Distribution of the Percentage of People Voting for Bush in Polls of 769 People, Together with Normal Distribution In fact, the distribution of the poll results would always tend towards a Normal distribution as long as we do enough polls, and as long as the sample size per poll is above about 30. This is why the Normal distribution is so-called: it is what "normally" happens. The Normal distribution is also a reasonable approximation when the sample size per poll is less than 30 but (if we are being fussy) the shape is slightly different. It is better described by what is known as a Binomial distribution. You can read about the Binomial distribution in Clare Morris' book if you wish (it is not part of this Quantitative Research Methods course itself). CE8: Quantitative Research Methods Ian Rudy 6

7 So we have now discovered that the distribution of the US Poll samples follows a Normal distribution. This is incredibly useful, since we can find anything we want to know about the Normal distribution from either from Tables or from a spreadsheet program such as Microsoft Excel. We therefore do not in real life have to repeat the poll 5000 times, or do the simulations by computer. We do, however, need to know how to specify the "half-width" in more precise terms than we have done so far. In practice, people do this by specifying a quantity known as the standard deviation. One standard deviation on either side of the mean accounts for about 66% of the area under the curve; two (more precisely, 1.96) standard deviations on either side of the mean accounts for 95% of the area under the curve. In the case of the US Poll example, the formula for the standard deviation of the samples is given by: P(100-P)/n In general, the standard deviation of samples is called the Standard Error, and Clare Morris uses the term STEP (STandard Error of Percentages) for the particular kind of standard error given by the above formula. 3.5 The Error in Polls We can now sum up what we have found. Let us denote the underlying popularity in the population of a person or item (expressed as a percentage of people who would support/buy the person or item) by P. Then the popularity of the person or item a single poll of n people (assuming n to be more than about 30) will be distributed according to a Normal distribution, with: mean = P standard deviation (or STEP) = P(100-P)/n. In the case above, George Bush's popularity was estimated to be 46%, so P=46. The poll size, n, was 769. We find that the STEP is 46(100-46)/769 = To find the error in a poll, we use the formula: error = Z*STEP Z is a number we choose: if we take Z=1.96, we cover 95% of the area under the Normal Distribution 1. Hence we would expect 95% of poll results to be within ±1.96(1.80) = ±3.53% of 1 You might find it helps to remember that Z is actually just a number of standard deviations, measured away from the centre of the Normal Distribution. Increasing Z increases the extent to which we cover the whole area under the curve. CE8: Quantitative Research Methods Ian Rudy 7

8 ours. The largest possible value of this error occurs when P=50 (when the underlying popularity of the person is 50%) and our value of P=46 is so close to this as to make the difference negligible. This is the sense in which Gallup said "+/ 4 percentage points". 3.6 A More General Note About Distributions The Normal Distribution is only one of many theoretical distributions that arise when we analyse data. You do not need to know about many of these distributions, but you should be aware of how people describe distributions in general. They usually specify three things: the Location of the Distribution In simple terms, this just means where on the horizontal axis the "middle" (in some rough sense) of the distribution is. We can calculate this for the set of poll results by taking the mean (that is, the average) of the polls. Say we do N polls, and the i th poll gives Bush a fraction of the vote equal to x i, then the mean is just: mean = i=n xi i=1 N The Σ sign just means a sum, that is, x 1 + x 2 + x x N the Spread (or "width") of the Distribution The spread is an indication of how wide the graph is. In principle, one could indicate the spread by calculating the range of poll results (that is, the biggest minus the smallest). This has the advantage that the calculation is simple and easy to understand. However, in practice using the range is not a good idea, for various reasons, not least of which is that it depends on only two values (the biggest and the smallest). If one of these is abnormally large or small for the particular data one has available, then the range can be significantly affected. In practice therefore people tend to use the standard deviation instead. The most general formula for the standard deviation is: i=n (x i - population mean) 2 i=1 N (NB: In practice, you will probably need to use (N-1) on the bottom of this formula instead of N. See the note below) There are good reasons for using what appears at first sight to be an unnecessarily complicated formula! All it is doing is finding the average squared difference between the data values and the CE8: Quantitative Research Methods Ian Rudy 8

9 mean data value, with a square root to make sure the units of the standard deviation are the same as those of the original data values. The size of the standard deviation tends to be around one quarter or one sixth of the full width of the distribution, though in detail this depends on the shape of the distribution and the particular sample one happens to be working with. The square of the standard deviation is known as the variance. It is worth noting here that the above formula for standard deviation is not used very often. Instead, one uses a formula with (N-1) on the bottom rather than N. You do not need to understand in detail reasons for the differences between these two versions of the formula. The key reason for the difference is that in practice we tend not to know the population mean in the formula above, and have to estimate it using the mean of the sample we have collected. This turns out to be OK as long as one changes the N into an (N-1). the Shape of the Distribution In principle describing the shape is tricky, as there might be lots of different shapes. But in practice, the Normal distribution matches many of the shapes well. The mean and standard deviation vary from situation to situation (in this case they depend upon the underlying popularity of George Bush amongst the US population, and the size of the sample in each poll), but once one has calculated or specified these, there is a single Normal distribution curve, and we can use this to predict the distribution of individual poll results. CE8: Quantitative Research Methods Ian Rudy 9

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 05 Normal Distribution So far we have looked at discrete distributions

More information

Statistical Methods in Practice STAT/MATH 3379

Statistical Methods in Practice STAT/MATH 3379 Statistical Methods in Practice STAT/MATH 3379 Dr. A. B. W. Manage Associate Professor of Mathematics & Statistics Department of Mathematics & Statistics Sam Houston State University Overview 6.1 Discrete

More information

ECON 214 Elements of Statistics for Economists

ECON 214 Elements of Statistics for Economists ECON 214 Elements of Statistics for Economists Session 7 The Normal Distribution Part 1 Lecturer: Dr. Bernardin Senadza, Dept. of Economics Contact Information: bsenadza@ug.edu.gh College of Education

More information

Statistical Intervals (One sample) (Chs )

Statistical Intervals (One sample) (Chs ) 7 Statistical Intervals (One sample) (Chs 8.1-8.3) Confidence Intervals The CLT tells us that as the sample size n increases, the sample mean X is close to normally distributed with expected value µ and

More information

ECON 214 Elements of Statistics for Economists 2016/2017

ECON 214 Elements of Statistics for Economists 2016/2017 ECON 214 Elements of Statistics for Economists 2016/2017 Topic The Normal Distribution Lecturer: Dr. Bernardin Senadza, Dept. of Economics bsenadza@ug.edu.gh College of Education School of Continuing and

More information

On one of the feet? 1 2. On red? 1 4. Within 1 of the vertical black line at the top?( 1 to 1 2

On one of the feet? 1 2. On red? 1 4. Within 1 of the vertical black line at the top?( 1 to 1 2 Continuous Random Variable If I spin a spinner, what is the probability the pointer lands... On one of the feet? 1 2. On red? 1 4. Within 1 of the vertical black line at the top?( 1 to 1 2 )? 360 = 1 180.

More information

IOP 201-Q (Industrial Psychological Research) Tutorial 5

IOP 201-Q (Industrial Psychological Research) Tutorial 5 IOP 201-Q (Industrial Psychological Research) Tutorial 5 TRUE/FALSE [1 point each] Indicate whether the sentence or statement is true or false. 1. To establish a cause-and-effect relation between two variables,

More information

Making Sense of Cents

Making Sense of Cents Name: Date: Making Sense of Cents Exploring the Central Limit Theorem Many of the variables that you have studied so far in this class have had a normal distribution. You have used a table of the normal

More information

STAT:2010 Statistical Methods and Computing. Using density curves to describe the distribution of values of a quantitative

STAT:2010 Statistical Methods and Computing. Using density curves to describe the distribution of values of a quantitative STAT:10 Statistical Methods and Computing Normal Distributions Lecture 4 Feb. 6, 17 Kate Cowles 374 SH, 335-0727 kate-cowles@uiowa.edu 1 2 Using density curves to describe the distribution of values of

More information

Probability. An intro for calculus students P= Figure 1: A normal integral

Probability. An intro for calculus students P= Figure 1: A normal integral Probability An intro for calculus students.8.6.4.2 P=.87 2 3 4 Figure : A normal integral Suppose we flip a coin 2 times; what is the probability that we get more than 2 heads? Suppose we roll a six-sided

More information

8.1 Estimation of the Mean and Proportion

8.1 Estimation of the Mean and Proportion 8.1 Estimation of the Mean and Proportion Statistical inference enables us to make judgments about a population on the basis of sample information. The mean, standard deviation, and proportions of a population

More information

Expected Value of a Random Variable

Expected Value of a Random Variable Knowledge Article: Probability and Statistics Expected Value of a Random Variable Expected Value of a Discrete Random Variable You're familiar with a simple mean, or average, of a set. The mean value of

More information

Business Statistics 41000: Probability 4

Business Statistics 41000: Probability 4 Business Statistics 41000: Probability 4 Drew D. Creal University of Chicago, Booth School of Business February 14 and 15, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office:

More information

Example: Histogram for US household incomes from 2015 Table:

Example: Histogram for US household incomes from 2015 Table: 1 Example: Histogram for US household incomes from 2015 Table: Income level Relative frequency $0 - $14,999 11.6% $15,000 - $24,999 10.5% $25,000 - $34,999 10% $35,000 - $49,999 12.7% $50,000 - $74,999

More information

Sampling Distributions and the Central Limit Theorem

Sampling Distributions and the Central Limit Theorem Sampling Distributions and the Central Limit Theorem February 18 Data distributions and sampling distributions So far, we have discussed the distribution of data (i.e. of random variables in our sample,

More information

Statistics 431 Spring 2007 P. Shaman. Preliminaries

Statistics 431 Spring 2007 P. Shaman. Preliminaries Statistics 4 Spring 007 P. Shaman The Binomial Distribution Preliminaries A binomial experiment is defined by the following conditions: A sequence of n trials is conducted, with each trial having two possible

More information

Math 227 Elementary Statistics. Bluman 5 th edition

Math 227 Elementary Statistics. Bluman 5 th edition Math 227 Elementary Statistics Bluman 5 th edition CHAPTER 6 The Normal Distribution 2 Objectives Identify distributions as symmetrical or skewed. Identify the properties of the normal distribution. Find

More information

STAT Chapter 5: Continuous Distributions. Probability distributions are used a bit differently for continuous r.v. s than for discrete r.v. s.

STAT Chapter 5: Continuous Distributions. Probability distributions are used a bit differently for continuous r.v. s than for discrete r.v. s. STAT 515 -- Chapter 5: Continuous Distributions Probability distributions are used a bit differently for continuous r.v. s than for discrete r.v. s. Continuous distributions typically are represented by

More information

Elementary Statistics

Elementary Statistics Chapter 7 Estimation Goal: To become familiar with how to use Excel 2010 for Estimation of Means. There is one Stat Tool in Excel that is used with estimation of means, T.INV.2T. Open Excel and click on

More information

Numerical Descriptive Measures. Measures of Center: Mean and Median

Numerical Descriptive Measures. Measures of Center: Mean and Median Steve Sawin Statistics Numerical Descriptive Measures Having seen the shape of a distribution by looking at the histogram, the two most obvious questions to ask about the specific distribution is where

More information

Chapter 7 1. Random Variables

Chapter 7 1. Random Variables Chapter 7 1 Random Variables random variable numerical variable whose value depends on the outcome of a chance experiment - discrete if its possible values are isolated points on a number line - continuous

More information

Lecture 6: Chapter 6

Lecture 6: Chapter 6 Lecture 6: Chapter 6 C C Moxley UAB Mathematics 3 October 16 6.1 Continuous Probability Distributions Last week, we discussed the binomial probability distribution, which was discrete. 6.1 Continuous Probability

More information

Spike Statistics. File: spike statistics3.tex JV Stone Psychology Department, Sheffield University, England.

Spike Statistics. File: spike statistics3.tex JV Stone Psychology Department, Sheffield University, England. Spike Statistics File: spike statistics3.tex JV Stone Psychology Department, Sheffield University, England. Email: j.v.stone@sheffield.ac.uk November 27, 2007 1 Introduction Why do we need to know about

More information

Chapter 9: Sampling Distributions

Chapter 9: Sampling Distributions Chapter 9: Sampling Distributions 9. Introduction This chapter connects the material in Chapters 4 through 8 (numerical descriptive statistics, sampling, and probability distributions, in particular) with

More information

Section 7.5 The Normal Distribution. Section 7.6 Application of the Normal Distribution

Section 7.5 The Normal Distribution. Section 7.6 Application of the Normal Distribution Section 7.6 Application of the Normal Distribution A random variable that may take on infinitely many values is called a continuous random variable. A continuous probability distribution is defined by

More information

Chapter 7 Sampling Distributions and Point Estimation of Parameters

Chapter 7 Sampling Distributions and Point Estimation of Parameters Chapter 7 Sampling Distributions and Point Estimation of Parameters Part 1: Sampling Distributions, the Central Limit Theorem, Point Estimation & Estimators Sections 7-1 to 7-2 1 / 25 Statistical Inferences

More information

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1 Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 6 Normal Probability Distributions 6-1 Overview 6-2 The Standard Normal Distribution

More information

Chapter 6: Supply and Demand with Income in the Form of Endowments

Chapter 6: Supply and Demand with Income in the Form of Endowments Chapter 6: Supply and Demand with Income in the Form of Endowments 6.1: Introduction This chapter and the next contain almost identical analyses concerning the supply and demand implied by different kinds

More information

Introduction to Business Statistics QM 120 Chapter 6

Introduction to Business Statistics QM 120 Chapter 6 DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS Introduction to Business Statistics QM 120 Chapter 6 Spring 2008 Chapter 6: Continuous Probability Distribution 2 When a RV x is discrete, we can

More information

Characterization of the Optimum

Characterization of the Optimum ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 5. Portfolio Allocation with One Riskless, One Risky Asset Characterization of the Optimum Consider a risk-averse, expected-utility-maximizing

More information

Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics.

Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Convergent validity: the degree to which results/evidence from different tests/sources, converge on the same conclusion.

More information

AP Statistics Chapter 6 - Random Variables

AP Statistics Chapter 6 - Random Variables AP Statistics Chapter 6 - Random 6.1 Discrete and Continuous Random Objective: Recognize and define discrete random variables, and construct a probability distribution table and a probability histogram

More information

The normal distribution is a theoretical model derived mathematically and not empirically.

The normal distribution is a theoretical model derived mathematically and not empirically. Sociology 541 The Normal Distribution Probability and An Introduction to Inferential Statistics Normal Approximation The normal distribution is a theoretical model derived mathematically and not empirically.

More information

MATH 264 Problem Homework I

MATH 264 Problem Homework I MATH Problem Homework I Due to December 9, 00@:0 PROBLEMS & SOLUTIONS. A student answers a multiple-choice examination question that offers four possible answers. Suppose that the probability that the

More information

Part V - Chance Variability

Part V - Chance Variability Part V - Chance Variability Dr. Joseph Brennan Math 148, BU Dr. Joseph Brennan (Math 148, BU) Part V - Chance Variability 1 / 78 Law of Averages In Chapter 13 we discussed the Kerrich coin-tossing experiment.

More information

The topics in this section are related and necessary topics for both course objectives.

The topics in this section are related and necessary topics for both course objectives. 2.5 Probability Distributions The topics in this section are related and necessary topics for both course objectives. A probability distribution indicates how the probabilities are distributed for outcomes

More information

STAB22 section 1.3 and Chapter 1 exercises

STAB22 section 1.3 and Chapter 1 exercises STAB22 section 1.3 and Chapter 1 exercises 1.101 Go up and down two times the standard deviation from the mean. So 95% of scores will be between 572 (2)(51) = 470 and 572 + (2)(51) = 674. 1.102 Same idea

More information

Normal Probability Distributions

Normal Probability Distributions Normal Probability Distributions Properties of Normal Distributions The most important probability distribution in statistics is the normal distribution. Normal curve A normal distribution is a continuous

More information

Random variables The binomial distribution The normal distribution Sampling distributions. Distributions. Patrick Breheny.

Random variables The binomial distribution The normal distribution Sampling distributions. Distributions. Patrick Breheny. Distributions September 17 Random variables Anything that can be measured or categorized is called a variable If the value that a variable takes on is subject to variability, then it the variable is a

More information

Random Variables and Probability Distributions

Random Variables and Probability Distributions Chapter 3 Random Variables and Probability Distributions Chapter Three Random Variables and Probability Distributions 3. Introduction An event is defined as the possible outcome of an experiment. In engineering

More information

Chapter 6. The Normal Probability Distributions

Chapter 6. The Normal Probability Distributions Chapter 6 The Normal Probability Distributions 1 Chapter 6 Overview Introduction 6-1 Normal Probability Distributions 6-2 The Standard Normal Distribution 6-3 Applications of the Normal Distribution 6-5

More information

CHAPTER 5 Sampling Distributions

CHAPTER 5 Sampling Distributions CHAPTER 5 Sampling Distributions 5.1 The possible values of p^ are 0, 1/3, 2/3, and 1. These correspond to getting 0 persons with lung cancer, 1 with lung cancer, 2 with lung cancer, and all 3 with lung

More information

Chapter 3. Density Curves. Density Curves. Basic Practice of Statistics - 3rd Edition. Chapter 3 1. The Normal Distributions

Chapter 3. Density Curves. Density Curves. Basic Practice of Statistics - 3rd Edition. Chapter 3 1. The Normal Distributions Chapter 3 The Normal Distributions BPS - 3rd Ed. Chapter 3 1 Example: here is a histogram of vocabulary scores of 947 seventh graders. The smooth curve drawn over the histogram is a mathematical model

More information

1 Describing Distributions with numbers

1 Describing Distributions with numbers 1 Describing Distributions with numbers Only for quantitative variables!! 1.1 Describing the center of a data set The mean of a set of numerical observation is the familiar arithmetic average. To write

More information

Spike Statistics: A Tutorial

Spike Statistics: A Tutorial Spike Statistics: A Tutorial File: spike statistics4.tex JV Stone, Psychology Department, Sheffield University, England. Email: j.v.stone@sheffield.ac.uk December 10, 2007 1 Introduction Why do we need

More information

THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management

THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management BA 386T Tom Shively PROBABILITY CONCEPTS AND NORMAL DISTRIBUTIONS The fundamental idea underlying any statistical

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Suhasini Subba Rao The binomial: mean and variance Recall that the number of successes out of n, denoted

More information

Chapter 8 Estimation

Chapter 8 Estimation Chapter 8 Estimation There are two important forms of statistical inference: estimation (Confidence Intervals) Hypothesis Testing Statistical Inference drawing conclusions about populations based on samples

More information

Focus Points 10/11/2011. The Binomial Probability Distribution and Related Topics. Additional Properties of the Binomial Distribution. Section 5.

Focus Points 10/11/2011. The Binomial Probability Distribution and Related Topics. Additional Properties of the Binomial Distribution. Section 5. The Binomial Probability Distribution and Related Topics 5 Copyright Cengage Learning. All rights reserved. Section 5.3 Additional Properties of the Binomial Distribution Copyright Cengage Learning. All

More information

Statistics, Measures of Central Tendency I

Statistics, Measures of Central Tendency I Statistics, Measures of Central Tendency I We are considering a random variable X with a probability distribution which has some parameters. We want to get an idea what these parameters are. We perfom

More information

A useful modeling tricks.

A useful modeling tricks. .7 Joint models for more than two outcomes We saw that we could write joint models for a pair of variables by specifying the joint probabilities over all pairs of outcomes. In principal, we could do this

More information

Key Objectives. Module 2: The Logic of Statistical Inference. Z-scores. SGSB Workshop: Using Statistical Data to Make Decisions

Key Objectives. Module 2: The Logic of Statistical Inference. Z-scores. SGSB Workshop: Using Statistical Data to Make Decisions SGSB Workshop: Using Statistical Data to Make Decisions Module 2: The Logic of Statistical Inference Dr. Tom Ilvento January 2006 Dr. Mugdim Pašić Key Objectives Understand the logic of statistical inference

More information

Point Estimation. Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage

Point Estimation. Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage 6 Point Estimation Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Point Estimation Statistical inference: directed toward conclusions about one or more parameters. We will use the generic

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Review of previous lecture: Why confidence intervals? Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Suhasini Subba Rao Suppose you want to know the

More information

Frequency Distribution and Summary Statistics

Frequency Distribution and Summary Statistics Frequency Distribution and Summary Statistics Dongmei Li Department of Public Health Sciences Office of Public Health Studies University of Hawai i at Mānoa Outline 1. Stemplot 2. Frequency table 3. Summary

More information

CHAPTER 4 DISCRETE PROBABILITY DISTRIBUTIONS

CHAPTER 4 DISCRETE PROBABILITY DISTRIBUTIONS CHAPTER 4 DISCRETE PROBABILITY DISTRIBUTIONS A random variable is the description of the outcome of an experiment in words. The verbal description of a random variable tells you how to find or calculate

More information

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION In Inferential Statistic, ESTIMATION (i) (ii) is called the True Population Mean and is called the True Population Proportion. You must also remember that are not the only population parameters. There

More information

Statistical Intervals. Chapter 7 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage

Statistical Intervals. Chapter 7 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage 7 Statistical Intervals Chapter 7 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Confidence Intervals The CLT tells us that as the sample size n increases, the sample mean X is close to

More information

Introduction to Statistics I

Introduction to Statistics I Introduction to Statistics I Keio University, Faculty of Economics Continuous random variables Simon Clinet (Keio University) Intro to Stats November 1, 2018 1 / 18 Definition (Continuous random variable)

More information

The Normal Distribution

The Normal Distribution 5.1 Introduction to Normal Distributions and the Standard Normal Distribution Section Learning objectives: 1. How to interpret graphs of normal probability distributions 2. How to find areas under the

More information

CS 237: Probability in Computing

CS 237: Probability in Computing CS 237: Probability in Computing Wayne Snyder Computer Science Department Boston University Lecture 12: Continuous Distributions Uniform Distribution Normal Distribution (motivation) Discrete vs Continuous

More information

Describing Data: One Quantitative Variable

Describing Data: One Quantitative Variable STAT 250 Dr. Kari Lock Morgan The Big Picture Describing Data: One Quantitative Variable Population Sampling SECTIONS 2.2, 2.3 One quantitative variable (2.2, 2.3) Statistical Inference Sample Descriptive

More information

MA 1125 Lecture 05 - Measures of Spread. Wednesday, September 6, Objectives: Introduce variance, standard deviation, range.

MA 1125 Lecture 05 - Measures of Spread. Wednesday, September 6, Objectives: Introduce variance, standard deviation, range. MA 115 Lecture 05 - Measures of Spread Wednesday, September 6, 017 Objectives: Introduce variance, standard deviation, range. 1. Measures of Spread In Lecture 04, we looked at several measures of central

More information

MLLunsford 1. Activity: Central Limit Theorem Theory and Computations

MLLunsford 1. Activity: Central Limit Theorem Theory and Computations MLLunsford 1 Activity: Central Limit Theorem Theory and Computations Concepts: The Central Limit Theorem; computations using the Central Limit Theorem. Prerequisites: The student should be familiar with

More information

Measure of Variation

Measure of Variation Measure of Variation Variation is the spread of a data set. The simplest measure is the range. Range the difference between the maximum and minimum data entries in the set. To find the range, the data

More information

Central Limit Theorem

Central Limit Theorem Central Limit Theorem Lots of Samples 1 Homework Read Sec 6-5. Discussion Question pg 329 Do Ex 6-5 8-15 2 Objective Use the Central Limit Theorem to solve problems involving sample means 3 Sample Means

More information

Example - Let X be the number of boys in a 4 child family. Find the probability distribution table:

Example - Let X be the number of boys in a 4 child family. Find the probability distribution table: Chapter8 Probability Distributions and Statistics Section 8.1 Distributions of Random Variables tthe value of the result of the probability experiment is a RANDOM VARIABLE. Example - Let X be the number

More information

Statistics (This summary is for chapters 17, 28, 29 and section G of chapter 19)

Statistics (This summary is for chapters 17, 28, 29 and section G of chapter 19) Statistics (This summary is for chapters 17, 28, 29 and section G of chapter 19) Mean, Median, Mode Mode: most common value Median: middle value (when the values are in order) Mean = total how many = x

More information

Lecture 9. Probability Distributions. Outline. Outline

Lecture 9. Probability Distributions. Outline. Outline Outline Lecture 9 Probability Distributions 6-1 Introduction 6- Probability Distributions 6-3 Mean, Variance, and Expectation 6-4 The Binomial Distribution Outline 7- Properties of the Normal Distribution

More information

Sampling Distributions Chapter 18

Sampling Distributions Chapter 18 Sampling Distributions Chapter 18 Parameter vs Statistic Example: Identify the population, the parameter, the sample, and the statistic in the given settings. a) The Gallup Poll asked a random sample of

More information

DATA SUMMARIZATION AND VISUALIZATION

DATA SUMMARIZATION AND VISUALIZATION APPENDIX DATA SUMMARIZATION AND VISUALIZATION PART 1 SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS 294 PART 2 PART 3 PART 4 VISUALIZATION: GRAPHS AND TABLES FOR SUMMARIZING AND ORGANIZING DATA 296

More information

Software Tutorial ormal Statistics

Software Tutorial ormal Statistics Software Tutorial ormal Statistics The example session with the teaching software, PG2000, which is described below is intended as an example run to familiarise the user with the package. This documented

More information

The following content is provided under a Creative Commons license. Your support

The following content is provided under a Creative Commons license. Your support MITOCW Recitation 6 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make

More information

Lecture 9. Probability Distributions

Lecture 9. Probability Distributions Lecture 9 Probability Distributions Outline 6-1 Introduction 6-2 Probability Distributions 6-3 Mean, Variance, and Expectation 6-4 The Binomial Distribution Outline 7-2 Properties of the Normal Distribution

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://wwwstattamuedu/~suhasini/teachinghtml Suhasini Subba Rao Review of previous lecture The main idea in the previous lecture is that the sample

More information

CHAPTER 6 Random Variables

CHAPTER 6 Random Variables CHAPTER 6 Random Variables 6.1 Discrete and Continuous Random Variables The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Discrete and Continuous Random

More information

Statistics (This summary is for chapters 18, 29 and section H of chapter 19)

Statistics (This summary is for chapters 18, 29 and section H of chapter 19) Statistics (This summary is for chapters 18, 29 and section H of chapter 19) Mean, Median, Mode Mode: most common value Median: middle value (when the values are in order) Mean = total how many = x n =

More information

4: Probability. Notes: Range of possible probabilities: Probabilities can be no less than 0% and no more than 100% (of course).

4: Probability. Notes: Range of possible probabilities: Probabilities can be no less than 0% and no more than 100% (of course). 4: Probability What is probability? The probability of an event is its relative frequency (proportion) in the population. An event that happens half the time (such as a head showing up on the flip of a

More information

Lecture 2 Describing Data

Lecture 2 Describing Data Lecture 2 Describing Data Thais Paiva STA 111 - Summer 2013 Term II July 2, 2013 Lecture Plan 1 Types of data 2 Describing the data with plots 3 Summary statistics for central tendency and spread 4 Histograms

More information

MAS1403. Quantitative Methods for Business Management. Semester 1, Module leader: Dr. David Walshaw

MAS1403. Quantitative Methods for Business Management. Semester 1, Module leader: Dr. David Walshaw MAS1403 Quantitative Methods for Business Management Semester 1, 2018 2019 Module leader: Dr. David Walshaw Additional lecturers: Dr. James Waldren and Dr. Stuart Hall Announcements: Written assignment

More information

Descriptive Statistics

Descriptive Statistics Chapter 3 Descriptive Statistics Chapter 2 presented graphical techniques for organizing and displaying data. Even though such graphical techniques allow the researcher to make some general observations

More information

Math 2311 Bekki George Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment

Math 2311 Bekki George Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment Math 2311 Bekki George bekki@math.uh.edu Office Hours: MW 11am to 12:45pm in 639 PGH Online Thursdays 4-5:30pm And by appointment Class webpage: http://www.math.uh.edu/~bekki/math2311.html Math 2311 Class

More information

2011 Pearson Education, Inc

2011 Pearson Education, Inc Statistics for Business and Economics Chapter 4 Random Variables & Probability Distributions Content 1. Two Types of Random Variables 2. Probability Distributions for Discrete Random Variables 3. The Binomial

More information

A probability distribution shows the possible outcomes of an experiment and the probability of each of these outcomes.

A probability distribution shows the possible outcomes of an experiment and the probability of each of these outcomes. Introduction In the previous chapter we discussed the basic concepts of probability and described how the rules of addition and multiplication were used to compute probabilities. In this chapter we expand

More information

CH 5 Normal Probability Distributions Properties of the Normal Distribution

CH 5 Normal Probability Distributions Properties of the Normal Distribution Properties of the Normal Distribution Example A friend that is always late. Let X represent the amount of minutes that pass from the moment you are suppose to meet your friend until the moment your friend

More information

Web Extension: Continuous Distributions and Estimating Beta with a Calculator

Web Extension: Continuous Distributions and Estimating Beta with a Calculator 19878_02W_p001-008.qxd 3/10/06 9:51 AM Page 1 C H A P T E R 2 Web Extension: Continuous Distributions and Estimating Beta with a Calculator This extension explains continuous probability distributions

More information

ECO220Y Estimation: Confidence Interval Estimator for Sample Proportions Readings: Chapter 11 (skip 11.5)

ECO220Y Estimation: Confidence Interval Estimator for Sample Proportions Readings: Chapter 11 (skip 11.5) ECO220Y Estimation: Confidence Interval Estimator for Sample Proportions Readings: Chapter 11 (skip 11.5) Fall 2011 Lecture 10 (Fall 2011) Estimation Lecture 10 1 / 23 Review: Sampling Distributions Sample

More information

4.2 Probability Distributions

4.2 Probability Distributions 4.2 Probability Distributions Definition. A random variable is a variable whose value is a numerical outcome of a random phenomenon. The probability distribution of a random variable tells us what the

More information

DATA HANDLING Five-Number Summary

DATA HANDLING Five-Number Summary DATA HANDLING Five-Number Summary The five-number summary consists of the minimum and maximum values, the median, and the upper and lower quartiles. The minimum and the maximum are the smallest and greatest

More information

STAT Chapter 5: Continuous Distributions. Probability distributions are used a bit differently for continuous r.v. s than for discrete r.v. s.

STAT Chapter 5: Continuous Distributions. Probability distributions are used a bit differently for continuous r.v. s than for discrete r.v. s. STAT 515 -- Chapter 5: Continuous Distributions Probability distributions are used a bit differently for continuous r.v. s than for discrete r.v. s. Continuous distributions typically are represented by

More information

23.1 Probability Distributions

23.1 Probability Distributions 3.1 Probability Distributions Essential Question: What is a probability distribution for a discrete random variable, and how can it be displayed? Explore Using Simulation to Obtain an Empirical Probability

More information

11.5: Normal Distributions

11.5: Normal Distributions 11.5: Normal Distributions 11.5.1 Up to now, we ve dealt with discrete random variables, variables that take on only a finite (or countably infinite we didn t do these) number of values. A continuous random

More information

A Derivation of the Normal Distribution. Robert S. Wilson PhD.

A Derivation of the Normal Distribution. Robert S. Wilson PhD. A Derivation of the Normal Distribution Robert S. Wilson PhD. Data are said to be normally distributed if their frequency histogram is apporximated by a bell shaped curve. In practice, one can tell by

More information

MAS187/AEF258. University of Newcastle upon Tyne

MAS187/AEF258. University of Newcastle upon Tyne MAS187/AEF258 University of Newcastle upon Tyne 2005-6 Contents 1 Collecting and Presenting Data 5 1.1 Introduction...................................... 5 1.1.1 Examples...................................

More information

5.7 Probability Distributions and Variance

5.7 Probability Distributions and Variance 160 CHAPTER 5. PROBABILITY 5.7 Probability Distributions and Variance 5.7.1 Distributions of random variables We have given meaning to the phrase expected value. For example, if we flip a coin 100 times,

More information

Section 0: Introduction and Review of Basic Concepts

Section 0: Introduction and Review of Basic Concepts Section 0: Introduction and Review of Basic Concepts Carlos M. Carvalho The University of Texas McCombs School of Business mccombs.utexas.edu/faculty/carlos.carvalho/teaching 1 Getting Started Syllabus

More information

3.1 Measures of Central Tendency

3.1 Measures of Central Tendency 3.1 Measures of Central Tendency n Summation Notation x i or x Sum observation on the variable that appears to the right of the summation symbol. Example 1 Suppose the variable x i is used to represent

More information

A continuous random variable is one that can theoretically take on any value on some line interval. We use f ( x)

A continuous random variable is one that can theoretically take on any value on some line interval. We use f ( x) Section 6-2 I. Continuous Probability Distributions A continuous random variable is one that can theoretically take on any value on some line interval. We use f ( x) to represent a probability density

More information

Business Statistics 41000: Probability 3

Business Statistics 41000: Probability 3 Business Statistics 41000: Probability 3 Drew D. Creal University of Chicago, Booth School of Business February 7 and 8, 2014 1 Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office: 404

More information

BUSINESS MATHEMATICS & QUANTITATIVE METHODS

BUSINESS MATHEMATICS & QUANTITATIVE METHODS BUSINESS MATHEMATICS & QUANTITATIVE METHODS FORMATION 1 EXAMINATION - AUGUST 2009 NOTES: You are required to answer 5 questions. (If you provide answers to all questions, you must draw a clearly distinguishable

More information