Probability distributions


Introduction

What is a probability?

If I perform $n$ experiments and a particular event occurs on $r$ occasions, the relative frequency of this event is simply $r/n$. This is an experimental observation that gives us an estimate of the probability; there will be some random variation above and below the actual probability. If I do a large number of experiments, the relative frequency gets closer to the probability. We define the probability as the limit of the relative frequency as the number of experiments tends to infinity. Usually we can calculate the probability from theoretical considerations (number of beads in a bag, number of faces of a die, tree diagram, any other kind of statistical model).

What is a random variable?

A random variable (r.v.) is the value that might be obtained from some kind of experiment or measurement process in which there is some random uncertainty.

A discrete r.v. takes a finite or countable number of possible values with distinct steps between them.

A continuous r.v. takes an infinite number of values which vary smoothly. We talk not of the probability of getting a particular value but of the probability that a value lies between certain limits.

Random variables are given names which start with a capital letter. An r.v. is a numerical value (e.g. "a head" is not a value, but the number of heads in a given number of throws is an r.v.). Examples: the score when throwing a die (discrete); the air temperature at a random time and date (continuous); the age of a cat chosen at random.
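The idea that the relative frequency $r/n$ settles towards the probability as $n$ grows can be tried out with a small simulation. This is only a sketch: the event chosen (throwing a six with a fair die), the function name `relative_frequency` and the fixed seed are assumptions made for illustration, not part of the original notes.

```python
import random

def relative_frequency(n_experiments, seed=0):
    """Relative frequency r/n of throwing a six in n_experiments throws of a fair die."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    r = sum(1 for _ in range(n_experiments) if rng.randint(1, 6) == 6)
    return r / n_experiments

# The theoretical probability is 1/6; larger n should usually land closer to it.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
print("theory:", 1 / 6)
```

For small $n$ the estimate can be far from 1/6; for large $n$ the random variation above and below the true value shrinks.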

The binomial distribution (Statistics book p98)

When we have a tree diagram, we can calculate the probability of each combination of outcomes (e.g. a head then a tail when throwing a coin twice).

Special case: if each fork has the same two choices, and the probabilities are the same at each fork, we can use a formula to calculate the probabilities without needing the tree diagram.

E.g. the probability of getting 1 head and 2 tails in 3 throws of a coin:

$P(HTT) = \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{8}$

There are three equivalent combinations of events (arrangements HTT, THT, TTH), so the probability is

$P(\text{one head}) = 3 \times \frac{1}{8} = \frac{3}{8}$

In general, for any experiment in which the desired outcome can be obtained by a number of equivalent routes through the tree diagram that all have the same probability, we can define:

Probability = (probability for one route) × (number of arrangements)

The binomial distribution describes the probability of getting $r$ successes out of $n$ trials in the kind of experiment with just two possible results ("success" or "failure"), in which the trials are independent and the probability of success $p$ is the same in every trial.

The formula for getting $r$ successes out of $n$ trials is

$P(X = r) = {}^{n}C_{r} \, p^{r} q^{n-r}$

where

$p$ is the probability of each trial being a success,

$q = 1 - p$ is the probability of each trial being a failure,

$p^{r} q^{n-r}$ is the probability of one path with $r$ successes through a tree diagram.

${}^{n}C_{r}$ ("n choose r") is the number of possible paths through the tree. You can use the nCr button on your calculator to find ${}^{n}C_{r}$.

Example 1. A random variable $X$ could be the number of heads in $n$ throws of a coin. The probability of getting 3 heads in 7 throws of a coin is

$P(X = 3) = {}^{7}C_{3} \times 0.5^{3} \times 0.5^{4} = 0.2734$

Geogebra calculates this nicely. Autograph draws the distribution as a bar graph of $p$ against $r$, but won't tabulate the probabilities like Geogebra does.

[Figures: Geogebra probability calculator output; Autograph bar graph of p against r]

Example 2. The probability of a die landing on a 4 is $p = \frac{1}{6}$, so the probability of it not landing on 4 is $q = 1 - p = \frac{5}{6}$.

The probability of getting 5 fours in 30 throws will be

$P(X = 5) = {}^{30}C_{5} \left(\frac{1}{6}\right)^{5} \left(\frac{5}{6}\right)^{25} = 0.192$

(defining $X$ as the number of fours in 30 throws). You can see this in Geogebra (menu Tools / Special Object Tools / Probability Calculator).
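The binomial formula and the "one route × number of arrangements" argument can be checked against each other in a few lines. The sketch below is illustrative only: the function names `binomial_pmf` and `pmf_by_enumeration` are made up for this example, and the brute-force version is practical only for small $n$ because it walks all $2^n$ routes through the tree.

```python
from itertools import product
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = nCr * p^r * q^(n-r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

def pmf_by_enumeration(r, n, p):
    """Add up the probability of every route through the tree with exactly r successes."""
    q = 1 - p
    total = 0.0
    for route in product((True, False), repeat=n):  # all 2**n routes
        if sum(route) == r:                         # route has exactly r successes
            total += p**r * q**(n - r)              # probability of this one route
    return total

# 3 heads in 7 throws of a coin: formula and enumeration agree.
print(binomial_pmf(3, 7, 0.5))
print(pmf_by_enumeration(3, 7, 0.5))

# 5 fours in 30 throws of a die (formula only; 2**30 routes is too many to enumerate).
print(round(binomial_pmf(5, 30, 1 / 6), 3))
```

The first two values agree at about 0.2734, and the die example gives about 0.192, matching the worked examples.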

What is Standard Deviation? (Statistics book p45)

In any list of numbers, the mean of the differences between each value and the mean will be zero (some positive, some negative). To get a sensible measure of spread we square these differences (so they are all positive) and then average them to find the variance. This gives the basic definition

$\text{variance} = \frac{\sum (x - \mu)^2}{n}$

where $n$ is the number of data values, $x$ is each value in the data and $\mu$ (pronounced "mu") is their mean. The formula can also be re-arranged as

$\text{variance} = \frac{\sum x^2}{n} - \mu^2$

(easier to use), often described as: mean of ($x^2$) minus (mean of $x$)$^2$.

The variance (called $\sigma^2$) is the mean value of $(x - \mu)^2$.

The standard deviation $\sigma$ is the square root of the variance (so its units are the same as the data, not data$^2$).

Standard deviation is a measure of the spread of the data each side of the mean, just like range and inter-quartile range. In some ways it is better because it is (like the mean) based on every data value, not just the two at the 25% and 75% positions used for the IQR.

Simple example

Consider a data set of just two values, $x = 1, 5$.

The mean is $\mu = \frac{1 + 5}{2} = 3$.

$\text{variance} = \frac{\sum (x - \mu)^2}{n} = \frac{(1 - 3)^2 + (5 - 3)^2}{2} = \frac{8}{2} = 4$

We now square root this to get the standard deviation: $\sigma = \sqrt{4} = 2$.
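The two forms of the variance formula (mean of squared differences, and mean of squares minus square of the mean) can be compared directly. This is a minimal sketch; the function names are invented for the example, and the two-value data set 1, 5 (mean 3, variance 4) is used as the illustration.

```python
from math import sqrt

def variance_basic(data):
    """Mean of the squared differences from the mean: sum((x - mu)^2) / n."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n

def variance_rearranged(data):
    """Mean of x^2 minus (mean of x)^2: sum(x^2) / n - mu^2."""
    n = len(data)
    mu = sum(data) / n
    return sum(x * x for x in data) / n - mu**2

data = [1, 5]
print(variance_basic(data))        # 4.0
print(variance_rearranged(data))   # 4.0
print(sqrt(variance_basic(data)))  # 2.0
```

Both versions divide by $n$, which is what Excel's STDEV.P (population standard deviation) does; the standard library equivalent is `statistics.pstdev`.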

Everyday example

Electricity from power stations is alternating current (AC). The voltage is a sine wave that repeats 50 times per second. The voltage used in houses is 240 volts rms ("root mean square"). The mean is 0 volts, so the rms value is also the standard deviation: 240 volts. If we measured the mains voltage every 1/1000 second and plotted the values it would look like this:

[Figure: mains voltage (volts, from -400 to 400) against time (seconds, from 0 to 0.04)]

If the distribution of values follows a Normal distribution (below), we find for instance that a measurement of an item picked at random has a 95% chance of being within 1.96 standard deviations of the mean.

You can find standard deviation in many ways:

Calculate it on paper or using a calculator.

In Excel, use a formula and specify a cell range, e.g. =STDEV.P(A3:A4)

In Autograph:

(a) For 1-D data, having entered the data set, right-click on the data set name in the bottom bar and pick the Show Statistics menu; in the next pop-up window choose "Transfer to results box".

[Autograph results box, "Statistics for Raw Data": number in sample n, mean, standard deviation, range, lower quartile, median, upper quartile, semi-interquartile range]

(b) For an x-y data set, with the data input window open, click on Show statistics.

[Autograph results box: number of points n, mean and standard deviation of x and of y, correlation coefficient r, Spearman's ranking coefficient, y-on-x and x-on-y regression lines]

This gives you the mean and standard deviation (but, for 2D data, not the IQR) for both the x and y data, which you can paste into Word.

The Normal distribution (Statistics book p30)

The Normal distribution is a curve that often models the histogram shape for continuous variables. It is defined such that the area under the curve between any two x-values gives you the probability that the random variable will lie in this range. The total area under the curve = 1.

Normal distributions occur naturally in situations where the data value is the sum or mean of many parts, each having its own random variation.

It is a symmetrical distribution with mean = median.

It models a continuous variable that can take values from -infinity to +infinity (but in practice, beyond 3 standard deviations from the mean the probability is so small that it can usually be ignored).

[Figures: Normal distribution f(x), $\mu = 0$, $\sigma = 1$; cumulative Normal distribution F(x) = P(X <= x), $\mu = 0$, $\sigma = 1$]

You can look up probabilities in Geogebra or via a formula in Excel. For example, if the mean is 600 and the standard deviation is 100, then 700 is one standard deviation above the mean. 84.13% of the data will be below this in a Normally-distributed data set (equivalent to Excel's formula =NORM.DIST(700, 600, 100, TRUE)).
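The same look-ups can be done outside Geogebra or Excel. The sketch below uses Python's standard-library `statistics.NormalDist`, assuming a mean of 600 and a standard deviation of 100 as in the Excel example; the variable names are chosen for illustration.

```python
from statistics import NormalDist

# Proportion of a Normal(mean 600, s.d. 100) population lying below 700,
# i.e. below one standard deviation above the mean.
dist = NormalDist(mu=600, sigma=100)
below_700 = dist.cdf(700)
print(round(below_700, 4))   # 0.8413

# Standard Normal (mean 0, s.d. 1): area within 1.96 s.d. and beyond 3 s.d.
standard = NormalDist()
within_196 = standard.cdf(1.96) - standard.cdf(-1.96)
beyond_3 = 1 - (standard.cdf(3) - standard.cdf(-3))
print(round(within_196, 4))  # 0.95
print(round(beyond_3, 4))    # 0.0027
```

The cumulative function `cdf(x)` plays the role of the TRUE (cumulative) option in the Excel formula.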

Example: Normal distribution

Airline passengers have luggage with mean 20 kg and standard deviation 5 kg. What proportion have luggage over 25 kg in weight?

[Figure: Normal curve f(x) for luggage weight from 0 to 40 kg, with the area for P(X > 25) shaded]

25 kg is 1 s.d. above the mean. 84.13% are below 1 s.d. above the mean, so 15.87% are beyond this value.

For data with a Normal distribution:

the first quartile is 0.675 standard deviations below the mean;

the third quartile is 0.675 standard deviations above the mean;

hence the IQR = 1.35 standard deviations, and standard deviation = 0.74 IQR;

95% of the data is within about 2 standard deviations (above and below) of the mean;

99.7% of the data is within 3 standard deviations (above and below) of the mean.

Not all symmetrical distributions are Normal! A distribution could be more like a rectangle:

[Figure: rectangular distribution f(x)]

(IQR = 1.73 standard deviations, standard deviation = 0.58 IQR, all the data within 1.73 standard deviations of the mean)

Or more like a triangle:

[Figure: triangular distribution f(x)]

(IQR = 1.43 standard deviations, standard deviation = 0.7 IQR, all the data within 2.45 standard deviations of the mean)

Both these lack the long tails of the Normal distribution, so their s.d. is smaller as a fraction of the IQR.
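The luggage answer and the IQR-to-standard-deviation ratios can be reproduced numerically. This is a sketch under stated assumptions: the luggage weights are taken as Normal with mean 20 kg and s.d. 5 kg, and the rectangle and triangle figures use the standard textbook results for a uniform distribution of width $w$ (s.d. $w/\sqrt{12}$, IQR $w/2$) and a symmetric triangular distribution on $[-a, a]$ (s.d. $a/\sqrt{6}$, upper quartile $a(1 - 1/\sqrt{2})$), which are not derived in the notes.

```python
from math import sqrt
from statistics import NormalDist

# Proportion of luggage over 25 kg (one s.d. above the mean).
luggage = NormalDist(mu=20, sigma=5)
over_25 = 1 - luggage.cdf(25)
print(round(over_25, 4))             # 0.1587

# Normal: upper quartile in units of s.d., and IQR / s.d.
z_q3 = NormalDist().inv_cdf(0.75)
iqr_in_sd_normal = 2 * z_q3
print(round(z_q3, 4))                # 0.6745
print(round(iqr_in_sd_normal, 2))    # 1.35

# Rectangle (uniform, width w): IQR / s.d. = (w / 2) / (w / sqrt(12)).
iqr_in_sd_rectangle = 0.5 * sqrt(12)
print(round(iqr_in_sd_rectangle, 2))  # 1.73

# Symmetric triangle on [-a, a]: IQR / s.d. = 2a(1 - 1/sqrt(2)) / (a / sqrt(6)).
iqr_in_sd_triangle = 2 * (1 - 1 / sqrt(2)) * sqrt(6)
print(round(iqr_in_sd_triangle, 2))   # 1.43
```

Taking reciprocals gives the s.d. as a fraction of the IQR: about 0.74 for the Normal, 0.58 for the rectangle and 0.70 for the triangle.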