Statistics, Measures of Central Tendency I


Statistics, Measures of Central Tendency I

We are considering a random variable $X$ whose probability distribution depends on some parameters, and we want to get an idea of what these parameters are. We perform an experiment $n$ times and record the outcomes. This means we have i.i.d. random variables $X_1, \dots, X_n$, each with the same probability distribution as $X$. We want to use the outcomes to infer the parameters.

Mean: The outcomes are $x_1, \dots, x_n$. The sample mean is $\bar{x} := \frac{x_1 + \cdots + x_n}{n}$, sometimes also called the average. The expected value of $X$, $E(X)$, is also called the mean of $X$. It is often denoted by $\mu$ and sometimes called the population mean.

Median: The number such that half the values are below it and half are above it. If the sample is of even size, take the average of the two middle terms.

Mode: The number that occurs most frequently. There can be several modes, or no mode.

Dan Barbasch, Math 1105, Chapter 9, Week of September 25
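As a small illustration of the three definitions, the following sketch uses Python's standard `statistics` module on a made-up sample (the numbers are hypothetical, chosen so that the sample has even size and two modes).

```python
from statistics import mean, median, multimode

# Hypothetical sample of even size, already sorted for readability
sample = [2, 2, 3, 7, 7, 9]

sample_mean = mean(sample)        # (2 + 2 + 3 + 7 + 7 + 9) / 6
sample_median = median(sample)    # even size: average of the middle terms 3 and 7
sample_modes = multimode(sample)  # every value that attains the highest count

print(sample_mean, sample_median, sample_modes)
```

Here 2 and 7 both occur twice, so the sample has two modes.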

Statistics, Measures of Central Tendency II

Example. You have a coin for which $P(H) = p$ and $P(T) = 1 - p$, and you would like to estimate $p$. You toss it $n$ times and count the number of heads. Let $X_i = 1$ if toss $i$ comes up heads and $X_i = 0$ otherwise. The sample mean should be an estimate of $p$: $E(X_i) = p$ and $E(X_1 + \cdots + X_n) = np$, so

$$E\left(\frac{X_1 + \cdots + X_n}{n}\right) = p.$$
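The claim can be checked numerically with a short simulation. This is only a sketch: the value `p = 0.3`, the number of tosses, and the random seed are arbitrary choices made for the illustration, not part of the slides.

```python
import random

random.seed(0)   # fixed seed so the run is reproducible (arbitrary choice)
p = 0.3          # hypothetical true probability of heads
n = 10_000       # number of tosses

# X_i = 1 for heads, 0 for tails
tosses = [1 if random.random() < p else 0 for _ in range(n)]

# The sample mean is the fraction of heads and estimates p
p_hat = sum(tosses) / n
print(p_hat)
```

With this many tosses the estimate should land within a couple of hundredths of the true `p`.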

Descriptive Statistics I

Frequency Distribution: Divide the range of the data into a number of equal, disjoint intervals. For each interval, count the number of elements of the sample that fall in it.

Histogram: See the next slide.

Frequency Polygon: Connect the midpoints of the tops of the histogram bars.

Grouped Data Mean: Essentially the mean of the frequency distribution. Intervals are used rather than single values, and all values in an interval are assumed to be located at its midpoint. The letter $x_M$ represents the midpoints and $f$ the frequencies:

$$\bar{x} = \frac{\sum_i f_i \, x_{M,i}}{n}.$$

Histogram

A histogram is a graphical representation of the distribution of numerical data. It is a kind of bar graph. To construct a histogram, the first step is to bin the range of values, that is, divide the entire range of values into a series of intervals, and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are often (but are not required to be) of equal size.

Bin               Count
-3.5 to -2.51       9
-2.5 to -1.51      32
-1.5 to -0.51     109
-0.5 to  0.49     180
 0.5 to  1.49     132
 1.5 to  2.49      34
 2.5 to  3.49       4

Mean:

$$\frac{(-3)\cdot 9 + (-2)\cdot 32 + (-1)\cdot 109 + (0)\cdot 180 + 1\cdot 132 + 2\cdot 34 + 3\cdot 4}{500}$$
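The grouped mean for this histogram can be evaluated in a few lines. The sketch below treats every value in a bin as sitting at the rounded bin midpoint, exactly as the formula on the slide does.

```python
# Rounded bin midpoints and bin counts from the histogram table
midpoints = [-3, -2, -1, 0, 1, 2, 3]
counts = [9, 32, 109, 180, 132, 34, 4]

n = sum(counts)  # total sample size

# Grouped mean: sum of (midpoint * count) divided by n
grouped_mean = sum(m * c for m, c in zip(midpoints, counts)) / n
print(n, grouped_mean)
```

The numerator works out to 12, so the grouped mean is 12/500 = 0.024.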

Example

The table on the next page gives the number of days in June and July of recent years in which the temperature reached 90 degrees or higher in New York's Central Park. Source: The New York Times and Accuweather.com.

a. Prepare a frequency distribution with a column for intervals and a column for frequencies. Use seven intervals, starting with 0-4.
b. Sketch a histogram and a frequency polygon, using the intervals in part a.
c. Find the mean for the original data.
d. Find the mean using the grouped data from part a.
e. Explain why your answers to parts c and d are different.
f. Find the median and the mode for the original data.

Temperature Data

Source: The New York Times and Accuweather.com.

Year  Days    Year  Days    Year  Days
1972   11     1985    4     1998    5
1973    8     1986    8     1999   24
1974   11     1987   14     2000    3
1975    3     1988   21     2001    4
1976    8     1989   10     2002   13
1977   11     1990    6     2003   11
1978    5     1991   21     2004    1
1979    7     1992    4     2005   12
1980   12     1993   25     2006    5
1981   12     1994   16     2007    4
1982   11     1995   14     2008   10
1983   20     1996    0     2009    0
1984    7     1997   10     2010   20
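One way to check answers to parts a, c, d and f is the sketch below, which types in the 39 values from the table and uses Python's standard `statistics` module. The interval width of 5 and the midpoints 2, 7, ..., 32 follow from "seven intervals, starting with 0-4"; everything else is just the definitions from the earlier slides.

```python
from statistics import mean, median, multimode

# Days at or above 90 degrees, 1972-2010, read column by column from the table
days = [11, 8, 11, 3, 8, 11, 5, 7, 12, 12, 11, 20, 7,
        4, 8, 14, 21, 10, 6, 21, 4, 25, 16, 14, 0, 10,
        5, 24, 3, 4, 13, 11, 1, 12, 5, 4, 10, 0, 20]

# (a) frequency distribution over the intervals 0-4, 5-9, ..., 30-34
freq = [0] * 7
for d in days:
    freq[d // 5] += 1
midpoints = [5 * k + 2 for k in range(7)]  # 2, 7, 12, ..., 32

# (c) mean of the original data
raw_mean = mean(days)

# (d) mean of the grouped data: every value is moved to its interval midpoint
grouped_mean = sum(f * m for f, m in zip(freq, midpoints)) / len(days)

# (f) median and mode(s) of the original data
raw_median = median(days)
raw_modes = multimode(days)

print(freq, raw_mean, grouped_mean, raw_median, raw_modes)
```

The two means differ (391/39 versus 403/39) because grouping replaces each value by the midpoint of its interval, which answers part e.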

Measures of Variation

Summary of Section 9.2

Range: The difference (largest data value) - (smallest data value) in a sample.

Deviation from the Mean: $x_i - \bar{x}$.

1. Variance:
$$s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n-1} = \frac{\sum (x_i - \bar{x})^2}{n-1}$$
2. Standard Deviation: $s = \sqrt{s^2}$

These are random variables called Sample Variance and Sample Standard Deviation.

For a random variable $X$, $\mu = E(X)$ is called the mean. The variance is $\sigma^2 = \mathrm{Var}(X) = E((X - \mu)^2)$, and $\sigma$ is the standard deviation.

Main Property / Explanation for dividing by $n - 1$: If the $X_i$ are i.i.d. with the distribution of $X$, and you set
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1},$$
then its expected value is $E(S^2) = \sigma^2$. This is not true for the standard deviation: $E(S) \neq \sigma$.

Grouped Data:
$$s = \sqrt{\frac{\sum f_i \, x_{M,i}^2 - n\bar{x}^2}{n-1}}.$$
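The main property can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than a proof: it uses rolls of a fair die (variance 35/12) as the distribution, samples of size 5, and an arbitrary seed, none of which come from the slides.

```python
import random
from math import sqrt
from statistics import variance

random.seed(1)            # arbitrary seed for reproducibility
sigma2 = 35 / 12          # variance of one roll of a fair six-sided die
n, trials = 5, 20_000     # hypothetical sample size and number of repetitions

total_s2 = 0.0
total_s = 0.0
for _ in range(trials):
    sample = [random.randint(1, 6) for _ in range(n)]
    s2 = variance(sample)      # statistics.variance divides by n - 1
    total_s2 += s2
    total_s += sqrt(s2)

avg_s2 = total_s2 / trials     # should be close to sigma2
avg_s = total_s / trials       # tends to fall below sqrt(sigma2)
print(avg_s2, sigma2, avg_s, sqrt(sigma2))
```

The average of $S^2$ settles near $\sigma^2$, while the average of $S$ stays noticeably below $\sigma$.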

Examples I

Example (Range). Data: 15, -3, 4, 7, 18. The smallest value is -3 and the largest is 18, so Range = 18 - (-3) = 21. The range is always a nonnegative number.

Example (Deviation from the Mean). In the previous example, $\bar{x} = \frac{15 - 3 + 4 + 7 + 18}{5} = 8.2$. So the deviations are 15 - 8.2 = 6.8, -3 - 8.2 = -11.2, 4 - 8.2 = -4.2, 7 - 8.2 = -1.2, 18 - 8.2 = 9.8.

Example (Variance and Standard Deviation).
$$s^2 = \frac{6.8^2 + 11.2^2 + 4.2^2 + 1.2^2 + 9.8^2}{4} = \frac{15^2 + (-3)^2 + 4^2 + 7^2 + 18^2 - 5 \cdot 8.2^2}{4}, \qquad s = \sqrt{s^2}.$$
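Both forms of the variance formula can be evaluated directly for this data set. The sketch below also compares them with `statistics.variance` from the Python standard library, which uses the same $n - 1$ denominator.

```python
from math import sqrt
from statistics import variance

data = [15, -3, 4, 7, 18]
n = len(data)

data_range = max(data) - min(data)          # 18 - (-3)
xbar = sum(data) / n                        # sample mean

# Definition: sum of squared deviations over n - 1
s2_dev = sum((x - xbar) ** 2 for x in data) / (n - 1)

# Shortcut: (sum of squares - n * mean^2) over n - 1
s2_short = (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

s = sqrt(s2_dev)                            # sample standard deviation
print(data_range, xbar, s2_dev, s2_short, variance(data), s)
```

Both routes give $s^2 = 286.8 / 4 = 71.7$, so $s$ is roughly 8.47.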

Examples II

Example (Binomial Distribution, single trial). $P(X = 1) = p$, $P(X = 0) = 1 - p$. Then $\mu = E(X) = p$, and
$$\sigma^2 = E((X - p)^2) = (1 - p)^2 p + (0 - p)^2 (1 - p) = p(1 - p).$$
This is the same as $E(X^2 - p^2) = (1 - p^2)\,p + (-p^2)(1 - p) = (1 - p)\,p$.

Remark: The formulas for the sample variance and sample standard deviation only make sense for $n \ge 2$. For $n = 1$ you would be dividing by 0. For one random variable, the variance is defined as $\mathrm{Var}(X) = E((X - E(X))^2)$.

For two independent random variables $X_1, X_2$, $\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2)$.

Suppose $X$ is a random variable. We can write a table:

X       a_1   a_2   ...   a_n
P(X)    p_1   p_2   ...   p_n

Examples III

For the expected value $\mu = E(X)$, you multiply the two terms in each column of the table and add:
$$\sum_i a_i p_i = a_1 p_1 + \cdots + a_n p_n.$$
In a spreadsheet program, the data would be in columns and you would add the products over the rows. You use a command like sumproduct to perform the operation.

If you have some other variable, like $(X - \mu)^2$, you would use the values $(a_i - \mu)^2$ and the same $p_i$.

Examples IV

Example.

X             2              3              -1              1
X^2           4              9              1               1
(X - mu)^2    (2 - 5/4)^2    (3 - 5/4)^2    (-1 - 5/4)^2    (1 - 5/4)^2
P(X)          1/2            1/8            1/4             1/8

The expected values are computed below.

$$\mu = E(X) = (2)(1/2) + (3)(1/8) + (-1)(1/4) + (1)(1/8) = 5/4.$$

$$\mathrm{Var}(X) = (2 - 5/4)^2 (1/2) + (3 - 5/4)^2 (1/8) + (-1 - 5/4)^2 (1/4) + (1 - 5/4)^2 (1/8) = 31/16.$$
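The column-by-column "sumproduct" recipe can be reproduced with exact fractions, which avoids rounding questions. This is a sketch using Python's `fractions` module on the values and probabilities of the table; the variable names are illustrative.

```python
from fractions import Fraction as F

values = [2, 3, -1, 1]
probs = [F(1, 2), F(1, 8), F(1, 4), F(1, 8)]
assert sum(probs) == 1  # the probabilities must add up to 1

# Expected value: multiply the entries in each column and add
mu = sum(a * p for a, p in zip(values, probs))

# Variance: same recipe with (a_i - mu)^2 in place of a_i
var = sum((a - mu) ** 2 * p for a, p in zip(values, probs))

# Cross-check with E(X^2) - mu^2
var_alt = sum(a * a * p for a, p in zip(values, probs)) - mu ** 2

print(mu, var, var_alt)
```

Exact arithmetic gives $\mu = 5/4$ and $\mathrm{Var}(X) = 31/16$, with both variance formulas agreeing.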

Grouped Data

Example (Grouped Data).

Interval   Frequency   Midpoint x_M
30-39          1          34.5
40-49          6          44.5
50-59         13          54.5
60-69         22          64.5
70-79         17          74.5
80-89         13          84.5
90-99          8          94.5

Find the standard deviation of these grouped data. In this case you must sum the $x_M^2$ multiplied by the frequencies and subtract $80\bar{x}^2$, where $n = 80$ is the total frequency and $\bar{x}$ is the mean of the full sample (which is not in the table; you must get it from the full data). Then divide by $n - 1 = 79$ and take the square root.
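The computation can be sketched as follows. One assumption is made loudly here: the full data set is not available, so the code substitutes the grouped mean (computed from the midpoints) for the full-sample mean $\bar{x}$ that the slide asks for. With the true $\bar{x}$ the result would differ slightly.

```python
from math import sqrt

freqs = [1, 6, 13, 22, 17, 13, 8]
midpoints = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]

n = sum(freqs)  # total frequency

# ASSUMPTION: grouped mean used as a stand-in for the full-sample mean
xbar = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Grouped-data formula: s = sqrt((sum f * x_M^2 - n * xbar^2) / (n - 1))
sum_f_xm2 = sum(f * m * m for f, m in zip(freqs, midpoints))
s = sqrt((sum_f_xm2 - n * xbar ** 2) / (n - 1))

print(n, xbar, s)
```

Under that assumption the grouped mean is 69.375 and the grouped standard deviation is about 14.58.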

Chebyshev's and Markov's Inequality I

Markov (for a nonnegative random variable $X$ and $a > 0$):
$$P(X \ge a) \le \frac{E(X)}{a}$$

Chebyshev:
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$

In words, the probability that $X$ is at least $k$ standard deviations away from the mean is at most $1/k^2$.

Chebyshev's and Markov's Inequality II

Example (from the practice prelim).

8. (14 points) Assume that the height in inches of American women follows a normal distribution with mean $\mu = 64$ (5'4") and standard deviation $\sigma = 3$.
(a) (3 points) How many standard deviations above or below the mean is a height of 72 (6'0")?
(b) (4 points) What fraction of women are taller than 6 feet?
(c) (4 points) In a room with 30 women, what is the probability that at least one of them is taller than 6 feet?
(d) (3 points) What assumptions did you make when answering part (c)? Are there circumstances under which those assumptions would not be justified?

Chebyshev's and Markov's Inequality III

Answer. Say we don't know what the distribution is. We can still use Markov's and Chebyshev's inequalities.

(a) $\frac{72 - 64}{3} = \frac{8}{3}$, so closer to 3 standard deviations. Use 73 inches instead of 72 to get exactly 3.

(b) Markov's inequality says $P(X \ge 73) \le \frac{64}{73}$. To use Chebyshev's inequality we must write the event as $|X - 64| \ge 3\sigma$; in other words, $k = 3$. Then $P(|X - 64| \ge 3\sigma) \le \frac{1}{9}$. This includes not just $X \ge 73$ but also $X \le 55$. Still, we can say that $P(X \ge 73)$ is at most $1/9$, because the event $|X - 64| \ge 9$ contains the event $X - 64 \ge 9$.

(c) $1 - P(\text{none are taller than 6 feet}) = 1 - (1 - P(\text{one is taller than 6 feet}))^{30}$.
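The two bounds in part (b) are simple enough to compute exactly. The sketch below uses Python's `fractions` module; it only evaluates the bounds for $\mu = 64$, $\sigma = 3$, $a = 73$ and makes no claim about the true distribution.

```python
from fractions import Fraction

mu, sigma, a = 64, 3, 73

# Markov: P(X >= a) <= E(X) / a   (heights are nonnegative)
markov_bound = Fraction(mu, a)

# Chebyshev: P(|X - mu| >= k * sigma) <= 1 / k^2, with k = (a - mu) / sigma
k = Fraction(a - mu, sigma)
chebyshev_bound = 1 / k ** 2

print(markov_bound, float(markov_bound))        # about 0.877
print(k, chebyshev_bound, float(chebyshev_bound))  # k = 3, bound about 0.111
```

Markov's bound (about 0.88) is far weaker than Chebyshev's (about 0.11), because Chebyshev also uses the standard deviation.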

Continuous Random Variables I

In many applications it is useful to assume that the random variable $X$ may take any real value. The probability distribution used for the case of finitely many values does not work. We assume that the sample space is $S = \mathbb{R}$, all real numbers. The typical event (subset of $S$) is restricted to sets of the form $A = \{x \mid x \le a\}$ and complements and intersections of such sets. For us they will be at most sets of the form $A = \{a \le x \le b\}$.

The probability distribution is given by the numbers $P(X \le a)$; in other words, by a function of $a$ which takes nonnegative values only. We allow $a = -\infty$, in which case the value is 0, and $a = \infty$, in which case the value is 1.

For a continuous random variable, $P(X = a) = 0$ always, but the situation is not trivial.

Continuous Random Variables II

Example (the uniform distribution on the interval (0, 1)). Let

$$f(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 & \text{if } 0 < x < 1 \\ 0 & \text{if } 1 \le x \end{cases}$$

Define $P(X \le a)$ = the area between the $x$-axis and the graph of $f(x)$ to the left of the vertical line $x = a$. See the picture in class, or in the text for the normal distribution. So

$$P(X \le a) = \begin{cases} 0 & \text{if } a \le 0 \\ a & \text{if } 0 < a < 1 \\ 1 & \text{if } 1 \le a \end{cases}$$

Continuous Random Variables III

Exercise. Do the same for

$$f(x) = \begin{cases} 0 & \text{if } x < 0 \\ 2x & \text{if } 0 \le x \le 1 \\ 0 & \text{if } 1 < x \end{cases}$$

You need the formula for the area of a triangle, Area = (base)(height)/2.
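One possible way to check an answer to this exercise is sketched below. The function `cdf` applies the triangle-area formula (base $a$, height $2a$), and `riemann_cdf` approximates the same area numerically with a midpoint sum. The function names and the number of steps are illustrative choices, not part of the course material.

```python
def density(x):
    """The density from the exercise: 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def cdf(a):
    """P(X <= a) via the triangle area: base a, height 2a, for 0 < a < 1."""
    if a <= 0:
        return 0.0
    if a >= 1:
        return 1.0
    return a * (2 * a) / 2

def riemann_cdf(a, steps=10_000):
    """Numerical area under the density from 0 up to a (midpoint rule)."""
    upper = min(max(a, 0.0), 1.0)   # the density vanishes outside [0, 1]
    width = upper / steps
    return sum(density((i + 0.5) * width) * width for i in range(steps))

for a in (-1, 0.25, 0.5, 0.9, 2):
    print(a, cdf(a), riemann_cdf(a))
```

For $0 < a < 1$ the triangle area simplifies to $a^2$, and the numerical sum agrees with it.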

Normal Distribution I

Definition. Data are said to be normally distributed if the rate at which the frequencies fall off is proportional to the distance of the score from the mean, and to the frequencies themselves.

This definition requires Calculus. We don't assume or do Calculus in this course. We will, however, learn how to work with this distribution. It is very useful in that many phenomena can be modeled by it. We will see how the binomial distribution is related to the normal distribution later in the chapter.

Suppose you have a random variable $X$, and you would like to know about its mean $\mu$. So you perform a large number $n$ of independent trials and draw a histogram. The larger the $n$, the closer the outcome will look like the curve

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

The pictures in the text show what it looks like. The resulting probability distribution is called $N(\mu, \sigma^2)$, normal with mean $\mu$ and variance $\sigma^2$.

Normal Distribution II

There is a precise statement called the Central Limit Theorem which says that for large $n$, $\sqrt{n}\,(S_n - \mu)$ looks like a normal distribution $N(0, \sigma^2)$, where $S_n$ is the sample mean of the $n$ trials. The normal distribution is used in practice to model large populations and errors. There are many examples that can be approximated by normal distributions. Heights of people and scores on tests are examples. This is not a finite distribution.

For a random variable $X$ that is normally distributed, we write $X \sim N(\mu, \sigma^2)$, and $P(X \le a)$ is the area under the normal curve from $-\infty$ to $a$. This is tabulated for $\mu = 0$ and $\sigma = 1$. The rest is computed by simple formulas involving arithmetic.

Height Example I

Example (from the practice prelim).

8. (14 points) Assume that the height in inches of American women follows a normal distribution with mean $\mu = 64$ (5'4") and standard deviation $\sigma = 3$.
(a) (3 points) How many standard deviations above or below the mean is a height of 73 (6'1")?
(b) (4 points) What fraction of women are taller than 73 inches?
(c) (4 points) In a room with 30 women, what is the probability that at least one of them is taller than 73 inches?
(d) (3 points) What assumptions did you make when answering part (c)? Are there circumstances under which those assumptions would not be justified?

Height Example II

Answer.

(a) Same as before: 3 standard deviations above the mean.

(b)
$$P(X \ge 73) = P(X - 64 \ge 73 - 64 = 9 = 3\sigma) = P\left(\frac{X - \mu}{\sigma} \ge 3\right) = 1 - P\left(\frac{X - \mu}{\sigma} \le 3\right) = 1 - 0.999 = 0.001.$$
This is 1/1000. The random variable $X$ has probability distribution $N(64, 9)$, since $\sigma^2 = 3^2 = 9$. The probability $P(X \ge 73)$ comes from this normal distribution. To actually look it up in the tables, you rewrite it in terms of $Z = \frac{X - 64}{3}$, which has probability distribution $N(0, 1)$. This is the one in the tables.

(c)
$$P(\text{at least 1 of 30 is} \ge 73) = 1 - P(\text{all 30 are} \le 73) = 1 - P(X \le 73)^{30} = 1 - (0.999)^{30} \approx 0.03.$$
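Instead of printed tables, the same quantities can be obtained from `statistics.NormalDist` in the Python standard library. This is a sketch of one way to do the lookup; because it does not round the table value to three decimals, its results differ slightly from the rounded figures above.

```python
from statistics import NormalDist

heights = NormalDist(mu=64, sigma=3)   # N(64, 3^2)
standard = NormalDist()                # N(0, 1), the tabulated distribution

# (a) z value of 73 inches
z = (73 - 64) / 3

# (b) P(X >= 73), computed directly and via the z value
tail = 1 - heights.cdf(73)
tail_via_z = 1 - standard.cdf(z)

# (c) at least one of 30 is taller than 73 inches,
#     ASSUMING the 30 heights are independent draws from N(64, 9)
p_at_least_one = 1 - (1 - tail) ** 30

print(z, tail, tail_via_z, p_at_least_one)
```

Without rounding, the tail probability is about 0.00135 and the probability in part (c) is about 0.04.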

z value

The principle is:

$$X \text{ normal } N(\mu, \sigma^2) \iff Z = \frac{X - \mu}{\sigma} \text{ normal } N(0, 1).$$

So $P(X \le a) = P\left(Z \le \frac{a - \mu}{\sigma}\right)$. The number $z = \frac{a - \mu}{\sigma}$ is called the z value. This is what you look up in the tables.

Example with Grades

Example. A professor (not this one!) of a course wants to give grades so that

A: top 8%
B: next 20% below A
C: the rest
D: next 20% above the F
F: bottom 8%

The mean is $\mu = 67$ and the standard deviation is $\sigma = 17$. Find the cutoffs.

Answer. Each line gives the lowest score $a$ that earns the grade, with $z$ taken from the tables:

A: $P(X \le a) = 0.92$, $z = 1.41$, $a = \mu + z\sigma = 67 + 17 \cdot 1.41 \approx 91$
B: $P(X \le a) = 0.72$, $z = 0.58$, $a = \mu + z\sigma = 67 + 17 \cdot 0.58 \approx 77$
C: $P(X \le a) = 0.28$, $z = -0.58$, $a = \mu + z\sigma = 67 + 17 \cdot (-0.58) \approx 57$
D: $P(X \le a) = 0.08$, $z = -1.41$, $a = \mu + z\sigma = 67 + 17 \cdot (-1.41) \approx 43$

In Excel or a similar program you can write norminv(0.92, 67, 17), which gives approximately 91.
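The cutoffs can also be computed without tables or a spreadsheet. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module, which plays the same role as the spreadsheet's inverse normal function; the dictionary layout is just an illustrative choice.

```python
from statistics import NormalDist

grades = NormalDist(mu=67, sigma=17)

# Fraction of the class scoring below the lowest score for each grade
fraction_below = {"A": 0.92, "B": 0.72, "C": 0.28, "D": 0.08}

# inv_cdf(p) returns the score a with P(X <= a) = p
cutoffs = {g: grades.inv_cdf(p) for g, p in fraction_below.items()}
rounded = {g: round(a) for g, a in cutoffs.items()}

for g in fraction_below:
    z = (cutoffs[g] - 67) / 17   # the corresponding z value
    print(g, round(z, 2), round(cutoffs[g], 2), rounded[g])
```

Rounded to whole points, this gives cutoffs of 91, 77, 57 and 43.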