Basic Procedure for Histograms

Basic Procedure for Histograms

1. Compute the range of the observations (min. & max. value)
2. Choose an initial number of classes (most likely based on the range of values; try to find a number of classes that divides evenly into the range while staying within the 6-12 class guideline)
3. Compute the class interval = range / number of classes, rounding this precise value to the nearest convenient number (preferably an integer) and adjusting as necessary
4. Select a starting value for the classes that is less than or equal to the lowest value in the observations

Basic Procedure for Histograms

5. Adjust the range, width, and starting point if necessary
6. Compute the midpoint of each class (this is particularly useful if plotting a line-plot histogram rather than the bar-chart variety)
7. The actual bounds will depend on the precision and accuracy of the data (e.g. class limits of 1-2, 3-4, etc. might have actual limits of 0.5-2.5, 2.5-4.5, etc. because we have rounded)
8. Plot the data

At this point, you're not likely to be required to create a histogram by hand very often (it is easily done using software), but it's good to know the theory.
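
As a rough illustration of the steps above, here is a short Python sketch (the data values and the choice of 8 classes are assumptions made just for this example) that computes the range, a convenient class interval, the class boundaries and midpoints, and the resulting class counts:

```python
import math

# Hypothetical observations, assumed purely for illustration
data = [12, 15, 22, 27, 31, 34, 38, 41, 44, 47, 52, 55, 58, 63, 67, 71]

# 1. Compute the range of the observations
lo, hi = min(data), max(data)
data_range = hi - lo

# 2-3. Choose a number of classes within the 6-12 guideline and a convenient integer interval
num_classes = 8
interval = math.ceil(data_range / num_classes)

# 4-5. Pick a starting value at or below the minimum, then build the class boundaries
start = math.floor(lo)
edges = [start + k * interval for k in range(num_classes + 1)]

# 6. Midpoints of each class (useful for a line-plot style histogram)
midpoints = [(edges[k] + edges[k + 1]) / 2 for k in range(num_classes)]

# 8. Count the observations falling in each class; these counts are the histogram bars
counts = [sum(1 for x in data if edges[k] <= x < edges[k + 1]) for k in range(num_classes)]
print(edges)
print(midpoints)
print(counts)
```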

Simple Descriptive Statistics

Descriptive statistics provide an organization and summary of a dataset: a small number of summary measures replaces the entirety of the dataset. You're likely already familiar with some simple descriptive summary measures:

1. Ratios
2. Proportions
3. Percentages
4. Rates of Change
5. (Location Quotients)

Simple Descriptive Summary Measures

1. Ratios: the number of observations in A divided by the number of observations in B, e.g. A = 6 overcast days and B = 24 mostly cloudy days gives a ratio of 6/24.

2. Proportions: relate one part or category of the data to the entire set of observations, e.g. a box of marbles that contains 4 yellow, 6 red, 5 blue, and 2 green gives a yellow proportion of 4/17. More generally, for color = {yellow, red, blue, green} and count a = {4, 6, 5, 2}:

proportion_i = a_i / Σ a_i

Simple Descriptive Summary Measures

2. Proportions (cont.): The sum of all proportions = 1. Proportions are useful for comparing two sets of data with different sizes and category counts, e.g. a different box of marbles gives a yellow proportion of 2/23; for this to be a reasonable comparison we need to know the totals for both samples.

3. Percentages: Calculated as proportion × 100, e.g. 2/23 = 8.696%. Use of percentages should be restricted to larger sample sizes, perhaps 20+ observations.
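
A minimal Python sketch of the marble example above (the category names and counts are the ones from the slides; the variable names are just for illustration):

```python
# Counts for the first box of marbles
counts = {"yellow": 4, "red": 6, "blue": 5, "green": 2}

total = sum(counts.values())                                # 17 marbles in all
proportions = {color: n / total for color, n in counts.items()}
percentages = {color: 100 * p for color, p in proportions.items()}

print(proportions["yellow"])      # 4/17, about 0.235
print(sum(proportions.values()))  # the proportions always sum to 1.0
print(round(2 / 23 * 100, 3))     # second box: yellow percentage, about 8.696%
```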

Simple Descriptive Summary Measures

4. Rates of Change: Expressing the change in a variable with respect to its original value:

rate of change = [x(t2) - x(t1)] / x(t1) = (change in x) / (original value of x)

e.g. if we had 20 marbles and then added 10, the rate of change = (30 - 20)/20 = 10/20 = 0.5

5. Location Quotients: An index of relative concentration in space, a comparison of a region's share of something to the total.

Simple Descriptive Summary Measures

5. Location Quotients (cont.): For example, suppose we have a region of 1000 km² which we subdivide into three smaller areas of 200, 300, and 500 km² respectively (labeled A, B, and C). The region has an influenza outbreak with 150 cases in the first area, 100 in the second, and 350 in the third (a total of 600 flu cases):

Area   Proportion of Area   Proportion of Cases   Location Quotient
A      200/1000 = 0.2       150/600 = 0.25        0.25/0.2 = 1.25
B      300/1000 = 0.3       100/600 = 0.17        0.17/0.3 = 0.57
C      500/1000 = 0.5       350/600 = 0.58        0.58/0.5 = 1.17

Location Quotient = Proportion of Cases / Proportion of Area
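
The same influenza example, worked as a short Python sketch; the area and case figures come straight from the table above:

```python
areas = {"A": 200, "B": 300, "C": 500}   # km^2, totalling 1000 km^2
cases = {"A": 150, "B": 100, "C": 350}   # flu cases, totalling 600

total_area = sum(areas.values())
total_cases = sum(cases.values())

for region in areas:
    prop_area = areas[region] / total_area
    prop_cases = cases[region] / total_cases
    lq = prop_cases / prop_area          # Location Quotient = prop. of cases / prop. of area
    print(f"{region}: area share {prop_area:.2f}, case share {prop_cases:.2f}, LQ {lq:.2f}")
```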

Simple Descriptive Statistics

These are ways to summarize a set of numbers quickly and accurately. The most common way of describing a variable's distribution is in terms of two of its properties:

Central tendency describes the central value of the distribution, around which the observations cluster.
Dispersion describes how the observations are distributed.

First we'll look at measures of central tendency.

Measures of Central Tendency - Review

1. Mode: the most frequently occurring value in the distribution
2. Median: the value of a variable such that half of the observations are above and half are below it, i.e. this value divides the distribution into two groups of equal size
3. Mean: a.k.a. the average, the most commonly used measure of central tendency

Measures of Central Tendency - Review

1. Mode: This is the most frequently occurring value in the distribution. Procedure for finding the mode of a data set:

1) Sort the data, putting the values in ascending order
2) Count the instances of each value (if this is continuous data with a high degree of precision and many decimal places, this may be quite tedious)
3) Find the value that has the most occurrences; this is the mode (if more than one value occurs an equal number of times and these exceed all other counts, we have multiple modes)

Use the mode for multi-modal or nominal data sets.

Measures of Central Tendency - Review

2. Median: Half of the values are above and half below this value. Procedure for finding the median of a data set:

1) Sort the data, putting the values in ascending order
2) Find the value with an equal number of values above and below it (if there is an even number of values, you will need to average two values together):
   Odd number of observations: take the [(n-1)/2]+1-th value from the lowest, e.g. n = 19 gives [(19-1)/2]+1 = the 10th value
   Even number of observations: average the (n/2)-th and [(n/2)+1]-th values, e.g. n = 20 means averaging the 10th and 11th values

Use the median with asymmetric distributions, when you suspect outliers are present, or with ordinal data.

Measures of Central Tendency - Review

3. Mean: a.k.a. the average, the most commonly used measure of central tendency. Procedure for finding the mean of a data set:

1) Sum all the values in the data set
2) Divide the sum by the number of values in the data set:

x̄ = Σ x_i / n  (sum over i = 1, ..., n)

Use the mean when you have interval or ratio data sets with a large sample size, few or no outliers, and a reasonably symmetric unimodal distribution.
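
A compact Python sketch of all three measures, following the procedures above; the sample values are assumed for illustration (the standard library's statistics module provides the same calculations ready-made):

```python
from collections import Counter

data = [3, 7, 7, 2, 9, 4, 7, 5, 2, 6]    # hypothetical observations

# Mode: count each value and keep every value tied for the highest count
counts = Counter(data)
top = max(counts.values())
modes = [value for value, c in counts.items() if c == top]

# Median: middle value for odd n, average of the two middle values for even n
ordered = sorted(data)
n = len(ordered)
if n % 2 == 1:
    median = ordered[(n - 1) // 2]
else:
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Mean: sum of the values divided by their number
mean = sum(data) / n

print(modes, median, mean)   # [7] 5.5 5.2
```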

Measures of Central Tendency - Mean

3. Mean (cont.): We can also calculate a weighted mean using some weighting factor:

Weighted mean: x̄_w = Σ w_i x_i / Σ w_i  (sums over i = 1, ..., n)

e.g. What is the average income of all people in cities A, B, and C?

City   Avg. Income   Population
A      $23,000       100,000
B      $20,000       50,000
C      $25,000       150,000

Here, population is the weighting factor w_i and the average income is the variable of interest x_i.
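
The weighted mean for the income example, as a Python sketch; the incomes and populations are the figures from the table above:

```python
incomes     = [23_000, 20_000, 25_000]    # x_i: average income in cities A, B, C
populations = [100_000, 50_000, 150_000]  # w_i: the weighting factor

weighted_mean = sum(w * x for w, x in zip(populations, incomes)) / sum(populations)
print(weighted_mean)   # 23500.0, the average income over all people in the three cities
```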

Measures of Central Tendency - Mean

3. Mean (cont.): We can also calculate a grouped mean using the midpoints and frequencies of groups:

Grouped mean: x̄ = Σ m_i f_i / Σ f_i  (sums over the groups)

e.g. Suppose we had grouped some data in a frequency table and wanted to calculate the grouped mean:

Group   Freq.   Midpoint
1-2     3       1.5
3-4     4       3.5
5-6     6       5.5
7-8     5       7.5
9-10    2       9.5
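
And the grouped mean for the frequency table above, as a sketch using the tabulated midpoints m_i and frequencies f_i:

```python
midpoints   = [1.5, 3.5, 5.5, 7.5, 9.5]   # m_i for the groups 1-2, 3-4, 5-6, 7-8, 9-10
frequencies = [3, 4, 6, 5, 2]             # f_i

grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / sum(frequencies)
print(grouped_mean)   # 5.4
```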

Measures of Central Tendency - Mean

3. Mean (cont.): A standard geographic application of the mean is to locate the center (a.k.a. centroid) of a spatial distribution, by assigning to each member of the spatial distribution a gridded coordinate and calculating the mean value in each coordinate direction (the bivariate mean or mean center). For a set of (x, y) coordinates, the mean center (x̄, ȳ) is computed using:

x̄ = Σ x_i / n        ȳ = Σ y_i / n  (sums over i = 1, ..., n)

Measures of Central Tendency - Mean

3. Mean (cont.): We can also calculate a weighted mean center in much the same way, but by using weights. For a set of (x, y) coordinates, the weighted mean center (x̄_w, ȳ_w) is computed using:

x̄_w = Σ w_i x_i / Σ w_i        ȳ_w = Σ w_i y_i / Σ w_i

e.g. suppose we had the centroids and areas of 3 polygons; here we weight by area, but other weightings are possible.
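
A sketch of both the mean center and the weighted mean center; the polygon centroids and areas below are made up purely for illustration:

```python
# Hypothetical polygon centroids (x, y) and their areas (used as the weights)
xs    = [2.0, 5.0, 9.0]
ys    = [3.0, 8.0, 4.0]
areas = [200.0, 300.0, 500.0]

n = len(xs)
mean_center = (sum(xs) / n, sum(ys) / n)

total_weight = sum(areas)
weighted_mean_center = (
    sum(w * x for w, x in zip(areas, xs)) / total_weight,
    sum(w * y for w, y in zip(areas, ys)) / total_weight,
)
print(mean_center)           # simple bivariate mean of the centroids
print(weighted_mean_center)  # centroids weighted by polygon area
```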

Measures of Dispersion

In addition to measures of central tendency, we can also summarize data by characterizing its variability. Measures of dispersion are concerned with the distribution of values around the mean in data:

1. Range
2. Quartile range etc.
3. Mean deviation
4. Variance, standard deviation and z-scores
5. Coefficient of variation

Measures of Dispersion - Range

1. Range: This is the most simply formulated of all measures of dispersion. Given a set of measurements x_1, x_2, x_3, ..., x_{n-1}, x_n, the range is defined as the difference between the largest and smallest values:

Range = x_max - x_min

This is another descriptive measure that is vulnerable to the influence of outliers in a data set, which result in a range that is not really descriptive of most of the data.

Measures of Dispersion - Quartile Range etc.

2. Quartile Range etc.: We can divide distributions into a number of parts, each containing an equal number of observations:

Quartiles: each contains 25% of all values
Quintiles: each contains 20% of all values
Deciles: each contains 10% of all values
Percentiles: each contains 1% of all values

A standard application of this approach for describing dispersion involves calculating the interquartile range (a.k.a. quartile deviation).

Measures of Dispersion - Quartile Range etc.

2. Quartile Range etc. (cont.): Rogerson (p. 6) defines the interquartile range as the difference between the values of the 25th and 75th percentiles (i.e. the minimum value of the 2nd quartile and the maximum value of the 3rd quartile). This is well applied to skewed distributions, since it measures deviation around the median. The interquartile range provides 2 of the 5 values displayed in a box plot, which is a convenient graphical summary of a data set.

Measures of Dispersion - Quartile Range etc.

2. Quartile Range etc. (cont.): A box plot graphically displays the following five values:

Median
Minimum value
Maximum value
25th percentile value
75th percentile value

[Box plot figure (Rogerson, p. 8): whiskers at the min. and max., box from the 25th to the 75th percentile, line at the median]

Under some circumstances, the whiskers are not used for the min. and max. because of outliers.
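
A sketch of the five box-plot values and the interquartile range for a hypothetical data set; note that different packages use slightly different percentile conventions, so quartile values can vary a little from one piece of software to another:

```python
import statistics

data = [4, 7, 8, 10, 12, 13, 15, 18, 21, 40]   # hypothetical, with 40 as a possible outlier

q1, median, q3 = statistics.quantiles(data, n=4)   # 25th, 50th, and 75th percentiles
five_values = (min(data), q1, median, q3, max(data))
iqr = q3 - q1                                      # interquartile range

print(five_values)
print("range:", max(data) - min(data), "IQR:", iqr)
```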

Measures of Dispersion - Mean Deviation

3. Mean Deviation: Once we have calculated the mean value for a data set, we can assess the difference between any observation and that mean, and this is termed the statistical distance:

Statistical distance = x_i - x̄

If we take the absolute values of these, and sum over all observations, we have calculated the mean deviation:

Mean deviation = Σ |x_i - x̄| / n  (sum over i = 1, ..., n)

Measures of Dispersion - Mean Deviation

3. Mean Deviation (cont.): Why is it necessary to take absolute values of the statistical distances (x_i - x̄) before summing them to get the mean deviation? Because the statistical distances would be both positive and negative, and when summed using the mean deviation formula without absolute values, they would sum to zero.

Mean deviation = Σ |x_i - x̄| / n
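
A short sketch showing why the absolute values matter: the raw statistical distances sum to zero (up to floating-point error), while their absolute values give the mean deviation. The data are assumed:

```python
data = [2, 4, 4, 5, 7, 9, 11]   # hypothetical observations
n = len(data)
mean = sum(data) / n

distances = [x - mean for x in data]             # statistical distances, positive and negative
print(sum(distances))                            # essentially 0

mean_deviation = sum(abs(d) for d in distances) / n
print(mean_deviation)                            # about 2.571 for these values
```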

Measures of Dispersion - Review

4. Standard Deviation: Standard deviation is calculated by taking the square root of the variance:

Population standard deviation: σ = sqrt[ Σ (x_i - μ)² / N ]
Sample standard deviation:     s = sqrt[ Σ (x_i - x̄)² / (n - 1) ]

Why do we prefer standard deviation over variance as a measure of dispersion? Because its magnitude and units match those of the values and their mean.

Measures of Dispersion - Review

4. Standard Deviation: This is the most frequently used measure of dispersion because it has the same units as the values and their mean (unlike variance). Procedure for finding the standard deviation of a data set:

1) Calculate the mean
2) Calculate the statistical distances (x_i - x̄) for each value
3) Square each of the statistical distances: (x_i - x̄)²
4) Sum the squared statistical distances, giving the sum of squares
5) Divide the sum of squares by N for a population or by (n - 1) for a sample; this gives you the variance
6) Take the square root of the variance to get the standard deviation
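
The six steps above, written out as a Python sketch with assumed data; the standard library offers the same results via statistics.pstdev (population) and statistics.stdev (sample):

```python
import math

data = [2, 4, 4, 5, 7, 9, 11]   # hypothetical observations
n = len(data)
mean = sum(data) / n

sum_of_squares = sum((x - mean) ** 2 for x in data)   # squared statistical distances, summed

pop_variance    = sum_of_squares / n          # divide by N for a population
sample_variance = sum_of_squares / (n - 1)    # divide by (n - 1) for a sample

pop_std    = math.sqrt(pop_variance)
sample_std = math.sqrt(sample_variance)
print(pop_std, sample_std)
```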

Measures of Dispersion - Review

4. Z-scores: These express an individual value's difference from the mean in terms of standard deviations, and thus can be compared to z-scores drawn from other data sets or distributions. Procedure for finding the z-score of an observation:

1) Calculate the mean
2) Calculate the statistical distance (x_i - x̄) for each value for which we wish to find the z-score
3) Calculate the standard deviation
4) Calculate the z-score using the formula:

z-score = (x_i - x̄) / s

Measures of Dispersion - Review

5. Coefficient of Variation: This is an overall measure of dispersion that is normalized with respect to the mean of the same distribution; because it is a normalized measure of dispersion, it is comparable to coefficients of variation from other data sets. Procedure for finding the coefficient of variation for a data set:

1) Calculate the mean
2) Calculate the standard deviation
3) Calculate the coefficient of variation using the formula:

Coefficient of variation = s / x̄ (sample) or σ / μ (population), optionally × 100%
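
A sketch of z-scores and the coefficient of variation for the same kind of assumed data, using the sample standard deviation:

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]          # hypothetical observations
mean = statistics.mean(data)
s = statistics.stdev(data)             # sample standard deviation

z_scores = [(x - mean) / s for x in data]   # each value's distance from the mean in std. deviations
coefficient_of_variation = s / mean         # multiply by 100 to express as a percentage

print([round(z, 2) for z in z_scores])
print(round(coefficient_of_variation * 100, 1), "%")
```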

Skewness and Kurtosis - Review

1. Skewness: This statistic measures the degree of asymmetry exhibited by the data (i.e. whether there are more observations on one side of the mean than the other)
2. Kurtosis: This statistic measures the degree to which the distribution is flat or peaked

Skewness and Kurtosis - Review

1. Skewness: This statistic measures the degree of asymmetry exhibited by the data (i.e. whether there are more observations on one side of the mean than the other):

Skewness = Σ (x_i - x̄)³ / (n s³)  (sum over i = 1, ..., n)

Because the exponent in this moment is odd, skewness can be positive or negative; positive skewness has more observations below the mean than above it (and vice-versa for negative skewness).

Skewness and Kurtosis - Review

1. Skewness: This statistic measures the degree of asymmetry exhibited by the data. Procedure for finding the skewness of a data set:

1) Calculate the mean
2) Calculate the statistical distances (x_i - x̄) for each value
3) Cube each of the statistical distances: (x_i - x̄)³
4) Sum the cubed statistical distances, giving the sum of cubes (the numerator in the skewness formula)
5) Divide the sum of cubes by the sample size multiplied by the standard deviation cubed (i.e. the denominator is n·s³ in [Σ (x_i - x̄)³] / (n·s³))

Skewness and Kurtosis - Review

2. Kurtosis: This statistic measures how flat or peaked the distribution is, and is formulated as:

Kurtosis = Σ (x_i - x̄)⁴ / (n s⁴) - 3  (sum over i = 1, ..., n)

The 3 is included in this formula because it makes the kurtosis of a normal distribution equal to 0 (this condition is also termed a mesokurtic distribution).

Skewness and Kurtosis - Review

2. Kurtosis: This statistic measures how flat or peaked the distribution is. Procedure for finding the kurtosis of a data set:

1) Calculate the mean
2) Calculate the statistical distances (x_i - x̄) for each value
3) Raise each of the statistical distances to the 4th power: (x_i - x̄)⁴
4) Sum the statistical distances raised to the 4th power: Σ (x_i - x̄)⁴
5) Divide the sum by the sample size multiplied by the standard deviation raised to the 4th power (i.e. the denominator is n·s⁴ in [Σ (x_i - x̄)⁴] / (n·s⁴))
6) Subtract 3 from [Σ (x_i - x̄)⁴] / (n·s⁴)
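
Both moment-based formulas from the preceding slides in one sketch, using the sample standard deviation s as above. Note that many packages report bias-corrected versions of skewness and kurtosis, so their output can differ slightly from these formulas; the data here are assumed:

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 23]    # hypothetical, with a long right tail
n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)       # sample standard deviation

skewness = sum((x - mean) ** 3 for x in data) / (n * s ** 3)
kurtosis = sum((x - mean) ** 4 for x in data) / (n * s ** 4) - 3   # roughly 0 for a normal distribution

print(round(skewness, 3), round(kurtosis, 3))
```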

Probability - An Example, Part II

Here, the values of x_i are drawn from the four outcomes, and their probabilities are the number of events with each outcome divided by the total number of events:

City   # of Malls
1      1
2      4
3      4
4      4
5      2
6      3

The four distinct outcomes are the four different mall counts, giving:

x_i    P(x_i)
1      1/6 = 0.167
2      1/6 = 0.167
3      1/6 = 0.167
4      3/6 = 0.5

The probability of an outcome: P(x_i) = (# of times an outcome occurred) / (total number of events)

Discrete Random Variables

We can calculate the mean of a discrete probability distribution by taking all possible values of the variable, multiplying each by its probability, and summing over the values:

μ = Σ x_i · P(x_i)  (sum over i = 1, ..., k)

The symbol μ is used here rather than x̄ because the basic idea of a probability distribution is to use a large number of values to approach a stable estimate of the parameter.

Discrete Random Variables

We can also calculate the variance of a discrete probability distribution, by taking the squared deviation of each possible value of the variable from the mean, multiplying it by that value's probability, and summing over the values:

σ² = Σ (x_i - μ)² · P(x_i)  (sum over i = 1, ..., k)

These formulae are only useful for discrete probability distributions; for continuous probability distributions a different method is required.
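
The shopping-mall distribution from the earlier example, plugged into both formulas (the probabilities are the ones tabulated above):

```python
# x_i: number of malls, P(x_i): probability of each outcome, from the earlier table
outcomes      = [1, 2, 3, 4]
probabilities = [1/6, 1/6, 1/6, 3/6]

mu = sum(x * p for x, p in zip(outcomes, probabilities))                    # mean of the distribution
sigma_sq = sum((x - mu) ** 2 * p for x, p in zip(outcomes, probabilities))  # variance

print(round(mu, 3), round(sigma_sq, 3))   # 3.0 and about 1.333
```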

Probability Rules

If the sets A and B overlap in the Venn diagram, the sets are not mutually exclusive; here we treat this as a case of events that are independent but not exclusive. The union of sets A and B is:

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

because we do not wish to count the intersection area twice, thus we need to subtract it from the sum of the areas of A and B when taking the union of a pair of overlapping sets. The intersection of sets A and B, for independent events, is calculated by taking the product of the two probabilities, a.k.a. the multiplication rule:

P(A ∩ B) = P(A) · P(B)

Probability Rules

Consider set A to give the chance of precipitation at P(A) = 0.4 and set B to give the chance of below-freezing temperatures at P(B) = 0.7. The complement of set A is:

P(A′) = 1 - P(A) = 1 - 0.4 = 0.6

This expresses the chance of it not raining or snowing as P(A′) = 0.6. The complement of the union of sets A and B is:

P((A ∪ B)′) = 1 - [P(A) + P(B) - P(A ∩ B)] = 1 - [0.4 + 0.7 - 0.28] = 0.18

(using P(A ∩ B) = 0.4 × 0.7 = 0.28 from the multiplication rule). This expresses the chance of it neither raining nor being below freezing as P((A ∪ B)′) = 0.18.
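
The precipitation example, worked as a small sketch under the slide's assumption that A and B are independent:

```python
p_a = 0.4    # chance of precipitation
p_b = 0.7    # chance of below-freezing temperatures

p_a_and_b = p_a * p_b                 # multiplication rule (assumes independence): 0.28
p_a_or_b  = p_a + p_b - p_a_and_b     # union, counting the overlap only once: 0.82

p_not_a   = 1 - p_a                   # complement of A: 0.6
p_neither = 1 - p_a_or_b              # neither precipitation nor below freezing: 0.18

print(round(p_a_and_b, 2), round(p_a_or_b, 2), round(p_not_a, 2), round(p_neither, 2))
```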

Bernoulli Trials

We can provide a general formula for calculating the probability of x successes, given n trials and a probability p of success:

P(x) = C(n, x) · p^x · (1 - p)^(n - x)

where C(n, x) is the number of possible combinations of x successes and (n - x) failures:

C(n, x) = n! / [x! · (n - x)!]

where n! = n · (n - 1) · (n - 2) · ... · 3 · 2 · 1
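
A sketch of the Bernoulli-trials formula; Python's math.comb supplies C(n, x) directly, and the numbers in the example call are hypothetical:

```python
import math

def binomial_probability(x, n, p):
    """Probability of exactly x successes in n independent trials, each with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# e.g. the chance of exactly 3 successes in 10 trials with p = 0.2 (numbers chosen for illustration)
print(round(binomial_probability(3, 10, 0.2), 4))   # about 0.2013
```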

The Poisson Distribution

Procedure for finding Poisson probabilities and expected frequencies:

1. Set up a table with five columns as on the previous slide
2. Multiply the values of x by their observed frequencies (x · f_obs)
3. Sum the columns of f_obs (observed frequency) and x · f_obs
4. Compute λ = Σ (x · f_obs) / Σ f_obs
5. Compute the P(x) values using the eqn. or a table
6. Compute the values of f_exp = P(x) · Σ f_obs
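
A sketch of the fitting procedure with hypothetical observed frequencies; the Poisson probability P(x) = e^(-λ) λ^x / x! stands in for "the eqn. or a table":

```python
import math

# Hypothetical observed frequencies: f_obs[i] counts how often x_values[i] events were observed
x_values = [0, 1, 2, 3, 4]
f_obs    = [35, 40, 19, 5, 1]

total_obs = sum(f_obs)
lam = sum(x * f for x, f in zip(x_values, f_obs)) / total_obs   # step 4: lambda = sum(x * f_obs) / sum(f_obs)

for x, f in zip(x_values, f_obs):
    p_x = math.exp(-lam) * lam ** x / math.factorial(x)   # step 5: Poisson probability P(x)
    f_exp = p_x * total_obs                               # step 6: expected frequency
    print(x, f, round(p_x, 4), round(f_exp, 1))
```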

The Normal Distribution

You will recall the z-score (a.k.a. the standard normal variate, standard normal deviate, or just the standard score), which is calculated by subtracting the mean from the observation, and then dividing that difference by the standard deviation:

z-score = (x - μ) / σ

The z-score is the means by which we transform our normal distribution into a standard normal distribution, which is simply a normal distribution that has been standardized to have μ = 0 and σ = 1.

Standard Normal Tables Using our example z-score of -1.75, we find the position of 1.75 in the table and use the value found there; because the normal distribution is symmetric the table does not need to repeat positive and negative values

Standard Normal Tables

This table is defined to give the area under the curve from the specified value out through the rest of the tail of the distribution (theoretically to an infinite z-score). Looking up z = -1.75 gives P(x) = 0.0401 for the tail below the value of z = -1.75, and using this sort of information we can retrieve the probability of any interval (out to z = 3.09, where the table ends, and with a precision of 0.01 in z).

Finding the P(x) for Various Intervals

1. P(0 ≤ Z ≤ a) = 0.5 - (table value)
   The total area under the curve = 1, thus the area above 0 is equal to 0.5, and we subtract the area of the tail
2. P(Z ≤ a) = 1 - (table value)
   The total area under the curve = 1, and we subtract the area of the tail
3. P(Z ≥ a) = (table value)
   The table gives the value of P(x) in the tail above a

Finding the P(x) for Various Intervals

For a negative value of a:

4. P(a ≤ Z ≤ 0) = 0.5 - (table value)
   This is equivalent to P(0 ≤ Z ≤ a) when a is positive
5. P(Z ≥ a) = 1 - (table value)
   This is equivalent to P(Z ≤ a) when a is positive
6. P(Z ≤ a) = (table value)
   The table gives the value of P(x) in the tail below a, equivalent to P(Z ≥ a) when a is positive

Finding the P(x) for Various Intervals

7. P(a ≤ Z ≤ b), if a < 0 and b > 0:
   = [0.5 - (table value for a)] + [0.5 - (table value for b)]
   = 1 - [(table value for a) + (table value for b)]
   Since the table gives us the value of P for the tail beyond the specified z-score, simply subtract the area of the two tails from the total area = 1

With this set of building blocks, you should be able to calculate the probability for any interval using a standard normal table.
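
These building blocks can be checked numerically with the standard normal cumulative distribution function, which Python's standard library exposes as statistics.NormalDist; here a "table value" for z is taken to be the tail area beyond |z|, matching the table described above:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
phi = std_normal.cdf                            # phi(z) = P(Z <= z)

def table_value(z):
    """Area in the tail beyond |z|, i.e. what the printed table reports."""
    return 1 - phi(abs(z))

print(round(table_value(-1.75), 4))                          # about 0.0401, matching the earlier lookup
print(round(0.5 - table_value(1.0), 4))                      # case 1: P(0 <= Z <= 1.0)
print(round(1 - (table_value(-1.0) + table_value(2.0)), 4))  # case 7: P(-1.0 <= Z <= 2.0)
```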

Standard Error

The standard deviation of the sampling distribution of the sample mean X̄ is formulated as:

σ_X̄ = σ / √n

This is the standard error, and it is the unit of measurement of a confidence interval, used to express the closeness of a statistic to a parameter. When we construct a confidence interval we are finding how many standard errors away from the mean we have to go to find the area under the curve equal to the confidence level.

Constructing a Confidence Interval

1. Select our desired level of confidence
2. Transform that confidence level into a probability, usually denoted by the symbol α
3. Calculate α/2, since we want an interval that is symmetric about the mean of the distribution and has our α equally partitioned into the two tails
4. Look up the corresponding z-score in a standard normal table
5. Multiply the z-score by the standard error to find the interval, by adding and subtracting this product from the mean

Constructing a Confidence Interval - Steps

1. Select our desired level of confidence
   Let's suppose we want to construct an interval using the 95% level of confidence
2. Transform that confidence level into a probability
   The confidence level equals 100 · (1 - α)%, so if the confidence level is 95%, then α is 5%, or α = 0.05
3. Calculate α/2
   We're going to find the interval on either side of the mean, so we need a z-score for α/2, here 0.025

Constructing a Confidence Interval - Steps

4. Look up the corresponding z-score
   We use a standard normal table (like Table A.2 on page 214 of Rogerson) to find the z-score that corresponds to our α/2. In this case, α/2 = 0.025 corresponds to a z-score of 1.96
5. Multiply the z-score by the standard error etc.
   We are finding an interval which corresponds to 95% of the area under the curve, which is the interval from z = -1.96 to z = 1.96, and which is equal to [μ - (Z_α/2 · std. error), μ + (Z_α/2 · std. error)] when we have the population μ and σ

Constructing a Confidence Interval - Steps

5. Multiply the z-score by the standard error (cont.)
   I.e. if the population μ and σ are known because we are working with a known distribution, the standard error is σ_X̄ = σ / √n and the interval can then be expressed as:

   [μ - (Z_α/2 · σ/√n), μ + (Z_α/2 · σ/√n)]

   Using sample statistics, we can make the interval:

   [x̄ - (Z_α/2 · s/√n), x̄ + (Z_α/2 · s/√n)]
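
The five steps, written out as a sketch for a 95% interval; the mean, standard deviation, and sample size are assumed for illustration, and NormalDist.inv_cdf replaces the table lookup for Z_α/2:

```python
import math
from statistics import NormalDist

# Hypothetical inputs
mean = 50.0      # x-bar (or mu, if working with a known population)
s = 8.0          # sample standard deviation (or sigma)
n = 64           # sample size

confidence = 0.95
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)     # Z_alpha/2, about 1.96 for 95% confidence

std_error = s / math.sqrt(n)                # standard error of the mean
margin = z * std_error

interval = (mean - margin, mean + margin)
print(round(z, 2), interval)                # about 1.96 and roughly (48.04, 51.96)
```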