Chapter 7. Inferences about Population Variances


Introduction (1) The variability of a population's values is as important as the population mean. [Figure: hypothetical distributions of E. coli concentrations from two analytical methods, the Petrifilm HEC test and hydrophobic grid membrane filtration (HGMF).]

Introduction (2) The goal is to compare both the means and the standard deviations of the two procedures. The HEC and HGMF procedures were each applied to 24 pure culture samples, and both procedures were also applied to artificially contaminated beef samples. Table 7. E. coli readings (log10 CFU/ml) from the HGMF and HEC methods.

Introduction (3) The two procedures appear to be very similar with respect to the width of the box and the length of the whiskers, but HEC has a larger median than HGMF. Also, the variability in concentration readings for HEC appears to be slightly greater than that for HGMF.

Introduction (4) The initial conclusion might be that the two procedures yield different distributions of readings for their determination of E. coli concentrations. However, we need to determine whether the differences in their sample means and standard deviations imply a difference in the corresponding population values. Inferential problems about population variances are similar to the problems addressed in making inferences about the population mean. We must construct point estimators, confidence intervals, and test statistics from the randomly sampled data to make inferences about the variability in the population values.

Estimation and Tests for a Population Variance (1) The sample variance $s^2 = \sum_i (y_i - \bar{y})^2 / (n - 1)$ is an unbiased estimator of $\sigma^2$. If the population distribution is normal, then the sampling distribution of $s^2$ can have a shape similar to those depicted in Figure 7.3. It can be shown that the statistic $(n - 1)s^2 / \sigma^2$ follows a chi-square distribution with df $= n - 1$. The mathematical formula for the chi-square ($\chi^2$) probability distribution is very complex, so it is not displayed here.
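The unbiasedness claim can be checked numerically. The following is a minimal sketch, not part of the original slides: it draws many small normal samples and averages two variance estimates, one dividing by n - 1 and one dividing by n. The function names, the seed, and the settings (sigma = 2, n = 5, 20,000 replications) are illustrative assumptions.

```python
import random


def sample_variance(values):
    """Sample variance s^2 with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)


def average_variance_estimates(sigma=2.0, n=5, reps=20000, seed=0):
    """Average the (n - 1)-divisor and n-divisor estimates over many normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = sample_variance(sample)
        total_unbiased += s2
        total_biased += s2 * (n - 1) / n  # same sum of squares divided by n
    return total_unbiased / reps, total_biased / reps


unbiased_avg, biased_avg = average_variance_estimates()
print(f"true variance: 4.0, n-1 divisor: {unbiased_avg:.3f}, n divisor: {biased_avg:.3f}")
```

With these settings the n - 1 average should land close to the true variance of 4, while the n-divisor average should sit near 4(n - 1)/n = 3.2.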

Estimation and Tests for a Population Variance (2)

Properties of Chi-Square Distribution (1) The chi-square distribution is positively skewed with values between 0 and $\infty$. There are many chi-square distributions, and they are labeled by the parameter degrees of freedom (df). The mean and variance of the chi-square distribution are $\mu = df$ and $\sigma^2 = 2\,df$. For example, if the chi-square distribution has df = 30, then the mean and variance of that distribution are $\mu = 30$ and $\sigma^2 = 60$. Because the chi-square distribution is not symmetric, the confidence intervals based on this distribution do not have the usual form, estimate $\pm$ error, as we saw for $\mu$ and $\mu_1 - \mu_2$.

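Properties of Chi-Square Distribution (2) The $100(1 - \alpha)\%$ confidence interval for $\sigma^2$ is obtained by dividing $(n - 1)s^2$ by the upper and lower $\alpha/2$ percentiles, $\chi^2_U$ and $\chi^2_L$, of the chi-square distribution with df $= n - 1$: $\frac{(n - 1)s^2}{\chi^2_U} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_L}$.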
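As an illustration of the interval, here is a hedged sketch that uses only the Python standard library. The helper names (`chi2_cdf`, `chi2_ppf`, `variance_ci`) are made up for this example; the chi-square CDF is computed from the series for the regularized lower incomplete gamma function and the percentile is found by bisection. The inputs (s² = 5.45 from n = 20 observations) are illustrative.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable, via the incomplete-gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_ppf(p, df):
    """Value x with P(X <= x) = p, found by bisection on chi2_cdf."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def variance_ci(s2, n, conf=0.95):
    """Confidence interval for sigma^2, assuming a normal population."""
    alpha = 1.0 - conf
    df = n - 1
    chi2_upper = chi2_ppf(1.0 - alpha / 2.0, df)  # cuts off alpha/2 in the upper tail
    chi2_lower = chi2_ppf(alpha / 2.0, df)        # cuts off alpha/2 in the lower tail
    return df * s2 / chi2_upper, df * s2 / chi2_lower


low, high = variance_ci(s2=5.45, n=20)
print(f"95% CI for sigma^2: ({low:.2f}, {high:.2f})")
```

Note that the interval is not symmetric about s²: the upper limit lies much farther from 5.45 than the lower limit, which reflects the skewness of the chi-square distribution.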

Statistical Test for $\sigma^2$

Note When sample sizes are moderate to large ($n \ge 30$), the t distribution-based procedures can be used to make inferences about $\mu$ even when the normality condition does not hold, because for moderate to large sample sizes the Central Limit Theorem provides that the sampling distribution of the sample mean is approximately normal. Unfortunately, the same type of result does not hold for the chi-square-based procedures for making inferences about $\sigma^2$; that is, if the population distribution is distinctly nonnormal, then these procedures for $\sigma^2$ are not appropriate even if the sample size is large. If a boxplot or normal probability plot of the sample data shows substantial skewness or a substantial number of outliers, the chi-square-based inference procedures should not be applied.

Example 7. To test whether the potency of a specific pesticide meets the potency claimed by the manufacturer: the manufacturer claims that the drop in potency from 0 to 6 months will vary in the interval from 0% to 8%. A random sample of 20 containers of the pesticide is obtained from the manufacturer.

Answer to Example 7. (1) The manufacturer claimed that the population of potency reductions has a range of 8%. Dividing the range by 4, we obtain an approximate population standard deviation of $\sigma = 2\%$. The approximate null and alternative hypotheses are:
$H_0: \sigma^2 \le 4$ (i.e., the manufacturer's claim is correct.)
$H_a: \sigma^2 > 4$ (i.e., there is more variability than claimed by the manufacturer.)

Answer to Example 7. (2) [Figure: normal probability plot for the potency data.] The variance of the potency data is $s^2 = 5.45$. The test statistic and its p-value are as follows: $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{19(5.45)}{4} = 25.88$, with $P(\chi^2_{19} > 25.88) \approx 0.13$.
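The calculation can be reproduced with a short standard-library sketch. It assumes the inputs n = 20, s² = 5.45, and a hypothesized variance of 4; the helper `chi2_sf` is a made-up name that evaluates the upper-tail chi-square probability from the incomplete-gamma series rather than from a statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    cdf = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - cdf


n = 20            # assumed sample size
s2 = 5.45         # sample variance of the potency reductions
sigma0_sq = 4.0   # variance under H0 (range of 8% divided by 4, then squared)

chi2_stat = (n - 1) * s2 / sigma0_sq
p_value = chi2_sf(chi2_stat, n - 1)
print(f"chi-square = {chi2_stat:.2f}, df = {n - 1}, p-value = {p_value:.3f}")
```

A p-value of this size is well above 0.05, so at that level the data would not provide sufficient evidence that the variability exceeds the manufacturer's claim.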

Example 7.3 A simulation study was conducted to investigate the effect on the level of the chi-square test of sampling from heavy-tailed and skewed distributions rather than the required normal distribution. The five distributions were normal, uniform (short-tailed), t distribution with df = 5 (heavy-tailed), and two gamma distributions, one slightly skewed and the other heavily skewed. Some summary statistics about the distributions are given in the following table. Note that each of the distributions has the same variance, $\sigma^2 = 100$, but the skewness and kurtosis of the distributions vary. From each of the distributions, 500 random samples of sizes 10, 20, and 50 were selected, and a test of $H_0: \sigma^2 \le 100$ was conducted using $\alpha = 0.05$. A chi-square test of variance was performed for each of the 500 samples of the various sample sizes from each of the five distributions. What do the results indicate about the sensitivity of the test to sampling from a nonnormal population?

Summary statistic   Normal   Uniform   t (df=5)   Gamma (shape=1)   Gamma (shape=0.1)
Mean                0        17.3      0          10                3.16
Variance            100      100       100        100               100
Skewness            0        0         0          2                 6.3
Kurtosis            3        1.8       9          9                 63

Answer to Example 7.3 When the population distribution is symmetric with shorter tails than a normal distribution, the actual Type I error probabilities are smaller than 0.05, whereas for a symmetric distribution with heavy tails, the Type I error probabilities are much greater than 0.05. There is strong evidence that the claimed $\alpha$ value of the chi-square test of a population variance is very sensitive to nonnormality.
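A scaled-down version of such a simulation can be sketched with the standard library. This is an illustration under stated assumptions, not the study reported above: it uses only three populations (normal, uniform, and a gamma with shape 1, all with variance 100), sample size n = 10, 2,500 replications, a fixed seed, and the tabled critical value 16.919 for the upper 5% point of the chi-square distribution with 9 df.

```python
import random

CRITICAL_VALUE = 16.919  # upper 5% point of chi-square with 9 df (standard table value)


def sample_variance(values):
    """Sample variance s^2 with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)


def rejection_rate(draw, n=10, sigma0_sq=100.0, reps=2500):
    """Fraction of samples for which H0: sigma^2 <= sigma0_sq is rejected."""
    rejections = 0
    for _ in range(reps):
        sample = [draw() for _ in range(n)]
        stat = (n - 1) * sample_variance(sample) / sigma0_sq
        if stat > CRITICAL_VALUE:
            rejections += 1
    return rejections / reps


rng = random.Random(1)   # fixed seed so the run is repeatable
width = 1200 ** 0.5      # uniform on (0, width) has variance width^2 / 12 = 100

rates = {
    "normal": rejection_rate(lambda: rng.gauss(0.0, 10.0)),
    "uniform": rejection_rate(lambda: rng.uniform(0.0, width)),
    "gamma(shape=1)": rejection_rate(lambda: rng.expovariate(0.1)),  # mean 10, variance 100
}
for name, rate in rates.items():
    print(f"{name:>15}: estimated Type I error rate = {rate:.3f}")
```

Because the true variance equals 100 in every case, each rejection is a Type I error. The normal rate should be near the nominal 0.05, the short-tailed uniform rate well below it, and the skewed gamma rate well above it.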

Estimation and Tests for Comparing Two Population Variances One application of a test for the equality of two population variances is to evaluate the validity of the equal-variance condition for a two-sample t test. When random samples of sizes $n_1$ and $n_2$ have been independently drawn from two normally distributed populations, the ratio $\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \frac{s_1^2/s_2^2}{\sigma_1^2/\sigma_2^2}$ possesses a probability distribution in repeated sampling referred to as an F distribution.

Properties of the F Distribution (1) F can assume only positive values. The F distribution, unlike the normal distribution or the t distribution but like the $\chi^2$ distribution, is nonsymmetrical. There are many F distributions, and each one has a different shape. We specify a particular one by designating the degrees of freedom associated with $s_1^2$ and $s_2^2$. We denote these quantities by $df_1$ and $df_2$, respectively.

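Statistical Test Comparing $\sigma_1^2$ and $\sigma_2^2$ A statistical test comparing $\sigma_1^2$ and $\sigma_2^2$ utilizes the test statistic $s_1^2/s_2^2$. When $\sigma_1^2 = \sigma_2^2$, we have $\sigma_1^2/\sigma_2^2 = 1$, and $F = s_1^2/s_2^2$ follows an F distribution with $df_1 = n_1 - 1$ and $df_2 = n_2 - 1$. The lower-tail values are obtained from the following relationship: let $F_{\alpha, df_1, df_2}$ be the upper $\alpha$ percentile and $F_{1-\alpha, df_1, df_2}$ be the lower $\alpha$ percentile of an F distribution with $df_1$ and $df_2$. Then $F_{1-\alpha, df_1, df_2} = \frac{1}{F_{\alpha, df_2, df_1}}$. The corresponding confidence interval for $\sigma_1^2/\sigma_2^2$ is $\frac{s_1^2}{s_2^2} F_L \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2} F_U$, where $F_U = F_{\alpha/2, df_2, df_1}$ and $F_L = 1/F_{\alpha/2, df_1, df_2}$.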
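The lower-tail relationship follows from the fact that the reciprocal of an F variable is again an F variable with the degrees of freedom swapped. A short derivation, written here with $F_{1-\alpha, df_1, df_2}$ denoting the lower $\alpha$ percentile and $F_{\alpha, df_2, df_1}$ the upper $\alpha$ percentile:

```latex
% Assumes F ~ F(df_1, df_2), so that 1/F ~ F(df_2, df_1).
\begin{align*}
\alpha &= P\left(F \le F_{1-\alpha,\,df_1,\,df_2}\right)
        = P\left(\frac{1}{F} \ge \frac{1}{F_{1-\alpha,\,df_1,\,df_2}}\right), \\
\alpha &= P\left(\frac{1}{F} \ge F_{\alpha,\,df_2,\,df_1}\right)
        \quad \text{(upper $\alpha$ percentile of the $F(df_2, df_1)$ distribution)}, \\
\Rightarrow \quad F_{1-\alpha,\,df_1,\,df_2} &= \frac{1}{F_{\alpha,\,df_2,\,df_1}}.
\end{align*}
```

This is why F tables usually list only upper-tail percentiles: any lower-tail value can be read off by swapping the degrees of freedom and taking a reciprocal.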

Example 7.7 To test hypotheses about the means and standard deviations of the HEC and HGMF E. coli concentrations.

Answer to Example 7.7 (1) [Figure: normal probability plots for HGMF and HEC.]

Answer to Example 7.7 (2) Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_a: \sigma_1^2 \ne \sigma_2^2$. Summary statistics:

Procedure   Sample Size   Mean     Standard Deviation
HEC         24            7.1346   0.2291
HGMF        24            6.9529   0.2096

$F = \frac{s_1^2}{s_2^2} = \frac{(0.2291)^2}{(0.2096)^2} = 1.19 \; (< F_{0.025, 23, 23} = 2.31)$

Fail to reject $H_0$ and conclude that HEC appears to have a degree of variability similar to that of HGMF in its determination of E. coli concentration.
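The F ratio can be checked with a few lines of Python. This sketch takes the standard deviations as 0.2291 (HEC) and 0.2096 (HGMF) with 24 observations each, and the tabled value F(0.025; 23, 23) = 2.31; treat all of these inputs as assumptions of the example.

```python
s_hec, s_hgmf = 0.2291, 0.2096   # assumed sample standard deviations
n_hec = n_hgmf = 24              # assumed sample sizes

f_stat = s_hec ** 2 / s_hgmf ** 2

upper_crit = 2.31                # F(0.025; 23, 23), tabled value
lower_crit = 1.0 / upper_crit    # lower 0.025 point; valid here because df1 == df2

# Two-sided test at alpha = 0.05: reject if F falls in either tail.
reject_h0 = f_stat >= upper_crit or f_stat <= lower_crit
print(f"F = {f_stat:.2f}, df = ({n_hec - 1}, {n_hgmf - 1}), reject H0: {reject_h0}")
```

The lower critical value uses the reciprocal relationship for lower-tail F percentiles; with unequal degrees of freedom the table lookup would have to swap df1 and df2 first.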

Answer to Example 7.7 (3) Both the HEC and HGMF E. coli concentration readings appear to be independent random samples from normal populations with a common standard deviation, so we can use a pooled t test to evaluate $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \ne \mu_2$.

$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = 2.87 > t_{0.025, 46} = 2.01$

Reject $H_0$ and conclude that there is significant evidence that the average HEC E. coli concentration reading differs from the average HGMF reading.
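The pooled t statistic can be reproduced as follows. This is a sketch that assumes the summary statistics are means 7.1346 and 6.9529, standard deviations 0.2291 and 0.2096, and n = 24 per procedure; the function name `pooled_t` is illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two sample variances.
    sp_sq = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(sp_sq * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


t_stat, df = pooled_t(7.1346, 0.2291, 24, 6.9529, 0.2096, 24)
t_crit = 2.01  # t(0.025; 46), tabled value
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {abs(t_stat) > t_crit}")
```

Pooling is reasonable here only because the preceding F test gave no evidence against equal variances; otherwise a separate-variance t test would be the safer choice.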

Effect on the Level of F test of Sampling from Non-normal Distributions (1) A simulation study was conducted to investigate the effect on the level of the F test of sampling from heavy-tailed and skewed distributions rather than the required normal distribution. The five distributions were normal, uniform (short-tailed), t distribution with df = 5 (heavy-tailed), and two gamma distributions, one slightly skewed and the other heavily skewed. For each pair of sample sizes $(n_1, n_2)$ = (10,10), (10,20), or (20,20), random samples of the specified sizes were selected from one of the five distributions. A test of $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_a: \sigma_1^2 \ne \sigma_2^2$ was conducted using an F test with $\alpha = 0.05$.

Effect on the Level of F test of Sampling from Non-normal Distributions (2) Proportion of times $H_0: \sigma_1^2 = \sigma_2^2$ was rejected ($\alpha$ = 0.05).

Sample sizes   Normal   Uniform   t (df=5)   Gamma (shape=1)   Gamma (shape=0.1)
(10,10)        0.054    0.00      0.         0.5               0.693
(10,20)        0.056    0.0068    0.40       0.36              0.67
(20,20)        0.050    0.0044    0.50       0.64              0.673

When the population distribution is a symmetric short-tailed distribution like the uniform distribution, the actual value of $\alpha$ is much smaller than the specified value of 0.05. Thus, the probability of Type II errors of the F test would most likely be much larger than what would occur when sampling from normally distributed populations.

Tests for Comparing More Than Two Population Variances Hartley's $F_{\max}$ test for homogeneity of population variances

Example 7.8 It is thought that the temperature can be manipulated to target the power (the strength of the lens) in the manufacture of soft contact lenses. So interest is in comparing the variability in power. The data are coded deviations from target power using monomers from three different suppliers. We wish to test $H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2$.

Answer to Example 7.8 [Figure: boxplot of deviation from target power for the three suppliers.] Rejection region: reject $H_0$ if $F_{\max} = s^2_{\max}/s^2_{\min}$ exceeds $F_{\max, 0.05} = 6.00$. Here $s^2_{\min} = \min(8.69, 6.89, 80.2) = 6.89$ and $s^2_{\max} = \max(8.69, 6.89, 80.2) = 80.2$, so $F_{\max} = \frac{80.2}{6.89} = 11.64 > 6.00$. Reject $H_0$ and conclude that the variances are not all equal.
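Hartley's statistic is simply the ratio of the largest to the smallest sample variance, which the sketch below makes explicit. The three variances are taken as 8.69, 6.89, and 80.2 (the last one rounded to one decimal), and the critical value 6.00 is the tabled value quoted in the example; the function name is illustrative.

```python
def hartley_fmax(variances):
    """Hartley's F_max: largest sample variance divided by the smallest."""
    return max(variances) / min(variances)


supplier_variances = [8.69, 6.89, 80.2]  # assumed sample variances for the three suppliers
f_max = hartley_fmax(supplier_variances)
critical_value = 6.00                    # F_max(0.05) from Hartley's table, as quoted in the example

print(f"F_max = {f_max:.2f}, reject H0: {f_max > critical_value}")
```

The test assumes equal sample sizes and normal populations, so a large ratio is only evidence of unequal variances when normality is plausible.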

An Issue of Hartley's $F_{\max}$ Test Hartley's $F_{\max}$ test is quite sensitive to departures from normality. If the population distributions we are sampling from are somewhat nonnormal but the variances are equal, the $F_{\max}$ test may still reject $H_0$ and declare the variances to be unequal. In that case the test is detecting the nonnormality of the population distributions, not unequal variances. An alternative approach that does not require the populations to have normal distributions is Levene's test. However, Levene's test involves considerably more calculation than Hartley's test. Also, when the populations have a normal distribution, Hartley's test is more powerful than Levene's test.

Levene's Test for Homogeneity of Population Variances (1)

Levene's Test for Homogeneity of Population Variances (2)
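The slides do not reproduce the formula, so the following is a hedged sketch of one common form of Levene's statistic: replace each observation by its absolute deviation from its group median, then compute the usual one-way ANOVA F ratio on those deviations. Using the median (rather than the mean) as the center is an assumption here, and the function name and data are hypothetical.

```python
from statistics import median


def levene_statistic(groups):
    """ANOVA-type F ratio computed on absolute deviations from each group's median."""
    t = len(groups)                          # number of groups
    total_n = sum(len(g) for g in groups)    # total number of observations
    # z_ij = |y_ij - median of group i|
    z = [[abs(y - median(g)) for y in g] for g in groups]
    group_means = [sum(zi) / len(zi) for zi in z]
    grand_mean = sum(sum(zi) for zi in z) / total_n
    between = sum(
        len(zi) * (m - grand_mean) ** 2 for zi, m in zip(z, group_means)
    ) / (t - 1)
    within = sum(
        (v - m) ** 2 for zi, m in zip(z, group_means) for v in zi
    ) / (total_n - t)
    return between / within


# Hypothetical data: two small groups with different spreads.
example_groups = [[1, 2, 3], [2, 4, 6]]
L = levene_statistic(example_groups)
print(f"Levene statistic L = {L:.3f} with df = (1, 4)")
```

The statistic is compared with an upper percentile of the F distribution with t - 1 and N - t degrees of freedom; large values suggest that the group variances differ.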