Averages and Variability

Chapter 4: Aplia (week 3 Measures of Central Tendency)
Chapter 5 (omit 5.2, 5.6, 5.8, 5.9): Aplia (week 4 Measures of Variability)
Two critical types of descriptive statistics for data from quantitative variables
  measures of central tendency (averages)
  measures of variability

Measures of Central Tendency: Mode
  most frequent score in a set of data
  not the frequency of that score
  example: stress ratings from control group
    4 2 5 4 6 3 4 2 4 5 3 5 4 6 2 7 3 5 7 6
    mode = 4
  frequency tables
    Control             Yoga
    Rating   Freq.      Rating   Freq.
    1        0          1        1
    2        3          2        5
    3        3          3        5
    4        5          4        6
    5        4          5        1
    6        3          6        1
    7        2          7        1
  yoga group: mode = ?
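The distinction between the mode and its frequency can be checked directly. The following is a minimal R sketch using the control-group ratings from the slide; the variable names (`control`, `freq`, `mode_value`, `mode_freq`) are illustrative choices, not part of the original material.

```r
# Stress ratings from the control group (copied from the slide)
control <- c(4, 2, 5, 4, 6, 3, 4, 2, 4, 5, 3, 5, 4, 6, 2, 7, 3, 5, 7, 6)

# Count how often each distinct rating occurs
freq <- table(control)

# The mode is the rating with the highest count ...
mode_value <- as.numeric(names(freq)[freq == max(freq)])

# ... which is different from the count itself
mode_freq <- max(freq)

print(freq)
print(mode_value)
print(mode_freq)
```

If two ratings tied for the highest count, `mode_value` would hold both of them.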

Measures of Central Tendency: Median
  middle score in a rank-ordered list of scores
  50% of scores above and 50% below the median
  example: stress ratings from control group (N = 20)
    2 2 2 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 7 7
    median = ?
  frequency tables
    Control             Yoga
    Rating   Freq.      Rating   Freq.
    1        0          1        1
    2        3          2        5
    3        3          3        5
    4        5          4        6
    5        4          5        1
    6        3          6        1
    7        2          7        1
  yoga group: median = ?
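One way to work the control-group question is to locate the middle of the rank-ordered list by hand and compare it with R's built-in `median()`. This sketch assumes an even number of scores, as in the N = 20 example; the variable names are illustrative.

```r
# Stress ratings from the control group (copied from the slide)
control <- c(4, 2, 5, 4, 6, 3, 4, 2, 4, 5, 3, 5, 4, 6, 2, 7, 3, 5, 7, 6)

# Rank-order the scores
sorted <- sort(control)
n <- length(sorted)

# With an even N there is no single middle score, so take the
# two scores on either side of the midpoint and average them
# (this sketch assumes n is even)
middle_pair <- sorted[c(n / 2, n / 2 + 1)]
median_by_hand <- mean(middle_pair)

# The built-in function should agree with the hand calculation
stopifnot(median_by_hand == median(control))
print(median_by_hand)
```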

Measures of Central Tendency: Mean
  typical concept of average
    $\bar{X} = M = \frac{\sum X}{N}$
  example: stress ratings from control group
    2 2 2 3 3 3 4 4 4 4 4 5 5 5 5 6 6 6 7 7
    $M = \frac{87}{20} = 4.35$
  example: stress ratings from yoga group
    4 5 2 4 6 4 2 7 2 3 4 3 3 4 3 2 1 2 3 4
    $M = \frac{68}{20} = 3.40$
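The formula for M can be applied literally in R as a sum divided by a count. This is a small sketch using the two groups' ratings from the slides; the variable names are illustrative.

```r
# Stress ratings copied from the slides
control <- c(2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7)
yoga    <- c(4, 5, 2, 4, 6, 4, 2, 7, 2, 3, 4, 3, 3, 4, 3, 2, 1, 2, 3, 4)

# M = (sum of X) / N, written out step by step
m_control <- sum(control) / length(control)
m_yoga    <- sum(yoga) / length(yoga)

# mean() performs the same calculation
stopifnot(isTRUE(all.equal(m_control, mean(control))))
stopifnot(isTRUE(all.equal(m_yoga, mean(yoga))))

print(c(control = m_control, yoga = m_yoga))
```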

Measures of Central Tendency: Mean
  balance point analogy (mean = fulcrum)
    scores arrayed on a long board according to score value, as in a histogram
    mean represents the balance point, or fulcrum
  [Figure: histograms of Group 1 and Group 2 on a 0 to 8 score axis, with balance points at 4.35 and 3.40]

Measures of Central Tendency: Mean
  implications of balance point analogy
    sensitivity to extreme score
    example: change a 7 to a 1 in Group 1
      $M = \frac{81}{20} = 4.05$ (was 4.35)
      mode and median do not change in this example
  [Figure: Group 1 histogram on a 0 to 8 score axis, before and after the change; balance point moves from 4.35 to 4.05]
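The effect of the changed score can be reproduced in a few lines of R. This sketch assumes Group 1 is the control-group data from the earlier slides (its mean of 4.35 matches); `altered` and `mode_of` are illustrative names.

```r
# Group 1 ratings (control group, copied from the earlier slide)
group1 <- c(2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7)

# Change one of the 7s to a 1
altered <- group1
altered[which(altered == 7)[1]] <- 1

# Small helper: most frequent value(s) in a vector
mode_of <- function(x) {
  freq <- table(x)
  as.numeric(names(freq)[freq == max(freq)])
}

# The mean shifts toward the new extreme score ...
print(c(before = mean(group1), after = mean(altered)))

# ... while the median and mode stay where they were
print(c(before = median(group1), after = median(altered)))
print(c(before = mode_of(group1), after = mode_of(altered)))
```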

Measures of Central Tendency: Mean
  implications of balance point analogy
    deviations from mean sum to zero
    example: first 10 scores from yoga group
      4 5 2 4 6 4 2 7 2 3
      $M = \frac{39}{10} = 3.90$
      4 - 3.9 =  0.1
      5 - 3.9 =  1.1
      2 - 3.9 = -1.9
      4 - 3.9 =  0.1
      6 - 3.9 =  2.1
      4 - 3.9 =  0.1
      2 - 3.9 = -1.9
      7 - 3.9 =  3.1
      2 - 3.9 = -1.9
      3 - 3.9 = -0.9
      $\sum (X - M) = 0$
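The zero-sum property is easy to confirm numerically. A minimal R sketch with the ten yoga scores from the slide follows; because of floating-point rounding, the sum is compared against a small tolerance rather than exact zero (the tolerance value is an arbitrary choice).

```r
# First 10 scores from the yoga group (copied from the slide)
yoga10 <- c(4, 5, 2, 4, 6, 4, 2, 7, 2, 3)

m <- mean(yoga10)

# Signed deviation of each score from the mean
deviations <- yoga10 - m
print(deviations)

# Positive and negative deviations cancel out
total <- sum(deviations)
print(abs(total) < 1e-12)
```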

Measures of Central Tendency: Comparing measures of central tendency
  similar value for mean, median, and mode with symmetrical, unimodal distribution
    example: normal distribution
  [Figure: normal distribution with mean, median, and mode at the same point]

Measures of Central Tendency: Comparing measures of central tendency
  the three measures take on different values with skewed distributions
    example: positively skewed distribution
    illustration with response time data from congruent color naming trials
  [Figure: histogram, "Congruent, with synaesthesia case"; x-axis: Response Time (524.5 to 1224.5), y-axis: Frequency (0 to 12)]

Measures of Central Tendency: Data from a positively skewed distribution
  582 589 592 596 597 601 614 624 631 633
  636 638 647 651 652 653 655 666 670 673
  680 681 687 692 704 719 722 726 733 735
  739 741 745 746 748 760 763 768 782 789
  795 811 822 829 883 928 938
  median? mean?
  [Figure: histogram, "Congruent"; x-axis: Response Time (524.5 to 1224.5), y-axis: Frequency (0 to 12)]

Measures of Central Tendency: Data from incongruent condition
  stronger skew
  579 591 594 595 600 613 613 652 662 668
  674 677 679 682 690 691 707 726 743 753
  755 756 758 761 769 778 787 801 801 805
  814 835 853 859 860 867 872 893 893 897
  898 951 973 1038 1156 1234
  median? 759.5
  mean? 779.4
  [Figure: histogram, "Incongruent"; x-axis: Response Time (499.5 to 1299.5), y-axis: Frequency (0 to 12)]
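The median and mean reported for the incongruent condition can be recomputed from the listed response times. This is a short R sketch; the vector name `incongruent` and the derived variables are illustrative.

```r
# Response times from the incongruent condition (copied from the slide)
incongruent <- c(
  579, 591, 594, 595, 600, 613, 613, 652, 662, 668,
  674, 677, 679, 682, 690, 691, 707, 726, 743, 753,
  755, 756, 758, 761, 769, 778, 787, 801, 801, 805,
  814, 835, 853, 859, 860, 867, 872, 893, 893, 897,
  898, 951, 973, 1038, 1156, 1234
)

rt_median <- median(incongruent)
rt_mean   <- mean(incongruent)

# With positive skew, the few very slow responses pull the mean
# above the median
mean_exceeds_median <- rt_mean > rt_median

print(c(median = rt_median, mean = round(rt_mean, 1)))
print(mean_exceeds_median)
```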

Measures of Central Tendency: Using measures of central tendency
  emphasis on mean
    algebraically tractable when deriving statistical procedures
    most accurate sample-based estimate of its population value
  median is the best option for rank-ordered data
  mode is the only measure that can be used with nominal data

Variability: Variation among scores in a distribution
  variation due to individual differences, imperfect reliability in measurement
  variation provides important information in addition to central tendency
  compare two sets of scores with identical means
    ratings on a 10-point scale by 5 independent judges of aggressiveness exhibited by a child
    situation 1: 5 5 6 6 6    M = 5.6
    situation 2: 2 3 6 7 10   M = 5.6

Variability: Variation among scores in a distribution
  situation 1: 5 5 6 6 6    M = 5.6
  situation 2: 2 3 6 7 10   M = 5.6
  measures of variability are based on distances between scores
  range (highest - lowest)
    situation 1: 6 - 5 = 1
    situation 2: 10 - 2 = 8
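The two situations can be compared in R. Note that R's `range()` returns the lowest and highest scores rather than their difference, so the sketch below subtracts them explicitly; the variable names are illustrative.

```r
# Judges' aggressiveness ratings (copied from the slide)
situation1 <- c(5, 5, 6, 6, 6)
situation2 <- c(2, 3, 6, 7, 10)

# Identical means ...
print(c(mean(situation1), mean(situation2)))

# ... but very different spreads: range = highest - lowest
range1 <- max(situation1) - min(situation1)
range2 <- max(situation2) - min(situation2)
print(c(situation1 = range1, situation2 = range2))

# range() gives c(min, max); diff() turns that into the same number
stopifnot(diff(range(situation1)) == range1)
stopifnot(diff(range(situation2)) == range2)
```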

Variability: Variance
  average squared deviation from the mean
  population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
  sample variance: $s^2 = \frac{\sum (X - M)^2}{N - 1}$

Variability: Variance, application to aggressiveness rating data
  situation 1: 5 5 6 6 6 (M = 5.6)
    X     M      (X - M)   (X - M)^2
    5     5.6    -0.6       0.36
    5     5.6    -0.6       0.36
    6     5.6     0.4       0.16
    6     5.6     0.4       0.16
    6     5.6     0.4       0.16
    $\sum (X - M)^2 = 1.20$
    $s^2 = \frac{1.20}{4} = 0.30$
  situation 2: 2 3 6 7 10 (M = 5.6)
    X     M      (X - M)   (X - M)^2
    2     5.6    -3.6      12.96
    3     5.6    -2.6       6.76
    6     5.6     0.4       0.16
    7     5.6     1.4       1.96
    10    5.6     4.4      19.36
    $\sum (X - M)^2 = 41.20$
    $s^2 = \frac{41.20}{4} = 10.30$
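The table columns map directly onto vectorized R operations. The sketch below defines a small helper, `sample_variance()` (a hypothetical name introduced here for illustration), that follows the sample-variance formula step by step and compares the result with R's built-in `var()`, which also divides by N - 1.

```r
# Judges' aggressiveness ratings (copied from the slide)
situation1 <- c(5, 5, 6, 6, 6)
situation2 <- c(2, 3, 6, 7, 10)

# Sample variance computed column by column, as in the table
sample_variance <- function(x) {
  deviations <- x - mean(x)           # (X - M)
  sum_squares <- sum(deviations^2)    # sum of (X - M)^2
  sum_squares / (length(x) - 1)       # divide by N - 1
}

s2_1 <- sample_variance(situation1)
s2_2 <- sample_variance(situation2)
print(c(situation1 = s2_1, situation2 = s2_2))

# The built-in var() uses the same N - 1 denominator
stopifnot(isTRUE(all.equal(s2_1, var(situation1))))
stopifnot(isTRUE(all.equal(s2_2, var(situation2))))
```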

Variability: Standard deviation
  variance is a measure in squared units
  take square root to obtain a measure in original units
    $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
    $s = \sqrt{s^2} = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$
  situation 1: $s^2 = 0.30$, $s = \sqrt{0.30} = 0.55$
  situation 2: $s^2 = 10.30$, $s = \sqrt{10.30} = 3.21$
  rough concept of standard deviation: average distance of scores from mean
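As a brief check of the two standard deviations, R's `sd()` returns the square root of the N - 1 sample variance. The variable names below are illustrative.

```r
# Judges' aggressiveness ratings (copied from the earlier slide)
situation1 <- c(5, 5, 6, 6, 6)
situation2 <- c(2, 3, 6, 7, 10)

# Standard deviation as the square root of the sample variance
s1 <- sqrt(var(situation1))
s2 <- sqrt(var(situation2))

# sd() is the built-in shortcut for the same quantity
stopifnot(isTRUE(all.equal(s1, sd(situation1))))
stopifnot(isTRUE(all.equal(s2, sd(situation2))))

print(round(c(situation1 = s1, situation2 = s2), 2))
```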

Variability: Comprehension check
  relative to a normal distribution, is variance smaller, larger, or about the same in a uniform distribution with the same mean and range?
  Note: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

Variability: Comprehension check
  same question for a bimodal distribution relative to a normal distribution

Variability: Comprehension check
  combine two normal distributions
  what is the variance of the resulting distribution relative to the variance of the original distributions?
    Case 1: identical distributions

Variability: Rationale for using N - 1 in formula for $s^2$
  objective is to obtain an unbiased estimate of $\sigma^2$
  $s^2$ is based on deviations from M, not µ
  within a sample, an extreme score will pull M toward it, generating a deviation between M and that extreme score that is somewhat smaller than the deviation between µ and that score
  the use of M instead of µ is the reason division by N would produce an underestimate of $\sigma^2$

Variability: Illustration of what happens when variance is estimated from a sample
  suppose we have a normally distributed population with µ = 10 and $\sigma^2 = 9$
  a random sample of 5 scores is drawn: 7, 8, 8, 9, 14
  compute the sum of squared deviations from µ and from M

Variability
  deviations from µ = 10
    X     µ      (X - µ)^2
    7     10      9
    8     10      4
    8     10      4
    9     10      1
    14    10     16
    $\sum (X - \mu)^2 = 34$
  deviations from M = 9.2
    X     M      (X - M)^2
    7     9.2     4.84
    8     9.2     1.44
    8     9.2     1.44
    9     9.2     0.04
    14    9.2    23.04
    $\sum (X - M)^2 = 30.8$
  the sum of squared deviations based on a sample mean is never larger than the sum based on µ, and is smaller whenever M differs from µ
  demonstration with R
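The two sums in the tables above are linked by a standard algebraic identity, added here as a supplement to the slide. It shows where the shortfall comes from: the sum about M falls short of the sum about µ by exactly N times the squared distance between M and µ.

```latex
\[
\sum (X - \mu)^2 = \sum (X - M)^2 + N (M - \mu)^2
\]
% For the sample 7, 8, 8, 9, 14 with M = 9.2 and \mu = 10:
\[
34 = 30.8 + 5\,(9.2 - 10)^2 = 30.8 + 3.2
\]
```

Because the extra term $N (M - \mu)^2$ cannot be negative, the sum about M can equal the sum about µ only when M happens to equal µ.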

Variability: R code
    # Variance with N - 1: 100,000 samples of 5 scores from a normal population (mean 10, SD 3)
    set.seed(378)
    vars <- replicate(100000, var(rnorm(5, 10, 3)))
    head(vars)
    hist(vars, 100, xlim = c(0, 80))
    mean(vars)
    # Variance with N: the same samples, rescaled by (N - 1)/N = 4/5
    set.seed(378)
    nvars <- replicate(100000, (4/5) * var(rnorm(5, 10, 3)))
    head(nvars)
    hist(nvars, 100, xlim = c(0, 80))
    mean(nvars)

Variability
  Variance computed with N - 1
    > mean(vars)
    [1] 9.005985
  Variance computed with N
    > mean(nvars)
    [1] 7.204788
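The two simulated averages can be set against their theoretical long-run values. The lines below are a supplement to the slide, using the standard result that the N - 1 estimator is unbiased, with $\sigma^2 = 9$ and N = 5 from the illustration.

```latex
\[
E\!\left[ s^2 \right] = \sigma^2 = 9
\]
\[
E\!\left[ \frac{N - 1}{N}\, s^2 \right] = \frac{4}{5} \cdot 9 = 7.2
\]
```

The simulated means of 9.005985 and 7.204788 are close to 9 and 7.2, so dividing by N underestimates $\sigma^2$ by about 20% when N = 5.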