Today: Finish Chapter 9 (Sections 9.6 to 9.8 and 9.9 Lesson 3)

ANNOUNCEMENTS:
  Quiz #7 begins after class today, ends Monday at 3pm.
  Quiz #8 will begin next Friday and end at 10am Monday (day of final).
  There will be clicker questions in all lectures next week.
  The last homework assignment (#8) will be from the lectures on Mon and Wed and will be due next Friday.
  Problems to be assigned next Friday are already on the web, with solutions, and are to help you review that material (not to hand in).
  Review for the final exam is posted and will be covered in discussion sections next Friday (not in class). Two files:
    o Material since 2nd midterm
    o Concepts from the quarter that need extra review
HOMEWORK: (Due Monday, March 11) Chapter 9: #68, 72, 146

Update on the five situations we will cover for the rest of this quarter:
For Categorical Variables: [Done!]
  One population proportion (or probability): parameter $p$, sample statistic $\hat{p}$
  Difference in two population proportions: parameter $p_1 - p_2$, sample statistic $\hat{p}_1 - \hat{p}_2$
For Quantitative Variables: [Today, M, W]
  One population mean: parameter $\mu$, sample statistic $\bar{x}$
  Population mean of paired differences (dependent samples, paired): parameter $\mu_d$, sample statistic $\bar{d}$
  Difference in two population means (independent samples): parameter $\mu_1 - \mu_2$, sample statistic $\bar{x}_1 - \bar{x}_2$
For each situation we will:
  Learn about the sampling distribution for the sample statistic
  Learn how to find a confidence interval for the true value of the parameter
  Test hypotheses about the true value of the parameter

Recall, general format for all sampling distributions in Ch. 9:
Assuming sample size conditions are met, the sampling distribution of the sample statistic:
  Is approximately normal
  Mean = population parameter ($p$, $p_1 - p_2$, $\mu$, etc.)
  Standard deviation = standard deviation of ______; the blank is filled in with the statistic ($\hat{p}$, $\hat{p}_1 - \hat{p}_2$, $\bar{x}$, etc.)
Often the standard deviation must be estimated, and then it is called the standard error of ______.
See summary table on page 353 for all details!

Today: Sampling distributions for means of quantitative data:
  one mean
  mean difference for paired data
  difference between means for independent samples
Remember, two samples are called independent samples when the measurements in one sample are not related to the measurements in the other sample. Could come from:
  Separate samples
  One sample, divided into two groups by a categorical variable (such as male or female)
  Randomization into two groups where each unit goes into only one group
Paired data occur when two measurements are taken on the same individuals, or individuals are paired in some way.

Sampling Distribution for a Sample Mean (Section 9.6)
Suppose we take a random sample of size $n$ from a population and measure a quantitative variable.
Notation for Population (uses Greek letters):
  $\mu$ = mean of the population of measurements.
  $\sigma$ = standard deviation of the population of measurements.
Notation for Sample:
  $\bar{x}$ = sample mean of a random sample of $n$ individuals.
  $s$ = sample standard deviation of the random sample.

The sampling distribution of the sample mean $\bar{x}$ is:
  approximately normal
  Mean = population parameter = $\mu$
  Standard deviation = standard deviation of $\bar{x}$ = $\text{s.d.}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$
Often the standard deviation of $\bar{x}$ must be estimated, and then it is called the standard error of $\bar{x}$. Replace population $\sigma$ with sample standard deviation $s$, so $\text{s.e.}(\bar{x}) = \dfrac{s}{\sqrt{n}}$

Consider the mean weight loss for the population of people who attend weight loss clinics for 10 weeks. Suppose the population of individual weight losses is approximately normal, $\mu$ = 8 pounds, $\sigma$ = 5 pounds. (Empirical rule: see picture)
[Figure: Population of individual weight losses for 10 week clinic. Normal, Mean = 8, StDev = 5. x-axis: Number of pounds lost (or gained, if negative value), from -7 to 23; y-axis: Density]

We plan to take a random sample of 25 people from this population and record weight loss for each person, then find sample mean $\bar{x}$. We know the value of the sample mean will vary for different samples of $n$ = 25. How much will they vary? Where is the center of the distribution of possibilities?
Results for four possible random samples of 25 people, with the corresponding sample mean $\bar{x}$ and sample standard deviation $s$:
  Sample 1: $\bar{x}$ = 8.32 pounds, $s$ = 4.74 pounds.
  Sample 2: $\bar{x}$ = 6.76 pounds, $s$ = 4.73 pounds.
  Sample 3: $\bar{x}$ = 8.48 pounds, $s$ = 5.27 pounds.
  Sample 4: $\bar{x}$ = 7.16 pounds, $s$ = 5.93 pounds.

Note: Each sample had a different sample mean, which did not always match the population mean of 8 pounds. Although we cannot determine whether one sample mean will accurately reflect the population mean, statisticians have determined what to expect for all possible sample means.
  $\mu$ = mean of population of interest = 8 pounds
  $\sigma$ = standard deviation of population of interest = 5 pounds
  $\bar{x}$ = sample mean of a random sample of $n$ individuals.
Then the sampling distribution of $\bar{x}$ is approximately normal, with
  Mean = $\mu$
  Standard deviation = $\text{s.d.}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$

Example: Mean of 25 weight losses, the distribution of possible values is approximately normal with:
  mean = 8 pounds
  standard deviation = $\dfrac{\sigma}{\sqrt{n}} = \dfrac{5}{\sqrt{25}}$ = 1 pound
What if $n$ = 100 instead of 25?
Compare: individual weight loss, $\bar{x}$ for $n$ = 25, $\bar{x}$ for $n$ = 100
             Individuals (wt loss)   Mean of 25   Mean of 100
  Mean       8 pounds                8 pounds     8 pounds
  St. Dev.   5 pounds                1 pound      ½ pound
Conditions for sampling distribution of $\bar{x}$ to be approximately normal:
  Population (individual values) is approx. bell-shaped OR
  Sample size is large (at least 30, more if outliers)
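The arithmetic in the comparison table can be checked with a few lines of Python. This is only a sketch: the function name `sd_of_sample_mean` is made up for illustration, and the inputs are the weight-loss values from the example (σ = 5 pounds, n = 25 and n = 100).

```python
import math


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 5.0  # population s.d. of individual weight losses, in pounds
for n in (25, 100):
    # Quadrupling n from 25 to 100 halves the s.d. of the sample mean.
    print(f"n = {n}: s.d.(x-bar) = {sd_of_sample_mean(sigma, n)}")
```

Because n sits under a square root, the sample size has to be multiplied by 4 to cut the standard deviation of the sample mean in half.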

Comparing original population with sampling distribution of $\bar{x}$:
[Figure: Weight loss for individuals, and for mean of 25 individuals. Two normal curves with Mean = 8, one with StDev = 5 and one with StDev = 1. x-axis: Weight loss or average weight loss, from -7 to 23; y-axis: Density]
From the empirical rule:
                     68%           95%            99.7%
  Individuals        3 to 13 lbs   -2 to 18 lbs   -7 to 23 lbs
  Mean of n = 25     7 to 9 lbs    6 to 10 lbs    5 to 11 lbs
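The empirical-rule rows can be reproduced as mean ± 1, 2, and 3 standard deviations. The sketch below assumes the example's values (μ = 8, σ = 5, n = 25); the helper name `empirical_rule_intervals` is hypothetical.

```python
import math


def empirical_rule_intervals(mean, sd):
    """Return the mean +/- 1, 2, 3 s.d. intervals (68%, 95%, 99.7%)."""
    return {
        label: (mean - k * sd, mean + k * sd)
        for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }


mu, sigma, n = 8.0, 5.0, 25

# Individuals use sigma; the mean of n = 25 uses sigma / sqrt(n).
individuals = empirical_rule_intervals(mu, sigma)
mean_of_25 = empirical_rule_intervals(mu, sigma / math.sqrt(n))

for label in ("68%", "95%", "99.7%"):
    print(label, individuals[label], mean_of_25[label])
```

The only difference between the two rows is which standard deviation is plugged in: 5 pounds for individuals, 1 pound for the mean of 25.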

Note that larger sample size will result in smaller $\text{s.d.}(\bar{x})$
Compare sampling distribution of $\bar{x}$ for $n$ = 25 and $n$ = 100:
[Figure: Sampling distribution of the sample mean, n = 25 and n = 100. Two normal curves with Mean = 8: StDev = 1 (n = 25) and StDev = 0.5 (n = 100). x-axis: Mean weight loss, from 5 to 11; y-axis: Density]
In other words, for larger samples, the sample mean $\bar{x}$ will be closer to $\mu$ in general, and thus will be a better estimate for $\mu$.

Example where the original population is not bell-shaped:
A bus runs every 10 minutes. When you show up at the bus stop, it could come immediately, or any time up to 10 minutes. So the time you wait for it is uniform, from 0 to 10 minutes, and independent from day to day.
Population mean = $\mu$ = 5 minutes, population s.d. = $\sigma = \sqrt{10^2/12} \approx 2.9$ minutes
What is the sampling distribution of $\bar{x}$ for $n$ = 40 days?
Even though the original times are uniform (flat shape), the possible values of the sample mean $\bar{x}$ are:
  Approximately normal
  Mean of $\bar{x}$ = 5 minutes
  Standard deviation of $\bar{x}$ = $\dfrac{2.9}{\sqrt{40}}$ = 0.46 minutes

[Figure: Original values and sampling distribution of mean for n = 40. Uniform distribution from 0 to 10, and Normal distribution with Mean = 5, StDev = 0.46. x-axis: Waiting time or mean waiting time for n = 40; y-axis: Density]
Examples of possible samples:
Sample 1 ($\bar{x}$ = 5.29):
  6.9 6.2 8.8 8.3 7.1 6.5 7.3 9.5 3.4 9.9
  5.8 1.4 3 4.1 4.4 0.6 2.7 1.2 7.4 0.7
  6.8 7.7 6.2 6.1 3.3 1 5.3 9.4 1 0.8
  1 9.4 8.1 3.9 7.2 8.6 1.1 0.4 9.9 9.2
Sample 2 ($\bar{x}$ = 4.3):
  0.6 3.2 0.8 0.2 8.5 1.4 4.7 0.5 9.7 8.9
  6.3 3.3 0.8 4.1 2.6 3.7 5.7 3.2 8.9 2.3
  1.1 9.9 3 0.8 7.9 0.8 5.9 2.5 7.9 7.6
  4 2.2 0.6 0.1 6.1 6.9 8.1 2.6 9.6 5.3
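As a quick check, the two sample means can be recomputed from the listed waiting times. This sketch assumes the 80 values form two samples of 40 days each, in the order listed (first 40, then last 40); that split is an assumption, but it reproduces the two means given on the slide.

```python
from statistics import mean

# Assumed split: first 40 listed values, then the remaining 40.
sample_1 = [
    6.9, 6.2, 8.8, 8.3, 7.1, 6.5, 7.3, 9.5, 3.4, 9.9,
    5.8, 1.4, 3, 4.1, 4.4, 0.6, 2.7, 1.2, 7.4, 0.7,
    6.8, 7.7, 6.2, 6.1, 3.3, 1, 5.3, 9.4, 1, 0.8,
    1, 9.4, 8.1, 3.9, 7.2, 8.6, 1.1, 0.4, 9.9, 9.2,
]
sample_2 = [
    0.6, 3.2, 0.8, 0.2, 8.5, 1.4, 4.7, 0.5, 9.7, 8.9,
    6.3, 3.3, 0.8, 4.1, 2.6, 3.7, 5.7, 3.2, 8.9, 2.3,
    1.1, 9.9, 3, 0.8, 7.9, 0.8, 5.9, 2.5, 7.9, 7.6,
    4, 2.2, 0.6, 0.1, 6.1, 6.9, 8.1, 2.6, 9.6, 5.3,
]

# Each sample mean is one draw from the sampling distribution of x-bar.
mean_1 = mean(sample_1)
mean_2 = mean(sample_2)
print(round(mean_1, 2), round(mean_2, 2))
```

Both means fall within about two standard deviations (2 × 0.46 minutes) of the population mean of 5 minutes, which is what the approximately normal sampling distribution leads us to expect for most samples.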

Sections 9.7 and 9.8: Sampling distributions for mean of paired differences, and for differences in means for independent samples
Need to learn to distinguish between these two situations.
Notation for paired differences:
  $d_i$ = difference in the two measurements for individual $i = 1, 2, \ldots, n$
  $\mu_d$ = mean of the population of differences, if all possible pairs were to be measured
  $\sigma_d$ = the standard deviation of the population of differences
  $\bar{d}$ = the mean of the sample of differences
  $s_d$ = the standard deviation of the sample of differences
Example: IQ measured after listening to Mozart and to silence
  $d_i$ = difference in IQ for student $i$ for the two conditions
  $\mu_d$ = population mean difference, if all students measured (unknown)
  $\bar{d}$ = the mean of the sample of differences = 9 IQ points
Based on sample, we want to estimate mean population difference

Notation for difference in means for independent samples:
  $\mu_1$ = population mean of the first population
  $\mu_2$ = population mean of the second population
  Parameter of interest is $\mu_1 - \mu_2$ = the difference in population means
  $\bar{x}_1$ = sample mean of the sample from the first population
  $\bar{x}_2$ = sample mean of the sample from the second population
  The sample statistic is $\bar{x}_1 - \bar{x}_2$ = the difference in sample means
  $\sigma_1$ = population standard deviation of the first population
  $\sigma_2$ = population standard deviation of the second population
  $s_1$ = sample standard deviation of the sample from the 1st population
  $s_2$ = sample standard deviation of the sample from the 2nd population
  $n_1$ = size of the sample from the 1st population
  $n_2$ = size of the sample from the 2nd population

Examples where paired data might be used:
  Estimate average difference in income for husbands and wives
  Compare SAT scores before and after a training program
  Difference between what you earned in 2012 and what you hope to have as your starting salary when you graduate.
Note that paired differences are similar to the one mean situation, except special notation tells us that the means are of the differences.
Examples where independent samples might be used:
  Compare hours of study for men and women in our class.
  Compare number of sick days off from work for people who had a flu shot and people who didn't
  Compare change in blood pressure for people randomly assigned to a meditation program or to an exercise program for 3 months.

Conditions for the sampling distributions for these two situations are the same as for a single mean, with a slight twist:
  For paired differences, population of differences must be bell-shaped OR sample must be large.
  For difference in means for independent samples, both populations must be bell-shaped OR both sample sizes must be large.
In both cases, the sampling distribution of the sample statistic is approximately normal, with mean = population parameter of interest.
For paired differences:
  mean = $\mu_d$, $\text{s.d.}(\bar{d}) = \dfrac{\sigma_d}{\sqrt{n}}$ (same as one mean, but with d's)
For difference in two means:
  mean = $\mu_1 - \mu_2$, $\text{s.d.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
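The two standard deviation formulas can be written as small Python functions. This is an illustrative sketch only: the function names are made up, and the numbers passed in are hypothetical values chosen to make the arithmetic easy, not data from the lecture.

```python
import math


def sd_mean_paired_diff(sigma_d, n):
    """s.d. of d-bar for paired data: sigma_d / sqrt(n)."""
    return sigma_d / math.sqrt(n)


def sd_diff_means(sigma1, n1, sigma2, n2):
    """s.d. of (x-bar1 - x-bar2) for independent samples."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


# Hypothetical inputs, for illustration only (not from the lecture).
paired_sd = sd_mean_paired_diff(sigma_d=10.0, n=25)
indep_sd = sd_diff_means(sigma1=3.0, n1=9, sigma2=4.0, n2=16)
print(paired_sd, indep_sd)
```

Note that for independent samples the variances (squared standard deviations) are added, each divided by its own sample size, before taking the square root; the standard deviations themselves are not added.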

Standardized Statistics: For all 5 cases in Chapter 9, as long as the conditions are satisfied for the sampling distribution to be approximately normal, the standardized statistic for a sample statistic is:
  $z = \dfrac{\text{sample statistic} - \text{population parameter}}{\text{s.d.(sample statistic)}}$
Note that the denominator has s.d., not s.e.
For one mean:
  $z = \dfrac{\bar{x} - \mu}{\text{s.d.}(\bar{x})} = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{\sqrt{n}\,(\bar{x} - \mu)}{\sigma}$

Ex: Stat 7, Winter 2011, Hours of study per week for the class
Speculation over the long run is that students study an average of about 5 hours per week for Statistics 7. Have data from 264 students in Winter 2011, with sample mean of 5.36 hours.
Suppose for the population of all possible Statistics 7 students (not just in Winter 2011), population mean = $\mu$ = 5 hours a week, and population standard deviation $\sigma$ = 4 hours a week (they are not bell-shaped; definitely skewed to the right). For survey, $n$ = 264.
What are possible values of $\bar{x}$ (sampling distribution of $\bar{x}$)?
  Approximately normal
  Mean = 5 hours
  $\text{s.d.}(\bar{x}) = \dfrac{4}{\sqrt{264}} \approx 0.25$

In our sample, $n$ = 264 and $\bar{x}$ = 5.36
Standardized statistic for 5.36 hours is $z = \dfrac{5.36 - 5}{0.25} = \dfrac{0.36}{0.25} = 1.44$
If population mean is really 5 hours, with $\sigma$ = 4 hours, how unlikely is a sample mean of 5.36 hours or more for $n$ = 264?
[Figure: Possible sample means for 264 students. Normal, Mean = 5, StDev = 0.25. Shaded area above 5.36 is 0.07493. x-axis: Possible values of x-bar]

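How to compute this answer if given $\mu$, $\sigma$, $n$, and $\bar{x}$:
Sample means for $n$ = 264 are approximately normal with mean of 5 hours and s.d. of $4/\sqrt{264} \approx 0.25$ hours.
So, the standardized score for 5.36 is:
  $z = \dfrac{\sqrt{n}\,(\bar{x} - \mu)}{\sigma} = \dfrac{\sqrt{264}\,(5.36 - 5)}{4} \approx 1.44$ (with the s.d. rounded to 0.25; about 1.46 without rounding)
Area above z-score of 1.44 is about .075. So it is feasible that the true population mean for all Stat 7 students (not just Winter 2011) is indeed 5 hours.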
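The z-score and the area above it can be checked numerically using only the Python standard library. This is a sketch under the example's stated values (μ = 5, σ = 4, n = 264, sample mean 5.36); the helper `upper_tail_normal` is a hypothetical name, and it computes the standard normal upper-tail area from the complementary error function `math.erfc`.

```python
import math


def upper_tail_normal(z):
    """Area above z under the standard normal curve, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


mu, sigma, n, x_bar = 5.0, 4.0, 264, 5.36

sd_x_bar = sigma / math.sqrt(n)        # about 0.246, rounded to 0.25 on the slide
z_exact = (x_bar - mu) / sd_x_bar      # z using the unrounded s.d.
z_rounded = (x_bar - mu) / 0.25        # z using the rounded s.d., as on the slide

print(round(z_rounded, 2), round(upper_tail_normal(z_rounded), 5))
print(round(z_exact, 2), round(upper_tail_normal(z_exact), 3))
```

Rounding the standard deviation to 0.25 changes the z-score slightly, but either way the upper-tail area is a little over 7%, so the conclusion is the same.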

Unknown Population Standard Deviation
When $\sigma$ is not known, we must use the sample standard deviation $s$ instead.
  Standard deviation of $\bar{x}$: $\text{s.d.}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$
  Standard error of $\bar{x}$: $\text{s.e.}(\bar{x}) = \dfrac{s}{\sqrt{n}}$
Major consequence: When using standard error in situations involving means, standardized statistic has a t-distribution instead of a z-distribution; also called Student's t distribution.

Student's t distribution
In 1908 William Sealy Gosset figured out the formula for the t distribution. Called Student's t because... explained in class!

Standardized Statistic Using Standard Error
Usually we don't know $\sigma$ (population standard deviation), so we need to use $s$ (sample standard deviation). In that case, the standardized statistic for $\bar{x}$ is
  $t = \dfrac{\bar{x} - \mu}{\text{s.e.}(\bar{x})} = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{\sqrt{n}\,(\bar{x} - \mu)}{s}$
This has a Student's t distribution with degrees of freedom = $n - 1$
  It looks almost exactly like the normal distribution
  It is completely specified by knowing the df
  It gets closer and closer to the normal distribution, and when degrees of freedom = infinity, it is exactly the normal distribution.

Comparison of t distribution with df = 5 and standard normal distribution
[Figure: Standard normal distribution and t distribution with df = 5. x-axis: Standardized statistic, from -5 to 4]
For example, middle 95% for t with df = 5 is -2.57 to +2.57
For standard normal, it is about -2 to +2
In Chapter 11 (Monday) we will learn how to find probabilities.
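The two "middle 95%" ranges can be checked by integrating each density numerically. This is a rough sketch, not the method the course will use: it assumes the standard formula for the Student's t density and applies composite Simpson's rule; the function names are made up for illustration.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Area between -2.57 and +2.57 for t with df = 5, and between -2 and +2 for z.
area_t = simpson(lambda t: t_pdf(t, 5), -2.57, 2.57)
area_z = simpson(normal_pdf, -2.0, 2.0)
print(round(area_t, 3), round(area_z, 3))
```

Both areas come out close to 0.95, which shows that the t distribution with df = 5 needs a wider range than the standard normal to capture the same middle 95%, because its tails are heavier.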

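Summary of sampling distributions for the 5 parameters (p. 353):
  The statistic has a sampling distribution.
  It is approximately normal if the sample(s) is (are) large enough.
  The mean of the sampling distribution = the parameter.
  The standard deviation of the sampling distribution is in the table below, in the column "standard deviation of the statistic."
  Sometimes it needs to be estimated, then standard error is used.
One Proportion
  Parameter: $p$    Statistic: $\hat{p}$
  Standard deviation of the statistic: $\sqrt{\dfrac{p(1-p)}{n}}$
  Standard error of the statistic: $\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
  Standardized statistic with s.e.: $z$
Difference Between Proportions
  Parameter: $p_1 - p_2$    Statistic: $\hat{p}_1 - \hat{p}_2$
  Standard deviation of the statistic: $\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$
  Standard error of the statistic: $\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
  Standardized statistic with s.e.: $z$
One Mean
  Parameter: $\mu$    Statistic: $\bar{x}$
  Standard deviation of the statistic: $\dfrac{\sigma}{\sqrt{n}}$
  Standard error of the statistic: $\dfrac{s}{\sqrt{n}}$
  Standardized statistic with s.e.: $t$
Mean Difference, Paired Data
  Parameter: $\mu_d$    Statistic: $\bar{d}$
  Standard deviation of the statistic: $\dfrac{\sigma_d}{\sqrt{n}}$
  Standard error of the statistic: $\dfrac{s_d}{\sqrt{n}}$
  Standardized statistic with s.e.: $t$
Difference Between Means
  Parameter: $\mu_1 - \mu_2$    Statistic: $\bar{x}_1 - \bar{x}_2$
  Standard deviation of the statistic: $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
  Standard error of the statistic: $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
  Standardized statistic with s.e.: $t$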

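The three situations involving means:
One Mean
  Parameter: $\mu$    Statistic: $\bar{x}$
  Standard deviation of the statistic: $\dfrac{\sigma}{\sqrt{n}}$
  Standard error of the statistic: $\dfrac{s}{\sqrt{n}}$
  z or t? (with s.e.): $t$
Mean Difference, Paired Data
  Parameter: $\mu_d$    Statistic: $\bar{d}$
  Standard deviation of the statistic: $\dfrac{\sigma_d}{\sqrt{n}}$
  Standard error of the statistic: $\dfrac{s_d}{\sqrt{n}}$
  z or t? (with s.e.): $t$
Difference Between Means
  Parameter: $\mu_1 - \mu_2$    Statistic: $\bar{x}_1 - \bar{x}_2$
  Standard deviation of the statistic: $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
  Standard error of the statistic: $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
  z or t? (with s.e.): $t$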