Linear Combinations of Random Variables and Sampling (100 points)

Economics 30330: Statistics for Economics
Problem Set 6
University of Notre Dame
Instructor: Julio Garín
Spring 2012

1. Four-part problem. Go get some coffee before starting these.

(a) On the throw of a fair die, the expected value of the number showing is 3.5 and the standard deviation is 1.71. What are the expected value and standard deviation of the sum of the values from the throw of a pair of dice?

Let $D_i$ be the value on die $i$, and let $T$ be the sum of both dice, i.e. $T = D_1 + D_2$. Therefore,
$$E(T) = E(D_1) + E(D_2) = 3.5 + 3.5 = 7.$$
For the variance, the two throws are independent, so $\mathrm{Cov}(D_1, D_2) = 0$. Applying the rule for the variance of a linear combination of random variables covered in class (you should know how to derive it by now),
$$\mathrm{Var}(T) = \mathrm{Var}(D_1 + D_2) = \mathrm{Var}(D_1) + \mathrm{Var}(D_2) = 1.71^2 + 1.71^2 \approx 5.85.$$
So the standard deviation is $\sqrt{5.85} \approx 2.42$.

(b) Suppose $Y_1$ and $Y_2$ are independent, $\mathrm{Var}(Y_1) = \mathrm{Var}(Y_2) = \sigma_Y^2$, and $Z_1 = Y_1 + Y_2$. What is $\mathrm{Var}(Z_1)$? How does this compare to the result found in part (a)?

If $Y_1$ and $Y_2$ are independent, then $\mathrm{Cov}(Y_1, Y_2) = 0$, and following the same procedure as in part (a),
$$\mathrm{Var}(Z_1) = \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) = 2\sigma_Y^2.$$
This generalizes part (a): with $\sigma_Y^2 = 1.71^2$ it gives the same $2(1.71^2) \approx 5.85$.

(c) Generalize the previous results. Suppose $Y_1, Y_2, \dots, Y_n$ are independent, $\mathrm{Var}(Y_1) = \mathrm{Var}(Y_2) = \cdots = \mathrm{Var}(Y_n) = \sigma_Y^2$, and $Z_2 = Y_1 + Y_2 + \cdots + Y_n$. What is $\mathrm{Var}(Z_2)$?
$$\mathrm{Var}(Z_2) = \mathrm{Var}(Y_1 + Y_2 + \cdots + Y_n) = \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) + \cdots + \mathrm{Var}(Y_n) = \sigma_Y^2 + \sigma_Y^2 + \cdots + \sigma_Y^2 = n\sigma_Y^2.$$

(d) Again, let's keep generalizing these results. Suppose $Y_1, Y_2, \dots, Y_n$ are independent, $\mathrm{Var}(Y_1) = \mathrm{Var}(Y_2) = \cdots = \mathrm{Var}(Y_n) = \sigma_Y^2$, $Z_2 = Y_1 + Y_2 + \cdots + Y_n$, and $Z_3 = \frac{1}{n}(Y_1 + Y_2 + \cdots + Y_n)$. What is $\mathrm{Var}(Z_3)$?
$$\mathrm{Var}(Z_3) = \mathrm{Var}\left[\frac{1}{n}(Y_1 + Y_2 + \cdots + Y_n)\right] = \frac{1}{n^2}\mathrm{Var}(Y_1) + \frac{1}{n^2}\mathrm{Var}(Y_2) + \cdots + \frac{1}{n^2}\mathrm{Var}(Y_n) = \frac{1}{n^2}\sigma_Y^2 + \frac{1}{n^2}\sigma_Y^2 + \cdots + \frac{1}{n^2}\sigma_Y^2 = \frac{n}{n^2}\sigma_Y^2 = \frac{\sigma_Y^2}{n}.$$
Congratulations, you just derived the formula for the variance of the sample mean!

2. In 2003, the average annual salary 10 years after graduation was \$168,000 for men and \$117,000 for women. The standard deviation of male graduates' salaries is \$40,000, and that of female graduates' salaries is \$25,000.

(a) What is the probability that a random sample of 40 males will give a sample mean within \$10,000 of \$168,000?

Using the CLT, the sampling distribution of $\bar{x}$ is
$$\bar{x} \sim N\left(168000, \frac{40000^2}{40}\right).$$
Hence the probability can be obtained as usual:
$$P(158000 \le \bar{x} \le 178000) = P\left(\frac{158000 - 168000}{40000/\sqrt{40}} \le z \le \frac{178000 - 168000}{40000/\sqrt{40}}\right) = P(-1.58 \le z \le 1.58) = 1 - 2\Phi(-1.58) = 1 - 2(0.0571) = 0.8858.$$

(b) What is the probability that a random sample of 40 females will give a sample mean within \$10,000 of \$117,000?

Now the sampling distribution is given by
$$\bar{x} \sim N\left(117000, \frac{25000^2}{40}\right),$$
$$P(107000 \le \bar{x} \le 127000) = P\left(\frac{107000 - 117000}{25000/\sqrt{40}} \le z \le \frac{127000 - 117000}{25000/\sqrt{40}}\right) = P(-2.53 \le z \le 2.53) = 1 - 2\Phi(-2.53) = 1 - 2(0.0057) = 0.9886.$$

(c) What do you prefer: giraffes or rhinos? Rhinos.
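A short Python sketch can double-check Problem 1 by simulation and reproduce the CLT probabilities in Problem 2 (a) and (b). It assumes Python 3.8+ for `statistics.NormalDist`; the helper name `prob_within`, the number of simulated throws, and the seed are illustrative choices, not part of the problem set.

```python
import random
from statistics import NormalDist

# Problem 1 (a): simulate many throws of a pair of fair dice and compare the
# simulated variance of the sum with Var(D1) + Var(D2) for independent dice.
random.seed(2012)  # arbitrary seed so the sketch is reproducible
throws = 100_000
sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(throws)]
mean_sum = sum(sums) / throws
sim_var = sum((s - mean_sum) ** 2 for s in sums) / throws
rule_var = 2 * 1.71 ** 2  # uses the rounded standard deviation from part (a)


def prob_within(sigma, n, half_width):
    """P(|x_bar - mu| <= half_width) under the CLT approximation
    x_bar ~ N(mu, sigma^2 / n). Hypothetical helper for this sketch."""
    se = sigma / n ** 0.5          # standard error of the mean
    z = half_width / se            # standardized half-width
    return 2 * NormalDist().cdf(z) - 1   # equals 1 - 2 * Phi(-z)


# Problem 2 (a) and (b): samples of 40, within 10,000 of the population mean.
p_men = prob_within(40_000, 40, 10_000)
p_women = prob_within(25_000, 40, 10_000)

print(f"1(a) simulated Var(T) = {sim_var:.3f}, rule gives {rule_var:.3f}")
print(f"2(a) P = {p_men:.4f}, 2(b) P = {p_women:.4f}")
```

The simulated variance should land near 35/6 ≈ 5.83, the exact value for fair dice, slightly below the 5.85 obtained from the rounded 1.71. The probabilities come out near 0.886 and 0.989; small gaps from the hand calculations reflect rounding $z$ to two decimals for the table lookup.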

(d) What is the probability that a random sample of 100 males will give a sample mean less than \$164,000?

The sampling distribution is
$$\bar{x} \sim N\left(168000, \frac{40000^2}{100}\right),$$
$$P(\bar{x} \le 164000) = P\left(z \le \frac{164000 - 168000}{40000/\sqrt{100}}\right) = P(z \le -1) = \Phi(-1) = 0.1587.$$

3. Suppose in a population the math ($M$) and verbal ($V$) SAT scores have the following moments: $E(M) = 510$, $E(V) = 475$, $\mathrm{Var}(M) = 750$, $\mathrm{Var}(V) = 610$, and $\rho_{M,V} = 0.4$. What are the expected value and variance of the total SAT score, $T = M + V$?
$$E(T) = E(M + V) = E(M) + E(V) = 510 + 475 = 985$$
$$\mathrm{Var}(T) = \mathrm{Var}(M) + \mathrm{Var}(V) + 2\,\mathrm{Cov}(M, V) = \sigma_M^2 + \sigma_V^2 + 2\rho_{M,V}\sigma_M\sigma_V = 750 + 610 + 2(0.4)\sqrt{750}\sqrt{610} \approx 1901.1$$

4. A random sample of size $N$ is selected from a population with $\sigma = 10$. What is the standard error of the mean if

(a) $N = 500$?
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{500}} = 0.447$$

(b) $N = 5{,}000$?
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{5000}} = 0.1414$$

(c) $N \to \infty$? Compare this result with the previous parts.

As $n \to \infty$, $\sigma_{\bar{x}} \to 0$. This was illustrated in parts (a) and (b), where the standard error decreased as the sample size increased. Put differently, our estimate of the population mean becomes more precise as the sample size increases.
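The computations in Problem 2(d), Problem 3, and Problem 4 reduce to a normal CDF lookup, the variance rule for a sum of correlated variables, and $\sigma/\sqrt{n}$. A minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`; `standard_error` is a hypothetical helper name, and $N = 500{,}000$ is only an illustrative stand-in for "very large $N$".

```python
import math
from statistics import NormalDist

# Problem 2 (d): x_bar ~ N(168000, 40000^2 / 100), so compute P(x_bar < 164000).
se_2d = 40_000 / math.sqrt(100)                  # standard error = 4,000
p_2d = NormalDist(mu=168_000, sigma=se_2d).cdf(164_000)

# Problem 3: Var(M + V) = Var(M) + Var(V) + 2 * rho * sd_M * sd_V.
mean_m, mean_v = 510, 475
var_m, var_v, rho = 750, 610, 0.4
mean_t = mean_m + mean_v
var_t = var_m + var_v + 2 * rho * math.sqrt(var_m) * math.sqrt(var_v)


def standard_error(sigma, n):
    """Standard error of the sample mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Problem 4: the standard error shrinks toward 0 as n grows.
se_by_n = {n: standard_error(10, n) for n in (500, 5_000, 500_000)}

print(f"2(d) P = {p_2d:.4f}")
print(f"3    E(T) = {mean_t}, Var(T) = {var_t:.2f}")
for n, se in se_by_n.items():
    print(f"4    n = {n}: SE = {se:.4f}")
```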

5. Final grades in a class are a weighted average of the midterm (25%) and final (75%) exams. Each exam has 100 possible points. Suppose the average and standard deviation of scores on the midterm were 71 and 19, respectively, while the values for the final exam were 69 and 23. Suppose further that the correlation coefficient between the two exams is 0.50.

(a) What are the mean and standard deviation of the final grades in the class?

We have $G = 0.25M + 0.75F$, hence the expected value is
$$E(G) = E(0.25M + 0.75F) = 0.25E(M) + 0.75E(F) = 0.25\mu_M + 0.75\mu_F = 0.25(71) + 0.75(69) = 69.5.$$
Similarly, for the variance,
$$\mathrm{Var}(G) = 0.25^2\sigma_M^2 + 0.75^2\sigma_F^2 + 2(0.25)(0.75)\sigma_{FM}.$$
Although we do not know $\mathrm{Cov}(F, M)$ directly, we know that $\sigma_{FM} = \rho_{FM}\sigma_F\sigma_M$. Therefore we can substitute those values:
$$\sigma_G^2 = 0.25^2(19^2) + 0.75^2(23^2) + 2(0.25)(0.75)(0.5)(19)(23) \approx 402,$$
which gives a standard deviation of $\sigma_G = \sqrt{\sigma_G^2} = \sqrt{402} \approx 20.05$.

(b) Suppose the final grades are normally distributed with the mean and variance found in part (a). What fraction of students will get an A if they need more than 93 points to obtain that grade?

By assumption (and, alternatively, because a linear combination of jointly normal random variables is itself normal), the distribution of final grades is
$$G \sim N(69.5, 402).$$
The fraction can then be calculated as a probability:
$$P(G > 93) = 1 - P(G \le 93) = 1 - P\left(z \le \frac{93 - 69.5}{\sqrt{402}}\right) \approx 1 - \Phi(1.17) = 1 - 0.8790 = 0.1210.$$
So about 12.1% of students will get an A in this class if 93 is the score needed.

6. Show that the sample variance is an unbiased estimator of the population variance.

We want to show that $E(s^2) = \sigma^2$, where $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$.
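Before the algebraic proof, the claim can be checked by simulation: draw many samples, average $s^2$ computed with divisor $n - 1$, and compare it with the version that divides by $n$. This is a sketch under illustrative assumptions (a normal population with $\mu = 0$ and $\sigma = 2$, samples of size $n = 5$); the result itself does not require normality, and `sample_variance` is a hypothetical helper.

```python
import random


def sample_variance(xs, ddof):
    """Sum of squared deviations divided by len(xs) - ddof
    (ddof=1 gives s^2, ddof=0 divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)


# Illustrative population: normal with mu = 0 and sigma = 2, so sigma^2 = 4.
random.seed(6)
mu, sigma, n, reps = 0.0, 2.0, 5, 20_000
s2_values, divide_by_n = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2_values.append(sample_variance(sample, ddof=1))
    divide_by_n.append(sample_variance(sample, ddof=0))

mean_s2 = sum(s2_values) / reps        # should be close to sigma^2 = 4
mean_biased = sum(divide_by_n) / reps  # should be close to (n-1)/n * 4 = 3.2
print(f"average s^2 = {mean_s2:.3f}, average with divisor n = {mean_biased:.3f}")
```

The divisor-$n$ average settles near $\frac{n-1}{n}\sigma^2$, which is exactly the bias that dividing by $n - 1$ removes.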

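$$E(s^2) = E\left[\frac{\sum (x_i - \bar{x})^2}{n - 1}\right] \implies (n-1)E(s^2) = E\left[\sum (x_i - \bar{x})^2\right]$$
$$= E\left[\sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2)\right] = E\left[\sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2\right] = E\left[\sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2\right] = \sum E(x_i^2) - nE(\bar{x}^2),$$
where the third step uses $\sum x_i = n\bar{x}$.

Recall that $\mathrm{Var}(X) = E[X^2] - [E(X)]^2 \implies E[X^2] = \mathrm{Var}(X) + [E(X)]^2$. It follows that
$$E[x_i^2] = \mathrm{Var}(x_i) + [E(x_i)]^2 = \sigma^2 + \mu^2,$$
because $\mathrm{Var}(x_i) = \sigma^2$ and $E(x_i) = \mu$. Similarly,
$$E[\bar{x}^2] = \mathrm{Var}(\bar{x}) + [E(\bar{x})]^2 = \frac{\sigma^2}{n} + \mu^2,$$
because $\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}$ and $E(\bar{x}) = \mu$. Substituting,
$$(n-1)E(s^2) = n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2 = (n-1)\sigma^2 \implies E(s^2) = \sigma^2.$$
Therefore, $s^2$ is an unbiased estimator of the population variance.

7. A paint manufacturer advertises that their exterior paint will last 5 years. Assume paint life is normally distributed with a standard deviation of 0.5 years.

(a) Suppose a local TV reporter tests this claim, paints one house, and notices that the house paint only lasts 4.5 years. Would you consider this evidence against the manufacturer's claim?

We first need to describe the sampling (or probability) distribution of the sample mean:
$$\bar{x} \sim N\left(5, \frac{0.5^2}{1}\right).$$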

Now we are ready to obtain the probability:
$$P(\bar{x} \le 4.5) = P\left(z \le \frac{4.5 - 5}{0.5/\sqrt{1}}\right) = P(z \le -1) = \Phi(-1) = 0.1587.$$
There is a 15.87% chance that the paint on one house, chosen at random, will last 4.5 years or less even if the manufacturer's claim is true. This is not strong evidence against the manufacturer.

(b) Suppose that instead of the TV reporter's test, a consumer magazine paints 10 houses and finds that the average life of the paint is 4.5 years. Would you consider this evidence against the manufacturer's claim?

Since the sample size changed, so did the sampling distribution, which is now more concentrated around the mean given the reduction in the standard error, $\sigma_{\bar{x}}$, due to the larger sample. We have
$$\bar{x} \sim N\left(5, \frac{0.5^2}{10}\right),$$
$$P(\bar{x} \le 4.5) = P\left(z \le \frac{4.5 - 5}{0.5/\sqrt{10}}\right) = P(z \le -3.16) = 0.0008.$$
The manufacturer's claim is probably false: if the mean paint life really were 5 years, it would be very unlikely for the paint on 10 randomly selected houses to last 4.5 years or less on average.

8. A water bottler sells spring water in 1-liter bottles. The machines are set such that, on average, 1.02 liters are dispensed, with a known standard deviation of 0.06 liters. The firm routinely collects a random sample of bottles and tests whether the machine is dispensing correctly. Suppose the machine is working properly ($\mu$ and $\sigma$ are known). What is the chance that in a random sample of 16 bottles, the sample mean will be within 0.05 liters of the specified level?

According to the central limit theorem, $\bar{x}$ is approximately normally distributed with a mean of $\mu$ and a variance of $\sigma^2/n$. When the machine is working properly, $\mu = 1.02$ and $\sigma = 0.06$. In this case $n = 16$, so $\sqrt{n} = 4$. In other words, by the CLT, $\bar{x} \sim N(1.02, 0.06^2/16)$, and
$$P(0.97 \le \bar{x} \le 1.07) = P\left(\frac{0.97 - 1.02}{0.06/4} \le z \le \frac{1.07 - 1.02}{0.06/4}\right) = \Phi(3.33) - \Phi(-3.33) = 0.9996 - 0.0004 = 0.9992.$$

9. A student surveys 60 undergraduates to determine the average number of drinks consumed over the past two weeks ($\bar{x}$). Based on previous surveys, the student believes the standard deviation of drinks per week in the population is 7. The researcher would like to reduce the standard error of $\bar{x}$ by increasing the sample size.
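The sample-size arithmetic for this problem follows from $\sigma_{\bar{x}} = \sigma/\sqrt{n}$: the smallest $n$ that achieves a target standard error is $(\sigma/\text{target})^2$, rounded up. A minimal sketch; the helper names are hypothetical, and, as in the worked answers, it rounds the original standard error to 0.904 before halving or quartering it.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def required_n(sigma, target_se):
    """Smallest integer n with sigma / sqrt(n) <= target_se."""
    return math.ceil((sigma / target_se) ** 2)


sigma, n0 = 7, 60
se0 = standard_error(sigma, n0)                  # about 0.904
se0_rounded = round(se0, 3)                      # 0.904, as in the worked answer
n_half = required_n(sigma, se0_rounded / 2)      # target SE 0.452
n_quarter = required_n(sigma, se0_rounded / 4)   # target SE 0.226 (cut by 75%)
print(se0_rounded, n_half, n_quarter)
```

Because the standard error scales with $1/\sqrt{n}$, halving it always takes four times the sample and cutting it by 75% takes sixteen times, which is why the answers are essentially $4 \times 60 = 240$ and $16 \times 60 = 960$.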

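(a) If the student's beliefs are correct, what will be the standard error of $\bar{x}$?
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{7}{\sqrt{60}} = 0.904$$

(b) How many students would the student have to survey to cut the standard error of $\bar{x}$ in half?

We would like to reduce the standard error to 0.452, and since the population standard deviation remains the same, we need
$$\frac{7}{\sqrt{n}} = 0.452 \implies \sqrt{n} = \frac{7}{0.452} \implies n = \left(\frac{7}{0.452}\right)^2 \approx 239.8.$$
The researcher will have to survey at least 240 students to cut the standard error in half.

(c) How many students would the researcher have to survey to cut the standard error by 75%?

A reduction of 75% is equivalent to saying that the new standard error has to be 25% of the original value, or $0.25(0.904) = 0.226$. As in part (b),
$$\frac{7}{\sqrt{n}} = 0.226 \implies \sqrt{n} = \frac{7}{0.226} \implies n = \left(\frac{7}{0.226}\right)^2 \approx 959.35.$$
The student needs to survey at least 960 students to cut the standard error by 75%.

10. The covariance between $x$ and $y$ represents the average of the products of the deviations of $x$ and $y$ from their respective means. In other words, it is an average of the cross-products $(x_i - \bar{x})(y_i - \bar{y})$. Show that the sum of cross-products can be written either as $\sum (x_i - \bar{x})y_i$ or as $\sum (y_i - \bar{y})x_i$.
$$\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i y_i - x_i\bar{y} - \bar{x}y_i + \bar{x}\bar{y}) = \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y} = \sum x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} = \sum x_i y_i - n\bar{x}\bar{y}.$$
Since $n\bar{x}\bar{y} = \bar{x}\sum y_i = \bar{y}\sum x_i$,
$$\sum x_i y_i - n\bar{x}\bar{y} = \sum x_i y_i - \bar{x}\sum y_i = \sum (x_i - \bar{x})y_i,$$
and likewise
$$\sum x_i y_i - n\bar{x}\bar{y} = \sum x_i y_i - \bar{y}\sum x_i = \sum (y_i - \bar{y})x_i.$$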
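The identity in Problem 10 can also be checked numerically; any data set works, since the algebra holds for every set of numbers. A minimal sketch with randomly generated, purely illustrative values:

```python
import math
import random

random.seed(10)  # illustrative data only
x = [random.uniform(0, 10) for _ in range(50)]
y = [random.uniform(0, 10) for _ in range(50)]
xbar = sum(x) / len(x)
ybar = sum(y) / len(y)

# Three ways of writing the sum of cross-products.
full = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
using_y = sum((xi - xbar) * yi for xi, yi in zip(x, y))
using_x = sum((yi - ybar) * xi for xi, yi in zip(x, y))

# The three sums agree up to floating-point rounding.
identity_holds = (math.isclose(full, using_y, abs_tol=1e-9)
                  and math.isclose(full, using_x, abs_tol=1e-9))
print(full, using_y, using_x, identity_holds)
```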

Extra Problems

1. In 2000, a Time/CNN poll surveyed 589 voters. Suppose the population proportion for a candidate is $p = 0.5$, and let $\hat{p}$ be the sample proportion of likely voters that favor this candidate.

(a) Show the sampling distribution of $\hat{p}$.
$$\sigma_{\hat{p}}^2 = \frac{p(1-p)}{n} = \frac{0.5(0.5)}{589} = 0.00042 \implies \hat{p} \sim N(0.5, 0.00042)$$

(b) What is the probability that the poll will provide a sample proportion within $\pm 0.04$ of the population proportion?

With $\sigma_{\hat{p}} = \sqrt{0.00042} \approx 0.0206$,
$$P(0.5 - 0.04 \le \hat{p} \le 0.5 + 0.04) = P\left(\frac{-0.04}{\sigma_{\hat{p}}} \le z \le \frac{0.04}{\sigma_{\hat{p}}}\right) = P(-1.942 \le z \le 1.942) = 1 - 2\Phi(-1.942) = 1 - 2(0.0262) = 0.9476.$$

2. A random sample of size $N = 1{,}000$ is drawn from a population with $p = 0.4$, where $p$ is the proportion of the population that has a certain characteristic.

(a) What is the expected value of $\hat{p}$ and its standard error?
$$E(\hat{p}) = p = 0.4, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.4(0.6)}{1000}} = 0.01549$$

(b) Illustrate the sampling distribution of $\hat{p}$.
$$\hat{p} \sim N(0.4, 0.01549^2)$$
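Both extra problems use the normal approximation $\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$. A sketch assuming Python 3.8+ for `statistics.NormalDist`; `phat_distribution` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def phat_distribution(p, n):
    """Normal approximation to the sampling distribution of p-hat:
    mean p, standard error sqrt(p * (1 - p) / n)."""
    return NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))


# Extra Problem 1: n = 589, p = 0.5.
poll = phat_distribution(0.5, 589)
p_within = poll.cdf(0.5 + 0.04) - poll.cdf(0.5 - 0.04)

# Extra Problem 2: n = 1,000, p = 0.4.
survey = phat_distribution(0.4, 1_000)

print(f"1(a) variance = {poll.variance:.5f}")
print(f"1(b) P(|p_hat - 0.5| <= 0.04) = {p_within:.4f}")
print(f"2    mean = {survey.mean}, standard error = {survey.stdev:.5f}")
```

For part (b) of Extra Problem 2, evaluating `survey.pdf` on a grid of values around 0.4 (and plotting it with any plotting library available) would trace the bell curve centered at 0.4 with a standard error of about 0.0155.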