Which of the following provides the most reasonable approximation to the least squares regression line? (a) y=50+10x (b) Y=50+x (d) Y=1+50x

Which of the following provides the most reasonable approximation to the least squares regression line? (a) Y=50+10x (b) Y=50+x (c) Y=10+50x (d) Y=1+50x (e) Y=10+x

In simple linear regression the model being assumed relates the dependent variable, y, to the independent variable, x, according to the following relationship: $y_i = \alpha + \beta x_i + \varepsilon_i$. For testing hypotheses about $\alpha$ and $\beta$ based on the least squares estimates, it is necessary to make the following assumptions about the errors: (a) They have a mean of zero (b) They are normally distributed (c) They have a common variance (d) All of the above (e) Least squares is a purely mathematical technique so no assumptions are required.

A regression of the amount of calories in a serving of breakfast cereal versus the amount of fat gave the following results: Calories = 97.1053 + 9.6535 Fat. Which of the following is FALSE: (a) It is estimated that for every additional gram of fat in the cereal, the number of calories increases by about 9. (b) It is estimated that in cereals with no fat, the total amount of calories is about 97. (c) If a cereal has 2 g of fat, then it is estimated that the total number of calories is about 115. (d) If a cereal has about 145 calories then this equation indicates that it has about 5 grams of fat. (e) One cereal has 140 calories and 5 g of fat. Its residual is about 5 cal.

The age and salary of the chief executive officers (CEOs) of small companies were determined. Small companies were defined as those with annual sales greater than five and less than $350 million. Companies were ranked according to 5-year average return on investment. This data covers the first 60 ranked firms. Variables: Age: The age of the CEO in years. Salary: The salary of the chief executive officer (including bonuses) in thousands of dollars.
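Returning to the breakfast cereal regression, the fitted values and residuals that the answer choices refer to can be checked with a few lines of arithmetic. The sketch below uses only the coefficients quoted in the question; the function names are illustrative and not part of the original material.

```python
# Coefficients of the fitted line quoted in the question.
INTERCEPT = 97.1053   # estimated calories for a cereal with 0 g of fat
SLOPE = 9.6535        # estimated extra calories per additional gram of fat


def predicted_calories(fat_grams):
    """Return the fitted value on the regression line."""
    return INTERCEPT + SLOPE * fat_grams


def residual(observed_calories, fat_grams):
    """Residual = observed value minus fitted value."""
    return observed_calories - predicted_calories(fat_grams)


fitted_5g = predicted_calories(5)   # fitted calories at 5 g of fat
resid = residual(140, 5)            # cereal with 140 calories and 5 g of fat

print(round(fitted_5g, 2))  # 145.37
print(round(resid, 2))      # -5.37
```

The sign of a residual matters: a negative residual means the observed point lies below the fitted line.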

The data give the average price received by fishermen and vessel owners for several species of fish in 1970 and 1980. Variables: Fish: The species of fish. Price.1970: Price in cents per pound in 1970. Price.1980: Price in cents per pound in 1980.

Statistics 111 - Lecture 17: Testing Relationships between Variables

Example: Education and Mortality

T = b / SE(b) = -37.6 / 8.307 = -4.53

Confidence Intervals for Coefficients

JMP output also gives the information needed to make confidence intervals for slope and intercept.

100·C% confidence interval for slope: b ± t* × SE(b). The multiple t* comes from a t distribution with n-2 degrees of freedom.

100·C% confidence interval for intercept: a ± t* × SE(a). Usually, we are less interested in the intercept, but it might be needed in some situations.

Confidence Intervals for Example

We have n = 60, so our multiple t* comes from a t distribution with d.f. = 58. For a 95% C.I., t* = 2.00.

95% confidence interval for slope: b ± t* × SE(b) = (-37.6 ± 2.0 × 8.31) = (-54.2, -21.0). Note that this interval does not contain zero!

95% confidence interval for intercept: a ± t* × SE(a) = (1353 ± 2.0 × 91.4) = (1170, 1536)

Another Example: Draft Lottery

Is the negative linear association we see between birthday and draft order statistically significant? T = b / SE(b) = -0.226 / 0.051 = -4.4

The p-value is < 0.0001, so we reject the null hypothesis and conclude that there is a statistically significant linear relationship between birthday and draft order. This is statistical evidence that the randomization was not done properly!

95% confidence interval for slope: b ± t* × SE(b) = (-0.23 ± 1.98 × 0.05) = (-0.33, -0.13). The multiple t* = 1.98 comes from a t distribution with n-2 = 363 d.f. The confidence interval does not contain zero, which we expected from our hypothesis test.

Education Example

Dataset of 78 seventh-graders: relationship between IQ and GPA. There is a clear positive association between IQ and grade point average.

Education Example

Is the positive linear association we see between GPA and IQ statistically significant? T = b / SE(b) = 0.101 / 0.0141 = 7.14

The p-value is < 0.0001, so we reject the null hypothesis and conclude that there is a statistically significant positive relationship between IQ and GPA.

95% confidence interval for slope: b ± t* × SE(b) = (0.101 ± 1.99 × 0.014) = (0.073, 0.129). The multiple t* = 1.99 comes from a t distribution with n-2 = 76 d.f. The confidence interval does not contain zero, which we expected from our hypothesis test.
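The test statistics and confidence intervals in the examples above all follow the same recipe, so they can be reproduced with two small helper functions. This is a minimal sketch: the function names are illustrative, the t* multiples are typed in from the slides rather than computed, and the negative sign of the education and mortality slope is an assumption taken from the interval reported for that example.

```python
def t_statistic(estimate, std_error):
    """Test statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


def confidence_interval(estimate, std_error, t_star):
    """Return (low, high) for estimate +/- t_star * std_error."""
    margin = t_star * std_error
    return (estimate - margin, estimate + margin)


# Education and mortality: n = 60, so t* = 2.00 with 58 degrees of freedom.
t_mortality = t_statistic(-37.6, 8.307)
slope_ci_mortality = confidence_interval(-37.6, 8.31, 2.00)
intercept_ci_mortality = confidence_interval(1353, 91.4, 2.00)

# IQ and GPA: n = 78, so t* = 1.99 with 76 degrees of freedom.
slope_ci_gpa = confidence_interval(0.101, 0.014, 1.99)


def contains_zero(interval):
    """True if zero lies inside the interval (test would not reject)."""
    low, high = interval
    return low <= 0 <= high


print(round(t_mortality, 2))
print(slope_ci_mortality, contains_zero(slope_ci_mortality))
print(intercept_ci_mortality)
print(slope_ci_gpa, contains_zero(slope_ci_gpa))
```

Checking whether the interval contains zero gives the same verdict as the two-sided hypothesis test at the matching significance level, which is why the slides note that neither slope interval contains zero.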

What is RMSE?

Root mean square error: the RMSE is the estimated standard deviation of the residuals. Remember that we assumed that the errors are normally distributed with mean 0 and variance $\sigma^2$. RMSE is the estimate of $\sigma$. Why?

$$\mathrm{RMSE} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n} (e_i - 0)^2}$$

Recall how we estimated the sample standard deviation:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

The sample standard deviation is a measure of how much the values deviate from the mean value. It is the average deviation from the average.
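To make the parallel between the RMSE and the sample standard deviation concrete, the sketch below computes both for a tiny made-up data set. The data values and function names are hypothetical and chosen only so the arithmetic is easy to follow by hand; they do not come from any of the lecture's examples.

```python
import math


def least_squares_line(xs, ys):
    """Return (a, b): intercept and slope of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def rmse(xs, ys, a, b):
    """Deviations are taken from the line value a + b*x; divide by n - 2."""
    n = len(xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))


def sample_sd(ys):
    """Deviations are taken from the overall mean; divide by n - 1."""
    n = len(ys)
    y_bar = sum(ys) / n
    return math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
rmse_value = rmse(xs, ys, a, b)
sd_value = sample_sd(ys)
print(a, b, rmse_value, sd_value)
```

The only differences between the two functions are what each observation is compared against (the line value versus the overall mean) and the divisor (n - 2 versus n - 1).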

Root mean square error

$$\mathrm{RMSE} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

In a regression each $y_i$ has a different mean (the line value is the mean value). So we need to subtract from each $y_i$ the line value, $\hat{y}_i = a + b x_i$, at that point. That is the deviation of each point from its mean value. Then we square it. Why? Then we sum over all the observations, divide by n-2, and take the square root. That gives us the average deviation from the line: the RMSE!

Prediction Interval

We want to predict a future value of the response y:
1. We can provide the average predicted value using the line: $\hat{y} = a + b x$.
2. We can provide the 95% prediction interval: $(\hat{y} - 2\,\mathrm{RMSE},\ \hat{y} + 2\,\mathrm{RMSE})$.

The prediction interval provides an interval in which 95% of the data values will lie. This is an approximation to the true interval but it is a good one!
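The approximate 95% prediction interval described above is simple enough to write as a short function. In the sketch below the intercept, slope, RMSE, and new x value are hypothetical placeholders, not estimates from any data set in this lecture, and the function name is illustrative.

```python
def approx_prediction_interval(a, b, rmse, x_new):
    """Approximate 95% prediction interval: y_hat +/- 2 * RMSE.

    a, b   -- intercept and slope of the fitted line
    rmse   -- root mean square error of the regression
    x_new  -- value of the explanatory variable for the future observation
    """
    y_hat = a + b * x_new
    return y_hat, (y_hat - 2 * rmse, y_hat + 2 * rmse)


# Hypothetical values, for illustration only.
y_hat, (low, high) = approx_prediction_interval(a=2.2, b=0.6, rmse=0.9, x_new=6)
print(y_hat, low, high)
```

The exact interval uses a t multiple and also widens as x moves away from the mean of the observed x values, so this shortcut is most reasonable for moderately large samples and x values inside the observed range.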

Prediction Interval

What is the predicted GPA value for a person with IQ of 110? What is the 95% prediction interval?

Example! Let's look at some examples: housing price data, undergraduate tuition, physics scores.