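MgtOp 215 Chapter 3 Dr. Ahn

Consider two random variables X and Y with $E(X) = \mu_X$, $E(Y) = \mu_Y$, $Var(X) = \sigma_X^2$, $Var(Y) = \sigma_Y^2$. In order to study the relationship between the two random variables, we need a numerical measure that describes the relationship. The covariance between two random variables which are observed in pairs is such a measure, and is defined as follows:

$$\sigma_{XY} = Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

Note that $\sigma_{XY} > 0$ implies, on average, $(X - \mu_X)(Y - \mu_Y) > 0$, that is, $X - \mu_X$ and $Y - \mu_Y$ tend to have the same sign, that is, values of X larger (smaller) than its mean tend to be associated with values of Y larger (smaller) than its mean. Also $\sigma_{XY} < 0$ implies, on average, $(X - \mu_X)(Y - \mu_Y) < 0$, that is, values of X larger (smaller) than its mean tend to be associated with values of Y smaller (larger) than its mean. Therefore, a positive covariance implies a positive relation between two random variables, and a negative covariance a negative relation.

The sample covariance is

$$s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

when a random sample consists of n pairs of observations, $(X_i, Y_i)$ for $i = 1, \dots, n$.

Example 1. Find the sample covariance between the square footage in thousands (X) and the annual sales in millions of dollars (Y).

Store     X      Y       XY
1        1.7    3.7     6.29
2        1.6    3.9     6.24
3        2.8    6.7    18.76
4        5.6    9.5    53.20
5        1.3    3.4     4.42
6        2.2    5.6    12.32
7        1.3    3.7     4.81
8        1.1    2.7     2.97
9        3.2    5.5    17.60
10       1.5    2.9     4.35
11       5.2   10.7    55.64
12       4.6    7.6    34.96
13       5.8   11.8    68.44
14       3.0    4.1    12.30
Sum     40.9   81.8   302.30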
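The sample covariance formula can be checked by direct computation. The following is a minimal sketch, not part of the original notes: the 14 data pairs are the values read from the Example 1 table (treat them as an assumption if your copy differs), and the function name `sample_covariance` is chosen here for illustration.

```python
# Square footage (thousands) and annual sales ($ millions) for the 14 stores,
# as read from the Example 1 table (assumed values).
x = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
y = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]


def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    return cross / (n - 1)


s_xy = sample_covariance(x, y)
print(f"sample covariance s_XY = {s_xy:.4f}")
```

The result is positive, which agrees with the interpretation above: larger stores tend to have larger annual sales.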
C Homework: Do the following problem (You may use MS Excel).

Problem 1. Using the data in the above example, obtain the covariance between the square footage and the annual sales in Euros and compare it with the covariance obtained in Example 1.

To compute the sample covariance using MS Excel, do Tools>Data Analysis>Covariance, and enter the range of x- and y-variables. For the data in Example 1 we get

         X           Y
X    2.708827
Y    4.523367    8.353878

Note: Unfortunately, this version of MS Excel used n (sample size) as the denominator instead of n - 1. Therefore, we need to multiply the numbers by n (in this example 14) and divide by n - 1 in order to get the correct covariance and variances:

         X           Y
X    2.917198
Y    4.871319    8.996484

There are two Excel functions, COVARIANCE.P and COVARIANCE.S, for the population and the sample covariance, respectively.

To make a scatter plot of the above data, highlight the data and choose Chart Wizard>XY (Scatter). Then for the above data you will get

[Scatter plot of annual sales (Y) against square footage (X)]

As you will see in Problem 1, the magnitude of the covariance depends on the units in which the variables are quoted. Therefore, the magnitude of the covariance cannot effectively represent the strength of the relationship.
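Two points in the passage above can be illustrated numerically: the n versus n - 1 correction, and the dependence of the covariance on units. This is a sketch under stated assumptions, not the notes' own method. The data are the 14 pairs read from Example 1, and `EUR_PER_USD` is a hypothetical exchange rate picked only for illustration (use the actual rate when working Problem 1).

```python
# Data as read from the Example 1 table (assumed values).
x = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
y = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]

EUR_PER_USD = 0.9  # hypothetical exchange rate, for illustration only


def covariance(xs, ys, sample=True):
    """Covariance with denominator n - 1 (sample=True) or n (sample=False)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    return cross / (n - 1 if sample else n)


n = len(x)
pop_cov = covariance(x, y, sample=False)   # denominator n
corrected = pop_cov * n / (n - 1)          # multiply by n, divide by n - 1

# Re-express sales in euros: every Y value is multiplied by the same constant.
y_eur = [EUR_PER_USD * v for v in y]
cov_eur = covariance(x, y_eur)

print(f"denominator n:     {pop_cov:.6f}")
print(f"corrected (n - 1): {corrected:.6f}")
print(f"covariance, euros: {cov_eur:.6f}")
```

Rescaling Y by a constant rescales the covariance by the same constant, even though the underlying relationship between store size and sales has not changed.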
The correlation coefficient between two random variables X and Y is

$$\rho = Corr(X, Y) = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$

It turns out the correlation coefficient is the covariance between the standardized variable of X, $(X - \mu_X)/\sigma_X$, and the standardized variable of Y, $(Y - \mu_Y)/\sigma_Y$. Since the standardized variables are unit free, so is the correlation coefficient. The sample correlation coefficient is

$$r = \frac{s_{XY}}{s_X s_Y}$$

Example 2. Using the data in Example 1, find the sample correlation coefficient.

To compute the sample correlation using MS Excel do Tools>Data Analysis>Correlation, and enter the range of x- and y-variables. For the above data we get

         X           Y
X    1
Y    0.950883    1

You may use the Excel function CORREL.

C Homework: 346 on p 37 and do the following problem (You may use MS Excel).

Problem 2. Using the data in Example 1, obtain the correlation between the square footage and the annual sales in Euros and compare it with the correlation obtained in Example 2.

MSL Homework: 344 on p 36 (You may use MS Excel)

Properties of the correlation coefficient
1. $-1 \le \rho \le 1$.
2. $\rho = 1$ represents a perfect positive linear association.
3. $\rho = -1$ represents a perfect negative linear association.
4. $|\rho|$ closer to 1 represents a stronger linear association.
5. $\rho = 0$ represents no linear association, but the variables can have some other relation, such as a quadratic relation.

MSL Homework: 3, 3, 33

When two variables are correlated, we are often interested in finding the precise relationship using a mathematical model and in predicting one variable of main interest, which is called the dependent variable or response variable and denoted by Y, using the other variable, which is called the independent variable or predictor variable and denoted by X.

There are two types of relationship considered in this chapter.

Deterministic relationship: each value of X is paired with one and only one value of Y, and Y can be predicted with certainty for a given value of X.
Stochastic (Statistical) relationship: each value of X is associated with a whole probability distribution of values of Y, and Y cannot be predicted with certainty for a given value of X, but the knowledge of a value of X helps in predicting Y.

A simple and popular mathematical model to describe a stochastic relationship is the following Simple Linear Regression Model,

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where Y is the dependent variable and X is the independent variable, and the random variable $\varepsilon$ is called the error, satisfying $E(\varepsilon) = 0$ and $Var(\varepsilon) = \sigma^2$.

The purposes of regression analysis include 1) to better understand the relationship between the dependent variable and the independent variable(s) through mathematical models; and 2) to predict the values of the dependent variable with given values of the independent variables.

The assumptions about $\varepsilon$ in turn yield $E(Y) = \beta_0 + \beta_1 X$, that is, the mean of Y is a linear function of X, and thus the knowledge about the value of X is useful in predicting the mean value of Y (as well as an individual value of Y). The line equation $E(Y) = \beta_0 + \beta_1 X$ is called the regression line (and in general the regression function). They also yield $Var(Y) = \sigma^2$, which means the variability of Y is constant regardless of the level of X.

For regression analysis we have a random sample of n pairs of observations, $(X_i, Y_i)$ for $i = 1, \dots, n$, and for these observations we have

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where the $\varepsilon_i$ are independent.

In the above simple linear regression model we have three (unknown) parameters that need to be estimated. They are $\beta_0$ and $\beta_1$, which are called the model parameters, and $\sigma^2$, which is the variance of the error. Estimation of these parameters is done by the method of least squares, which, roughly speaking, finds a fitted line that goes through the points on a scatter plot such that the line is as "close" as possible to all the points.

The least squares estimators of $\beta_0$ and $\beta_1$, denoted by $b_0$ and $b_1$, respectively, are

$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$

Note that $b_1 = s_{XY} / s_X^2$ and $b_1 = r \, s_Y / s_X$.

If we replace the unknown parameters with their estimates, we obtain the estimated regression line, also called the fitted line, $\hat{Y} = b_0 + b_1 X$. Technically, $b_0$ and $b_1$ are chosen to minimize $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$, the residual sum of squares.
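The least squares estimates can be computed from sums of squares and cross-products. The sketch below is illustrative rather than the notes' own procedure. It uses the 14 data pairs read from Example 1 (assumed values) and also checks the identity $b_1 = r \, s_Y / s_X$, which links the slope to the sample correlation coefficient. The variable names are chosen here for illustration.

```python
import math

# Data as read from the Example 1 table (assumed values).
x = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
y = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
ss_x = sum((a - x_bar) ** 2 for a in x)
ss_y = sum((b - y_bar) ** 2 for b in y)
ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# Least squares slope and intercept.
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

# Sample correlation and sample standard deviations.
r = ss_xy / math.sqrt(ss_x * ss_y)
s_x = math.sqrt(ss_x / (n - 1))
s_y = math.sqrt(ss_y / (n - 1))
b1_from_r = r * s_y / s_x  # should equal b1

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, r = {r:.4f}")
print(f"r * s_Y / s_X = {b1_from_r:.4f}")
```

Because $b_1$ and $r$ share the numerator $\sum (X_i - \bar{X})(Y_i - \bar{Y})$, the slope always has the same sign as the correlation.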
MSL Homework: 35 C Homework: 39 (Use Excel/PHStat)

For the i-th observation $(X_i, Y_i)$, we can obtain the corresponding fitted value $\hat{Y}_i = b_0 + b_1 X_i$. The difference between the observed value $Y_i$ and the fitted value $\hat{Y}_i$, that is, $Y_i - \hat{Y}_i$, is called the residual, $e_i = Y_i - \hat{Y}_i$. One of the properties of the residuals is

$$\sum_{i=1}^{n} e_i = 0$$

The least squares estimator of $\sigma^2$, denoted by $\hat{\sigma}^2$, is

$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2$$

which is also called the mean squared error (MSE).

Coefficient of determination: $R^2 = SSReg / SST$, where $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ and $SSReg = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. This measures the proportion of the total variation in Y (SST) explained by the regression model (SSReg), and is an overall measure of goodness of fit.

For regression analysis using MS Excel do Tools>Data Analysis>Regression, and enter the range of y- and x-variables. If you want confidence intervals other than 95%, check the Confidence Level box and enter the confidence level. For the data in Example 1, we get the SUMMARY OUTPUT table below.
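The residual-based quantities defined above can be reproduced by hand. The results should agree with the R Square and Standard Error entries of the Excel summary output. This is a minimal sketch assuming the 14 data pairs read from Example 1. It computes $R^2$ as $1 - SSE/SST$, which equals $SSReg/SST$ for a least squares fit with an intercept.

```python
import math

# Data as read from the Example 1 table (assumed values).
x = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
y = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_x = sum((a - x_bar) ** 2 for a in x)
ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# Least squares fit.
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

# Fitted values and residuals e_i = Y_i - Yhat_i.
fitted = [b0 + b1 * a for a in x]
residuals = [obs - fit for obs, fit in zip(y, fitted)]

sse = sum(e ** 2 for e in residuals)        # residual sum of squares
sst = sum((b - y_bar) ** 2 for b in y)      # total sum of squares
ss_reg = sst - sse                          # regression sum of squares

mse = sse / (n - 2)                         # estimate of the error variance
sigma_hat = math.sqrt(mse)                  # "Standard Error" in Excel
r_squared = ss_reg / sst

print(f"sum of residuals = {sum(residuals):.2e}")
print(f"MSE = {mse:.4f}, sigma_hat = {sigma_hat:.4f}, R^2 = {r_squared:.4f}")
```

The sum of the residuals prints as a value that is zero up to floating-point rounding, which is the residual property stated above.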
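SUMMARY OUTPUT

Regression Statistics
Multiple R            0.950883
R Square              0.904179
Adjusted R Square     0.896194
Standard Error        0.966380
Observations          14

ANOVA
              df    SS          MS          F           Significance F
Regression    1     105.7476    105.7476    113.2335    1.823E-07
Residual      12    11.2067     0.933890
Total         13    116.9543

              Coefficients   Standard Error   t Stat     P-value     Lower 95%   Upper 95%   Lower 90%
Intercept     0.964474       0.526193         1.8329     0.0917      -0.1820     2.1110      0.0266
X             1.669862       0.156925         10.6411    1.82E-07    1.3280      2.0118      1.3902

MSL Homework: 3, 3, 37 C Homework: 3 (Use Excel/PHStat)

$(1-\alpha)100\%$ confidence interval for $\beta_0$: $b_0 \pm t_{\alpha/2,\, n-2} \; s.e.(b_0)$

$(1-\alpha)100\%$ confidence interval for $\beta_1$: $b_1 \pm t_{\alpha/2,\, n-2} \; s.e.(b_1)$

Testing hypotheses about the regression slope coefficient $\beta_1$, where $\beta_1^0$ denotes the hypothesized value:

                   Two-sided                        Right-sided                    Left-sided
$H_0$:             $\beta_1 = \beta_1^0$            $\beta_1 \le \beta_1^0$        $\beta_1 \ge \beta_1^0$
$H_1$:             $\beta_1 \ne \beta_1^0$          $\beta_1 > \beta_1^0$          $\beta_1 < \beta_1^0$
Reject $H_0$ if    $|t| > t_{\alpha/2,\, n-2}$      $t > t_{\alpha,\, n-2}$        $t < -t_{\alpha,\, n-2}$
p-value            $P(|T| \ge |t|)$                 $P(T \ge t)$                   $P(T \le t)$

Test statistic: $T = \dfrac{b_1 - \beta_1^0}{s.e.(b_1)}$, which has a t distribution with n - 2 degrees of freedom under $H_0$; t denotes its observed value.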
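The slope row of the output can be reproduced from the formulas above. This is a sketch under stated assumptions. The data are the 14 pairs read from Example 1. The expression $s.e.(b_1) = \hat{\sigma} / \sqrt{SSX}$ is the standard textbook formula, which these notes do not spell out. The critical value `T_CRIT = 2.1788` is $t_{0.025,\,12}$ taken from a t table rather than computed.

```python
import math

# Data as read from the Example 1 table (assumed values).
x = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
y = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]

T_CRIT = 2.1788  # t_{0.025, 12} from a t table (n - 2 = 12 degrees of freedom)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_x = sum((a - x_bar) ** 2 for a in x)
ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

# Residual standard error with n - 2 degrees of freedom.
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
sigma_hat = math.sqrt(sse / (n - 2))

# Standard error of the slope (standard formula, assumed here).
se_b1 = sigma_hat / math.sqrt(ss_x)

# 95% confidence interval for beta_1.
ci_low = b1 - T_CRIT * se_b1
ci_high = b1 + T_CRIT * se_b1

# Two-sided test of H0: beta_1 = 0 at the 5% level.
beta1_null = 0.0
t_stat = (b1 - beta1_null) / se_b1
reject_h0 = abs(t_stat) > T_CRIT

print(f"s.e.(b1) = {se_b1:.4f}, t = {t_stat:.4f}")
print(f"95% CI for beta_1: ({ci_low:.4f}, {ci_high:.4f})")
print(f"reject H0: {reject_h0}")
```

To test a different hypothesized slope, change `beta1_null`. For a one-sided test, compare `t_stat` with the one-sided critical value $t_{\alpha,\, n-2}$ instead.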
Testing hypotheses about the regression intercept coefficient $\beta_0$: replace $\beta_1$ with $\beta_0$ (and $b_1$ with $b_0$) in the slope test above.

Note the degrees of freedom in regression analysis is the number of observations minus the number of parameters in the model, which is n - 2 for the simple linear regression model.

MSL Homework: 34, 34, 343, 349 C Homework: 347 (Use Excel/PHStat)

One of the purposes of regression analysis is to predict the mean value of Y, that is, E(Y), given a value of X, say x. A point estimate of E(Y) given a value x is $\hat{Y} = b_0 + b_1 x$ with standard error $\hat{\sigma} \sqrt{h}$, where

$$h = \frac{1}{n} + \frac{(x - \bar{X})^2}{(n-1) s_X^2}$$

is called the leverage. Therefore the $(1-\alpha)100\%$ confidence interval for E(Y) is

$$\hat{Y} \pm t_{\alpha/2,\, n-2} \; \hat{\sigma} \sqrt{h}$$

Note $(n-1) s_X^2 = SSX$.

Another purpose of regression analysis is to predict an individual value of Y given a value of X, say x. Then a point estimate is, again, $\hat{Y} = b_0 + b_1 x$ with standard error $\hat{\sigma} \sqrt{1 + h}$. Therefore the $(1-\alpha)100\%$ confidence interval for an individual value of Y, which is often called the prediction interval, is

$$\hat{Y} \pm t_{\alpha/2,\, n-2} \; \hat{\sigma} \sqrt{1 + h}$$

Example 3. Suppose in Example 1 you want to estimate the mean annual sales of all stores with the size of 5 thousand square feet. Then it is 0.9645 + 1.6699 × 5 ≈ 9.314 (million dollars). Noting that the sample mean and standard deviation of the X-variable are 2.9214 and 1.708, respectively (which are also computed from MS Excel), the leverage is

$$h = \frac{1}{14} + \frac{(5 - 2.9214)^2}{(14 - 1)(1.708)^2} = 0.1854$$

so we calculate the standard deviation for $\hat{Y}$ as $0.9664 \sqrt{0.1854} = 0.4161$. And the standard deviation for the individual value is $0.9664 \sqrt{1 + 0.1854} = 1.0522$.

Since $t_{0.025,\, 12} = 2.1788$, to get the 95% confidence interval for E(Y) at x = 5, we compute

$$\hat{Y} \pm t_{0.025,\, 12} \; \hat{\sigma} \sqrt{h} = 9.3135 \pm 2.1788 \times 0.4161 = 9.3135 \pm 0.9066$$

To get the 95% prediction interval for Y at x = 5, we compute

$$\hat{Y} \pm t_{0.025,\, 12} \; \hat{\sigma} \sqrt{1 + h} = 9.3135 \pm 2.1788 \times 1.0522 = 9.3135 \pm 2.2925$$

The above 95% confidence interval for E(Y) at x = 5 is interpreted as: With 95% confidence, the mean annual sales of all stores with the size of 5 thousand square feet is between $8.4069 million and $10.2201 million.

The above 95% prediction interval for Y at x = 5 is interpreted as: With 95% confidence, the annual sales of a store with the size of 5 thousand square feet is between $7.0210 million and $11.6060 million.
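The interval calculations of Example 3 can be sketched in code. This is an illustrative check under stated assumptions, not the notes' own computation. The data are the 14 pairs read from Example 1, the critical value 2.1788 is the tabulated $t_{0.025,\,12}$, and the function name `intervals_at` is hypothetical. The code carries full precision throughout, so its endpoints may differ from the hand calculation in the third or fourth decimal place.

```python
import math

# Data as read from the Example 1 table (assumed values).
x = [1.7, 1.6, 2.8, 5.6, 1.3, 2.2, 1.3, 1.1, 3.2, 1.5, 5.2, 4.6, 5.8, 3.0]
y = [3.7, 3.9, 6.7, 9.5, 3.4, 5.6, 3.7, 2.7, 5.5, 2.9, 10.7, 7.6, 11.8, 4.1]

T_CRIT = 2.1788  # t_{0.025, 12} from a t table


def intervals_at(x0, xs, ys, t_crit):
    """Return (y_hat, CI for E(Y), PI for an individual Y) at X = x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((a - x_bar) ** 2 for a in xs)  # equals (n - 1) * s_X^2
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(xs, ys))
    sigma_hat = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - x_bar) ** 2 / ss_x      # leverage at x0
    half_ci = t_crit * sigma_hat * math.sqrt(h)
    half_pi = t_crit * sigma_hat * math.sqrt(1 + h)
    ci = (y_hat - half_ci, y_hat + half_ci)
    pi = (y_hat - half_pi, y_hat + half_pi)
    return y_hat, ci, pi


y_hat, ci, pi = intervals_at(5.0, x, y, T_CRIT)
print(f"point estimate: {y_hat:.4f}")
print(f"95% CI for E(Y): ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"95% PI for Y:    ({pi[0]:.4f}, {pi[1]:.4f})")
```

The prediction interval is always wider than the confidence interval at the same x, because it adds the variability of an individual observation (the 1 inside $\sqrt{1 + h}$) to the uncertainty in the estimated mean.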
MSL Homework: 357 C Homework: 3 6 (Use Excel/PHStat)

When more than one independent variable is considered, we have the multiple regression model. For example, if two independent variables, X and W, are considered, the model is

$$Y = \beta_0 + \beta_1 X + \beta_2 W + \varepsilon$$

Statistical inference for the multiple regression model will be discussed in MgtOp 412: Statistical Methods for Management or ECONS 311: Introductory Econometrics.

Example: In finance, it is of interest to look at the relationship between a stock's average return in percent (Y) and the overall market return in percent (X). From randomly selected stocks we obtain the following data.

Obs   market return (X)   stock's return (Y)
37 5 7 3 86 4 6 5 5 34 9 6 39 7 7 8 8 3 4 9 6 3 4 6 3 97 7 73

1. Find the sample correlation coefficient between X and Y.
2. How would you decide if a simple linear regression model is appropriate for the relationship between X and Y?
3. If a simple linear regression model is indeed appropriate for the relationship, find the estimated regression line.
4. Find the predicted average return of a stock with the overall market return of 3%.
5. Is there strong evidence that the average return of a stock is linearly related to the overall market return? Justify your answer.
6. Find a 95% confidence interval for the slope parameter. Note that in finance the slope coefficient is called the stock's beta by investment analysts.
7. A beta greater than one indicates that the stock is relatively sensitive to changes in the market, while a beta less than one indicates that the stock is relatively insensitive. For the data analyzed, test if the estimated beta is significantly greater than one. Use $\alpha = 0.05$.
8. Find an estimate for the variance of the error in the simple linear regression model.
9. Find the 95% confidence interval for the mean of the average returns of stocks with the market return of 8% and interpret the interval.
10. Find the 95% prediction interval for the average return of a stock with the market return of 8% and interpret the interval.
11. Find a 90% confidence interval for the mean of the average returns of stocks with the market return of 8%.
12. Find a 90% prediction interval for the average return of a stock with the market return of 8%.