APM 63 Regression Analysis Project
Transformation and Weighted Least Squares

Yanjun Yan
yayan@syr.edu
Due on 4/4/5 (Thu.)
Turned in on 4/4 (Thu.)

1. INTRODUCTION

This project aims at modeling the peak rate of flow Q of water from six watersheds following storm episodes. The storm episodes were chosen from a large data set to give a range of storm intensities. The independent variables used in this study are the area of the watershed (mi²) X1, the average slope of the watershed (%) X3, the estimated soil storage capacity (inches of water) X6, the rainfall (inches) X8, and the time period during which rainfall exceeded ¼ inch/hour X9.

The scatter plots of the dependent variable Q against each independent variable show certain nonlinear patterns. A logarithm transform is therefore applied so that the transformed variables are more nearly linearly related. Without the logarithm transform, ordinary least squares (OLS) on the raw data may violate the assumptions of normality or constant variance, as judged from the residual analysis. With the logarithm transform, both OLS and weighted least squares (WLS) are fitted to the transformed variables so that the two methods can be compared. Residual analysis plays an important role in that comparison. Because the variable of interest is Q itself rather than its logarithm, the model in the log-transformed variables must be transformed back and interpreted further in order to understand the physical relation between the original dependent variable Q and the independent variables X's.

2. METHODS, RESULTS AND DISCUSSION

(1) Show basic statistical descriptive parameters and show the scatter plots for the relationships between Q and X1, X3, X6, X8 and X9.

By Proc CORR, the simple statistics and the correlations between the variables can be obtained, as shown below. The SAS output shows the mean, sum, standard deviation, minimum and maximum of all the variables concerned, and the correlations between them.
Proc GPLOT or PLOT can produce the scatter plots, but the plots are large and lose clarity when copied into this document. To save space and achieve better image quality, the scatter plots were generated by a MATLAB script, as shown in Figure 1.
After studying Figure 1 and examining the correlation matrix again, Q appears to be highly correlated with X1, X8 and X9, which is consistent with the scatter plots. The scatter plots further suggest a roughly curvilinear relationship between Q and some of the X's, such as X8 and X9. The variation of Q also seems to differ across the range of certain X's, especially in the scatter plots of Q against X1, X3 and X6. This pattern indicates that a transform should be used to linearize the relations.

Simple Statistics
Variable   N   Mean   Std Dev   Sum   Minimum   Maximum
Q          3 9 97 38737 8. 479
X1         3.43.78337 7.96.3 7.
X3         3 7.5 4.6 5. 3. 5.
X6         3.3.5669 39..5.
X8         3.837.657 85..75 5.5
X9         3 3.6333.57775 94.9.7 6.5

Pearson Correlation Coefficients, N = 3
Prob > |r| under H0: Rho=0
        Q        X1       X3       X6       X8       X9
Q       .        .7834    .54      .456     .3337    .858
                 <.       .77      .83      .79      .33
X1      .7834    .        -.785    .79      .795     .986
        <.                .688     .5378    .368     .7
X3      .54      -.785    .        .466     -.73     -.886
        .77      .688              .449     .79      .4974
X6      .456     .79      .466     .        .758     -.35
        .83      .5378    .449              .79      .9484
X8      .3337    .795     -.73     .758     .        .88745
        .79      .368     .79      .79               <.
X9      .858     .986     -.886    -.35     .88745   .
        .33      .7       .4974    .9484    <.
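To illustrate why a logarithm transform can linearize a curvilinear relation of the kind seen in the scatter plots, here is a minimal Python sketch (Python is used only for illustration; it is not the SAS or MATLAB code of this project). The data are synthetic and purely hypothetical: `y` is generated from a power law, and `pearson_r` is an illustrative helper, not a routine from the project.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical power-law data (NOT the watershed data): y = 3 * x**2.5
x = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 2.5 for xi in x]

# Correlation on the raw scale, where the relation is curved
r_raw = pearson_r(x, y)

# Correlation after taking natural logs of both variables,
# where ln(y) = ln(3) + 2.5 * ln(x) is exactly linear
r_log = pearson_r([math.log(v) for v in x], [math.log(v) for v in y])

print(f"raw scale r = {r_raw:.4f}")
print(f"log scale r = {r_log:.4f}")
```

On data like these, the log-scale correlation is essentially 1 because the power law becomes a straight line in the logs, while the raw-scale correlation is lower because a straight line cannot follow the curvature.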
[Figure 1: Scatter plots of Q vs X's, with panels for Q against X1, X3, X6, X8 and X9.]
Figure 1. Scatter plot between the dependent variable Q and several independent variables X's.

(2) Apply Ordinary Least Squares (OLS) to model [1] as follows and conduct a residual analysis on this model:

[1] Q = β0 + β1*X1 + β2*X3 + β3*X6 + β4*X8 + β5*X9 + e.

[Figure 2: Scatter plots of model [1] OLS residuals vs Q and X's, with panels for the residual against predicted Q, X1, X3, X6, X8 and X9.]
Figure 2. Scatter plot between the residual and Q or X's.
In constructing model [1], several assumptions are made, as listed below. If these assumptions are violated, the linear regression fit or the associated tests may no longer be valid, and remedies have been proposed for each kind of violation.

1. Normality. The residuals are assumed to be normally distributed. Normality is not necessary for estimating the regression parameters, but it is needed for tests of significance and for constructing confidence intervals for the parameters. If normality is violated, the parameter estimates are still the best linear unbiased estimates provided the other assumptions are met, but the probability levels associated with the tests of significance and the confidence coefficients will not be correct. Transforming the dependent variable to a form that is more nearly normally distributed is the usual recourse for non-normality.

2. Homogeneity. The assumption of common variance plays a key role in ordinary least squares: all observations in OLS receive the same weight. If the variance differs between observations, a rational use of the data requires that more weight be given to the observations that contain the most information. The direct impact of heterogeneous variances in OLS is a loss of precision in the estimates, compared with the precision that would have been realized had the heterogeneous variances been taken into account. Both transformation of the dependent variable and weighted least squares (WLS) can help handle heterogeneous variances.

3. Uncorrelated errors. The assumption of uncorrelated errors makes the mathematics more tractable, but correlation among residuals can arise from many sources. The impact of correlated errors on OLS is a loss of precision in the estimates, similar to that of heterogeneous variances. The remedy is to use a model that takes the correlation structure of the data into account, such as generalized least squares.

The OLS regression for model [1] is fitted by Proc REG with option SPEC.
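The normality assumption listed above is examined in this report partly through the skewness and kurtosis of the residuals. As a rough illustration of what those two numbers measure, the Python sketch below computes the usual bias-adjusted sample skewness and excess kurtosis. The function names and the data are hypothetical, and exact agreement with the values printed by Proc UNIVARIATE is an assumption that has not been checked here.

```python
import math

def sample_skewness(xs):
    """Bias-adjusted sample skewness: n/((n-1)(n-2)) * sum(z**3), needs n >= 3."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))  # sample std dev
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)

def sample_excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (0 for a normal population), needs n >= 4."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    z4 = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# Hypothetical residual-like samples (NOT the project's residuals)
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]

print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_skewed), sample_excess_kurtosis(right_skewed))
```

A symmetric sample gives a skewness of zero, and a single large positive value pulls the skewness above zero; a single large negative residual, like the apparent outlier in the residual plots, would pull it below zero.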
The scatter plots of the residuals against Q and the X's are similarly generated by MATLAB, as shown in Figure 2. The OLS results are as follows:

The REG Procedure
Model: MODELOLS
Dependent Variable: Q

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             5    356444           63888         5.5       <.
Error             4    3693             468
Corrected Total   9    456835

Root MSE         645.664     R-Square   .7593
Dependent Mean   9.3333      Adj R-Sq   .79
Coeff Var        49.99998
Parameter Estimates
Variable    DF, Parameter Estimate, Standard Error, t Value, Pr > |t|
Intercept   -48.7655 447.64478 -.9.378
X1          345.963 44.885 7.7 <.
X3          8.956 3.347.74.3
X6          -74.59 9.3965 -.5.35
X8          478.683.77.8.3
X9          -4.5589 7.975 -.4.7

Test of First and Second Moment Specification
DF   Chi-Square   Pr > ChiSq
     9.59         .4836

Model [1] as fitted by OLS is:

Q = -48.7655 + 345.963 * X1 + 8.956 * X3 + (-74.59) * X6 + 478.683 * X8 + (-4.5589) * X9. (eq. 1)

From the ANOVA alone, we cannot see clearly whether any assumption is violated or the model is inadequate, since the model fitting does not check those assumptions. If the data satisfy all the model assumptions, we can be confident in the t tests listed in the regression output. If the assumptions are violated, those tests will be misleading. This is why residual analysis is important for checking the assumptions.

From the SPEC option, the p-value of the test for non-constant variance is .4836. The null hypothesis is that the variance is constant. At a significance level of 0.05, the null hypothesis of constant variance cannot be rejected. Therefore the variance may not differ much across the range of the X's, and a common variance can be used within a given range of the X's in the later WLS analysis.

Proc UNIVARIATE is then used to compute the residual statistics and check normality. Although the skewness (-.664666) and the kurtosis (.473) of the residuals are not small, none of the normality tests is significant, and the normal probability plot of the residuals mostly coincides with the ideal normal line except at the lowest-left point. The residuals are therefore still approximately normal. Considering also that the t test is fairly robust to non-normality, the top priority at this point is to linearize the relation rather than to normalize the residuals.

Moments
N                 3            Sum Weights        3
Mean                           Sum Observations
Std Deviation     587.38649    Variance           344954.94
Skewness          -.664666     Kurtosis           .473
Uncorrected SS    3693.3       Corrected SS       3693.3
Coeff Variation   .            Std Error Mean     7.35

Tests for Normality
Test                 Statistic         p Value
Shapiro-Wilk         W      .96879     Pr < W      .493
Kolmogorov-Smirnov   D      .89847     Pr > D      >.5
Cramer-von Mises     W-Sq   .4564      Pr > W-Sq   >.5
Anderson-Darling     A-Sq   .776       Pr > A-Sq   >.5

[Normal probability plot of the model [1] OLS residuals.]

(3) Take natural logarithm transforms for all variables (Q, X1~X9). Compute the correlation matrix using the transformed variables. Which variables are most likely to contribute significantly to the variation in ln(Q)? Are there highly correlated independent variables?

The option CORR in Proc REG is used to compute the correlation matrix of the log-transformed variables. The correlation matrix is shown below. From the correlation matrix, LnQ is highly correlated with LnX1, with a correlation coefficient as high as .94. LnQ is also strongly correlated with LnX3, with a correlation coefficient of .643. Apart from these two, the correlations between LnQ and the other LnX's are not very strong: the correlation coefficient between LnQ and LnX6 is .648, between LnQ and LnX8 is .94, and between LnQ and LnX9 is .385. Therefore LnX1 is the variable most likely to contribute significantly to the variation in LnQ; LnX3 is second only to LnX1 and may also contribute substantially.

Among the independent variables themselves, LnX8 and LnX9 are highly correlated, with a correlation coefficient as high as .863. LnX1 and LnX3 are moderately correlated, with a correlation coefficient of .574. The correlations between the other pairs are not large.
Correlation
Variable   LNX1    LNX3     LNX6    LNX8     LNX9     LNQ
LNX1       .       .574     .48     .94      .385     .94
LNX3       .574    .        .643    -.974    -.694    .643
LNX6       .48     .643     .       .35      -.       .54
LNX8       .94     -.974    .35     .        .863     .49
LNX9       .385    -.694    -.      .863     .        .55
LNQ        .94     .643     .54     .49      .55      .

(4) Apply OLS to model [2] and conduct a residual analysis for this model.

[2] LnQ = β0 + β1*LnX1 + β2*LnX3 + β3*LnX6 + β4*LnX8 + β5*LnX9 + e

Model [2] differs from model [1] in that it uses the log-transformed variables throughout. The scatter plots of LnQ vs. the LnX's are shown in Figure 3; the relationships between LnQ and the LnX's appear more linear than those between Q and the X's in Figure 1. Following the same residual-analysis procedure as in step (2) for model [1], Proc REG with option SPEC generates the following results:

The REG Procedure
Model: modelols
Dependent Variable: LNQ

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             5    7.848            4.5696        34.56     <.
Error             4    .773             .466
Corrected Total   9    7.3955
Root MSE         .484        R-Square   .9845
Dependent Mean   6.36677     Adj R-Sq   .983
Coeff Var        3.37437

Parameter Estimates
Variable    DF, Parameter Estimate, Standard Error, t Value, Pr > |t|
Intercept   5.8759. 6.68 <.
LNX1        .688.343 9.7 <.
LNX3        .3757.9489 3.95.6
LNX6        -.33576.85-4..4
LNX8        .7499.6768.9 <.
LNX9        -.45653.4975-9.73 <.

Test of First and Second Moment Specification
DF   Chi-Square   Pr > ChiSq
     .63          .898

[Figure 3: Scatter plots of LnQ vs LnX's, with panels for LnQ against LnX1, LnX3, LnX6, LnX8 and LnX9.]
Figure 3. Scatter plot between LnQ and LnX's.

Model [2] as fitted by OLS is:

LnQ = 5.8759 + .688 * LnX1 + .3757 * LnX3 + (-.33576) * LnX6 + .7499 * LnX8 + (-.45653) * LnX9. (eq. 2)
According to the t tests in the regression output, the LnX's appear more significant for LnQ than the X's are for Q. Meanwhile, the constant-variance test supports the constant-variance hypothesis for the residuals of LnQ in model [2] more strongly than for the residuals of Q in model [1]. The scatter plots of the model [2] residuals vs. predicted LnQ and the LnX's are shown in Figure 4. Except for one apparent outlier in the lower part of each panel, the residual plots for model [2] show no prominent nonlinear trend and look rather random, which indicates that the logarithm transformation is effective in removing the nonlinear relation between Q and the X's. However, the variance of the residuals seems to increase with the predicted LnQ, which calls for the use of WLS in a later section.

[Figure 4: Scatter plots of model [2] OLS residuals vs LnQ and LnX's, with panels for the residual against predicted LnQ, LnX1, LnX3, LnX6, LnX8 and LnX9.]
Figure 4. Scatter plot of the residuals of LnQ against LnQ and the LnX's.

However, the normality check by Proc UNIVARIATE, shown below, indicates that the magnitudes of both the skewness (now -.876655, from -.664666) and the kurtosis (now 5.6776995, from .473) increase considerably. The normality tests become more significant, rejecting the null hypothesis of normality. The normal probability plot of the log-transformed residuals also swings further away from the normal line. All the evidence indicates that the logarithm transformation worsens the non-normality of the original data. In other words, a transformation that cures the violation of one assumption may not cure the violation of another.

The UNIVARIATE Procedure
Variable: resid (Residual)
Moments
N                 3            Sum Weights        3
Mean                           Sum Observations
Std Deviation     .9544        Variance           .389758
Skewness          -.876655     Kurtosis           5.6776995
Uncorrected SS    .77975       Corrected SS       .77975
Coeff Variation   .            Std Error Mean     .356867

Tests for Normality
Test                 Statistic         p Value
Shapiro-Wilk         W      .84647     Pr < W      .5
Kolmogorov-Smirnov   D      .635       Pr > D      .4
Cramer-von Mises     W-Sq   .8367      Pr > W-Sq   .8
Anderson-Darling     A-Sq   .778       Pr > A-Sq   <.5

[Normal probability plot of the model [2] OLS residuals.]

(5) Conduct Weighted Least Squares (WLS) for model [2]. Suggestions: (5.1) use the levels of X6 (0.5, 1.0, 1.5 and 2.0) to group the data, (5.2) compute the variance (VAR) of the residuals from model [2] using OLS for each group, (5.3) use 1/VAR as a weight for each group, (5.4) print out the variance and weight for each group, and (5.5) conduct WLS for the model.

Following suggestions (5.1) to (5.4), the data are first grouped by X6 into 4 groups, and the variance of the residuals is then calculated within each group. All observations in a group use the same weight, 1/VAR. The weights are summarized below, where Obs is the range of observation indices, G is the group index, var is the variance within the group, and w is the weight for the group. For instance, observations 1 to 6 belong to group 1 with common variance .4646, and the residuals corresponding to these data points are weighted by the common weight .5396.

Obs    G    var      w
-6          .4646    .5396
7-5         .868     .4738
6-     3    .68      6.46
-3     4    .393     7.839

The WLS regression results from Proc REG with option SPEC and the statement WEIGHT w are as follows:

The REG Procedure
Model: modelwls
Dependent Variable: LNQ
Weight: w

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             5    38.363           66.636        56.8      <.
Error             4    6.3667           .9836
Corrected Total   9    37.67699

Root MSE         .483        R-Square   .995
Dependent Mean   6.497       Adj R-Sq   .9898
Coeff Var        6.36783

Parameter Estimates
Variable    DF, Parameter Estimate, Standard Error, t Value, Pr > |t|
Intercept   5.944.64 36.64 <.
LNX1        .68397.7 4. <.
LNX3        .3854.7587 5. <.
LNX6        -.334.74-4.6.
LNX8        .678.4665.4 <.
LNX9        -.43948.36 -.67 <.

Test of First and Second Moment Specification
DF   Chi-Square   Pr > ChiSq
     4.7          .863

Model [2] as fitted by WLS is:

LnQ = 5.944 + .68397 * LnX1 + .3854 * LnX3 + (-.334) * LnX6 + .678 * LnX8 + (-.43948) * LnX9. (eq. 3)

<Discussion on how the OLS and WLS residuals should be compared>

This discussion is based on page -3 of the book by Peter J. Bickel and Kjell A. Doksum,
Mathematical Statistics: Basic Ideas and Selected Topics, Vol. 1, 2nd Edition.

In OLS, the variance of the residuals is assumed to be a constant $\sigma^2$ over the whole range of the data, so the objective is to minimize

$$\sum_{i=1}^{n} \Big[ y_i - \Big( \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} \Big) \Big]^2 ,$$

where the regression parameters are what must be derived from the observations. However, suppose the variance of the residuals is not constant but $\mathrm{Var}(e_i) = v_i \sigma^2$. We may then consider a transformed set of observations, $y_i/\sqrt{v_i}$ and $x_{ij}/\sqrt{v_i}$, $j = 1, \dots, k$; $i = 1, \dots, n$, obtained by dividing by the coefficient of the standard deviation. If we minimize

$$\sum_{i=1}^{n} \Big[ \frac{y_i}{\sqrt{v_i}} - \Big( \frac{\beta_0}{\sqrt{v_i}} + \sum_{j=1}^{k} \beta_j \frac{x_{ij}}{\sqrt{v_i}} \Big) \Big]^2$$

as in OLS, the corresponding residual is $e_i/\sqrt{v_i}$. Notice that $\mathrm{Var}(e_i/\sqrt{v_i}) = v_i \sigma^2 / v_i = \sigma^2$ satisfies the assumption of OLS, so WLS is essentially OLS applied to the transformed observations $\{ y_i/\sqrt{v_i} \}$ and $\{ x_{ij}/\sqrt{v_i} \}$, $j = 1, \dots, k$; $i = 1, \dots, n$.

In practice, the true variance of the residuals may not be available, so it is replaced by the empirical estimate of the residual variance $\mathrm{VAR}_i$, and the weight is $w_i = 1/\mathrm{VAR}_i$. The index $i$ is nominal in the sense that either each observation has its own weight or a group of observations shares the same weight. In our case, as illustrated later, a common weight is used for the observations in each group.

In summary, the objective of WLS is to minimize

$$\sum_{i=1}^{n} w_i \Big[ y_i - \Big( \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} \Big) \Big]^2 ,$$

and the conceptual procedure is to first weight the observations and then do OLS. The resulting WLS regression model has the form

$$\sqrt{w_i}\, \hat{y}_i = \sqrt{w_i}\, \beta_0 + \sqrt{w_i} \sum_{j=1}^{k} \beta_j x_{ij} ,$$

which is equivalent to the OLS formula $\hat{y}_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}$. Therefore the WLS regression parameters should be close to the OLS regression parameters.

Further, in comparing the OLS and WLS residuals: OLS treats $\{ e_i \}$ as the residual in the minimization, $\{ y_i \}$ itself as the dependent variable, and $\{ x_{ij} \}$, $j = 1, \dots, k$, by themselves as the independent variables. WLS, in contrast, treats $\{ \sqrt{w_i}\, e_i \}$ as the residual in the minimization, $\{ \sqrt{w_i}\, y_i \}$ as the dependent variable, and $\sqrt{w_i} \big( \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} \big)$ as the regression model.
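The equivalence described above, that WLS with weights $w_i$ is OLS on observations scaled by $\sqrt{w_i}$, can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it uses a single predictor for brevity, hypothetical data and group labels rather than the watershed data, illustrative helper names (`fit_two_columns`, `fit_weighted`), and group weights computed as 1/VAR from the sample variance (divisor n - 1) of the OLS residuals in each group.

```python
import math
from statistics import variance

def fit_two_columns(a, b, y):
    """Least-squares fit of y = c0*a + c1*b via the 2x2 normal equations."""
    saa = sum(ai * ai for ai in a)
    sab = sum(ai * bi for ai, bi in zip(a, b))
    sbb = sum(bi * bi for bi in b)
    say = sum(ai * yi for ai, yi in zip(a, y))
    sby = sum(bi * yi for bi, yi in zip(b, y))
    det = saa * sbb - sab * sab
    return ((sbb * say - sab * sby) / det, (saa * sby - sab * say) / det)

def fit_weighted(x, y, w):
    """Minimise sum_i w_i * (y_i - b0 - b1*x_i)**2 directly."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    return ((swxx * swy - swx * swxy) / det, (sw * swxy - swx * swy) / det)

# Hypothetical data with more scatter in group 2 (NOT the watershed data)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.6, 5.5, 7.9, 7.2]
group = [1, 1, 1, 1, 2, 2, 2, 2]

# Step 1: ordinary least squares (columns: intercept and x) and its residuals
b0, b1 = fit_two_columns([1.0] * len(x), x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Step 2: residual variance per group, and the common weight 1/VAR per group
group_var = {g: variance([r for r, gi in zip(resid, group) if gi == g])
             for g in set(group)}
w = [1.0 / group_var[g] for g in group]

# Step 3: WLS fitted directly, and OLS fitted to sqrt(w)-scaled observations
wls = fit_weighted(x, y, w)
root_w = [math.sqrt(wi) for wi in w]
ols_scaled = fit_two_columns(root_w,
                             [s * xi for s, xi in zip(root_w, x)],
                             [s * yi for s, yi in zip(root_w, y)])

print("group variances:", group_var)
print("WLS (b0, b1):          ", wls)
print("OLS on scaled (b0, b1):", ols_scaled)
```

The two fits agree up to floating-point rounding, which is why residuals and variables from a weighted fit are scaled by $\sqrt{w_i}$ before being set beside ordinary OLS residuals.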
Therefore, in comparing the residuals of OLS and WLS, all the variables in WLS should be weighted for a fair comparison. Please note that in our project the notation W is used as W = 1/VAR, so both sides of the WLS results should be premultiplied by $\sqrt{W}$. Therefore, for a fair comparison with the regular residuals
from OLS, both the residuals from WLS and the LnQ or LnX's are first scaled by the square roots of the weights used in the WLS regression and then plotted in the residual plots in Figure 5. It is readily observable that the variation of the WLS residuals is almost the same over the whole range of the predicted LnQ. Therefore the WLS method is very effective in stabilizing the variance of the residuals.

[Figure 5: Scatter plots of model [2] WLS residuals vs LnQ and LnX's, with panels for the residual against predicted LnQ, LnX1, LnX3, LnX6, LnX8 and LnX9.]
Figure 5. Scatter plot of the WLS residuals of LnQ against LnQ and the LnX's. All variables are already weighted by the square root of the weight.

The normality check by Proc UNIVARIATE gives the following result:

The UNIVARIATE Procedure
Variable: resid4

Moments
N                 3            Sum Weights        3
Mean              -.34873      Sum Observations   -.469
Std Deviation     .9574899     Variance           .977364
Skewness          -.6343       Kurtosis           .73984
Uncorrected SS    6.36674      Corrected SS       6.34886
Coeff Variation   -73.58       Std Error Mean     .7394737

Tests for Normality
Test                 Statistic         p Value
Shapiro-Wilk         W      .9996      Pr < W      .85
Kolmogorov-Smirnov   D      .5377      Pr > D      .7
Cramer-von Mises     W-Sq   .59457     Pr > W-Sq   .79
Anderson-Darling     A-Sq   .93644     Pr > A-Sq   .68

[Normal probability plot of the model [2] WLS residuals.]

With WLS on model [2], the magnitudes of both the skewness (now -.6343, from -.8767) and the kurtosis (now .73984, from 5.6776995) decrease considerably relative to the OLS fit of model [2]. The normality tests become less significant for rejecting the null hypothesis of normality. Only the normal probability plot of the residuals remains comparable to the OLS result. This evidence indicates that the WLS method improves normality relative to the OLS method. So WLS not only stabilizes the variance but, fortunately, also improves the normality of the residuals.

(6) Compare the OLS model and WLS model for model [2], including parameter estimates, standard errors of the parameters, and residual plots.

For easier comparison, the OLS result on model [2], equation 2, and the WLS result on model [2], equation 3, are copied here:

LnQ = 5.8759 + .688 * LnX1 + .3757 * LnX3 + (-.33576) * LnX6 + .7499 * LnX8 + (-.45653) * LnX9. (eq. 2)

LnQ = 5.944 + .68397 * LnX1 + .3854 * LnX3 + (-.334) * LnX6 + .678 * LnX8 + (-.43948) * LnX9. (eq. 3)

The regression coefficients from the two methods do not differ much, which is as expected from the discussion of the fair comparison of OLS and WLS residuals. Even though the OLS result is unbiased, it is subject to greater sampling variation, as can be seen from the estimated standard errors in Table 1. Meanwhile, all the independent variables become more significant under WLS than under OLS. For a succinct comparison of the scatter plots of the residuals vs. predicted LnQ, Figure 6 is
extracted from Figure 4 and Figure 5. Note that Figure 4, the left part of Figure 6, is from the OLS method; Figure 5, the right part of Figure 6, is from the WLS method, with all the variables weighted by the weights used in the WLS. From Figure 6, it is easily discernible that under OLS the variance of the residuals tends to increase with the predicted LnQ, whereas under WLS the variance of the residuals appears constant over the range of the data, which is exactly why the WLS method was applied.

Table 1. Parameter estimates by OLS and WLS
(columns within each method: Parameter Estimate, Standard Error, t Value, Pr > |t|)
Variable    OLS                        WLS
Intercept   5.8759. 6.68 <.            5.944.64 36.64 <.
LNX1        .688.343 9.7 <.            .68397.7 4. <.
LNX3        .3757.9489 3.95.6          .3854.7587 5. <.
LNX6        -.33576.85-4..4            -.334.74-4.6.
LNX8        .7499.6768.9 <.            .678.4665.4 <.
LNX9        -.45653.4975-9.73 <.       -.43948.36 -.67 <.

[Figure 6: Residual scatter plots of model [2] by OLS and WLS, with the OLS residual vs predicted LnQ by OLS on the left and the WLS residual vs predicted LnQ by WLS on the right.]
Figure 6. Scatter plot of the residuals vs. predicted LnQ by OLS and WLS.

(7) Re-express the WLS model [2] on the original scale (by taking the antilogarithm of your equation). Does this equation make sense? Would you expect the variables in the model to be important?

From equation (3), the inverse logarithm transform gives the formula for the original Q:
$$Q = \exp\big[\, 5.944 + .68397 \ln X_1 + .3854 \ln X_3 + (-.334) \ln X_6 + .678 \ln X_8 + (-.43948) \ln X_9 \,\big]$$

$$Q = \frac{e^{5.944}\, X_1^{.68397}\, X_3^{.3854}\, X_8^{.678}}{X_6^{.334}\, X_9^{.43948}} = \frac{37.3357\, X_1^{.68397}\, X_3^{.3854}\, X_8^{.678}}{X_6^{.334}\, X_9^{.43948}} \qquad \text{(eq. 4)}$$

From this formula, Q increases monotonically with X1, X3 and X8, and decreases monotonically with X6 and X9. As to the physical meaning of the variables, Q is the peak rate of flow of water from the watersheds, so we would expect it to be large if the area of the watershed (X1) is large, the average slope of the watershed (X3) is steep, or the rainfall (X8) is heavy. On the other hand, if the estimated soil storage capacity (X6) is large, the peak rate of flow of water (Q) is expected to be lower. Likewise, if the time period during which rainfall exceeded ¼ inch/hour (X9) is long, which means that strong storms do not happen very frequently, Q is also expected to be lower. Based on this analysis, our model is consistent with the physical mechanism; it is therefore meaningful, and the variables used in the model are important for predicting Q.

3. SUMMARY

This project models the peak rate of flow Q of water from six watersheds based on the area of the watershed (X1), the average slope of the watershed (X3), the estimated soil storage capacity (X6), the rainfall (X8) and the time period during which rainfall exceeded ¼ inch/hour (X9). Ordinary least squares is first applied to the raw observations of these variables, but the residual plots show a nonlinear relationship between the residuals and the predicted Q. The logarithm transformation is therefore used to linearize the relation between the dependent variable and the independent variables. OLS is then applied to the log-transformed variables, but the residual analysis shows that the variance of the residuals is not constant over the range of the observations. Weighted least squares (WLS) is then used to stabilize the variance, and it fortunately also improves the normality of the residuals.
The OLS and WLS models are compared in detail on the log-transformed variables. Finally, the inverse logarithm transform is applied to convert LnQ back to Q, and the physical meaning of the constructed model is discussed. The model shows that Q increases with X1, X3 and X8 and decreases with X6 and X9, which makes sense physically.
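The back-transformation from LnQ to Q summarized above can be checked numerically. The Python sketch below uses hypothetical coefficients and inputs (not the fitted values of this project) and an illustrative helper `predict_q`. It shows that exponentiating a log-linear prediction gives the same number as the multiplicative power-law form, and that the sign of each coefficient decides whether Q rises or falls with that variable.

```python
import math

# Hypothetical log-linear model: ln(Q) = B0 + sum_j B[j] * ln(X_j)
B0 = 1.5
B = {"X1": 0.7, "X3": 0.4, "X6": -0.3, "X8": 0.8, "X9": -0.5}

def predict_q(obs):
    """Back-transform the log-scale prediction to the original scale."""
    ln_q = B0 + sum(B[name] * math.log(obs[name]) for name in B)
    return math.exp(ln_q)

# Hypothetical positive inputs (logs require values > 0)
base = {"X1": 2.0, "X3": 10.0, "X6": 1.5, "X8": 3.0, "X9": 4.0}

# Same prediction written as a power law: e**B0 * prod_j X_j**B[j]
q_power_law = math.exp(B0)
for name in B:
    q_power_law *= base[name] ** B[name]

q_base = predict_q(base)
q_more_area = predict_q({**base, "X1": 4.0})     # positive exponent on X1
q_more_storage = predict_q({**base, "X6": 3.0})  # negative exponent on X6

print(q_base, q_power_law)
print(q_more_area > q_base, q_more_storage < q_base)
```

Variables with negative exponents end up in the denominator of the power-law form, which is how X6 and X9 appear in equation 4.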
Appendix: p.sas

**************************************************
* THE SAS IS USED FOR MULTIPLE LINEAR REGRESSION *
* FLOW RATE DATA                                 *
**************************************************;
/* The PAGENO value is illegible in the source; 1 is assumed here. */
OPTIONS NOCENTER NODATE LS=97 PS=76 PAGENO=1;

*---INPUT DATA-----------------------------------;
DATA ALL;
  INFILE 'I:\APM63\FLOW.DAT';
  INPUT X1-X9 Q;
  LNQ=LOG(Q);
  LNX1=LOG(X1); LNX2=LOG(X2); LNX3=LOG(X3);
  LNX4=LOG(X4); LNX5=LOG(X5); LNX6=LOG(X6);
  LNX7=LOG(X7); LNX8=LOG(X8); LNX9=LOG(X9);
RUN;

*==== Descriptive parameters ====;
proc corr data=all;
  var Q X1 X3 X6 X8 X9;
proc gplot data=all;
  plot Q*X1;
  /* The symbol height is illegible in the source; h=1 is assumed here. */
  symbol v=star h=1;
  title 'Scatter Plot between Q and X1';

*==== OLS on the raw data: model [1] ====;
proc reg data=all;
  modelols: model Q = X1 X3 X6 X8 X9 / spec;
  output out=out p=pred r=resid;
  title 'OLS for model [1]';

*==== Residual Analysis for OLS on the raw data: model [1] ====;
proc gplot data=out;
  plot resid*pred='*' / VREF=0;
  plot resid*X1='*' / VREF=0;
  plot resid*X3='*' / VREF=0;
  plot resid*X6='*' / VREF=0;
  plot resid*X8='*' / VREF=0;
  plot resid*X9='*' / VREF=0;
proc univariate data=out plot normal;
  var resid;

*==== OLS on the log data: model [2] ====;
proc reg data=all corr;
  modelols: model LNQ = LNX1 LNX3 LNX6 LNX8 LNX9 / spec;
  output out=out p=pred r=resid;
  title 'OLS for model [2]';

*==== Residual Analysis for OLS on the log data: model [2] ====;
proc gplot data=out;
  plot resid*pred='*' / VREF=0;
  plot resid*LNX1='*' / VREF=0;
  plot resid*LNX3='*' / VREF=0;
  plot resid*LNX6='*' / VREF=0;
  plot resid*LNX8='*' / VREF=0;
  plot resid*LNX9='*' / VREF=0;
proc univariate data=out plot normal;
  var resid;

*==== group the data by X6: model [2] ====;
data new; set all;
  if X6=0.5 then G=1;
  else if X6=1.0 then G=2;
  else if X6=1.5 then G=3;
  else if X6=2.0 then G=4;
proc sort data=new; by G;
proc reg data=new;
  grouping: model LNQ = LNX1 LNX3 LNX6 LNX8 LNX9 / spec;
  ID G;
  output out=out3 p=pred3 r=resid3;
/* Sort the data set that carries G and resid3 before the BY-group step. */
proc sort data=out3; by G;

*==== compute the variance of the residuals for each group ====;
proc means n mean var data=out3 noprint;
  var resid3; by G;
  output out=variance n=n mean=mean var=var;

*==== compute the weight for each group ====;
data weight; merge new variance; by G;
  w=1/var;
proc print data=weight;
  var G var w;

*==== WLS on the log data: model [2] ====;
proc reg data=weight corr;
  modelwls: model LNQ = LNX1 LNX3 LNX6 LNX8 LNX9 / spec;
  weight w;
  output out=out3 p=pred3 r=resid3;
  title 'WLS for model [2]';

/* Scale predictions, residuals and regressors by sqrt(w) for a fair comparison with OLS. */
data out4; set out3;
  pred4=sqrt(w)*pred3;
  resid4=sqrt(w)*resid3;
  LNX1w=sqrt(w)*LNX1; LNX3w=sqrt(w)*LNX3; LNX6w=sqrt(w)*LNX6;
  LNX8w=sqrt(w)*LNX8; LNX9w=sqrt(w)*LNX9;
proc gplot data=out4;
  plot resid4*pred4='*' / VREF=0;
  plot resid4*LNX1w='*' / VREF=0;
  plot resid4*LNX3w='*' / VREF=0;
  plot resid4*LNX6w='*' / VREF=0;
  plot resid4*LNX8w='*' / VREF=0;
  plot resid4*LNX9w='*' / VREF=0;
proc univariate data=out4 plot normal;
  var resid4;
run;
Table 2. Description of variables in the dataset, among which X1, X3, X6, X8 and X9 are used in developing the model.

Name   Variable
Dependent Variable
Q      Peak rate of flow (cfs) of water
Independent Variables
X1     Area of watersheds (mi²)
X2     Area impervious to water (mi²)
X3     Average slope of watershed (%)
X4     Longest stream flow in watershed (in thousands of feet)
X5     Surface absorbency index: 0 = complete absorbency, 100 = no absorbency
X6     Estimated soil storage capacity (inches of water)
X7     Infiltration rate of water into soil (inches/hour)
X8     Rainfall (inches)
X9     Time period during which rainfall exceeded ¼ inch/hour