Notice that X2 and Y2 are skewed. Taking the SQRT of Y2 reduces the skewness greatly.

The MEANS Procedure

Variable          Mean       Std Dev       Minimum        Maximum      Skewness
X           48.5850000     9.9850529    16.0000000     74.0000000    -0.0525127
Y           49.6550000     9.8473768    22.0000000     77.0000000    -0.0687420
X2          45.6750000    27.1233804     1.0000000    144.0000000     1.0456103
Y2          48.4300000    27.4836653     3.0000000    163.0000000     0.9484129
Y2_SQRT      6.6750871     1.9729864     1.7320508     12.7671453     0.1711926

Predicting Y from X.

Number of Observations Read    200
Number of Observations Used    200

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1        4032.22920     4032.22920      52.30    <.0001
Error             198             15265       77.09579
Corrected Total   199             19297

Root MSE           8.78042    R-Square    0.2090
Dependent Mean    49.65500    Adj R-Sq    0.2050
Coeff Var         17.68285

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              27.75229           3.09158       8.98      <.0001
X             1               0.45081           0.06234       7.23      <.0001
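The derived quantities in the regression output above follow directly from the sums of squares and the parameter estimates. The short Python sketch below recomputes them from the printed values as a consistency check. The function name `regression_summary` is illustrative and not part of the SAS output; the error sum of squares is rebuilt from the printed mean square because the listing rounds it to 15265.

```python
import math

def regression_summary(ss_model, ms_error, df_model, df_error, dep_mean):
    """Recompute derived ANOVA statistics from printed sums of squares."""
    ms_model = ss_model / df_model
    ss_error = ms_error * df_error          # printed SS (15265) is rounded
    ss_total = ss_model + ss_error
    df_total = df_model + df_error
    r_square = ss_model / ss_total
    root_mse = math.sqrt(ms_error)
    return {
        "F": ms_model / ms_error,
        "R-Square": r_square,
        "Adj R-Sq": 1 - (1 - r_square) * df_total / df_error,
        "Root MSE": root_mse,
        "Coeff Var": 100 * root_mse / dep_mean,
    }

# Values copied from the "Predicting Y from X." output.
summary = regression_summary(4032.22920, 77.09579, 1, 198, 49.65500)
t_slope = 0.45081 / 0.06234   # t Value = estimate / standard error

for name, value in summary.items():
    print(f"{name:10s} {value:.5f}")
print(f"{'t (X)':10s} {t_slope:.2f}")
```

The printed results should agree with the listing to the number of digits SAS reports.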

Predicting Y from X.
[Residual plot: RESIDUAL (vertical axis, -30 to 30) against Predicted Value of Y, PRED (horizontal axis, 34 to 62).]

Distribution of the residuals is normal.

The UNIVARIATE Procedure
Variable: Y_Resid (Residual)

Moments
N                         200    Sum Weights               200
Mean                        0    Sum Observations            0
Std Deviation      8.75833151    Variance           76.7083708
Skewness            -0.073839    Kurtosis           -0.2635309

Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W       0.995522    Pr < W        0.8233
Kolmogorov-Smirnov    D       0.041479    Pr > D       >0.1500
Cramer-von Mises      W-Sq    0.050073    Pr > W-Sq    >0.2500
Anderson-Darling      A-Sq    0.294818    Pr > A-Sq    >0.2500

Stem  Leaf                       #  Boxplot
  22  3                          1
  20  9                          1
  18  0                          1
  16  395                        3
  14  178055                     6
  12  25                         2
  10  166990167                  9
   8  067788902248              12
   6  0002358889901244666       19  +-----+
   4  01223335678915568         17
   2  0111125580233456789       19
   0  01123448001229            14  *--+--*
  -0  888764439973111           15
  -2  998875543211              12
  -4  999976421099876654100     21  +-----+
  -6  98832976110               11
  -8  9727210                    7
 -10  87642219541               11
 -12  785520                     6
 -14  841963                     6
 -16  7452                       4
 -18
 -20  63                         2
 -22  4                          1

Predicting Y from skewed X2.

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1        3913.33497     3913.33497      50.37    <.0001
Error             198             15384       77.69626
Corrected Total   199             19297

Root MSE           8.81455    R-Square    0.2028
Dependent Mean    49.65500    Adj R-Sq    0.1988
Coeff Var         17.75158

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              42.18739           1.22297      34.50      <.0001
X2            1               0.16349           0.02304       7.10      <.0001

Predicting Y from skewed X2.
[Residual plot: RESIDUAL (vertical axis, -30 to 30) against Predicted Value of Y, PRED (horizontal axis, 42 to 66).]

Distribution of the residuals is normal.

The UNIVARIATE Procedure
Variable: Y_Resid (Residual)

Moments
N                         200    Sum Weights               200
Mean                        0    Sum Observations            0
Std Deviation      8.79237336    Variance           77.3058293
Skewness           -0.0663383    Kurtosis           -0.2895344

Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W       0.99566     Pr < W        0.8413
Kolmogorov-Smirnov    D       0.042296    Pr > D       >0.1500
Cramer-von Mises      W-Sq    0.048393    Pr > W-Sq    >0.2500
Anderson-Darling      A-Sq    0.290731    Pr > A-Sq    >0.2500

Stem  Leaf                     #  Boxplot
  22  76                       2
  20
  18  23                       2
  16  2                        1
  14  48967                    5
  12  2362                     4
  10  1579246789              10
   8  01466234666             11
   6  0033366791226668899     19  +-----+
   4  22233445679134589       17
   2  357889224678999         15
   0  12225778912226679       17  *--+--*
  -0  98887544310744322       17
  -2  997433886420            12
  -4  8875442220954333210     19  +-----+
  -6  8531175444              10
  -8  76428441                 8
 -10  887543111985430         15
 -12  83                       2
 -14  846511                   6
 -16  94621                    5
 -18  4                        1
 -20  1                        1
 -22  1                        1

Predicting skewed Y2 from X.

Number of Observations Read    200
Number of Observations Used    200

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1             30674          30674      50.76    <.0001
Error             198            119641      604.24645
Corrected Total   199            150315

Root MSE          24.58142    R-Square    0.2041
Dependent Mean    48.43000    Adj R-Sq    0.2000
Coeff Var         50.75661

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1             -11.98045           8.65509      -1.38      0.1679
X             1               1.24340           0.17451       7.12      <.0001

Predicting skewed Y2 from X.
[Residual plot: RESIDUAL (vertical axis, -60 to 100) against Predicted Value of Y2, PRED (horizontal axis, 5 to 80).]

Distribution of the residuals is skewed.

The UNIVARIATE Procedure
Variable: Y_Resid (Residual)

Moments
N                         200    Sum Weights               200
Mean                        0    Sum Observations            0
Std Deviation      24.5195847    Variance           601.210036
Skewness           0.80271812    Kurtosis           0.96531758
Uncorrected SS     119640.797    Corrected SS       119640.797
Coeff Variation             .    Std Error Mean     1.73379646

Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W       0.961907    Pr < W       <0.0001
Kolmogorov-Smirnov    D       0.064107    Pr > D        0.0438
Cramer-von Mises      W-Sq    0.196838    Pr > W-Sq     0.0058
Anderson-Darling      A-Sq    1.363844    Pr > A-Sq    <0.0050

Stem  Leaf                                         #  Boxplot
   8  249                                          3  0
   7
   6
   5  2467                                         4
   4  1112238                                      7
   3  13345679                                     8
   2  1122233444778999                            16
   1  001122334455555666677889                    24  +-----+
   0  01111122223344455555666777888889            32  +
  -0  99888877775544444432222111                  26  *-----*
  -1  999888888887777776666544433222111111100     39  +-----+
  -2  9977777766665443322110                      22
  -3  9987644322221110                            16
  -4  631                                          3
Multiply Stem.Leaf by 10**+1

Predicting transformed Y2 (Y2_SQRT) from X.

Number of Observations Read    200
Number of Observations Used    200

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1         163.14571      163.14571      52.83    <.0001
Error             198         611.49665        3.08837
Corrected Total   199         774.64236

Root MSE           1.75738    R-Square    0.2106
Dependent Mean     6.67509    Adj R-Sq    0.2066
Coeff Var         26.32737

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1               2.26941           0.61877       3.67      0.0003
X             1               0.09068           0.01248       7.27      <.0001

Predicting transformed Y2 (Y2_SQRT) from X.
[Residual plot: RESIDUAL (vertical axis, -4 to 6) against Predicted Value of Y2_SQRT, PRED (horizontal axis, 3.5 to 9.0).]

Distribution of the residuals is nearly normal.

The UNIVARIATE Procedure
Variable: Y_Resid (Residual)

Moments
N                         200    Sum Weights               200
Mean                        0    Sum Observations            0
Std Deviation      1.75295393    Variance           3.07284748
Skewness           0.13316606    Kurtosis           -0.2395222

Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W       0.994031    Pr < W        0.6051
Kolmogorov-Smirnov    D       0.037527    Pr > D       >0.1500
Cramer-von Mises      W-Sq    0.028558    Pr > W-Sq    >0.2500
Anderson-Darling      A-Sq    0.204974    Pr > A-Sq    >0.2500

Stem  Leaf                           #  Boxplot
   5  0                              1
   4  8                              1
   4
   3  5799                           4
   3  01112                          5
   2  569                            3
   2  11112223344                   11
   1  555566778888899               15
   1  00011112223333333344          20  +-----+
   0  5666666677778888888999        22
   0  0111122333334444              16  +
  -0  4444433333221111111100        22  *-----*
  -0  99987666666555                14
  -1  4443333322222111100000000     25  +-----+
  -1  99877766555                   11
  -2  444433332221100               15
  -2  9988866                        7
  -3  43211                          5
  -3  986                            3
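The effect of the square-root transformation on skewness can be illustrated on simulated data. The sketch below is not the data set behind the output: it assumes a hypothetical variable built by squaring roughly normal positive values (mean 6.7, standard deviation 2.0, chosen only to resemble the Y2_SQRT summary statistics). The `skewness` function uses the bias-adjusted sample formula, assumed here to match what the procedures report.

```python
import math
import random

def skewness(values):
    """Bias-adjusted sample skewness: n/((n-1)(n-2)) * sum(((x-mean)/s)**3)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in values)

random.seed(1)
# Hypothetical stand-in for Y2: squares of roughly normal, positive values.
root = [abs(random.gauss(6.7, 2.0)) for _ in range(2000)]
y2 = [r * r for r in root]
y2_sqrt = [math.sqrt(v) for v in y2]   # the square-root transformation

skew_raw = skewness(y2)
skew_sqrt = skewness(y2_sqrt)
print(f"skewness of simulated Y2:      {skew_raw:.3f}")
print(f"skewness of simulated Y2_SQRT: {skew_sqrt:.3f}")
```

On data of this kind the raw variable is clearly right-skewed while the transformed variable is close to symmetric, which is the same pattern the listing shows for Y2 and Y2_SQRT.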

New Data Set.

The MEANS Procedure

Variable          Mean       Std Dev     Skewness
Y           92.2400000    12.1248722    0.0211379
X           20.0000000    14.2133811            0

Predict Y from X.

Number of Observations Read    100
Number of Observations Used    100

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1         174.84500      174.84500       1.19    0.2777
Error              98             14379      146.72852
Corrected Total    99             14554

Root MSE          12.11315    R-Square    0.0120
Dependent Mean    92.24000    Adj R-Sq    0.0019
Coeff Var         13.13221

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              90.37000           2.09806      43.07      <.0001
X             1               0.09350           0.08565       1.09      0.2777

Predict Y from X.
[Residual plot: RESIDUAL (vertical axis, -30 to 30) against Predicted Value of Y, PRED (horizontal axis, 90.0 to 94.5).]

The marginal distribution of the residuals does not show the problem (the lack of fit of the straight-line model): every normality test is passed.

The UNIVARIATE Procedure
Variable: residuals (Residual)

Moments
N                         100    Sum Weights               100
Mean                        0    Sum Observations            0
Std Deviation       12.051822    Variance           145.246414
Skewness           0.04651058    Kurtosis           -0.1088063

Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W       0.990678    Pr < W        0.7193
Kolmogorov-Smirnov    D       0.045974    Pr > D       >0.1500
Cramer-von Mises      W-Sq    0.024687    Pr > W-Sq    >0.2500
Anderson-Darling      A-Sq    0.218939    Pr > A-Sq    >0.2500

Stem  Leaf                   #  Boxplot
   2  559                    3
   2  01134                  5
   1  5558                   4
   1  11112444               8
   0  555556677889999       15  +-----+
   0  11112233344444        14  +
  -0  44443333221111000     17  *-----*
  -0  99987666655           11  +-----+
  -1  33322211111100        14
  -1  7765                   4
  -2  43                     2
  -2  866                    3
Multiply Stem.Leaf by 10**+1

Polynomial regression: Quadratic.

Number of Observations Read    100
Number of Observations Used    100

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        6275.73429     3137.86714      36.77    <.0001
Error              97        8278.50571       85.34542
Corrected Total    99             14554

Root MSE           9.23826    R-Square    0.4312
Dependent Mean    92.24000    Adj R-Sq    0.4195
Coeff Var         10.01546

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              99.70571           1.94411      51.29      <.0001
X             1              -1.77364           0.23030      -7.70      <.0001
X_SQ          1               0.04668           0.00552       8.45      <.0001

Polynomial regression: Quadratic.
[Residual plot: RESIDUAL (vertical axis, -30 to 20) against Predicted Value of Y, PRED (horizontal axis, 82 to 104).]

Polynomial regression: Quadratic.
[Scatter plot: Y (vertical axis, 60 to 130) against X (horizontal axis, 0.0 to 40.0).]
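A straight line can miss a relationship that a quadratic captures when the curve turns near the middle of the X range. The sketch below illustrates this with noise-free, hypothetical data generated from the quadratic coefficients printed in the listing (99.70571, -1.77364, 0.04668) over an assumed grid X = 0, 1, ..., 40. It fits both models by solving the normal equations with a small hand-written solver; the helper names `solve` and `polyfit` are illustrative, and this is not how SAS computes the estimates.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns (coefficients, R-square)."""
    powers = range(degree + 1)
    xtx = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in powers]
    beta = solve(xtx, xty)
    fitted = [sum(b * x ** i for i, b in enumerate(beta)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_error = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return beta, 1 - ss_error / ss_total

# Hypothetical noise-free data built from the printed quadratic estimates.
xs = [float(x) for x in range(41)]
ys = [99.70571 - 1.77364 * x + 0.04668 * x * x for x in xs]

beta_line, r2_line = polyfit(xs, ys, 1)
beta_quad, r2_quad = polyfit(xs, ys, 2)
vertex = -beta_quad[1] / (2 * beta_quad[2])   # X value where the curve turns

print(f"linear    R-square: {r2_line:.4f}")
print(f"quadratic R-square: {r2_quad:.4f}")
print(f"curve turns at X =  {vertex:.2f}")
```

Because the fitted curve turns close to the mean of X (20 in the listing), the decreasing and increasing halves nearly cancel in a straight-line fit, which is consistent with the non-significant slope in the linear model.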

Another new data set.

The MEANS Procedure

Variable           Mean        Std Dev       Minimum       Maximum      Skewness
X            34.4500000      8.8401242     7.0000000    60.0000000    -0.0533010
Y           644.6480000    193.9131783    83.0000000       1457.00     0.3953723

Heteroscedasticity.

Number of Observations Read    500
Number of Observations Used    500

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1           3289612        3289612     105.87    <.0001
Error             498          15473946          31072
Corrected Total   499          18763558

Root MSE          176.27303    R-Square    0.1753
Dependent Mean    644.64800    Adj R-Sq    0.1737
Coeff Var          27.34407

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1             328.23603          31.74586      10.34      <.0001
X             1               9.18467           0.89264      10.29      <.0001

Heteroscedasticity.
[Residual plot: RESIDUAL (vertical axis, -600 to 600) against Predicted Value of Y, PRED (horizontal axis, 350 to 900).]
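A residual plot is the usual way to spot heteroscedasticity, but a rough numerical check is also possible. The sketch below is not taken from the SAS output: it uses a simple variance-ratio comparison (in the spirit of a Goldfeld-Quandt check) on simulated, hypothetical data in which the error standard deviation is assumed to grow in proportion to X. The names `fit_line` and `spread_ratio` are illustrative.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def spread_ratio(xs, ys):
    """Mean squared residual in the upper third of X over the lower third."""
    intercept, slope = fit_line(xs, ys)
    pairs = sorted((x, y - (intercept + slope * x)) for x, y in zip(xs, ys))
    third = len(pairs) // 3
    low = [r for _, r in pairs[:third]]
    high = [r for _, r in pairs[-third:]]
    mean_sq = lambda rs: sum(r * r for r in rs) / len(rs)
    return mean_sq(high) / mean_sq(low)

random.seed(2)
xs = [random.uniform(10, 60) for _ in range(600)]
# Hypothetical data: error standard deviation proportional to X.
hetero = [300 + 9 * x + random.gauss(0, 5 * x) for x in xs]
# Comparison data with a constant error standard deviation.
homo = [300 + 9 * x + random.gauss(0, 175) for x in xs]

ratio_hetero = spread_ratio(xs, hetero)
ratio_homo = spread_ratio(xs, homo)
print(f"spread ratio, variance growing with X: {ratio_hetero:.2f}")
print(f"spread ratio, constant variance:       {ratio_homo:.2f}")
```

A ratio far from 1 suggests that the residual spread changes with X, while a ratio near 1 is what constant variance would produce. This is only a screening device and does not replace a formal test.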