Linear regression model

We assume that two quantitative variables, x and y, are linearly related; that is, the population of (x, y) pairs is related by an ideal population regression line

$$y = \alpha + \beta x + e$$

where $\alpha$ and $\beta$ represent the y-intercept and slope coefficients, and the quantity e is included to represent the fact that the relation is subject to random errors in measurement.

e can be interpreted to represent either
- the deviation of a given value of y from its mean on the population regression line, or
- the error in using the line to predict a value of y from the corresponding given x

We assume that e is a normally distributed random variable with mean $\mu_e = 0$ and standard deviation $\sigma_e$, which will be large when prediction errors are large and small when prediction errors are small.

Note that e is a different random variable for different values of x; all such e are assumed to be independent of each other and identically distributed (so there is no harm in giving them the same name).

For a fixed value $x^*$ of x, the quantity $\alpha + \beta x^*$ represents the (fixed) height of the regression line at $x = x^*$, so $y = \alpha + \beta x^* + e$ is subject to the same kind of variability as e: namely, y is normally distributed with mean $\mu_y = \alpha + \beta x^*$ and standard deviation $\sigma_y = \sigma_e$.

$\beta$, being the slope of the line, represents the change in $\mu_y$ associated with a unit change in x; that is, $\beta$ is the average change in y associated with a unit change in x.

The parameters of interest for the regression model are
- $\sigma_e$, which measures the ideal size of errors in using the line to make predictions of y values, and
- $\beta$, which measures the average change in y associated with a unit change in x

Estimating regression parameters

Estimating $\sigma_e$: the standard deviation about the regression line,

$$s_e = \sqrt{\frac{\text{SSResid}}{n - 2}},$$

is not an unbiased estimator of $\sigma_e$ (although $s_e^2$ is an unbiased estimator of $\sigma_e^2$). [TI-83: STAT TESTS LinRegTTest (denoted s)]

Estimating $\beta$: the slope of the regression line,

$$b = r \, \frac{s_y}{s_x},$$

is an unbiased estimator of $\beta$. [TI-83: STAT TESTS LinRegTTest, also STAT CALC LinReg(a+bx)]
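As a rough illustration of the two estimates above, the following Python sketch computes b and $s_e$ from a small made-up data set. The data, the function name `fit_line`, and the use of the usual least-squares intercept $a = \bar{y} - b\bar{x}$ are assumptions for this example and are not taken from the slides.

```python
import math
from statistics import mean, stdev

def fit_line(x, y):
    """Return (a, b, s_e) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)          # sample standard deviations
    # sample correlation coefficient r
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = s_xy / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x                      # slope estimate
    a = y_bar - b * x_bar                  # intercept (assumed: standard least squares)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))    # standard deviation about the line
    return a, b, s_e

# Hypothetical data, chosen only to keep the arithmetic easy to check by hand
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, s_e = fit_line(x, y)
print(f"a = {a:.3f}, b = {b:.3f}, s_e = {s_e:.3f}")
```

For this data SSResid = 2.4, so $s_e = \sqrt{2.4/3} \approx 0.894$, with b = 0.6 and a = 2.2.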

The sampling distribution for b

The sampling distribution of b is studied to determine how estimates of $\beta$ will behave from sample to sample. Assuming that the n data points produce independent, identically distributed normal errors e, all with mean 0 and standard deviation $\sigma_e$, we have that
- $\mu_b = \beta$
- $\sigma_b = \dfrac{\sigma_e}{s_x \sqrt{n - 1}}$
- the sampling distribution of b is normal

Since neither $\sigma_e$ nor $\sigma_b$ is known, we estimate $\sigma_e$ with the statistic $s_e$, and $\sigma_b$ with the statistic

$$s_b = \frac{s_e}{s_x \sqrt{n - 1}},$$

then standardize b with the statistic

$$t = \frac{b - \beta}{s_b},$$

which has df = n − 2.

Confidence interval for $\beta$

Assuming that the n data points produce independent, identically distributed normal errors e, all with mean 0 and standard deviation $\sigma_e$, we obtain the following confidence interval for $\beta$:

$$b \pm (t\text{-crit}) \cdot s_b$$

where the t-critical value is based on df = n − 2.
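The interval above can be sketched in Python as follows. This is a minimal example under stated assumptions: the data are made up, the function name `slope_confidence_interval` is hypothetical, and the t-critical value is read from a standard t table (3.182 for 95% confidence with df = 3) rather than computed.

```python
import math
from statistics import mean, stdev

def slope_confidence_interval(x, y, t_crit):
    """Return (b, s_b, lower, upper) for the interval b ± t_crit * s_b."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    # b = S_xy / S_xx, which is algebraically the same as r * s_y / s_x
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))
    s_b = s_e / (stdev(x) * math.sqrt(n - 1))   # estimated std. deviation of b
    return b, s_b, b - t_crit * s_b, b + t_crit * s_b

# Hypothetical data; n = 5, so df = n - 2 = 3
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT_95_DF3 = 3.182   # from a t table: 95% confidence, df = 3
b, s_b, lower, upper = slope_confidence_interval(x, y, T_CRIT_95_DF3)
print(f"b = {b:.3f}, s_b = {s_b:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With so few points the interval is wide and contains 0, so this particular sample alone would not rule out a zero slope.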

Model utility test for linear regression

If the slope of the regression line is $\beta = 0$, then the line is horizontal and values of y do not depend on x, so there is no use in searching for a prediction of y based on knowledge of x. A test for whether $\beta = 0$ can determine whether it is appropriate to search for a linear regression between the variables x and y.

Hypotheses: $H_0: \beta = 0$ versus $H_a: \beta \neq 0$

Test statistic:

$$t = \frac{b - 0}{s_b}, \quad \text{with df} = n - 2$$

Assumptions: independent normally distributed errors with mean 0 and equal standard deviations

[TI-83: STAT TESTS LinRegTTest]
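A minimal sketch of the test statistic, assuming made-up data and a hypothetical helper `model_utility_t`. Instead of computing a P-value, the example compares |t| with a two-sided 5% critical value taken from a t table (3.182 for df = 3), which is a simplification chosen to keep the code free of external libraries.

```python
import math
from statistics import mean

def model_utility_t(x, y):
    """Return (t, df) for testing H0: beta = 0 against Ha: beta != 0."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))
    s_b = s_e / math.sqrt(s_xx)   # sqrt(S_xx) equals s_x * sqrt(n - 1)
    return (b - 0) / s_b, n - 2

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_stat, df = model_utility_t(x, y)
T_CRIT = 3.182                    # two-sided 5% critical value for df = 3 (t table)
reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.3f}, df = {df}, reject H0 at the 5% level: {reject_h0}")
```

Here t is about 2.12, which is below the critical value, so $H_0$ is not rejected at the 5% level for this small sample.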

Residual analysis

We can use a residual plot (a plot of residuals vs. x values) to check whether it is reasonable to assume that errors are identically distributed independent normal variables. The z-scores of these residuals can be used to display a standardized residual plot:

$$z_{\text{resid}} = \frac{\text{resid} - 0}{s_{\text{resid}}}$$

However, the standard deviation $s_{\text{resid}}$ varies from point to point and is not automatically calculated by the TI-83. Many statistical packages do perform these calculations.

What to look for:
- absence of patterns in the (standardized) residual plot
- very few large residuals (more than 2 standard deviations from the x-axis)
- no variations in spread of the residuals (would indicate that $\sigma_e$ varies with x)
- influential residuals (residual points far removed from the bulk of the plot)
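The slides do not give a formula for $s_{\text{resid}}$. The sketch below assumes the standard result for simple linear regression, $s_{\text{resid},i} = s_e \sqrt{1 - 1/n - (x_i - \bar{x})^2 / S_{xx}}$ with $S_{xx} = \sum (x_i - \bar{x})^2$, which is what statistical packages typically use. The data and the function name `standardized_residuals` are made up for illustration.

```python
import math
from statistics import mean

def standardized_residuals(x, y):
    """Return the list of standardized residuals for the least-squares line."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))
    z = []
    for xi, ri in zip(x, resid):
        # Assumed formula: each residual's std. deviation depends on its x value
        leverage = 1 / n + (xi - x_bar) ** 2 / s_xx
        s_resid = s_e * math.sqrt(1 - leverage)
        z.append((ri - 0) / s_resid)
    return z

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
z = standardized_residuals(x, y)
large = [(xi, round(zi, 2)) for xi, zi in zip(x, z) if abs(zi) > 2]
print("standardized residuals:", [round(zi, 2) for zi in z])
print("points more than 2 standard deviations out:", large)
```

Plotting `z` against `x` gives the standardized residual plot; the list `large` flags the residuals that the checklist above says should be rare.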

The sampling distribution for $a + bx^*$

Assuming that the n data points produce independent, identically distributed normal errors e, all with mean 0 and standard deviation $\sigma_e$, we study the distribution of the prediction statistic $a + bx^*$ for some fixed choice of $x = x^*$.
- $a + bx^*$ is an unbiased estimate for the true regression value $\alpha + \beta x^*$, which thus represents $\mu_{a + bx^*}$
- the standard deviation is

$$\sigma_{a + bx^*} = \sigma_e \sqrt{\frac{1}{n} + \frac{z_{x^*}^2}{n - 1}}$$

and is estimated by the statistic

$$s_{a + bx^*} = s_e \sqrt{\frac{1}{n} + \frac{z_{x^*}^2}{n - 1}}$$

where $z_{x^*} = (x^* - \bar{x}) / s_x$ is the z-score of $x^*$
- $a + bx^*$ is normally distributed, but replacing $\sigma_{a + bx^*}$ with the estimate $s_{a + bx^*}$ produces a standardized t variable with df = n − 2

Confidence interval for $a + bx^*$

With the same assumptions as above, the confidence interval formula for $a + bx^*$, the mean value of the predicted y, is

$$(a + bx^*) \pm (t\text{-crit}) \cdot s_{a + bx^*}$$

where t has df = n − 2.

Prediction intervals

With the same assumptions as above, the prediction interval formula for $y^*$, the prediction of y for the x value $x = x^*$, is

$$(a + bx^*) \pm (t\text{-crit}) \cdot \sqrt{s_e^2 + s_{a + bx^*}^2}$$

where t has df = n − 2 (variability comes not only from the size of the error but also from the extent to which the estimate $a + bx^*$ differs from the mean value).
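The two interval formulas can be compared side by side with a short Python sketch. The data, the choice $x^* = 4$, the function name `intervals_at`, and the table value 3.182 (95% confidence, df = 3) are all assumptions for illustration; the code takes $z_{x^*}$ to be the z-score $(x^* - \bar{x})/s_x$.

```python
import math
from statistics import mean, stdev

def intervals_at(x, y, x_star, t_crit):
    """Return (fit, ci, pi): the value a + b*x_star with its confidence
    interval for the mean response and its prediction interval."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x = stdev(x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / ((n - 1) * s_x ** 2)
    a = y_bar - b * x_bar
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))
    z = (x_star - x_bar) / s_x                         # z-score of x_star
    s_fit = s_e * math.sqrt(1 / n + z ** 2 / (n - 1))  # s_{a + b x*}
    s_pred = math.sqrt(s_e ** 2 + s_fit ** 2)          # adds the error variability
    fit = a + b * x_star
    ci = (fit - t_crit * s_fit, fit + t_crit * s_fit)
    pi = (fit - t_crit * s_pred, fit + t_crit * s_pred)
    return fit, ci, pi

# Hypothetical data; df = 3, t-crit from a t table
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit, ci, pi = intervals_at(x, y, x_star=4, t_crit=3.182)
print(f"fit at x* = 4: {fit:.3f}")
print(f"95% CI for the mean of y: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% prediction interval:  ({pi[0]:.3f}, {pi[1]:.3f})")
```

The prediction interval is always wider than the confidence interval at the same $x^*$, because it adds $s_e^2$ under the square root.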