Test #1 (Solution Key)


STAT 47/67 Test #1 (Solution Key)

1. (To be done by hand) Exploring his own drink-and-drive habits, a student recalls the last 7 parties that he attended. He records the number of cans of beer he drank, his highway speed on the way home, and whether he was stopped by police.

[Table: columns Party; Beer, cans; Speed, mph; Stopped by police. Stopped by police, parties 1 to 7: yes, yes, no, no, no, yes, yes.]

Use the following statistical machine learning methods to predict whether or not the person will be stopped by police.

(a) Tree. Draw a decision tree that can be used for this classification. Label each node to make it clear how to use the tree for classification. What is the training error rate of this tree?

[Figure: scatter plot with axes Speed and Beer.]

(b) Pruning. Suppose the tree is pruned to just one split. If speed is the only variable used, what split minimizes the Gini index? What is the Gini index for this split, and what is the training error rate of this tree?

(c) Bagging. Four bootstrap samples from the given data are (listed by party number):
Bootstrap sample #1: ( 1, 1, 2, 3, 4, 5, 6 )
Bootstrap sample #2: ( 3, 4, 4, 5, 5, 7, 7 )
Bootstrap sample #3: ( 3, 3, 4, 6, 7, 7, 7 )
Bootstrap sample #4: ( 1, 2, 2, 2, 5, 5, 6 )
Draw a tree based on each bootstrap sample. Predict the out-of-bag data by majority vote. Use these results to estimate the testing error rate of this method.

SOLUTION.

(a) [Figure: two alternative decision trees; the surviving node labels are splits on Beer, with thresholds garbled as ".5".]
All the terminal nodes are pure, so the training error rate is 0.

(b) There are 3 possible splits: at Speed = 57.5, 65, or 72.5 (or 57, 65, and 73).
At 57.5, one node is pure, with $\hat{p}_{no} = 1$, $\hat{p}_{yes} = 0$; the other node contains 2 blue and 4 red points, so $\hat{p}_{no} = 1/3$ and $\hat{p}_{yes} = 2/3$. The Gini index is $G = (1)(0) + (1/3)(2/3) = 2/9$.
At 65, one node is pure, and for the other node $\hat{p}_{no} = 3/4$ and $\hat{p}_{yes} = 1/4$, so $G = 0 + (3/4)(1/4) = 3/16$.
At 72.5, one node is pure, and for the other node $\hat{p}_{no} = 1/2$ and $\hat{p}_{yes} = 1/2$, so $G = 0 + (1/2)(1/2) = 1/4$.
Thus, the split that minimizes the Gini index is Speed = 65. The classification is:
- if Speed < 65, then classify to "no";
- if Speed ≥ 65, then classify to "yes".
This rule classifies data points 1-6 correctly and misclassifies point 7. The training error rate is 1/7 or 14.3%.

(c) Bootstrap sample #1 ( 1, 1, 2, 3, 4, 5, 6 ). [Figure: tree; node labels lost.] Both nodes are pure, hence no more splits. Predict OOB $\hat{Y}_7$ = … .
Bootstrap sample #2 ( 3, 4, 4, 5, 5, 7, 7 ). [Figure: tree split on Beer, threshold lost; scatter plot with axes Speed and Beer.] Both nodes are pure, hence no more splits. Predict OOB $\hat{Y}_1 = \hat{Y}_2 = \hat{Y}_6$ = no.
Bootstrap sample #3 ( 3, 3, 4, 6, 7, 7, 7 ). [Figure: tree split at Beer < 1.5.] Both nodes are pure, hence no more splits. Predict OOB $\hat{Y}_1 = \hat{Y}_2$ = no and $\hat{Y}_5$ = yes.
Bootstrap sample #4 ( 1, 2, 2, 2, 5, 5, 6 ). [Figure: tree; node labels lost.] Both nodes are pure, hence no more splits. Predict OOB $\hat{Y}_3 = \hat{Y}_4 = \hat{Y}_7$ = no.
By majority vote, the predictions are $\hat{Y}_1 = \hat{Y}_2 = \hat{Y}_3 = \hat{Y}_4 = \hat{Y}_6 = \hat{Y}_7$ = no and $\hat{Y}_5$ = yes. Among the OOB data, $Y_3$ and $Y_4$ are classified correctly, and $Y_1, Y_2, Y_5, Y_6, Y_7$ are misclassified, so the testing error rate is estimated to be 5/7 or 71.4%.
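The arithmetic in parts (b) and (c) can be checked with a few lines of base R. This is a minimal sketch under stated assumptions, not the author's code: the node counts are reconstructed from the proportions quoted in the solution (3 "no" and 4 "yes" outcomes in total), the impurity is the unweighted sum of $\hat{p}_{no}\hat{p}_{yes}$ over the two nodes as used in the solution, and the majority-vote vector is inferred from the statement that points 3 and 4 are classified correctly while points 1, 2, 5, 6, 7 are not. The names `splits`, `node_gini`, `boot`, `truth`, and `vote` are illustrative.

```r
# Part (b): class counts (no, yes) in the two nodes of each candidate split.
# Assumption: counts reconstructed from the proportions quoted in the solution.
splits <- list(
  "57.5" = list(c(no = 1, yes = 0), c(no = 2, yes = 4)),
  "65"   = list(c(no = 3, yes = 1), c(no = 0, yes = 3)),
  "72.5" = list(c(no = 3, yes = 3), c(no = 0, yes = 1))
)

# Node impurity p_no * p_yes; the split score is the unweighted sum over
# both nodes, which is the form the solution uses.
node_gini <- function(counts) {
  p <- counts / sum(counts)
  unname(p["no"] * p["yes"])
}
split_gini <- sapply(splits, function(nodes) sum(sapply(nodes, node_gini)))
best_split <- names(which.min(split_gini))
print(split_gini)
print(best_split)

# Part (c): out-of-bag (OOB) parties are those absent from a bootstrap sample.
boot <- list(
  c(1, 1, 2, 3, 4, 5, 6),
  c(3, 4, 4, 5, 5, 7, 7),
  c(3, 3, 4, 6, 7, 7, 7),
  c(1, 2, 2, 2, 5, 5, 6)
)
oob <- lapply(boot, function(s) setdiff(1:7, s))
print(oob)

# Observed outcomes for parties 1..7 and the majority-vote OOB predictions
# (assumption: inferred from which points the solution says are misclassified).
truth <- c("yes", "yes", "no", "no", "no", "yes", "yes")
vote  <- c("no",  "no",  "no", "no", "yes", "no",  "no")
oob_error <- mean(vote != truth)
print(oob_error)
```

The three split scores reproduce 2/9, 3/16, and 1/4, with the minimum at 65, and the OOB error comes out as 5/7.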

2. (Use R for data analysis) A real estate appraiser is interested in a reliable working method of evaluating residential home prices as a function of various features. Data on 5 recent home sales are available on our course web site. All the files there contain identical data in different formats. The following variables are included.

Column  Variable
1       Identification number 1 5
2       Sales price of residence (in $1000)
3       Finished area of residence (square feet)
4       Total number of bedrooms in residence
5       Total number of bathrooms in residence
6       Air conditioning: present or absent
7       Number of cars that garage will hold
8       Swimming pool: present or absent
9       Year property was originally constructed
10      Quality of construction: high, medium, or low
11      Architectural style. Three styles are coded as 1, 2, and 3
12      Lot size (square feet)
13      Location near a highway: yes or no

While predicting Sales Price, we would like to reduce multicollinearity in order to obtain a more reliable prediction. Compare the performance of the following methods.

(a) Variable selection. List the variables removed from the model, and write the final regression equation that a real estate appraiser should use. What method do you apply? Predict the sales price of a high quality 4-bedroom house built in 1990 in style #3, with a -car garage, an air conditioner, a swimming pool, 3 bathrooms, finished area of 500 square feet, and a 0,000 square foot lot, that is far from a highway.

(b) Ridge regression. Find the optimal tuning parameter λ. According to the model, how is the home price expected to change if one additional bathroom and a swimming pool are built, without changing other variables?

(c) Lasso. Find the optimal tuning parameter λ. List the variables that lasso eliminated from the model and the variables that were retained. According to the model, is a home expected to gain value if it is close to a highway? Use your results to answer this question.

(d) Principal components regression. Fit a principal components regression based on standardized (rescaled) X-variables. How many principal components are needed to explain 80% of the total variation of the standardized X-variables? How many principal components are needed to explain 80% of the total variation of sales prices?

(Stat-67 only) In (a,b,c), estimate the prediction mean squared error by some form of cross-validation. Which method provides the lowest prediction MSE?

SOLUTION.

(a) The stepwise selection method removes Bedrooms, Pool, and Air Conditioner from the model. The resulting regression equation is

Price = … + … FinishedArea … I{Quality=low} … 13.6 I{Quality=medium} + 1.5 Year … LotSize … 15.9 I{Style=2} … 38.1 I{Style=3} … Bathrooms … 36.5 Highway + 9.4 GarageSize.

Predicted sales price is Ŷ = $44s,413. The prediction mean squared error, computed by leave-one-out cross-validation, is MSE = 3447 (in millions of squared dollars), which corresponds to a root mean squared error of √MSE = 58.7 thousand dollars.

(b) The optimal tuning parameter is about λ = …. It provides a prediction mean squared error of MSE = 3486 (in millions of squared dollars). The home price is expected to increase by an estimated β̂_bathroom + β̂_pool = … = 18.4 thousand dollars.

(c) The optimal tuning parameter is about λ = …. It provides a prediction mean squared error of MSE = 3467 (in millions of squared dollars). Lasso has not removed any variables. The slope β̂_highway = −33.… is negative, so the home value is expected to decrease if it is close to a highway (a decrease of 33.… thousand dollars).

(d) The first seven principal components explain 83% of the total variation of X. The first twelve principal components explain 8…% of the total variation of the response, the sales price.
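The conversion from MSE to root mean squared error in part (a) can be written out explicitly. As a quick check, assuming the MSE of 3447 is measured in squared thousands of dollars (the sales price is recorded in $1000, which is why the solution describes the unit as millions of squared dollars):

```latex
\[
\sqrt{\mathrm{MSE}} \;=\; \sqrt{3447\ (\$1000)^2} \;\approx\; 58.7\ \text{thousand dollars},
\qquad
3447\ (\$1000)^2 \;=\; 3447 \times 10^{6}\ \$^2 .
\]
```

The root mean squared error is on the same scale as the sales price itself, which makes it easier to interpret than the MSE.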

R Codes (FYI)

Reading data

> HOMES = read.table(url("…"))
> names(HOMES)
 [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11" "V12" "V13"

or

> HOMES = read.csv(url("…"))
> names(HOMES)
 [1] "ID" "SALES_PRICE" "FINISHED_AREA" "BEDROOMS" "BATHROOMS"
 [6] "GARAGE_SIZE" "YEAR_BUILT" "STYLE" "LOT_SIZE" "AIR_CONDITIONER"
[11] "POOL" "QUALITY" "HIGHWAY"
> head(HOMES)
  ID SALES_PRICE FINISHED_AREA BEDROOMS BATHROOMS GARAGE_SIZE YEAR_BUILT STYLE LOT_SIZE AIR_CONDITIONER POOL QUALITY HIGHWAY
[Rows 1 to 6: only the QUALITY entries survive, MEDIUM in all six rows.]
> attach(HOMES)

Regression and variable selection

> null = lm( SALES_PRICE ~ 1, data=HOMES )
> full = lm( SALES_PRICE ~ .-ID, data=HOMES )
> step( null, scope=list(lower=null, upper=full), direction="forward" )

SALES_PRICE ~ FINISHED_AREA + QUALITY + YEAR_BUILT + LOT_SIZE + as.factor(STYLE) + BATHROOMS + HIGHWAY + GARAGE_SIZE

                  Df Sum of Sq RSS AIC
<none>
BEDROOMS
POOL
AIR_CONDITIONER

Coefficients:
      (Intercept)      FINISHED_AREA         QUALITYLOW      QUALITYMEDIUM
          -.346e                  e                  e                e+0
       YEAR_BUILT           LOT_SIZE  as.factor(STYLE)2  as.factor(STYLE)3
           1.49e                  e                  e               e+01
        BATHROOMS            HIGHWAY        GARAGE_SIZE
           8.81e                  e               e+00

> reg = glm( SALES_PRICE ~ FINISHED_AREA + BATHROOMS + GARAGE_SIZE + YEAR_BUILT + LOT_SIZE + QUALITY + HIGHWAY + as.factor(STYLE), data=HOMES )
> Yhat = predict( reg, data.frame( FINISHED_AREA=500, BATHROOMS=3, GARAGE_SIZE=…, YEAR_BUILT=1990, LOT_SIZE=0000, QUALITY="HIGH", HIGHWAY="…", STYLE=3 ) )
> Yhat
[1] …        # Predicted price of a given home
> library(boot)
> cv.error = cv.glm( HOMES, reg )
> cv.error$delta
[1] …        # Predicted mean squared error

Ridge Regression

> library(glmnet)
> X = model.matrix( SALES_PRICE ~ .-ID-STYLE+as.factor(STYLE), data=HOMES )
> cv.ridge = cv.glmnet( X, SALES_PRICE, alpha=0, lambda=1:0 )
> cv.ridge$lambda.min
[1] 1
> cv.ridge = cv.glmnet( X, SALES_PRICE, alpha=0, lambda=seq(0,10,0.01) )
> cv.ridge$lambda.min
[1] 0.76
> cv.ridge = cv.glmnet( X, SALES_PRICE, alpha=0, lambda=seq(0,1,0.001) )
> cv.ridge$lambda.min
[1] …        # Optimal tuning parameter lambda. It is unstable: it depends on the
             # randomized CV, so your answer may be different (it has to be between 0 and 1).
> min(cv.ridge$cvm)
[1] …        # Prediction MSE
> ridgereg = glmnet( X, SALES_PRICE, alpha=0, lambda=cv.ridge$lambda.min )   # Our best model
> predict( ridgereg, type="coefficients" )
(Intercept)         …e+03
FINISHED_AREA       …e-01
BEDROOMS            …e+00
BATHROOMS           …e+00
GARAGE_SIZE         …e+00
YEAR_BUILT          …e+00
LOT_SIZE            …e-03
AIR_CONDITIONER     …e+00
AIR_CONDITIONER     …e-01
POOL                …e+00
QUALITYLOW          …e+0
QUALITYMEDIUM       …e+0
HIGHWAY             …e+01
as.factor(STYLE)2   …e+01
as.factor(STYLE)3   …e+01
> plot(cv.ridge$lambda, cv.ridge$cvm)
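The note above that the optimal λ is unstable comes from the random assignment of observations to cross-validation folds. The following base-R sketch illustrates the idea on simulated data: once the fold assignment is fixed, the CV curve and its minimizer are reproducible. It is an illustration under stated assumptions, not the code used for the solution. The data are simulated, the helper names `ridge_fit`, `ridge_predict`, and `cv_mse` are hypothetical, and the closed-form ridge estimator here uses a different penalty scaling from `glmnet`, so the λ values are not comparable with those in the listing.

```r
set.seed(42)

# Simulated stand-in for the design matrix and response (hypothetical data).
n <- 120
p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- as.vector(X %*% c(3, -2, 0, 0, 1) + rnorm(n))

# Closed-form ridge fit with an unpenalized intercept:
# beta = (Xc'Xc + lambda I)^{-1} Xc'(y - ybar), with Xc the centered predictors.
ridge_fit <- function(X, y, lambda) {
  xbar <- colMeans(X)
  ybar <- mean(y)
  Xc <- sweep(X, 2, xbar)
  b <- as.vector(solve(crossprod(Xc) + lambda * diag(ncol(X)),
                       crossprod(Xc, y - ybar)))
  list(beta = b, intercept = ybar - sum(xbar * b))
}

ridge_predict <- function(fit, Xnew) {
  as.vector(fit$intercept + Xnew %*% fit$beta)
}

# K-fold cross-validated MSE for one lambda, given a fixed fold assignment.
cv_mse <- function(X, y, lambda, foldid) {
  sq_err <- numeric(length(y))
  for (k in unique(foldid)) {
    held_out <- foldid == k
    fit <- ridge_fit(X[!held_out, , drop = FALSE], y[!held_out], lambda)
    pred <- ridge_predict(fit, X[held_out, , drop = FALSE])
    sq_err[held_out] <- (y[held_out] - pred)^2
  }
  mean(sq_err)
}

# Fix the folds once; every lambda is then scored on the same partition.
foldid <- sample(rep(1:10, length.out = n))
lambdas <- seq(0, 10, by = 0.5)
cvm <- sapply(lambdas, function(l) cv_mse(X, y, l, foldid))
lambda_min <- lambdas[which.min(cvm)]
print(lambda_min)
```

With `glmnet` itself, the same effect can be obtained by calling `set.seed()` before `cv.glmnet()` or by passing a fixed `foldid` vector to it.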

LASSO

> cv.lasso = cv.glmnet( X, SALES_PRICE, alpha=1 )
> cv.lasso$lambda.min
[1] …
> cv.lasso = cv.glmnet( X, SALES_PRICE, alpha=1, lambda=seq(0,1,0.001) )
> cv.lasso$lambda.min
[1] 0.79     # Optimal tuning parameter lambda
> plot(cv.lasso$lambda, cv.lasso$cvm)
> min(cv.lasso$cvm)
[1] …        # Prediction MSE
> lasso = glmnet( X, SALES_PRICE, alpha=1, lambda=cv.lasso$lambda.min )
> predict( lasso, type="coefficients" )        # Our best lasso model
(Intercept)         …e+03
FINISHED_AREA       …e-01
BEDROOMS            …e+00
BATHROOMS           …e+00
GARAGE_SIZE         …e+00
YEAR_BUILT          …e+00
LOT_SIZE            …e-03
AIR_CONDITIONER     …e+00
POOL                …e+00
QUALITYLOW          …e+0
QUALITYMEDIUM       …e+0
HIGHWAY             …e+01
as.factor(STYLE)2   …e+01
as.factor(STYLE)3   …e+01
> plot(cv.lasso$lambda, cv.lasso$cvm)

PC Regression

> library(pls)
> pcreg = pcr( SALES_PRICE ~ .-ID-STYLE+as.factor(STYLE), data=HOMES, scale=TRUE, validation="CV" )
> summary(pcreg)
VALIDATION: RMSEP
Cross-validated using 10 random segments.
TRAINING: % variance explained
             1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
X
SALES_PRICE
             9 comps  10 comps  11 comps  12 comps  13 comps
X
SALES_PRICE
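The "X" row of the `summary(pcreg)` table reports the cumulative percentage of variance of the standardized predictors explained by the leading components. A minimal base-R sketch of how that count can be obtained is given below. It uses `prcomp` on simulated data, since the home sales file is not reproduced here. The matrix `X` and the names `var_explained`, `cum_explained`, and `k80` are illustrative assumptions, and the sketch covers only the X-variation question, not the share of sales price variation explained.

```r
set.seed(1)

# Simulated predictors with deliberate collinearity (hypothetical data).
n <- 200
Z <- matrix(rnorm(n * 3), n, 3)
X <- cbind(
  Z,
  Z[, 1] + 0.1 * rnorm(n),        # nearly a copy of column 1
  Z[, 2] + 0.1 * rnorm(n),        # nearly a copy of column 2
  1000 * Z[, 3] + rnorm(n)        # same signal as column 3 on a larger scale
)

# Principal components of the standardized (centered and rescaled) predictors.
pc <- prcomp(X, center = TRUE, scale. = TRUE)

# Share of total variance carried by each component, and the running total.
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
cum_explained <- cumsum(var_explained)

# Smallest number of components whose cumulative share reaches 80%.
k80 <- which(cum_explained >= 0.80)[1]
print(round(100 * cum_explained, 1))
print(k80)
```

Standardizing matters here: without `scale. = TRUE`, the column on the larger scale would dominate the first component regardless of how informative it is.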


More information

Effects of Financial Parameters on Poverty - Using SAS EM

Effects of Financial Parameters on Poverty - Using SAS EM Effects of Financial Parameters on Poverty - Using SAS EM By - Akshay Arora Student, MS in Business Analytics Spears School of Business Oklahoma State University Abstract Studies recommend that developing

More information

Loan Approval and Quality Prediction in the Lending Club Marketplace

Loan Approval and Quality Prediction in the Lending Club Marketplace Loan Approval and Quality Prediction in the Lending Club Marketplace Milestone Write-up Yondon Fu, Shuo Zheng and Matt Marcus Recap Lending Club is a peer-to-peer lending marketplace where individual investors

More information

Loan Approval and Quality Prediction in the Lending Club Marketplace

Loan Approval and Quality Prediction in the Lending Club Marketplace Loan Approval and Quality Prediction in the Lending Club Marketplace Final Write-up Yondon Fu, Matt Marcus and Shuo Zheng Introduction Lending Club is a peer-to-peer lending marketplace where individual

More information

Multiple Regression. Review of Regression with One Predictor

Multiple Regression. Review of Regression with One Predictor Fall Semester, 2001 Statistics 621 Lecture 4 Robert Stine 1 Preliminaries Multiple Regression Grading on this and other assignments Assignment will get placed in folder of first member of Learning Team.

More information

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences. STAB22H3 Statistics I Duration: 1 hour and 45 minutes

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences. STAB22H3 Statistics I Duration: 1 hour and 45 minutes UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences STAB22H3 Statistics I Duration: 1 hour and 45 minutes Last Name: First Name: Student number: Aids allowed: - One handwritten

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Sample Exam 3 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Question 1-7: The managers of a brokerage firm are interested in finding out if the

More information

Top-down particle filtering for Bayesian decision trees

Top-down particle filtering for Bayesian decision trees Top-down particle filtering for Bayesian decision trees Balaji Lakshminarayanan 1, Daniel M. Roy 2 and Yee Whye Teh 3 1. Gatsby Unit, UCL, 2. University of Cambridge and 3. University of Oxford Outline

More information

Tree Diagram. Splitting Criterion. Splitting Criterion. Introduction. Building a Decision Tree. MS4424 Data Mining & Modelling Decision Tree

Tree Diagram. Splitting Criterion. Splitting Criterion. Introduction. Building a Decision Tree. MS4424 Data Mining & Modelling Decision Tree Introduction MS4424 Data Mining & Modelling Decision Tree Lecturer : Dr Iris Yeung Room No : P7509 Tel No : 2788 8566 Email : msiris@cityu.edu.hk decision tree is a set of rules represented in a tree structure

More information

Competition price analysis in non-life insurance

Competition price analysis in non-life insurance White Paper on Non-Life Insurance: Competition A Reacfin price White analysis Paper in on non-life Non-Life insurance Insurance: - How machine learning and statistical predictive models can help Competition

More information

2. Does each situation represent direct variation or partial variation? a) Lily is paid $5 per hour for raking leaves.

2. Does each situation represent direct variation or partial variation? a) Lily is paid $5 per hour for raking leaves. MPM1D Date: Name: Relating Linear Equations, Graphs and Table of Values 1. Alan works part-time at a gas station. He earns $10/h. His pay varies directly with the time, in hours, he works. a) Choose appropriate

More information

LESSON 7 INTERVAL ESTIMATION SAMIE L.S. LY

LESSON 7 INTERVAL ESTIMATION SAMIE L.S. LY LESSON 7 INTERVAL ESTIMATION SAMIE L.S. LY 1 THIS WEEK S PLAN Part I: Theory + Practice ( Interval Estimation ) Part II: Theory + Practice ( Interval Estimation ) z-based Confidence Intervals for a Population

More information

Interval estimation. September 29, Outline Basic ideas Sampling variation and CLT Interval estimation using X More general problems

Interval estimation. September 29, Outline Basic ideas Sampling variation and CLT Interval estimation using X More general problems Interval estimation September 29, 2017 STAT 151 Class 7 Slide 1 Outline of Topics 1 Basic ideas 2 Sampling variation and CLT 3 Interval estimation using X 4 More general problems STAT 151 Class 7 Slide

More information

GOALS. Discrete Probability Distributions. A Distribution. What is a Probability Distribution? Probability for Dice Toss. A Probability Distribution

GOALS. Discrete Probability Distributions. A Distribution. What is a Probability Distribution? Probability for Dice Toss. A Probability Distribution GOALS Discrete Probability Distributions Chapter 6 Dr. Richard Jerz Define the terms probability distribution and random variable. Distinguish between discrete and continuous probability distributions.

More information

Off to College? First Apartment? First House? Not So Fast!

Off to College? First Apartment? First House? Not So Fast! Home Sweet Home Off to College? First Apartment? First House? Not So Fast! 1. Do you know how to open a bank account? Yes No 2. Do you know how to balance a checkbook? Yes No 3. Do you know how to get

More information

Using New SAS 9.4 Features for Cumulative Logit Models with Partial Proportional Odds Paul J. Hilliard, Educational Testing Service (ETS)

Using New SAS 9.4 Features for Cumulative Logit Models with Partial Proportional Odds Paul J. Hilliard, Educational Testing Service (ETS) Using New SAS 9.4 Features for Cumulative Logit Models with Partial Proportional Odds Using New SAS 9.4 Features for Cumulative Logit Models with Partial Proportional Odds INTRODUCTION Multicategory Logit

More information

Discrete Probability Distributions Chapter 6 Dr. Richard Jerz

Discrete Probability Distributions Chapter 6 Dr. Richard Jerz Discrete Probability Distributions Chapter 6 Dr. Richard Jerz 1 GOALS Define the terms probability distribution and random variable. Distinguish between discrete and continuous probability distributions.

More information

Cross-validation, ridge regression, and bootstrap

Cross-validation, ridge regression, and bootstrap Cross-validation, ridge regression, and bootstrap > par(mfrow=c(2,2)) > head(ironslag) chemical magnetic 1 24 25 2 16 22 3 24 17 4 18 21 5 18 20 6 10 13 > attach(ironslag) > a=seq(min(chemical), max(chemical),

More information

Solution to Exercise E5.

Solution to Exercise E5. Solution to Exercise E5. The Multiple Regression Model. Estimation. Exercise E5.1. Beach umbrella rental Part I. Simple Linear Regression Model. a. Regression model: U t = β 1 + β 2 T t + u t t = 1,...,

More information

Measures of Dispersion (Range, standard deviation, standard error) Introduction

Measures of Dispersion (Range, standard deviation, standard error) Introduction Measures of Dispersion (Range, standard deviation, standard error) Introduction We have already learnt that frequency distribution table gives a rough idea of the distribution of the variables in a sample

More information

Session 178 TS, Stats for Health Actuaries. Moderator: Ian G. Duncan, FSA, FCA, FCIA, FIA, MAAA. Presenter: Joan C. Barrett, FSA, MAAA

Session 178 TS, Stats for Health Actuaries. Moderator: Ian G. Duncan, FSA, FCA, FCIA, FIA, MAAA. Presenter: Joan C. Barrett, FSA, MAAA Session 178 TS, Stats for Health Actuaries Moderator: Ian G. Duncan, FSA, FCA, FCIA, FIA, MAAA Presenter: Joan C. Barrett, FSA, MAAA Session 178 Statistics for Health Actuaries October 14, 2015 Presented

More information

The Least Squares Regression Line

The Least Squares Regression Line The Least Squares Regression Line Section 5.3 Cathy Poliak, Ph.D. cathy@math.uh.edu Office hours: T Th 1:30 pm - 3:30 pm 620 PGH & 5:30 pm - 7:00 pm CASA Department of Mathematics University of Houston

More information

PARAMETRIC AND NON-PARAMETRIC BOOTSTRAP: A SIMULATION STUDY FOR A LINEAR REGRESSION WITH RESIDUALS FROM A MIXTURE OF LAPLACE DISTRIBUTIONS

PARAMETRIC AND NON-PARAMETRIC BOOTSTRAP: A SIMULATION STUDY FOR A LINEAR REGRESSION WITH RESIDUALS FROM A MIXTURE OF LAPLACE DISTRIBUTIONS PARAMETRIC AND NON-PARAMETRIC BOOTSTRAP: A SIMULATION STUDY FOR A LINEAR REGRESSION WITH RESIDUALS FROM A MIXTURE OF LAPLACE DISTRIBUTIONS Melfi Alrasheedi School of Business, King Faisal University, Saudi

More information

STAT 509: Statistics for Engineers Dr. Dewei Wang. Copyright 2014 John Wiley & Sons, Inc. All rights reserved.

STAT 509: Statistics for Engineers Dr. Dewei Wang. Copyright 2014 John Wiley & Sons, Inc. All rights reserved. STAT 509: Statistics for Engineers Dr. Dewei Wang Applied Statistics and Probability for Engineers Sixth Edition Douglas C. Montgomery George C. Runger 7 Point CHAPTER OUTLINE 7-1 Point Estimation 7-2

More information

Web Appendix. Are the effects of monetary policy shocks big or small? Olivier Coibion

Web Appendix. Are the effects of monetary policy shocks big or small? Olivier Coibion Web Appendix Are the effects of monetary policy shocks big or small? Olivier Coibion Appendix 1: Description of the Model-Averaging Procedure This section describes the model-averaging procedure used in

More information

EE/AA 578 Univ. of Washington, Fall Homework 8

EE/AA 578 Univ. of Washington, Fall Homework 8 EE/AA 578 Univ. of Washington, Fall 2016 Homework 8 1. Multi-label SVM. The basic Support Vector Machine (SVM) described in the lecture (and textbook) is used for classification of data with two labels.

More information

Econometric Methods for Valuation Analysis

Econometric Methods for Valuation Analysis Econometric Methods for Valuation Analysis Margarita Genius Dept of Economics M. Genius (Univ. of Crete) Econometric Methods for Valuation Analysis Cagliari, 2017 1 / 26 Correlation Analysis Simple Regression

More information

Booth School of Business, University of Chicago Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay. Solutions to Midterm

Booth School of Business, University of Chicago Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay. Solutions to Midterm Booth School of Business, University of Chicago Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay Solutions to Midterm Problem A: (34 pts) Answer briefly the following questions. Each question has

More information

FINALTERM EXAMINATION Spring 2009 MGT201- Financial Management (Session - 2) Question No: 1 ( Marks: 1 ) - Please choose one What is the long-run objective of financial management? Maximize earnings per

More information

The relationship between GDP, labor force and health expenditure in European countries

The relationship between GDP, labor force and health expenditure in European countries Econometrics-Term paper The relationship between GDP, labor force and health expenditure in European countries Student: Nguyen Thu Ha Contents 1. Background:... 2 2. Discussion:... 2 3. Regression equation

More information

Chapter 11 : Model checking and refinement An example: Blood-brain barrier study on rats

Chapter 11 : Model checking and refinement An example: Blood-brain barrier study on rats EXST3201 Chapter 11b Geaghan Fall 2005: Page 1 Chapter 11 : Model checking and refinement An example: Blood-brain barrier study on rats This study investigates the permeability of the blood-brain barrier

More information

Package FADA. May 20, 2016

Package FADA. May 20, 2016 Type Package Package FADA May 20, 2016 Title Variable Selection for Supervised Classification in High Dimension Version 1.3.2 Date 2016-05-12 Author Emeline Perthame (INRIA, Grenoble, France), Chloe Friguet

More information

Problem Set 9 Heteroskedasticty Answers

Problem Set 9 Heteroskedasticty Answers Problem Set 9 Heteroskedasticty Answers /* INVESTIGATION OF HETEROSKEDASTICITY */ First graph data. u hetdat2. gra manuf gdp, s([country].) xlab ylab 300000 manufacturing output (US$ miilio 200000 100000

More information