Test #1 (Solution Key)
- Shana Patrick
- 6 years ago
STAT 47/67 Test #1 (Solution Key)

1. (To be done by hand) Exploring his own drink-and-drive habits, a student recalls the last 7 parties that he attended. He records the number of cans of beer he drank, his highway speed on the way home, and whether he was stopped by police.

Party   Beer, cans   Speed, mph   Stopped by police
1       …            …            yes
2       …            …            yes
3       …            …            no
4       …            …            no
5       …            …            no
6       …            …            yes
7       …            …            yes

Use the following statistical machine learning methods to predict whether or not the student will be stopped by police.

(a) Tree. Draw a decision tree that can be used for this classification. Label each node, to make it clear how to use this tree for classification. What is the training error rate of this tree?

[Figure: scatter plot of the data, Speed against Beer.]

(b) Pruning. Suppose the tree is pruned to just one split. If speed is the only variable used, what split minimizes the Gini index? What is the Gini index for this split, and what is the training error rate of this tree?

(c) Bagging. Four bootstrap samples from the given data are (listed by party number):

Bootstrap sample #1 ( 1, 1, 2, 3, 4, 5, 6 )
Bootstrap sample #2 ( 3, 4, 4, 5, 5, 7, 7 )
Bootstrap sample #3 ( 3, 3, 4, 6, 7, 7, 7 )
Bootstrap sample #4 ( 1, 2, 2, 2, 5, 5, 6 )

Draw a tree based on each bootstrap sample. Predict the out-of-bag data by the majority vote. Use these results to estimate the testing error rate of this method.

SOLUTION.

(a) [Figure: two alternative decision trees, each with a split on Beer; the split thresholds are not legible.]

All the terminal nodes are pure, so the training error rate is 0.

(b) There are 3 possible splits: at Speed = 57.5, 65, or 72.5 (or 57, 65, and 73).

At 57.5, one node is pure, with p̂_no = 1, p̂_yes = 0, and the other node contains 2 blue ("no") and 4 red ("yes") points, so p̂_no = 1/3 and p̂_yes = 2/3. The Gini index is G = (1)(0) + (1/3)(2/3) = 2/9.

At 65, one node is pure, and for the other node, p̂_no = 3/4 and p̂_yes = 1/4. G = 0 + (3/4)(1/4) = 3/16.

At 72.5, one node is pure, and for the other node, p̂_no = 1/2 and p̂_yes = 1/2. G = 0 + (1/2)(1/2) = 1/4.

Thus, the split that minimizes the Gini index is Speed = 65. The classification is:
- if Speed < 65, then classify to "no";
- if Speed ≥ 65, then classify to "yes".

This rule classifies data points 1-6 correctly and misclassifies point 7. The training error rate is 1/7, or 14.3%.

(c) Bootstrap sample #1 ( 1, 1, 2, 3, 4, 5, 6 )
[Figure: tree.] Both nodes are pure, hence no more splits. Predict OOB Ŷ7 = ….

Bootstrap sample #2 ( 3, 4, 4, 5, 5, 7, 7 )
[Figure: tree with a split on Beer, shown on the Speed against Beer plot.] Both nodes are pure, hence no more splits. Predict OOB Ŷ1 = Ŷ2 = Ŷ6 = no.

Bootstrap sample #3 ( 3, 3, 4, 6, 7, 7, 7 )
[Figure: tree with the split Beer < 1.5.] Both nodes are pure, hence no more splits. Predict OOB Ŷ1 = Ŷ2 = no and Ŷ5 = yes.

Bootstrap sample #4 ( 1, 2, 2, 2, 5, 5, 6 )
[Figure: tree.] Both nodes are pure, hence no more splits. Predict OOB Ŷ3 = Ŷ4 = Ŷ7 = no.

By the majority vote, we have predictions Ŷ1 = Ŷ2 = Ŷ3 = Ŷ4 = Ŷ6 = Ŷ7 = no and Ŷ5 = yes. Among the OOB data, Y3 and Y4 are classified correctly, and Y1, Y2, Y5, Y6, Y7 are misclassified, so the testing error rate is estimated to be 5/7, or 71.4%.
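The arithmetic in parts (b) and (c) can be checked with a short R sketch. It assumes the node class counts quoted in the solution to (b), uses the solution's own form of the Gini index (the sum over nodes of p̂_no · p̂_yes, unweighted), reads the blank entries of the bootstrap samples as party 2 (as implied by the out-of-bag sets), and takes the majority-vote labels from the statement that points 3 and 4 are classified correctly while 1, 2, 5, 6, 7 are not. The variable names are illustrative only.

```r
# Class counts (no, yes) in the two child nodes of each candidate speed split,
# as quoted in the solution to part (b).
splits <- list(
  "57.5" = list(c(no = 1, yes = 0), c(no = 2, yes = 4)),
  "65"   = list(c(no = 3, yes = 1), c(no = 0, yes = 3)),
  "72.5" = list(c(no = 3, yes = 3), c(no = 0, yes = 1))
)

# Gini index in the form used by the solution: sum over nodes of p_no * p_yes.
gini <- function(nodes) {
  sum(sapply(nodes, function(n) prod(n / sum(n))))
}
G <- sapply(splits, gini)
best_split <- names(G)[which.min(G)]

# Part (c): observed outcomes for parties 1..7 and the four bootstrap samples.
truth <- c("yes", "yes", "no", "no", "no", "yes", "yes")
boot <- list(
  c(1, 1, 2, 3, 4, 5, 6),
  c(3, 4, 4, 5, 5, 7, 7),
  c(3, 3, 4, 6, 7, 7, 7),
  c(1, 2, 2, 2, 5, 5, 6)
)

# Out-of-bag parties for each sample: those that never appear in it.
oob <- lapply(boot, function(s) setdiff(1:7, s))

# Majority-vote OOB predictions (assumed, inferred from the solution text).
vote <- c("no", "no", "no", "no", "yes", "no", "no")
oob_error <- mean(vote != truth)

print(G)
print(best_split)
print(oob)
print(oob_error)
```

Note that a weighted Gini index (each node weighted by its share of the observations) is the more common convention; the sketch deliberately follows the unweighted form written in the solution.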
2. (Use R for data analysis) A real estate appraiser is interested in a reliable working method of evaluating residential home prices as a function of various features. Data on 5 recent home sales are available on our course web site. All the files there contain identical data in different formats. The following variables are included.

Column  Variable
1       Identification number
2       Sales price of residence (in $1000)
3       Finished area of residence (square feet)
4       Total number of bedrooms in residence
5       Total number of bathrooms in residence
6       Air conditioning: present or absent
7       Number of cars that garage will hold
8       Swimming pool: present or absent
9       Year property was originally constructed
10      Quality of construction: high, medium, or low
11      Architectural style. Three styles are coded as 1, 2, and 3
12      Lot size (square feet)
13      Location near a highway: yes or no

While predicting Sales Price, we would like to reduce multicollinearity in order to obtain a more reliable prediction. Compare the performance of the following methods.

(a) Variable selection. List the variables removed from the model, and write the final regression equation that a real estate appraiser should use. What method do you apply? Predict the sales price of a high quality 4-bedroom house built in 1990 in style #3, with a 2-car garage, an air conditioner, a swimming pool, 3 bathrooms, finished area of 500 square feet, and a 20,000 square foot lot, that is far from a highway.

(b) Ridge regression. Find the optimal tuning parameter λ. According to the model, how is the home price expected to change if one additional bathroom and a swimming pool are built, without changing other variables?

(c) Lasso. Find the optimal tuning parameter λ. List the variables that lasso eliminated from the model and the variables that were retained. According to the model, is a home expected to gain value if it is close to a highway? Use your results to answer this question.

(d) Principal components regression. Fit a principal components regression based on standardized (rescaled) X-variables. How many principal components are needed to explain 80% of the total variation of the standardized X-variables? How many principal components are needed to explain 80% of the total variation of sales prices?

(Stat-67 only) In (a), (b), and (c), estimate the prediction mean squared error by some form of cross-validation. Which method provides the lowest prediction MSE?
SOLUTION.

(a) The stepwise selection method removes Bedrooms, Pool, and Air Conditioner from the model. The resulting regression equation is

Price = … + … FinishedArea − … I{Quality=low} − 13.6 I{Quality=medium} + 1.5 Year + … LotSize − 15.9 I{Style=2} − 38.1 I{Style=3} + … Bathrooms − 36.5 Highway + 9.4 GarageSize.

The predicted sales price is Ŷ = $44s,413. The prediction mean squared error is MSE = 3447 (in millions of squared dollars), which corresponds to a root mean squared error of √MSE = 58.7 thousand dollars. It is computed by leave-one-out cross-validation.

(b) The optimal tuning parameter is about λ = …. It provides a prediction mean squared error of MSE = 3486 (in millions of squared dollars). The home price is expected to increase by an estimated β̂_bathroom + β̂_pool = 18.4 thousand dollars.

(c) The optimal tuning parameter is about λ = …. It provides a prediction mean squared error of MSE = 3467 (in millions of squared dollars). Lasso has not removed any variables. The slope β̂_highway is negative (about −33), so the home value is expected to decrease if it is close to a highway (a decrease of about 33 thousand dollars).

(d) The first seven principal components explain 83% of the total variation of X. The first twelve principal components explain 82% of the total variation of the response, the sales price.
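To answer the closing question (which method gives the lowest prediction MSE), the three cross-validated values reported above can be put side by side. This is only a sketch using the MSE figures quoted in the solution; the vector name `cv_mse` is illustrative.

```r
# Cross-validated prediction MSEs quoted in the solution
# (units: squared thousands of dollars, i.e. millions of squared dollars).
cv_mse <- c(stepwise = 3447, ridge = 3486, lasso = 3467)

# Root mean squared errors, in thousands of dollars.
rmse <- sqrt(cv_mse)
print(round(rmse, 1))

# Method with the smallest estimated prediction MSE.
best_method <- names(which.min(cv_mse))
print(best_method)
```

The comparison should be read with some caution: the stepwise figure comes from leave-one-out cross-validation of the already selected model, whereas the ridge and lasso figures come from the randomized folds of cv.glmnet, so the three estimates are not computed on identical splits and the differences between them are small.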
R Codes (FYI)

Reading data

> HOMES = read.table(url("..."))
> names(HOMES)
 [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11" "V12" "V13"

or

> HOMES = read.csv(url("..."))
> names(HOMES)
 [1] "ID" "SALES_PRICE" "FINISHED_AREA" "BEDROOMS" "BATHROOMS"
 [6] "GARAGE_SIZE" "YEAR_BUILT" "STYLE" "LOT_SIZE" "AIR_CONDITIONER"
[11] "POOL" "QUALITY" "HIGHWAY"
> head(HOMES)
[Output: first six rows of the data, with the columns listed above; the values are not legible.]
> attach(HOMES)

Regression and variable selection

> null = lm( SALES_PRICE ~ 1, data=HOMES )
> full = lm( SALES_PRICE ~ .-ID, data=HOMES )
> step( null, scope=list(lower=null, upper=full), direction="forward" )
[Output, final step: SALES_PRICE ~ FINISHED_AREA + QUALITY + YEAR_BUILT + LOT_SIZE + as.factor(STYLE) + BATHROOMS + HIGHWAY + GARAGE_SIZE. The candidates left out are BEDROOMS, POOL, and AIR_CONDITIONER. The coefficient values are not legible.]

> reg = glm( SALES_PRICE ~ FINISHED_AREA + BATHROOMS + GARAGE_SIZE + YEAR_BUILT + LOT_SIZE + QUALITY + HIGHWAY + as.factor(STYLE), data=HOMES )
> Yhat = predict( reg, data.frame( FINISHED_AREA=500, BATHROOMS=3, GARAGE_SIZE=2, YEAR_BUILT=1990, LOT_SIZE=20000, QUALITY="HIGH", HIGHWAY="NO", STYLE=3 ) )
> Yhat                    # Predicted price of the given home
> library(boot)
> cv.error = cv.glm( HOMES, reg )
> cv.error$delta          # Prediction mean squared error

Ridge Regression

> library(glmnet)
> X = model.matrix( SALES_PRICE ~ .-ID-STYLE+as.factor(STYLE), data=HOMES )
> cv.ridge = cv.glmnet( X, SALES_PRICE, alpha=0, lambda=1:0 )
> cv.ridge$lambda.min
[1] 1
> cv.ridge = cv.glmnet( X, SALES_PRICE, alpha=0, lambda=seq(0,10,0.01) )
> cv.ridge$lambda.min
[1] 0.76
> cv.ridge = cv.glmnet( X, SALES_PRICE, alpha=0, lambda=seq(0,1,0.001) )
> cv.ridge$lambda.min     # Optimal tuning parameter lambda. It is unstable: it depends on
                          # randomized CV, so your answer may be different (it has to be between 0 and 1)
> min(cv.ridge$cvm)       # Prediction MSE
> ridgereg = glmnet( X, SALES_PRICE, alpha=0, lambda=cv.ridge$lambda.min )    # Our best model
> predict( ridgereg, type="coefficients" )
[Output: coefficients for (Intercept), FINISHED_AREA, BEDROOMS, BATHROOMS, GARAGE_SIZE, YEAR_BUILT, LOT_SIZE, AIR_CONDITIONER, POOL, QUALITYLOW, QUALITYMEDIUM, HIGHWAY, and the two as.factor(STYLE) indicators; the values are not legible.]
> plot(cv.ridge$lambda, cv.ridge$cvm)

LASSO

> cv.lasso = cv.glmnet( X, SALES_PRICE, alpha=1 )
> cv.lasso$lambda.min
> cv.lasso = cv.glmnet( X, SALES_PRICE, alpha=1, lambda=seq(0,1,0.001) )
> cv.lasso$lambda.min     # Optimal tuning parameter lambda
[1] 0.79
> plot(cv.lasso$lambda, cv.lasso$cvm)
> min(cv.lasso$cvm)       # Prediction MSE
> lasso = glmnet( X, SALES_PRICE, alpha=1, lambda=cv.lasso$lambda.min )
> predict( lasso, type="coefficients" )       # Our best lasso model
[Output: coefficients for (Intercept), FINISHED_AREA, BEDROOMS, BATHROOMS, GARAGE_SIZE, YEAR_BUILT, LOT_SIZE, AIR_CONDITIONER, POOL, QUALITYLOW, QUALITYMEDIUM, HIGHWAY, and the two as.factor(STYLE) indicators; the values are not legible.]

PC Regression

> library(pls)
> pcreg = pcr( SALES_PRICE ~ .-ID-STYLE+as.factor(STYLE), data=HOMES, scale=TRUE, validation="CV" )
> summary(pcreg)
VALIDATION: RMSEP
Cross-validated using 10 random segments.
TRAINING: % variance explained
[Output: cumulative % variance explained for X and for SALES_PRICE with 1 to 13 components; the values are not legible.]
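The "X" row of the PCR summary is a cumulative share of variance, and the number of components needed to reach 80% can be read off it directly. The sketch below shows the same calculation with base R's prcomp. Because the home sales data are not reproduced here, it uses the built-in mtcars data purely as a stand-in (an assumption for illustration), with mpg playing the role of the response.

```r
# Stand-in data: mtcars ships with R; column 1 (mpg) is treated as the response,
# the remaining columns as the X-variables.
X <- mtcars[, -1]

# Principal components of the standardized (centered and rescaled) X-variables.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Share of the total variation of X carried by each component, and its running sum.
var_share <- pca$sdev^2 / sum(pca$sdev^2)
cum_share <- cumsum(var_share)

# Smallest number of components whose cumulative share reaches 80%.
k80 <- which(cum_share >= 0.80)[1]

print(round(100 * cum_share, 1))
print(k80)
```

For a numeric design matrix, the cumulative shares computed this way should correspond to the "X" row that summary() prints for a pcr fit with scale=TRUE; the share of variation in the response has to be taken from the response row of that summary, since it depends on the regression and not on X alone.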
More informationDiscrete Probability Distributions Chapter 6 Dr. Richard Jerz
Discrete Probability Distributions Chapter 6 Dr. Richard Jerz 1 GOALS Define the terms probability distribution and random variable. Distinguish between discrete and continuous probability distributions.
More informationCross-validation, ridge regression, and bootstrap
Cross-validation, ridge regression, and bootstrap > par(mfrow=c(2,2)) > head(ironslag) chemical magnetic 1 24 25 2 16 22 3 24 17 4 18 21 5 18 20 6 10 13 > attach(ironslag) > a=seq(min(chemical), max(chemical),
More informationSolution to Exercise E5.
Solution to Exercise E5. The Multiple Regression Model. Estimation. Exercise E5.1. Beach umbrella rental Part I. Simple Linear Regression Model. a. Regression model: U t = β 1 + β 2 T t + u t t = 1,...,
More informationMeasures of Dispersion (Range, standard deviation, standard error) Introduction
Measures of Dispersion (Range, standard deviation, standard error) Introduction We have already learnt that frequency distribution table gives a rough idea of the distribution of the variables in a sample
More informationSession 178 TS, Stats for Health Actuaries. Moderator: Ian G. Duncan, FSA, FCA, FCIA, FIA, MAAA. Presenter: Joan C. Barrett, FSA, MAAA
Session 178 TS, Stats for Health Actuaries Moderator: Ian G. Duncan, FSA, FCA, FCIA, FIA, MAAA Presenter: Joan C. Barrett, FSA, MAAA Session 178 Statistics for Health Actuaries October 14, 2015 Presented
More informationThe Least Squares Regression Line
The Least Squares Regression Line Section 5.3 Cathy Poliak, Ph.D. cathy@math.uh.edu Office hours: T Th 1:30 pm - 3:30 pm 620 PGH & 5:30 pm - 7:00 pm CASA Department of Mathematics University of Houston
More informationPARAMETRIC AND NON-PARAMETRIC BOOTSTRAP: A SIMULATION STUDY FOR A LINEAR REGRESSION WITH RESIDUALS FROM A MIXTURE OF LAPLACE DISTRIBUTIONS
PARAMETRIC AND NON-PARAMETRIC BOOTSTRAP: A SIMULATION STUDY FOR A LINEAR REGRESSION WITH RESIDUALS FROM A MIXTURE OF LAPLACE DISTRIBUTIONS Melfi Alrasheedi School of Business, King Faisal University, Saudi
More informationSTAT 509: Statistics for Engineers Dr. Dewei Wang. Copyright 2014 John Wiley & Sons, Inc. All rights reserved.
STAT 509: Statistics for Engineers Dr. Dewei Wang Applied Statistics and Probability for Engineers Sixth Edition Douglas C. Montgomery George C. Runger 7 Point CHAPTER OUTLINE 7-1 Point Estimation 7-2
More informationWeb Appendix. Are the effects of monetary policy shocks big or small? Olivier Coibion
Web Appendix Are the effects of monetary policy shocks big or small? Olivier Coibion Appendix 1: Description of the Model-Averaging Procedure This section describes the model-averaging procedure used in
More informationEE/AA 578 Univ. of Washington, Fall Homework 8
EE/AA 578 Univ. of Washington, Fall 2016 Homework 8 1. Multi-label SVM. The basic Support Vector Machine (SVM) described in the lecture (and textbook) is used for classification of data with two labels.
More informationEconometric Methods for Valuation Analysis
Econometric Methods for Valuation Analysis Margarita Genius Dept of Economics M. Genius (Univ. of Crete) Econometric Methods for Valuation Analysis Cagliari, 2017 1 / 26 Correlation Analysis Simple Regression
More informationBooth School of Business, University of Chicago Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay. Solutions to Midterm
Booth School of Business, University of Chicago Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay Solutions to Midterm Problem A: (34 pts) Answer briefly the following questions. Each question has
More informationFINALTERM EXAMINATION Spring 2009 MGT201- Financial Management (Session - 2) Question No: 1 ( Marks: 1 ) - Please choose one What is the long-run objective of financial management? Maximize earnings per
More informationThe relationship between GDP, labor force and health expenditure in European countries
Econometrics-Term paper The relationship between GDP, labor force and health expenditure in European countries Student: Nguyen Thu Ha Contents 1. Background:... 2 2. Discussion:... 2 3. Regression equation
More informationChapter 11 : Model checking and refinement An example: Blood-brain barrier study on rats
EXST3201 Chapter 11b Geaghan Fall 2005: Page 1 Chapter 11 : Model checking and refinement An example: Blood-brain barrier study on rats This study investigates the permeability of the blood-brain barrier
More informationPackage FADA. May 20, 2016
Type Package Package FADA May 20, 2016 Title Variable Selection for Supervised Classification in High Dimension Version 1.3.2 Date 2016-05-12 Author Emeline Perthame (INRIA, Grenoble, France), Chloe Friguet
More informationProblem Set 9 Heteroskedasticty Answers
Problem Set 9 Heteroskedasticty Answers /* INVESTIGATION OF HETEROSKEDASTICITY */ First graph data. u hetdat2. gra manuf gdp, s([country].) xlab ylab 300000 manufacturing output (US$ miilio 200000 100000
More information