Stat 401XV Exam 3 Spring 2017

I have neither given nor received unauthorized assistance on this exam.

Name Signed ______________________ Date ____________

Name Printed ______________________

ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will receive NO partial credit. Correct numerical answers to difficult questions unaccompanied by supporting reasoning may not receive full credit. SHOW YOUR WORK/EXPLAIN YOURSELF!

1. There are data on the UCI Machine Learning Repository concerning financial characteristics of some of the Forbes 500 companies in 1986. We will here concern ourselves with 73 of the cases and the 6 quantitative variables Assets, Sales, Market Value, Profits, Cash Flow, and Employees, and the factor variable Sector. (The last of these has 9 levels.) Beginning on Page 8 of this exam there is a printout of some analyses of these data, with particular emphasis on the modeling of Market Value as a function of the other variables.

(4 pts) a) After accounting for Sales, Cash Flow, and Employees, do the variables Assets and Profits add significantly to one's ability to predict or model Market Value? Give the value of an F statistic and associated degrees of freedom useful for assessing this. Say what you can (given the tables you have to work with) about a corresponding p-value.

b) Notice that in the model for Market Value that includes Sales, Cash Flow, and Employees, the fitted regression coefficient for Sales is negative. This is presumably counter-intuitive. Correlations between predictors are below. Explain how they help account for this seemingly strange outcome.

[Table: correlations among Sales, Cash_Flow, and Employees]
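The test in part a) is the usual extra-sum-of-squares (partial) F test comparing the reduced model (Sales, Cash_Flow, Employees; 69 residual df on the printout) with the full five-predictor model (67 residual df), so the statistic has 2 and 67 degrees of freedom. A minimal Python sketch of the arithmetic follows. The residual sums of squares in it are hypothetical placeholders, not printout values; in R the same comparison is what anova(reduced_fit, full_fit) reports for two nested lm() fits (the fit names here are illustrative).

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sum-of-squares F statistic for two nested linear models.

    sse_reduced, df_reduced: residual SS and residual df of the smaller model
    sse_full, df_full:       residual SS and residual df of the larger model
    Returns (F, numerator df, denominator df).
    """
    q = df_reduced - df_full              # number of coefficients being tested
    extra_ss = sse_reduced - sse_full     # SS picked up by the added terms
    f_stat = (extra_ss / q) / (sse_full / df_full)
    return f_stat, q, df_full


# Hypothetical residual sums of squares, for illustration only.
f_stat, df1, df2 = partial_f(sse_reduced=288.0, df_reduced=69,
                             sse_full=268.0, df_full=67)
print(f_stat, df1, df2)  # 2.5 2 67
```

The resulting F is then compared with F tables for 2 and 67 degrees of freedom to bracket the p-value.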

c) Below are a graphic from the leaps package and some cross-validation root mean squared prediction errors from caret. What do these suggest about an appropriate set of quantitative predictors for Market Value?

[Figure: leaps package plot of best subsets of the quantitative predictors]

Model Terms                                CVRMSPE
Employees                                  449
Cash, Employees                            4
Sales, Cash, Employees
Sales, Profits, Cash, Employees            55
Assets, Sales, Profits, Cash, Employees

d) There is some output on the printout from an lm() call that includes not only the quantitative predictors of Market Value but the factor variable Sector as well. All else (values of the quantitative predictors) being equal, which sector seems to have companies with the largest (per company) Market Values? Explain carefully. Remember that there are 9 sectors to consider.
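Part d) depends on how sum-to-zero coding works. With options(contrasts = rep("contr.sum", 2)), R prints effects only for the first 8 of the 9 sector levels (in factor-level order, alphabetical by default), and the effect of the ninth level is minus the sum of the other eight, so it has to be computed before picking the largest. A short Python sketch; the level names and estimates are hypothetical, not the Forbes sector values.

```python
def sum_to_zero_effects(printed, levels):
    """Recover all k level effects under sum-to-zero (contr.sum) coding.

    printed: the k - 1 coefficients R reports for levels 1..k-1
    levels:  all k level names, in factor-level order
    """
    if len(printed) != len(levels) - 1:
        raise ValueError("expected exactly one fewer coefficient than levels")
    effects = list(printed) + [-sum(printed)]  # last effect makes the total zero
    return dict(zip(levels, effects))


# Hypothetical 4-level factor, for illustration only.
effects = sum_to_zero_effects([120.0, -40.0, 25.0], ["A", "B", "C", "D"])
print(effects["D"], max(effects, key=effects.get))  # -105.0 A
```

When most of the printed effects are negative, the omitted level is the one most likely to come out largest.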

2. There is a famous Wisconsin Breast Cancer dataset on the UCI ML Repository. This dataset has N = 683 complete cases (16 have missing entries), each one describing k = 9 characteristics of a biopsied tumor that has been classified as either benign (444 cases) or malignant (239 cases). There is a printout beginning on Page 10 of this exam from an attempt to model the probability that a submitted biopsy is malignant on the basis of values of predictor variables (originally on 1-10 scales) x1 = Clump Thickness, x2 = Cell Size Uniformity, x3 = Cell Shape Uniformity, x4 = Marginal Adhesions, x5 = Single Epithelial Cell Size, x6 = Bare Nuclei, x7 = Bland Chromatin, x8 = Normal Nucleoli, and x9 = Mitoses. Use it to answer the questions on this page.

a) Which of the features x1 through x9 appears to be least important (in the presence of all others) in modeling the probability that a tumor is malignant? Explain.

b) For what linear relationship among the predictor variables x1 through x9 is the estimated probability that a submitted biopsy is malignant exactly .5? (Give values b0, b1, ..., b9, and c so that the relationship is b0 + b1x1 + b2x2 + ... + b9x9 = c.)
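Both parts reduce to simple arithmetic on the coefficient table. For a), the predictor whose Wald ratio z = estimate/SE is smallest in absolute value (largest p-value) contributes least given the others. For b), the logistic function 1/(1 + exp(-η)) equals .5 exactly when the linear predictor η = b0 + b1x1 + ... + b9x9 is 0, so the fitted coefficients can serve as b0, ..., b9 with c = 0. A Python sketch with hypothetical coefficients (not the WISC estimates):

```python
import math


def least_important(names, estimates, std_errors):
    """Predictor with the smallest |z| = |estimate / SE|, i.e. the largest Wald p-value."""
    abs_z = {n: abs(b / se) for n, b, se in zip(names, estimates, std_errors)}
    return min(abs_z, key=abs_z.get)


def malignancy_probability(coefs, x):
    """Logistic-model probability with coefs = [b0, b1, ..., bk] and x = [x1, ..., xk]."""
    eta = coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))  # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical two-predictor example.
print(least_important(["x1", "x2"], [0.50, 0.25], [0.10, 0.20]))  # x2
coefs = [-4.0, 0.5, 0.25]
print(malignancy_probability(coefs, [6.0, 4.0]))  # eta = -4 + 3 + 1 = 0, so 0.5
```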

3. An R dataset concerns an experiment on the pharmacokinetics of theophylline. Subjects were given oral doses of the drug and serum concentrations were measured over time. These can be analyzed using a two-compartment open pharmacokinetic model, which for a single subject (at dose 4.4) is

$$\text{conc} = \frac{4.4\,K_e K_a}{C\,(K_a - K_e)}\left(\exp(-K_e\,\text{time}) - \exp(-K_a\,\text{time})\right) + \varepsilon \qquad (*)$$

for model parameters Ke = the elimination rate constant, Ka = the absorption rate constant, and C = the clearance. A printout beginning on Page 10 summarizes an analysis of n = 11 data pairs (time, conc) for one subject. Use it to answer the following questions.

a) Suppose relationship (*) above holds for iid N(0, σ²) errors ε. Give approximate 95% two-sided confidence limits for σ.

b) What does the plot below suggest about the plausibility of the usual non-linear regression model (*) in the present context?

[Figure: residuals from the fitted model (*) plotted against time, with a horizontal reference line at 0]
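For part a), the usual approximate interval treats (n − 3)s²/σ² as chi-square with n − 3 = 8 degrees of freedom, where s is the residual standard error and 3 parameters (C, Ke, Ka) are estimated. This is exact only for linear models, hence "approximate" here. In the Python sketch below the chi-square quantiles are hard-coded from standard tables, and the value of s is a hypothetical placeholder.

```python
import math

# Chi-square quantiles for 8 df taken from standard tables (an assumption of this
# sketch): P(chi2_8 <= 2.180) = .025 and P(chi2_8 <= 17.535) = .975.
CHI2_8_025 = 2.180
CHI2_8_975 = 17.535


def sigma_limits(s, df=8, q_lower=CHI2_8_025, q_upper=CHI2_8_975):
    """Approximate 95% two-sided limits for sigma from residual standard error s.

    Uses df * s**2 / sigma**2 ~ chi-square(df), only approximately true for nls fits.
    """
    return s * math.sqrt(df / q_upper), s * math.sqrt(df / q_lower)


low, high = sigma_limits(1.0)  # hypothetical s = 1.0
print(round(low, 3), round(high, 3))  # 0.675 1.916
```

With the printed residual standard error in place of 1.0, the limits scale proportionally.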

4. There are old experimental data concerning noise passing through automotive exhaust systems at … The response variable was y = noise level (dB), for vehicles of 3 Sizes (1 = small, 2 = medium, and 3 = large), for silencers/filters of 2 Types (1 = standard silencer and 2 = Octel pollution filter), and observations on 2 Sides (1 = right and 2 = left) of the cars studied. Each combination of levels of factors was recorded m = 3 times. Various analyses of these data are on a printout beginning on Page 11. Use it to answer the following questions.

First, ignore the Sides variable and treat the data as if they are factorial data.

(8 pts) a) Make an interaction plot enhanced with error bars based on 95% confidence limits for combination mean noise. What are your "margins of error" for this plotting? (Give a number.)

+/- margin: ____________

b) Based on the plot above, which effects appear to be both statistically detectable and most important? (Consider Size and Type main effects and interactions. List an order of importance.)

c) The most basic goal of the original study was to establish that the Octel filter was at least as good as the standard silencer. Based on your plot and items on the printout, was this established? Explain.
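When Sides are ignored, each of the 3 × 2 = 6 Size/Type cells averages 2m = 6 observations, and the pooled standard deviation s_P is the residual standard error from the SIZE*TYPE fit on 30 df. Individual 95% limits for a cell mean are then ȳ ± t(30, .975) s_P/√6. A Python sketch, with the t quantile hard-coded from standard tables and a hypothetical s_P:

```python
import math

T_30_975 = 2.042  # upper 2.5% point of the t distribution with 30 df (standard tables)


def margin_of_error(s_pooled, n_per_cell=6, t_quantile=T_30_975):
    """Half-width of 95% two-sided limits for one cell mean: t * s_pooled / sqrt(n)."""
    return t_quantile * s_pooled / math.sqrt(n_per_cell)


print(round(margin_of_error(math.sqrt(6.0)), 3))  # hypothetical s_pooled; prints 2.042
```

Because every cell has the same sample size, a single margin applies to all six error bars.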

Now consider the full 3-factor structure of the dataset.

d) Are "effects" of SIDE on NOISE detectable? Explain what on the printout supports your judgment.

e) What are the value and degrees of freedom for an F test of the hypothesis that all effects involving SIDE are 0?

f) What is the effect on perceived "experimental error" when one includes the factor SIDE in the modeling of NOISE? Refer to appropriate values on the printout and explain why what you see makes sense.

g) Using the basic "L and L̂ ideas," give 95% two-sided confidence limits for the difference between right and left side mean noise levels for large vehicles using the Octel filter.
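For part g), the target is L = μ(large, Octel, right) − μ(large, Octel, left), estimated by L̂, the difference of the two sample cell means. Each of those cells has m = 3 observations and s_P comes from the full SIZE*TYPE*SIDE fit with 24 residual df, so the limits are L̂ ± t(24, .975) s_P √(1/3 + 1/3). A minimal Python sketch; the t quantile is hard-coded from standard tables, and the means and s_P are hypothetical placeholders rather than printout values.

```python
import math

T_24_975 = 2.064  # upper 2.5% point of the t distribution with 24 df (standard tables)


def contrast_limits(means, coefs, n_per_cell, s_pooled, t_quantile=T_24_975):
    """95% limits for L = sum(c_i * mu_i), using L-hat = sum(c_i * ybar_i)
    and SE(L-hat) = s_pooled * sqrt(sum(c_i**2 / n_i))."""
    l_hat = sum(c * y for c, y in zip(coefs, means))
    se = s_pooled * math.sqrt(sum(c * c / n for c, n in zip(coefs, n_per_cell)))
    return l_hat - t_quantile * se, l_hat + t_quantile * se


# Hypothetical right/left cell means and pooled s, for illustration only.
low, high = contrast_limits(means=[812.0, 808.0], coefs=[1, -1],
                            n_per_cell=[3, 3], s_pooled=1.5)
print(round(low, 3), round(high, 3))
```

The same function handles any other linear combination of cell means by changing the coefficient list.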

R Code and Output for the Forbes 500 Company Data

> summary(Companies)
Assets Sales Market_Value Profits Cash_Flow
Min. : Min. : 76 Min. : 5 Min. : Min. :-54.
1st Qu.: 1st Qu.: 706 1st Qu.: 478 1st Qu.: 7.80 1st Qu.: 7.5
Median : 548 Median : 679 Median : 88 Median : 67.0 Median : 0.4
Mean : 7 Mean : 04 Mean :5 Mean : 96. Mean : 7.8
3rd Qu.: 5074 3rd Qu.: 45 3rd Qu.:89 3rd Qu.: 3rd Qu.: 0.
Max. :4045 Max. :74 Max. :946 Max. : Max. :46.0
Employees sector
Min. : 0.60 Energy :5
1st Qu.:.80 Finance :4
Median :.60 Manufacturing:0
Mean : 8.86 Retail :0
3rd Qu.: 7.50 Other : 7
Max. :84.80 HiTech : 6
(Other) :

> summary(lm(Market_Value~Assets+Sales+Profits+Cash_Flow+
+ Employees,data=Companies))

Call:
lm(formula = Market_Value ~ Assets + Sales + Profits + Cash_Flow + Employees, data = Companies)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|) (Intercept) Assets Sales e-05 *** Profits Cash_Flow * e-05 *** Employees e-0 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 05 on 67 degrees of freedom
Multiple R-squared: 0.766, Adjusted R-squared:
F-statistic:.89 on 5 and 67 DF, p-value: < 2.2e-16

> anova(lm(Market_Value~Assets+Sales+Profits+Cash_Flow+
+ Employees,data=Companies))
Analysis of Variance Table

Response: Market_Value
Df Sum Sq Mean Sq F value Pr(>F) Assets e-0 *** Sales e-07 *** Profits e-05 *** Cash_Flow Employees e-0 *** Residuals
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(lm(Market_Value~Sales+Cash_Flow+Employees,data=Companies))

Call:
lm(formula = Market_Value ~ Sales + Cash_Flow + Employees, data = Companies)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|) (Intercept) Sales Cash_Flow ** e-09 *** Employees e-08 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 088 on 69 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 45.5 on 3 and 69 DF, p-value:.46e-6

> anova(lm(Market_Value~Sales+Cash_Flow+Employees,data=Companies))
Analysis of Variance Table

Response: Market_Value
Df Sum Sq Mean Sq F value Pr(>F) Sales e- *** Cash_Flow e-06 *** Employees e-08 *** Residuals
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> options(contrasts = rep("contr.sum", 2))
> summary(lm(Market_Value~Assets+Sales+Profits+Cash_Flow+
+ Employees+sector,data=Companies))

Call:
lm(formula = Market_Value ~ Assets + Sales + Profits + Cash_Flow + Employees + sector, data = Companies)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|) (Intercept) Assets * Sales *** Profits Cash_Flow ** Employees e-09 *** sector1 sector2 sector3 sector4 e-05 *** sector5 sector6 sector7 sector8
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 86. on 59 degrees of freedom
Multiple R-squared: 0.80, Adjusted R-squared:
F-statistic: 0.7 on 13 and 59 DF, p-value: < 2.2e-16

R Code and Output for the Wisconsin Cancer Data

> model <- glm(y~.,family=binomial(link='logit'),data=WISC)
> summary(model)

Call:
glm(formula = y ~ ., family = binomial(link = "logit"), data = WISC)

Deviance Residuals:
Min 1Q Median 3Q 0.0 Max .4698

Coefficients:
Estimate Std. Error z value Pr(>|z|) (Intercept) < 2e-16 *** x1 *** x2 x3 x4 ** x5 x6 e-05 *** x7 ** x8 x9
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: on 682 degrees of freedom
Residual deviance: 0.89 on 673 degrees of freedom
AIC:.89

Number of Fisher Scoring iterations: 8

R Code and Output for the Theophylline Data

> cbind(theoph.4$Time,theoph.4$conc)
      [,1] [,2]
 [1,]
 [2,]
 [3,]
 [4,]
 [5,] . 8.8
 [6,]
 [7,]
 [8,]
 [9,]
[10,]
[11,]
> Conc.out<-nls(conc~4.4*(Ke*Ka/C)*(exp(-Ke*Time)-exp(-Ka*Time))/(Ka-Ke),
+ data=theoph.4,start=c(C=.04,Ke=.09,Ka=.),trace=T)
: : : : : : : :

> summary(Conc.out)

Formula: conc ~ 4.4 * (Ke * Ka/C) * (exp(-Ke * Time) - exp(-Ka * Time))/(Ka - Ke)

Parameters:
Estimate Std. Error t value Pr(>|t|) C *** Ke ** Ka **
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 8 degrees of freedom

Number of iterations to convergence: 7
Achieved convergence tolerance: 6.0e-06

> plot(theoph.4$Time,residuals(Conc.out),ce=,pch=9,xlab="time",ylab="residual")
> abline(a=0,b=0)

R Code and Output for the Exhaust Noise Data

> summary(Noise)
NOISE SIZE TYPE SIDE
Min. :760.0 1:12 1:18 1:18
1st Qu.:78.5 2:12 2:18 2:18
Median :80.0 3:12
Mean :80.
3rd Qu.:87.5
Max. :855.0

> options(contrasts = rep("contr.sum", 2))
> summary(lm(NOISE~SIZE*TYPE,data=Noise))

Call:
lm(formula = NOISE ~ SIZE * TYPE, data = Noise)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|) (Intercept) < 2e-16 *** SIZE1 e-08 *** SIZE2 e- *** TYPE1 *** SIZE1:TYPE1 SIZE2:TYPE1 **
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 30 degrees of freedom
Multiple R-squared: 0.94, Adjusted R-squared: 0.94
F-statistic: 85.4 on 5 and 30 DF, p-value: < 2.2e-16

> aggregate(Noise$NOISE,by=list(Noise$SIZE,Noise$TYPE),mean)
Group.1 Group.2 x

> aggregate(Noise$NOISE,by=list(Noise$SIZE,Noise$TYPE),sd)
Group.1 Group.2 x
> aggregate(Noise$NOISE,by=list(Noise$SIZE,Noise$TYPE,Noise$SIDE),mean)
Group.1 Group.2 Group.3 x
> aggregate(Noise$NOISE,by=list(Noise$SIZE,Noise$TYPE,Noise$SIDE),sd)
Group.1 Group.2 Group.3 x
> summary(lm(NOISE~SIZE*TYPE*SIDE,data=Noise))

Call:
lm(formula = NOISE ~ SIZE * TYPE * SIDE, data = Noise)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|) (Intercept) < 2e-16 *** SIZE1 e-4 *** SIZE2 < 2e-16 *** TYPE1 e-08 *** SIDE1 SIZE1:TYPE1 *** SIZE2:TYPE1 e-07 *** SIZE1:SIDE1 e-07 *** SIZE2:SIDE1 * TYPE1:SIDE1 SIZE1:TYPE1:SIDE1 ** SIZE2:TYPE1:SIDE1
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:.89 on 24 degrees of freedom
Multiple R-squared: 0.988, Adjusted R-squared:
F-statistic: 84 on 11 and 24 DF, p-value: < 2.2e-16


BIOS 4120: Introduction to Biostatistics Breheny. Lab #7. I. Binomial Distribution. RCode: dbinom(x, size, prob) binom.test(x, n, p = 0. BIOS 4120: Introduction to Biostatistics Breheny Lab #7 I. Binomial Distribution P(X = k) = ( n k )pk (1 p) n k RCode: dbinom(x, size, prob) binom.test(x, n, p = 0.5) P(X < K) = P(X = 0) + P(X = 1) + +

More information

Mixed models in R using the lme4 package Part 3: Inference based on profiled deviance

Mixed models in R using the lme4 package Part 3: Inference based on profiled deviance Mixed models in R using the lme4 package Part 3: Inference based on profiled deviance Douglas Bates Department of Statistics University of Wisconsin - Madison Madison January 11, 2011

More information

Quantitative Techniques Term 2

Quantitative Techniques Term 2 Quantitative Techniques Term 2 Laboratory 7 2 March 2006 Overview The objective of this lab is to: Estimate a cost function for a panel of firms; Calculate returns to scale; Introduce the command cluster

More information

PASS Sample Size Software

PASS Sample Size Software Chapter 850 Introduction Cox proportional hazards regression models the relationship between the hazard function λ( t X ) time and k covariates using the following formula λ log λ ( t X ) ( t) 0 = β1 X1

More information

State Ownership at the Oslo Stock Exchange

State Ownership at the Oslo Stock Exchange State Ownership at the Oslo Stock Exchange Bernt Arne Ødegaard 1 Introduction We ask whether there is a state rebate on companies listed on the Oslo Stock Exchange, i.e. whether companies where the state

More information

Lecture 1: Empirical Properties of Returns

Lecture 1: Empirical Properties of Returns Lecture 1: Empirical Properties of Returns Econ 589 Eric Zivot Spring 2011 Updated: March 29, 2011 Daily CC Returns on MSFT -0.3 r(t) -0.2-0.1 0.1 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996

More information

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE

AP STATISTICS FALL SEMESTSER FINAL EXAM STUDY GUIDE AP STATISTICS Name: FALL SEMESTSER FINAL EXAM STUDY GUIDE Period: *Go over Vocabulary Notecards! *This is not a comprehensive review you still should look over your past notes, homework/practice, Quizzes,

More information

BEcon Program, Faculty of Economics, Chulalongkorn University Page 1/7

BEcon Program, Faculty of Economics, Chulalongkorn University Page 1/7 Mid-term Exam (November 25, 2005, 0900-1200hr) Instructions: a) Textbooks, lecture notes and calculators are allowed. b) Each must work alone. Cheating will not be tolerated. c) Attempt all the tests.

More information

Graduate School of Business, University of Chicago Business 41202, Spring Quarter 2007, Mr. Ruey S. Tsay. Final Exam

Graduate School of Business, University of Chicago Business 41202, Spring Quarter 2007, Mr. Ruey S. Tsay. Final Exam Graduate School of Business, University of Chicago Business 41202, Spring Quarter 2007, Mr. Ruey S. Tsay Final Exam GSB Honor Code: I pledge my honor that I have not violated the Honor Code during this

More information

7. For the table that follows, answer the following questions: x y 1-1/4 2-1/2 3-3/4 4

7. For the table that follows, answer the following questions: x y 1-1/4 2-1/2 3-3/4 4 7. For the table that follows, answer the following questions: x y 1-1/4 2-1/2 3-3/4 4 - Would the correlation between x and y in the table above be positive or negative? The correlation is negative. -

More information

Booth School of Business, University of Chicago Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay. Midterm

Booth School of Business, University of Chicago Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay. Midterm Booth School of Business, University of Chicago Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay Midterm ChicagoBooth Honor Code: I pledge my honor that I have not violated the Honor Code during this

More information

DATA SUMMARIZATION AND VISUALIZATION

DATA SUMMARIZATION AND VISUALIZATION APPENDIX DATA SUMMARIZATION AND VISUALIZATION PART 1 SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS 294 PART 2 PART 3 PART 4 VISUALIZATION: GRAPHS AND TABLES FOR SUMMARIZING AND ORGANIZING DATA 296

More information

SFSU FIN822 Project 1

SFSU FIN822 Project 1 SFSU FIN822 Project 1 This project can be done in a team of up to 3 people. Your project report must be accompanied by printouts of programming outputs. You could use any software to solve the problems.

More information

Booth School of Business, University of Chicago Business 41202, Spring Quarter 2013, Mr. Ruey S. Tsay. Midterm

Booth School of Business, University of Chicago Business 41202, Spring Quarter 2013, Mr. Ruey S. Tsay. Midterm Booth School of Business, University of Chicago Business 41202, Spring Quarter 2013, Mr. Ruey S. Tsay Midterm ChicagoBooth Honor Code: I pledge my honor that I have not violated the Honor Code during this

More information

Lecture Note: Analysis of Financial Time Series Spring 2008, Ruey S. Tsay. Seasonal Time Series: TS with periodic patterns and useful in

Lecture Note: Analysis of Financial Time Series Spring 2008, Ruey S. Tsay. Seasonal Time Series: TS with periodic patterns and useful in Lecture Note: Analysis of Financial Time Series Spring 2008, Ruey S. Tsay Seasonal Time Series: TS with periodic patterns and useful in predicting quarterly earnings pricing weather-related derivatives

More information

Final Exam, section 1. Thursday, May hour, 30 minutes

Final Exam, section 1. Thursday, May hour, 30 minutes San Francisco State University Michael Bar ECON 312 Spring 2018 Final Exam, section 1 Thursday, May 17 1 hour, 30 minutes Name: Instructions 1. This is closed book, closed notes exam. 2. You can use one

More information

The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2011, Mr. Ruey S. Tsay. Final Exam

The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2011, Mr. Ruey S. Tsay. Final Exam The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2011, Mr. Ruey S. Tsay Final Exam Booth Honor Code: I pledge my honor that I have not violated the Honor Code during this

More information

Econometric Methods for Valuation Analysis

Econometric Methods for Valuation Analysis Econometric Methods for Valuation Analysis Margarita Genius Dept of Economics M. Genius (Univ. of Crete) Econometric Methods for Valuation Analysis Cagliari, 2017 1 / 26 Correlation Analysis Simple Regression

More information

SAS Simple Linear Regression Example

SAS Simple Linear Regression Example SAS Simple Linear Regression Example This handout gives examples of how to use SAS to generate a simple linear regression plot, check the correlation between two variables, fit a simple linear regression

More information

Assessing Model Stability Using Recursive Estimation and Recursive Residuals

Assessing Model Stability Using Recursive Estimation and Recursive Residuals Assessing Model Stability Using Recursive Estimation and Recursive Residuals Our forecasting procedure cannot be expected to produce good forecasts if the forecasting model that we constructed was stable

More information

STATISTICAL DISTRIBUTIONS AND THE CALCULATOR

STATISTICAL DISTRIBUTIONS AND THE CALCULATOR STATISTICAL DISTRIBUTIONS AND THE CALCULATOR 1. Basic data sets a. Measures of Center - Mean ( ): average of all values. Characteristic: non-resistant is affected by skew and outliers. - Median: Either

More information

3. The distinction between variable costs and fixed costs is:

3. The distinction between variable costs and fixed costs is: Practice Exam # 2 Dr. Bailey ACCT3310, Spring 2014, Chapters 4, 5, & 6 There are 25 questions, each worth 4 points. Please see my earlier advice on the appropriate use of this exam. Its purpose is to give

More information

Non-Inferiority Tests for Two Means in a 2x2 Cross-Over Design using Differences

Non-Inferiority Tests for Two Means in a 2x2 Cross-Over Design using Differences Chapter 510 Non-Inferiority Tests for Two Means in a 2x2 Cross-Over Design using Differences Introduction This procedure computes power and sample size for non-inferiority tests in 2x2 cross-over designs

More information

Homework Assignment Section 3

Homework Assignment Section 3 Homework Assignment Section 3 Tengyuan Liang Business Statistics Booth School of Business Problem 1 A company sets different prices for a particular stereo system in eight different regions of the country.

More information

Stat3011: Solution of Midterm Exam One

Stat3011: Solution of Midterm Exam One 1 Stat3011: Solution of Midterm Exam One Fall/2003, Tiefeng Jiang Name: Problem 1 (30 points). Choose one appropriate answer in each of the following questions. 1. (B ) The mean age of five people in a

More information

Regression Model Assumptions Solutions

Regression Model Assumptions Solutions Regression Model Assumptions Solutions Below are the solutions to these exercises on model diagnostics using residual plots. # Exercise 1 # data("cars") head(cars) speed dist 1 4 2 2 4 10 3 7 4 4 7 22

More information

Booth School of Business, University of Chicago Business 41202, Spring Quarter 2016, Mr. Ruey S. Tsay. Midterm

Booth School of Business, University of Chicago Business 41202, Spring Quarter 2016, Mr. Ruey S. Tsay. Midterm Booth School of Business, University of Chicago Business 41202, Spring Quarter 2016, Mr. Ruey S. Tsay Midterm ChicagoBooth Honor Code: I pledge my honor that I have not violated the Honor Code during this

More information

Demonstrate Approval of Loans by a Bank

Demonstrate Approval of Loans by a Bank 1 Running head: The Data Consists of 100 Cases of Hypothetical Data to Demonstrate Approval of Loans by a Bank Name Course Subject 2 Introduction There has been witnessed an alarming trend in the number

More information

Tests for Two ROC Curves

Tests for Two ROC Curves Chapter 65 Tests for Two ROC Curves Introduction Receiver operating characteristic (ROC) curves are used to summarize the accuracy of diagnostic tests. The technique is used when a criterion variable is

More information

Influence of Personal Factors on Health Insurance Purchase Decision

Influence of Personal Factors on Health Insurance Purchase Decision Influence of Personal Factors on Health Insurance Purchase Decision INFLUENCE OF PERSONAL FACTORS ON HEALTH INSURANCE PURCHASE DECISION The decision in health insurance purchase include decisions about

More information

Lecture 21: Logit Models for Multinomial Responses Continued

Lecture 21: Logit Models for Multinomial Responses Continued Lecture 21: Logit Models for Multinomial Responses Continued Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University

More information

Parameter Estimation

Parameter Estimation Parameter Estimation Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison April 12, 2007 Statistics 572 (Spring 2007) Parameter Estimation April 12, 2007 1 / 14 Continue

More information

The SAS System 11:03 Monday, November 11,

The SAS System 11:03 Monday, November 11, The SAS System 11:3 Monday, November 11, 213 1 The CONTENTS Procedure Data Set Name BIO.AUTO_PREMIUMS Observations 5 Member Type DATA Variables 3 Engine V9 Indexes Created Monday, November 11, 213 11:4:19

More information

Lapse Modeling for the Post-Level Period

Lapse Modeling for the Post-Level Period Lapse Modeling for the Post-Level Period A Practical Application of Predictive Modeling JANUARY 2015 SPONSORED BY Committee on Finance Research PREPARED BY Richard Xu, FSA, Ph.D. Dihui Lai, Ph.D. Minyu

More information

WEB APPENDIX 8A 7.1 ( 8.9)

WEB APPENDIX 8A 7.1 ( 8.9) WEB APPENDIX 8A CALCULATING BETA COEFFICIENTS The CAPM is an ex ante model, which means that all of the variables represent before-the-fact expected values. In particular, the beta coefficient used in

More information

STAT758. Final Project. Time series analysis of daily exchange rate between the British Pound and the. US dollar (GBP/USD)

STAT758. Final Project. Time series analysis of daily exchange rate between the British Pound and the. US dollar (GBP/USD) STAT758 Final Project Time series analysis of daily exchange rate between the British Pound and the US dollar (GBP/USD) Theophilus Djanie and Harry Dick Thompson UNR May 14, 2012 INTRODUCTION Time Series

More information