Classification and Regression Trees

Size: px
Start display at page:

Download "Classification and Regression Trees"

Transcription

1 Classification and Regression Trees In unsupervised classification (clustering), there is no response variable ( dependent variable), the regions corresponding to a given node are based on a similarity of the observations to each other. In classification and regression trees, the region at each node is based on some similarity of the response variables to each other. Classification and regression trees are formed divisively, based on a response variable. If a node has more than one group, we may divide the node into multiple nodes that are more pure. 1

2 Classification and Regression Trees We will use the same notation as we used for clustering trees. Each node in the tree corresponds to some specific region X m of the feature space (space of independent variables). We will also occasionally abuse the notation slightly to use X m as the set of indexes i such that x i X m. The branches from a node are defined in terms of rules to split the feature space. 2

3 Classification and Regression Trees If the response variable is a group category, a classification tree is formed. Each node in a classification tree corresponds to the predominant value of the response within the subdomain of the features corresponding to that node. If the response is a numeric variable, a regression tree is formed. Each node in a regression tree corresponds to the average of the response within a given region, rather than to a predominant value of the response. 3

4 Impurity of Nodes of Trees Nodes are split based on their impurity. Impurity is a measure of how badly the observations at a given node fit the model. In a regression tree, for example, the impurity may be measured by the residual sum of squares within that node. In a classification tree, there are various ways of measuring the impurity, such as the misclassification error, the Gini index, and the entropy. 4

5 Deviance The term deviance is used in various ways in statistics. In general, it is a measure of variability that is not accounted for by the fitted model. It is usually not scaled either to account for the number of observations or to account for their magnitude; thus, a larger set of observations will usually have a larger deviance than a smaller set in the same situation, and likewise, data with larger values will usually have a larger deviance than the same data if measured on a larger scale. The fact that a deviance is not scaled for the number of observations yields an additive property for the nodes in a tree. 5

6 Regression Trees Our general model for regression (in one form) has been y i = β 0 + x T i β + ɛ i, where ɛ i is assumed to be a random variable with E(ɛ i ) = 0. This yields an expression for y i conditional on the corresponding x i ; hence we may write the left hand side as y i x i. The model for a regression tree is of the form y i x i = µ m + ɛ i, where m is determined by index of the region X m of the feature space such that x i X m, µ m = E(Y x i X m ), and ɛ i is assumed to be a random variable with E(ɛ i ) = 0 as before. At node m the model fitted is just ŷ i = 1 n m i X m y i. 6

7 Measure of Impurity in Regression Trees The obvious unscaled measure of impurity of any node in a regression tree is the residual sum of squares, RSS. An node m, it is just i X m (y i ŷ i ) 2. This is called the deviance. 7

8 Example set.seed(5) n <- 20 n1 <- 12 n2 <- n-n1 exreg <- data.frame(cbind(x1=c(rnorm(n1),rnorm(n2)+1.0), x2=c(rnorm(n1),rnorm(n2)+1.5), y=(c(1+0.25*rnorm(n1),2+0.25*rnorm(n2)))) exreg attach(exreg) plot(x1,x2,main="population Means of Responses") text(x1[1:n1]+.05,x2[1:n1], 1 ) text(x1[(n1+1):n]+.05,x2[(n1+1):n], 2 ) 8

9 The Data for the Regression Tree x1 x2 y

10 The Data for the Regression Tree Population Means of Responses 2 x x1 10

11 R on the Regression Tree Example library(tree) regtree <- tree(y~x1+x2) This produces node), split, n, deviance, yval * denotes terminal node 1) root ) x1 < ) x2 < * 5) x2 > * 3) x1 > ) x2 < * 7) x2 > * 11

12 Prediction in the Regression Tree Example Now, let s classify some new observations using the regression tree that R computed. newdata <- data.frame(cbind(x1=c(1,-1),x2=c(3,0))) predict.tree(regtree, newdata) This produces This is based on the average value of the response within each region. 12

13 Classification Trees in R 13

14 Example set.seed(5) n <- 20 n1 <- 12 n2 <- n-n1 exclass <- data.frame(cbind(x1=c(rnorm(n1),rnorm(n2)+1.0), x2=c(rnorm(n1),rnorm(n2)+1.5), y=c(rep(1,n1),rep(2,n2)))) attach(exclass) plot(x1,x2,col=y) 14

15 x x1 15

16 Classification Trees A classification tree is formed by recursively dividing up the space of the data. A simple procedure is to choose one feature at a time and make a split at a particular value of that feature. Let s just do this by eyeball. 16

17 Classification Trees x x1 17

18 Pruning Trees x x x x1 18

19 Nodes in Classification Trees For any node m, we consider the proportion of observations that we have assigned to class j (that is, estimated to be in class j). We denote the region of the feature space representing node m as R m, and the number of observations at that node as n m. Initially, of course, R 1 is the full space of the features in all observations and n 1 = n. Define ˆp mj = 1 n m x i R m I(y i = j). 19

20 Nodes in Classification Trees The class of a node is j(m) = argmax j ˆp mj. This is not well-defined if there is no unique maximum. The most common way of dealing with this is to may a random (or arbitrary) choice. In the case of only two classes with numeric labels, another way of assigning a class to a node is to take the average value of the class labels. Let s identify these in the figure on the previous slide. (The numbering of the nodes is arbitrary, but it should be done systematically.) 20

21 Impurity of Nodes in Classification Trees Nodes are split based on their impurity. A pure node has only one class, and obviously would not be split. There are various measures of impurity. Misclassification error: 1 n m i R m I(y i j(m)) = 1 ˆp mj(m). (Note that we also use R m to denote the set of indices of observations at node m.) Gini index: ˆp mjˆp m j = j j k ˆp mj (1 ˆp mj ) j=1 Cross-entropy: ˆp mj log(ˆp mj ) j ˆp mj >0 21

22 Impurity of Nodes in Classification Trees Let s identify these in the figure on the earlier slide. Let s number the nodes so that 2 bottom part of graph 3 top part of graph 4 bottom left part of graph 5 bottom right part of graph Notice that for node 5, which has 6 observations, half and half, the misclassification error requires us to choose a class for the node. There is no unique argmax j ˆp mj. The misclassification error is invariant to our choice, however. Note that if we try to use some kind of average class value, the misclassification error would need to be defined differently. 22

23 Impurity of Nodes in Classification Trees For example in node 5, which has 6 observations, half and half, we get Misclassification error: 0.5 Gini index: 0.5 Cross-entropy:

24 Classification and Regression Trees in R There are some R packages that provide functions for classification trees and regression trees. The R package tree was the first and probably still the most common one. The main function is tree. This function has several options. Other functions are tree.control, predict.tree, and prune.tree. The arguments of tree.control, such as minsize can also be included in the invocation of tree. 24

25 Classification and Regression Trees in R Another R package is rpart. It is probably the best one. The main function is rpart. This function has several options. The technical report by Therneau and Atkinson (2011, and subsequent dates) remains the best documentation for rpart. 25

26 Methods in rpart The R function rpart allows different splitting criteria, which can be specified in the argument method. For a regression tree, the obvious criterion is the reduction in sum of squares. Since this is the idea in analysis of variance, this method is called anova. This method is the default unless the response is a factor. For a classification tree, one common criterion is the Gini index. Use of the Gini index is specified by the method called class. This method is the default when the response is a factor. 26

27 Data Frames in R Over the years, as data frames have been developed in R, I have been more aggravated by the non-intuitive aspects of the structure and by its limited uses in R functions than I been pleased with its usefulness. In the data frame of my example, even if we use y=factor(c(rep(1,n1),rep(2,n2))) or y=as.factor(c(rep(1,n1),rep(2,n2))), y is not of mode factor. In case, yy=factor(example$y) yields a variable of mode factor. Of course, we could put this variable in the data frame. 27

28 Classification Trees in R Rather than fool with the vagaries required to coerce the variable in the data frame to be of mode factor, I prefer to do the coercion at the point of usage. The advantage of this is that nothing is hidden in the code. Hiding properties in an object is great, unless you want to be sure of what is being done. library(tree) attach(exclass) classtree <- tree(as.factor(y)~x1+x2) 28

29 Classification Trees in R This produces node), split, n, deviance, yval, (yprob) * denotes terminal node 1) root ( ) 2) x1 < ( ) * 3) x1 > ( ) 6) x2 < ( ) * 7) x2 > ( ) * Which is better than we got by our crude eye method. 29

30 Classification Trees in R Let s plot it: minx1 <-min(x1) maxx1 <-max(x1) minx2 <-min(x2) maxx2 <-max(x2) plot(x1,x2,col=y) lines(c( , ),c(minx2,maxx2)) lines(c( ,maxx2),c( , )) 30

31 Example: Classification Trees x x1 31

32 Prediction in the Classification Tree Example Now, let s classify some new observations using the classification tree that R computed. newdata <- data.frame(cbind(x1=c(1,-1),x2=c(3,0))) predict.tree(classtree, newdata) This produces [,1] [,2] This is based on the proportion of the classes within each region. 32

Lecture 9: Classification and Regression Trees

Lecture 9: Classification and Regression Trees Lecture 9: Classification and Regression Trees Advanced Applied Multivariate Analysis STAT 2221, Spring 2015 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department of Mathematical

More information

Decision Trees An Early Classifier

Decision Trees An Early Classifier An Early Classifier Jason Corso SUNY at Buffalo January 19, 2012 J. Corso (SUNY at Buffalo) Trees January 19, 2012 1 / 33 Introduction to Non-Metric Methods Introduction to Non-Metric Methods We cover

More information

ECS171: Machine Learning

ECS171: Machine Learning ECS171: Machine Learning Lecture 15: Tree-based Algorithms Cho-Jui Hsieh UC Davis March 7, 2018 Outline Decision Tree Random Forest Gradient Boosted Decision Tree (GBDT) Decision Tree Each node checks

More information

Pattern Recognition Chapter 5: Decision Trees

Pattern Recognition Chapter 5: Decision Trees Pattern Recognition Chapter 5: Decision Trees Asst. Prof. Dr. Chumphol Bunkhumpornpat Department of Computer Science Faculty of Science Chiang Mai University Learning Objectives How decision trees are

More information

Using Random Forests in conintegrated pairs trading

Using Random Forests in conintegrated pairs trading Using Random Forests in conintegrated pairs trading By: Reimer Meulenbeek Supervisor Radboud University: Prof. dr. E.A. Cator Supervisors FRIJT BV: Dr. O. de Mirleau Drs. M. Meuwissen November 5, 2017

More information

(iii) Under equal cluster sampling, show that ( ) notations. (d) Attempt any four of the following:

(iii) Under equal cluster sampling, show that ( ) notations. (d) Attempt any four of the following: Central University of Rajasthan Department of Statistics M.Sc./M.A. Statistics (Actuarial)-IV Semester End of Semester Examination, May-2012 MSTA 401: Sampling Techniques and Econometric Methods Max. Marks:

More information

The Multistep Binomial Model

The Multistep Binomial Model Lecture 10 The Multistep Binomial Model Reminder: Mid Term Test Friday 9th March - 12pm Examples Sheet 1 4 (not qu 3 or qu 5 on sheet 4) Lectures 1-9 10.1 A Discrete Model for Stock Price Reminder: The

More information

MLLunsford 1. Activity: Central Limit Theorem Theory and Computations

MLLunsford 1. Activity: Central Limit Theorem Theory and Computations MLLunsford 1 Activity: Central Limit Theorem Theory and Computations Concepts: The Central Limit Theorem; computations using the Central Limit Theorem. Prerequisites: The student should be familiar with

More information

Test #1 (Solution Key)

Test #1 (Solution Key) STAT 47/67 Test #1 (Solution Key) 1. (To be done by hand) Exploring his own drink-and-drive habits, a student recalls the last 7 parties that he attended. He records the number of cans of beer he drank,

More information

Interpolation. 1 What is interpolation? 2 Why are we interested in this?

Interpolation. 1 What is interpolation? 2 Why are we interested in this? Interpolation 1 What is interpolation? For a certain function f (x we know only the values y 1 = f (x 1,,y n = f (x n For a point x different from x 1,,x n we would then like to approximate f ( x using

More information

Tree Diagram. Splitting Criterion. Splitting Criterion. Introduction. Building a Decision Tree. MS4424 Data Mining & Modelling Decision Tree

Tree Diagram. Splitting Criterion. Splitting Criterion. Introduction. Building a Decision Tree. MS4424 Data Mining & Modelling Decision Tree Introduction MS4424 Data Mining & Modelling Decision Tree Lecturer : Dr Iris Yeung Room No : P7509 Tel No : 2788 8566 Email : msiris@cityu.edu.hk decision tree is a set of rules represented in a tree structure

More information

Modeling and Forecasting Customer Behavior for Revolving Credit Facilities

Modeling and Forecasting Customer Behavior for Revolving Credit Facilities Modeling and Forecasting Customer Behavior for Revolving Credit Facilities Radoslava Mirkov 1, Holger Thomae 1, Michael Feist 2, Thomas Maul 1, Gordon Gillespie 1, Bastian Lie 1 1 TriSolutions GmbH, Hamburg,

More information

Multiple regression - a brief introduction

Multiple regression - a brief introduction Multiple regression - a brief introduction Multiple regression is an extension to regular (simple) regression. Instead of one X, we now have several. Suppose, for example, that you are trying to predict

More information

Chapter 16. Random Variables. Copyright 2010 Pearson Education, Inc.

Chapter 16. Random Variables. Copyright 2010 Pearson Education, Inc. Chapter 16 Random Variables Copyright 2010 Pearson Education, Inc. Expected Value: Center A random variable assumes a value based on the outcome of a random event. We use a capital letter, like X, to denote

More information

Session 5. Predictive Modeling in Life Insurance

Session 5. Predictive Modeling in Life Insurance SOA Predictive Analytics Seminar Hong Kong 29 Aug. 2018 Hong Kong Session 5 Predictive Modeling in Life Insurance Jingyi Zhang, Ph.D Predictive Modeling in Life Insurance JINGYI ZHANG PhD Scientist Global

More information

Loan Approval and Quality Prediction in the Lending Club Marketplace

Loan Approval and Quality Prediction in the Lending Club Marketplace Loan Approval and Quality Prediction in the Lending Club Marketplace Final Write-up Yondon Fu, Matt Marcus and Shuo Zheng Introduction Lending Club is a peer-to-peer lending marketplace where individual

More information

χ 2 distributions and confidence intervals for population variance

χ 2 distributions and confidence intervals for population variance χ 2 distributions and confidence intervals for population variance Let Z be a standard Normal random variable, i.e., Z N(0, 1). Define Y = Z 2. Y is a non-negative random variable. Its distribution is

More information

Comparison of design-based sample mean estimate with an estimate under re-sampling-based multiple imputations

Comparison of design-based sample mean estimate with an estimate under re-sampling-based multiple imputations Comparison of design-based sample mean estimate with an estimate under re-sampling-based multiple imputations Recai Yucel 1 Introduction This section introduces the general notation used throughout this

More information

Chapter ML:III. III. Decision Trees. Decision Trees Basics Impurity Functions Decision Tree Algorithms Decision Tree Pruning

Chapter ML:III. III. Decision Trees. Decision Trees Basics Impurity Functions Decision Tree Algorithms Decision Tree Pruning Chapter ML:III III. Decision Trees Decision Trees Basics Impurity Functions Decision Tree Algorithms Decision Tree Pruning ML:III-93 Decision Trees STEIN/LETTMANN 2005-2017 Overfitting Definition 10 (Overfitting)

More information

MBF1413 Quantitative Methods

MBF1413 Quantitative Methods MBF1413 Quantitative Methods Prepared by Dr Khairul Anuar 4: Decision Analysis Part 1 www.notes638.wordpress.com 1. Problem Formulation a. Influence Diagrams b. Payoffs c. Decision Trees Content 2. Decision

More information

Regression. Lecture Notes VII

Regression. Lecture Notes VII Regression Lecture Notes VII Statistics 112, Fall 2002 Outline Predicting based on Use of the conditional mean (the regression function) to make predictions. Prediction based on a sample. Regression line.

More information

Non-linearities in Simple Regression

Non-linearities in Simple Regression Non-linearities in Simple Regression 1. Eample: Monthly Earnings and Years of Education In this tutorial, we will focus on an eample that eplores the relationship between total monthly earnings and years

More information

Enforcing monotonicity of decision models: algorithm and performance

Enforcing monotonicity of decision models: algorithm and performance Enforcing monotonicity of decision models: algorithm and performance Marina Velikova 1 and Hennie Daniels 1,2 A case study of hedonic price model 1 Tilburg University, CentER for Economic Research,Tilburg,

More information

Tutorial 6. Sampling Distribution. ENGG2450A Tutors. 27 February The Chinese University of Hong Kong 1/6

Tutorial 6. Sampling Distribution. ENGG2450A Tutors. 27 February The Chinese University of Hong Kong 1/6 Tutorial 6 Sampling Distribution ENGG2450A Tutors The Chinese University of Hong Kong 27 February 2017 1/6 Random Sample and Sampling Distribution 2/6 Random sample Consider a random variable X with distribution

More information

Loan Approval and Quality Prediction in the Lending Club Marketplace

Loan Approval and Quality Prediction in the Lending Club Marketplace Loan Approval and Quality Prediction in the Lending Club Marketplace Milestone Write-up Yondon Fu, Shuo Zheng and Matt Marcus Recap Lending Club is a peer-to-peer lending marketplace where individual investors

More information

Risk Reduction Potential

Risk Reduction Potential Risk Reduction Potential Research Paper 006 February, 015 015 Northstar Risk Corp. All rights reserved. info@northstarrisk.com Risk Reduction Potential In this paper we introduce the concept of risk reduction

More information

IOP 201-Q (Industrial Psychological Research) Tutorial 5

IOP 201-Q (Industrial Psychological Research) Tutorial 5 IOP 201-Q (Industrial Psychological Research) Tutorial 5 TRUE/FALSE [1 point each] Indicate whether the sentence or statement is true or false. 1. To establish a cause-and-effect relation between two variables,

More information

Top-down particle filtering for Bayesian decision trees

Top-down particle filtering for Bayesian decision trees Top-down particle filtering for Bayesian decision trees Balaji Lakshminarayanan 1, Daniel M. Roy 2 and Yee Whye Teh 3 1. Gatsby Unit, UCL, 2. University of Cambridge and 3. University of Oxford Outline

More information

Chapter 11: Dynamic Games and First and Second Movers

Chapter 11: Dynamic Games and First and Second Movers Chapter : Dynamic Games and First and Second Movers Learning Objectives Students should learn to:. Extend the reaction function ideas developed in the Cournot duopoly model to a model of sequential behavior

More information

Econometric Methods for Valuation Analysis

Econometric Methods for Valuation Analysis Econometric Methods for Valuation Analysis Margarita Genius Dept of Economics M. Genius (Univ. of Crete) Econometric Methods for Valuation Analysis Cagliari, 2017 1 / 26 Correlation Analysis Simple Regression

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Maximum Likelihood Estimation The likelihood and log-likelihood functions are the basis for deriving estimators for parameters, given data. While the shapes of these two functions are different, they have

More information

THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management

THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management THE UNIVERSITY OF TEXAS AT AUSTIN Department of Information, Risk, and Operations Management BA 386T Tom Shively PROBABILITY CONCEPTS AND NORMAL DISTRIBUTIONS The fundamental idea underlying any statistical

More information

f x f x f x f x x 5 3 y-intercept: y-intercept: y-intercept: y-intercept: y-intercept of a linear function written in function notation

f x f x f x f x x 5 3 y-intercept: y-intercept: y-intercept: y-intercept: y-intercept of a linear function written in function notation Questions/ Main Ideas: Algebra Notes TOPIC: Function Translations and y-intercepts Name: Period: Date: What is the y-intercept of a graph? The four s given below are written in notation. For each one,

More information

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology FE670 Algorithmic Trading Strategies Lecture 4. Cross-Sectional Models and Trading Strategies Steve Yang Stevens Institute of Technology 09/26/2013 Outline 1 Cross-Sectional Methods for Evaluation of Factor

More information

Monte-Carlo Methods in Financial Engineering

Monte-Carlo Methods in Financial Engineering Monte-Carlo Methods in Financial Engineering Universität zu Köln May 12, 2017 Outline Table of Contents 1 Introduction 2 Repetition Definitions Least-Squares Method 3 Derivation Mathematical Derivation

More information

Lecture 18 Section Mon, Feb 16, 2009

Lecture 18 Section Mon, Feb 16, 2009 The s the Lecture 18 Section 5.3.4 Hampden-Sydney College Mon, Feb 16, 2009 Outline The s the 1 2 3 The 4 s 5 the 6 The s the Exercise 5.12, page 333. The five-number summary for the distribution of income

More information

STATISTICS 110/201, FALL 2017 Homework #5 Solutions Assigned Mon, November 6, Due Wed, November 15

STATISTICS 110/201, FALL 2017 Homework #5 Solutions Assigned Mon, November 6, Due Wed, November 15 STATISTICS 110/201, FALL 2017 Homework #5 Solutions Assigned Mon, November 6, Due Wed, November 15 For this assignment use the Diamonds dataset in the Stat2Data library. The dataset is used in examples

More information

Making Hard Decision. ENCE 627 Decision Analysis for Engineering. Identify the decision situation and understand objectives. Identify alternatives

Making Hard Decision. ENCE 627 Decision Analysis for Engineering. Identify the decision situation and understand objectives. Identify alternatives CHAPTER Duxbury Thomson Learning Making Hard Decision Third Edition RISK ATTITUDES A. J. Clark School of Engineering Department of Civil and Environmental Engineering 13 FALL 2003 By Dr. Ibrahim. Assakkaf

More information

Algorithmic Game Theory and Applications. Lecture 11: Games of Perfect Information

Algorithmic Game Theory and Applications. Lecture 11: Games of Perfect Information Algorithmic Game Theory and Applications Lecture 11: Games of Perfect Information Kousha Etessami finite games of perfect information Recall, a perfect information (PI) game has only 1 node per information

More information

Lecture 18 Section Mon, Sep 29, 2008

Lecture 18 Section Mon, Sep 29, 2008 The s the Lecture 18 Section 5.3.4 Hampden-Sydney College Mon, Sep 29, 2008 Outline The s the 1 2 3 The 4 s 5 the 6 The s the Exercise 5.12, page 333. The five-number summary for the distribution of income

More information

Accepted Manuscript. Example-Dependent Cost-Sensitive Decision Trees. Alejandro Correa Bahnsen, Djamila Aouada, Björn Ottersten

Accepted Manuscript. Example-Dependent Cost-Sensitive Decision Trees. Alejandro Correa Bahnsen, Djamila Aouada, Björn Ottersten Accepted Manuscript Example-Dependent Cost-Sensitive Decision Trees Alejandro Correa Bahnsen, Djamila Aouada, Björn Ottersten PII: S0957-4174(15)00284-5 DOI: http://dx.doi.org/10.1016/j.eswa.2015.04.042

More information

Moments and Measures of Skewness and Kurtosis

Moments and Measures of Skewness and Kurtosis Moments and Measures of Skewness and Kurtosis Moments The term moment has been taken from physics. The term moment in statistical use is analogous to moments of forces in physics. In statistics the values

More information

Homework Assignment Section 3

Homework Assignment Section 3 Homework Assignment Section 3 Tengyuan Liang Business Statistics Booth School of Business Problem 1 A company sets different prices for a particular stereo system in eight different regions of the country.

More information

LECTURE 2: MULTIPERIOD MODELS AND TREES

LECTURE 2: MULTIPERIOD MODELS AND TREES LECTURE 2: MULTIPERIOD MODELS AND TREES 1. Introduction One-period models, which were the subject of Lecture 1, are of limited usefulness in the pricing and hedging of derivative securities. In real-world

More information

UNIT 5 DECISION MAKING

UNIT 5 DECISION MAKING UNIT 5 DECISION MAKING This unit: UNDER UNCERTAINTY Discusses the techniques to deal with uncertainties 1 INTRODUCTION Few decisions in construction industry are made with certainty. Need to look at: The

More information

CHAPTER 5 SAMPLING DISTRIBUTIONS

CHAPTER 5 SAMPLING DISTRIBUTIONS CHAPTER 5 SAMPLING DISTRIBUTIONS Sampling Variability. We will visualize our data as a random sample from the population with unknown parameter μ. Our sample mean Ȳ is intended to estimate population mean

More information

Lecture Note Set 3 3 N-PERSON GAMES. IE675 Game Theory. Wayne F. Bialas 1 Monday, March 10, N-Person Games in Strategic Form

Lecture Note Set 3 3 N-PERSON GAMES. IE675 Game Theory. Wayne F. Bialas 1 Monday, March 10, N-Person Games in Strategic Form IE675 Game Theory Lecture Note Set 3 Wayne F. Bialas 1 Monday, March 10, 003 3 N-PERSON GAMES 3.1 N-Person Games in Strategic Form 3.1.1 Basic ideas We can extend many of the results of the previous chapter

More information

Change of Measure (Cameron-Martin-Girsanov Theorem)

Change of Measure (Cameron-Martin-Girsanov Theorem) Change of Measure Cameron-Martin-Girsanov Theorem Radon-Nikodym derivative: Taking again our intuition from the discrete world, we know that, in the context of option pricing, we need to price the claim

More information

Discrete Random Variables

Discrete Random Variables Discrete Random Variables In this chapter, we introduce a new concept that of a random variable or RV. A random variable is a model to help us describe the state of the world around us. Roughly, a RV can

More information

A new look at tree based approaches

A new look at tree based approaches A new look at tree based approaches Xifeng Wang University of North Carolina Chapel Hill xifeng@live.unc.edu April 18, 2018 Xifeng Wang (UNC-Chapel Hill) Short title April 18, 2018 1 / 27 Outline of this

More information

MODEL SELECTION CRITERIA IN R:

MODEL SELECTION CRITERIA IN R: 1. R 2 statistics We may use MODEL SELECTION CRITERIA IN R R 2 = SS R SS T = 1 SS Res SS T or R 2 Adj = 1 SS Res/(n p) SS T /(n 1) = 1 ( ) n 1 (1 R 2 ). n p where p is the total number of parameters. R

More information

Optimization Prof. A. Goswami Department of Mathematics Indian Institute of Technology, Kharagpur. Lecture - 18 PERT

Optimization Prof. A. Goswami Department of Mathematics Indian Institute of Technology, Kharagpur. Lecture - 18 PERT Optimization Prof. A. Goswami Department of Mathematics Indian Institute of Technology, Kharagpur Lecture - 18 PERT (Refer Slide Time: 00:56) In the last class we completed the C P M critical path analysis

More information

The misleading nature of correlations

The misleading nature of correlations The misleading nature of correlations In this note we explain certain subtle features of calculating correlations between time-series. Correlation is a measure of linear co-movement, to be contrasted with

More information

Hedging and Regression. Hedging and Regression

Hedging and Regression. Hedging and Regression Returns The discrete return on a stock is the percentage change: S i S i 1 S i 1. The index i can represent days, weeks, hours etc. What happens if we compute returns at infinitesimally short intervals

More information

Random Effects... and more about pigs G G G G G G G G G G G

Random Effects... and more about pigs G G G G G G G G G G G et s examine the random effects model in terms of the pig weight example. This had eight litters, and in the first analysis we were willing to think of as fixed effects. This means that we might want to

More information

Section 0: Introduction and Review of Basic Concepts

Section 0: Introduction and Review of Basic Concepts Section 0: Introduction and Review of Basic Concepts Carlos M. Carvalho The University of Texas McCombs School of Business mccombs.utexas.edu/faculty/carlos.carvalho/teaching 1 Getting Started Syllabus

More information

- 1 - **** d(lns) = (µ (1/2)σ 2 )dt + σdw t

- 1 - **** d(lns) = (µ (1/2)σ 2 )dt + σdw t - 1 - **** These answers indicate the solutions to the 2014 exam questions. Obviously you should plot graphs where I have simply described the key features. It is important when plotting graphs to label

More information

Outline. Objective. Previous Results Our Results Discussion Current Research. 1 Motivation. 2 Model. 3 Results

Outline. Objective. Previous Results Our Results Discussion Current Research. 1 Motivation. 2 Model. 3 Results On Threshold Esteban 1 Adam 2 Ravi 3 David 4 Sergei 1 1 Stanford University 2 Harvard University 3 Yahoo! Research 4 Carleton College The 8th ACM Conference on Electronic Commerce EC 07 Outline 1 2 3 Some

More information

NEWCASTLE UNIVERSITY. School SEMESTER /2013 ACE2013. Statistics for Marketing and Management. Time allowed: 2 hours

NEWCASTLE UNIVERSITY. School SEMESTER /2013 ACE2013. Statistics for Marketing and Management. Time allowed: 2 hours NEWCASTLE UNIVERSITY School SEMESTER 2 2012/2013 Statistics for Marketing and Management Time allowed: 2 hours Candidates should attempt ALL questions. Marks for each question are indicated. However you

More information

Monotonically Constrained Bayesian Additive Regression Trees

Monotonically Constrained Bayesian Additive Regression Trees Constrained Bayesian Additive Regression Trees Robert McCulloch University of Chicago, Booth School of Business Joint with: Hugh Chipman (Acadia), Ed George (UPenn, Wharton), Tom Shively (U Texas, McCombs)

More information

Predicting Economic Recession using Data Mining Techniques

Predicting Economic Recession using Data Mining Techniques Predicting Economic Recession using Data Mining Techniques Authors Naveed Ahmed Kartheek Atluri Tapan Patwardhan Meghana Viswanath Predicting Economic Recession using Data Mining Techniques Page 1 Abstract

More information

DRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics

DRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics Chapter 12 American Put Option Recall that the American option has strike K and maturity T and gives the holder the right to exercise at any time in [0, T ]. The American option is not straightforward

More information

Definition 9.1 A point estimate is any function T (X 1,..., X n ) of a random sample. We often write an estimator of the parameter θ as ˆθ.

Definition 9.1 A point estimate is any function T (X 1,..., X n ) of a random sample. We often write an estimator of the parameter θ as ˆθ. 9 Point estimation 9.1 Rationale behind point estimation When sampling from a population described by a pdf f(x θ) or probability function P [X = x θ] knowledge of θ gives knowledge of the entire population.

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 8 Recursive Partitioning: Large Companies and Glaucoma Diagnosis 8.1 Introduction 8.2 Recursive Partitioning 8.3

More information

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 05 Normal Distribution So far we have looked at discrete distributions

More information

High Frequency Trading Strategy Based on Prex Trees

High Frequency Trading Strategy Based on Prex Trees High Frequency Trading Strategy Based on Prex Trees Yijia Zhou, 05592862, Financial Mathematics, Stanford University December 11, 2010 1 Introduction 1.1 Goal I am an M.S. Finanical Mathematics student

More information

u (x) < 0. and if you believe in diminishing return of the wealth, then you would require

u (x) < 0. and if you believe in diminishing return of the wealth, then you would require Chapter 8 Markowitz Portfolio Theory 8.7 Investor Utility Functions People are always asked the question: would more money make you happier? The answer is usually yes. The next question is how much more

More information

DATA SUMMARIZATION AND VISUALIZATION

DATA SUMMARIZATION AND VISUALIZATION APPENDIX DATA SUMMARIZATION AND VISUALIZATION PART 1 SUMMARIZATION 1: BUILDING BLOCKS OF DATA ANALYSIS 294 PART 2 PART 3 PART 4 VISUALIZATION: GRAPHS AND TABLES FOR SUMMARIZING AND ORGANIZING DATA 296

More information

Stat 101 Exam 1 - Embers Important Formulas and Concepts 1

Stat 101 Exam 1 - Embers Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 - Embers Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2.

More information

NOTES ON FIBONACCI TREES AND THEIR OPTIMALITY* YASUICHI HORIBE INTRODUCTION 1. FIBONACCI TREES

NOTES ON FIBONACCI TREES AND THEIR OPTIMALITY* YASUICHI HORIBE INTRODUCTION 1. FIBONACCI TREES 0#0# NOTES ON FIBONACCI TREES AND THEIR OPTIMALITY* YASUICHI HORIBE Shizuoka University, Hamamatsu, 432, Japan (Submitted February 1982) INTRODUCTION Continuing a previous paper [3], some new observations

More information

Lecture 10: Point Estimation

Lecture 10: Point Estimation Lecture 10: Point Estimation MSU-STT-351-Sum-17B (P. Vellaisamy: MSU-STT-351-Sum-17B) Probability & Statistics for Engineers 1 / 31 Basic Concepts of Point Estimation A point estimate of a parameter θ,

More information

The Loans_processed.csv file is the dataset we obtained after the pre-processing part where the clean-up python code was used.

The Loans_processed.csv file is the dataset we obtained after the pre-processing part where the clean-up python code was used. Machine Learning Group Homework 3 MSc Business Analytics Team 9 Alexander Romanenko, Artemis Tomadaki, Justin Leiendecker, Zijun Wei, Reza Brianca Widodo The Loans_processed.csv file is the dataset we

More information

Chapter 8: CAPM. 1. Single Index Model. 2. Adding a Riskless Asset. 3. The Capital Market Line 4. CAPM. 5. The One-Fund Theorem

Chapter 8: CAPM. 1. Single Index Model. 2. Adding a Riskless Asset. 3. The Capital Market Line 4. CAPM. 5. The One-Fund Theorem Chapter 8: CAPM 1. Single Index Model 2. Adding a Riskless Asset 3. The Capital Market Line 4. CAPM 5. The One-Fund Theorem 6. The Characteristic Line 7. The Pricing Model Single Index Model 1 1. Covariance

More information

Statistic Midterm. Spring This is a closed-book, closed-notes exam. You may use any calculator.

Statistic Midterm. Spring This is a closed-book, closed-notes exam. You may use any calculator. Statistic Midterm Spring 2018 This is a closed-book, closed-notes exam. You may use any calculator. Please answer all problems in the space provided on the exam. Read each question carefully and clearly

More information

INSTITUTE OF ACTUARIES OF INDIA EXAMINATIONS. 20 th May Subject CT3 Probability & Mathematical Statistics

INSTITUTE OF ACTUARIES OF INDIA EXAMINATIONS. 20 th May Subject CT3 Probability & Mathematical Statistics INSTITUTE OF ACTUARIES OF INDIA EXAMINATIONS 20 th May 2013 Subject CT3 Probability & Mathematical Statistics Time allowed: Three Hours (10.00 13.00) Total Marks: 100 INSTRUCTIONS TO THE CANDIDATES 1.

More information

4 Reinforcement Learning Basic Algorithms

4 Reinforcement Learning Basic Algorithms Learning in Complex Systems Spring 2011 Lecture Notes Nahum Shimkin 4 Reinforcement Learning Basic Algorithms 4.1 Introduction RL methods essentially deal with the solution of (optimal) control problems

More information

Feb. 20th, Recursive, Stochastic Growth Model

Feb. 20th, Recursive, Stochastic Growth Model Feb 20th, 2007 1 Recursive, Stochastic Growth Model In previous sections, we discussed random shocks, stochastic processes and histories Now we will introduce those concepts into the growth model and analyze

More information

ELEMENTS OF MONTE CARLO SIMULATION

ELEMENTS OF MONTE CARLO SIMULATION APPENDIX B ELEMENTS OF MONTE CARLO SIMULATION B. GENERAL CONCEPT The basic idea of Monte Carlo simulation is to create a series of experimental samples using a random number sequence. According to the

More information

MAFS Computational Methods for Pricing Structured Products

MAFS Computational Methods for Pricing Structured Products MAFS550 - Computational Methods for Pricing Structured Products Solution to Homework Two Course instructor: Prof YK Kwok 1 Expand f(x 0 ) and f(x 0 x) at x 0 into Taylor series, where f(x 0 ) = f(x 0 )

More information

Corporate Finance, Module 21: Option Valuation. Practice Problems. (The attached PDF file has better formatting.) Updated: July 7, 2005

Corporate Finance, Module 21: Option Valuation. Practice Problems. (The attached PDF file has better formatting.) Updated: July 7, 2005 Corporate Finance, Module 21: Option Valuation Practice Problems (The attached PDF file has better formatting.) Updated: July 7, 2005 {This posting has more information than is needed for the corporate

More information

ECONS 424 STRATEGY AND GAME THEORY HANDOUT ON PERFECT BAYESIAN EQUILIBRIUM- III Semi-Separating equilibrium

ECONS 424 STRATEGY AND GAME THEORY HANDOUT ON PERFECT BAYESIAN EQUILIBRIUM- III Semi-Separating equilibrium ECONS 424 STRATEGY AND GAME THEORY HANDOUT ON PERFECT BAYESIAN EQUILIBRIUM- III Semi-Separating equilibrium Let us consider the following sequential game with incomplete information. Two players are playing

More information

Intro to GLM Day 2: GLM and Maximum Likelihood

Intro to GLM Day 2: GLM and Maximum Likelihood Intro to GLM Day 2: GLM and Maximum Likelihood Federico Vegetti Central European University ECPR Summer School in Methods and Techniques 1 / 32 Generalized Linear Modeling 3 steps of GLM 1. Specify the

More information

And The Winner Is? How to Pick a Better Model

And The Winner Is? How to Pick a Better Model And The Winner Is? How to Pick a Better Model Part 2 Goodness-of-Fit and Internal Stability Dan Tevet, FCAS, MAAA Goodness-of-Fit Trying to answer question: How well does our model fit the data? Can be

More information

Two-Sample Cross Tabulation: Application to Poverty and Child. Malnutrition in Tanzania

Two-Sample Cross Tabulation: Application to Poverty and Child. Malnutrition in Tanzania Two-Sample Cross Tabulation: Application to Poverty and Child Malnutrition in Tanzania Tomoki Fujii and Roy van der Weide December 5, 2008 Abstract We apply small-area estimation to produce cross tabulations

More information

a 13 Notes on Hidden Markov Models Michael I. Jordan University of California at Berkeley Hidden Markov Models The model

a 13 Notes on Hidden Markov Models Michael I. Jordan University of California at Berkeley Hidden Markov Models The model Notes on Hidden Markov Models Michael I. Jordan University of California at Berkeley Hidden Markov Models This is a lightly edited version of a chapter in a book being written by Jordan. Since this is

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Evaluation of Models. Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Evaluation of Models. Niels Landwehr Universität Potsdam Institut für Informatik ehrstuhl Maschinelles ernen Evaluation of Models Niels andwehr earning and Prediction Classification, Regression: earning problem Input: training data Output:

More information

Introduction to Population Modeling

Introduction to Population Modeling Introduction to Population Modeling In addition to estimating the size of a population, it is often beneficial to estimate how the population size changes over time. Ecologists often uses models to create

More information

The Impact of Shareholder Taxation on Merger and Acquisition Behavior

The Impact of Shareholder Taxation on Merger and Acquisition Behavior The Impact of Shareholder Taxation on Merger and Acquisition Behavior Eric Ohrn, Grinnell College Nathan Seegert, University of Utah Grinnell College Department of Economics Seminar November 8, 2016 Introduction

More information

4.1 Introduction Estimating a population mean The problem with estimating a population mean with a sample mean: an example...

4.1 Introduction Estimating a population mean The problem with estimating a population mean with a sample mean: an example... Chapter 4 Point estimation Contents 4.1 Introduction................................... 2 4.2 Estimating a population mean......................... 2 4.2.1 The problem with estimating a population mean

More information

Logit Models for Binary Data

Logit Models for Binary Data Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis These models are appropriate when the response

More information

Characterization of the Optimum

Characterization of the Optimum ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 5. Portfolio Allocation with One Riskless, One Risky Asset Characterization of the Optimum Consider a risk-averse, expected-utility-maximizing

More information

Gradient Boosting Trees: theory and applications

Gradient Boosting Trees: theory and applications Gradient Boosting Trees: theory and applications Dmitry Efimov November 05, 2016 Outline Decision trees Boosting Boosting trees Metaparameters and tuning strategies How-to-use remarks Regression tree True

More information

AP Stats: 3B ~ Least Squares Regression and Residuals. Objectives:

AP Stats: 3B ~ Least Squares Regression and Residuals. Objectives: Objectives: INTERPRET the slope and y intercept of a least-squares regression line USE the least-squares regression line to predict y for a given x CALCULATE and INTERPRET residuals and their standard

More information

Graduate School of Business, University of Chicago Business 41202, Spring Quarter 2007, Mr. Ruey S. Tsay. Solutions to Final Exam

Graduate School of Business, University of Chicago Business 41202, Spring Quarter 2007, Mr. Ruey S. Tsay. Solutions to Final Exam Graduate School of Business, University of Chicago Business 41202, Spring Quarter 2007, Mr. Ruey S. Tsay Solutions to Final Exam Problem A: (30 pts) Answer briefly the following questions. 1. Suppose that

More information

Wage Determinants Analysis by Quantile Regression Tree

Wage Determinants Analysis by Quantile Regression Tree Communications of the Korean Statistical Society 2012, Vol. 19, No. 2, 293 301 DOI: http://dx.doi.org/10.5351/ckss.2012.19.2.293 Wage Determinants Analysis by Quantile Regression Tree Youngjae Chang 1,a

More information

Chapter 7: Estimation Sections

Chapter 7: Estimation Sections 1 / 40 Chapter 7: Estimation Sections 7.1 Statistical Inference Bayesian Methods: Chapter 7 7.2 Prior and Posterior Distributions 7.3 Conjugate Prior Distributions 7.4 Bayes Estimators Frequentist Methods:

More information

Jaime Frade Dr. Niu Interest rate modeling

Jaime Frade Dr. Niu Interest rate modeling Interest rate modeling Abstract In this paper, three models were used to forecast short term interest rates for the 3 month LIBOR. Each of the models, regression time series, GARCH, and Cox, Ingersoll,

More information

MAS187/AEF258. University of Newcastle upon Tyne

MAS187/AEF258. University of Newcastle upon Tyne MAS187/AEF258 University of Newcastle upon Tyne 2005-6 Contents 1 Collecting and Presenting Data 5 1.1 Introduction...................................... 5 1.1.1 Examples...................................

More information

A Hidden Markov Model Approach to Information-Based Trading: Theory and Applications

A Hidden Markov Model Approach to Information-Based Trading: Theory and Applications A Hidden Markov Model Approach to Information-Based Trading: Theory and Applications Online Supplementary Appendix Xiangkang Yin and Jing Zhao La Trobe University Corresponding author, Department of Finance,

More information

Economics 430 Handout on Rational Expectations: Part I. Review of Statistics: Notation and Definitions

Economics 430 Handout on Rational Expectations: Part I. Review of Statistics: Notation and Definitions Economics 430 Chris Georges Handout on Rational Expectations: Part I Review of Statistics: Notation and Definitions Consider two random variables X and Y defined over m distinct possible events. Event

More information