Classification and Regression Trees
- Dorthy Carr

In unsupervised classification (clustering), there is no response variable (dependent variable); the region corresponding to a given node is based on the similarity of the observations to each other. In classification and regression trees, the region at each node is based on the similarity of the response values to each other.

Classification and regression trees are formed divisively, based on a response variable. If a node contains more than one group, we may divide it into multiple nodes that are more pure.

We will use the same notation as we used for clustering trees. Each node in the tree corresponds to some specific region $X_m$ of the feature space (the space of the independent variables). We will also occasionally abuse the notation slightly and use $X_m$ for the set of indexes $i$ such that $x_i \in X_m$. The branches from a node are defined in terms of rules that split the feature space.

If the response variable is a group category, a classification tree is formed. Each node in a classification tree corresponds to the predominant value of the response within the subdomain of the features corresponding to that node.

If the response is a numeric variable, a regression tree is formed. Each node in a regression tree corresponds to the average of the response within a given region, rather than to a predominant value of the response.

Impurity of Nodes of Trees

Nodes are split based on their impurity. Impurity is a measure of how badly the observations at a given node fit the model. In a regression tree, for example, the impurity may be measured by the residual sum of squares within that node. In a classification tree, there are various ways of measuring the impurity, such as the misclassification error, the Gini index, and the entropy.

Deviance

The term deviance is used in various ways in statistics. In general, it is a measure of the variability that is not accounted for by the fitted model. It is usually not scaled either for the number of observations or for their magnitude. Thus, a larger set of observations will usually have a larger deviance than a smaller set in the same situation; likewise, data with larger values will usually have a larger deviance than the same data measured on a scale that gives smaller values.

The fact that a deviance is not scaled for the number of observations yields an additive property for the nodes in a tree.

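A small R sketch of both properties, using made-up numbers. The vector `y`, the split into `left` and `right`, and the helper `node_deviance` are illustrative assumptions, not data or functions from these slides. The deviance is taken to be the residual sum of squares about the node mean.

```r
# Unscaled deviance of a node: residual sum of squares about the node mean.
node_deviance <- function(y) sum((y - mean(y))^2)

y     <- c(1.1, 0.9, 1.3, 2.2, 1.8, 2.1)  # hypothetical responses at the root
left  <- y[1:3]                            # hypothetical child node
right <- y[4:6]                            # hypothetical child node

# Not scaled for magnitude: multiplying the data by 10
# multiplies the deviance by 10^2.
scale_ratio <- node_deviance(10 * y) / node_deviance(y)

# Not scaled for n: the deviance of the tree is just the sum
# of the deviances of its terminal nodes.
tree_deviance <- node_deviance(left) + node_deviance(right)
root_deviance <- node_deviance(y)
```

Because each child is fitted with its own mean, `tree_deviance` can never exceed `root_deviance`.
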
Regression Trees

Our general model for regression (in one form) has been

$$y_i = \beta_0 + x_i^T \beta + \epsilon_i,$$

where $\epsilon_i$ is assumed to be a random variable with $E(\epsilon_i) = 0$. This yields an expression for $y_i$ conditional on the corresponding $x_i$; hence we may write the left-hand side as $y_i \mid x_i$.

The model for a regression tree is of the form

$$y_i \mid x_i = \mu_m + \epsilon_i,$$

where $m$ is determined by the index of the region $X_m$ of the feature space such that $x_i \in X_m$, $\mu_m = E(Y \mid x_i \in X_m)$, and $\epsilon_i$ is assumed to be a random variable with $E(\epsilon_i) = 0$ as before.

At node $m$ the fitted model is just the node average: for every $i \in X_m$,

$$\hat{y}_i = \frac{1}{n_m} \sum_{k \in X_m} y_k.$$

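As a minimal sketch of the fitted model, suppose the region of each observation is already known. The responses `y` and the region indexes `m` below are hypothetical values chosen for illustration.

```r
# Hypothetical responses and the index m of the region each observation falls in.
y <- c(1.0, 1.2, 0.8, 2.1, 1.9)
m <- c(1, 1, 1, 2, 2)

# Estimate of mu_m: the mean response within each region.
mu_hat <- tapply(y, m, mean)

# The fitted value for observation i is the mean of its own region.
y_hat <- as.numeric(mu_hat[as.character(m)])
```
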
Measure of Impurity in Regression Trees

The obvious unscaled measure of impurity of any node in a regression tree is the residual sum of squares, RSS. At node $m$, it is just

$$\sum_{i \in X_m} (y_i - \hat{y}_i)^2.$$

This is called the deviance.

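One way to use this measure for choosing a split can be sketched as follows. This is an illustration under stated assumptions, not necessarily the search that any particular R package performs: the data `x` and `y` are made up, only one feature is considered, and the candidate cut points are taken to be the midpoints between consecutive sorted feature values.

```r
# Deviance (RSS about the mean) of a set of responses.
rss <- function(y) sum((y - mean(y))^2)

# Hypothetical data: one feature x and a numeric response y.
x <- c(0.1, 0.4, 0.5, 0.9, 1.3, 1.7)
y <- c(1.0, 1.1, 0.9, 2.0, 2.2, 1.9)

# Candidate cut points: midpoints between consecutive sorted x values.
xs   <- sort(unique(x))
cuts <- (head(xs, -1) + tail(xs, -1)) / 2

# Summed deviance of the two child nodes for each candidate cut.
child_rss <- sapply(cuts, function(cut) rss(y[x < cut]) + rss(y[x >= cut]))

# Pick the cut that leaves the least impurity in the children.
best_cut  <- cuts[which.min(child_rss)]
reduction <- rss(y) - min(child_rss)
```

Here the cut falls between the third and fourth observations, where the responses jump from about 1 to about 2.
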
Example

    set.seed(5)
    n <- 20
    n1 <- 12
    n2 <- n-n1
    # two groups of points with different feature and response means
    exreg <- data.frame(cbind(x1=c(rnorm(n1),rnorm(n2)+1.0),
                              x2=c(rnorm(n1),rnorm(n2)+1.5),
                              y=c(1+0.25*rnorm(n1),2+0.25*rnorm(n2))))
    exreg
    attach(exreg)
    plot(x1,x2,main="Population Means of Responses")
    # label each point with the population mean of its response
    text(x1[1:n1]+.05,x2[1:n1],"1")
    text(x1[(n1+1):n]+.05,x2[(n1+1):n],"2")

The Data for the Regression Tree

[Table: the 20 observations, with columns x1, x2, and y. The values were not preserved.]

[Figure: scatter plot of x2 against x1, titled "Population Means of Responses".]

R on the Regression Tree Example

    library(tree)
    regtree <- tree(y~x1+x2)

This produces

    node), split, n, deviance, yval
          * denotes terminal node

    1) root
      2) x1 <
        4) x2 <  *
        5) x2 >  *
      3) x1 >
        6) x2 <  *
        7) x2 >  *

(The split values, counts, deviances, and fitted values were not preserved.)

Prediction in the Regression Tree Example

Now, let's predict the response for some new observations using the regression tree that R computed.

    newdata <- data.frame(cbind(x1=c(1,-1),x2=c(3,0)))
    predict.tree(regtree, newdata)

This produces one predicted value for each new observation (the printed values were not preserved). Each prediction is the average value of the response within the region that contains the new observation.

Classification Trees in R

Example

    set.seed(5)
    n <- 20
    n1 <- 12
    n2 <- n-n1
    # same features as before, but the response is now a class label (1 or 2)
    exclass <- data.frame(cbind(x1=c(rnorm(n1),rnorm(n2)+1.0),
                                x2=c(rnorm(n1),rnorm(n2)+1.5),
                                y=c(rep(1,n1),rep(2,n2))))
    attach(exclass)
    plot(x1,x2,col=y)

[Figure: scatter plot of x2 against x1, with points colored by class.]

Classification Trees

A classification tree is formed by recursively dividing up the space of the data. A simple procedure is to choose one feature at a time and make a split at a particular value of that feature. Let's just do this by eyeball.

[Figure: the scatter plot of x2 against x1, divided into regions by eye.]

Pruning Trees

[Figure: plot of x2 against x1 illustrating pruning.]

Nodes in Classification Trees

For any node $m$, we consider the proportion of the observations at that node that are in class $j$; this is our estimate of the probability of class $j$ in the node. We denote the region of the feature space representing node $m$ as $R_m$, and the number of observations at that node as $n_m$. Initially, of course, $R_1$ is the full space of the features in all observations and $n_1 = n$.

Define

$$\hat{p}_{mj} = \frac{1}{n_m} \sum_{x_i \in R_m} I(y_i = j).$$

The class of a node is

$$j(m) = \arg\max_j \hat{p}_{mj}.$$

This is not well-defined if there is no unique maximum. The most common way of dealing with this is to make a random (or arbitrary) choice. In the case of only two classes with numeric labels, another way of assigning a class to a node is to take the average value of the class labels.

Let's identify these in the figure on the previous slide. (The numbering of the nodes is arbitrary, but it should be done systematically.)

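The class proportions and the class of a node can be sketched in a few lines of R. The labels in `y_node` are hypothetical, and breaking a tie by taking the first label is just one arbitrary rule of the kind mentioned above.

```r
# Hypothetical class labels of the observations at one node.
y_node <- c(1, 2, 2, 1, 2, 1)

# Estimated class proportions p_hat[j] for this node.
p_hat <- table(y_node) / length(y_node)

# Classes attaining the maximum proportion; more than one means a tie.
winners <- names(p_hat)[p_hat == max(p_hat)]

# Break a tie by an arbitrary but reproducible rule: take the first label.
node_class <- winners[1]
```

With three observations in each class there is no unique maximum, so `winners` holds both labels and the rule picks class 1.
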
Impurity of Nodes in Classification Trees

Nodes are split based on their impurity. A pure node has only one class, and obviously would not be split. There are various measures of impurity.

Misclassification error:

$$\frac{1}{n_m} \sum_{i \in R_m} I(y_i \neq j(m)) = 1 - \hat{p}_{mj(m)}.$$

(Note that we also use $R_m$ to denote the set of indices of observations at node $m$.)

Gini index, for $k$ classes:

$$\sum_{j \neq j'} \hat{p}_{mj}\,\hat{p}_{mj'} = \sum_{j=1}^{k} \hat{p}_{mj}\,(1 - \hat{p}_{mj}).$$

Cross-entropy:

$$-\sum_{j:\ \hat{p}_{mj} > 0} \hat{p}_{mj} \log(\hat{p}_{mj}).$$

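The three measures are short enough to write directly in R. This is a sketch: the function names and the proportion vectors `p_pure` and `p_mixed` are illustrative assumptions, and the natural logarithm is assumed for the cross-entropy.

```r
# Each function takes the vector p of class proportions at a node.
misclass <- function(p) 1 - max(p)
gini     <- function(p) sum(p * (1 - p))
cross_entropy <- function(p) {
  p <- p[p > 0]          # terms with p = 0 are left out of the sum
  -sum(p * log(p))       # natural logarithm (an assumption)
}

# The Gini index written as the sum of p_j * p_j' over all pairs j != j'.
gini_pairs <- function(p) sum(outer(p, p)) - sum(p^2)

p_pure  <- c(1, 0)        # hypothetical pure node
p_mixed <- c(0.75, 0.25)  # hypothetical impure node
```

All three measures are zero for `p_pure` and positive for `p_mixed`, and `gini(p_mixed)` agrees with `gini_pairs(p_mixed)`.
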
Let's identify these in the figure on the earlier slide. Let's number the nodes so that

    2: bottom part of graph
    3: top part of graph
    4: bottom left part of graph
    5: bottom right part of graph

Notice that for node 5, which has 6 observations, half in each class, the misclassification error requires us to choose a class for the node, because there is no unique $\arg\max_j \hat{p}_{mj}$. The misclassification error is invariant to our choice, however.

Note that if we try to use some kind of average class value, the misclassification error would need to be defined differently.

For example, in node 5, which has 6 observations, half in each class, we get

    Misclassification error: 0.5
    Gini index: 0.5
    Cross-entropy: (value not preserved)

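These values can be checked from the definitions with $\hat{p}_{51} = \hat{p}_{52} = 0.5$. The cross-entropy below assumes the natural logarithm; with base-2 logarithms the same calculation gives 1.

```latex
\begin{align*}
\text{Misclassification error: } & 1 - \hat{p}_{5j(5)} = 1 - 0.5 = 0.5,\\
\text{Gini index: } & \sum_{j=1}^{2} \hat{p}_{5j}\,(1 - \hat{p}_{5j}) = 2\,(0.5)(0.5) = 0.5,\\
\text{Cross-entropy: } & -\sum_{j=1}^{2} \hat{p}_{5j}\log(\hat{p}_{5j}) = -2\,(0.5)\log(0.5) = \log 2 \approx 0.693.
\end{align*}
```
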
Classification and Regression Trees in R

There are several R packages that provide functions for classification trees and regression trees. The R package tree was the first and is probably still the most common one. Its main function is tree, which has several options. Other functions are tree.control, predict.tree, and prune.tree. The arguments of tree.control, such as minsize, can also be included in the invocation of tree.

Another R package is rpart. It is probably the best one. Its main function is rpart, which has several options. The technical report by Therneau and Atkinson (2011, and subsequent dates) remains the best documentation for rpart.

Methods in rpart

The R function rpart allows different splitting criteria, which can be specified in the argument method.

For a regression tree, the obvious criterion is the reduction in the sum of squares. Since this is the idea in analysis of variance, this method is called "anova". This method is the default unless the response is a factor.

For a classification tree, one common criterion is the Gini index. Use of the Gini index is specified by the method called "class". This method is the default when the response is a factor.

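A hedged sketch of the two methods, assuming the rpart package is installed. The data are simulated in the same way as in the examples on these slides, but the data frame name `dat` and the factor column `g` are hypothetical names introduced here for illustration.

```r
library(rpart)

set.seed(5)
n <- 20; n1 <- 12; n2 <- n - n1
dat <- data.frame(
  x1 = c(rnorm(n1), rnorm(n2) + 1.0),
  x2 = c(rnorm(n1), rnorm(n2) + 1.5),
  y  = c(1 + 0.25 * rnorm(n1), 2 + 0.25 * rnorm(n2)),  # numeric response
  g  = factor(c(rep(1, n1), rep(2, n2)))               # class response
)

# Numeric response: regression tree, split on reduction in sum of squares.
regfit <- rpart(y ~ x1 + x2, data = dat, method = "anova")

# Factor response: classification tree (Gini is rpart's default index).
classfit <- rpart(g ~ x1 + x2, data = dat, method = "class")
```

Leaving out the `method` argument should give the same two choices here, since `y` is numeric and `g` is a factor.
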
Data Frames in R

Over the years, as data frames have been developed in R, I have been more aggravated by the non-intuitive aspects of the structure and by its limited uses in R functions than I have been pleased with its usefulness.

In the data frame of my example, even if we use y=factor(c(rep(1,n1),rep(2,n2))) or y=as.factor(c(rep(1,n1),rep(2,n2))), y is not of mode factor. In any case, yy=factor(example$y) yields a variable of mode factor. Of course, we could put this variable in the data frame.

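One likely reason for this behavior is the use of cbind in building the data frame: cbind forms a matrix first, and in doing so it replaces a factor by its internal integer codes. The short sketch below, with made-up values and hypothetical object names, contrasts that with passing the columns directly to data.frame.

```r
g  <- factor(c(rep(1, 3), rep(2, 2)))   # hypothetical class labels
x1 <- c(0.2, -0.1, 0.4, 1.3, 0.9)       # hypothetical feature values

# cbind() builds a numeric matrix, so the factor is reduced to its codes.
via_cbind <- data.frame(cbind(x1 = x1, y = g))

# Passing the columns directly to data.frame() keeps the factor.
direct <- data.frame(x1 = x1, y = g)

is.factor(via_cbind$y)  # FALSE
is.factor(direct$y)     # TRUE
```
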
Classification Trees in R

Rather than fool with the vagaries required to coerce the variable in the data frame to be of mode factor, I prefer to do the coercion at the point of usage. The advantage of this is that nothing is hidden in the code. Hiding properties in an object is great, unless you want to be sure of what is being done.

    library(tree)
    attach(exclass)
    classtree <- tree(as.factor(y)~x1+x2)

This produces

    node), split, n, deviance, yval, (yprob)
          * denotes terminal node

    1) root ( )
      2) x1 < ( ) *
      3) x1 > ( )
        6) x2 < ( ) *
        7) x2 > ( ) *

(The split values, counts, deviances, fitted classes, and class probabilities were not preserved.)

This is better than what we got by our crude eyeball method.

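Let's plot it:

    minx1 <- min(x1)
    maxx1 <- max(x1)
    minx2 <- min(x2)
    maxx2 <- max(x2)
    plot(x1,x2,col=y)
    # vertical line at the x1 split (split value not preserved)
    lines(c( , ),c(minx2,maxx2))
    # horizontal line at the x2 split, from the x1 split to the right edge;
    # the x coordinates run up to maxx1 (split values not preserved)
    lines(c( ,maxx1),c( , ))
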
Example: Classification Trees

[Figure: scatter plot of x2 against x1 with the split lines of the fitted classification tree.]

Prediction in the Classification Tree Example

Now, let's classify some new observations using the classification tree that R computed.

    newdata <- data.frame(cbind(x1=c(1,-1),x2=c(3,0)))
    predict.tree(classtree, newdata)

This produces a matrix with one row for each new observation and one column, [,1] and [,2], for each class (the printed values were not preserved). The entries are based on the proportions of the classes within each region.
Two-Sample Cross Tabulation: Application to Poverty and Child Malnutrition in Tanzania Tomoki Fujii and Roy van der Weide December 5, 2008 Abstract We apply small-area estimation to produce cross tabulations
More informationa 13 Notes on Hidden Markov Models Michael I. Jordan University of California at Berkeley Hidden Markov Models The model
Notes on Hidden Markov Models Michael I. Jordan University of California at Berkeley Hidden Markov Models This is a lightly edited version of a chapter in a book being written by Jordan. Since this is
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Evaluation of Models. Niels Landwehr
Universität Potsdam Institut für Informatik ehrstuhl Maschinelles ernen Evaluation of Models Niels andwehr earning and Prediction Classification, Regression: earning problem Input: training data Output:
More informationIntroduction to Population Modeling
Introduction to Population Modeling In addition to estimating the size of a population, it is often beneficial to estimate how the population size changes over time. Ecologists often uses models to create
More informationThe Impact of Shareholder Taxation on Merger and Acquisition Behavior
The Impact of Shareholder Taxation on Merger and Acquisition Behavior Eric Ohrn, Grinnell College Nathan Seegert, University of Utah Grinnell College Department of Economics Seminar November 8, 2016 Introduction
More information4.1 Introduction Estimating a population mean The problem with estimating a population mean with a sample mean: an example...
Chapter 4 Point estimation Contents 4.1 Introduction................................... 2 4.2 Estimating a population mean......................... 2 4.2.1 The problem with estimating a population mean
More informationLogit Models for Binary Data
Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis These models are appropriate when the response
More informationCharacterization of the Optimum
ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 5. Portfolio Allocation with One Riskless, One Risky Asset Characterization of the Optimum Consider a risk-averse, expected-utility-maximizing
More informationGradient Boosting Trees: theory and applications
Gradient Boosting Trees: theory and applications Dmitry Efimov November 05, 2016 Outline Decision trees Boosting Boosting trees Metaparameters and tuning strategies How-to-use remarks Regression tree True
More informationAP Stats: 3B ~ Least Squares Regression and Residuals. Objectives:
Objectives: INTERPRET the slope and y intercept of a least-squares regression line USE the least-squares regression line to predict y for a given x CALCULATE and INTERPRET residuals and their standard
More informationGraduate School of Business, University of Chicago Business 41202, Spring Quarter 2007, Mr. Ruey S. Tsay. Solutions to Final Exam
Graduate School of Business, University of Chicago Business 41202, Spring Quarter 2007, Mr. Ruey S. Tsay Solutions to Final Exam Problem A: (30 pts) Answer briefly the following questions. 1. Suppose that
More informationWage Determinants Analysis by Quantile Regression Tree
Communications of the Korean Statistical Society 2012, Vol. 19, No. 2, 293 301 DOI: http://dx.doi.org/10.5351/ckss.2012.19.2.293 Wage Determinants Analysis by Quantile Regression Tree Youngjae Chang 1,a
More informationChapter 7: Estimation Sections
1 / 40 Chapter 7: Estimation Sections 7.1 Statistical Inference Bayesian Methods: Chapter 7 7.2 Prior and Posterior Distributions 7.3 Conjugate Prior Distributions 7.4 Bayes Estimators Frequentist Methods:
More informationJaime Frade Dr. Niu Interest rate modeling
Interest rate modeling Abstract In this paper, three models were used to forecast short term interest rates for the 3 month LIBOR. Each of the models, regression time series, GARCH, and Cox, Ingersoll,
More informationMAS187/AEF258. University of Newcastle upon Tyne
MAS187/AEF258 University of Newcastle upon Tyne 2005-6 Contents 1 Collecting and Presenting Data 5 1.1 Introduction...................................... 5 1.1.1 Examples...................................
More informationA Hidden Markov Model Approach to Information-Based Trading: Theory and Applications
A Hidden Markov Model Approach to Information-Based Trading: Theory and Applications Online Supplementary Appendix Xiangkang Yin and Jing Zhao La Trobe University Corresponding author, Department of Finance,
More informationEconomics 430 Handout on Rational Expectations: Part I. Review of Statistics: Notation and Definitions
Economics 430 Chris Georges Handout on Rational Expectations: Part I Review of Statistics: Notation and Definitions Consider two random variables X and Y defined over m distinct possible events. Event
More information