A new look at tree based approaches

A new look at tree based approaches
Xifeng Wang, University of North Carolina at Chapel Hill
xifeng@live.unc.edu
April 18, 2018

Outline of this presentation
1. Background and motivation
2. Tree and related learning approaches: pros and cons
3. A new look at tree based approaches
4. Challenges and future directions

Many drugs failed at Phase 3
1. Around 50% of confirmatory phase III trials for new drugs fail, posing great financial burdens on drug developers.
2. Among the failures, around 50% ended in efficacy failure: the trial failed to meet its primary or secondary efficacy endpoints.
3. Around 30% failed due to safety issues, and 20% due to commercial issues.
4. Oncology trials (48% failure rate) failed more often than non-oncology trials (29% failure rate).
Grignolo, A., Pretorius, S. Phase III Trial Failures: Costly, But Preventable. Applied Clinical Trials 2016; 25: 36-42.

Many drugs failed at Phase 3
Main triggers of failure:
- Inadequate basic science
- Flawed study design (e.g., a phase II surrogate endpoint not confirmed by the Phase III clinical outcome)
- Suboptimal dose selection
- Flawed data collection and analysis
- Problems with study operations
- Other ...

Population heterogeneity
Another possible cause: the new drug is effective, but only for a certain sub-population (population heterogeneity).
For approved drugs, the existence of population heterogeneity could imply sub-optimal use of the drug.
This means failed drugs might actually be useful, and approved drugs may be under-valued.

Subgroup analysis: overview
Subgroup analysis for efficacy: partitioning the entire covariate space into subsets of patients that are homogeneous with respect to the treatment effect, which can then be used to evaluate the expected treatment effect (versus control) for patients with a specific set of covariates.
- Subgroups pre-specified: pre-defined subgroups, e.g., age < 50, WBC < 4.3 × 10^9/L, male/female, smoker/non-smoker, ...
- Subgroups learned from data: regression trees and their extensions

Subgroup analysis: tree-based methods
Tree-based methods play an important role in subgroup analysis:
- They belong to the field of statistical learning
- They form an important branch of predictive modelling approaches
- Many approaches have been developed: CART, Random Forests, Ensembles, Gradient Boosting

CART: classification and regression trees
1. Grow an initial tree. At each node there is a rule for splitting the data based on a cutoff of one variable; pre-set stopping rules decide when a branch is terminal and can be split no more. Different tree-learning algorithms exist:
- ID3 (Iterative Dichotomiser 3)
- C4.5 (successor of ID3)
- CART (Classification And Regression Trees)
- CHAID (CHi-squared Automatic Interaction Detector): performs multi-level splits when computing classification trees
- MARS: extends decision trees to handle numerical data better
- ...
2. Prune the tree, to prevent over-fitting:
- Cross-validation
- In practice daunting, somewhat ad hoc, and hard to justify or standardize
- ...
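As a concrete illustration of the grow-then-prune workflow above, here is a minimal sketch using scikit-learn's CART implementation. The data, seed, and choice of pruning parameter are invented for this example (the cross-validated choice of alpha is omitted for brevity):

```python
# Grow a deep CART regression tree, then prune it by cost-complexity.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = 2 * (X[:, 0] <= 0.5) + rng.normal(scale=0.5, size=500)

# 1. Grow an initial tree: split greedily until the stopping rules fire.
deep = DecisionTreeRegressor(random_state=0).fit(X, y)

# 2. Prune: the cost-complexity path gives a nested sequence of subtrees,
#    indexed by alpha; a larger alpha means heavier pruning.
path = deep.cost_complexity_pruning_path(X, y)
pruned = DecisionTreeRegressor(ccp_alpha=path.ccp_alphas[-2],
                               random_state=0).fit(X, y)
print(deep.get_n_leaves(), pruned.get_n_leaves())
```

In practice one would pick the alpha by cross-validation rather than taking a fixed entry of the path, which is exactly the step the slide flags as hard to standardize.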

IT: Interaction Trees
A data-driven tree procedure, labelled interaction trees (IT), that explores the heterogeneity structure of the treatment effect across a number of subgroups defined objectively in a post hoc manner.
Su, X., Tsai, C. L., Wang, H., Nickerson, D. M., & Li, B. (2009). Subgroup analysis via recursive partitioning. Journal of Machine Learning Research, 10(Feb), 141-158.

SIDES: subgroup identification based on differential effect search
A novel recursive partitioning procedure that allows direct evaluation of the treatment effect in subgroups and is particularly tuned to evaluating modest-sized data sets, as well as large databases, from randomized clinical trials or other health-outcomes databases.
Lipkovich, I., Dmitrienko, A., Denne, J., & Enas, G. (2011). Subgroup identification based on differential effect search: a recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine, 30(21), 2601-2621.

Tree-based approaches: pros and cons
Pros:
- Heuristic ideas
- Simplicity of results; easy to explain
- Nonparametric and nonlinear
Cons:
- Exploratory by nature; difficult to do inference, e.g., Type 1 error control
- Stand-alone, with an unclear connection to classical models

A decision tree is a regression model
1. Growing the initial tree = rule/feature generation
2. Pruning = model (variable) selection
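The identification above can be checked directly: a fitted tree's predictions coincide with a least-squares fit on its leaf-membership indicators. A minimal sketch (the data, seed, and tree depth are my own choices, purely for illustration):

```python
# A fitted regression tree is a linear model in its leaf indicators:
# regressing y on the 0/1 leaf-membership dummies reproduces the tree's
# predictions exactly (each dummy's coefficient is the leaf mean of y).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 2))
y = 2 * (X[:, 0] <= 0.5) + rng.normal(scale=0.1, size=200)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
leaf_id = tree.apply(X)                               # leaf index per row
D = (leaf_id[:, None] == np.unique(leaf_id)[None, :]).astype(float)
beta, *_ = np.linalg.lstsq(D, y, rcond=None)          # one coef per leaf
print(np.allclose(D @ beta, tree.predict(X)))         # True
```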

A decision tree is a regression model: a hypothetical example
True model:
y = β0 + β1 Trt + β2 I(x1 <= 0.5) + β3 I(x2 <= 0.5) + β4 I(x1 <= 0.5) Trt + β5 I(x2 <= 0.5) Trt + ε
with (β0, β1, β2, β3, β4, β5) = (2, 2, 2, 2, 2, 2). Here we assume the error term ε comes from the standard normal distribution N(0, 1).
We generated n = 1000 observations from the above model, but with 8 noise variables x3, ..., x10 added. And x1 - x10 are simulated from a discrete uniform distribution over {0.02, 0.04, ..., 1.00}.
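The data-generating step described above can be sketched as follows (the random seed and variable names are my own choices; the talk does not specify them):

```python
# Simulate n = 1000 observations from the hypothetical true model:
# y = 2 + 2*Trt + 2*I(x1<=0.5) + 2*I(x2<=0.5)
#       + 2*I(x1<=0.5)*Trt + 2*I(x2<=0.5)*Trt + eps,  eps ~ N(0, 1),
# with 8 noise variables x3..x10 and all x's on the grid 0.02, ..., 1.00.
import numpy as np

rng = np.random.default_rng(2018)
n = 1000
grid = np.arange(1, 51) * 0.02            # 0.02, 0.04, ..., 1.00
X = rng.choice(grid, size=(n, 10))        # x1, x2 carry signal; x3..x10 are noise
trt = rng.integers(0, 2, size=n)          # binary treatment indicator
z1 = (X[:, 0] <= 0.5).astype(float)
z2 = (X[:, 1] <= 0.5).astype(float)
y = (2 + 2 * trt + 2 * z1 + 2 * z2
     + 2 * z1 * trt + 2 * z2 * trt + rng.normal(size=n))
```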

Estimated Tree by CART

[Tree diagram: the fitted CART first splits on trt < 0.5, then repeatedly on x1 and x2 at cutoffs near 0.5; the eight leaf means range from 1.999 (n = 128) to 12.1 (n = 144).]

Estimated model: new approach
1. Generate features using the initial tree from CART (without pruning)
2. Model (variable) selection using LASSO + BIC

Estimated model:

Term                                     True Coef   Esti. Coef
Intercept                                2.0         2.12
Trt                                      2.0         2.06
I(x1 <= 0.483)                           2.0         1.79
I(x2 <= 0.505)                           2.0         1.98
Trt * I(x1 <= 0.483)                     2.0         1.96
Trt * I(x2 <= 0.505)                     2.0         1.91
I(x1 <= 0.483) * I(x2 <= 0.505)          0           0.01
Trt * I(x1 <= 0.483) * I(x2 <= 0.505)    0           0.24
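The two-step pipeline can be sketched as follows, using an unpruned tree's split points as rule generators and scikit-learn's LassoLarsIC for the LASSO + BIC step. This is my own minimal rendering of the idea, with invented seed, depth, and feature construction, not the author's code:

```python
# Step 1: grow a tree on the covariates and harvest its split rules.
# Step 2: turn rules (and rule-by-treatment interactions) into 0/1
#         features and select a sparse sub-model by LASSO scored by BIC.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LassoLarsIC

rng = np.random.default_rng(0)
n = 1000
X = rng.choice(np.arange(1, 51) * 0.02, size=(n, 10))
trt = rng.integers(0, 2, size=n).astype(float)
z1, z2 = X[:, 0] <= 0.5, X[:, 1] <= 0.5
y = (2 + 2 * trt + 2 * z1 + 2 * z2
     + 2 * trt * z1 + 2 * trt * z2 + rng.normal(size=n))

# Step 1: each internal node contributes a (variable, cutoff) rule.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
t = tree.tree_
rules = sorted({(int(f), float(t.threshold[i]))
                for i, f in enumerate(t.feature) if f >= 0})

# Step 2: rules and rule-by-treatment interactions become features.
feats = [X[:, f] <= thr for f, thr in rules]
feats += [trt * (X[:, f] <= thr) for f, thr in rules]
F = np.column_stack([trt] + feats).astype(float)

# LASSO path scored by BIC keeps a sparse subset of the generated rules.
model = LassoLarsIC(criterion="bic").fit(F, y)
selected = np.flatnonzero(model.coef_)     # indices of retained features
```

Because the covariates live on a discrete grid, the harvested cutoffs land near the true 0.5 splits, so the selected linear model recovers the treatment-by-subgroup structure directly.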

A decision tree is a regression model: proof of concept
We apply a model selection approach (e.g., LASSO) to the generated rules and compare the performance of the selected model with pruned trees in terms of:
1. Variable selection performance: sensitivity, specificity
- HITS: the frequency with which the final model splits on X1 and X2, and only on them
- Less-true: how many true terms the model misses
- More-true: how many extra terms the model includes
2. Model estimation performance:
- Mean squared error (MSE) of the coefficient estimates
- Prediction error of the response, based on 300-observation test data

Simulation Results

                  CART    New Approach
HITS              100%    100%
Prediction Error  96.76   94.60
MSE               NA      0.437
Less-True=0       NA      99%
More-True=0       NA      36%
More-True=1       NA      57%
More-True>=2      NA      7%

Table: Simulation results for the regression tree, based on 100 simulation runs

Special trees: IT
We generate data from two models. Each data set consists of a continuous response Y, a binary treatment, and four covariates X1 - X4 simulated from a discrete uniform distribution over {0.02, 0.04, ..., 1.00}. However, only a subset of the covariates interacts with the treatment.

Model 1: Y = 2 + 2 trt + 2 Z1 + 2 Z2 + ε
Model 2: Y = 2 + 2 trt + 2 Z1 + 2 Z2 + 2 trt Z1 + 2 trt Z2 + ε

Here we assume the error term ε comes from the standard normal distribution N(0, 1), with Z1 = I(X1 <= 0.5) and Z2 = I(X2 <= 0.5).

Estimated model: IT

[Tree diagram: the final IT structure splits on x1 <= 0.5 and then on x2 <= 0.5 within each branch, giving four subgroups with n = 152, 147, 148, 153.]

Estimated model: new approach
1. Generate features using a single boosted tree
2. Model (variable) selection using LASSO + BIC

Estimated model (600 observations):

Term                                     True Coef   Esti. Coef
Trt                                      2.0         2.06
Trt * I(x1 <= 0.483)                     2.0         1.96
Trt * I(x2 <= 0.505)                     2.0         1.91
Trt * I(x1 <= 0.483) * I(x2 <= 0.505)    0           0.24

Simulation Results

          Hits (IT)   Hits    MSE    Less-True=0
Model 1   98.5%       95.5%   0.20   100%
Model 2   98.5%       98%     0.29   100%

          More-True=0   More-True=1   More-True>=2
Model 1   19.5%         45.5%         35%
Model 2   68.5%         27.5%         4%

Table: Simulation results for IT, based on 200 simulation runs

Why bother?
Why look at tree-based approaches differently?
- Regression-model theory and methodology are very mature, so it is easy to borrow strength
- Variable (model) selection has been well studied
- Inference (e.g., significance, Type 1 error control) becomes possible
- Many extensions are possible, e.g., hidden (latent) heterogeneity

Challenges and future directions
- Error-in-variables: the estimated rules/features are NOT the TRUE rules/features. Does this cause an error-in-variables problem?
- Inference: post-learning inference, or post-selection inference in the presence of error-in-variables
- Feature generation (feature learning): more flexible approaches