A new look at tree-based approaches

Xifeng Wang, University of North Carolina at Chapel Hill, April 18, 2018

Outline of this presentation
1. Background and motivation
2. Tree and related learning approaches: pros and cons
3. A new look at tree-based approaches
4. Challenges and future directions

Section 1: Background and motivation

Many drugs fail in Phase 3
1. Around 50% of confirmatory Phase III trials of new drugs fail, which places a heavy financial burden on drug developers.
2. Around 50% of these failures are efficacy failures: the trial fails to meet its primary or secondary efficacy endpoints.
3. Around 30% fail because of safety issues, and around 20% because of commercial issues.
4. Oncology trials fail more often (48% failure rate) than non-oncology trials (29% failure rate).
Grignolo A, Pretorius S. Phase III Trial Failures: Costly, But Preventable. Applied Clinical Trials 2016; 25:

Many drugs fail in Phase 3 (continued)
Main causes of failure: inadequate basic science; flawed study design (e.g., a Phase II surrogate endpoint that is not confirmed by the Phase III clinical outcome); suboptimal dose selection; flawed data collection and analysis; problems with study operations; and other causes.

Population heterogeneity
Another possible cause of failure is population heterogeneity: the new drug is effective, but only for a certain subpopulation. For approved drugs, population heterogeneity could imply suboptimal use. In other words, failed drugs might actually be useful, and approved drugs may be undervalued.

Subgroup analysis: overview
Subgroup analysis for efficacy partitions the entire covariate space into subsets of patients that are homogeneous with respect to the treatment effect. Each subset can then be used to evaluate the expected treatment effect (versus control) for patients with a specific set of covariates.
Pre-specified subgroups: subgroups defined in advance, e.g., age < 50, WBC < /L, male/female, smoker/non-smoker, ...
Subgroups learned from data: regression trees and their extensions.
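To make "expected treatment effect (versus control) within a subgroup" concrete, here is a minimal Python sketch. It assumes a randomized trial with a continuous outcome and estimates the effect in each pre-specified subgroup as the difference in mean outcome between treated and control patients. The record keys (`age`, `trt`, `y`), the function name `subgroup_effects`, and the toy data are all hypothetical.

```python
from statistics import mean

def subgroup_effects(records, subgroups):
    """Treated-minus-control difference in mean outcome within each subgroup.

    records   -- list of dicts with hypothetical keys 'trt' (0/1) and 'y' (outcome),
                 plus whatever covariates the subgroup rules use
    subgroups -- dict mapping a subgroup label to a predicate on one record
    """
    effects = {}
    for label, rule in subgroups.items():
        members = [r for r in records if rule(r)]
        treated = [r["y"] for r in members if r["trt"] == 1]
        control = [r["y"] for r in members if r["trt"] == 0]
        # The effect is undefined if either arm is empty in this subgroup.
        effects[label] = mean(treated) - mean(control) if treated and control else None
    return effects

# Toy data (hypothetical): the treatment helps only patients younger than 50.
records = [
    {"age": 40, "trt": 1, "y": 5.0}, {"age": 45, "trt": 0, "y": 2.0},
    {"age": 42, "trt": 1, "y": 6.0}, {"age": 48, "trt": 0, "y": 3.0},
    {"age": 60, "trt": 1, "y": 2.0}, {"age": 65, "trt": 0, "y": 2.0},
    {"age": 70, "trt": 1, "y": 3.0}, {"age": 62, "trt": 0, "y": 3.0},
]
subgroups = {
    "age < 50": lambda r: r["age"] < 50,
    "age >= 50": lambda r: r["age"] >= 50,
}
print(subgroup_effects(records, subgroups))  # {'age < 50': 3.0, 'age >= 50': 0.0}
```

With pre-specified subgroups the rules are fixed in advance. Tree-based methods instead learn such rules from the data.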

Subgroup analysis: tree-based methods
Tree-based methods play an important role in subgroup analysis. They belong to the field of statistical learning and form an important branch of predictive modelling. Several approaches have been developed, including CART, random forests, other ensemble methods, and gradient boosting.

Section 2: Tree and related learning approaches: pros and cons

CART: classification and regression trees
1. Grow an initial tree. Each node has a rule that splits the data at a cutoff of one variable, and pre-set stopping rules decide when a branch is terminal and cannot be split further. Tree-learning algorithms include ID3 (Iterative Dichotomiser 3); C4.5 (the successor of ID3); CART (Classification And Regression Tree); CHAID (CHi-squared Automatic Interaction Detector), which performs multi-level splits when computing classification trees; and MARS, which extends decision trees to handle numerical data better.
2. Prune the tree to prevent over-fitting, usually by cross-validation. Pruning is extremely daunting, has little theoretical grounding, and is hard to justify or standardize.
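The growing step can be illustrated with a minimal, dependency-free Python sketch of a regression tree. At each node it searches every variable and cutoff for the split that minimizes the total within-node sum of squares. It stops at a maximum depth, at a minimum leaf size, or when no split improves the fit. The tree format and the parameter names `max_depth` and `min_leaf` are our own illustrative choices of stopping rules, and pruning by cross-validation is not shown.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y, min_leaf=5):
    """Return (total_sse, variable, cutoff) for the best split x_j <= cutoff, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate cutoffs: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            c = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= c]
            right = [yi for row, yi in zip(X, y) if row[j] > c]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # stopping rule: both children need min_leaf observations
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, c)
    return best

def grow(X, y, depth=0, max_depth=3, min_leaf=5):
    """Grow a tree recursively; a leaf stores its mean response and size."""
    split = best_split(X, y, min_leaf) if depth < max_depth else None
    if split is None or split[0] >= sse(y):  # no admissible or improving split
        return {"leaf": sum(y) / len(y), "n": len(y)}
    _, j, c = split
    left = [i for i, row in enumerate(X) if row[j] <= c]
    right = [i for i, row in enumerate(X) if row[j] > c]
    return {
        "var": j,
        "cut": c,
        "left": grow([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_leaf),
        "right": grow([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_leaf),
    }

# Toy data: the response depends only on whether x0 <= 0.5; x1 is noise.
X = [[k / 50, ((7 * k) % 50) / 50] for k in range(1, 51)]
y = [2.0 if row[0] <= 0.5 else 0.0 for row in X]
print(grow(X, y))
```

Here the root splits on column 0 between 0.50 and 0.52, and both children are pure, so the tree stops growing.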

IT: interaction trees
Interaction trees (IT) are a data-driven tree procedure for exploring how the treatment effect varies across subgroups that are defined objectively, in a post hoc manner.
Su, X., Tsai, C. L., Wang, H., Nickerson, D. M., & Li, B. (2009). Subgroup analysis via recursive partitioning. Journal of Machine Learning Research, 10(Feb),
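As we understand the cited paper, IT splits each node on the candidate split that maximizes a squared t statistic for the treatment-by-split interaction. That statistic is the difference between the treatment effects in the two child nodes, divided by its standard error from a pooled variance. The sketch below computes the statistic for a single candidate split. The function name and input layout (`y`, `trt` coded 0/1, boolean `in_left`) are our own assumptions, and the rest of the procedure (split search, pruning, tree selection) is not shown.

```python
from math import sqrt
from statistics import mean

def it_split_stat(y, trt, in_left):
    """Squared t statistic for the interaction between treatment and a split.

    t = [(mean_T,L - mean_C,L) - (mean_T,R - mean_C,R)]
        / (sigma * sqrt(sum over the four cells of 1/n_cell)),
    where sigma^2 is the pooled within-cell variance with n - 4 degrees of freedom.
    """
    cells = {}
    for yi, ti, li in zip(y, trt, in_left):
        cells.setdefault((li, ti), []).append(yi)
    if len(cells) < 4 or any(len(v) < 2 for v in cells.values()):
        return None  # every treatment-by-node cell needs at least two observations
    means = {k: mean(v) for k, v in cells.items()}
    ss = sum(sum((yi - means[k]) ** 2 for yi in v) for k, v in cells.items())
    n = sum(len(v) for v in cells.values())
    sigma2 = ss / (n - 4)
    if sigma2 == 0:
        return None  # no within-cell variation: statistic undefined
    diff = (means[(True, 1)] - means[(True, 0)]) - (means[(False, 1)] - means[(False, 0)])
    se = sqrt(sigma2 * sum(1 / len(v) for v in cells.values()))
    return (diff / se) ** 2

# Treatment effect is 4 in the left child and 0 in the right child.
y = [5, 7, 1, 3, 3, 5, 3, 5]
trt = [1, 1, 0, 0, 1, 1, 0, 0]
in_left = [True] * 4 + [False] * 4
print(it_split_stat(y, trt, in_left))  # 4.0
```

A split whose children share the same treatment effect scores 0, however well it separates the outcome itself. That is the difference from the CART criterion.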

SIDES: subgroup identification based on differential effect search
SIDES is a recursive partitioning procedure that allows direct evaluation of the treatment effect in subgroups. It is particularly tuned to evaluating modest-sized data sets, as well as large databases, from randomized clinical trials or other health outcomes databases.
Lipkovich, I., Dmitrienko, A., Denne, J., & Enas, G. (2011). Subgroup identification based on differential effect search: a recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine, 30(21),
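Lipkovich et al. describe several splitting criteria. As we recall it, one is a "differential effect" criterion that compares the treatment-effect z statistics of the two child subgroups through 2[1 − Φ(|Z_L − Z_R|/√2)], where smaller values favour the split. The sketch below assumes that form and pairs it with a simple Welch-type z statistic for a continuous outcome. Both the choice of test statistic and the function names are our assumptions, and the full SIDES search (several child subgroups per node, subgroup selection, multiplicity control) is not shown.

```python
from math import erf, sqrt
from statistics import mean, variance

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def effect_z(treated, control):
    """Welch-type z statistic for the treated-minus-control mean difference
    (an assumed choice of test statistic for a continuous outcome)."""
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se

def differential_p(z_left, z_right):
    """Differential-effect criterion 2 * (1 - Phi(|Z_L - Z_R| / sqrt(2))).
    Smaller values mean the two child subgroups respond more differently."""
    return 2 * (1 - normal_cdf(abs(z_left - z_right) / sqrt(2)))

# Children of one candidate split: the treatment helps on the left, not on the right.
z_l = effect_z([5.0, 7.0, 6.0, 8.0], [1.0, 3.0, 2.0, 4.0])
z_r = effect_z([3.0, 5.0, 4.0, 6.0], [3.0, 5.0, 4.0, 6.0])
print(z_l, z_r, differential_p(z_l, z_r))
```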

Tree-based approaches: pros and cons
Pros: heuristic ideas; simple results that are easy to explain; nonparametric and nonlinear.
Cons: exploratory by nature, so inference (e.g., Type 1 error control) is difficult; stand-alone, with an unclear connection to classical models.

Section 3: A new look at tree-based approaches

A decision tree is a regression model
1. Growing the initial tree = rule (feature) generation.
2. Pruning = model (variable) selection.
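To make "growing = rule generation" concrete, the sketch below reads a tree as a set of 0/1 indicator features. Every non-root node contributes the product of the split conditions on its path, and a linear model over these columns can then be selected (the "pruning" step). The nested-dict tree format, the helper names, and the example tree are hypothetical. Collecting one feature per node is one plausible construction, not necessarily the author's.

```python
def node_rules(tree, path=()):
    """Collect the path rule of every non-root node (internal nodes and leaves).

    A rule is a tuple of (variable index, cutoff, goes_left) conditions, where
    goes_left=True means x_j <= cutoff and False means x_j > cutoff.
    """
    rules = [path] if path else []
    if "leaf" not in tree:
        j, c = tree["var"], tree["cut"]
        rules += node_rules(tree["left"], path + ((j, c, True),))
        rules += node_rules(tree["right"], path + ((j, c, False),))
    return rules

def rule_feature(rule, row):
    """0/1 indicator: the product of the split conditions along the rule's path."""
    return int(all((row[j] <= c) == goes_left for j, c, goes_left in rule))

def design_matrix(tree, X):
    """One column per node rule, one row per observation."""
    rules = node_rules(tree)
    return [[rule_feature(r, row) for r in rules] for row in X]

def describe(rule, names):
    """Readable form of a rule, e.g. 'I(trt > 0.5) * I(x1 <= 0.5)'."""
    return " * ".join(f"I({names[j]} {'<=' if goes_left else '>'} {c})"
                      for j, c, goes_left in rule)

# Hypothetical tree: split on trt (column 0), then on x1 (column 1) for treated patients.
tree = {"var": 0, "cut": 0.5,
        "left": {"leaf": 2.0},
        "right": {"var": 1, "cut": 0.5,
                  "left": {"leaf": 6.0}, "right": {"leaf": 4.0}}}
for r in node_rules(tree):
    print(describe(r, ["trt", "x1"]))
print(design_matrix(tree, [[1, 0.3], [0, 0.9]]))  # [[0, 1, 1, 0], [1, 0, 0, 0]]
```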

A decision tree is a regression model: a hypothetical example
True model:
$$y = \beta_0 + \beta_1\,\mathrm{Trt} + \beta_2\, I(x_1 \le 0.5) + \beta_3\, I(x_2 \le 0.5) + \beta_4\, I(x_1 \le 0.5)\,\mathrm{Trt} + \beta_5\, I(x_2 \le 0.5)\,\mathrm{Trt} + \varepsilon,$$
with $(\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5) = (2, 2, 2, 2, 2)$. Here we assume that the error term $\varepsilon$ follows the standard normal distribution $N(0, 1)$. We generated n = 1000 observations from this model and added 8 noise variables $x_3, \ldots, x_{10}$. The covariates $x_1, \ldots, x_{10}$ are simulated from a discrete uniform distribution over {0.02, 0.04, ..., 1.00}.
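The data-generating step can be sketched in a few lines of Python. This version makes two assumptions that are ours, not the slide's: Trt is drawn as Bernoulli(0.5), and the default `beta` of six 2s is a placeholder value. The function and parameter names are hypothetical.

```python
import random

GRID = [k / 50 for k in range(1, 51)]  # discrete uniform support {0.02, 0.04, ..., 1.00}

def simulate(n=1000, beta=(2, 2, 2, 2, 2, 2), p=10, seed=1):
    """Draw n observations from the hypothetical true model.

    beta -- (b0, ..., b5); the all-2 default is a placeholder assumption.
    Trt ~ Bernoulli(0.5) is also an assumption. Only x1 and x2 (columns 0 and 1)
    enter the mean; x3..x10 are pure noise covariates.
    """
    rng = random.Random(seed)
    b0, b1, b2, b3, b4, b5 = beta
    data = []
    for _ in range(n):
        x = [rng.choice(GRID) for _ in range(p)]
        trt = rng.randint(0, 1)
        z1, z2 = int(x[0] <= 0.5), int(x[1] <= 0.5)
        mu = b0 + b1 * trt + b2 * z1 + b3 * z2 + b4 * z1 * trt + b5 * z2 * trt
        data.append({"trt": trt, "x": x, "y": mu + rng.gauss(0, 1)})  # N(0, 1) error
    return data

data = simulate()
print(len(data), data[0])
```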

Estimated tree by CART
[Figure: the estimated CART tree. The root splits on trt < 0.5, with further splits on x2 >= 0.505 and x1 >= 0.505.]

Estimated model: new approach
1. Generate features using the initial tree from CART (without pruning).
2. Perform model (variable) selection using LASSO + BIC.
Estimated model (table comparing each estimated term's true and estimated coefficients): intercept; $\mathrm{Trt}$; $I(x_1 \le 0.483)$; $I(x_2 \le 0.505)$; $\mathrm{Trt}\cdot I(x_1 \le 0.483)$; $\mathrm{Trt}\cdot I(x_2 \le 0.505)$; $I(x_1 \le 0.483)\cdot I(x_2 \le 0.505)$; $\mathrm{Trt}\cdot I(x_1 \le 0.483)\cdot I(x_2 \le 0.505)$.
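The selection step can be sketched as follows, assuming the rule features have already been generated as 0/1 columns. The code is a plain cyclic coordinate-descent LASSO on centred data, with the tuning parameter chosen by BIC = n·log(RSS/n) + df·log(n), where df is the number of nonzero coefficients (a common convention). The function names, the λ grid, and this BIC form are our assumptions, not the author's exact settings. In practice a library routine such as scikit-learn's `LassoLarsIC(criterion="bic")` would normally be used instead.

```python
import math
import random

def lasso_cd(X, y, lam, b=None, max_iter=1000, tol=1e-8):
    """Minimize (1/(2n))*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.
    X and y are assumed centred (no intercept); b is an optional warm start."""
    n, p = len(X), len(X[0])
    b = list(b) if b is not None else [0.0] * p
    r = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]  # residuals
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_ss[j] == 0:
                continue  # a constant column carries no information
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_ss[j] * b[j]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / col_ss[j]  # soft-threshold
            delta = new - b[j]
            if delta != 0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

def lasso_bic(X, y, n_lambdas=30):
    """Fit the lasso on a decreasing lambda grid and keep the minimum-BIC fit.
    Returns (intercept, coefficients, lambda)."""
    n, p = len(X), len(X[0])
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]
    yc = [v - ym for v in y]
    # Smallest lambda at which every coefficient is exactly zero.
    lam_max = max(abs(sum(Xc[i][j] * yc[i] for i in range(n))) / n for j in range(p))
    best, b = None, None
    for k in range(n_lambdas):
        lam = lam_max * 0.001 ** (k / (n_lambdas - 1))  # from lam_max down to lam_max / 1000
        b = lasso_cd(Xc, yc, lam, b)  # warm start from the previous solution
        rss = sum((yc[i] - sum(Xc[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))
        df = sum(1 for c in b if c != 0)
        bic = n * math.log(max(rss, 1e-12) / n) + df * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, list(b), lam)
    _, coef, lam = best
    intercept = ym - sum(xm[j] * coef[j] for j in range(p))
    return intercept, coef, lam

# Toy usage: five 0/1 features, only the first one affects the response.
rng = random.Random(0)
X = [[rng.randint(0, 1) for _ in range(5)] for _ in range(100)]
y = [1 + 3 * row[0] + rng.gauss(0, 0.5) for row in X]
intercept, coef, lam = lasso_bic(X, y)
print(round(intercept, 2), [round(c, 2) for c in coef])
```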

A decision tree is a regression model: proof of concept
We apply a model selection approach (e.g., LASSO) to the generated rules and compare the performance of the selected model with that of pruned trees in terms of:
1. Variable selection performance (sensitivity, specificity):
   HITS: how often the final model is split by X1 and X2 and only by them;
   Less-true: the number of true terms missing from the model;
   More-true: the number of extra terms included in the model.
2. Model estimation performance: the mean squared error (MSE) of the coefficient estimates, and the prediction error of the response on a 300-observation test set.
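These summaries are simple counts over simulation runs. The sketch below computes them for one fitted model and then tabulates them across runs in the format of the result tables. It assumes terms are stored as tuples of factor names (e.g. `("trt", "x1")` for Trt·I(x1 ≤ c)) and that covariate names start with `"x"`. This representation and the function names are hypothetical, and sensitivity and specificity are omitted.

```python
def covariates(term):
    """Covariates used by a term stored as a tuple of factor names, e.g. ("trt", "x1")."""
    return {f for f in term if f.startswith("x")}

def selection_metrics(selected, true_terms, true_vars):
    """Per-run summaries: HITS indicator, number of missed and extra terms."""
    used = set().union(*(covariates(t) for t in selected))
    return {
        "hit": used == set(true_vars),            # split by the true covariates and only by them
        "less_true": len(true_terms - selected),  # true terms the model misses
        "more_true": len(selected - true_terms),  # extra terms the model includes
    }

def summarize(runs):
    """Frequencies over simulation runs, as reported in the result tables."""
    m = len(runs)
    return {
        "HITS": sum(r["hit"] for r in runs) / m,
        "Less-True=0": sum(r["less_true"] == 0 for r in runs) / m,
        "More-True=0": sum(r["more_true"] == 0 for r in runs) / m,
        "More-True=1": sum(r["more_true"] == 1 for r in runs) / m,
        "More-True>=2": sum(r["more_true"] >= 2 for r in runs) / m,
    }

def prediction_error(y_test, y_pred):
    """Mean squared prediction error on a held-out test set."""
    return sum((a - b) ** 2 for a, b in zip(y_test, y_pred)) / len(y_test)

# Two hypothetical runs: one adds a spurious x1*x2 term, one picks up the noise variable x3.
true_terms = {("trt",), ("x1",), ("x2",), ("trt", "x1"), ("trt", "x2")}
runs = [
    selection_metrics(true_terms | {("x1", "x2")}, true_terms, {"x1", "x2"}),
    selection_metrics({("trt",), ("x1",), ("x3",)}, true_terms, {"x1", "x2"}),
]
print(summarize(runs))
```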

Simulation results
                    CART    New approach
HITS                100%    100%
Prediction error
MSE                 NA
Less-True = 0       NA      99%
More-True = 0       NA      36%
More-True = 1       NA      57%
More-True >= 2      NA      7%
Table: Simulation results for the regression tree, based on 100 simulation runs

Special trees: IT
We generate data from two models. Each data set consists of a continuous response Y, a binary treatment indicator trt, and four covariates $X_1, \ldots, X_4$ simulated from a discrete uniform distribution over {0.02, 0.04, ..., 1.00}. Only a subset of the covariates interacts with the treatment.
Model 1: $Y = \mathrm{trt} + 2Z_1 + 2Z_2 + \varepsilon$
Model 2: $Y = \mathrm{trt} + 2Z_1 + 2Z_2 + \ldots\,\mathrm{trt}\cdot Z_1 + \ldots\,\mathrm{trt}\cdot Z_2 + \varepsilon$
Here we assume that the error term $\varepsilon$ follows the standard normal distribution $N(0, 1)$, and $Z_1 = I(X_1 \le 0.5)$, $Z_2 = I(X_2 \le 0.5)$.

Estimated model: IT
[Figure: the final IT structure. It splits first on x1 at 0.5 and then on x2 at 0.5 within each branch, giving four terminal nodes with n = 152, 147, 148 and 153.]
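Each terminal node of the IT structure is a subgroup with its own treatment effect. The code below is a sketch built on stated assumptions. It simulates Model 2 with interaction coefficients c1 = c2 = 2 (hypothetical values, not taken from the slide) and assigns treatment by a fair coin (also an assumption). It then estimates the treated-minus-control difference in each of the four leaves defined by x1 ≤ 0.5 and x2 ≤ 0.5. Under these assumptions, the true effect in a leaf is 1 + c1·Z1 + c2·Z2. The function names are hypothetical.

```python
import random
from statistics import mean

GRID = [k / 50 for k in range(1, 51)]  # {0.02, 0.04, ..., 1.00}

def simulate_model2(n=600, c1=2.0, c2=2.0, seed=7):
    """Y = trt + 2*Z1 + 2*Z2 + c1*trt*Z1 + c2*trt*Z2 + N(0, 1) noise.

    c1 and c2 are hypothetical interaction coefficients; trt ~ Bernoulli(0.5)
    is an assumption. X3 and X4 are noise covariates.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.choice(GRID) for _ in range(4)]
        trt = rng.randint(0, 1)
        z1, z2 = int(x[0] <= 0.5), int(x[1] <= 0.5)
        y = trt + 2 * z1 + 2 * z2 + c1 * trt * z1 + c2 * trt * z2 + rng.gauss(0, 1)
        rows.append((x, trt, y))
    return rows

def leaf_effects(rows):
    """(leaf size, treated-minus-control mean) for each leaf (Z1, Z2)."""
    effects = {}
    for z1 in (1, 0):
        for z2 in (1, 0):
            leaf = [(t, y) for x, t, y in rows
                    if int(x[0] <= 0.5) == z1 and int(x[1] <= 0.5) == z2]
            treated = [y for t, y in leaf if t == 1]
            control = [y for t, y in leaf if t == 0]
            effects[(z1, z2)] = (len(leaf), mean(treated) - mean(control))
    return effects

for (z1, z2), (n_leaf, eff) in leaf_effects(simulate_model2()).items():
    print(f"Z1={z1} Z2={z2}: n={n_leaf}, estimated effect={eff:.2f}, true={1 + 2 * z1 + 2 * z2}")
```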

Estimated model: new approach
1. Generate features using a single boosted tree.
2. Perform model (variable) selection using LASSO + BIC.
Estimated model (600 observations; table comparing each estimated term's true and estimated coefficients): $\mathrm{Trt}$; $\mathrm{Trt}\cdot I(x_1 \le 0.483)$; $\mathrm{Trt}\cdot I(x_2 \le 0.505)$; $\mathrm{Trt}\cdot I(x_1 \le 0.483)\cdot I(x_2 \le 0.505)$.

Simulation results
Model      Hits (IT)   Hits    MSE   Less-True = 0
Model 1                95.5%
Model 2                98%
Model      More-True = 0   More-True = 1   More-True >= 2
Model 1                    45.5%           35%
Model 2                    27.5%           4%
Table: Simulation results for IT, based on 200 simulation runs

Why bother?
Why look at tree-based approaches differently?
Regression theory and methodology are very mature, so tree-based methods can borrow strength from them.
Variable (model) selection has been well studied.
Inference (e.g., significance testing and Type 1 error control) becomes possible.
Many extensions are possible, e.g., to hidden (latent) heterogeneity.

Section 4: Challenges and future directions

Challenges and future directions
Errors in variables: the estimated rules (features) are not the true rules (features). Does this cause an errors-in-variables problem?
Inference: post-learning inference, or post-selection inference in the presence of errors in variables.
Feature generation (feature learning): more flexible approaches.
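One way to see the errors-in-variables concern concretely: in the hypothetical example, the covariates take values on the grid {0.02, ..., 1.00}. An estimated cutoff therefore differs from the true rule I(x ≤ 0.5) only at the grid points that fall between the two cutoffs. The sketch below uses a hypothetical helper `disagreement` to compute that mislabelling rate for the estimated cutoffs reported earlier (0.483 and 0.505). It shows that I(x1 ≤ 0.483) mislabels observations with x1 = 0.50 (2% of the support), while I(x2 ≤ 0.505) coincides with the true rule on this grid.

```python
GRID = [k / 50 for k in range(1, 51)]  # support of each covariate: {0.02, 0.04, ..., 1.00}

def disagreement(true_cut, est_cut, grid=GRID):
    """Fraction of grid points where I(x <= est_cut) differs from I(x <= true_cut).

    With x uniform on the grid, this is the probability that an observation
    lands on the wrong side of the true rule.
    """
    return sum((x <= true_cut) != (x <= est_cut) for x in grid) / len(grid)

print(disagreement(0.5, 0.483))  # 0.02: only x = 0.50 is mislabelled
print(disagreement(0.5, 0.505))  # 0.0: no grid point lies between 0.5 and 0.505
```

On continuous covariates the mismatch would instead be the probability mass between the two cutoffs. In either case, the fitted feature is a noisy version of the true one.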


Prior knowledge in economic applications of data mining

Prior knowledge in economic applications of data mining Prior knowledge in economic applications of data mining A.J. Feelders Tilburg University Faculty of Economics Department of Information Management PO Box 90153 5000 LE Tilburg, The Netherlands A.J.Feelders@kub.nl

More information

UGM Crash Course: Conditional Inference and Cutset Conditioning

UGM Crash Course: Conditional Inference and Cutset Conditioning UGM Crash Course: Conditional Inference and Cutset Conditioning Julie Nutini August 19 th, 2015 1 / 25 Conditional UGM 2 / 25 We know the value of one or more random variables i.e., we have observations,

More information

Boosting Actuarial Regression Models in R

Boosting Actuarial Regression Models in R Carryl Oberson Faculty of Business and Economics University of Basel R in Insurance 2015 Build regression models (GLMs) for car insurance data. 3 types of response variables: claim incidence: y i = 0,

More information

Deviations from Optimal Corporate Cash Holdings and the Valuation from a Shareholder s Perspective

Deviations from Optimal Corporate Cash Holdings and the Valuation from a Shareholder s Perspective Deviations from Optimal Corporate Cash Holdings and the Valuation from a Shareholder s Perspective Zhenxu Tong * University of Exeter Abstract The tradeoff theory of corporate cash holdings predicts that

More information

Adaptive Experiments for Policy Choice. March 8, 2019

Adaptive Experiments for Policy Choice. March 8, 2019 Adaptive Experiments for Policy Choice Maximilian Kasy Anja Sautmann March 8, 2019 Introduction The goal of many experiments is to inform policy choices: 1. Job search assistance for refugees: Treatments:

More information

State Switching in US Equity Index Returns based on SETAR Model with Kalman Filter Tracking

State Switching in US Equity Index Returns based on SETAR Model with Kalman Filter Tracking State Switching in US Equity Index Returns based on SETAR Model with Kalman Filter Tracking Timothy Little, Xiao-Ping Zhang Dept. of Electrical and Computer Engineering Ryerson University 350 Victoria

More information

Implementing Personalized Medicine: Estimating Optimal Treatment Regimes

Implementing Personalized Medicine: Estimating Optimal Treatment Regimes Implementing Personalized Medicine: Estimating Optimal Treatment Regimes Baqun Zhang, Phillip Schulte, Anastasios Tsiatis, Eric Laber, and Marie Davidian Department of Statistics North Carolina State University

More information

A DECISION SUPPORT SYSTEM FOR HANDLING RISK MANAGEMENT IN CUSTOMER TRANSACTION

A DECISION SUPPORT SYSTEM FOR HANDLING RISK MANAGEMENT IN CUSTOMER TRANSACTION A DECISION SUPPORT SYSTEM FOR HANDLING RISK MANAGEMENT IN CUSTOMER TRANSACTION K. Valarmathi Software Engineering, SonaCollege of Technology, Salem, Tamil Nadu valarangel@gmail.com ABSTRACT A decision

More information

The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay. Solutions to Final Exam

The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay. Solutions to Final Exam The University of Chicago, Booth School of Business Business 41202, Spring Quarter 2012, Mr. Ruey S. Tsay Solutions to Final Exam Problem A: (40 points) Answer briefly the following questions. 1. Consider

More information

Decision making in the presence of uncertainty

Decision making in the presence of uncertainty Lecture 19 Decision making in the presence of uncertainty Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Decision-making in the presence of uncertainty Many real-world problems require to choose

More information

Introduction. Not as simple as. Sample Size Calculations. The Three Most Important Components of any Study Are

Introduction. Not as simple as. Sample Size Calculations. The Three Most Important Components of any Study Are Introduction Issues in Sample Size Calculations with Multiple Must-win Comparisons University of Sheffield Introduce the problem Describe some examples of multiple mustwin Give a solution for using bioequivalence

More information

Business Strategies in Credit Rating and the Control of Misclassification Costs in Neural Network Predictions

Business Strategies in Credit Rating and the Control of Misclassification Costs in Neural Network Predictions Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2001 Proceedings Americas Conference on Information Systems (AMCIS) December 2001 Business Strategies in Credit Rating and the Control

More information

FIT OR HIT IN CHOICE MODELS

FIT OR HIT IN CHOICE MODELS FIT OR HIT IN CHOICE MODELS KHALED BOUGHANMI, RAJEEV KOHLI, AND KAMEL JEDIDI Abstract. The predictive validity of a choice model is often assessed by its hit rate. We examine and illustrate conditions

More information

A Genetic Algorithm improving tariff variables reclassification for risk segmentation in Motor Third Party Liability Insurance.

A Genetic Algorithm improving tariff variables reclassification for risk segmentation in Motor Third Party Liability Insurance. A Genetic Algorithm improving tariff variables reclassification for risk segmentation in Motor Third Party Liability Insurance. Alberto Busetto, Andrea Costa RAS Insurance, Italy SAS European Users Group

More information

Volatility Models and Their Applications

Volatility Models and Their Applications HANDBOOK OF Volatility Models and Their Applications Edited by Luc BAUWENS CHRISTIAN HAFNER SEBASTIEN LAURENT WILEY A John Wiley & Sons, Inc., Publication PREFACE CONTRIBUTORS XVII XIX [JQ VOLATILITY MODELS

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 7, June 13, 2013 This version corrects errors in the October 4,

More information

Modeling Private Firm Default: PFirm

Modeling Private Firm Default: PFirm Modeling Private Firm Default: PFirm Grigoris Karakoulas Business Analytic Solutions May 30 th, 2002 Outline Problem Statement Modelling Approaches Private Firm Data Mining Model Development Model Evaluation

More information

Implementing Models in Quantitative Finance: Methods and Cases

Implementing Models in Quantitative Finance: Methods and Cases Gianluca Fusai Andrea Roncoroni Implementing Models in Quantitative Finance: Methods and Cases vl Springer Contents Introduction xv Parti Methods 1 Static Monte Carlo 3 1.1 Motivation and Issues 3 1.1.1

More information

Internet Appendix for: Does Going Public Affect Innovation?

Internet Appendix for: Does Going Public Affect Innovation? Internet Appendix for: Does Going Public Affect Innovation? July 3, 2014 I Variable Definitions Innovation Measures 1. Citations - Number of citations a patent receives in its grant year and the following

More information

Handling Imbalanced Data Sets in Insurance Risk Modeling

Handling Imbalanced Data Sets in Insurance Risk Modeling IBM Research Report RC-73, March 0, 000 Handling Imbalanced Data Sets in Insurance Ris Modeling Edwin P. D. Pednault, Barry K. Rosen, and Chidanand Apte IBM T. J. Watson Research Center P.O. Box 8 Yortown

More information

UPDATED IAA EDUCATION SYLLABUS

UPDATED IAA EDUCATION SYLLABUS II. UPDATED IAA EDUCATION SYLLABUS A. Supporting Learning Areas 1. STATISTICS Aim: To enable students to apply core statistical techniques to actuarial applications in insurance, pensions and emerging

More information

A Comparative Study of Various Forecasting Techniques in Predicting. BSE S&P Sensex

A Comparative Study of Various Forecasting Techniques in Predicting. BSE S&P Sensex NavaJyoti, International Journal of Multi-Disciplinary Research Volume 1, Issue 1, August 2016 A Comparative Study of Various Forecasting Techniques in Predicting BSE S&P Sensex Dr. Jahnavi M 1 Assistant

More information

Computational Statistics Handbook with MATLAB

Computational Statistics Handbook with MATLAB «H Computer Science and Data Analysis Series Computational Statistics Handbook with MATLAB Second Edition Wendy L. Martinez The Office of Naval Research Arlington, Virginia, U.S.A. Angel R. Martinez Naval

More information

Subject CS1 Actuarial Statistics 1 Core Principles. Syllabus. for the 2019 exams. 1 June 2018

Subject CS1 Actuarial Statistics 1 Core Principles. Syllabus. for the 2019 exams. 1 June 2018 ` Subject CS1 Actuarial Statistics 1 Core Principles Syllabus for the 2019 exams 1 June 2018 Copyright in this Core Reading is the property of the Institute and Faculty of Actuaries who are the sole distributors.

More information

Predicting the stock price companies using artificial neural networks (ANN) method (Case Study: National Iranian Copper Industries Company)

Predicting the stock price companies using artificial neural networks (ANN) method (Case Study: National Iranian Copper Industries Company) ORIGINAL ARTICLE Received 2 February. 2016 Accepted 6 March. 2016 Vol. 5, Issue 2, 55-61, 2016 Academic Journal of Accounting and Economic Researches ISSN: 2333-0783 (Online) ISSN: 2375-7493 (Print) ajaer.worldofresearches.com

More information

Session 113 PD, Data and Model Actuaries Should be an Expert of Both. Moderator: David L. Snell, ASA, MAAA

Session 113 PD, Data and Model Actuaries Should be an Expert of Both. Moderator: David L. Snell, ASA, MAAA Session 113 PD, Data and Model Actuaries Should be an Expert of Both Moderator: David L. Snell, ASA, MAAA Presenters: Matthias Kullowatz Kenneth Warren Pagington, FSA, CERA, MAAA Qichun (Richard) Xu, FSA

More information

Market Risk Analysis Volume I

Market Risk Analysis Volume I Market Risk Analysis Volume I Quantitative Methods in Finance Carol Alexander John Wiley & Sons, Ltd List of Figures List of Tables List of Examples Foreword Preface to Volume I xiii xvi xvii xix xxiii

More information

Predicting Economic Recession using Data Mining Techniques

Predicting Economic Recession using Data Mining Techniques Predicting Economic Recession using Data Mining Techniques Authors Naveed Ahmed Kartheek Atluri Tapan Patwardhan Meghana Viswanath Predicting Economic Recession using Data Mining Techniques Page 1 Abstract

More information

Machine Learning (CSE 446): Pratical issues: optimization and learning

Machine Learning (CSE 446): Pratical issues: optimization and learning Machine Learning (CSE 446): Pratical issues: optimization and learning John Thickstun guest lecture c 2018 University of Washington cse446-staff@cs.washington.edu 1 / 10 Review 1 / 10 Our running example

More information

How Can Quantitative Behavioral Finance Uncover Trader Motivations?

How Can Quantitative Behavioral Finance Uncover Trader Motivations? How Can Quantitative Behavioral Finance Uncover Trader Motivations? Gunduz Caginalp University of Pittsburgh April 5, 2013 unduz Caginalp University of Pittsburgh () Quantitative Behavioral Finance April

More information

Pattern Recognition Chapter 5: Decision Trees

Pattern Recognition Chapter 5: Decision Trees Pattern Recognition Chapter 5: Decision Trees Asst. Prof. Dr. Chumphol Bunkhumpornpat Department of Computer Science Faculty of Science Chiang Mai University Learning Objectives How decision trees are

More information

Test Volume 12, Number 1. June 2003

Test Volume 12, Number 1. June 2003 Sociedad Española de Estadística e Investigación Operativa Test Volume 12, Number 1. June 2003 Power and Sample Size Calculation for 2x2 Tables under Multinomial Sampling with Random Loss Kung-Jong Lui

More information

Jean Lemaire with Sojung Park and Kili Wang

Jean Lemaire with Sojung Park and Kili Wang Jean Lemaire with Sojung Park and Kili Wang ASTIN (Actuarial Studies in Non-Life Insurance) was founded in New York on October 16, 1957 First ASTIN Colloquium: La Baule, June 11 / 12, 1959 Only topic:

More information

MWSUG Paper AA 04. Claims Analytics. Mei Najim, Gallagher Bassett Services, Rolling Meadows, IL

MWSUG Paper AA 04. Claims Analytics. Mei Najim, Gallagher Bassett Services, Rolling Meadows, IL MWSUG 2017 - Paper AA 04 Claims Analytics Mei Najim, Gallagher Bassett Services, Rolling Meadows, IL ABSTRACT In the Property & Casualty Insurance industry, advanced analytics has increasingly penetrated

More information

Statistical Case Estimation Modelling

Statistical Case Estimation Modelling Statistical Case Estimation Modelling - An Overview of the NSW WorkCover Model Presented by Richard Brookes and Mitchell Prevett Presented to the Institute of Actuaries of Australia Accident Compensation

More information