Applying Monte Carlo Tree Search to Curling AI


Katsuki Ohto 1,a)   Tetsuro Tanaka 2,b)

1 Graduate School of Arts and Sciences, The University of Tokyo
2 Information Technology Center, The University of Tokyo
a) ohto@tanaka.ecc.u-tokyo.ac.jp
b) ktanaka@tanaka.ecc.u-tokyo.ac.jp

Abstract: We propose an action decision method based on Monte Carlo Tree Search for MDPs with a continuous state space. We applied our method to agents for the UEC digital curling system, which was built for studying curling strategy. The experimental results show that our method is effective not only for agents with a simple simulation policy, but also for agents with a hand-crafted, more complex one.

1. Introduction

Curling can be modeled as an MDP whose state and action spaces are both continuous [1][2]. Previous curling AI programs [3][4] decide their shots with Expectimax search. The UEC digital curling system [1] simulates stone movement with the Box2D physics engine *1.

*1 Box2D: A 2D Physics Engine for Games.

2. The UEC Digital Curling System

Our target is the UEC digital curling system [2], on which the GAT (Game AI Tournament) digital-curling competition is held.

3. Related Methods

UCB1 [7] is a standard selection policy for the multi-armed bandit problem, and UCT [8] applies it recursively to tree search; Monte-Carlo tree search of this kind has also been used for stochastic games such as backgammon [9].

3.3 Bandits and tree search over continuous action spaces

For bandit problems whose arms form a continuous space, UCBC [10] and Hierarchical Optimistic Optimization (HOO) [11] partition the arm space and run a UCB-style policy over the partition, and an approach of this family has already been applied to the UEC digital curling system [12]. Yee et al. proposed Kernel Regression UCT [13], which shares playout statistics among nearby actions through a kernel.

3.4 Double Progressive Widening

Double Progressive Widening (DPW) [14] bounds both the number of candidate actions and the number of sampled successor states of each node as a slowly growing function of its visit count, so that UCT can be run even when the action space and the state transitions are continuous. A sketch of the widening rule is given below.
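As a point of comparison, here is a minimal sketch of the progressive-widening rule used by DPW [14]. The constants C and alpha and the function name widen are placeholders for this sketch, not values taken from [14] or from this paper.

```python
import math

def widen(num_children, visits, C=1.0, alpha=0.5):
    """Progressive widening: a node with `visits` visits may hold at most
    ceil(C * visits**alpha) children; return True if one more may be added.
    DPW applies this rule twice, once to candidate actions and once to
    sampled successor states."""
    return num_children < math.ceil(C * max(visits, 1) ** alpha)

# Example: with 100 visits and alpha = 0.5, at most 10 child actions.
print(widen(9, 100))    # True  -> sample a new action
print(widen(10, 100))   # False -> reuse an existing child
```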

4. Proposed Method: MCTS with a State Tree

Fig. 1  A state tree. The root covers the whole normalized region [0, 1); its children cover [0, 0.5) and [0.5, 1.0); the nodes at the next depth cover [0, 0.25), [0.25, 0.5), [0.5, 0.75) and [0.75, 1), and so on.

Because the state space is continuous, playout statistics cannot be attached to individual states. Instead we build a state tree of maximum depth D: each node covers a region of the 2 n_stone-dimensional state space given by the x and y coordinates of the n_stone stones, and each child covers half of its parent's region. A node at depth d is expanded into children once its visit count reaches a threshold N_ex(d), and the statistics stored at depth d are given a weight w(d).

When an action must be chosen at a state x, let S be the set of tree nodes whose region contains x, let d(s) be the depth of node s, and let n(s, a) and r(s, a) be the visit count and cumulative reward of action a at node s. The weighted statistics for x are

\tilde{n}(x, a) = \sum_{s \in S} w(d(s)) \, n(s, a)    (1)

\tilde{r}(x, a) = \sum_{s \in S} w(d(s)) \, r(s, a)    (2)

and, summing over the candidate action set A,

\tilde{n}(x) = \sum_{a \in A} \tilde{n}(x, a).    (3)

Two states x_0 and x_1 that fall into the same node s share the statistics stored at s, and \tilde{n}(x) can equivalently be written as \sum_{s \in S} w(d(s)) \, n(s). The values \tilde{n}(x, a) and \tilde{r}(x, a) are then fed into a UCB computation to choose an action at x, as described next.
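As an illustration, here is a minimal Python sketch of the state tree and of the weighted statistics in Eqs. (1)-(3). The names (StateTreeNode, nodes_containing, weighted_stats) and the alternating-coordinate split are assumptions of the sketch, not the paper's implementation; the depth weight follows Eq. (11).

```python
from collections import defaultdict

# Sketch of the state tree and of Eqs. (1)-(3): the statistics of every
# node whose region contains the query state x are combined with a
# depth-dependent weight w(d).

def w(d, C_w=4):
    # Depth weight of Eq. (11): deeper (finer) nodes count more.
    return (d + 1) ** C_w

class StateTreeNode:
    def __init__(self, low, high, depth=0):
        self.low, self.high, self.depth = list(low), list(high), depth
        self.n = defaultdict(int)      # n(s, a): visits of action a at this node
        self.r = defaultdict(float)    # r(s, a): cumulative reward of a at this node
        self.children = None           # two children once the node is expanded

    def contains(self, x):
        return all(lo <= xi < hi for xi, lo, hi in zip(x, self.low, self.high))

    def split(self):
        # Halve the region along one coordinate (round-robin over dimensions).
        axis = self.depth % len(self.low)
        mid = 0.5 * (self.low[axis] + self.high[axis])
        upper_of_left = list(self.high); upper_of_left[axis] = mid
        lower_of_right = list(self.low); lower_of_right[axis] = mid
        self.children = (StateTreeNode(self.low, upper_of_left, self.depth + 1),
                         StateTreeNode(lower_of_right, self.high, self.depth + 1))

def nodes_containing(root, x):
    """S: all nodes on the root-to-leaf path whose region contains x."""
    node, path = root, []
    while node is not None and node.contains(x):
        path.append(node)
        node = None if node.children is None else next(
            (c for c in node.children if c.contains(x)), None)
    return path

def weighted_stats(root, x, actions):
    """Eqs. (1)-(3): weighted visit counts and rewards for state x."""
    S = nodes_containing(root, x)
    n_tilde = {a: sum(w(s.depth) * s.n[a] for s in S) for a in actions}
    r_tilde = {a: sum(w(s.depth) * s.r[a] for s in S) for a in actions}
    return n_tilde, r_tilde, sum(n_tilde.values())
```

With a root node covering the whole normalized region, nodes_containing(root, x) returns the chain of progressively finer regions around x, which plays the role of the set S in Eqs. (1) and (2).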

The weighted total \tilde{n}(x) generally differs from the true number of playouts that passed near x; whenever \tilde{n}(x) > 1 we use a corrected total n(x) < \tilde{n}(x) (Eq. (12)) and rescale the per-action statistics accordingly before applying UCB1:

\bar{n}(x, a) = \tilde{n}(x, a) \, \frac{n(x)}{\tilde{n}(x)}    (4)

\bar{r}(x, a) = \tilde{r}(x, a) \, \frac{n(x)}{\tilde{n}(x)}    (5)

Using n(x), \bar{n}(x, a) and \bar{r}(x, a), the action to try is chosen by UCB1 [7]:

a_{try} = \arg\max_{a \in A} \left( \frac{\bar{r}(x, a)}{\bar{n}(x, a)} + C_{UCB} \sqrt{\frac{2 \ln n(x)}{\bar{n}(x, a)}} \right)    (6)

4.1 Prior knowledge as a progressive bias

Following Chaslot et al. [15], prior knowledge about the candidate actions is injected as a progressive bias. The prior value V_pre(a) of each candidate action a, which may take any value in (-\infty, +\infty), is turned into a probability by a softmax with temperature T, and the statistics used in Eq. (6) are replaced by

n'(x, a) = \bar{n}(x, a) + N_{pre}    (7)

r'(x, a) = \bar{r}(x, a) + W_{pre} \, \frac{e^{V_{pre}(a)/T}}{\sum_{b \in A} e^{V_{pre}(b)/T}} \, N_{pre}    (8)

where N_pre is the prior visit count and W_pre the prior weight. The prior value V_pre(a) is computed with the evaluation function of [4].
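Below is a minimal sketch of the selection rule in Eqs. (4)-(8), reusing the weighted statistics from the previous sketch. The default constants follow Section 6 (C_UCB = 1, T = 0.8, N_pre = 2, C_mod = 0.4); W_pre = 1 and the function name select_action are assumptions of the sketch.

```python
import math

def select_action(actions, n_tilde, r_tilde, v_pre,
                  C_UCB=1.0, C_mod=0.4, T=0.8, N_pre=2, W_pre=1.0):
    """Pick a_try as in Eq. (6), using the corrected counts of Eqs. (4)-(5)
    and the softmax progressive bias of Eqs. (7)-(8)."""
    n_tilde_total = sum(n_tilde[a] for a in actions)
    if n_tilde_total <= 0:
        return actions[0]                      # nothing tried yet
    n_x = n_tilde_total ** C_mod               # Eq. (12): n(x) = ñ(x)^C_mod

    # Softmax of the prior values with temperature T (inner term of Eq. (8)).
    v_max = max(v_pre[a] for a in actions)
    exps = {a: math.exp((v_pre[a] - v_max) / T) for a in actions}
    z = sum(exps.values())

    best_a, best_score = None, -float("inf")
    for a in actions:
        # Eqs. (4)-(5): rescale so the per-action counts sum to n(x).
        n_bar = n_tilde[a] * n_x / n_tilde_total
        r_bar = r_tilde[a] * n_x / n_tilde_total
        # Eqs. (7)-(8): add the prior as N_pre fictitious visits.
        n_prime = n_bar + N_pre
        r_prime = r_bar + W_pre * (exps[a] / z) * N_pre
        # Eq. (6): UCB1 score (the log term is clipped at 0 while n(x) < 1).
        explore = C_UCB * math.sqrt(2.0 * max(math.log(n_x), 0.0) / n_prime)
        score = r_prime / n_prime + explore
        if score > best_score:
            best_a, best_score = a, score
    return best_a
```

Because N_pre > 0, an action that has never been tried still receives a finite score dominated by its prior probability, which is the intended effect of the progressive bias.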

5. Simulation Policies

We prepared two playout policies: a simple policy and a hand-crafted, more complex policy based on the evaluation function of [4]. The hand-crafted policy selects shots with a softmax over candidate actions, and its parameters \phi_j are trained by stochastic gradient descent on game records played on the UEC digital curling system, including records of the GAT programs GCCS and CSACE.

To relate a continuous shot to the discrete candidate set A, a shot with velocity (V_x, V_y) is matched to the most similar candidate action

a_{similar} = \arg\min_{a \in A} \left\{ 2 (v_x(a) - V_x)^2 + (v_y(a) - V_y)^2 \right\}    (9)

where v_x(a) and v_y(a) are the velocity components of candidate action a, and the difference in the x component is weighted twice as heavily as the difference in the y component.
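A small sketch of the matching rule in Eq. (9); the function name and the example candidate shots are made up for illustration.

```python
def most_similar_action(candidates, V_x, V_y):
    """Eq. (9): the candidate whose shot velocity is closest to (V_x, V_y),
    with the x component weighted twice as heavily.

    candidates: iterable of (action_id, v_x, v_y) tuples.
    """
    def weighted_sq_dist(c):
        _, v_x, v_y = c
        return 2.0 * (v_x - V_x) ** 2 + (v_y - V_y) ** 2
    return min(candidates, key=weighted_sq_dist)[0]

# Example with three hypothetical candidate shots and a target velocity.
candidates = [("guard", 1.8, 28.0), ("draw", 2.1, 29.4), ("takeout", 2.5, 33.0)]
print(most_similar_action(candidates, 2.0, 29.5))   # -> "draw"
```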

Fig. 2  A position displayed with the curling log viewer.

6. Experiments

The x and y coordinates of the stones are measured in millimetres, over a playing area 15.8 m long, and stone trajectories are simulated with Box2D, as in the official UEC digital curling system [2].

The parameters of the proposed player ("State-Tree MCTS") were set as follows: the UCB1 constant C_UCB = 1, the softmax temperature T = 0.8 and the prior count N_pre = 2. The expansion threshold of a state-tree node at depth d is

N_{ex}(d) = N_{exbase} \, C_{ex}^{\,d}    (10)

with N_exbase = 1 and C_ex = 1.3; the depth weight is

w(d) = (d + 1)^{C_w}    (11)

with C_w = 4; and the corrected visit count used in Eqs. (4)-(6) is

n(x) = \tilde{n}(x)^{C_{mod}}    (12)

with C_mod = 0.4. A code sketch of these schedules is given after the list below. The proposed player was compared with a plain Monte-Carlo player ("Pure MC") in four matchups:

1. State-Tree MCTS vs. Pure MC
2. State-Tree MCTS vs. Pure MC
3. Pure MC vs. Pure MC
4. State-Tree MCTS vs. State-Tree MCTS

The game settings follow those of the GAT (2016) tournament [2].
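The following sketch collects the schedules of Eqs. (10) and (12) and the resulting expansion rule, reusing the StateTreeNode class and the depth weight w(d) from the earlier sketch; maybe_expand is an assumed helper name, not part of the paper.

```python
def expansion_threshold(d, N_exbase=1.0, C_ex=1.3):
    # Eq. (10): a node at depth d is expanded once it has N_ex(d) visits.
    return N_exbase * C_ex ** d

def corrected_count(n_tilde_total, C_mod=0.4):
    # Eq. (12): n(x) = ñ(x)^C_mod, so that n(x) < ñ(x) whenever ñ(x) > 1.
    return n_tilde_total ** C_mod

def maybe_expand(node):
    """Split a state-tree node once it has been visited often enough."""
    visits = sum(node.n.values())
    if node.children is None and visits >= expansion_threshold(node.depth):
        node.split()
```

With C_ex = 1.3 the threshold grows geometrically with depth, so deep, very specific nodes are created only for regions of the state space that are revisited often.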

In the matchups between State-Tree MCTS and Pure MC, State-Tree MCTS achieved win rates significantly higher than 50% at the 5% level (p = 0.003 and p = 0.001).

7. Conclusion

We proposed a Monte Carlo Tree Search method for MDPs with a continuous state space, based on a state tree that shares playout statistics across nearby states, and confirmed on the UEC digital curling system that it improves agents with both a simple and a hand-crafted simulation policy.

References

[1] IPSJ SIG Technical Report, 2014-GI-31, No. 2, pp. 1-5 (2014) (in Japanese).
[2] (in Japanese).
[3] IPSJ SIG Technical Report, 2016-GI-36, No. 2, pp. 1-6 (2016) (in Japanese).
[4] (in Japanese) (2015).
[5] M. Yamamoto, S. Kato and H. Iizuka: Digital Curling Strategy Based on Game Tree Search, 2015 IEEE Conference on Computational Intelligence and Games (CIG 2015) (2015).
[6] (in Japanese) (2015).
[7] P. Auer, N. Cesa-Bianchi and P. Fischer: Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, Vol. 47, pp. 235-256 (2002).
[8] L. Kocsis and C. Szepesvári: Bandit Based Monte-Carlo Planning, European Conference on Machine Learning (ECML 2006) (2006).
[9] F. van Lishout, G. Chaslot and J. Uiterwijk: Monte-Carlo Tree Search in Backgammon, Computer Games Workshop (2007).
[10] P. Auer, R. Ortner and C. Szepesvári: Improved Rates for the Stochastic Continuum-Armed Bandit Problem, International Conference on Computational Learning Theory (COLT 2007), Springer (2007).
[11] S. Bubeck, R. Munos, G. Stoltz and C. Szepesvári: Online Optimization in X-Armed Bandits, Advances in Neural Information Processing Systems (NIPS 2009) (2009).
[12] IPSJ SIG Technical Report, 2015-GI-34, No. 2, pp. 1-6 (2015) (in Japanese).
[13] T. Yee, V. Lisý and M. Bowling: Monte Carlo Tree Search in Continuous Action Spaces with Execution Uncertainty, International Joint Conference on Artificial Intelligence (IJCAI 2016) (2016).
[14] A. Couëtoux, J.-B. Hoock, N. Sokolovska, O. Teytaud and N. Bonnard: Continuous Upper Confidence Trees, International Conference on Learning and Intelligent Optimization (LION 2011) (2011).
[15] G. Chaslot, C. Fiter, J.-B. Hoock, A. Rimmel and O. Teytaud: Adding Expert Knowledge and Exploration in Monte-Carlo Tree Search, Advances in Computer Games, LNCS, Vol. 6048 (2009).

© 2016 Information Processing Society of Japan
