Applying Monte Carlo Tree Search to Curling AI
Katsuki Ohto 1,a)   Tetsuro Tanaka 2,b)

1 Graduate School of Arts and Sciences, The University of Tokyo
2 Information Technology Center, The University of Tokyo
a) ohto@tanaka.ecc.u-tokyo.ac.jp
b) ktanaka@tanaka.ecc.u-tokyo.ac.jp

Abstract: We propose an action-decision method based on Monte Carlo Tree Search for Markov decision processes (MDPs) with a continuous state space. We applied the method to agents for the UEC digital curling system, which was built for discussing curling strategies. The experimental results show that the method is effective not only for agents with a simple simulation policy but also for agents with a complex hand-crafted one.

1. … MDP [1][2] … AI [3] [4] … Expectimax search … AI [3] … [1] … Box2D*1 …

*1 Box2D: A 2D Physics Engine for Games

© 2016 Information Processing Society of Japan
… *2 … UEC*3 … GAT … UCB1 [7] … UCT [8] … [9] … 3.3 … *2 … *3 … [2] … UEC … *4 … GAT … UCBC [10] … Hierarchical Optimistic Optimization (HOO) [11] … UCBC … HOO … UCT … HOO … UEC [12] … Yee … Kernel Regression UCT [13] … Yee … 3.4 … Double Progressive Widening (DPW) [14] … DPW … UCB … HOO …
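State Tree

Figure: State Tree. Each level halves the interval of the level above: [0, 1); [0, 0.5), [0.5, 1.0); [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1).

… $D$ … $2^D$ … $n_{\mathrm{stone}}$ … $2n_{\mathrm{stone}}$ … $2^{2n_{\mathrm{stone}}}$ … A node at depth $d$ is split into children at depth $d + 1$ once it has been visited $N_{\mathrm{ex}}(d)$ times … $w(d)$ …

For a state $x$ and an action $a$, the statistics $\tilde{n}(x, a)$ and $\tilde{r}(x, a)$ are depth-weighted sums over $S$, the set of tree nodes that contain $x$, where $d(s)$ is the depth of node $s$ and $n(s, a)$, $r(s, a)$ are the visit count and cumulative reward of action $a$ at $s$:

$$\tilde{n}(x, a) = \sum_{s \in S} w(d(s))\, n(s, a) \tag{1}$$

$$\tilde{r}(x, a) = \sum_{s \in S} w(d(s))\, r(s, a) \tag{2}$$

Summing over the action set $A$ gives the total for $x$:

$$\tilde{n}(x) = \sum_{a \in A} \tilde{n}(x, a) \tag{3}$$

… $x_0$ … $x_1$ … $\tilde{n}(x) = \sum_{s \in S} w(d(s))\, n(s)$ … $\tilde{n}(x, a)$ and $\tilde{r}(x, a)$ then serve as the statistics of action $a$ at $x$ in UCB.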
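A minimal Python sketch of a state tree and the depth-weighted sums of Eqs. (1)-(3). It is an illustration, not the authors' code, and rests on assumptions not stated in the text: states are points in the unit box $[0, 1)^D$; children are created lazily (only the cell a state actually reaches) rather than materializing every child of a node; a node's children are used once the node has at least $N_{\mathrm{ex}}(d)$ visits; `max_depth` is an added guard. The names `Node`, `path_to`, `update`, and `aggregate` are hypothetical, and $w$ and $N_{\mathrm{ex}}$ are passed in as functions.

```python
from collections import defaultdict


class Node:
    """One cell of the state tree: the box [lo, hi) in each of the D coordinates."""

    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = list(lo), list(hi), depth
        self.n = defaultdict(float)  # n(s, a): visits of action a at this node
        self.r = defaultdict(float)  # r(s, a): cumulative reward of action a at this node
        self.children = {}           # created lazily, keyed by which half of each coordinate

    def visits(self):
        """n(s): total visits over all actions."""
        return sum(self.n.values())

    def child_for(self, x):
        """Return (creating it if needed) the child cell that contains state x."""
        key = tuple(int(xi >= (lo_i + hi_i) / 2) for xi, lo_i, hi_i in zip(x, self.lo, self.hi))
        if key not in self.children:
            lo, hi = [], []
            for upper, lo_i, hi_i in zip(key, self.lo, self.hi):
                mid = (lo_i + hi_i) / 2
                lo.append(mid if upper else lo_i)
                hi.append(hi_i if upper else mid)
            self.children[key] = Node(lo, hi, self.depth + 1)
        return self.children[key]


def path_to(root, x, n_ex, max_depth=32):
    """The nodes S containing x; a node's children are used once it has n_ex(d) visits."""
    path, node = [root], root
    while node.depth < max_depth and node.visits() >= n_ex(node.depth):
        node = node.child_for(x)
        path.append(node)
    return path


def update(path, a, reward):
    """Back up one simulation result for action a into every node on the path."""
    for s in path:
        s.n[a] += 1
        s.r[a] += reward


def aggregate(path, actions, w):
    """Eqs. (1)-(3): depth-weighted sums of n(s, a) and r(s, a) over the nodes in S."""
    n_t = {a: sum(w(s.depth) * s.n[a] for s in path) for a in actions}
    r_t = {a: sum(w(s.depth) * s.r[a] for s in path) for a in actions}
    return n_t, r_t, sum(n_t.values())
```

Because every visit is recorded at all granularities along the path, coarse, well-visited cells can inform a state whose fine cells are still sparse, while the weight $w(d)$ decides how much the finer cells count.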
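When $\tilde{n}(x) > 1$, a smaller effective visit count $n(x) < \tilde{n}(x)$ is used, as defined in Eq. (12), and $\tilde{n}(x, a)$ and $\tilde{r}(x, a)$ are rescaled to the values $n(x, a)$ and $r(x, a)$ used by UCB1:

$$n(x, a) = \tilde{n}(x, a)\,\frac{n(x)}{\tilde{n}(x)} \tag{4}$$

$$r(x, a) = \tilde{r}(x, a)\,\frac{n(x)}{\tilde{n}(x)} \tag{5}$$

Using $n(x)$, $n(x, a)$, and $r(x, a)$, the action $a_{\mathrm{try}}$ to try next is selected by UCB1 [7]:

$$a_{\mathrm{try}} = \operatorname*{argmax}_{a \in A}\left(\frac{r(x, a)}{n(x, a)} + C_{\mathrm{UCB}}\sqrt{\frac{2 \ln n(x)}{n(x, a)}}\right) \tag{6}$$

… 4.1 …

Following Chaslot et al. [15], prior knowledge is added as $N_{\mathrm{pre}}$ virtual visits whose value is given by a softmax with temperature $T$ over a prior evaluation $V_{\mathrm{pre}}(a)$ of each action $a$:

$$n'(x, a) = n(x, a) + N_{\mathrm{pre}} \tag{7}$$

$$r'(x, a) = r(x, a) + W_{P\mathrm{tor}}\!\left(\frac{e^{V_{\mathrm{pre}}(a)/T}}{\sum_{b \in A} e^{V_{\mathrm{pre}}(b)/T}}\right) N_{\mathrm{pre}} \tag{8}$$

… $(-\infty, +\infty)$ … [4] … [4] … 1336 …*5 … $l$ … $(l/2, 2l)$ … $(l/4, \ldots)$ …*6 … [1] …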
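The selection step of Eqs. (4) to (8) can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: rescaling is skipped when $\tilde{n}(x) \le 1$; the prior-augmented values of Eqs. (7) and (8) are meant to be fed into Eq. (6) in place of $n(x, a)$ and $r(x, a)$; $W_{P\mathrm{tor}}$ is left as a caller-supplied function `w_ptor`; and the defaults `c_mod=0.4` and `c_ucb=1.0` are taken from the experimental settings. The function names `rescale`, `add_prior`, and `select_ucb1` are hypothetical.

```python
import math


def rescale(n_t, r_t, n_total, c_mod=0.4):
    """Eqs. (4), (5), (12): shrink aggregated statistics to n(x) = n~(x)^C_mod.

    Assumption: statistics are left unchanged when n~(x) <= 1.
    """
    if n_total <= 1:
        return dict(n_t), dict(r_t), n_total
    n_x = n_total ** c_mod
    scale = n_x / n_total  # same factor for counts and rewards keeps r/n unchanged
    n = {a: v * scale for a, v in n_t.items()}
    r = {a: v * scale for a, v in r_t.items()}
    return n, r, n_x


def add_prior(n, r, v_pre, temperature, n_pre, w_ptor):
    """Eqs. (7), (8): N_pre virtual visits valued by a softmax over V_pre(a)."""
    top = max(v_pre.values())  # subtracting the max does not change the softmax
    exps = {a: math.exp((v - top) / temperature) for a, v in v_pre.items()}
    z = sum(exps.values())
    n2 = {a: n.get(a, 0.0) + n_pre for a in v_pre}
    r2 = {a: r.get(a, 0.0) + w_ptor(exps[a] / z) * n_pre for a in v_pre}
    return n2, r2


def select_ucb1(n, r, n_x, c_ucb=1.0):
    """Eq. (6): argmax over a of r(x,a)/n(x,a) + C_UCB * sqrt(2 ln n(x) / n(x,a))."""
    log_n = math.log(max(n_x, 1.0))  # guard: ln of a count below 1 would be negative

    def score(a):
        if n[a] <= 0:
            return math.inf  # an untried action is selected first
        return r[a] / n[a] + c_ucb * math.sqrt(2.0 * log_n / n[a])

    return max(n, key=score)
```

Because Eqs. (4) and (5) scale counts and rewards by the same factor, each action's empirical mean $r/n$ is preserved; only the exploration term of Eq. (6) changes.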
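… No. …*7 … softmax … [4] … Stochastic Gradient Descent … $i$ … $\phi_j$ … $\mathrm{Var}(\cdot)$ … $\mathrm{Var}(\phi_j)$ … $i+1$ … L1 … $i+1$ … UEC … GAT … GCCS … CSACE … 184 …*

For a logged shot with velocity components $(V_x, V_y)$, the most similar action in the candidate set $A$ is

$$a_{\mathrm{similar}} = \operatorname*{argmin}_{a \in A}\left\{2\,(v_x(a) - V_x)^2 + (v_y(a) - V_y)^2\right\} \tag{9}$$

where $v_x(a)$ and $v_y(a)$ are the velocity components of action $a$.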
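Eq. (9) is a weighted nearest-neighbour search over the candidate actions. A minimal Python sketch, assuming each candidate is represented only by its velocity pair `(vx, vy)` (any spin or other shot parameter is ignored here); the function name `most_similar_action` is hypothetical:

```python
def most_similar_action(candidates, vx_log, vy_log):
    """Eq. (9): the candidate (vx, vy) minimizing 2(vx - Vx)^2 + (vy - Vy)^2."""

    def distance(action):
        vx, vy = action
        return 2.0 * (vx - vx_log) ** 2 + (vy - vy_log) ** 2

    return min(candidates, key=distance)


# The factor 2 on the x term can change the answer compared with plain Euclidean distance:
# here the unweighted distance would pick (0.2, 3.1) instead.
print(most_similar_action([(0.2, 3.1), (0.1, 3.22)], 0.1, 3.1))  # (0.1, 3.22)
```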
Fig. 2 Curling log viewer*8

… softmax … $T$ = … (8) … $W_{P\mathrm{tor}}(\cdot)$ …

*8 … log viewer …

… $x$ … $y$ … $x$ … $y$ … $d$ …
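… $x$, $y$ … mm … 15.8 m …*9 … $d$ … $N_{\mathrm{ex}}(d)$ … 6.2 … [2] … Box2D … Box2D …

Parameters: $C_{\mathrm{UCB}} = 1$ for UCB1; $T = 0.8$ and $N_{\mathrm{pre}} = 2$ for the softmax prior. The proposed State-Tree MCTS is compared with Pure MC.

The expansion threshold at depth $d$ is

$$N_{\mathrm{ex}}(d) = N_{\mathrm{exbase}}\, C_{\mathrm{ex}}^{\,d} \tag{10}$$

with $N_{\mathrm{exbase}} = 1$ and $C_{\mathrm{ex}} = 1.3$. The depth weight is

$$w(d) = (d + 1)^{C_w} \tag{11}$$

with $C_w = 4$. The effective visit count at state $x$ is computed from $\tilde{n}(x)$ as

$$n(x) = \tilde{n}(x)^{C_{\mathrm{mod}}} \tag{12}$$

with $C_{\mathrm{mod}} = 0.4$.

… UCB1 …*9 … $l$ … $2l$ …

1 State-Tree MCTS vs Pure MC
2 State-Tree MCTS vs Pure MC
3 Pure MC vs Pure MC
4 State-Tree MCTS vs State-Tree MCTS

… GAT …*10 … [2] … GAT (2016) …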
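The three schedules of Eqs. (10) to (12), with the parameter values used in the experiments, can be tabulated in a few lines of Python. A sketch with hypothetical function names; reading $N_{\mathrm{ex}}(d)$ as the visit threshold for expanding a depth-$d$ node is an interpretation:

```python
# Parameter values reported for the experiments.
N_EXBASE, C_EX = 1.0, 1.3  # Eq. (10)
C_W = 4                    # Eq. (11)
C_MOD = 0.4                # Eq. (12)


def n_ex(d):
    """Eq. (10): visits a depth-d node needs before its children are used."""
    return N_EXBASE * C_EX ** d


def w(d):
    """Eq. (11): weight of a depth-d node in the sums of Eqs. (1)-(3)."""
    return (d + 1) ** C_W


def effective_count(n_tilde):
    """Eq. (12): effective visit count n(x) from the aggregated count n~(x)."""
    return n_tilde ** C_MOD


for d in range(4):
    print(d, round(n_ex(d), 3), w(d))
print(round(effective_count(100.0), 2))
# 0 1.0 1
# 1 1.3 16
# 2 1.69 81
# 3 2.197 256
# 6.31
```

With these settings the depth weight grows far faster than the expansion threshold (from 1 to 256 over depths 0 to 3, against roughly 1 to 2.2), so a deeper cell that has collected even a modest share of its parent's visits can outweigh the coarser cells in Eqs. (1) and (2).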
… State-Tree MCTS … Pure MC … State-Tree MCTS … Pure MC (p = …)

5 … State-Tree MCTS … Pure MC (p = 0.003)
6 … Pure MC (p = …)
7 … State-Tree MCTS (p = 0.001)

… State-Tree MCTS … Pure MC … 5% … % …

7. …

References
[1] …, 2014-GI-31, No. 2, pp. 1-5 (2014).
[2] …
[3] …, 2015 …, 2016-GI-36, No. 2, pp. 1-6 (2016).
[4] …, AI … (104 …) (2015).
[5] M. Yamamoto, S. Kato, and H. Iizuka: Digital Curling Strategy on Game Tree Search, 2015 IEEE Conference on Computational Intelligence and Games (2015).
[6] …, 2015 … (2015).
[7] P. Auer, N. Cesa-Bianchi, and P. Fischer: Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, Vol. 47 (2002).
[8] L. Kocsis and C. Szepesvári: Bandit based Monte-Carlo Planning, European Conference on Machine Learning (ECML 2006) (2006).
[9] F. van Lishout, G. Chaslot, and J. Uiterwijk: Monte-Carlo Tree Search in Backgammon, Computer Games Workshop (2007).
[10] P. Auer, R. Ortner, and C. Szepesvári: Improved Rates for the Stochastic Continuum-Armed Bandit Problem, International Conference on Computational Learning Theory, Springer (2007).
[11] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári: Online Optimization in X-Armed Bandits, Advances in Neural Information Processing Systems (NIPS 2009) (2009).
[12] …, UEC …, 2015-GI-34, No. 2, pp. 1-6 (2015).
[13] T. Yee, V. Lisý, and M. Bowling: Monte Carlo Tree Search in Continuous Action Spaces with Execution Uncertainty, International Joint Conference on Artificial Intelligence (2016).
[14] A. Couëtoux, J. Hoock, N. Sokolovska, O. Teytaud, and N. Bonnard: Continuous Upper Confidence Trees, International Conference on Learning and Intelligent Optimization (2011).
[15] G. Chaslot, C. Fiter, J.P. Hoock, A. Rimmel, and O. Teytaud: Adding Expert Knowledge and Exploration in Monte-Carlo Tree Search, Advances in Computer Games, LNCS, Vol. 6048 (2009).