Applying Monte Carlo Tree Search to Curling AI



Katsuki Ohto 1,a)  Tetsuro Tanaka 2,b)

Abstract: We propose an action decision method based on Monte Carlo Tree Search for MDPs with a continuous state space. We applied our method to agents for the UEC digital curling system, which was built for discussing curling strategies. The experimental results show that our method is effective not only for agents with a simple simulation policy but also for agents with a complex handcrafted one.

1 Graduate School of Arts and Sciences, The University of Tokyo
2 Information Technology Center, The University of Tokyo
a) ohto@tanaka.ecc.u-tokyo.ac.jp
b) ktanaka@tanaka.ecc.u-tokyo.ac.jp

1. MDP [1][2] AI [3] [4] Expectimax-search AI [3] [1] Box2D *1

*1 Box2D: A 2D Physics Engine for Games

© 2016 Information Processing Society of Japan

*2 1 UEC *3 1 GAT * UCB1 [7] UCT [8] [9]

3.3 *2 *3 [2] UEC 1 *4 1 GAT UCBC [10] Hierarchical Optimistic Optimization (HOO) [11] UCBC HOO UCT HOO 1 UEC [12] 1 1 Yee Kernel Regression UCT [13] Yee

3.4 Double Progressive Widening (DPW) [14] 1 1 DPW UCB HOO

State Tree: [0, 1) / [0, 0.5), [0.5, 1.0) / [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1)

D 2 D n stone 2 nstone 2 nstone d N ex (d) N ex (d) d 1 d d w(d) w(d) x a ñ(x, a) r̃(x, a) S x s d(s) s n(s, a) r(s, a) a

$$\tilde{n}(x, a) = \sum_{s \in S} w(d(s))\, n(s, a) \tag{1}$$

$$\tilde{r}(x, a) = \sum_{s \in S} w(d(s))\, r(s, a) \tag{2}$$

x x A

$$\tilde{n}(x) = \sum_{a \in A} \tilde{n}(x, a) \tag{3}$$

s x 0 x 1 x 0 x 1

$$\tilde{n}(x) = \sum_{s \in S} w(d(s))\, n(s)$$

ñ(x, a) r̃(x, a) x a UCB ñ(x, a)
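As an illustration of Eqs. (1) to (3), the following is a minimal Python sketch of the depth-weighted aggregation. It assumes that S is the list of state-tree nodes whose region contains the query state x, that each node stores per-action visit counts n(s, a) and reward sums r(s, a), and that the weight function w(d) is supplied by the caller. The names `Node` and `weighted_stats` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One state-tree node s (hypothetical container for this sketch)."""
    depth: int                                    # d(s)
    visits: dict = field(default_factory=dict)    # action -> n(s, a)
    rewards: dict = field(default_factory=dict)   # action -> r(s, a)


def weighted_stats(path, actions, weight_fn):
    """Aggregate per-node statistics with depth weights.

    path      : nodes assumed to contain the query state x (the set S)
    actions   : candidate actions (the set A)
    weight_fn : w(d), passed in by the caller (assumption of this sketch)
    """
    n_tilde = {a: 0.0 for a in actions}
    r_tilde = {a: 0.0 for a in actions}
    for node in path:
        w = weight_fn(node.depth)
        for a in actions:
            n_tilde[a] += w * node.visits.get(a, 0)      # Eq. (1)
            r_tilde[a] += w * node.rewards.get(a, 0.0)   # Eq. (2)
    n_total = sum(n_tilde.values())                      # Eq. (3)
    return n_tilde, r_tilde, n_total


# Example: a root node and one child, with an illustrative weight (d + 1) ** 4.
path = [
    Node(0, {"draw": 4, "hit": 2}, {"draw": 2.0, "hit": 1.5}),
    Node(1, {"draw": 1}, {"draw": 1.0}),
]
n_tilde, r_tilde, n_total = weighted_stats(
    path, ["draw", "hit"], weight_fn=lambda d: (d + 1) ** 4
)
```

With a weight that grows with depth, statistics gathered in the narrower region around x dominate those of its coarser ancestors, which appears to be the purpose of the weighting.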

$\tilde{n}(x) > 1$: $\hat{n}(x) < \tilde{n}(x)$

ñ(x) n̂(x) ñ(x, a) r̃(x, a) UCB1 n̂(x, a) r̂(x, a)

$$\hat{n}(x, a) = \tilde{n}(x, a)\, \frac{\hat{n}(x)}{\tilde{n}(x)} \tag{4}$$

$$\hat{r}(x, a) = \tilde{r}(x, a)\, \frac{\hat{n}(x)}{\tilde{n}(x)} \tag{5}$$

n̂(x) n̂(x, a) r̂(x, a) UCB1 [7] a_try

$$a_{try} = \operatorname*{argmax}_{a \in A} \left( \frac{\hat{r}(x, a)}{\hat{n}(x, a)} + C_{UCB} \sqrt{\frac{2 \ln \hat{n}(x)}{\hat{n}(x, a)}} \right) \tag{6}$$

4.1

(6) Chaslot [15] softmax a V_pre(a) softmax T N_pre WPtor() x a n'(x, a) r'(x, a)

$$n'(x, a) = n(x, a) + N_{pre} \tag{7}$$

$$r'(x, a) = r(x, a) + \mathrm{WPtor}\!\left( \frac{e^{V_{pre}(a)/T}}{\sum_{b \in A} e^{V_{pre}(b)/T}} \right) N_{pre} \tag{8}$$

(, + ) [4] [4] 1336 *5 l ( l 2, 2l) ( l 4, ) *6 [1] *5 *
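The rescaling of Eqs. (4) and (5), the UCB1 choice of Eq. (6), and the softmax prior of Eqs. (7) and (8) can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: the damped total is taken to be the weighted total raised to the power C_mod (one reading of Eq. (12)), the function written WPtor() in the text is replaced by a caller-supplied `to_reward` that defaults to the identity, and the order in which rescaling and the prior are combined is left open because the surviving text does not state it. All function names are hypothetical.

```python
import math


def rescale(n_tilde, r_tilde, c_mod=0.4):
    """Shrink weighted statistics so that the total count becomes total ** c_mod."""
    n_total = sum(n_tilde.values())
    if n_total <= 0:
        return dict(n_tilde), dict(r_tilde), 0.0
    n_hat_total = n_total ** c_mod           # assumed reading of Eq. (12)
    scale = n_hat_total / n_total            # common factor in Eqs. (4) and (5)
    n_hat = {a: n * scale for a, n in n_tilde.items()}
    r_hat = {a: r_tilde[a] * scale for a in n_tilde}
    return n_hat, r_hat, n_hat_total


def softmax(values, temperature):
    """Softmax over a dict of action evaluations V_pre(a)."""
    peak = max(values.values())              # subtract the max for numerical stability
    exps = {a: math.exp((v - peak) / temperature) for a, v in values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def add_prior(n, r, v_pre, temperature=0.8, n_pre=2.0, to_reward=lambda p: p):
    """Add N_pre virtual visits per action, as in Eqs. (7) and (8).

    to_reward stands in for WPtor(); the identity default is an assumption.
    """
    probs = softmax(v_pre, temperature)
    n_prime = {a: n[a] + n_pre for a in n}
    r_prime = {a: r[a] + to_reward(probs[a]) * n_pre for a in n}
    return n_prime, r_prime


def ucb1_select(n, r, n_total, c_ucb=1.0):
    """Return the action maximizing mean reward plus the UCB1 bonus, Eq. (6)."""
    # Guard added for this sketch: a total below 1 would make the log negative.
    log_total = math.log(max(n_total, 1.0))

    def score(a):
        if n[a] <= 0:
            return math.inf                  # untried actions are tried first
        return r[a] / n[a] + c_ucb * math.sqrt(2.0 * log_total / n[a])

    return max(n, key=score)


n_hat, r_hat, total = rescale({"a": 20.0, "b": 12.0}, {"a": 10.0, "b": 3.0})
chosen = ucb1_select(n_hat, r_hat, total)
```

Because the rescaling multiplies counts and reward sums by the same factor, the mean reward of each action is unchanged and only the exploration bonus grows, so less-visited actions are favored more strongly than under the raw weighted counts.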

1 2 2 No * 7 softmax 2 [4] Stochastic Gradient Descent i ϕ j j V ( ar() ) V ar(ϕ j) i+1 1 L1 0 L ( ) i+1 UEC GAT 10 GCCS CSACE 184 *

$$a_{similar} = \operatorname*{argmin}_{a \in A} \left\{ 2\,(v_x(a) - V_x)^2 + (v_y(a) - V_y)^2 \right\} \tag{9}$$

A a 1 v x (a) v y (a) a V x V y v x (a) V x
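Eq. (9) can be read as a nearest-neighbor lookup with the x component weighted twice as heavily as the y component. The Python sketch below assumes that each candidate action a is represented by its velocity pair (v_x(a), v_y(a)), that (V_x, V_y) is the target velocity, and that the operator lost between the paired terms is a subtraction. The function name and the sample values are hypothetical.

```python
def most_similar_action(candidates, target_vx, target_vy):
    """Return the candidate (vx, vy) closest to the target under Eq. (9)."""

    def distance(action):
        vx, vy = action
        # The x error is weighted by 2, as in Eq. (9).
        return 2.0 * (vx - target_vx) ** 2 + (vy - target_vy) ** 2

    return min(candidates, key=distance)


# Illustrative values only; they are not taken from the paper.
candidates = [(0.0, 30.0), (0.5, 29.0), (-0.5, 31.0)]
best = most_similar_action(candidates, target_vx=0.3, target_vy=29.5)
```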

2 curing log viewer *8 softmax T = (8) WPtor() *8 log viewer x y 1 x y d d

x y mm 15.8m *9 d N_ex(d)

6.2 [2] Box2D Box2D

UCB1: C_UCB = 1; softmax: T = 0.8; N_pre = 2

State-Tree MCTS, Pure MC

$$N_{ex}(d) = N_{exbase}\, C_{ex}^{\,d} \tag{10}$$

N_exbase = 1, C_ex = 1.3

$$w(d) = (d + 1)^{C_w} \tag{11}$$

C_w = 4

$$\hat{n}(x) = \tilde{n}(x)^{C_{mod}} \tag{12}$$

C_mod = 0.4

UCB1 1 2 *9 l 2l

1 State-Tree MCTS vs Pure MC
2 State-Tree MCTS vs Pure MC
3 Pure MC vs Pure MC
4 State-Tree MCTS vs State-Tree MCTS

GAT * *10 [2] GAT (2016)
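To make Eq. (10) concrete, the Python sketch below descends a one-dimensional state tree over [0, 1) by repeated halving, continuing to a child only while the current interval's visit count has reached N_ex(d). Reading N_ex(d) as a visit threshold for expanding a node at depth d is an assumption of this sketch, as is the one-dimensional state; the function names and the `visit_counts` mapping are hypothetical.

```python
def expansion_threshold(depth, n_ex_base=1.0, c_ex=1.3):
    """N_ex(d) = N_exbase * C_ex ** d, with the parameter values quoted in the text."""
    return n_ex_base * c_ex ** depth


def interval_path(x, visit_counts, max_depth=8):
    """Return the nested intervals of [0, 1) that contain x.

    visit_counts maps (lo, hi) -> number of visits to that interval.
    Descent stops at the first interval whose count is below N_ex(depth).
    """
    lo, hi, depth = 0.0, 1.0, 0
    path = [(lo, hi)]
    while (depth < max_depth
           and visit_counts.get((lo, hi), 0) >= expansion_threshold(depth)):
        mid = (lo + hi) / 2.0
        if x < mid:
            hi = mid          # x lies in the left half
        else:
            lo = mid          # x lies in the right half
        depth += 1
        path.append((lo, hi))
    return path


# The root has been visited often enough to split; its right child has not.
path = interval_path(0.6, {(0.0, 1.0): 5, (0.5, 1.0): 1})
```

Because C_ex is greater than 1, the threshold grows geometrically with depth, so deeper and narrower regions need more visits before they are refined further.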

State-Tree MCTS Pure MC State-Tree MCTS Pure MC (p = )
5 State-Tree MCTS Pure MC (p = 0.003)
6 Pure MC (p = )
7 State-Tree MCTS (p = 0.001)

State-Tree MCTS Pure MC 5% %

7.

[1] ,,,, 2014-GI-31, No. 2, pp. 1-5 (2014).
[2] ., ( ).
[3] , 2015,, 2016-GI-36, No. 2, pp. 1-6 (2016).
[4] ,, AI, ( 104 ), pp (2015).
[5] M. Yamamoto, S. Kato, and H. Iizuka: Digital Curling Strategy on Game Tree Search, 2015 IEEE Conference on Computational Intelligence and Games (2015).
[6] ,,, 2015, pp (2015).
[7] P. Auer, N. Cesa-Bianchi, and P. Fischer: Finite-time Analysis of the Multiarmed Bandit Problem, Machine Learning, Vol. 47, pp (2002).
[8] L. Kocsis and C. Szepesvari: Bandit based Monte-Carlo Planning, European Conference on Machine Learning (ECML2006), pp (2006).
[9] F. van Lishout, G. Chaslot, and J. Uiterwijk: Monte-Carlo Tree Search in Backgammon, Computer Games Workshop, pp (2007).
[10] P. Auer, R. Ortner, and C. Szepesvari: Improved Rates for the Stochastic Continuum-Armed Bandit Problem, International Conference on Computational Learning Theory, Springer, pp (2007).
[11] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvari: Online Optimization in X-Armed Bandits, Advances in Neural Information Processing Systems (NIPS2009), pp (2009).
[12] ,, 1 UEC,, 2015-GI-34, No. 2, pp. 1-6 (2015).
[13] T. Yee, V. Lisy, and M. Bowling: Monte Carlo Tree Search in Continuous Action Spaces with Execution Uncertainty, International Joint Conference on Artificial Intelligence (2016).
[14] A. Couetoux, J. Hoock, N. Sokolovska, O. Teytaud, and N. Bonnard: Continuous Upper Confidence Trees, International Conference on Learning and Intelligent Optimization, pp (2011).
[15] G. Chaslot, C. Fiter, J.P. Hoock, A. Rimmel, and O. Teytaud: Adding Expert Knowledge and Exploration in Monte-Carlo Tree Search, Advances in Computer Games, LNCS, Vol. 6048, pp (2009).
