Reactive Synthesis Without Regret

1 Reactive Synthesis Without Regret (Non, rien de rien...) Paul Hunter, Guillermo A. Pérez, Jean-François Raskin. CONCUR, Madrid, September 2015

2 Outline
1. Regret
2. Playing against a positional adversary
3. Playing against an eloquent adversary
4. Playing against any adversary
G.A. Pérez (ULB), Reactive Synthesis Without Regret, September 2015

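4 What is this talk about?
Key words: weighted arenas, Eve, Adam, infinite plays, and payoff functions.
[Figure: a weighted arena over the vertices u, v and x, with a legend distinguishing Eve's and Adam's vertices; the edge weights are only partly recoverable from the extraction.]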

5 Which payoff functions?
A payoff function is of the form $\mathrm{Val} : \mathbb{Q}^\omega \to \mathbb{R}$.
Classical payoff functions: sup, inf, lim sup, lim inf, mean-payoff.
In this talk: mostly mean-payoff.
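To make the five payoff functions concrete, here is a minimal Python sketch that evaluates them on an ultimately periodic weight sequence, that is, a finite prefix followed by a cycle repeated forever. The restriction to such "lasso" sequences, the function name `payoffs` and the sample weights are assumptions made for illustration; they are not taken from the talk. On a lasso sequence the long-run average exists and equals the mean of the cycle, which is what the sketch returns for mean-payoff.

```python
from fractions import Fraction


def payoffs(prefix, cycle):
    """Payoffs of the infinite weight sequence prefix . cycle^omega.

    Illustrative assumption: weights are integers and the sequence is
    ultimately periodic, so every payoff can be read off finitely.
    """
    if not cycle:
        raise ValueError("cycle must be non-empty")
    all_weights = list(prefix) + list(cycle)
    return {
        # sup / inf range over every weight that ever occurs.
        "sup": max(all_weights),
        "inf": min(all_weights),
        # lim sup / lim inf only see weights occurring infinitely often.
        "lim sup": max(cycle),
        "lim inf": min(cycle),
        # The long-run average of a lasso sequence is the cycle mean.
        "mean-payoff": Fraction(sum(cycle), len(cycle)),
    }


# Hypothetical play: weight 7 once, then 1, 6, 2 repeated forever.
example = payoffs(prefix=[7], cycle=[1, 6, 2])
```

Note how the prefix affects sup and inf only: the three limit-based payoffs ignore any finite prefix of the play.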

7 What do you mean by regret minimisation?
In words: we want to find the strategy of Eve that minimises the difference between her actual payoff and the payoff she could have achieved if she had known the strategy of Adam in advance.
- Halpern and Pass: a better solution concept than NE
- Zwick and Paterson: competitive analysis of online metrical task systems, finite-window online string matching, selection with limited storage
- Learning in a static, unknown environment
- Determinisation by pruning and good-for-games automata

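8 A formal definition of regret
Let $G$ be a weighted arena, $\mathrm{Val}$ be a payoff function, and $\Sigma_\exists$ and $\Sigma_\forall$ be sets of strategies for Eve and Adam respectively.
The regret of $\sigma \in \Sigma_\exists$:
$\mathrm{reg}^{\sigma}_{\Sigma_\exists,\Sigma_\forall}(G) := \sup_{\tau \in \Sigma_\forall} \bigl( \sup_{\sigma' \in \Sigma_\exists} \mathrm{Val}(\sigma', \tau) - \mathrm{Val}(\sigma, \tau) \bigr)$
The regret of Eve in $G$:
$\mathrm{Reg}_{\Sigma_\exists,\Sigma_\forall}(G) := \inf_{\sigma \in \Sigma_\exists} \mathrm{reg}^{\sigma}_{\Sigma_\exists,\Sigma_\forall}(G)$
We will be making assumptions about $\Sigma_\forall$. $\Sigma_\exists$ is the set of all strategies of Eve.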
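As a small worked instance of the definition, the following Python sketch computes the regret when both strategy sets are finite, so that every sup and inf becomes a max or min over a payoff matrix. The finiteness, the names `regret_of` and `regret_value`, and the matrix `VAL` are assumptions made for illustration; in the talk the strategy sets are infinite in general.

```python
def regret_of(val, sigma):
    """Regret of Eve's strategy `sigma`.

    val[s][t] is the payoff when Eve plays s and Adam plays t.
    For each Adam strategy t we compare Eve's best payoff against t
    with what sigma actually obtains, and keep the worst case.
    """
    n_tau = len(val[sigma])
    return max(
        max(row[tau] for row in val) - val[sigma][tau]
        for tau in range(n_tau)
    )


def regret_value(val):
    """Least regret over Eve's strategies, with a minimising strategy index."""
    best = min(range(len(val)), key=lambda s: regret_of(val, s))
    return regret_of(val, best), best


# Hypothetical payoffs: three strategies for Eve (rows), two for Adam (columns).
VAL = [
    [4, 0],
    [2, 2],
    [0, 3],
]

value, witness = regret_value(VAL)
```

In this toy matrix the middle row is never the best response to either column, yet it minimises regret, because it is never far from the best response.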

10 Example 1: a mean-payoff game
Assume Adam plays positionally.
[Figure: a weighted arena over the vertices u, v and x; the edge weights are only partly recoverable from the extraction.]

12 Results
Theorem (Hardness): For $r \in \mathbb{Q}$, a weighted arena $G$ and payoff function lim inf or mean-payoff, determining whether the regret value is less than $r$ is PSPACE-hard.
Theorem (Algorithm): For all payoff functions, computing the regret value can be done in polynomial space.

14 From MP Regret games to MPGs
We construct a new arena $\hat{G}$ such that $\mathrm{Reg}(G) = -\mathrm{aVal}(\hat{G})$:
- the vertices record the witnessed choices of Adam: $\hat{V} := V \times \mathcal{P}(E)$;
- the new weight function uses this information to reduce the value of potential alternatives: $\hat{w}((u, C), (v, D)) := w(u, v) - \mathrm{cVal}(G \restriction D)$.
(cVal is the co-operative value.)
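The construction can be sketched in Python as a graph exploration. Several things here are assumptions rather than statements from the slides: that the recorded set C contains Adam's edges still compatible with a positional strategy (so choosing one edge at a vertex discards his other edges from that vertex), that the new weight is the old weight minus the co-operative value of the restricted arena, and that only vertices reachable from a start vertex are built. The helper `cval` is a hypothetical caller-supplied function; the stub `toy_cval` below is a placeholder and does not compute a real co-operative value.

```python
def build_regret_arena(edges, adam_vertices, weight, cval, start):
    """Explore the product vertices (v, C) reachable from (start, all Adam edges).

    edges: set of (u, v) pairs; adam_vertices: vertices owned by Adam;
    weight: dict from edge to number; cval: hypothetical helper mapping a
    frozenset of Adam edges D to the co-operative value of G restricted to D.
    """
    adam_edges = frozenset(e for e in edges if e[0] in adam_vertices)
    initial = (start, adam_edges)
    new_weight = {}
    seen = {initial}
    todo = [initial]
    while todo:
        u, C = todo.pop()
        for (src, v) in edges:
            if src != u:
                continue
            if u in adam_vertices:
                if (u, v) not in C:
                    continue  # Adam already committed to another edge at u
                # Assumed update: a positional Adam gives up his other edges from u.
                D = frozenset(e for e in C if e[0] != u or e == (u, v))
            else:
                D = C  # Eve's moves reveal nothing about Adam's strategy
            new_weight[((u, C), (v, D))] = weight[(u, v)] - cval(D)
            if (v, D) not in seen:
                seen.add((v, D))
                todo.append((v, D))
    return initial, seen, new_weight


def toy_cval(D):
    """Placeholder only: NOT a real co-operative value."""
    return len(D)


# Hypothetical arena: Adam owns x, Eve owns u and v.
EDGES = {("u", "x"), ("x", "u"), ("x", "v"), ("v", "x")}
WEIGHT = {("u", "x"): 1, ("x", "u"): 0, ("x", "v"): 0, ("v", "x"): 6}
initial, vertices, new_weight = build_regret_arena(EDGES, {"x"}, WEIGHT, toy_cval, "u")
```

Exploring only reachable vertices keeps the sketch small; the full vertex set V × P(E) is exponential in the number of edges.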

16 From MP Regret games to MPGs (example)
[Figure: the arena built from Example 1, with vertices (u, {xv, xu}), (v, {xv, xu}), (x, {xv, xu}), (u, {xu}), (v, {xu}), (x, {xu}), (u, {xv}), (v, {xv}) and (x, {xv}); the edge weights are not reliably recoverable from the extraction.]

20 Playing on an automaton
Assume now that the choices of Adam are made by him choosing letters from a finite alphabet $A$. So the choices of Eve can be thought of as her resolving non-determinism on an automaton.
Word strategies: a strategy of Adam is a word strategy if it is of the form $\tau : \mathbb{N} \to A$.
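The interaction can be pictured with a short Python sketch: Adam's word strategy depends only on the step number, and Eve picks one of the transitions available for the current letter. The automaton encoding, the names `play_prefix` and `greedy`, and the toy automaton are all assumptions made for illustration, and the sketch only unrolls a finite prefix of the infinite play.

```python
def play_prefix(transitions, start, tau, eve_choice, steps):
    """Unroll `steps` rounds of the game on a weighted automaton.

    transitions: dict (state, letter) -> list of (weight, next_state);
    tau: Adam's word strategy, a function from step number to letter;
    eve_choice: function (state, letter, options) -> index into options.
    Returns the word spelled by Adam and the weights of Eve's run.
    """
    state, word, weights = start, [], []
    for n in range(steps):
        letter = tau(n)  # Adam sees only the step number
        options = transitions[(state, letter)]
        weight, state = options[eve_choice(state, letter, options)]
        word.append(letter)
        weights.append(weight)
    return "".join(word), weights


def greedy(state, letter, options):
    """Hypothetical Eve strategy: take the heaviest available transition."""
    return max(range(len(options)), key=lambda i: options[i][0])


# Hypothetical automaton: non-determinism only on reading "a" in state q.
TRANSITIONS = {
    ("q", "a"): [(1, "q"), (0, "p")],
    ("q", "b"): [(0, "q")],
    ("p", "a"): [(2, "p")],
    ("p", "b"): [(0, "q")],
}

word, weights = play_prefix(TRANSITIONS, "q", lambda n: "ab"[n % 2], greedy, 4)
```

Against the word strategy spelling (ab)^ω the greedy choice stays in q; against a^ω, moving to p would eventually pay more, which is the kind of hindsight that regret measures.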

21 The GFG game
A play here defines a run and a word. Eve wins the play if, whenever the word is accepted, her run is accepting.
[Figure: an automaton with labels a, b and x; the transition structure is not recoverable from the extraction.]

23 Relationships with other concepts
Good-for-games [HP06]: a quantitative automaton $\mathcal{A}$ is α-gfg if Simulator, against any word $x \in A^\omega$ spelled by Spoiler, can resolve non-determinism in $\mathcal{A}$ so that the resulting run has value $v$ with $d(\mathcal{A}(x), v) \le \alpha$, for some metric $d$.
Also generalizes determinisation by pruning of a refinement [AKL10].
Theorem: A quantitative automaton $\mathcal{A}$ is α-gfg if and only if Eve can ensure regret value at most α, w.r.t. metric $d$, in the regret game played on $\mathcal{A}$.

25 Results
Theorem (General case): For $r \in \mathbb{Q}$, an automaton $G$ and payoff functions inf, sup, lim inf, and lim sup, determining whether the regret value is less than $r$ is EXP-complete. For mean-payoff it is undecidable.
Theorem (Restricting the memory of Eve): For $r \in \mathbb{Q}$ and $m \in \mathbb{N}$, an automaton $G$ and any payoff function, determining whether Eve has a strategy using memory of at most $m$ to ensure regret value less than $r$ is in NTIME(m).

28 Results
Theorem (Hardness): For all payoff functions, computing the regret of a game is at least as hard as computing the antagonistic value of a (polynomial-size) game with the same payoff function.
Theorem (Algorithm): For all payoff functions, computing the regret reduces to computing the antagonistic value of a (polynomial-size) game with the same payoff function.

29 Any questions?
Summary
Adversary    sup      inf      lim sup  lim inf   MP
Any          poly-time equivalent to a regular game (all payoff functions)
Positional   PSPACE   PSPACE   PSPACE   PSPACE-c  PSPACE-c
Eloquent     EXP-c    EXP-c    EXP-c    EXP-c     undec.
Thank you for your attention.

32 From MP Regret games to MPGs
1. Label Adam edges as $w'(e) = $ [value lost in extraction] and Eve edges as follows: $w'(e) = \max\{\mathrm{cVal}^{v'} : (u, v') \in E \setminus \{e\}\}$.
2. Let $G^b$ be the restriction of $G$ to edges $e$ with $w'(e) \le b$.
[Figure: copies $G^{b_1}, \dots, G^{b_n}$ of the restricted arena, with edges conditioned on $w'(e) > b_i$; the remaining labels are not recoverable from the extraction.]

33 From MP Regret games to MPGs (example)
[Figure: an example arena over the vertices u, v, x and y; the edge weights are not reliably recoverable from the extraction.]

34 From QBF (3CNF) to MP Regret games (< 2)
[Figure: clause gadget with vertices labelled (C_1, x_i), (C_1, x_j) and (C_1, x_k); the edge weights are not reliably recoverable from the extraction.]

35 From QBF to MP Regret games
[Figure: reduction gadget with variable vertices x_1, ..., x_n, the formula Φ and clause vertices C_i, C_j; the layout is not recoverable from the extraction.]
