Reduced Complexity Approaches to Asymmetric Information Games
Reduced Complexity (cybaware) Approaches to Asymmetric Information Games
Jeff Shamma and Lichun Li, Georgia Institute of Technology
ARO MURI Annual Review, November 19, 2014
Research Thrust: Obtaining Actionable Cyber-Attack Forecasts
Today's talk covers two topics:
- Value iteration of repeated asymmetric games and its application in network interdiction problems.
- Resilience of LTE networks against smart jamming attacks.
Project structure
(Diagram.) Observations (NetFlow, probing, time analysis) and simulation/live security exercises feed analyses that maintain an up-to-date view of the cyber-assets and determine the dependencies between assets and missions. The resulting data drive a mission model, a cyber-assets model, and a characterization of attackers, which in turn support prediction of future actions (COAs), impact analysis of sensor alerts, and a semantically rich view of cyber-mission status.
Games with different information patterns
(Diagram.) Games are classified along two axes, player 1's information and player 2's information, and by information pattern, ranging from one-shot to repeated to Markovian.
Network Interdiction Problem: An Asymmetric Game
Channel 1 has capacity 10; channel 2 has capacity 1. The defender knows which channel has the high capacity and chooses which channel to use. The attacker's actions are:
- Observe which channel is in use, without being able to measure its capacity. This action is effortless.
- Block one of the channels. This action has a cost of 1.
The defender's goal is to transmit as much information as possible, the sooner the better.
Abstraction of the Game: A Discounted Asymmetric Repeated Game
An asymmetric repeated game consists of:
- Three finite sets: a state set S (i.e., is the high-capacity channel 1 or 2?), the defender's action set I (use channel 1 or 2?), and the attacker's action set J (observe, or block channel 1 or 2?).
- An initial belief (probability distribution) p_0 over the state s, e.g., [0.5; 0.5].
- A payoff function g : S × I × J → R, e.g., g(1, 1, 2) = 11.
The play rule: at stage 1, the state s is drawn according to p_0 and told to the defender only; both players then independently choose their actions, and both actions are announced. At stage 2 and every stage thereafter, both players independently choose their actions, and both actions are announced.
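A concrete encoding of this abstraction for the channel example can be sketched as follows. Note the assumptions: the slides confirm only g(1, 1, 2) = 11; the rest of the payoff table is reconstructed from the capacities (10 and 1) and the blocking cost of 1, credited to the defender in this zero-sum formulation.

```python
# State set S: which channel has the high capacity; action sets I and J.
S = (1, 2)                      # high-capacity channel is 1 or 2
I = (1, 2)                      # defender: use channel 1 or 2
J = ("observe", "block1", "block2")

p0 = {1: 0.5, 2: 0.5}           # initial belief over the state

def g(s, i, j):
    """Stage payoff to the defender (zero-sum); reconstructed, not from the slides."""
    cap = 10 if i == s else 1             # capacity of the channel the defender uses
    blocked = (j == "block%d" % i)        # attacker blocked the channel in use
    cost = 0 if j == "observe" else 1     # attacker's blocking cost, paid to defender
    return (0 if blocked else cap) + cost

print(g(1, 1, "block2"))        # matches the slides' example: 11
```

The single confirmed value g(1, 1, 2) = 11 (state 1, defender uses channel 1, attacker blocks channel 2) checks out: 10 transmitted plus the attacker's cost of 1.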
Discounted Asymmetric Repeated Games (continued)
Behavior strategies and discounted payoff:
- The defender's behavior strategy is σ : S × (I × J)^t → Δ(I), σ ∈ Σ.
- The attacker's behavior strategy is τ : (I × J)^t → Δ(J), τ ∈ T.
- The discounted payoff is γ_λ(p_0, σ, τ) = E_{p_0,σ,τ} [ Σ_{t=1}^∞ λ(1 − λ)^{t−1} g(s, i_t, j_t) ].
The λ-discounted asymmetric game Γ_λ(p_0) and its value:
- The λ-discounted game Γ_λ(p_0) is the repeated asymmetric game with initial distribution p_0, strategy spaces Σ and T, and payoff function γ_λ(p_0, σ, τ).
- The game value v_λ(p_0) exists when the lower and upper values coincide:
  v_λ(p_0) = sup_{σ∈Σ} inf_{τ∈T} γ_λ(p_0, σ, τ) = inf_{τ∈T} sup_{σ∈Σ} γ_λ(p_0, σ, τ).
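As an illustration (not from the slides): the discount weights λ(1 − λ)^{t−1} sum to one, so a pair of stationary strategies earns a discounted payoff equal to its expected stage payoff. A minimal numeric check, assuming stage-payoff matrices G^s reconstructed from the channel example (capacities 10 and 1, blocking cost 1 credited to the defender, consistent with g(1, 1, 2) = 11):

```python
import numpy as np

lam = 0.1  # discount parameter lambda

# Assumed stage-payoff matrices (rows: use channel 1/2;
# columns: observe, block channel 1, block channel 2).
G = {1: np.array([[10.0,  1.0, 11.0],
                  [ 1.0,  2.0,  1.0]]),
     2: np.array([[ 1.0,  1.0,  2.0],
                  [10.0, 11.0,  1.0]])}

x = np.array([0.5, 0.5])         # stationary defender mix over I
y = np.array([1/3, 1/3, 1/3])    # stationary attacker mix over J

stage = float(x @ G[1] @ y)      # expected stage payoff in state 1

# Discounted payoff: sum_t lam*(1-lam)**(t-1) * stage; weights sum to 1.
T = 500
weights = lam * (1 - lam) ** np.arange(T)
discounted = float(weights.sum() * stage)

print(round(weights.sum(), 6))   # ~1 (up to truncation at T stages)
print(round(stage, 4), round(discounted, 4))
```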
The KEY of the discounted asymmetric game
A smart attacker learns from the history of actions via the Bayes update
  p^+_s(p, x_t, i) = p_s x^s_t(i) / x̄_{p,x_t}(i),  where x̄_{p,x_t}(i) = Σ_{s∈S} p_s x^s_t(i).   (1)
- p^+: the current belief (at the beginning of stage t + 1) about the state.
- p: the previous belief (at the beginning of stage t) about the state.
- x_t: the previous probability distribution over the defender's action set, one distribution x^s_t per state s.
- i: the previous action the defender took.
Since both actions are announced, the defender fully monitors the attacker's learning. The game value exists and satisfies the recursive formula
  v_λ(p_0) = max_{x_t ∈ Δ(I)^S} min_{y_t ∈ Δ(J)} [ λ g(p_0, x_t, y_t) + (1 − λ) T_{p_0,x_t}(v_λ) ]
           = min_{y_t ∈ Δ(J)} max_{x_t ∈ Δ(I)^S} [ λ g(p_0, x_t, y_t) + (1 − λ) T_{p_0,x_t}(v_λ) ].
The defender's optimal strategy depends only on the attacker's belief.
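The Bayes update (1) can be sketched directly. An illustrative implementation (not from the slides), assuming x is stored as an |S| × |I| array whose row s is the defender's mixed action x^s_t:

```python
import numpy as np

def belief_update(p, x, i):
    """Bayes update of the attacker's belief after seeing defender action i.

    p : current belief over states, shape (|S|,)
    x : defender's strategy, x[s, a] = prob. of action a in state s
    i : observed defender action (index into I)
    """
    xbar = float(p @ x[:, i])          # total probability of seeing action i
    if xbar == 0.0:
        return p.copy()                # zero-probability action; belief unchanged
    return p * x[:, i] / xbar          # p_plus[s] = p[s] * x[s, i] / xbar

p0 = np.array([0.5, 0.5])
# Hypothetical strategy: defender uses the channel matching the state w.p. 0.8.
x = np.array([[0.8, 0.2],
              [0.2, 0.8]])
p1 = belief_update(p0, x, i=0)         # attacker saw channel 1 in use
print(p1)                              # belief tilts toward state 1
```

Fully revealing strategies (x^s concentrated on distinct actions) drive p^+ to a vertex after one observation, which is exactly what a belief-aware defender mixes to avoid.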
Value Iteration: A Learning Process
Value iteration:
  v^λ_{n+1}(p) = max_{x_n ∈ Δ(I)^S} min_{y_n ∈ Δ(J)} [ λ g(p, x_n, y_n) + (1 − λ) T_{p,x_n}(v^λ_n) ].
A functional M is a contraction if ‖M(f) − M(f̃)‖_sup ≤ a ‖f − f̃‖_sup for some a ∈ [0, 1).
- Q^v_x(p) = min_{y ∈ Δ(J)} { λ Σ_{s∈S} p_s (x^s)^T G^s y + (1 − λ) T_{p,x}(v) } is a contraction with contraction constant 1 − λ.
- H_v(p) = max_{x ∈ Δ(I)^S} Q^v_x(p) is a contraction with contraction constant 1 − λ.
The approximate value function v^λ_n therefore converges to v^λ exponentially with rate 1 − λ, i.e.,
  ‖v^λ − v^λ_{n+1}‖_sup ≤ (1 − λ) ‖v^λ − v^λ_n‖_sup.
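The geometric rate can be seen numerically. A toy sketch (illustration only: the real operator H involves a max-min over strategies at every belief p; here a scalar map with the same contraction constant 1 − λ stands in for it):

```python
lam = 0.1

# Toy contraction with constant 1 - lam: M(v) = lam*c + (1-lam)*v,
# whose fixed point is c.
c = 6.0
v, v_star = 0.0, c
errors = []
for n in range(20):
    errors.append(abs(v_star - v))
    v = lam * c + (1 - lam) * v

ratios = [errors[n + 1] / errors[n] for n in range(len(errors) - 1)]
print(ratios[:3])   # each step shrinks the error by the factor 1 - lam
```

With λ = 0.1 every iteration removes only 10% of the remaining error, which is why the suboptimal-policy error bound on the next slide matters in practice.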
Suboptimal Policy σ^λ_n and Its Error Bound
The suboptimal policy based on v^λ_n is
  σ^λ_n = arg max_{σ∈Σ} min_{y ∈ Δ(J)} [ λ g(p, σ(p), y) + (1 − λ) T_{p,σ(p)}(v^λ_n) ],
and its worst-case payoff is
  J_{σ^λ_n}(p) = min_τ γ_λ(p, σ^λ_n, τ).
The game value J_{σ^λ_n} induced by the suboptimal policy σ^λ_n satisfies
  ‖v^λ − J_{σ^λ_n}‖_sup ≤ (2(1 − λ)/λ) ‖v^λ − v^λ_n‖_sup ≤ (2(1 − λ)^{n+1}/λ) ‖v^λ − v^λ_0‖_sup.
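A quick illustration (not from the slides) of how this bound dictates the iteration count: with λ = 0.1 the factor 2(1 − λ)^{n+1}/λ shrinks by 0.9 per iteration, so one can solve for the smallest n guaranteeing a target accuracy. The bound on ‖v^λ − v^λ_0‖_sup below is a hypothetical placeholder, here taken as 1:

```python
lam = 0.1
v_gap = 1.0   # assumed bound on ||v - v_0||_sup (hypothetical)
eps = 0.01    # target accuracy

# Smallest n with 2*(1-lam)**(n+1)/lam * v_gap <= eps.
n = 0
while 2 * (1 - lam) ** (n + 1) / lam * v_gap > eps:
    n += 1
print(n)
```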
Linear Programming Formulation of v^λ_n(p)
Compute v^λ_1(p) = max_{x ∈ Δ(I)^S} min_{y ∈ Δ(J)} λ Σ_{s∈S} p_s (x^s)^T G^s y. This equals the LP
  max_{x,l} λ l
  s.t. Σ_{s∈S} p_s (G^s)^T x^s ≥ l 1,
       1^T x^s = 1, ∀s ∈ S,
       x^s ≥ 0, ∀s ∈ S.
With the change of variables z^s = p_s x^s, this becomes
  max_{z,l} λ l
  s.t. Σ_{s∈S} (G^s)^T z^s ≥ l 1,
       1^T z^s = p_s, ∀s ∈ S,
       z^s ≥ 0, ∀s ∈ S.
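The second LP can be handed to any LP solver. A sketch using `scipy.optimize.linprog`, with the payoff matrices an assumption reconstructed from the channel example (only g(1, 1, 2) = 11 is confirmed by the slides):

```python
import numpy as np
from scipy.optimize import linprog

lam, p = 0.1, np.array([0.5, 0.5])

# Assumed stage-payoff matrices G^s (rows: use channel 1/2;
# columns: observe, block channel 1, block channel 2).
G1 = np.array([[10.0,  1.0, 11.0],
               [ 1.0,  2.0,  1.0]])
G2 = np.array([[ 1.0,  1.0,  2.0],
               [10.0, 11.0,  1.0]])

# Variables: [z^1 (2), z^2 (2), l]. Maximize lam*l  <=>  minimize -lam*l.
c = np.array([0.0, 0.0, 0.0, 0.0, -lam])
# Column constraints: l*1 - (G1^T z^1 + G2^T z^2) <= 0.
A_ub = np.hstack([-G1.T, -G2.T, np.ones((3, 1))])
b_ub = np.zeros(3)
# Mass constraints: 1^T z^s = p_s.
A_eq = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0, 0.0]])
b_eq = p
bounds = [(0, None)] * 4 + [(None, None)]   # z >= 0, l free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
v1 = -res.fun                               # = lam * l*
print(round(v1, 4))
```

For this instance the one-shot value is l* = 6: in a single stage the defender loses nothing by revealing, so it simply uses the high-capacity channel, and the attacker's best reply (blocking a channel) yields 6 on average; hence v^λ_1(p) = λ · 6 = 0.6.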
Linear Programming Formulation of v^λ_n(p) (continued)
The n-stage value v^λ_n is computed by the sequence-form LP over histories h_t ∈ H_t:
  max_{z, l} Σ_{t=1}^n Σ_{h_t ∈ H_t} λ(1 − λ)^{t−1} l_t(h_t)
  s.t. for all t = 1, 2, ..., n, and all h_t ∈ H_t,
       Σ_{s∈S} (G^s)^T z^s_t(h_t) ≥ l_t(h_t) 1,
       1^T z^s_t(h_t) = z^s_{t−1}(h_{t−1})(i_{t−1}), ∀s ∈ S,
       z^s_t(h_t) ≥ 0, ∀s ∈ S.
Computational complexity: O(|S| |J| |I|^n), compared with O(|S| |J|^n |I|^n) for the extensive form.
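To see the gap concretely, an illustrative count (assuming the complexity expressions above, with the channel game's set sizes and the horizon n = 6 used in the application on the next slide):

```python
S, I, J, n = 2, 2, 3, 6   # |S|, |I|, |J|, horizon

sequence_form = S * J * I**n       # O(|S||J||I|^n): histories of defender actions only
extensive_form = S * J**n * I**n   # O(|S||J|^n|I|^n): histories of both players' actions
print(sequence_form, extensive_form)
```

Only the defender's actions move the belief, so the sequence form avoids enumerating the attacker's |J|^n action histories.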
Application in Network Interdiction Problems
Setup: channel 1 has capacity 10, channel 2 has capacity 1; in each state the defender chooses which channel to use, while the attacker observes, blocks channel 1, or blocks channel 2. We take p_0 = [0.5; 0.5] and λ = 0.1, and run a 100-stage simulation 10 times. At each stage, we update the current belief p, compute v^λ_6(p), σ^λ_6(p), and the worst-case payoff, and choose an action according to σ^λ_6(p). The worst-case payoff ranges from 3.61 to 3.97, with an average of 3.77. With only one channel, the defender has a reward of 1; adding a channel of capacity 1 thus gains the defender 2.77 more reward.
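The simulation protocol can be sketched as follows. Illustration only: the true policy σ^λ_6 comes from the LP machinery above, so a hypothetical placeholder policy and attacker stand in for it, and the payoff tensor is the assumed reconstruction with g(1, 1, 2) = 11:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n_stages = 0.1, 100

# Assumed stage payoffs G[s, i, j] (rows: use channel 1/2;
# columns: observe, block channel 1, block channel 2).
G = np.array([[[10.0,  1.0, 11.0],
               [ 1.0,  2.0,  1.0]],
              [[ 1.0,  1.0,  2.0],
               [10.0, 11.0,  1.0]]])

def policy(p, s):
    """Hypothetical stand-in for sigma^lambda_6(p): use the known-good
    channel with prob. 0.7, mixing to limit belief leakage."""
    return np.array([[0.7, 0.3],
                     [0.3, 0.7]])   # x[state, action]

def attacker(p):
    """Hypothetical stand-in adversary: block the channel the belief favors."""
    return 1 + int(np.argmax(p))    # 0 = observe, 1/2 = block channel 1/2

s = rng.integers(2)                 # true state drawn from p0 = [0.5, 0.5]
p = np.array([0.5, 0.5])            # attacker's belief
payoff = 0.0
for t in range(n_stages):
    x = policy(p, s)
    i = rng.choice(2, p=x[s])
    j = attacker(p)
    payoff += lam * (1 - lam) ** t * G[s, i, j]
    p = p * x[:, i] / (p @ x[:, i])  # Bayes update from the announced action
print(round(payoff, 3))
```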
Markovian Games (ongoing)
If the transition matrix depends only on the defender:
- The game value exists.
- Value iteration converges exponentially to the game value.
- An LP formulation computes the value iteration and the corresponding strategy.
If the transition matrix depends on both the defender and the attacker, the game value may not exist [1].
[1] D. Rosenberg, E. Solan, N. Vieille. Stochastic games with a single controller and incomplete information. SIAM J. Control Optim.
Project Progress
- Last year: LP formulation of repeated asymmetric games with finite horizon.
- This year: LP formulation of Markovian asymmetric games with finite horizon.
- This year: approximate policies for discounted repeated asymmetric games with infinite horizon.
- Ongoing: approximate policies for discounted Markovian asymmetric games with infinite horizon.
Thanks.
Reasoning with Uncertainty Markov Decision Models Manfred Huber 2015 1 Markov Decision Process Models Markov models represent the behavior of a random process, including its internal state and the externally
More informationRepeated Games. September 3, Definitions: Discounting, Individual Rationality. Finitely Repeated Games. Infinitely Repeated Games
Repeated Games Frédéric KOESSLER September 3, 2007 1/ Definitions: Discounting, Individual Rationality Finitely Repeated Games Infinitely Repeated Games Automaton Representation of Strategies The One-Shot
More informationApplications of short-time asymptotics to the statistical estimation and option pricing of Lévy-driven models
Applications of short-time asymptotics to the statistical estimation and option pricing of Lévy-driven models José Enrique Figueroa-López 1 1 Department of Statistics Purdue University CIMAT and Universidad
More informationWorst-case-expectation approach to optimization under uncertainty
Worst-case-expectation approach to optimization under uncertainty Wajdi Tekaya Joint research with Alexander Shapiro, Murilo Pereira Soares and Joari Paulo da Costa : Cambridge Systems Associates; : Georgia
More informationMicroeconomics II. CIDE, MsC Economics. List of Problems
Microeconomics II CIDE, MsC Economics List of Problems 1. There are three people, Amy (A), Bart (B) and Chris (C): A and B have hats. These three people are arranged in a room so that B can see everything
More informationMS&E 246: Lecture 2 The basics. Ramesh Johari January 16, 2007
MS&E 246: Lecture 2 The basics Ramesh Johari January 16, 2007 Course overview (Mainly) noncooperative game theory. Noncooperative: Focus on individual players incentives (note these might lead to cooperation!)
More informationMarkov Decision Processes: Making Decision in the Presence of Uncertainty. (some of) R&N R&N
Markov Decision Processes: Making Decision in the Presence of Uncertainty (some of) R&N 16.1-16.6 R&N 17.1-17.4 Different Aspects of Machine Learning Supervised learning Classification - concept learning
More informationSUCCESSIVE INFORMATION REVELATION IN 3-PLAYER INFINITELY REPEATED GAMES WITH INCOMPLETE INFORMATION ON ONE SIDE
SUCCESSIVE INFORMATION REVELATION IN 3-PLAYER INFINITELY REPEATED GAMES WITH INCOMPLETE INFORMATION ON ONE SIDE JULIAN MERSCHEN Bonn Graduate School of Economics, University of Bonn Adenauerallee 24-42,
More informationOptimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models
Optimally Thresholded Realized Power Variations for Lévy Jump Diffusion Models José E. Figueroa-López 1 1 Department of Statistics Purdue University University of Missouri-Kansas City Department of Mathematics
More informationReinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration
Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration Piyush Rai CS5350/6350: Machine Learning November 29, 2011 Reinforcement Learning Supervised Learning: Uses explicit supervision
More informationReinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration
Reinforcement Learning (1): Discrete MDP, Value Iteration, Policy Iteration Piyush Rai CS5350/6350: Machine Learning November 29, 2011 Reinforcement Learning Supervised Learning: Uses explicit supervision
More information91.420/543: Artificial Intelligence UMass Lowell CS Fall 2010
91.420/543: Artificial Intelligence UMass Lowell CS Fall 2010 Lecture 17 & 18: Markov Decision Processes Oct 12 13, 2010 A subset of Lecture 9 slides from Dan Klein UC Berkeley Many slides over the course
More informationBlack-Scholes and Game Theory. Tushar Vaidya ESD
Black-Scholes and Game Theory Tushar Vaidya ESD Sequential game Two players: Nature and Investor Nature acts as an adversary, reveals state of the world S t Investor acts by action a t Investor incurs
More informationSOLVING ROBUST SUPPLY CHAIN PROBLEMS
SOLVING ROBUST SUPPLY CHAIN PROBLEMS Daniel Bienstock Nuri Sercan Özbay Columbia University, New York November 13, 2005 Project with Lucent Technologies Optimize the inventory buffer levels in a complicated
More informationStochastic Games with 2 Non-Absorbing States
Stochastic Games with 2 Non-Absorbing States Eilon Solan June 14, 2000 Abstract In the present paper we consider recursive games that satisfy an absorbing property defined by Vieille. We give two sufficient
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Gittins Index: Discounted, Bayesian (hence Markov arms). Reduces to stopping problem for each arm. Interpretation as (scaled)
More informationOptimal structural policies for ambiguity and risk averse inventory and pricing models
Optimal structural policies for ambiguity and risk averse inventory and pricing models Xin Chen Peng Sun March 13, 2009 Abstract This paper discusses multi-period stochastic joint inventory and pricing
More information6. Martingales. = Zn. Think of Z n+1 as being a gambler s earnings after n+1 games. If the game if fair, then E [ Z n+1 Z n
6. Martingales For casino gamblers, a martingale is a betting strategy where (at even odds) the stake doubled each time the player loses. Players follow this strategy because, since they will eventually
More informationCS 331: Artificial Intelligence Game Theory I. Prisoner s Dilemma
CS 331: Artificial Intelligence Game Theory I 1 Prisoner s Dilemma You and your partner have both been caught red handed near the scene of a burglary. Both of you have been brought to the police station,
More informationStrategies and Nash Equilibrium. A Whirlwind Tour of Game Theory
Strategies and Nash Equilibrium A Whirlwind Tour of Game Theory (Mostly from Fudenberg & Tirole) Players choose actions, receive rewards based on their own actions and those of the other players. Example,
More informationAnumericalalgorithm for general HJB equations : a jump-constrained BSDE approach
Anumericalalgorithm for general HJB equations : a jump-constrained BSDE approach Nicolas Langrené Univ. Paris Diderot - Sorbonne Paris Cité, LPMA, FiME Joint work with Idris Kharroubi (Paris Dauphine),
More informationOption pricing in the stochastic volatility model of Barndorff-Nielsen and Shephard
Option pricing in the stochastic volatility model of Barndorff-Nielsen and Shephard Indifference pricing and the minimal entropy martingale measure Fred Espen Benth Centre of Mathematics for Applications
More informationGame Theory for Wireless Engineers Chapter 3, 4
Game Theory for Wireless Engineers Chapter 3, 4 Zhongliang Liang ECE@Mcmaster Univ October 8, 2009 Outline Chapter 3 - Strategic Form Games - 3.1 Definition of A Strategic Form Game - 3.2 Dominated Strategies
More informationOptimal Order Placement
Optimal Order Placement Peter Bank joint work with Antje Fruth OMI Colloquium Oxford-Man-Institute, October 16, 2012 Optimal order execution Broker is asked to do a transaction of a significant fraction
More information