CS 7180: Behavioral Modeling and Decision-making in AI

Algorithmic Game Theory
Prof. Amy Sliva
November 30, 2012

Prisoner's dilemma
Two criminals are arrested, and each is offered the same deal:
- If you defect and the other doesn't, you go free and they go to jail
- If both cooperate with each other, then both do a shorter stint in jail
- If both defect, then they both get a medium jail sentence
              Cooperate   Defect
  Cooperate   -1, -1      -4, 0
  Defect       0, -4      -3, -3
Should you cooperate? Or defect?

Games in normal form
A (finite, n-person) normal-form game includes the following:
1. An ordered set $N = (1, 2, 3, \ldots, n)$ of agents or players
2. Each agent $i$ has a finite set $A_i$ of possible actions
   - An action profile is an n-tuple $a = (a_1, a_2, \ldots, a_n)$ such that $a_1 \in A_1, a_2 \in A_2, \ldots, a_n \in A_n$
   - The set of all possible action profiles is $A = A_1 \times \cdots \times A_n$
3. Each agent $i$ has a real-valued utility (or payoff) function $u_i(a_1, \ldots, a_n)$ = $i$'s payoff if the action profile is $(a_1, \ldots, a_n)$
Most other game representations can be reduced to normal form.
A normal-form game is usually represented by an n-dimensional payoff (or utility) matrix: for each action profile, it shows the utilities of all the agents (the row player's payoff is listed first).
              Cooperate   Defect
  Cooperate   -1, -1      -4, 0
  Defect       0, -4      -3, -3
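To make the definition concrete, here is a minimal Python sketch of a finite normal-form game using the prisoner's dilemma payoffs above. The names `ACTIONS`, `PAYOFFS`, `action_profiles`, and `utility` are illustrative choices, not part of any library; payoffs are stored per action profile, with player 1 (the row player) first.

```python
from itertools import product

# One action set A_i per player, and a payoff tuple (u_1, ..., u_n)
# for every action profile.
ACTIONS = [("Cooperate", "Defect"), ("Cooperate", "Defect")]
PAYOFFS = {
    ("Cooperate", "Cooperate"): (-1, -1),
    ("Cooperate", "Defect"): (-4, 0),
    ("Defect", "Cooperate"): (0, -4),
    ("Defect", "Defect"): (-3, -3),
}


def action_profiles(actions):
    """Return A = A_1 x ... x A_n as a list of n-tuples."""
    return list(product(*actions))


def utility(i, profile):
    """u_i(a_1, ..., a_n): payoff of player i (0-indexed) under a profile."""
    return PAYOFFS[profile][i]


for a in action_profiles(ACTIONS):
    print(a, [utility(i, a) for i in range(len(ACTIONS))])
```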

General form of the prisoner's dilemma
The general form of the prisoner's dilemma requires a certain preference ordering:
              Cooperate   Defect
  Cooperate   a, a        b, c
  Defect      c, b        d, d
where $c > a > d > b$ and $2a > b + c$
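The two ordering conditions can be checked mechanically. This sketch assumes the symmetric layout above (`a` for mutual cooperation, `d` for mutual defection, `c` for the defector's payoff against a cooperator, `b` for the cooperator's payoff against a defector); the function name is hypothetical. The second call uses the values 3, 0, 5, 1 that are often used in iterated prisoner's dilemma tournaments.

```python
def is_prisoners_dilemma(a, b, c, d):
    """Check the general PD conditions for the symmetric matrix
        (C,C) = (a, a), (C,D) = (b, c), (D,C) = (c, b), (D,D) = (d, d),
    namely c > a > d > b and 2a > b + c."""
    return c > a > d > b and 2 * a > b + c


print(is_prisoners_dilemma(a=-1, b=-4, c=0, d=-3))  # True (the slides' matrix)
print(is_prisoners_dilemma(a=3, b=0, c=5, d=1))     # True
print(is_prisoners_dilemma(a=1, b=0, c=0, d=1))     # False (coordination, not PD)
```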

Common-payoff games
Common-payoff game: for all action profiles $a \in A_1 \times \cdots \times A_n$ and any agents $i$ and $j$, $u_i(a) = u_j(a)$
- Agents have the same payoff for every action profile
- Pure coordination or team games: no conflicting interests
Example: Drivers heading directly towards each other in a country with no traffic laws. Which way should each swerve to avoid the other car? If they make the same coordinated decision, then all is well! Otherwise...
          Left    Right
  Left    1, 1    0, 0
  Right   0, 0    1, 1

Zero-sum games
Purely competitive games!
Constant-sum game: for every action profile, the sum of the payoffs is the same, i.e., there is a constant $c$ such that for every action profile $a = (a_1, \ldots, a_n)$, $u_1(a) + \cdots + u_n(a) = c$
- Any constant-sum game can be transformed into an equivalent game in which the sum of the payoffs is always 0
- Thus constant-sum games are usually called zero-sum games
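As a sketch (with hypothetical helper names), a game stored as a map from action profiles to payoff tuples can be tested for the common-payoff and constant-sum properties, and a constant-sum game can be shifted to zero-sum by subtracting $c/n$ from every payoff. Subtracting a constant from a player's payoffs does not change that player's preferences, which is why the shifted game is equivalent. The "two coins" game below, whose payoffs always sum to 2, is an invented example.

```python
from fractions import Fraction


def payoff_sums(game):
    """Set of sums u_1(a) + ... + u_n(a) over all action profiles a."""
    return {sum(u) for u in game.values()}


def is_common_payoff(game):
    """All players receive the same payoff in every profile."""
    return all(len(set(u)) == 1 for u in game.values())


def is_constant_sum(game):
    return len(payoff_sums(game)) == 1


def to_zero_sum(game):
    """Subtract c/n from every payoff so that every profile sums to 0."""
    (c,) = payoff_sums(game)  # raises ValueError if not constant-sum
    n = len(next(iter(game.values())))
    shift = Fraction(c, n)
    return {a: tuple(x - shift for x in u) for a, u in game.items()}


TWO_COINS = {("H", "H"): (2, 0), ("H", "T"): (0, 2),
             ("T", "H"): (0, 2), ("T", "T"): (2, 0)}
ROAD = {("L", "L"): (1, 1), ("L", "R"): (0, 0),
        ("R", "L"): (0, 0), ("R", "R"): (1, 1)}
print(is_constant_sum(TWO_COINS), is_common_payoff(TWO_COINS))  # True False
print(is_constant_sum(ROAD), is_common_payoff(ROAD))            # False True
print(to_zero_sum(TWO_COINS)[("H", "H")] == (1, -1))            # True
```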

Matching pennies game
Each player has a penny and decides to show either heads or tails. If they are the same, player 1 gets both pennies; otherwise player 2 takes them both.
          Heads    Tails
  Heads   1, -1    -1, 1
  Tails   -1, 1    1, -1

Rochambo (rock, paper, scissors)
If both players choose the same action, there is no payoff. Otherwise, one action wins and the other loses.
             Rock     Paper    Scissors
  Rock       0, 0     -1, 1    1, -1
  Paper      1, -1    0, 0     -1, 1
  Scissors   -1, 1    1, -1    0, 0

Or any Rochambo variant: rock, paper, scissors, lizard, Spock
- Scissors cuts paper
- Paper covers rock
- Rock crushes lizard
- Lizard poisons Spock
- Spock smashes scissors
- Scissors decapitates lizard
- Lizard eats paper
- Paper disproves Spock
- Spock vaporizes rock
- Rock crushes scissors
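The ten rules above fully determine a zero-sum payoff matrix. A small sketch that builds it from the rule list (verbs dropped; `BEATS`, `ACTIONS`, and `payoff` are illustrative names) and lets you confirm that every action beats exactly two others and loses to two:

```python
# "X beats Y" pairs, in the order of the rules above.
BEATS = [
    ("Scissors", "Paper"), ("Paper", "Rock"), ("Rock", "Lizard"),
    ("Lizard", "Spock"), ("Spock", "Scissors"), ("Scissors", "Lizard"),
    ("Lizard", "Paper"), ("Paper", "Spock"), ("Spock", "Rock"),
    ("Rock", "Scissors"),
]
ACTIONS = ["Rock", "Paper", "Scissors", "Lizard", "Spock"]


def payoff(a, b):
    """Row player's payoff: +1 win, -1 loss, 0 tie (column gets the negative)."""
    if (a, b) in BEATS:
        return 1
    if (b, a) in BEATS:
        return -1
    return 0


matrix = [[payoff(a, b) for b in ACTIONS] for a in ACTIONS]
for a, row in zip(ACTIONS, matrix):
    print(f"{a:>8}", row)
```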

Most games are non-constant-sum
In general, games involve both cooperation and competition
- The prisoner's dilemma is one example
- Battle of the sexes: a husband and wife deciding on a movie. He wants to see an action movie (AM) and she wants to see a romantic comedy (RC). They both prefer to go together! (Rows: husband, columns: wife; the husband's payoff is listed first.)
          AM      RC
  AM      2, 1    0, 0
  RC      0, 0    1, 2

Strategies in normal form games
- Pure strategy: select a single action and play it
  - Each row/column of a payoff matrix represents a pure strategy
- Mixed strategy: randomize over the set of available actions according to some probability distribution
  - $s_i(a_j)$ = probability that action $a_j$ will be played in mixed strategy $s_i$
  - The support of $s_i$ = {actions that have probability > 0 in $s_i$}
  - A pure strategy is the special case where the support is a single action
  - A strategy $s_i$ is fully mixed if its support is $A_i$, i.e., nonzero probability for every action available to agent $i$
- Strategy profile: an n-tuple $s = (s_1, \ldots, s_n)$ of strategies, one for each agent

Expected utility in a mixed strategy game
The payoff matrix only gives payoffs for pure-strategy profiles. The generalization to mixed strategies uses expected utility:
- Calculate the probability of each outcome, given the strategy profile (involves all agents)
- Then calculate the average payoff for agent $i$, weighted by the probabilities
Given strategy profile $s = (s_1, \ldots, s_n)$, the expected utility is the sum, over all action profiles, of the utility times its probability:
$u_i(s) = \sum_{a \in A} u_i(a) \Pr[a \mid s]$, i.e., $u_i(s_1, \ldots, s_n) = \sum_{(a_1, \ldots, a_n) \in A} u_i(a_1, \ldots, a_n) \prod_{j=1}^{n} s_j(a_j)$
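The expected-utility formula translates directly into code. A minimal sketch, assuming each mixed strategy is a dict from actions to probabilities and payoffs are keyed by action profile (all names are hypothetical); exact `Fraction` arithmetic avoids rounding:

```python
from fractions import Fraction
from itertools import product


def expected_utility(i, strategies, payoffs):
    """u_i(s) = sum over profiles a of u_i(a) * prod_j s_j(a_j).

    strategies: list of dicts, one per player, mapping action -> probability
    payoffs:    dict mapping action profile (tuple) -> tuple of payoffs
    """
    total = 0
    for profile in product(*(s.keys() for s in strategies)):
        prob = 1
        for s_j, a_j in zip(strategies, profile):
            prob *= s_j[a_j]  # players randomize independently
        total += payoffs[profile][i] * prob
    return total


def support(s):
    """Actions played with positive probability in mixed strategy s."""
    return {a for a, p in s.items() if p > 0}


PD = {("C", "C"): (-1, -1), ("C", "D"): (-4, 0),
      ("D", "C"): (0, -4), ("D", "D"): (-3, -3)}
half = {"C": Fraction(1, 2), "D": Fraction(1, 2)}
print(expected_utility(0, [half, half], PD))          # -2
print(support({"C": Fraction(1), "D": Fraction(0)}))  # {'C'}
```

When both players mix 50/50 in the prisoner's dilemma, player 1's expected utility is (-1 - 4 + 0 - 3)/4 = -2; a pure strategy is just a dict with a single nonzero entry.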

How to reason about games
- In single-agent decision theory, look at an optimal strategy: maximize the agent's expected payoff in its environment
- With multiple agents, the best strategy depends on the others' choices
- Deal with this by identifying certain subsets of outcomes called solution concepts
Today we will discuss two concepts: Pareto optimality and Nash equilibrium

Pareto optimality
A strategy profile $s$ Pareto dominates a strategy profile $s'$ if
- No agent gets a worse payoff with $s$ than with $s'$, i.e., $u_i(s) \ge u_i(s')$ for all $i$
- At least one agent gets a better payoff with $s$ than with $s'$, i.e., $u_i(s) > u_i(s')$ for at least one $i$
A strategy profile $s$ is Pareto optimal (or Pareto efficient) if there is no strategy profile $s'$ that Pareto dominates $s$
- Every game has at least one Pareto optimal profile
- There is always at least one Pareto optimal profile that uses only pure strategies (e.g., a pure profile that maximizes the sum of the payoffs)

Examples
Prisoner's dilemma
- (D,C) is Pareto optimal: no profile gives player 1 a higher payoff
- (C,D) is Pareto optimal: no profile gives player 2 a higher payoff
- (C,C) is Pareto optimal: no other profile gives one player a higher payoff without lowering the other's
- (D,D) isn't Pareto optimal: (C,C) Pareto dominates it
              Cooperate   Defect
  Cooperate   -1, -1      -4, 0
  Defect       0, -4      -3, -3
Which side of the road
- (Left,Left) and (Right,Right) are Pareto optimal
- In common-payoff games, all Pareto optimal strategy profiles have the same payoffs
- If (Left,Left) had payoffs (2,2), then (Right,Right) wouldn't be Pareto optimal
          Left    Right
  Left    1, 1    0, 0
  Right   0, 0    1, 1
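The Pareto-dominance test can be run over all pure profiles. This sketch (hypothetical helper names) only compares pure profiles against each other, which is enough to reproduce the examples above:

```python
def pareto_dominates(u, v):
    """True if payoff vector u Pareto dominates payoff vector v."""
    return (all(x >= y for x, y in zip(u, v))
            and any(x > y for x, y in zip(u, v)))


def pareto_optimal(payoffs):
    """Pure profiles not Pareto dominated by any other pure profile."""
    return [a for a, u in payoffs.items()
            if not any(pareto_dominates(v, u) for v in payoffs.values())]


PD = {("C", "C"): (-1, -1), ("C", "D"): (-4, 0),
      ("D", "C"): (0, -4), ("D", "D"): (-3, -3)}
ROAD = {("L", "L"): (1, 1), ("L", "R"): (0, 0),
        ("R", "L"): (0, 0), ("R", "R"): (1, 1)}
print(pareto_optimal(PD))    # [('C', 'C'), ('C', 'D'), ('D', 'C')]
print(pareto_optimal(ROAD))  # [('L', 'L'), ('R', 'R')]
```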

Best response to other agents' behavior
Suppose agent $i$ knows how the others are going to play. Then $i$ has an optimization problem: maximize expected utility.
- Use $s_{-i}$ to mean a strategy profile for all of the agents except $i$: $s_{-i} = (s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_n)$
- Let $s_i$ be any strategy for agent $i$: $(s_i, s_{-i}) = (s_1, \ldots, s_{i-1}, s_i, s_{i+1}, \ldots, s_n)$
- $s_i$ is a best response to $s_{-i}$ if for every strategy $s_i'$ available to agent $i$, $u_i(s_i, s_{-i}) \ge u_i(s_i', s_{-i})$
There is always at least one best response.
A best response $s_i$ is unique if $u_i(s_i, s_{-i}) > u_i(s_i', s_{-i})$ for every $s_i' \ne s_i$
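With finitely many actions, best responses can be found by comparing pure actions: a mixed strategy is a best response exactly when every action in its support is a pure best response. The sketch below (hypothetical names) returns the row player's pure best responses to a column mix `q`, using the battle of the sexes with the husband as row player (actions ordered AM, RC):

```python
from fractions import Fraction


def pure_best_responses(row_payoffs, q):
    """Indices of the row player's pure actions that maximize expected payoff
    against the column player's mixed strategy q (list of probabilities)."""
    values = [sum(q_c * row[c] for c, q_c in enumerate(q)) for row in row_payoffs]
    best = max(values)
    return [r for r, v in enumerate(values) if v == best]


# Husband's payoffs in the battle of the sexes (rows and columns: AM, RC).
HUSBAND = [[2, 0],
           [0, 1]]
print(pure_best_responses(HUSBAND, [Fraction(1, 3), Fraction(2, 3)]))  # [0, 1]
print(pure_best_responses(HUSBAND, [Fraction(1, 2), Fraction(1, 2)]))  # [0]
```

In the first call the husband is indifferent, so any mixture of AM and RC is a best response; in the second, AM is the unique best response.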

Nash equilibrium
In general, an agent will not know for sure what strategies the other agents will adopt.
- On its own, best response is not a solution concept, because in the general case it does not identify an interesting set of outcomes
- Leverage best response to define another solution concept
$s = (s_1, \ldots, s_n)$ is a Nash equilibrium if for every $i$, $s_i$ is a best response to $s_{-i}$
- Every agent's strategy is a best response to the other agents' strategies
- No agent can do better by unilaterally changing his/her strategy
Theorem (Nash, 1951): Every game with a finite number of agents and action profiles has at least one Nash equilibrium (possibly in mixed strategies)

Examples of Nash equilibria
Which side of the road
- (Left,Left) and (Right,Right) are Nash equilibria
          Left    Right
  Left    1, 1    0, 0
  Right   0, 0    1, 1
Prisoner's dilemma
- (D,D) is a Nash equilibrium
- Ironically, it's the only pure-strategy profile that isn't Pareto optimal
              Cooperate   Defect
  Cooperate   -1, -1      -4, 0
  Defect       0, -4      -3, -3

Strict Nash equilibrium
A Nash equilibrium $s = (s_1, \ldots, s_n)$ is strict if for every $i$, $s_i$ is the only best response to $s_{-i}$
- i.e., any agent who unilaterally changes strategy will do worse!
- Recall that if a best response is unique, it must be pure (if a mixed strategy is a best response, every action in its support is one too)
- It follows that in a strict Nash equilibrium, all strategies are pure!
- But if a Nash equilibrium is pure, it isn't necessarily strict

Weak Nash equilibrium
If a Nash equilibrium $s$ is not strict, then it is weak
- At least one agent $i$ has more than one best response to $s_{-i}$
If a Nash equilibrium includes a mixed strategy, it is weak
- If a mixture of $k \ge 2$ actions is a best response to $s_{-i}$, then any other mixture of the actions is also a best response
If a Nash equilibrium consists only of pure strategies, it might still be weak
Weak Nash equilibria are less stable than strict Nash equilibria
- If a Nash equilibrium is weak, then at least one agent has infinitely many best responses, and only one of them is in $s$
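For pure profiles, both the equilibrium test and the strict/weak distinction only require comparing against unilateral pure deviations, since a mixed deviation's payoff is an average of pure ones. A sketch for 2-player games stored as `G[r][c] = (u1, u2)`; the function name is hypothetical, and mixed equilibria are not searched for:

```python
def pure_nash_equilibria(G):
    """Return (r, c, 'strict' | 'weak') for every pure Nash equilibrium."""
    rows, cols = len(G), len(G[0])
    result = []
    for r in range(rows):
        for c in range(cols):
            u1, u2 = G[r][c]
            # Payoffs each player would get from unilateral pure deviations.
            dev1 = [G[r2][c][0] for r2 in range(rows) if r2 != r]
            dev2 = [G[r][c2][1] for c2 in range(cols) if c2 != c]
            if all(u1 >= x for x in dev1) and all(u2 >= y for y in dev2):
                strict = all(u1 > x for x in dev1) and all(u2 > y for y in dev2)
                result.append((r, c, "strict" if strict else "weak"))
    return result


PD = [[(-1, -1), (-4, 0)], [(0, -4), (-3, -3)]]
MP = [[(1, -1), (-1, 1)], [(-1, 1), (1, -1)]]
WEAK = [[(1, 1), (1, 1)], [(0, 0), (0, 0)]]
print(pure_nash_equilibria(PD))    # [(1, 1, 'strict')]
print(pure_nash_equilibria(MP))    # []
print(pure_nash_equilibria(WEAK))  # [(0, 0, 'weak'), (0, 1, 'weak')]
```

The `WEAK` game is an invented example of a pure equilibrium that is only weak: the column player is indifferent between its two actions.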

Finding mixed-strategy Nash equilibria
In general, it's tricky to compute mixed-strategy Nash equilibria
- But it's easier if we can identify the support of the equilibrium strategies
In 2x2 games, we can do this easily!
- Let $(s_1, s_2)$ be a Nash equilibrium
- If $s_1$ is not pure, its support must include both of agent 1's actions, and they must have the same expected utility
  - Otherwise agent 1's best response would be just one of them
- Find $s_2$ such that $u_1(a_1, s_2) = u_1(a_1', s_2)$, where $a_1, a_1'$ are agent 1's actions
  - Two equations (this indifference condition and $s_2(a_2) + s_2(a_2') = 1$), two unknowns
- Similarly, find $s_1$ such that $u_2(s_1, a_2) = u_2(s_1, a_2')$
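For a 2x2 game the two indifference conditions have a closed-form solution. A sketch, assuming a nondegenerate game (it returns `None` when a denominator vanishes or a probability falls outside (0, 1)); `A` and `B` hold the row and column player's payoffs, and the function name is hypothetical:

```python
from fractions import Fraction


def mixed_equilibrium_2x2(A, B):
    """Fully mixed NE of a 2x2 game via the indifference conditions.

    Returns (p, q) with p = P(row plays action 0), q = P(column plays
    action 0), or None if no fully mixed equilibrium is found.
    """
    # Row player indifferent between its actions:
    #   A00 q + A01 (1 - q) = A10 q + A11 (1 - q)  ->  solve for q.
    den_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    # Column player indifferent between its actions:
    #   B00 p + B10 (1 - p) = B01 p + B11 (1 - p)  ->  solve for p.
    den_p = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    if den_q == 0 or den_p == 0:
        return None
    q = Fraction(A[1][1] - A[0][1], den_q)
    p = Fraction(B[1][1] - B[1][0], den_p)
    if not (0 < p < 1 and 0 < q < 1):
        return None
    return p, q


# Battle of the sexes, husband = row player, actions ordered AM, RC.
print(mixed_equilibrium_2x2([[2, 0], [0, 1]], [[1, 0], [0, 2]]))
print(mixed_equilibrium_2x2([[-1, -4], [0, -3]], [[-1, 0], [-4, -3]]))  # None
```

For the battle of the sexes this gives p = 2/3 (husband plays AM) and q = 1/3 (wife plays AM), which matches the derivation on the following slides; the prisoner's dilemma has no fully mixed equilibrium.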

Finding a mixed strategy in battle of the sexes
If there's a mixed-strategy equilibrium:
- Both strategies must be mixtures of {ActionMovie, RomanticComedy}
- Each must be a best response to the other
          AM      RC
  AM      2, 1    0, 0
  RC      0, 0    1, 2
Suppose the husband's strategy is $s_h = \{(p, RC), (1 - p, AM)\}$
Expected utilities of the wife's actions: $u_w(RC, s_h) = 2p$; $u_w(AM, s_h) = 1(1 - p)$
If the wife mixes the two actions, they must have the same expected utility
- Otherwise the best response would be to always use the action whose expected utility is higher
Thus $2p = 1 - p$, so $p = 1/3$
So the husband's mixed strategy is $s_h = \{(1/3, RC), (2/3, AM)\}$

Finding a mixed strategy in battle of the sexes
Similarly, we can show the wife's mixed strategy is $s_w = \{(2/3, RC), (1/3, AM)\}$
So the mixed-strategy Nash equilibrium is $(s_w, s_h)$, where
  $s_w = \{(2/3, RC), (1/3, AM)\}$
  $s_h = \{(1/3, RC), (2/3, AM)\}$
          AM      RC
  AM      2, 1    0, 0
  RC      0, 0    1, 2
Like all mixed-strategy Nash equilibria, $(s_w, s_h)$ is weak
- Both players have infinitely many other best-response strategies
- What are they?
How do we know that $(s_w, s_h)$ really is a Nash equilibrium?

Fairness of a Nash equilibrium
$s_w = \{(2/3, RC), (1/3, AM)\}$, $s_h = \{(1/3, RC), (2/3, AM)\}$
Outcome probabilities (rows: husband, columns: wife):
          AM                  RC
  AM      2/3 * 1/3 = 2/9     2/3 * 2/3 = 4/9
  RC      1/3 * 1/3 = 1/9     1/3 * 2/3 = 2/9
The wife's expected utility is 2(2/9) + 1(2/9) + 0(5/9) = 2/3; the husband's expected utility is also 2/3
- This is fair in the sense that both players have the same expected payoff
- But it's Pareto-dominated by both of the pure-strategy equilibria: in each of them, one agent gets 1 and the other gets 2
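The fairness calculation can be reproduced with exact fractions. This sketch encodes the payoffs as (husband action, wife action) -> (husband payoff, wife payoff) and multiplies the independent mixing probabilities; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Battle of the sexes: (husband action, wife action) -> (u_h, u_w)
PAYOFFS = {
    ("AM", "AM"): (2, 1), ("AM", "RC"): (0, 0),
    ("RC", "AM"): (0, 0), ("RC", "RC"): (1, 2),
}
s_h = {"RC": Fraction(1, 3), "AM": Fraction(2, 3)}
s_w = {"RC": Fraction(2, 3), "AM": Fraction(1, 3)}

# Probability of each outcome = product of the independent mixing probabilities.
outcome_prob = {(h, w): s_h[h] * s_w[w] for h, w in product(s_h, s_w)}
u_h = sum(p * PAYOFFS[o][0] for o, p in outcome_prob.items())
u_w = sum(p * PAYOFFS[o][1] for o, p in outcome_prob.items())
print(u_h, u_w)  # 2/3 2/3
```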

Nash equilibrium in matching pennies
It is easy to see that in this game, no pure strategy could be part of a Nash equilibrium
- For each combination of pure strategies, one of the agents can do better by changing his/her strategy
- Thus there is no strict Nash equilibrium
          Heads    Tails
  Heads   1, -1    -1, 1
  Tails   -1, 1    1, -1
But again there is a mixed-strategy equilibrium
- It can be derived the same way as in the battle of the sexes
- The result is $(s, s)$, where $s = \{(1/2, Heads), (1/2, Tails)\}$

What does a mixed strategy really mean?
Are people actually computing probability distributions in their heads?
- Some researchers say yes!
Another interpretation is that agent $i$'s mixed strategy is everyone else's assessment of how likely $i$ is to play each pure strategy
- Suppose agent $i$ has a deterministic method for picking a strategy, but it depends on factors that are not part of the game itself
- If $i$ plays a game several times, $i$ may pick different strategies
- If the other players do not know how $i$ picks a strategy, they will be uncertain what $i$'s strategy will be
Example: In a series of soccer penalty kicks, the kicker could kick left or right in a deterministic pattern that the goalie thinks is random

Games, strategies, and computations
Games are strategic interactions between rational entities
Solution concepts: what is going to happen?
- Pareto optimality
- Nash equilibrium
But the big question is: can it be computed?
- Can we predict what will happen in a large system (i.e., many players and actions)?
- Can we do this efficiently in a large system?
- If your computer cannot find a solution, the market/agents probably cannot either

Controlling global pollution
Pollution can be modeled as a many-player version of the prisoner's dilemma
- $n$ countries; each has two actions:
  - Pass legislation to control pollution: costs 3
  - Continue polluting: costs 1 for ALL countries
- It costs more to control pollution than not to!
- If $k$ countries are polluters, each polluter pays a total cost of $k$
- Those with pollution laws pay $3 + k$, since they also have to pay for their own controls
Stable solution: no countries control pollution, and all pay a cost of $n$ (i.e., all defect)
- If all had controlled, then each would only have to pay 3
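A quick check of the pollution game as described, with a hypothetical `cost` helper and an arbitrary choice of n = 10 countries: whatever the others do, polluting is cheaper by exactly 2, so all-pollute is the stable outcome even though all-control is better for everyone whenever n > 3.

```python
def cost(controls, other_polluters):
    """Cost to one country, given whether it controls pollution and how many
    *other* countries keep polluting (each polluter costs every country 1)."""
    if controls:
        return 3 + other_polluters  # pays for its own controls
    return 1 + other_polluters      # its own pollution also costs it 1


n = 10  # arbitrary number of countries for illustration
# Whatever the others do, polluting is cheaper by exactly 2: it dominates.
assert all(cost(False, k) == cost(True, k) - 2 for k in range(n))
all_pollute = cost(False, n - 1)  # each country pays n
all_control = cost(True, 0)       # each country pays 3
print(all_pollute, all_control)   # 10 3
```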

Tragedy of the commons
Sharing a common resource: when all agents play their selfish equilibrium strategies (best responses to the others), the resource is depleted
Network bandwidth is a tradeoff
- $n$ users share a communication channel with maximum capacity 1
- Each user $i$ needs a strategy to send $x_i$ units of flow, where $x_i \in [0, 1]$
- Each user wants to use a large fraction of the channel, but quality deteriorates with the total bandwidth used
- If the total bandwidth $\sum_j x_j > 1$, then no one gets any benefit
- If $\sum_j x_j < 1$, then the payoff for $i$ is $x_i (1 - \sum_j x_j)$
What is the equilibrium strategy for each user?

Tragedy of the commons (continued)
What is the equilibrium strategy for each user?
- Let $t = \sum_{j \ne i} x_j < 1$ be the bandwidth used by all other users
- User $i$ maximizes $x_i (1 - t - x_i)$, which gives the best response $x_i = (1 - t)/2$
- In the symmetric equilibrium, $x_i = 1/(n+1)$ for all $i$, and each user's payoff is $1/(n+1)^2$
Why is this solution a tragedy?
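The equilibrium and the "tragedy" can be checked with exact arithmetic. In this sketch (hypothetical names; n = 4 chosen arbitrarily), the symmetric equilibrium gives a total payoff of $n/(n+1)^2$, which shrinks roughly like $1/n$, while splitting a total of 1/2 evenly, which maximizes the total payoff $T(1 - T)$, gives 1/4 regardless of $n$:

```python
from fractions import Fraction


def payoff(x_i, t):
    """User i's payoff for sending x_i when the others send t in total."""
    total = x_i + t
    return x_i * (1 - total) if total < 1 else 0


def best_response(t):
    """Maximizer of x_i * (1 - t - x_i): derivative 1 - t - 2 x_i = 0."""
    return (1 - t) / 2


def symmetric_equilibrium(n):
    """x_i = 1/(n+1) for every user, and the resulting payoff per user."""
    x = Fraction(1, n + 1)
    return x, payoff(x, (n - 1) * x)


n = 4
x, u = symmetric_equilibrium(n)
assert best_response((n - 1) * x) == x  # no user wants to deviate
print(x, u, n * u)                      # 1/5 1/25 4/25
opt = Fraction(1, 2 * n)                # total use 1/2, split evenly
print(n * payoff(opt, (n - 1) * opt))   # 1/4
```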

Interpretations of mixed strategies
- Agent $i$'s mixed strategy is everyone else's assessment of how likely $i$ is to play each pure strategy
  - Suppose agent $i$ has a deterministic method for picking a strategy, but it depends on factors that are not part of the game itself
  - If $i$ plays a game several times, $i$ may pick different strategies
  - If the other players do not know how $i$ picks a strategy, they will be uncertain what $i$'s strategy will be
- The proportion in which each strategy occurs in large populations
  - In battle of the sexes: the proportion of strategies over large samples of men and women
  - The fraction of polluters and non-polluters

Nash equilibrium
$s = (s_1, \ldots, s_n)$ is a Nash equilibrium if for every $i$, $s_i$ is a best response to $s_{-i}$
- Every agent's strategy is a best response to the other agents' strategies
- No agent can do better by unilaterally changing his/her strategy
Theorem (Nash, 1951): Every game with a finite number of agents and action profiles has at least one Nash equilibrium
How can we find a mixed-strategy equilibrium efficiently?
- Predict the behavior of players/agents in a system!

Linear programming algorithm for NE
Consider a 2-player, zero-sum game with payoff matrix $A$
- Only the row (Player 1) payoffs are represented; Player 2's payoffs are their negatives
- Two probability distributions $p^*$ and $q^*$ over actions represent the Nash equilibrium strategies for Players 1 and 2
- Player 1's expected utility (Player 2's expected loss) is $p^* A q^*$
          H     T
  H      -1     1
  T       1    -1
Player 1 (row) can choose a strategy $p$, which gives the vector of possible payoffs $pA$ (one entry per Player 2 action)
- Once $p$ is known, Player 2 will want to minimize its loss (i.e., Player 1's gain), so it will play the action corresponding to the minimum entry of $pA$
- Player 1 wants to find the strategy $p$ that maximizes this minimum
- Just like minimax search! Solve with linear constraints (a linear program)

Linear programming algorithm for NE
Solving for Nash with linear constraints:
  Player 1: $u_1 = \max u$ subject to $p \ge 0$, $\sum_i p_i = 1$, and $(pA)_j \ge u$ for all $j$
  Player 2: $u_2 = \min u$ subject to $q \ge 0$, $\sum_j q_j = 1$, and $(Aq)_i \le u$ for all $i$
- By LP duality (von Neumann's minimax theorem), the two optimal values coincide: $u_1 = u_2$
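As a worked instance of Player 1's LP, take the matrix $A$ from the previous slide (rows and columns ordered H, T):

```latex
A = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
\begin{aligned}
\max_{p,\,u}\quad & u \\
\text{s.t.}\quad & (pA)_H = -p_H + p_T \ge u \\
& (pA)_T = p_H - p_T \ge u \\
& p_H + p_T = 1, \quad p_H, p_T \ge 0
\end{aligned}
```

Adding the two payoff constraints gives $0 \ge 2u$, so $u \le 0$, and $u = 0$ requires $p_H = p_T$, hence $p^* = (1/2, 1/2)$. Player 2's LP is symmetric: adding $-q_H + q_T \le u$ and $q_H - q_T \le u$ gives $u \ge 0$, so $q^* = (1/2, 1/2)$ and $u_1 = u_2 = 0$.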

Notation in these constraints:
- $(pA)_j$ is the $j$th entry of the vector $pA$: Player 1's expected payoff when Player 2 plays action $j$
- $(Aq)_i$ is the $i$th entry of the vector $Aq$: Player 1's expected payoff when Player 1 plays action $i$

Linear programming algorithm for NE: guessing supports
1. Guess (or sample, or iterate over) the possible supports of each player
2. Solve the LPs for Player 1 and Player 2 restricted to those supports
3. If they are feasible, $p$ and $q$ form a Nash equilibrium
Lemke-Howson: a simplex-type approach
- Works well in practice
- Exponential in the worst case
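Steps 1 to 3 can be sketched as support enumeration. Note that this swaps the LP for a plain linear-system solve: for each equal-size pair of supports it solves the indifference equations exactly and keeps solutions that are nonnegative and leave no better action outside the support. It assumes a nondegenerate 2-player game, takes time exponential in the number of actions, and is not the Lemke-Howson algorithm; all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve(M, b):
    """Solve the square system M x = b exactly (Gauss-Jordan); None if singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def support_enumeration(A, B):
    """Equilibria (p, q) found by trying every equal-size support pair."""
    m, n = len(A), len(A[0])
    found = []
    for k in range(1, min(m, n) + 1):
        for S1 in combinations(range(m), k):
            for S2 in combinations(range(n), k):
                # Unknowns q_j (j in S2) and v: every row i in S1 earns v.
                q_sol = solve([[A[i][j] for j in S2] + [-1] for i in S1]
                              + [[1] * k + [0]], [0] * k + [1])
                # Unknowns p_i (i in S1) and w: every column j in S2 earns w.
                p_sol = solve([[B[i][j] for i in S1] + [-1] for j in S2]
                              + [[1] * k + [0]], [0] * k + [1])
                if q_sol is None or p_sol is None:
                    continue
                p, q = [Fraction(0)] * m, [Fraction(0)] * n
                for i, val in zip(S1, p_sol[:k]):
                    p[i] = val
                for j, val in zip(S2, q_sol[:k]):
                    q[j] = val
                v, w = q_sol[k], p_sol[k]
                if min(p) < 0 or min(q) < 0:
                    continue
                # No action outside the support may do strictly better.
                if any(sum(A[i][j] * q[j] for j in range(n)) > v for i in range(m)):
                    continue
                if any(sum(B[i][j] * p[i] for i in range(m)) > w for j in range(n)):
                    continue
                found.append((p, q))
    return found


BOS_H = [[2, 0], [0, 1]]  # husband (row), actions AM, RC
BOS_W = [[1, 0], [0, 2]]  # wife (column), actions AM, RC
for p, q in support_enumeration(BOS_H, BOS_W):
    print([str(x) for x in p], [str(x) for x in q])
```

On the battle of the sexes it returns the two pure equilibria and the mixed one with p = (2/3, 1/3) and q = (1/3, 2/3).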

Computational challenges of NE
Finding a good Nash equilibrium:
- Checking for a NE with total payoff > T: NP-hard
- Maximizing an individual player's payoff in a NE: NP-hard
- Deciding whether a particular strategy is played in a NE: NP-hard
- Checking if a NE is unique: NP-hard
Approximate NE
- ε-Nash equilibrium: a strategy profile such that no one can gain more than ε by deviating
- Current best: ε = 0.339 (for 2-player games with payoffs normalized to [0, 1])
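For a given profile of a 2-player game, the smallest ε for which it is an ε-Nash equilibrium is the largest gain any player can obtain by a unilateral deviation, and pure deviations suffice because a best response is always attained at a pure action. A sketch with a hypothetical function name, applied to matching pennies (this only measures ε for a profile; it is not an approximation algorithm):

```python
from fractions import Fraction


def epsilon_of(A, B, p, q):
    """Largest gain either player can get by a unilateral pure deviation
    from the mixed profile (p, q); A, B are the row/column payoffs."""
    m, n = len(A), len(A[0])
    u1 = sum(p[i] * q[j] * A[i][j] for i in range(m) for j in range(n))
    u2 = sum(p[i] * q[j] * B[i][j] for i in range(m) for j in range(n))
    best1 = max(sum(q[j] * A[i][j] for j in range(n)) for i in range(m))
    best2 = max(sum(p[i] * B[i][j] for i in range(m)) for j in range(n))
    return max(best1 - u1, best2 - u2)


A = [[1, -1], [-1, 1]]                # matching pennies, row player
B = [[-a for a in row] for row in A]  # zero-sum: column player
half = [Fraction(1, 2), Fraction(1, 2)]
print(epsilon_of(A, B, half, half))                              # 0
print(epsilon_of(A, B, [Fraction(3, 5), Fraction(2, 5)], half))  # 1/5
print(epsilon_of(A, B, [1, 0], [1, 0]))                          # 2
```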

Adaptation in large games
Many large systems are iterative games
- Best-response dynamics will often converge to a Nash equilibrium in a finite number of steps
  - Prisoner's dilemma, battle of the sexes
- In more complex games, best response does not reliably lead to an equilibrium
  - Tragedy of the commons does not converge in a finite number of steps
  - Matching pennies will cycle through strategy vectors as each player keeps making its best move
- Best response does not guarantee convergence
- Instead, develop improving-response learning strategies
  - React to the frequency of opponent moves played so far in the game history
- Iterated Prisoner's Dilemma competition!
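A minimal sketch of alternating pure best-response dynamics (hypothetical names; players take turns, and ties keep the current action, which is one of several possible update rules). On the prisoner's dilemma it stops at (D, D), while on matching pennies it cycles through all four profiles until `max_steps` runs out:

```python
def best_response_dynamics(G, start, max_steps=20):
    """Alternating pure best-response dynamics in a 2-player game.

    G[r][c] = (u1, u2). Returns the sequence of profiles visited, stopping
    once neither player wants to move (a pure Nash equilibrium).
    """
    r, c = start
    history = [(r, c)]
    idle = 0  # consecutive turns with no change
    for step in range(max_steps):
        if step % 2 == 0:  # row player's turn
            best = max(range(len(G)), key=lambda i: G[i][c][0])
            moved = G[best][c][0] > G[r][c][0]
            if moved:
                r = best
        else:              # column player's turn
            best = max(range(len(G[0])), key=lambda j: G[r][j][1])
            moved = G[r][best][1] > G[r][c][1]
            if moved:
                c = best
        if moved:
            history.append((r, c))
            idle = 0
        else:
            idle += 1
            if idle == 2:  # neither player wants to deviate
                break
    return history


PD = [[(-1, -1), (-4, 0)], [(0, -4), (-3, -3)]]
MP = [[(1, -1), (-1, 1)], [(-1, 1), (1, -1)]]
print(best_response_dynamics(PD, (0, 0)))      # [(0, 0), (1, 0), (1, 1)]
print(best_response_dynamics(MP, (0, 0))[:5])  # [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
```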

Uncertainty increases complexity
Partial information games: make decisions without full knowledge of the state and the other players
- Do not know the preferences or strategies of other players (e.g., what cards are they holding?)
- Have a probability distribution over possible strategies
Bayesian games: $N$ players such that for each player $i$ we have
- An action set $A_i$
- A type set $T_i$ describing what type an agent is: how it acts in a certain state
- A probability distribution $p_i$ over types such that $p_i(t_{-i} \mid t_i)$ is player $i$'s belief in the other agents' types given its own type $t_i$
- A payoff function $u_i(A, T)$: utility given the action and type profiles of all players
Use Bayes' rule to update beliefs in types and decide what action is best
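The belief update in a Bayesian game is ordinary Bayes' rule over the opponent's types. A sketch with an invented poker-style example: the types "strong" and "weak", the 50/50 prior, and the raise probabilities are assumptions for illustration only, as is the function name.

```python
from fractions import Fraction


def posterior(prior, likelihood, observed_action):
    """Bayes' rule over the opponent's types:
    P(type | action) = P(action | type) * P(type) / P(action).

    prior:      dict type -> P(type)
    likelihood: dict type -> dict action -> P(action | type)
    """
    joint = {t: prior[t] * likelihood[t].get(observed_action, 0) for t in prior}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("observed action has zero probability under the prior")
    return {t: pr / evidence for t, pr in joint.items()}


prior = {"strong": Fraction(1, 2), "weak": Fraction(1, 2)}
likelihood = {
    "strong": {"raise": Fraction(4, 5), "call": Fraction(1, 5)},
    "weak": {"raise": Fraction(1, 5), "call": Fraction(4, 5)},
}
belief = posterior(prior, likelihood, "raise")
print(belief["strong"], belief["weak"])  # 4/5 1/5
```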

How do agents do all of this computation?
Even when considering the algorithmic aspects of finding equilibria, these computation costs do not change the payoff matrix in the game
- However, someone (the agent, your laptop, etc.) has to actually execute this analysis!
- Real-world agents often balance complexity and payoff
  - Military strategy, market behavior, gambling, etc.
- Accurate models require incorporating this computation cost into the utility calculation in a game!

Bayesian machine games
Add additional components to a Bayesian game: $N$ players such that for each player $i$ we have
- An action set $A_i$
- A type set $T_i$ describing what type an agent is: how it acts in a certain state
- A probability distribution $p_i$ over types such that $p_i(t_{-i} \mid t_i)$ is player $i$'s belief in the other agents' types given its own type $t_i$
- A Turing machine $M_i$ from a set of possible machines, such that $M_i(t_i) = a$ computes agent $i$'s action
- A complexity function $C_i$ that gives the complexity of using machine $M_i$ on type $t_i$ to compute an action
- A payoff function $u_i(A, T, M, C)$: utility given the action and type profiles (and the machines and complexities) of all players
Utility depends on the complexity of all players: e.g., $i$ gains utility by computing his strategy faster than $j$

Rochambo including computational complexity
Type space has 1 type (no belief states)
Nash equilibrium: mixed strategy with probability 1/3 for each action
             Rock     Paper    Scissors
  Rock       0, 0     -1, 1    1, -1
  Paper      1, -1    0, 0     -1, 1
  Scissors   -1, 1    1, -1    0, 0
What if we include computation time (i.e., formulate this as a machine game)?

Rochambo including computational complexity
Complexity costs:
- Pure strategy = 0
- Mixed strategy = 1
Utility for each player: $u_i = \text{payoff} - \text{complexity}$
- It is more complicated to figure out the probabilities for randomization than to just pick a pure strategy
- A mixed strategy never earns more than the best pure response but costs 1 more, so any equilibrium would have to be pure, and Rochambo has no pure equilibrium
Now there is no Nash equilibrium!
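The "no equilibrium" claim can be checked by brute force. Because a mixing player is never best-responding under these complexity costs, it suffices to check the nine pure profiles for a profitable pure deviation. A sketch (hypothetical names) of that check:

```python
ACTIONS = ["Rock", "Paper", "Scissors"]
# Row player's payoff; the column player gets the negative (zero-sum).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
# Complexity costs from the slide: pure = 0, mixed = 1. A mixed strategy earns
# at most the best pure payoff against the opponent yet pays 1 more, so it is
# never a best response; only pure profiles and pure deviations matter.


def pure_profile_is_equilibrium(r, c):
    """True if neither player gains by switching to another pure action."""
    row_ok = all(PAYOFF[r][c] >= PAYOFF[r2][c] for r2 in range(3))
    col_ok = all(-PAYOFF[r][c] >= -PAYOFF[r][c2] for c2 in range(3))
    return row_ok and col_ok


equilibria = [(ACTIONS[r], ACTIONS[c]) for r in range(3) for c in range(3)
              if pure_profile_is_equilibrium(r, c)]
print(equilibria)  # []
```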

Randomization is more expensive
Choosing a pure strategy is computationally cheap, but finding the right mixed strategy is expensive
Empirical results often do not match theoretical equilibria
- Winners in Rochambo World Championships do not use truly random strategies
- Players in the iterated prisoner's dilemma (IPD) do not always defect
  - In the IPD, the best strategy (e.g., against a tit-for-tat opponent) is to cooperate for t-1 moves and defect on the last move, at time t
  - This requires computation! (i.e., how many moves until the end? What if the end is unknown?)
  - If that computation is more expensive than the gain from defecting, the agent might as well just keep cooperating and use a pure strategy
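To make the IPD claim concrete, this sketch assumes the opponent plays tit-for-tat (cooperate first, then copy the previous move), uses the prisoner's dilemma payoffs from the first slide, and picks t = 10 arbitrarily; the function name is hypothetical.

```python
# Stage payoffs from the prisoner's dilemma: (my move, their move) -> my payoff
PAYOFF = {("C", "C"): -1, ("C", "D"): -4, ("D", "C"): 0, ("D", "D"): -3}


def play_vs_tit_for_tat(my_moves):
    """Total payoff of a fixed move sequence against tit-for-tat, which
    cooperates first and then copies my previous move."""
    total, their_move = 0, "C"
    for move in my_moves:
        total += PAYOFF[(move, their_move)]
        their_move = move
    return total


t = 10
always_cooperate = play_vs_tit_for_tat(["C"] * t)
defect_last = play_vs_tit_for_tat(["C"] * (t - 1) + ["D"])
always_defect = play_vs_tit_for_tat(["D"] * t)
print(always_cooperate, defect_last, always_defect)  # -10 -9 -27
```

Defecting on the last move gains exactly 1 over always cooperating, so under the machine-game view any complexity cost above 1 for tracking the horizon makes plain cooperation the better choice.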

Algorithmic game theory
Game theory is normative in its basic formulation
- To make it more descriptive or prescriptive, we need to account for algorithmic factors
  - Real-world systems are very large (many agents and actions)
  - Computation is not free, both for finding a strategy and for analyzing other agents' decisions
  - This explains empirical results that diverge from theoretical equilibria
Think algorithmically when talking about solution concepts
- How do we actually compute a Nash equilibrium? Efficiently?
- What are the main algorithmic challenges?
- What internal algorithms/processes are other agents using to come up with a decision? Can I do it faster?