Topics in Computational Sustainability CS 325 Spring 2016


1 Topics in Computational Sustainability CS 325 Spring 2016. Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: Comments and corrections gratefully received. Making Choices: Sequential Decision Making

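2 Stochastic programming? [Figure: a decision followed by a probabilistic model with outcomes labeled wet and dry and branches a, b, c, i.e. {a(pa), b(pb), c(pc)}.] Choose the decision that maximizes expected utility or minimizes expected cost.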

3 Problem Setup. [Figure legend: conserved parcels, available parcels, current territories, potential territories.] Given a limited budget, what parcels should I conserve to maximize the expected number of occupied territories in 50 years?

4 Metapopulation = Cascade. [Figure: layered graph with one copy of patches i, j, k, l, m per time step.] The metapopulation model can be viewed as a cascade in the layered graph representing territories over time. Target nodes: territories at the final time step.

5 Management Actions. Conserving parcels adds nodes to the network to create new pathways for the cascade. [Figure: initial network, Parcel 1, Parcel 2.]


8 Cascade Optimization Problem. Given: patch network; initially occupied territories; colonization and extinction probabilities; management actions; already-conserved parcels; list of available parcels and their costs; time horizon T; budget B. Find the set of parcels with total cost at most B that maximizes the expected number of occupied territories at time T. Can we make our decision adaptively?

9 Sequential decision making. We have a system that changes state over time. We can (partially) control the system's state transitions by taking actions. The problem gives an objective that specifies which states (or state sequences) are more or less preferred. Problem: at each time step, select an action to optimize the overall (long-term) objective, that is, produce the most preferred sequences of states.

10 Discounted Rewards/Costs. An assistant professor gets paid, say, 20K per year. How much, in total, will the A.P. earn in their life? $20 + 20 + 20 + 20 + \dots = \infty$. What's wrong with this argument?

11 Discounted Rewards. A reward (payment) in the future is not worth quite as much as a reward now: because of chance of obliteration; because of inflation. Example: being promised $10,000 next year is worth only 90% as much as receiving $10,000 right now. Assuming a payment n years in the future is worth only $(0.9)^n$ of a payment now, what is the AP's Future Discounted Sum of Rewards?

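12 Infinite Sum. Assuming a discount factor of 0.9, how much does the assistant professor get in total? Let $x = 20 + 0.9 \cdot 20 + 0.9^2 \cdot 20 + 0.9^3 \cdot 20 + \dots$ Then $0.9x = 0.9 \cdot 20 + 0.9^2 \cdot 20 + \dots$, so $x - 0.9x = 20$ and $x = 20/(0.1) = 200$.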

13 Discount Factors. People in economics and probabilistic decision-making do this all the time. The discounted sum of future rewards using discount factor g is (reward now) + $g$ (reward in 1 time step) + $g^2$ (reward in 2 time steps) + $g^3$ (reward in 3 time steps) + ... (infinite sum).
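The discounted sum can be checked numerically. The following is a minimal Python sketch; the function name `discounted_sum` and the 500-year truncation are illustrative choices, not part of the slides. It compares a long truncated sum for the assistant professor example (20 per year, g = 0.9) against the geometric-series limit 20/(1 - g).

```python
def discounted_sum(rewards, g):
    """Return rewards[0] + g*rewards[1] + g**2*rewards[2] + ..."""
    total = 0.0
    weight = 1.0  # holds g**t, starting at t = 0
    for r in rewards:
        total += weight * r
        weight *= g
    return total


# Assistant professor example: 20 (thousand) per year, discount factor 0.9.
g = 0.9
closed_form = 20 / (1 - g)                 # limit of the geometric series
truncated = discounted_sum([20] * 500, g)  # 500 years stands in for "forever"
print(closed_form, truncated)
```

The truncated sum approaches the closed form because the weight $g^t$ shrinks geometrically, which is why the infinite sum stays finite.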

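14 Markov System: the Academic Life. States: A. Assistant Prof; B. Assoc. Prof; T. Tenured Prof; S. On the Street; D. Dead. [Figure: state-transition diagram with edge probabilities and per-state rewards.] Define: $J_A$ = expected discounted future rewards starting in state A; $J_B$ = expected discounted future rewards starting in state B; $J_T$ = the same starting in state T; $J_S$ = the same starting in state S; $J_D$ = the same starting in state D. How do we compute $J_A$, $J_B$, $J_T$, $J_S$, $J_D$?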

15 Working Backwards. [Figure: the Academic Life Markov system with states A. Assistant Prof., B. Associate Prof., S. Out on the Street, D. Dead, T. Tenured Prof.; legible entry: S. Out on the Street: 10.] Discount factor 0.9

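16 Reincarnation? [Figure: the Academic Life Markov system with states A. Assistant Prof., B. Associate Prof., S. Out on the Street, D. Dead, T. Tenured Prof.; legible entries: B. Associate Prof.: 60; S. Out on the Street: 10; T. Tenured Prof.: 100.] Discount factor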

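17 System of Equations. With $r_X$ the reward in state X and g the discount factor: $L(A) = r_A + g\,(0.6\,L(A) + 0.2\,L(B) + 0.2\,L(S))$; $L(B) = r_B + g\,(0.6\,L(B) + 0.2\,L(S) + 0.2\,L(T))$; $L(S) = r_S + g\,(0.7\,L(S) + 0.3\,L(D))$; $L(T) = r_T + g\,(0.7\,L(T) + 0.3\,L(D))$; $L(D) = r_D + g\,(0.5\,L(D) + 0.5\,L(A))$.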

18 Solving a Markov System with Matrix Inversion. Upside: you get an exact answer. Downside: if you have 100,000 states, you're solving a 100,000 by 100,000 system of equations.
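As a rough illustration of the exact approach, the sketch below solves the five-state academic-life system as a linear system $(I - gP)\,L = r$ by Gaussian elimination rather than forming an explicit inverse. The transition probabilities are the ones in the system of equations above. The rewards for B, S, T (60, 10, 100) and the discount factor 0.9 appear on earlier slides; the rewards for A (20) and D (0) are assumptions made here, because those values are not legible in this transcript. The helper `solve_linear` is a hypothetical name.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


states = ["A", "B", "T", "S", "D"]
P = {  # transition probabilities from the system of equations
    "A": {"A": 0.6, "B": 0.2, "S": 0.2},
    "B": {"B": 0.6, "S": 0.2, "T": 0.2},
    "S": {"S": 0.7, "D": 0.3},
    "T": {"T": 0.7, "D": 0.3},
    "D": {"D": 0.5, "A": 0.5},
}
# ASSUMPTION: rewards for A and D are placeholders chosen for illustration.
rewards = {"A": 20.0, "B": 60.0, "T": 100.0, "S": 10.0, "D": 0.0}
g = 0.9

# Row i of (I - g P) L = r encodes L(i) - g * sum_j P[i][j] * L(j) = r_i.
coeffs = [[(1.0 if i == j else 0.0) - g * P[i].get(j, 0.0) for j in states]
          for i in states]
rhs = [rewards[s] for s in states]
L = dict(zip(states, solve_linear(coeffs, rhs)))
print(L)
```

Elimination on a dense N by N system costs on the order of $N^3$ operations, which is the downside the slide points to for 100,000 states.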

19 Value Iteration: another way to solve a Markov System. Define: $J^1(S_i)$ = expected discounted sum of rewards over the next 1 time step; $J^2(S_i)$ = expected discounted sum of rewards during the next 2 steps; $J^3(S_i)$ = expected discounted sum of rewards during the next 3 steps; ...; $J^k(S_i)$ = expected discounted sum of rewards during the next k steps. $J^1(S_i)$ = (what?) $J^2(S_i)$ = (what?) $J^{k+1}(S_i)$ = (what?)

20 Value Iteration: another way to solve a Markov System. Define: $J^1(S_i)$ = expected discounted sum of rewards over the next 1 time step; $J^2(S_i)$ = expected discounted sum of rewards during the next 2 steps; $J^3(S_i)$ = expected discounted sum of rewards during the next 3 steps; ...; $J^k(S_i)$ = expected discounted sum of rewards during the next k steps. N = number of states. $J^1(S_i) = r_i$; $J^2(S_i) = r_i + g \sum_{j=1}^{N} p_{ij} J^1(s_j)$; ...; $J^{k+1}(S_i) = r_i + g \sum_{j=1}^{N} p_{ij} J^k(s_j)$.

21 Let's do Value Iteration. [Figure: three-state Markov system with rewards SUN +4, WIND 0, HAIL -8.] g = 0.5. Table columns: k, $J^k$(SUN), $J^k$(WIND), $J^k$(HAIL).


23 Value Iteration for solving Markov Systems. Compute $J^1(S_i)$ for each i. Compute $J^2(S_i)$ for each i. ... Compute $J^k(S_i)$ for each i. As $k \to \infty$, $J^k(S_i) \to J^*(S_i)$. When to stop? When $\max_i |J^{k+1}(S_i) - J^k(S_i)| < \xi$. This is faster than matrix inversion ($N^3$ style) if the transition matrix is sparse. What if we have a way to interact with the Markov system?
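The loop above can be sketched in a few lines of Python. The rewards (SUN +4, WIND 0, HAIL -8) and g = 0.5 come from the weather example in these slides. The transition probabilities are an assumption made for illustration, since the slide's diagram is not reproduced in this text: each state is assumed to stay put or move to a neighbouring state with probability 0.5. The names `value_iteration`, `xi`, and `history` are hypothetical.

```python
def value_iteration(rewards, P, g, xi=1e-6, max_iters=10_000):
    """Iterate J^{k+1}(s) = r_s + g * sum_t P[s][t] * J^k(t).

    Stops when the largest change between successive iterates is below xi.
    Returns the final estimate and the list of all iterates J^1, J^2, ...
    """
    J = dict(rewards)  # J^1(s) = r_s
    history = [dict(J)]
    for _ in range(max_iters):
        J_next = {
            s: rewards[s] + g * sum(p * J[t] for t, p in P[s].items())
            for s in rewards
        }
        history.append(J_next)
        change = max(abs(J_next[s] - J[s]) for s in rewards)
        J = J_next
        if change < xi:
            break
    return J, history


rewards = {"SUN": 4.0, "WIND": 0.0, "HAIL": -8.0}
P = {  # ASSUMED transition probabilities, for illustration only
    "SUN": {"SUN": 0.5, "WIND": 0.5},
    "WIND": {"SUN": 0.5, "HAIL": 0.5},
    "HAIL": {"WIND": 0.5, "HAIL": 0.5},
}
J_star, history = value_iteration(rewards, P, g=0.5)
for k, J in enumerate(history[:5], start=1):
    print(k, J["SUN"], J["WIND"], J["HAIL"])
```

Storing each row of `P` as a dictionary of nonzero entries means one sweep costs time proportional to the number of nonzero transitions, which is where the advantage for sparse transition matrices comes from.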

24 A Markov Decision Process. You run a startup company. In every state you must choose between Saving money (S) or Advertising (A). [Figure: transition diagram over four states, Poor & Unknown (+0), Poor & Famous (+0), Rich & Unknown (+10), Rich & Famous, with edges labeled by actions S and A.] g = 0.9

25 Markov Decision Processes. An MDP has: a set of states $\{s_1, \dots, s_N\}$; a set of actions $\{a_1, \dots, a_M\}$; a set of rewards $\{r_1, \dots, r_N\}$ (one for each state); a transition probability function $P^k_{ij} = \text{Prob}(\text{Next} = j \mid \text{This} = i \text{ and I use action } k)$. On each step: 0. Call the current state $S_i$. 1. Receive reward $r_i$. 2. Choose an action from $\{a_1, \dots, a_M\}$. 3. If you choose action $a_k$ you'll move to state $S_j$ with probability $P^k_{ij}$. 4. All future rewards are discounted by g. What's a solution to an MDP? A sequence of actions?
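The per-step procedure can be sketched as a small simulator. The two-state MDP below (states `low` and `high`, actions `wait` and `work`, and all probabilities and rewards) is entirely hypothetical and exists only to exercise the code; `step` and `rollout` are illustrative names. Transitions are stored as `P[state][action][next_state]`.

```python
import random


def step(state, action, P, rewards, rng):
    """One MDP step: receive the current state's reward, then sample the next state."""
    reward = rewards[state]
    next_states = list(P[state][action])
    weights = [P[state][action][s] for s in next_states]
    next_state = rng.choices(next_states, weights=weights, k=1)[0]
    return reward, next_state


def rollout(start, policy, P, rewards, g, steps, rng):
    """Discounted sum of rewards collected over `steps` steps under `policy`."""
    total, weight, state = 0.0, 1.0, start
    for _ in range(steps):
        reward, state = step(state, policy[state], P, rewards, rng)
        total += weight * reward
        weight *= g  # rewards further in the future count for less
    return total


# HYPOTHETICAL toy MDP, not taken from the slides.
rewards = {"low": 0.0, "high": 1.0}
P = {
    "low": {"wait": {"low": 1.0}, "work": {"low": 0.5, "high": 0.5}},
    "high": {"wait": {"high": 1.0}, "work": {"high": 1.0}},
}
rng = random.Random(0)
always_wait = {"low": "wait", "high": "wait"}
print(rollout("high", always_wait, P, rewards, g=0.5, steps=3, rng=rng))
```

Here the action in each state is looked up from a fixed state-to-action table, which is one possible answer to the question of what a solution should look like when the next state is random.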

26 A Policy. A policy is a mapping from states to actions. Examples. Policy Number 1 (STATE: ACTION): PU: S, PF: A, RU: S, RF: A. Policy Number 2 (STATE: ACTION): PU: A, PF: A, RU: A, RF: A. [Figures: the startup MDP under each policy, with rewards PU 0, PF 0, RU 10, RF 10.] How many possible policies in our example? Which of the above two policies is best? How do you compute the optimal policy?
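Since a policy picks one action per state, the policies of the startup example can be enumerated directly. This is a small sketch; the two dictionaries encode the example policies as read from the slide's tables, and the variable names are illustrative.

```python
from itertools import product

states = ["PU", "PF", "RU", "RF"]
actions = ["S", "A"]

# Every policy is one choice of action for each state.
policies = [dict(zip(states, choice))
            for choice in product(actions, repeat=len(states))]
print(len(policies), len(actions) ** len(states))

policy_1 = {"PU": "S", "PF": "A", "RU": "S", "RF": "A"}
policy_2 = {"PU": "A", "PF": "A", "RU": "A", "RF": "A"}
print(policy_1 in policies, policy_2 in policies)
```

With M actions and N states there are $M^N$ policies, so the count grows exponentially in the number of states.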

27 Interesting Fact. For every M.D.P. there exists an optimal policy. It's a policy such that for every possible start state there is no better option than to follow the policy.

28 Computing the Optimal Policy. Idea One: run through all possible policies and select the best. What's the problem?

29 Optimal Value Function. Define $J^*(S_i)$ = expected discounted future rewards, starting from state $S_i$, assuming we use the optimal policy. [Figure: example MDP; legible labels: $S_2$ +3, $S_3$ +2, B 0, with transition probabilities 1 and 1/3.] Question: What is an optimal policy for that MDP? (assume g = 0.9) What is $J^*(S_1)$? What is $J^*(S_2)$? What is $J^*(S_3)$?

30 Computing the Optimal Value Function with Value Iteration. Define $J^k(S_i)$ = maximum possible expected sum of discounted rewards I can get if I start at state $S_i$ and I live for k time steps. Note that $J^1(S_i) = r_i$.

31 Let's compute $J^k(S_i)$ for our example. Table columns: k, $J^k$(PU), $J^k$(PF), $J^k$(RU), $J^k$(RF). Legible entries of the first row: k = 1, $J^1$(PU) = 0, $J^1$(PF) = 0.


33 Bellman's Equation: $J^{n+1}(S_i) = \max_a \left[ r_i + g \sum_{j=1}^{N} P^a_{ij} J^n(S_j) \right]$. Value Iteration for solving MDPs: compute $J^1(S_i)$ for all i; compute $J^2(S_i)$ for all i; ...; compute $J^n(S_i)$ for all i; ... until converged, that is, when $\max_i |J^{n+1}(S_i) - J^n(S_i)| < \xi$. Also known as Dynamic Programming.
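A minimal Python sketch of this update follows, using the startup example's states (PU, PF, RU, RF), rewards (0, 0, 10, 10), actions (S, A), and g = 0.9. The transition probabilities are assumptions made for illustration, because the slide's diagram is not reproduced in this text. The function name `mdp_value_iteration` and the threshold name `xi` are hypothetical, and reading off a greedy policy at the end is an extra step added here.

```python
def backup(s, a, J, rewards, P, g):
    """Bracketed term of Bellman's equation for state s and action a."""
    return rewards[s] + g * sum(p * J[t] for t, p in P[s][a].items())


def mdp_value_iteration(states, actions, rewards, P, g, xi=1e-6, max_iters=100_000):
    """Repeat J^{n+1}(s) = max_a backup(s, a, J^n) until the largest change is below xi."""
    J = {s: rewards[s] for s in states}  # J^1(s) = r_s
    for _ in range(max_iters):
        J_next = {s: max(backup(s, a, J, rewards, P, g) for a in actions)
                  for s in states}
        change = max(abs(J_next[s] - J[s]) for s in states)
        J = J_next
        if change < xi:
            break
    # Greedy policy: in each state pick an action attaining the maximum.
    policy = {s: max(actions, key=lambda a: backup(s, a, J, rewards, P, g))
              for s in states}
    return J, policy


states = ["PU", "PF", "RU", "RF"]
actions = ["S", "A"]
rewards = {"PU": 0.0, "PF": 0.0, "RU": 10.0, "RF": 10.0}
P = {  # ASSUMED transition probabilities, P[state][action][next_state]
    "PU": {"S": {"PU": 1.0}, "A": {"PU": 0.5, "PF": 0.5}},
    "PF": {"S": {"PU": 0.5, "RF": 0.5}, "A": {"PF": 1.0}},
    "RU": {"S": {"PU": 0.5, "RU": 0.5}, "A": {"PU": 0.5, "PF": 0.5}},
    "RF": {"S": {"RU": 0.5, "RF": 0.5}, "A": {"PF": 1.0}},
}
J, policy = mdp_value_iteration(states, actions, rewards, P, g=0.9)
print(J)
print(policy)
```

Each sweep costs time proportional to the number of (state, action, next state) triples with nonzero probability, in contrast to enumerating all $M^N$ policies.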
