Student Details
Name: SOLUTIONS
CEC login:

Instructions
You have roughly 1 minute per point, so schedule your time accordingly. There is only one correct answer per question. Good luck!

Question 1. Searching
Circle the correct answer for each question (there is exactly one):

a) [2] What is the total number of nodes that iterative deepening visits? (If a node is visited multiple times, count it multiple times.) Assume the tree has a branching factor b, depth d, and the goal is at depth g.
   a. O(bd)
   b. O(b^d)
   c. O(bg)
   d. O(b^(g+1)) --- This answer could be O(b^g), but in class we derived it loosely as O(b^(g+1)). By the definition of O-notation, O(b^g) is O(b^(g+1)) (but not the other way around).

b) [2] Which of the following statements are true about Breadth First graph search? Assume the tree has a branching factor b, depth d, and the goal is at depth g.
   a. Breadth First graph search is complete on problems with finite search graphs.
   b. Breadth First graph search uses O(bg) space.
   c. Both (a) and (b) are true.
   d. Both (a) and (b) are false.
   BFS searches the complete state space, so on finite graphs, using graph search so that nodes are not re-explored, it will explore all nodes and therefore find a solution.

c) [4] Consider A* search. g is the cumulative path cost of a node n, h is a lower bound on the shortest path to a goal state, and n' is the parent of n. Assume all costs are positive. Note: enqueuing == putting a node onto the fringe; dequeuing == removing and then expanding a node from the fringe. Which of the following search algorithms are guaranteed to be optimal?
   i. A*, but apply the goal test before enqueuing nodes rather than after dequeuing.
   ii. A*, but prioritize n by g(n) only.
   iii. A*, but prioritize n by h(n) only.
   iv. A*, but prioritize n by g(n) + h(n').
   v. A*, but prioritize n by g(n') + h(n).

   a. All but i.
   b. ii. and v.
   c. iv. and ii.
   d. iv. and iii.
   e. Only iv.
   f. Only ii.
   ii is equivalent to A* with the heuristic h = 0 for all nodes. v is equivalent to A* with the heuristic h'(n) = h(n) - step(n', n); so if h is admissible, so is h'. iv is wrong because it uses the true cost to node n plus the heuristic from the parent of n. This could overestimate the total path length through node n, and therefore a node on the optimal path might never be expanded. [IS THIS TRUE?]

We apply a variety of queue-based graph-search algorithms to the state graph on the right. Initially the fringe contains the start state A. When there are multiple options, nodes are enqueued (put onto the fringe) in alphabetical order.

d) [2] In the BFS algorithm, we perform the goal test when we enqueue a node. How many nodes have been dequeued when we find the solution?
   a. 2
   b. 3
   c. 4
   d. 5 --- In this case, all of the nodes A, B, C, D, E will be dequeued.
   e. 6

e) [2] In the DFS algorithm, we perform the goal test when we enqueue a node. What is the sequence of dequeued nodes?
   a. A, B, E, G
   b. A, B, C, D, E
   c. A, B, E
   d. A, D, E --- Dequeue A, add BCD. Dequeue D, add ACE. Dequeue E, add BCDG. As we add G, we perform the goal test. So the dequeued nodes are A, D, E.
   e. None of the above

f) [2] In the UCS algorithm, we perform the goal test when we dequeue (expand) a node. How many nodes have been dequeued when we find the solution? (Do not count the dequeuing of the goal state itself.)
   a. 3
   b. 4 (explanation below)
   c. 5
   d. 6
   e. None of the above
   Expand A to get fringe: (B,1), (C,3), (D,7). Expand B to get fringe: (E,2), (C,3), (D,7). Expand E to get fringe: (C,3), (G,4), (D,7). Expand C to get fringe: (G,4), (D,6). Dequeue G and it passes the goal test. (A code sketch after part (h) reproduces this trace.)

        A  B  C  D  E  G
   H1   2  2  2  2  0  0
   H2   3  2  2  2  1  0
   H3   5  4  3  2  1  0

g) [2] The above table shows three heuristics, H1, H2, and H3, and their values at each node. (For example, H1(A) = 2, H1(B) = 2, ...) Which of these heuristics are admissible? (The graph is copied again below for your convenience.)
   a. H1 and H2 --- H3 is not admissible because H3(B) = 4 but the distance from B to the goal is 3.
   b. H2 only
   c. H3 only
   d. All are admissible
   e. None are admissible

h) [2] Which of these heuristics are consistent?
   a. H1 and H2
   b. H2 only --- H1 and H3 are not consistent: H1(B) - H1(E) = 2 and H3(B) - H3(E) = 3, both larger than the cost of the edge from B to E.
   c. H3 only
   d. All are consistent
   e. None are consistent
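As an illustration of the dequeue-time goal test in part (f), here is a minimal uniform-cost-search sketch. The actual state graph appears only as a figure in the original exam, so the directed edge costs below are assumptions chosen to be consistent with the fringe contents listed in the solution trace.

import heapq

def ucs(graph, start, goal):
    """Uniform-cost search with the goal test applied at dequeue time.
    Returns the list of expanded (dequeued) nodes, not counting the goal."""
    fringe = [(0, start)]                  # priority queue ordered by path cost g
    best_g = {start: 0}
    expanded = []
    while fringe:
        g, node = heapq.heappop(fringe)
        if node == goal:                   # goal test on dequeue
            return expanded
        if g > best_g.get(node, float("inf")):
            continue                       # stale queue entry
        expanded.append(node)
        for nbr, cost in graph[node]:
            ng = g + cost
            if ng < best_g.get(nbr, float("inf")):
                best_g[nbr] = ng
                heapq.heappush(fringe, (ng, nbr))
    return expanded

# Hypothetical edge costs consistent with the solution's fringe contents.
graph = {
    "A": [("B", 1), ("C", 3), ("D", 7)],
    "B": [("E", 1)],
    "E": [("G", 2)],
    "C": [("D", 3)],
    "D": [],
    "G": [],
}
print(ucs(graph, "A", "G"))   # ['A', 'B', 'E', 'C'] -> 4 nodes dequeued before the goal

Running it reproduces the four expansions A, B, E, C from the trace above, after which the goal G is dequeued.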

Question 2. CSPs
Circle the correct answer for each question (there is exactly one):

a) [2] Which of the following statements are true about the runtime for CSPs?
   a. Tree-structured CSPs may be solved in time that is linear in the number of variables.
   b. Arc-consistency may be checked in time that is polynomial in the number of variables.
   c. Both (a) and (b) are true.
      i. Tree-structured CSPs can be solved in time O(nd^2); arc-consistency is O(n^2 d^2), although in class we showed an algorithm that is O(n^2 d^3). (A sketch of one such arc-consistency algorithm appears after part (e).)
   d. Both (a) and (b) are false.

b) [2] When solving a CSP by backtracking, which of the following are good heuristics?
   a. Pick the value that is least likely to fail.
   b. Pick the variable that is least constrained.
   c. Both (a) and (b) are good heuristics.
   d. Both (a) and (b) are bad heuristics.

c) [2] Suppose you have a highly efficient solver for tree-structured CSPs. Given a CSP with the following binary constraints, for which re-formulations will you find a fast and correct solution?
   (Constraint graph over variables A, B, C, D, E, F, G, H, I; see figure.)
   a. Set the value of E, solve the remaining CSP, and try another value for E if no solution is found.
   b. Replace variables D and E with a single variable DE whose domain is {(d,e) : d in D and e in E}, then solve.
   c. Ignore either variable D or E, solve, then pick a consistent value.
   d. Both (a) and (b).
      i. These both create a tree-like structure that can be solved quickly.
   e. Both (a) and (c).

d) [2] Which of the following statements are true?
   a. Additional constraints always make CSPs easier to solve.
   b. CSP solvers incorporating randomness are always a bad idea.
   c. Both (a) and (b) are true.
   d. Both (a) and (b) are false.
      i. [WE NEVER DISCUSSED RANDOMNESS IN CLASS, SO EVERYONE GOT CREDIT FOR ALL ANSWERS TO THIS QUESTION]

e) [2] Which of the following are true about CSPs?
   a. If a CSP is arc-consistent, it can be solved without backtracking.
   b. A CSP with only binary constraints can be solved in time polynomial in n (the number of variables) and d (the number of options per variable).
   c. Both (a) and (b) are true.
   d. Both (a) and (b) are false.
      i. (a) is not true: every arc could be consistent and yet no solution exists. (b) is not true: the general CSP with binary constraints is NP-hard.
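As referenced in part (a), here is a minimal sketch of an arc-consistency pass in the AC-3 style; the O(n^2 d^3) bound quoted above comes from having O(n^2) arcs, rechecking each arc at most d times, at cost O(d^2) per check. The toy variables, domains, and all-different constraint are made up for illustration.

from collections import deque

def ac3(domains, neighbors, constraint):
    """AC-3 arc-consistency filtering: a minimal sketch.
    domains:    dict variable -> set of values
    neighbors:  dict variable -> list of adjacent variables
    constraint: function (x, vx, y, vy) -> True if the pair (vx, vy) is allowed
    Returns False if some domain is wiped out, True otherwise."""
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        # Remove values of x that have no support in y's domain.
        removed = {vx for vx in domains[x]
                   if not any(constraint(x, vx, y, vy) for vy in domains[y])}
        if removed:
            domains[x] -= removed
            if not domains[x]:
                return False                      # domain wipe-out: no solution
            # x changed, so arcs (z, x) must be rechecked.
            queue.extend((z, x) for z in neighbors[x] if z != y)
    return True

# Tiny example: three variables in a chain A - B - C with an all-different constraint.
domains = {"A": {1, 2}, "B": {1, 2}, "C": {2}}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
ok = ac3(domains, neighbors, lambda x, vx, y, vy: vx != vy)
print(ok, domains)   # True {'A': {2}, 'B': {1}, 'C': {2}}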

Question 3. Adversarial Search
Circle the correct answer for each question (there is exactly one):

a) [2] Which statement is true about reflex agents?
   a. Reflex agents can be learned with Q-Learning.
   b. You can design reflex agents that play optimally.
   c. Both a) and b) are true.
      i. Q-learning defines optimal moves in every state, and therefore defines a table for a reflex agent to follow. Q-learning is one way to define reflex agents that play optimally.
   d. Both a) and b) are false.

b) [2] Which statement is true about multi-player games?
   a. Each multi-player game is also a search problem.
   b. Multi-player games are easier (in complexity) than general search problems.
   c. Both a) and b) are true.
      i. Easier because things like alpha-beta pruning let you explore less of the search tree.
   d. Both a) and b) are false.

c) [2] When doing alpha-beta pruning on a game tree visited left to right,
   a. the right-most branch will always be pruned.
   b. the left-most branch will always be pruned.
   c. Both a) and b) are true.
   d. Both a) and b) are false.
      i. The left-most branch is NEVER pruned, but otherwise there are no guarantees.

d) [2] When applying alpha-beta pruning to minimax game trees:
   a. Pruning nodes does not change the value of the root to the max player.
   b. Alpha-beta pruning can prune different numbers of nodes if the children of the root node are reordered.
   c. Both a) and b) are true.
   d. Both a) and b) are false.

e) [2] Normally, alpha-beta pruning is not used with expectimax. Which one of the following conditions allows you to perform pruning with expectimax?
   a. All values are positive.
   b. Children of expectation nodes have values within a finite pre-specified range.
      i. The finite range allows lower and upper bounds to be computed and used as in alpha-beta pruning.
   c. All transition probabilities are within a finite pre-specified range.
   d. The probabilities sum up to one, and you only ever prune the last child.

f) [2] You have a game that you sometimes play against a talented opponent and sometimes against a random opponent, so you have implemented both Minimax and Expectimax. You discover that your evaluation function had a bug: instead of returning the actual value of a terminal state, it returns the square root of the value of the terminal state. All terminal states have positive values. Which of the following statements is true?
   a. The resulting policy might be sub-optimal for Minimax.
   b. The resulting policy might be sub-optimal for Expectimax.
      i. While minimax only looks at the sorted order, which doesn't change if you take the square root of positive values, expectimax needs to compute an average, which is affected by the square root (see the numeric sketch below).
   c. Both a) and b) are true.
   d. Both a) and b) are false.
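A tiny numeric check of part (f): square roots preserve the ordering that minimax relies on, but they change the averages that expectimax computes. The two hypothetical chance-node lotteries below are made up for illustration.

import math

# Two made-up chance nodes with positive terminal values.
lottery_1 = [1, 100]    # mean 50.5; mean of square roots 5.5
lottery_2 = [36, 49]    # mean 42.5; mean of square roots 6.5

def mean(xs):
    return sum(xs) / len(xs)

print(mean(lottery_1) > mean(lottery_2))                        # True: expectimax prefers lottery_1
print(mean([math.sqrt(x) for x in lottery_1]) >
      mean([math.sqrt(x) for x in lottery_2]))                  # False: the preference flips after sqrt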

Consider the following game tree (which is evaluated left to right):

(Game tree figure: a MAX root whose two moves are labeled LEFT and RIGHT; each leads to a MIN node, each MIN node has two MAX children, and the leaves are 4, 0, 5, 8 under the left branch and 0, 2, 1, 8 under the right branch.)

g) [2] What is the minimax value at the root node of this tree? 4

h) [4] How many leaf nodes would alpha-beta pruning prune? 3

i) [4] Suppose player #2 (formerly min) switches to a new strategy and picks the left action with probability 1/4 and the right action with probability 3/4. What is the maximum expected utility of player #1? 6
   Left side: Max chooses options 4 and 8, so the expected value is 6.
   Right side: Max chooses 2 and 8, so the expected value is 5.
   Max chooses the better of these to get 6.
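A minimal sketch that checks parts (g) and (h) mechanically; the nested-list encoding of the tree is an assumption about the figure as transcribed above.

import math

# MAX root -> two MIN nodes -> two MAX nodes each -> integer leaves.
tree = [[[4, 0], [5, 8]], [[0, 2], [1, 8]]]

pruned = []

def collect_leaves(node):
    if isinstance(node, int):
        return [node]
    return [leaf for child in node for leaf in collect_leaves(child)]

def alphabeta(node, alpha, beta, maximizing):
    """Returns the minimax value; appends any skipped leaves to `pruned`."""
    if isinstance(node, int):
        return node
    best = -math.inf if maximizing else math.inf
    for i, child in enumerate(node):
        if maximizing:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
        else:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
        if beta <= alpha:
            # Record the leaves under the children we never visit.
            for skipped in node[i + 1:]:
                pruned.extend(collect_leaves(skipped))
            break
    return best

print(alphabeta(tree, -math.inf, math.inf, True))   # 4  (part g)
print(len(pruned))                                  # 3  (part h)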

Question 4. MDP + RL
Circle the correct answer for each question (there is exactly one):

a) [1] A rational agent (who uses dollar amounts as utility) prefers to be given an envelope containing $X rather than one containing either $0 or $10 (with equal probability). What is the smallest $X such that the agent may be acting in accordance with the principle of maximum expected utility?
   a. There is no such minimum $X.
   b. $0.
   c. $5. --- Money is not always a good model for the utility of choices for people, but this problem explicitly states that dollar amounts are the utility function.
   d. $10.

b) [2] Which of the following are true about Markov Decision Processes?
   a. If the only difference between two MDPs is the value of the discount factor, then they must have the same optimal policy.
   b. Rational policies can be learned before values converge.
      i. This is why we use policy iteration: the policy may stay constant while the values are still changing. (a) is false --- if the discount factor is small, the MDP may avoid long paths whose (larger) payoff comes only at the end. [CHECK THIS ANSWER]
   c. Both (a) and (b) are true.
   d. Neither (a) nor (b) are true.

c) [2] Which of the following are true about Q-learning?
   a. Q-learning will only learn the optimal Q-values if actions are eventually selected according to the optimal policy.
   b. In a deterministic MDP (i.e. one in which each state/action pair leads to a single deterministic next state), the Q-learning update with a learning rate of α = 1 will correctly learn the optimal Q-values.
      i. True. The learning rate is only there because we are trying to approximate an expectation (a sum over next states) with a single sample. In a deterministic MDP, where s' is always the state we reach after applying action a in state s, the α = 1 update sets Q(s,a) = R(s,a,s') + γ max_a' Q(s',a'), which is exactly the Bellman update for the optimal Q-values.
      ii. (a) is false because any exploration strategy that visits all states (and actions) will eventually allow Q-learning to converge to the optimal Q-values.
   c. Both (a) and (b) are true.
   d. Neither (a) nor (b) are true.
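A minimal sketch, not part of the exam, of the α = 1 update from part (c): a made-up deterministic chain MDP (states 0 through 3, terminal state 3 worth reward 1, γ = 0.9) in which overwriting Q(s,a) with the single-sample target converges to the optimal Q-values.

import random

STATES = [0, 1, 2, 3]          # state 3 is terminal
ACTIONS = ["left", "right"]
GAMMA = 0.9

def step(s, a):
    """Deterministic transition and reward for the toy chain."""
    s2 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    r = 1.0 if s2 == 3 else 0.0
    return s2, r

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

random.seed(0)
for _ in range(500):                               # random-exploration episodes
    s = 0
    while s != 3:
        a = random.choice(ACTIONS)
        s2, r = step(s, a)
        future = 0.0 if s2 == 3 else max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] = r + GAMMA * future             # alpha = 1: overwrite with the sample
        s = s2

for s in [0, 1, 2]:
    print(s, round(Q[(s, "right")], 3), round(Q[(s, "left")], 3))
# Converges to the optimal values, e.g. Q*(2,right) = 1.0, Q*(1,right) = 0.9, Q*(0,right) = 0.81.

Because the MDP is deterministic, each sampled target equals the exact Bellman backup, so no averaging (α < 1) is needed.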

d) The above MDP has states 1, 2, 3, 4, G, and M, where G and M are terminal states. The reward for transitioning from any state to G is 1 (you scored a goal!). The reward for transitioning from any state to M is 0. All other rewards are zero. There is no discounting (so γ = 1). The transition distributions are:
   From state i, if you shoot (S), you have a probability i/4 of scoring, so T(i,S,G) = i/4, and otherwise you miss: T(i,S,M) = 1 - i/4.
   If you dribble (D) from state i, you have a 3/4 probability of getting to state i+1 and a 1/4 probability of losing the ball and going to state M (unless you are in state 4, where the goalie stops you every time), so T(i,D,i+1) = 3/4 and T(i,D,M) = 1/4 for i = 1, 2, 3, and T(4,D,M) = 1.

   a. [3] Let π be the policy that always shoots. What is V^π(1)? 1/4
      V^π(1) = T(1,S,G) * R(1,S,G) + T(1,S,M) * R(1,S,M) = 1/4 * 1 + 3/4 * 0 = 1/4
   b. [3] Define Q* to be the Q-values under the optimal policy; what is Q*(3,D)? 3/4
      Q*(3,D) is the value of dribbling from state 3, so the actions will be dribble and then shoot. Rewards for missing are zero, so those terms can be dropped as soon as we see them. Thus Q*(3,D) = T(3,D,4) * T(4,S,G) * R(4,S,G). Plugging in values: T(3,D,4) = 3/4, T(4,S,G) = 1, R(4,S,G) = 1, so Q*(3,D) = 3/4.
   c. [3] If you use value iteration to compute the values V* for each node, what is the sequence of values after the first three iterations for V*(1)? (Your answer should be a set of three values, such as 1/12, 1/3, 1/2, and you may have to compute the value iteration values for all states to compute these.)
      1/4, 6/16 (= 3/8), 27/64
      The box below is just for your work. The only thing that will be graded is what you put here ^^^^.
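As a cross-check on the worked table below, here is a minimal value-iteration sketch for this MDP; the Fraction representation and loop structure are implementation choices, not part of the exam. Terminal states G and M contribute no future value, so their terms are simply omitted.

from fractions import Fraction as F

states = [1, 2, 3, 4]
V = {s: F(0) for s in states}            # V_0 = 0 everywhere

for it in range(1, 4):
    V_new = {}
    for s in states:
        q_shoot = F(s, 4)                                    # T(s,S,G) * 1
        q_dribble = F(3, 4) * V[s + 1] if s < 4 else F(0)    # T(s,D,s+1) * V(s+1)
        V_new[s] = max(q_shoot, q_dribble)
    V = V_new
    print(it, V[1], V[2], V[3], V[4])
# Output:
# 1 1/4 1/2 3/4 1
# 2 3/8 9/16 3/4 1
# 3 27/64 9/16 3/4 1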

   Iteration #           V*(1)   V*(2)   V*(3)   V*(4)
   0 (initialization)    0       0       0       0
   1                     1/4     2/4     3/4     4/4
   2                     6/16    9/16    3/4     4/4
   3                     27/64   9/16    3/4     4/4

Question 5. Probabilities
We continue the same soccer problem, but now imagine that sometimes there is a defender D between the agent A and the goal. A has no way of detecting whether D is present, but does know the statistics of the environment: D is present 2/3 of the time. D does not affect shots at all, only dribbling. When D is absent, the chance of dribbling forward successfully is 3/4 (as it was in the problem above); when D is present, the chance of dribbling forward successfully is 1/4. In either case, if dribbling forward fails, the game goes to the M (missed) state.

a. [2] If the defender is present, what is the optimal action from state 1? S

b. [4] Suppose that A dribbles twice successfully from state 1 to state 3, then shoots and scores. Given this observation, what is the probability that the defender D was present? 2/11
   We can use Bayes' rule, where d is a random variable denoting the presence of the defender, and e is the evidence that A dribbled twice and then scored. We want to compute P(d | e). By Bayes' rule that can be expressed as:
      P(d | e) = P(e | d) P(d) / P(e)
   Building up these pieces, we have:
      P(e) = P(e | d) P(d) + P(e | ~d) P(~d)
      P(e | d) = probability of our actions given the defender = 1/4 * 1/4 * 3/4 = 3/64
      P(e | ~d) = probability of our actions without the defender = 3/4 * 3/4 * 3/4 = 27/64
      P(e) = 3/64 * 2/3 + 27/64 * 1/3 = 2/64 + 9/64 = 11/64
      P(d | e) = (3/64) * (2/3) / (11/64) = (2/64) / (11/64) = 2/11
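A short sketch that verifies the Bayes-rule computation in part (b) with exact fractions; the variable names mirror the derivation above.

from fractions import Fraction as F

p_d = F(2, 3)                                  # prior: defender present
p_e_given_d  = F(1, 4) * F(1, 4) * F(3, 4)     # dribble, dribble, then score from state 3
p_e_given_nd = F(3, 4) * F(3, 4) * F(3, 4)

p_e = p_e_given_d * p_d + p_e_given_nd * (1 - p_d)
print(p_e)                                     # 11/64
print(p_e_given_d * p_d / p_e)                 # 2/11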