Inverse reinforcement learning from summary data


Antti Kangasrääsiö, Samuel Kaski
Aalto University, Finland
ECML PKDD 2018 journal track
Published in Machine Learning (2018), vol. 107
September 12, 2018

Modelling human decision-making: Motivation

Our overarching goal is to have accurate white-box models of human decision-making.

Applications of high-fidelity user models:
- Replicating demonstrated behavior (imitation learning)
- Optimizing user interfaces (human-computer interaction)
- Estimating the cognitive state and goals of humans (chatbots)
- Understanding human cognition (cognitive science)

Modelling human decision-making: Problem

How can we infer the parameters of sequential decision-making models when the available observation data is limited?

Main contribution: We demonstrate that posterior inference is possible for realistic models of decision-making, even with very limited observations of human behavior.

Reinforcement learning models

We use the RL framework for modelling sequential decision-making. The main assumption is that human decisions can be approximated by an optimal policy trained for a certain decision problem (e.g. an MDP or POMDP): humans make rational decisions within the limitations they have.

Inverse reinforcement learning (IRL)

Inverse reinforcement learning: Given a set of observations, which MDP has a matching optimal policy?

Traditional IRL problem. Given
- an MDP with reward function $R(s; \theta)$, where $\theta$ is unknown
- a set of state-action trajectories $\Xi = \{\xi^1, \ldots, \xi^N\}$ demonstrating optimal behavior, where $\xi^i = (s^i_0, a^i_0, \ldots, a^i_{T_i - 1}, s^i_{T_i})$
- a prior $P(\theta)$
determine a point estimate $\hat{\theta}$ or the posterior $P(\theta \mid \Xi)$.

Existing solutions

Traditional IRL performs gradient-based optimization of the likelihood
$$L(\theta \mid \Xi) = \prod_{i=1}^{N} P(s^i_0) \prod_{t=0}^{T_i - 1} \pi_\theta(s^i_t, a^i_t) \, P(s^i_{t+1} \mid s^i_t, a^i_t)$$
This is tractable when all states and actions are observed, but what about when this is not the case?

Previous work: If the state observations are corrupted with i.i.d. noise¹ or part of them are missing², an EM approach can be used to estimate the true states, after which standard IRL methods apply.

However, this approach is not feasible in the more realistic cases, with complex non-i.i.d. noise or most of the states and actions missing.

¹ Activity forecasting, Kitani et al. (2012)
² EM for IRL with hidden data, Bogert et al. (2016)
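For concreteness, here is a minimal Python sketch (not from the paper) of this per-trajectory likelihood for a tabular MDP; the array layout chosen for `pi`, `P`, and `p0` is an assumption.

```python
import numpy as np

def trajectory_log_likelihood(states, actions, pi, P, p0):
    """Log of P(xi | theta) = P(s_0) * prod_t pi(s_t, a_t) * P(s_{t+1} | s_t, a_t).

    states  : [s_0, ..., s_T]          (length T + 1)
    actions : [a_0, ..., a_{T-1}]      (length T)
    pi      : pi[s, a]     -- action probabilities of the optimal policy for theta
    P       : P[s, a, s2]  -- transition probabilities
    p0      : p0[s]        -- initial-state distribution
    """
    logp = np.log(p0[states[0]])
    for t, a in enumerate(actions):
        logp += np.log(pi[states[t], a])            # policy term
        logp += np.log(P[states[t], a, states[t + 1]])  # dynamics term
    return logp
```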

IRL from summary data (IRL-SD)

We ask whether IRL is possible in realistic cases, where the true trajectories $\xi^i$ are filtered through a generic summarizing function $\sigma$, yielding summaries $\xi^{i\sigma} = \sigma(\xi^i)$.

Example: Alice walks to work every day along her preferred secret route. Could we infer Alice's scenery preferences given only the durations of the commutes and the locations of her work and home?

IRL from summary data (IRL-SD) problem. Given
- an MDP with unknown parameters $\theta$
- a set of summaries $\Xi_\sigma = \{\xi^{1\sigma}, \ldots, \xi^{N\sigma}\}$ from optimal behavior
- the summary function $\sigma$
- a prior $P(\theta)$
determine a point estimate $\hat{\theta}$ or the posterior $P(\theta \mid \Xi_\sigma)$.

Exact solution

The likelihood corresponding to an IRL-SD problem is
$$L(\theta \mid \Xi_\sigma) = \prod_{i=1}^{N} \sum_{\xi^i \in \Xi_{ap}} P(\xi^{i\sigma} \mid \xi^i) \, P(\xi^i \mid \theta),$$
where we marginalize over the unobserved true trajectories $\xi^i$.

- The set of all plausible true trajectories is $\Xi_{ap} \subseteq S^{T_{max}+1} \times A^{T_{max}}$
- $P(\xi^{i\sigma} \mid \xi^i)$ is determined by the summary function $\sigma$
- The likelihood of a trajectory is as before:
$$P(\xi^i \mid \theta) = P(s^i_0) \prod_{t=0}^{T_i - 1} \pi_\theta(s^i_t, a^i_t) \, P(s^i_{t+1} \mid s^i_t, a^i_t)$$

Takeaway: $L(\theta \mid \Xi_\sigma)$ can be evaluated, but doing so is very expensive because $\Xi_{ap}$ is generally large or challenging to determine.
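To see why this is expensive, here is a brute-force sketch that reuses `trajectory_log_likelihood` from above and, for simplicity, fixes every candidate trajectory to length `T_max`; the callable `p_sigma`, standing in for $P(\xi^{i\sigma} \mid \xi^i)$, is hypothetical.

```python
from itertools import product
import numpy as np

def exact_log_likelihood(summaries, p_sigma, pi, P, p0,
                         n_states, n_actions, T_max):
    """Marginalize P(xi_sigma | xi) P(xi | theta) over all fixed-length
    trajectories: |S|^(T_max+1) * |A|^T_max terms -- exponential in T_max.

    p_sigma : p_sigma(summary, states, actions) = P(xi_sigma | xi)
    """
    logL = 0.0
    for summary in summaries:
        total = 0.0
        for states in product(range(n_states), repeat=T_max + 1):
            for actions in product(range(n_actions), repeat=T_max):
                # Zero-probability trajectories give log = -inf, exp = 0.
                p_xi = np.exp(trajectory_log_likelihood(states, actions, pi, P, p0))
                total += p_sigma(summary, states, actions) * p_xi
        logL += np.log(total)
    return logL
```

Even a 10-state, 4-action MDP with `T_max = 10` already requires on the order of 10^11 * 4^10 trajectory evaluations per summary, which motivates the approximations below.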

Monte-Carlo approximation

We can estimate $L(\theta \mid \Xi_\sigma)$ by solving $\pi_\theta$ and then sampling $N_{MC}$ trajectories, $\Xi_{MC}$, leading to the Monte-Carlo estimate
$$\hat{L}(\theta \mid \Xi_\sigma) \propto \prod_{i=1}^{N} \left( \frac{1}{N_{MC}} \sum_{\xi^n \in \Xi_{MC}} P(\xi^{i\sigma} \mid \xi^n) + \eta \right)$$

- However, $P(\xi^{i\sigma} \mid \xi^n)$ may be 0 for all $\xi^n \in \Xi_{MC}$, forcing $\hat{L}(\theta \mid \Xi_\sigma)$ to 0 (this can be fixed with the prior term $\eta$ above)
- $\sigma$ needs to be known as a distribution $P(\xi^{i\sigma} \mid \xi^n)$

Takeaway: $L(\theta \mid \Xi_\sigma)$ can be estimated with Monte Carlo, but there are a few technical issues we would like to avoid.
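A sketch of this estimator under the same assumptions; `sample_trajectory` and `p_sigma` are hypothetical stand-ins for rolling out $\pi_\theta$ and for the summary density.

```python
import numpy as np

def mc_log_likelihood(summaries, sample_trajectory, p_sigma,
                      n_mc=1000, eta=1e-6):
    """Monte-Carlo estimate of log L(theta | Xi_sigma), up to a constant.

    sample_trajectory : () -> one trajectory drawn from the MDP under pi_theta
                        (assumes pi_theta has already been solved for theta)
    p_sigma           : p_sigma(summary, xi) = P(xi_sigma | xi)
    eta               : small prior mass keeping the estimate nonzero when no
                        sampled trajectory explains a summary
    """
    xi_mc = [sample_trajectory() for _ in range(n_mc)]
    logL = 0.0
    for summary in summaries:
        avg = np.mean([p_sigma(summary, xi) for xi in xi_mc])
        logL += np.log(avg + eta)
    return logL
```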

Approximate Bayesian computation

ABC also performs inference using Monte-Carlo sampling. Instead of estimating the likelihood of each trajectory $\xi^i$ separately, the likelihood of the entire observation set $\Xi$ is estimated together.

How ABC works:
- Simulate observations using the MC sample: $\Xi^{sim}_\sigma = \{\sigma(\xi^{MC,n})\}$ (only requires us to sample from $\sigma$)
- Estimate the discrepancy $\delta(\Xi_\sigma, \Xi^{sim}_\sigma) \in [0, \infty)$ (matches distributions; reduces the effect of individual rare observations)
- The $\varepsilon$-approximate ABC likelihood: $L_\varepsilon(\theta \mid \Xi_\sigma) = P(\delta(\Xi_\sigma, \Xi^{sim}_\sigma) \leq \varepsilon \mid \theta)$

Intuition: If simulating observations with $\theta$ leads to small prediction error, then the likelihood of $\theta$ is high, and vice versa.

Takeaway: The issues with MC (numerical problems with rare observations, $\sigma$ needing to be known as a distribution) can be avoided by using ABC.
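For illustration, a textbook rejection-ABC sketch; the talk instead fits the ABC likelihood with a GP surrogate (next slide), and all the callables here are hypothetical stand-ins.

```python
import numpy as np

def abc_rejection(observed_summaries, simulate_summaries, discrepancy,
                  prior_sample, n_proposals=1000, eps=0.1):
    """Keep parameter draws whose simulated summary set lies within eps
    of the observed one.

    simulate_summaries : theta -> simulated summary set Xi_sigma_sim
                         (solve pi_theta, roll out trajectories, apply sigma)
    discrepancy        : delta(Xi_sigma, Xi_sigma_sim) >= 0
    prior_sample       : () -> one theta drawn from the prior P(theta)
    """
    accepted = []
    for _ in range(n_proposals):
        theta = prior_sample()
        sim = simulate_summaries(theta)
        if discrepancy(observed_summaries, sim) <= eps:
            accepted.append(theta)
    return accepted  # empirical sample from the eps-approximate posterior

# One plausible discrepancy: distance between empirical summary means.
def mean_discrepancy(obs, sim):
    return np.abs(np.mean(obs) - np.mean(sim))
```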

Inference

Now we can estimate $L(\theta \mid \Xi)$ at any $\theta$, but how do we find the best $\theta \in \Theta$?
- Evaluating the functions is still expensive
- The functions don't have accessible gradients
- Due to limited observability ($\sigma$), the parameter uncertainty is likely large

We estimate the log-likelihoods using a GP surrogate model, fit using Bayesian optimization. The mean and shape of the distribution are estimated from MCMC samples.
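A minimal sketch of such a surrogate loop, assuming scikit-learn; the kernel, acquisition rule, and random candidate sampling are illustrative choices, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bo_minimize(evaluate, bounds, n_init=5, n_iter=30, beta=2.0):
    """Fit a GP surrogate to expensive, gradient-free evaluations (e.g. the
    ABC discrepancy or negative log-likelihood) and pick new evaluation
    points with a lower-confidence-bound acquisition.

    evaluate : theta -> scalar objective (expensive to compute)
    bounds   : array of (low, high) per parameter dimension
    """
    rng = np.random.default_rng(0)
    bounds = np.asarray(bounds)
    d = len(bounds)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, d))
    y = np.array([evaluate(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        # Score the acquisition on random candidates: no gradients needed.
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(1000, d))
        mu, std = gp.predict(cand, return_std=True)
        x_next = cand[np.argmin(mu - beta * std)]
        X = np.vstack([X, x_next])
        y = np.append(y, evaluate(x_next))
    return gp, X, y  # the surrogate can then be explored with MCMC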

Simulation experiment

We used grid world environments to validate our approach.
- The task was to infer reward weights for state features: $R(s) = \phi(s)^T \theta$
- We only knew the start and end locations of the agent and the length of the trajectory: $\xi^\sigma = (s_0, s_T, T)$

Miniature example: What kind of terrain might the agent prefer, given that moving from A to B took it $T$ steps?
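A tiny sketch of the two ingredients of this experiment, the linear reward and the summary function; the feature values and weights below are made up for illustration.

```python
import numpy as np

def reward(state_features, theta):
    """Linear-in-features reward R(s) = phi(s)^T theta."""
    return state_features @ theta

def summarize(states):
    """Summary sigma used in the grid-world experiment: only the start
    state, end state, and trajectory length survive."""
    return (states[0], states[-1], len(states) - 1)

# Example: 3 terrain-indicator features per cell (hypothetical).
phi = np.array([1.0, 0.0, 0.0])      # feature vector of one cell
theta = np.array([0.5, 0.1, -1.0])   # preference weights to be inferred
print(reward(phi, theta))                    # 0.5
print(summarize([(0, 0), (0, 1), (1, 1)]))   # ((0, 0), (1, 1), 2)
```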

Inferred distributions (example)

Takeaways
- The parameter values can be inferred based on summary observations
- The approximate distributions are similar to the true distribution

Efficiency

Takeaways
- Summing over all plausible trajectories is expensive with larger MDPs
- The approximate methods scale significantly better

Accuracy and model fit

Takeaways
- Good approximation performance while outperforming a random baseline
- The approximate methods continue performing well even with larger MDPs

Realistic experiment

We performed experiments using an RL model from cognitive science.
- Users searched repeatedly for target items in drop-down menus
- The MDP contained a simple model of human vision and short-term memory

Goal: infer the values of three model parameters based on observing the task completion time (TCT) and whether the target item was present in the menu: $\xi^\sigma = (\text{target present?}, \text{TCT})$
- visual fixation duration $f_{dur}$
- item selection duration $d_{sel}$
- menu layout recall probability $p_{rec}$
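The slides do not specify the discrepancy used for this task; one plausible choice, sketched below, compares mean TCTs separately within the target-absent and target-present conditions.

```python
import numpy as np

def menu_discrepancy(obs, sim):
    """Compare (target_present, TCT) summary sets condition by condition.

    obs, sim : lists of (target_present: bool, tct_ms: float) pairs;
               both conditions are assumed present in both sets.
    Returns the summed absolute difference of mean TCTs per condition.
    """
    delta = 0.0
    for present in (False, True):
        obs_tct = [t for p, t in obs if p == present]
        sim_tct = [t for p, t in sim if p == present]
        delta += abs(np.mean(obs_tct) - np.mean(sim_tct))
    return delta
```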

Model fit

                              ABC       Hold-out data
Task Completion Time (abs)    430 ms    470 ms
Task Completion Time (pre)    980 ms    970 ms
Number of Saccades (abs)
Number of Saccades (pre)

abs = target absent from menu, pre = target present in menu

Takeaways
- Predictions with the parameters inferred by ABC match the hold-out observation data, indicating good model fit
- Unobserved features also match the predictions approximately

Approximate posterior

Takeaways
- The posterior indicates good identification of the model parameter values
- The remaining parameter uncertainty is easy to visualize

Conclusions

We proposed two approximate methods (MC, ABC) for solving the problem of trajectory-level observation noise in IRL:
- More scalable than the exact likelihood
- Good approximation quality
- Full posterior inference, which is important due to noisy observations

We demonstrated applicability for a realistic cognitive science model based on real observation data.

Next steps: improve scalability
- Solving RL problems in the inner loop is still required
- Scalability of GP and BO to high dimensions

More details at the poster tomorrow.
