The Kalman filter - and other methods


Slide 1: The Kalman filter - and other methods

Anders Ringgaard Kristensen

Slide 2: Outline

Filtering techniques applied to monitoring of daily gain in slaughter pigs:
- Introduction
- Basic monitoring
  - Shewhart control charts
- DLM and the Kalman filter
  - Simple case
  - Seasonality
- Online monitoring
  - Used as input to decision support

Slide 3: E-kontrol, slaughter pigs

- Quarterly calculated production results
- Presented as a table
- A result for each of the most recent quarters, and aggregated
- Sometimes comparison with expected (target) values
- Offered by two companies:
  - Dansk Landbrugsrådgivning, Landscentret (as shown)
  - AgroSoft A/S
- One of the most important key figures: average daily gain

Slide 4: Average daily gain, slaughter pigs

We have:
- 4 quarterly results
- 1 annual result
- 1 target value

How do we interpret the results?

Question 1: How is the figure calculated?

Slide 5: How is the figure calculated?

The basic principles are:

  Total (live) weight of pigs delivered (*):            + xxxx
  Total weight of piglets inserted (**):                - xxxx
  Valuation weight at end of the quarter (***):         + xxxx
  Valuation weight at beginning of the quarter (***):   - xxxx
  Total gain during the quarter:                          yyyy

Daily gain = (Total gain)/(days in feed)

Registration sources:
  *    Slaughterhouse: rather precise
  **   Scale: very precise
  ***  ???: anything from very precise to very uncertain

Slide 6: First finding: Observation error

All measurements are subject to uncertainty (error), but the error is largest for the valuation weights.

We define a (very simple) model: $\kappa = \tau + e_o$, where:
- $\kappa$ is the calculated daily gain (as it appears in the report)
- $\tau$ is the true daily gain (which we wish to estimate)
- $e_o$ is the observation error, which we assume is normally distributed, $e_o \sim N(0, \sigma_o^2)$

The structure of the model (qualitative knowledge) is the equation.

The parameter (quantitative knowledge) is the value of $\sigma_o$, the standard deviation of the observation error. It depends on the observation method.

Slide 7: Observation error

[Diagram: $\tau \to \kappa$]

$\kappa = \tau + e_o$, $e_o \sim N(0, \sigma_o^2)$

- What we measure is $\kappa$
- What we wish to know is $\tau$
- The difference between the two variables is undesired noise
- We wish to filter the noise away, i.e. we wish to estimate $\tau$ from $\kappa$

Slide 8: Second finding: Randomness

The true daily gains $\tau$ vary at random. Even if we produce under exactly the same conditions in two successive quarters, the results will differ. We call this phenomenon the sample error.

We have $\tau = \theta + e_s$, where:
- $e_s$ is the sample error expressing random variation; we assume $e_s \sim N(0, \sigma_s^2)$
- $\theta$ is the underlying permanent (and true) value

This supplementary qualitative knowledge should be reflected in the structure of the model:

$\kappa = \tau + e_o = \theta + e_s + e_o$

The parameters of the model are now $\sigma_s$ and $\sigma_o$.

Slide 9: Sample error and measurement error

[Diagram: $\theta \to \tau \to \kappa$]

- What we measure is $\kappa$
- What we wish to know is $\theta$
- The difference between the two variables is undesired noise:
  - Sample noise
  - Observation noise
- We wish to filter the noise away, i.e. we wish to estimate $\theta$ from $\kappa$

Slide 10: The model in practice: Preconditions

The model is necessary for any meaningful interpretation of calculated production results.

The standard deviation of the sample error, $\sigma_s$, depends on the natural individual variation between pigs in a herd and on the herd size.

The standard deviation of the observation error, $\sigma_o$, depends on the method used to measure valuation weights.

For the interpretation of the calculated results, it is the total uncertainty, $\sigma$, that matters ($\sigma^2 = \sigma_s^2 + \sigma_o^2$).

Competent guesses of the value of $\sigma$ using different observation methods (1250 pigs):
- Weighing of all pigs: $\sigma$ = 3 g
- Stratified sample: $\sigma$ = 7 g
- Random sample: $\sigma$ = 20 g
- Visual assessment: $\sigma$ = 29 g
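The relation $\sigma^2 = \sigma_s^2 + \sigma_o^2$ can be illustrated with a minimal sketch. The component values below are hypothetical (the slide only gives guesses for the total $\sigma$), and the function name `total_sd` is an assumption made for illustration.

```python
import math


def total_sd(sigma_s: float, sigma_o: float) -> float:
    """Total standard deviation of two independent error sources.

    Variances add, so the standard deviations combine as a root sum of squares.
    """
    return math.sqrt(sigma_s ** 2 + sigma_o ** 2)


# Hypothetical component values in grams; they are NOT taken from the slides.
sigma_s = 3.0
for sigma_o in (0.0, 6.0, 20.0):
    print(f"sigma_s = {sigma_s:.1f} g, sigma_o = {sigma_o:4.1f} g "
          f"-> total sigma = {total_sd(sigma_s, sigma_o):.2f} g")
```

The sketch shows that once the observation error is large, it dominates the total uncertainty, which is why the observation method matters so much for interpretation.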

Slide 11: Different observation methods

[Diagram: $\theta \to \tau \to \kappa$, with a separate observed $\kappa$ for each observation method: $\sigma$ = 3 g, 7 g, 20 g and 29 g]

Slide 12: The model in practice: Interpretation

Calculated daily gain in a herd was 750 g, whereas the expected target value was 775 g. Should we be worried?

It depends on the observation method!

A lower control limit (LCL) is the target minus 2 times the standard deviation, i.e. $775 - 2\sigma$.

Using each of the 4 observation methods, we obtain the following LCLs:
- Weighing of all pigs: 775 g - 2 x 3 g = 769 g
- Stratified sample: 775 g - 2 x 7 g = 761 g
- Random sample: 775 g - 2 x 20 g = 735 g
- Visual assessment: 775 g - 2 x 29 g = 717 g
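The LCL arithmetic above can be reproduced with a short sketch. The target, the observed value and the four standard deviations come from the slides; the function and variable names are assumptions chosen for illustration.

```python
TARGET_G = 775.0    # expected (target) daily gain, from the slide
OBSERVED_G = 750.0  # calculated daily gain, from the slide

# Total standard deviation (g) per observation method, from the slide.
SIGMA_G = {
    "Weighing of all pigs": 3.0,
    "Stratified sample": 7.0,
    "Random sample": 20.0,
    "Visual assessment": 29.0,
}


def lower_control_limit(target: float, sigma: float, k: float = 2.0) -> float:
    """Lower control limit: target minus k standard deviations."""
    return target - k * sigma


def lcl_table(target=TARGET_G, observed=OBSERVED_G, sigmas=SIGMA_G):
    """Return (method, LCL, alarm) rows; alarm is True if observed < LCL."""
    rows = []
    for method, sigma in sigmas.items():
        lcl = lower_control_limit(target, sigma)
        rows.append((method, lcl, observed < lcl))
    return rows


for method, lcl, alarm in lcl_table():
    status = "below LCL" if alarm else "within limits"
    print(f"{method:22s} LCL = {lcl:5.0f} g -> 750 g is {status}")
```

With these numbers, 750 g falls below the LCL for the two precise methods (769 g and 761 g) but not for the two imprecise ones (735 g and 717 g), which is the sense in which the answer depends on the observation method.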

Slide 13: Third finding: Dynamics, time

[Figure: Daily gain, slaughter pigs (g, axis 600 to 950) per quarter, from 2nd quarter 1997 to 2nd quarter 2001]

Daily gain in a herd over 4 years. Is this good or bad?

Slide 14: Modeling dynamics

We extend our model to include time. At time n we model the calculated result as follows:

$\kappa_n = \tau_n + e_{on} = \theta + e_{sn} + e_{on}$

The only change from before is that we now have a new result each quarter.

We can calculate control limits for each quarter and plot everything in a diagram: a Shewhart control chart.

[Diagram: a constant $\theta$ generating $\tau_1, \tau_2, \tau_3, \tau_4$, which in turn generate the observations $\kappa_1, \kappa_2, \kappa_3, \kappa_4$]
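A Shewhart chart of this kind amounts to comparing each quarterly result with fixed limits around the target. The sketch below assumes limits at the target plus or minus 2 standard deviations (the slides state this only for the lower limit) and uses hypothetical quarterly gains, since the plotted values are not available in the text.

```python
def shewhart_flags(observations, target, sigma, k=2.0):
    """Classify each observation against fixed limits target +/- k * sigma."""
    lcl = target - k * sigma
    ucl = target + k * sigma
    flags = []
    for n, x in enumerate(observations, start=1):
        if x < lcl:
            status = "below LCL"
        elif x > ucl:
            status = "above UCL"
        else:
            status = "in control"
        flags.append((n, x, status))
    return flags


# Hypothetical quarterly daily gains in grams (NOT the herd data in the figure).
gains = [780.0, 760.0, 812.0, 733.0, 790.0]

for sigma in (3.0, 29.0):
    print(f"sigma = {sigma} g")
    for n, x, status in shewhart_flags(gains, target=775.0, sigma=sigma):
        print(f"  quarter {n}: {x:.0f} g -> {status}")
```

With the small standard deviation almost every hypothetical point is flagged, while with the large one none is, which mirrors the contrast between the chart for weighing all pigs and the chart for visual assessment.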

Slide 15: A simple Shewhart control chart: Weighing all pigs

[Figure: Daily gain, slaughter pigs (g, axis 600 to 950) per quarter, from 2nd quarter 1997 to 2nd quarter 2001. Series: observed gain, upper control limit, expected, lower control limit]

Slide 16: Simple Shewhart control chart: Visual assessment

[Figure: Daily gain, slaughter pigs (g, axis 600 to 950) per quarter, from 2nd quarter 1997 to 2nd quarter 2001. Series: observed gain, upper control limit, expected, lower control limit]

Slide 17: Interpretation: Conclusion

Something is wrong! Possible explanations:
- The pig farmer has serious problems with fluctuating daily gains.
- Something is wrong with the model:
  - Structure: our qualitative knowledge
  - Parameters: the quantitative knowledge (standard deviations)

Slide 18: More findings: $\kappa_n = \theta + e_{sn} + e_{on}$

- The true underlying daily gain in the herd, $\theta$, may change over time:
  - Trend
  - Seasonal variation
- The sample error $e_{sn}$ may be autocorrelated:
  - Temporary influences
- The observation error $e_{on}$ is obviously autocorrelated:
  - The valuation weight at the end of Quarter n is the same as the valuation weight at the start of Quarter n+1

Slide 19: Dynamisk e-kontrol

Developed and described by Madsen & Ruby (2000). Principles:
- Avoid labor-intensive valuation weighing.
- Calculate a new daily gain every time pigs have been sent to slaughter (typically weekly).
- Use a simple Dynamic Linear Model to monitor daily gain:
  - $\kappa_n = \theta_n + e_{sn} + e_{on} = \theta_n + v_n$, where $v_n \sim N(0, \sigma_v^2)$
  - $\theta_n = \theta_{n-1} + w_n$, where $w_n \sim N(0, \sigma_w^2)$
- The calculated results are filtered by the Kalman filter in order to remove random noise (sample error + observation error).

Slide 20: Dynamisk e-kontrol, results

[Figure: raw data to the left, filtered data to the right]

Figures from: Madsen & Ruby (2000). An application for early detection of growth rate changes in the slaughter pig production unit. Computers and Electronics in Agriculture 25, 261-270.

Still: results are only available after slaughter.
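The first-order model above ($\kappa_n = \theta_n + v_n$, $\theta_n = \theta_{n-1} + w_n$) can be filtered with the standard Kalman recursions for a local level model. The following is a minimal sketch, not the implementation used by Madsen & Ruby; the prior mean `m0`, prior variance `c0`, the variances and the data are all hypothetical values chosen for illustration.

```python
def kalman_local_level(observations, m0, c0, var_v, var_w):
    """Kalman filter for y_n = theta_n + v_n, theta_n = theta_{n-1} + w_n.

    m0, c0 : prior mean and variance of the level before the first observation
    var_v  : observation variance (sample error + observation error)
    var_w  : system variance (how fast the true level may drift)
    Returns one (forecast, forecast_error, posterior_mean, posterior_variance)
    tuple per observation.
    """
    m, c = m0, c0
    results = []
    for y in observations:
        r = c + var_w      # prior variance of the level at this step
        f = m              # one-step forecast of the observation
        q = r + var_v      # variance of the forecast
        e = y - f          # forecast error
        a = r / q          # adaptive coefficient (Kalman gain)
        m = m + a * e      # posterior mean of the level
        c = a * var_v      # posterior variance of the level (= r - a*a*q)
        results.append((f, e, m, c))
    return results


# Hypothetical weekly daily gains in grams and hypothetical variances.
gains = [752.0, 768.0, 741.0, 759.0, 720.0, 715.0]
filtered = kalman_local_level(gains, m0=775.0, c0=400.0, var_v=400.0, var_w=25.0)
for n, (f, e, m, c) in enumerate(filtered, start=1):
    print(f"n={n}: forecast {f:6.1f}, error {e:6.1f}, filtered level {m:6.1f}")
```

The gain `a` lies between 0 and 1: a large observation variance gives a small gain, so single noisy results move the filtered level only slightly.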

Slide 21: The Dynamic Linear Model (DLM)

Example:
- Observation equation: $\kappa_n = \theta_n + v_n$, $v_n \sim N(0, \sigma_v^2)$
- System equation: $\theta_n = \theta_{n-1} + w_n$, $w_n \sim N(0, \sigma_w^2)$

General, first order:
- Observation equation: $Y_t = \mu_t + v_t$, $v_t \sim N(0, \sigma_v^2)$
- System equation: $\mu_t = \mu_{t-1} + w_t$, $w_t \sim N(0, \sigma_w^2)$

[Diagram: latent levels $\theta_1, \dots, \theta_4$ (general notation $\mu_1, \dots, \mu_4$) evolving over time, each generating an observation $\kappa_1, \dots, \kappa_4$ (general notation $Y_1, \dots, Y_4$); the earlier model with $\tau_1, \dots, \tau_4$ is shown for comparison]

Slide 22: Extending the model

$F_n \theta_n$ is the true level, written as a vector product.

A general level, $\theta_{0n}$, and 4 seasonal effects, $\theta_{1n}$, $\theta_{2n}$, $\theta_{3n}$ and $\theta_{4n}$, are included in the model.

From the model we are able to predict the expected daily gain for the next quarter.

As long as the forecast errors are small, production is in control (no large change in the true underlying level)!
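The vector product $F_n \theta_n$ can be made concrete with a small sketch. The exact form of $F_n$ is not preserved in the transcript, so the version below is an assumption: a 1 for the general level followed by an indicator that selects the seasonal effect of the current quarter. The parameter values are hypothetical.

```python
def design_vector(quarter):
    """Assumed F_n: [1, q==1, q==2, q==3, q==4] as floats."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return [1.0] + [1.0 if q == quarter else 0.0 for q in (1, 2, 3, 4)]


def expected_gain(theta, quarter):
    """True level F_n * theta_n: general level plus the quarter's effect."""
    return sum(f * t for f, t in zip(design_vector(quarter), theta))


# Hypothetical state: general level and four seasonal effects, in grams.
theta = [775.0, 10.0, -5.0, -15.0, 10.0]

for quarter in (1, 2, 3, 4):
    print(f"quarter {quarter}: expected daily gain {expected_gain(theta, quarter):.0f} g")
```

In a full seasonal DLM the whole vector $\theta_n$ is updated by the Kalman filter after each observation; this sketch only shows how a given state maps to the expected daily gain.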

Slide 23: Observed and predicted

[Figure: Observed (blue) and predicted (pink) daily gain (g, axis 600 to 950) per quarter, from 2nd quarter 1997 to 4th quarter 2000]

Slide 24: Analysis of prediction errors

[Figure: Prediction errors of daily gain (g, axis -100 to 100) per quarter, from 2nd quarter 1997 to 4th quarter 2000]

Slide 25: The last model: Dynamic Linear Model

Structure of the model (qualitative knowledge):
- Seasonal variation is allowed (no assumption about its size).
- The general level as well as the seasonal pattern may change over time.
- Are those assumptions correct?

Parameters of the model:
- The observation and sample variance, and the system variance.

The model learns as observations are made, and adapts to the observations over time.

Seasonal variation may be modeled in a more sophisticated way, as demonstrated by Thomas Nejsum Madsen in Farm Watch.

Slide 26: Moral

If you wish to analyze the daily gain of a herd, you need to:
- Know exactly how the observations are made (and know their precision).
- Know how the daily gain may naturally develop over time.

Without professional knowledge you may conclude anything.

Without a model you may interpret the results inadequately.

Through the structure of the model we apply our professional knowledge to the problem.

Slide 27: On-line monitoring of slaughter pigs: PigVision

Innovation project led by Danish Pig Production:
- Danish Institute of Agricultural Sciences
- Videometer (external assistance)
- Skov A/S
- LIFE, IPH, Production and Health

Continuous monitoring of daily gain while the pigs are still in the herd:
- Dynamic Linear Models
- Possibility of intervening during the fattening period
- Adaptation of delivery policy

Slide 28: PigVision: Principles

- A camera is placed above the pen.
- When movement is detected, a series of pictures is recorded and sent to a computer.
- The computer automatically identifies the pig (by use of a model) and calculates its area (seen from above).
- If the computer does not believe that a pig has been identified, the picture is ignored.
- The area is converted to live weight (using a model).
- From many pictures, the average weight and the standard deviation are estimated.

Figure by Teresia Heiskanen

Slide 29: What is online weight assessment used for?

- Continuous monitoring of gain
- Collection of evidence about growth capacity (learning)
- Adaptation of delivery policies depending on:
  - Whether the pigs grow fast or slowly
  - Whether the uniformity is low or high
  - Whether a new batch of piglets is ready
  - Prices
- Direct advice about which pigs to deliver

Slide 30: The decision support model

Technique: A hierarchical Markov Decision Process (dynamic programming) with a Dynamic Linear Model (DLM) embedded.

- Every week, the average weight and the standard deviation are observed.
- After each observation the parameters of the DLM are updated using Kalman filtering:
  - Permanent growth capacity of pigs, $L$
  - Temporary deviation, $e(t)$
  - Within-pen standard deviation, $\rho(t)$
- Decisions are based on (state space):
  - Number of pigs left
  - Estimated values of the 3 parameters
- Decision: Deliver all pigs with live weight bigger than a threshold.
- Uncertainty of knowledge is built directly into the model through the DLM.
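Because individual pigs are not identified, a threshold decision of the kind described above can only be evaluated through the estimated weight distribution in the pen. The sketch below is an illustration under a stated assumption, namely that live weights in the pen are normally distributed around the estimated average with the estimated within-pen standard deviation. The function name and all numbers are hypothetical and are not taken from the decision support model itself.

```python
from statistics import NormalDist


def expected_deliveries(n_pigs, mean_weight, sd_weight, threshold):
    """Expected number of pigs heavier than the threshold.

    Assumes live weights are normally distributed (an assumption of this
    sketch) with the estimated mean and within-pen standard deviation.
    """
    share_above = 1.0 - NormalDist(mean_weight, sd_weight).cdf(threshold)
    return n_pigs * share_above


# Hypothetical pen: 20 pigs, estimated mean 100 kg, within-pen SD 10 kg.
for threshold in (95.0, 100.0, 110.0):
    expected = expected_deliveries(20, 100.0, 10.0, threshold)
    print(f"threshold {threshold:5.1f} kg -> about {expected:4.1f} pigs delivered")
```

A more homogeneous pen (smaller standard deviation) makes the expected number of deliveries react more sharply to the threshold, which is one reason the within-pen standard deviation is part of the state.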

Slide 31: On-line weight assessment

A pen with n pigs is monitored. There is no identification of individual pigs.

At any time t we have: [equation not preserved]

The precision $1/\sigma^2$ is assumed known.

Slide 32: Objectives

Given the on-line weight estimates, to assign an optimal delivery policy for the pigs in the pen.

Sequential (weekly) decision problem with decisions at two levels:
- Slaughtering of individual pigs (the price is highest in a rather narrow interval)
- Terminating the batch (slaughter all remaining pigs and insert a new batch of weaners)

Slide 33: Dynamic linear models

Slide 34: A dynamic linear weight model, I

Known average herd-specific growth curve: [equation not preserved]

True weights at time t distributed as: [equation not preserved]

Slide 35: The scaling factor L

- In principle unknown and not directly observable
- Initial belief: $L \sim N(1, \sigma_L^2)$
- The belief is updated each time we observe a set of live weights from the pen.
- The definition of the true average weight and the equation following "Then" are not preserved.

Slide 36: Observation & system equation 1

Full observation equation for the mean: [equation not preserved]

Autocorrelated sample error (system equation): [equation not preserved]

Slide 37: Observation & system equation 2

- Far more information is available from the observed live weights.
- The sample variance is not normally distributed.
- Use the 0.16 sample quantile: [equation not preserved]
- The symbol $\rho(t)$ is the standard deviation of the observed values.
- System equation: [equation not preserved]

Slide 38: Full equation set

[Equations not preserved]
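The slide's own formula for the 0.16 sample quantile is not preserved, but the reason that quantile carries information about the standard deviation can be sketched. For a normal distribution the 0.16 quantile lies almost exactly one standard deviation below the mean, because $\Phi(-1) \approx 0.159$. The helper name `sd_from_q16` and the numbers are assumptions for illustration only.

```python
from statistics import NormalDist

# Standard normal 0.16 quantile; close to -1 because Phi(-1) is about 0.159.
Z16 = NormalDist().inv_cdf(0.16)


def sd_from_q16(mean, q16):
    """Standard deviation implied by the mean and the 0.16 quantile,
    assuming normally distributed weights (an assumption of this sketch)."""
    return (q16 - mean) / Z16


# Hypothetical pen: mean live weight 100 kg, true within-pen SD 12 kg.
weights = NormalDist(100.0, 12.0)
q16 = weights.inv_cdf(0.16)
print(f"z(0.16) = {Z16:.3f}")
print(f"0.16 quantile = {q16:.2f} kg, implied SD = {sd_from_q16(100.0, q16):.2f} kg")
```

In practice the quantile would be computed from the observed live weights rather than from a known distribution; the sketch only shows the mapping between the quantile and the standard deviation.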

Slide 39: Learning, permanent growth capacity

[Figure: four panels, for L = 1.00, L = 0.85, L = 1.07 and L = 1.12, each showing the true value and the learned value of L over observations 1 to 12 (vertical axis 0.85 to 1.15)]

Slide 40: Learning: Homogeneity (standard deviation)

[Figure: two panels, for standard deviation = 3 and standard deviation = 11, each showing the true value and the learned value over observations 1 to 12 (vertical axis 3 to 21)]