Combining imperfect data, and an introduction to data assimilation Ross Bannister, NCEO, September 2010

r.n.bannister@reading.ac.uk

The probability density function (PDF)

- Probability that $x$ lies between $x$ and $x + dx$: $p(x)\,dx$
- Restriction on $p(x)$: $\int dx\, p(x) = 1$
- Expectation value of $f(x)$: $\langle f(x) \rangle = \int dx\, f(x)\, p(x)$
- Expectation value of $x$ (the mean): $\langle x \rangle = \int dx\, x\, p(x)$
- $j$th moment of $x$ around $\langle x \rangle$: $\langle (x - \langle x \rangle)^j \rangle = \int dx\, (x - \langle x \rangle)^j\, p(x)$
- $j = 1$ moment: $\int dx\, (x - \langle x \rangle)\, p(x) = \langle x \rangle - \langle x \rangle = 0$
- $j = 2$ moment (the variance): $\int dx\, (x - \langle x \rangle)^2\, p(x) = \sigma^2$

The Gaussian (or normal) distribution is a commonly used example of $p(x)$:

$$p(x) = N(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$

For these notes, $x$ may be considered to be a measurement of some variable which is subject to a normally distributed error with standard deviation $\sigma$. If the measurement error is unbiased, then the mean, $\mu$, is the true value.

The PDF for a number of imperfect observations

No measurement is exact, and so all measurements have error. The error is unmeasurable, but we assume that we know its statistics (the PDF). We wish to combine $N$ unbiased, normally distributed measurements to estimate the true value, and its uncertainty. Let the $n$th measurement be $x_n$, with error standard deviation $\sigma_n$, and let the possible true value be $x$. The PDF of this measurement is a Gaussian with mean $\mu = x$:

$$p_n(x_n | x) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left[-\frac{(x_n - x)^2}{2\sigma_n^2}\right]$$

The notation $p_n(x_n | x)\,dx_n$ means the probability that measurement $x_n$ lies between $x_n$ and $x_n + dx_n$

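given that the true value is $x$. The combined PDF for $N$ measurements of the same quantity (with independent errors) is

$$p(x_1, x_2, \ldots, x_N | x) = p_1(x_1 | x)\, p_2(x_2 | x) \cdots p_N(x_N | x) = \prod_{n=1}^{N} p_n(x_n | x) = \frac{1}{(2\pi)^{N/2} \prod_{n=1}^{N} \sigma_n} \exp\left[-\frac{1}{2} \sum_{n=1}^{N} \frac{(x_n - x)^2}{\sigma_n^2}\right]$$

When considered a function of $x$, this PDF is called a likelihood function. We wish to calculate the value of $x$ that maximizes this likelihood (the maximum likelihood estimate, $\hat{x}$). The $\hat{x}$ that maximizes $p(x_1, x_2, \ldots, x_N | x)$ is the same $\hat{x}$ that maximizes $\ln p(x_1, x_2, \ldots, x_N | x)$:

$$\ln p(x_1, x_2, \ldots, x_N | x) = -\ln\left[(2\pi)^{N/2} \prod_{n=1}^{N} \sigma_n\right] - \frac{1}{2} \sum_{n=1}^{N} \frac{(x_n - x)^2}{\sigma_n^2}$$

The $\hat{x}$ that maximizes $\ln p$ is the same $x$ that minimizes $-\ln p$:

$$-\ln p(x_1, x_2, \ldots, x_N | x) = \ln\left[(2\pi)^{N/2} \prod_{n=1}^{N} \sigma_n\right] + \frac{1}{2} \sum_{n=1}^{N} \frac{(x_n - x)^2}{\sigma_n^2} = \text{constant} + I(x)$$

where

$$I(x) = \frac{1}{2} \sum_{n=1}^{N} \frac{(x_n - x)^2}{\sigma_n^2}$$

$I(x)$ is sometimes called a cost function. Finding the maximum likelihood estimate is equivalent to solving the least squares problem above.

Minimizing the cost function

Differentiate $I(x)$ with respect to $x$:

$$\frac{dI}{dx} = -\sum_{n=1}^{N} \frac{x_n - x}{\sigma_n^2}$$

Set this to zero for the minimum (the function $I(x)$ is convex, so the stationary point is a minimum):

$$\sum_{n=1}^{N} \frac{x_n - \hat{x}}{\sigma_n^2} = 0, \qquad \hat{x} = \frac{\sum_{n=1}^{N} x_n / \sigma_n^2}{\sum_{n=1}^{N} 1 / \sigma_n^2}$$

The inverse variances as weights

This procedure does allow for the fact that some measurements are more accurate than others (e.g. a more accurate instrument gives a more accurate measurement, which has a smaller $\sigma_n$ and hence a larger weight $1/\sigma_n^2$). Consider the case for two measurements:

$$\hat{x} = \frac{x_1 / \sigma_1^2 + x_2 / \sigma_2^2}{1 / \sigma_1^2 + 1 / \sigma_2^2}$$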
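The weighted-mean formula can be tried out in a few lines. The following is a minimal sketch, not part of the original notes: the function names `cost` and `ml_estimate` and the two sample readings are assumptions chosen only for illustration. It computes the inverse-variance weighted estimate and checks that nudging the estimate in either direction increases the cost function.

```python
def cost(x, measurements, sigmas):
    """Cost function I(x) = 1/2 * sum_n (x_n - x)^2 / sigma_n^2."""
    return 0.5 * sum((xn - x) ** 2 / s ** 2 for xn, s in zip(measurements, sigmas))


def ml_estimate(measurements, sigmas):
    """Inverse-variance weighted mean, which minimizes cost()."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * xn for w, xn in zip(weights, measurements)) / sum(weights)


# Hypothetical data: two readings of the same quantity,
# the first three times more accurate than the second.
measurements = [10.0, 12.0]
sigmas = [1.0, 3.0]

x_hat = ml_estimate(measurements, sigmas)
print("estimate:", x_hat)  # lies much closer to 10 than to 12

# The cost is larger on either side of the estimate.
for trial in (x_hat - 0.1, x_hat, x_hat + 0.1):
    print(trial, cost(trial, measurements, sigmas))
```

With equal values in `sigmas` the same function returns the plain arithmetic mean.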

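If measurement 1 has much better accuracy than measurement 2, then $\sigma_1 \ll \sigma_2$. Then $x_1 / \sigma_1^2 \gg x_2 / \sigma_2^2$ (for measurements of comparable size), so the maximum likelihood estimate $\hat{x} \approx x_1$ and measurement 2 will not be considered very strongly by the procedure (automatically). If the two measurements have the same accuracy then the maximum likelihood estimate will be an arithmetic mean of the two:

$$\hat{x} = \frac{x_1 + x_2}{2}$$

The variance of the maximum likelihood estimate

Calculating the variance of the maximum likelihood estimate can be done without resorting to difficult moment integrals. With $x$ denoting the true value, the error in the estimate is

$$\hat{x} - x = \frac{\sum_{n=1}^{N} x_n / \sigma_n^2}{\sum_{n=1}^{N} 1 / \sigma_n^2} - x = \frac{\sum_{n=1}^{N} (x_n - x) / \sigma_n^2}{\sum_{n=1}^{N} 1 / \sigma_n^2}$$

The variance of the estimate, $\sigma_e^2$, is the mean-square of this error:

$$\sigma_e^2 = \left(\sum_{n=1}^{N} \frac{1}{\sigma_n^2}\right)^{-2} \left\langle \left(\sum_{n=1}^{N} \frac{x_n - x}{\sigma_n^2}\right)^2 \right\rangle = \left(\sum_{n=1}^{N} \frac{1}{\sigma_n^2}\right)^{-2} \left\langle \sum_{n=1}^{N} \frac{x_n - x}{\sigma_n^2} \sum_{m=1}^{N} \frac{x_m - x}{\sigma_m^2} \right\rangle = \left(\sum_{n=1}^{N} \frac{1}{\sigma_n^2}\right)^{-2} \sum_{n,m=1}^{N} \frac{\langle (x_n - x)(x_m - x) \rangle}{\sigma_n^2 \sigma_m^2}$$

The errors in each measurement are assumed to be uncorrelated, so $\langle (x_n - x)(x_m - x) \rangle = \delta_{nm} \sigma_n^2$, and

$$\sigma_e^2 = \left(\sum_{n=1}^{N} \frac{1}{\sigma_n^2}\right)^{-2} \sum_{n=1}^{N} \frac{1}{\sigma_n^2} = \left(\sum_{n=1}^{N} \frac{1}{\sigma_n^2}\right)^{-1}$$

Note that $\sigma_e^2$ has the property that it is smaller than (or equal to, if there is just one observation) the variance of any of the individual observations. Again, consider the case of two measurements:

$$\frac{1}{\sigma_e^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}, \qquad \sigma_e^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}$$

If measurement 1 has much better accuracy than measurement 2, then $\sigma_1 \ll \sigma_2$. Then $\sigma_e \approx \sigma_1$, i.e. the estimate is the same as measurement 1 (the result found before) and the variance of the estimate is the same as that of measurement 1. If the two measurements have the same accuracy ($\sigma_1 = \sigma_2 = \sigma$) then the variance of the estimate is halved:

$$\sigma_e^2 = \frac{\sigma^2}{2}$$

If all $N$ measurements have the same accuracy then the following classical result is found:

$$\sigma_e^2 = \frac{\sigma^2}{N}, \quad \text{i.e.} \quad \sigma_e = \frac{\sigma}{\sqrt{N}}$$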
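The variance formula can be checked numerically. The sketch below is an illustration under stated assumptions rather than anything from the notes: the true value, the three standard deviations, the number of trials and the helper names are all invented for the example. It draws many sets of simulated measurements, forms the weighted estimate for each set, and compares the mean-square error with $1 / \sum_n (1/\sigma_n^2)$.

```python
import random


def estimate_variance(sigmas):
    """Variance of the ML estimate: 1 / sum_n (1 / sigma_n^2)."""
    return 1.0 / sum(1.0 / s ** 2 for s in sigmas)


def ml_estimate(measurements, sigmas):
    """Inverse-variance weighted mean of the measurements."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * xn for w, xn in zip(weights, measurements)) / sum(weights)


# Illustrative values only (assumptions, not from the notes)
truth = 5.0
sigmas = [1.0, 2.0, 0.5]
trials = 20000

rng = random.Random(0)  # fixed seed so the run is repeatable
squared_errors = []
for _ in range(trials):
    # Unbiased, normally distributed, uncorrelated measurement errors
    measurements = [rng.gauss(truth, s) for s in sigmas]
    squared_errors.append((ml_estimate(measurements, sigmas) - truth) ** 2)

empirical = sum(squared_errors) / trials
theory = estimate_variance(sigmas)
print(f"theory {theory:.4f}, Monte Carlo {empirical:.4f}")

# Equal accuracies: sigma^2 = 4 and N = 4 give sigma^2 / N
print(estimate_variance([2.0] * 4))
```

The two printed variances should agree to within sampling noise, and both are below the smallest individual variance (0.25 here).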

Generalizations - introduction to data assimilation

The above example is limited in the following ways:

- One quantity, $x$, is estimated.
- Many observations are made.
- The observations are direct observations of the unknown quantity.
- The measurement errors are uncorrelated.

The problem can be generalized to deal with many quantities to be estimated, and with measurements which may observe the quantities indirectly and whose errors may be correlated. An indirect observation is one that measures some function of the unknown quantities, instead of the quantities themselves. Some examples are as follows:

- Measurements of wind speed and direction when the north/south and east/west wind components are required.
- Measurements of temperature and pressure when the potential temperature is required.
- Measurements of the temperature over a large region when the local temperatures are required.
- Measurements from space of the thermal radiation emitted by a column of the atmosphere when the vertical profile of temperature is required.

The following notation is used:

Symbol                     Meaning                                      Reference
$\mathbf{y}$               Vector of $p$ observations                   Observation vector
$\mathbf{x}$               Vector of $q$ unknown quantities             State vector
$\mathbf{h}(\mathbf{x})$   Simulated observations according to x        Observation operator
$\mathbf{R}$               Matrix of observation error covariances      Observation error covariance matrix
$\mathbf{x}_b$             Prior information about x                    Background or a-priori
$\mathbf{B}$               Matrix of error covariances of $x_b$         Background error covariance matrix

A least squares problem can be constructed along the same lines as the one for the single unknown quantity case:

$$J(\mathbf{x}) = \frac{1}{2} (\mathbf{y} - \mathbf{h}(\mathbf{x}))^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{h}(\mathbf{x}))$$

where the three factors have dimensions $1 \times p$, $p \times p$ and $p \times 1$ respectively. The transpose operator turns the column vector into a row vector and the above evaluates to a scalar quantity. The problem is to minimize $J(\mathbf{x})$ to find the estimate $\hat{\mathbf{x}}$. This can be done only when there is enough information in the observation vector to determine the state vector. A necessary (but not sufficient) condition for this is $p \geq q$.

If $\mathbf{h}(\mathbf{x})$ is a linear function then it may be represented as the $p \times q$ matrix $\mathbf{H}$. Then the cost function becomes

$$J(\mathbf{x}) = \frac{1}{2} (\mathbf{y} - \mathbf{H}\mathbf{x})^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{H}\mathbf{x})$$

The cost function may be minimized by finding the gradient of $J$ with respect to each element of $\mathbf{x}$. This is represented by the vector $\nabla_{\mathbf{x}} J$, which is the following $q$-element vector:

$$\nabla_{\mathbf{x}} J = -\mathbf{H}^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{H}\mathbf{x})$$
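The gradient expression for the linear case can be sanity-checked against finite differences. This is a sketch under assumptions, not the author's code: it relies on NumPy, and the matrices `H` and `R`, the observations `y` and the trial state `x` are invented purely for illustration ($p = 3$ observations of $q = 2$ unknowns, with two of the observation errors correlated).

```python
import numpy as np


def cost(x, y, H, R_inv):
    """J(x) = 1/2 (y - Hx)^T R^{-1} (y - Hx), a scalar."""
    d = y - H @ x
    return 0.5 * d @ R_inv @ d


def gradient(x, y, H, R_inv):
    """Analytic gradient of J: -H^T R^{-1} (y - Hx), a q-element vector."""
    return -H.T @ R_inv @ (y - H @ x)


# Hypothetical set-up (assumed values, for illustration only)
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])      # third observation measures the sum of the unknowns
R = np.array([[0.5, 0.1, 0.0],
              [0.1, 0.5, 0.0],
              [0.0, 0.0, 1.0]])  # errors of observations 1 and 2 are correlated
R_inv = np.linalg.inv(R)
y = np.array([1.1, 1.9, 3.2])
x = np.array([0.5, 0.5])         # arbitrary trial state

g = gradient(x, y, H, R_inv)

# Central finite differences, one state element at a time
eps = 1e-6
g_fd = np.array([
    (cost(x + eps * e, y, H, R_inv) - cost(x - eps * e, y, H, R_inv)) / (2 * eps)
    for e in np.eye(len(x))
])

print("analytic:         ", g)
print("finite difference:", g_fd)
```

Because $J$ is quadratic in $\mathbf{x}$ when the observation operator is linear, the central difference agrees with the analytic gradient up to rounding error.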

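Setting the gradient to zero (to find the $\hat{\mathbf{x}}$ that minimizes $J$) gives rise to the so-called 'normal equations':

$$\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} \hat{\mathbf{x}} = \mathbf{H}^T \mathbf{R}^{-1} \mathbf{y}, \qquad \hat{\mathbf{x}} = (\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})^{-1} \mathbf{H}^T \mathbf{R}^{-1} \mathbf{y}$$

$\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H}$ is a $q \times q$ matrix. The condition for this solution to exist lies in the properties of $\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H}$: it must be non-singular (i.e. have no zero eigenvalues). The error covariance of $\hat{\mathbf{x}}$, denoted $\mathbf{A}$, is found to be the following (not proven here):

$$\mathbf{A} = (\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})^{-1}$$

In data assimilation, there are usually very many more unknowns in the state vector than there are observations in the observation vector ($p < q$). In this case, $\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H}$ is singular and the best fit solution cannot be found. Extra information is then required, which comes from prior information, $\mathbf{x}_b$. This is called the 'background state' or 'a-priori state' and comes from a numerical forecast of the current state of the atmosphere where this is available. Its error covariance is denoted $\mathbf{B}$. The new cost function fits to the data and to the a-priori simultaneously:

$$J(\mathbf{x}) = \frac{1}{2} (\mathbf{x} - \mathbf{x}_b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}_b) + \frac{1}{2} (\mathbf{y} - \mathbf{h}(\mathbf{x}))^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{h}(\mathbf{x}))$$

The minimum of $J(\mathbf{x})$ is at

$$\hat{\mathbf{x}} = \mathbf{x}_b + \mathbf{B} \mathbf{H}^T (\mathbf{R} + \mathbf{H} \mathbf{B} \mathbf{H}^T)^{-1} (\mathbf{y} - \mathbf{h}(\mathbf{x}_b))$$

where $\mathbf{H}$ is the linearization (Jacobian) of $\mathbf{h}$; the expression is exact when $\mathbf{h}$ is linear. The error covariance of $\hat{\mathbf{x}}$ is

$$\mathbf{A} = (\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})^{-1} = (\mathbf{I} - \mathbf{B} \mathbf{H}^T (\mathbf{R} + \mathbf{H} \mathbf{B} \mathbf{H}^T)^{-1} \mathbf{H}) \mathbf{B}$$

References

Kalnay E, Atmospheric Modelling, Data Assimilation and Predictability, Ch 5
Daley R, Atmospheric Data Analysis, Ch3
ECMWF, Data assimilation course handouts, http://www.ecmwf.int/newsevents/training/lecture_notes/ln_da.html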
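As a closing numerical illustration of the background-constrained estimate, the sketch below evaluates the update formula and both forms of the error covariance for a small linear problem. It is a minimal example under stated assumptions, not an operational scheme: it uses NumPy, takes the observation operator to be linear, and every number in it (`x_b`, `B`, `H`, `R`, `y`) is invented for the purpose, with $q = 3$ unknowns and only $p = 2$ observations.

```python
import numpy as np

# Hypothetical problem (assumed values): q = 3 unknowns, p = 2 observations
x_b = np.array([1.0, 2.0, 3.0])           # background state
B = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])           # background error covariance
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5]])           # linear observation operator
R = np.diag([0.25, 0.25])                 # observation error covariance
y = np.array([1.4, 2.2])                  # observations

B_inv = np.linalg.inv(B)
R_inv = np.linalg.inv(R)

# With p < q the matrix H^T R^{-1} H is singular, so observations alone do not suffice
print("rank of H^T R^-1 H:", np.linalg.matrix_rank(H.T @ R_inv @ H))

# Estimate: x_b + B H^T (R + H B H^T)^{-1} (y - H x_b)
K = B @ H.T @ np.linalg.inv(R + H @ B @ H.T)
x_hat = x_b + K @ (y - H @ x_b)

# Two algebraically equivalent forms of the error covariance of the estimate
A_inverse_form = np.linalg.inv(B_inv + H.T @ R_inv @ H)
A_gain_form = (np.eye(3) - K @ H) @ B

# Gradient of the combined cost function, which should vanish at x_hat
grad = B_inv @ (x_hat - x_b) - H.T @ R_inv @ (y - H @ x_hat)

print("estimate:", x_hat)
print("covariance forms agree:", np.allclose(A_inverse_form, A_gain_form))
print("gradient at estimate:", grad)
print("background variances:", np.diag(B))
print("estimate variances:  ", np.diag(A_inverse_form))
```

The estimate variances are no larger than the background variances, which mirrors the scalar result that combining information never increases the variance.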