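Combining imperfect data, and an introduction to data assimilation

Ross Bannister, NCEO, September 2010
r.n.bannister@reading.ac.uk

The probability density function (PDF)

Probability that $x$ lies between $x$ and $x + dx$:
$$p(x)\,dx$$

Restriction on $p(x)$:
$$\int dx\, p(x) = 1$$

Expectation value of $f(x)$:
$$\langle f(x) \rangle = \int dx\, f(x)\, p(x)$$

Expectation value of $x$ (the mean):
$$\langle x \rangle = \int dx\, x\, p(x)$$

$j$th moment of $x$ around $\langle x \rangle$:
$$\langle (x - \langle x \rangle)^j \rangle = \int dx\, (x - \langle x \rangle)^j\, p(x)$$

$j = 1$ moment:
$$\int dx\, (x - \langle x \rangle)\, p(x) = \langle x \rangle - \langle x \rangle = 0$$

$j = 2$ moment (the variance):
$$\int dx\, (x - \langle x \rangle)^2\, p(x) = \sigma^2$$

The Gaussian (or normal) distribution is a commonly used example of $p(x)$:
$$p(x) = N(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]$$

[Figure: sketch of the Gaussian $p(x)$, with the width $\sigma$ indicated.]

For these notes, $x$ may be considered to be a measurement of some variable which is subject to a normally distributed error with standard deviation $\sigma$. If the measurement error is unbiased, then the mean, $\mu$, is the true value.

The PDF for a number of imperfect observations

No measurement is exact, and so all measurements have error. The error cannot be measured, but we assume that we know its statistics (the PDF). We wish to combine $N$ unbiased, normally distributed measurements to estimate the true value and its uncertainty. Let the $n$th measurement be $x_n$, with error standard deviation $\sigma_n$, and let the possible true value be $x$. The PDF of this measurement is
$$p(x_n | x) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left[ -\frac{(x_n - x)^2}{2\sigma_n^2} \right]$$

[Figure: sketch of $p(x_n | x)$, centred on $\mu = x$ with width $\sigma_n$.]

The notation $p(x_n | x)\,dx_n$ means the probability that measurement $x_n$ lies between $x_n$ and $x_n + dx_n$ given that the true value is $x$. The combined PDF for $N$ measurements of the same quantity is
$$p(x_1, x_2, \ldots, x_N | x) = p_1(x_1 | x)\, p_2(x_2 | x) \cdots p_N(x_N | x) = \frac{1}{(2\pi)^{N/2}} \left( \prod_{n=1}^{N} \frac{1}{\sigma_n} \right) \exp\left[ -\frac{1}{2} \sum_{n=1}^{N} \frac{(x_n - x)^2}{\sigma_n^2} \right]$$

When considered as a function of $x$, this PDF is called a likelihood function. We wish to calculate the value of $x$ that maximizes this likelihood (the maximum likelihood estimate, $\hat{x}$). The $x$ that maximizes $p(x_1, x_2, \ldots, x_N | x)$ is the same $x$ that maximizes $\ln p(x_1, x_2, \ldots, x_N | x)$:
$$\ln p(x_1, x_2, \ldots, x_N | x) = -\ln (2\pi)^{N/2} - \sum_{n=1}^{N} \ln \sigma_n - \frac{1}{2} \sum_{n=1}^{N} \frac{(x_n - x)^2}{\sigma_n^2}$$

The $x$ that maximizes $\ln p$ is the same $x$ that minimizes $-\ln p$:
$$-\ln p(x_1, x_2, \ldots, x_N | x) = \ln (2\pi)^{N/2} + \sum_{n=1}^{N} \ln \sigma_n + \frac{1}{2} \sum_{n=1}^{N} \frac{(x_n - x)^2}{\sigma_n^2} = \text{constant} + I(x),$$
where
$$I(x) = \frac{1}{2} \sum_{n=1}^{N} \frac{(x_n - x)^2}{\sigma_n^2}$$
is sometimes called a cost function. Finding the maximum likelihood estimate of $x$ is equivalent to solving the least squares problem above.

Minimizing the cost function

Differentiate $I(x)$ with respect to $x$:
$$\frac{dI}{dx} = -\sum_{n=1}^{N} \frac{x_n - x}{\sigma_n^2}$$

Set this to zero for the minimum (the function $I(x)$ is a convex quadratic, so its only stationary point is a minimum):
$$\sum_{n=1}^{N} \frac{x_n - \hat{x}}{\sigma_n^2} = 0, \qquad \hat{x} = \frac{\sum_{n=1}^{N} x_n / \sigma_n^2}{\sum_{n=1}^{N} 1 / \sigma_n^2}$$

The inverse variances as weights

This result allows for the fact that some measurements are more accurate than others (e.g. a more accurate instrument gives a more accurate measurement); a less accurate measurement has a larger value of $\sigma$. Consider the case of two measurements:
$$\hat{x} = \frac{x_1 / \sigma_1^2 + x_2 / \sigma_2^2}{1 / \sigma_1^2 + 1 / \sigma_2^2}$$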
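The weighted-mean result can be tried out numerically. The following is a minimal sketch, not part of the original notes: the function names and the measurement values are made up for illustration. It evaluates the cost function, the product-of-Gaussians likelihood and the inverse-variance weighted estimate for two measurements of differing accuracy.

```python
import math


def cost(x, measurements, sigmas):
    """Cost function I(x) = 1/2 * sum_n (x_n - x)^2 / sigma_n^2."""
    return 0.5 * sum((xn - x) ** 2 / s ** 2 for xn, s in zip(measurements, sigmas))


def likelihood(x, measurements, sigmas):
    """Combined PDF: product of the N Gaussian PDFs p(x_n | x)."""
    p = 1.0
    for xn, s in zip(measurements, sigmas):
        p *= math.exp(-0.5 * ((xn - x) / s) ** 2) / (math.sqrt(2.0 * math.pi) * s)
    return p


def ml_estimate(measurements, sigmas):
    """Inverse-variance weighted mean, i.e. the x that minimizes cost()."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * xn for w, xn in zip(weights, measurements)) / sum(weights)


# Illustrative (made-up) values: the first measurement is three times
# more accurate than the second.
measurements = [10.0, 14.0]
sigmas = [1.0, 3.0]
x_hat = ml_estimate(measurements, sigmas)

print("maximum likelihood estimate:", x_hat)
print("cost at estimate:", cost(x_hat, measurements, sigmas))
print("cost slightly away:", cost(x_hat + 0.1, measurements, sigmas))
```

With these values the estimate is approximately 10.4, much closer to the accurate measurement (10) than to the inaccurate one (14). Moving away from the estimate in either direction raises the cost and lowers the likelihood.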
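If measurement 1 has much better accuracy than measurement 2, then $\sigma_1 \ll \sigma_2$. Then $1/\sigma_1^2 \gg 1/\sigma_2^2$, so $\hat{x} \approx x_1$, and measurement 2 is automatically given very little weight by the procedure. If the two measurements have the same accuracy then the maximum likelihood estimate is the arithmetic mean of the two:
$$\hat{x} = \frac{x_1 + x_2}{2}$$

The variance of the maximum likelihood estimate

The variance of the maximum likelihood estimate can be calculated without resorting to difficult moment integrals. With $x$ denoting the true value, the error in the estimate is
$$\hat{x} - x = \frac{\sum_{n=1}^{N} x_n / \sigma_n^2}{\sum_{n=1}^{N} 1 / \sigma_n^2} - x = \frac{\sum_{n=1}^{N} (x_n - x) / \sigma_n^2}{\sum_{n=1}^{N} 1 / \sigma_n^2}$$

The variance of the estimate, $\sigma_e^2$, is the mean-square of this error:
$$\sigma_e^2 = \langle (\hat{x} - x)^2 \rangle = \left( \sum_{n=1}^{N} \frac{1}{\sigma_n^2} \right)^{-2} \left\langle \sum_{n=1}^{N} \frac{x_n - x}{\sigma_n^2} \sum_{m=1}^{N} \frac{x_m - x}{\sigma_m^2} \right\rangle = \left( \sum_{n=1}^{N} \frac{1}{\sigma_n^2} \right)^{-2} \sum_{n=1}^{N} \sum_{m=1}^{N} \frac{\langle (x_n - x)(x_m - x) \rangle}{\sigma_n^2 \sigma_m^2}$$

The errors in each measurement are assumed to be uncorrelated, so
$$\langle (x_n - x)(x_m - x) \rangle = \delta_{nm}\, \sigma_n^2,$$
and therefore
$$\sigma_e^2 = \left( \sum_{n=1}^{N} \frac{1}{\sigma_n^2} \right)^{-2} \sum_{n=1}^{N} \frac{1}{\sigma_n^2} = \frac{1}{\sum_{n=1}^{N} 1 / \sigma_n^2}$$

Note that $\sigma_e^2$ is smaller than the variance of any of the individual observations (or equal to it if there is just one observation). Again, consider the case of two measurements:
$$\sigma_e^2 = \frac{1}{1 / \sigma_1^2 + 1 / \sigma_2^2} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}$$

If measurement 1 has much better accuracy than measurement 2, then $\sigma_1 \ll \sigma_2$. Then $\sigma_e^2 \approx \sigma_1^2$, i.e. the estimate is the same as measurement 1 (the result found before) and the variance of the estimate is the same as that of measurement 1. If the two measurements have the same accuracy, $\sigma_1 = \sigma_2 = \sigma$, then the variance of the estimate is halved:
$$\sigma_e^2 = \frac{\sigma^2}{2}$$

If all $N$ measurements have the same accuracy then the following classical result is found:
$$\sigma_e^2 = \frac{\sigma^2}{N}, \quad \text{i.e.} \quad \sigma_e = \frac{\sigma}{\sqrt{N}}$$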
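The variance formula can be checked by simulation. The sketch below is an illustration only: the function names, the true value, the standard deviations, the number of trials and the seed are all assumptions. It draws many sets of unbiased, uncorrelated Gaussian measurements, forms the inverse-variance weighted estimate for each set, and compares the mean-square error with $1 / \sum_n (1/\sigma_n^2)$.

```python
import random


def estimate_variance(sigmas):
    """Analytic variance of the estimate: 1 / sum_n (1 / sigma_n^2)."""
    return 1.0 / sum(1.0 / s ** 2 for s in sigmas)


def ml_estimate(measurements, sigmas):
    """Inverse-variance weighted mean of the measurements."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * xn for w, xn in zip(weights, measurements)) / sum(weights)


def monte_carlo_variance(true_value, sigmas, trials=20000, seed=0):
    """Mean-square error of the estimate over many simulated measurement sets."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    squared_error = 0.0
    for _ in range(trials):
        # Unbiased, uncorrelated, normally distributed measurement errors.
        measurements = [rng.gauss(true_value, s) for s in sigmas]
        squared_error += (ml_estimate(measurements, sigmas) - true_value) ** 2
    return squared_error / trials


# Illustrative (made-up) values.
sigmas = [1.0, 2.0, 0.5]
analytic = estimate_variance(sigmas)
sampled = monte_carlo_variance(true_value=5.0, sigmas=sigmas)

print("analytic variance:", analytic)
print("sampled variance: ", sampled)
```

The two numbers should agree to within the sampling noise of the simulation, and the analytic value is smaller than the smallest individual variance (here $0.5^2 = 0.25$).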
Generalizations - introduction to data assimilation

The above example is limited in the following ways:

- One quantity, $x$, is estimated.
- Many observations are made.
- The observations are direct observations of the unknown quantity.
- The measurement errors are uncorrelated.

The problem can be generalized to deal with many quantities to be estimated, and with measurements which may observe the quantities indirectly and whose errors may be correlated. An indirect observation is one that measures some function of the unknown quantities, instead of the quantities themselves. Some examples are as follows:

- Measurements of wind speed and direction when the north/south and east/west wind components are required.
- Measurements of temperature and pressure when the potential temperature is required.
- Measurements of the temperature over a large region when the local temperatures are required.
- Measurements from space of the thermal radiation emitted by a column of the atmosphere when the vertical profile of temperature is required.

The following notation is used:

Symbol    Meaning                                    Reference
y         Vector of p observations                   Observation vector
x         Vector of q unknown quantities             State vector
h(x)      Simulated observations according to x      Observation operator
R         Matrix of observation error covariances    Observation error covariance matrix
x_b       Prior information about x                  Background or a-priori
B         Matrix of error covariances of x_b         Background error covariance matrix

A least squares problem can be constructed along the same lines as the one for the single unknown quantity case:
$$J(\mathbf{x}) = \frac{1}{2} (\mathbf{y} - \mathbf{h}(\mathbf{x}))^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{h}(\mathbf{x}))$$

Here $(\mathbf{y} - \mathbf{h}(\mathbf{x}))^T$ is $1 \times p$, $\mathbf{R}^{-1}$ is $p \times p$ and $(\mathbf{y} - \mathbf{h}(\mathbf{x}))$ is $p \times 1$. The transpose operator turns the column vector into a row vector, so the expression evaluates to a scalar.

The problem is to minimize $J(\mathbf{x})$ to find $\hat{\mathbf{x}}$. This can be done only when there is enough information in the observation vector to determine the state vector. A necessary (but not sufficient) condition for this is $p \geq q$.

If $\mathbf{h}(\mathbf{x})$ is a linear function then it may be represented as the $p \times q$ matrix $\mathbf{H}$. The cost function then becomes
$$J(\mathbf{x}) = \frac{1}{2} (\mathbf{y} - \mathbf{H}\mathbf{x})^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{H}\mathbf{x})$$

The cost function may be minimized by finding the gradient of $J$ with respect to each element of $\mathbf{x}$. This is represented by the vector $\nabla_{\mathbf{x}} J$, which is the following $q$-element vector:
$$\nabla_{\mathbf{x}} J = -\mathbf{H}^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{H}\mathbf{x})$$
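The gradient formula for the linear case can be checked against finite differences on a small problem. The sketch below is illustrative only, and everything specific in it is an assumption: the function names, the numbers, and the choice of a diagonal $\mathbf{R}$ (uncorrelated observation errors, stored as a list of variances). The state holds $q = 2$ local temperatures; the $p = 3$ observations are one direct measurement of each and one measurement of their average, in the spirit of the "temperature over a large region" example.

```python
def residual(x, y, H):
    """Innovation y - Hx, a p-element list."""
    return [yi - sum(hij * xj for hij, xj in zip(row, x)) for yi, row in zip(y, H)]


def cost(x, y, H, r_var):
    """J(x) = 1/2 (y - Hx)^T R^{-1} (y - Hx), with R = diag(r_var)."""
    return 0.5 * sum(d * d / v for d, v in zip(residual(x, y, H), r_var))


def gradient(x, y, H, r_var):
    """grad J = -H^T R^{-1} (y - Hx), a q-element list."""
    weighted = [d / v for d, v in zip(residual(x, y, H), r_var)]
    return [-sum(H[i][j] * weighted[i] for i in range(len(y))) for j in range(len(x))]


def finite_difference_gradient(x, y, H, r_var, eps=1e-6):
    """Central-difference approximation to grad J, one element at a time."""
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        grad.append((cost(x_plus, y, H, r_var) - cost(x_minus, y, H, r_var)) / (2.0 * eps))
    return grad


# Illustrative (made-up) values: two direct observations and one area average.
H = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]
y = [280.0, 284.0, 281.0]      # observations (K)
r_var = [1.0, 1.0, 0.25]       # observation error variances (K^2)
x = [279.0, 283.0]             # trial state (K)

print("J(x)        =", cost(x, y, H, r_var))
print("analytic    =", gradient(x, y, H, r_var))
print("finite diff =", finite_difference_gradient(x, y, H, r_var))
```

Because $J$ is quadratic in $\mathbf{x}$ when $\mathbf{h}$ is linear, the central difference agrees with the analytic gradient up to rounding error.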
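Setting the gradient to zero (to find the $\mathbf{x}$ that minimizes $J$) gives rise to the so-called 'normal equations':
$$\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H}\, \hat{\mathbf{x}} = \mathbf{H}^T \mathbf{R}^{-1} \mathbf{y}, \qquad \hat{\mathbf{x}} = (\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})^{-1} \mathbf{H}^T \mathbf{R}^{-1} \mathbf{y}$$

$\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H}$ is a $q \times q$ matrix. The condition for this solution to exist lies in the properties of $\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H}$: it must be non-singular (i.e. have no zero eigenvalues). The error covariance of $\hat{\mathbf{x}}$, denoted $\mathbf{A}$, is found to be the following (not proven here):
$$\mathbf{A} = (\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})^{-1}$$

In data assimilation, there are usually very many more unknowns in the state vector than there are observations in the observation vector ($p < q$). In this case, $\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H}$ is singular and the best fit solution cannot be found. Extra information is then required, which comes from prior information, $\mathbf{x}_b$. This is called the 'background state' or 'a-priori state' and comes from a numerical forecast of the current state of the atmosphere, where this is available. Its error covariance is denoted $\mathbf{B}$. The new cost function fits to the data and to the a-priori simultaneously:
$$J(\mathbf{x}) = \frac{1}{2} (\mathbf{x} - \mathbf{x}_b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}_b) + \frac{1}{2} (\mathbf{y} - \mathbf{h}(\mathbf{x}))^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{h}(\mathbf{x}))$$

The minimum is at
$$\hat{\mathbf{x}} = \mathbf{x}_b + \mathbf{B} \mathbf{H}^T (\mathbf{R} + \mathbf{H} \mathbf{B} \mathbf{H}^T)^{-1} (\mathbf{y} - \mathbf{h}(\mathbf{x}_b)),$$
where $\mathbf{H}$ is the linearization (Jacobian) of $\mathbf{h}$. The error covariance of $\hat{\mathbf{x}}$ is
$$\mathbf{A} = (\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})^{-1} = \left( \mathbf{I} - \mathbf{B} \mathbf{H}^T (\mathbf{R} + \mathbf{H} \mathbf{B} \mathbf{H}^T)^{-1} \mathbf{H} \right) \mathbf{B}$$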
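For a single unknown that is observed directly ($q = p = 1$, $h(x) = x$, so $\mathbf{H} = 1$), the analysis equations with a background reduce to scalars and can be compared with the two-measurement result from the first part of these notes. The sketch below is an illustration under those assumptions; the function names and numbers are made up. The background plays the role of one measurement with variance $\sigma_b^2$ and the observation the other with variance $\sigma_o^2$.

```python
def scalar_analysis(x_b, sigma_b, y, sigma_o):
    """Analysis and its variance for one directly observed unknown (H = 1)."""
    B = sigma_b ** 2          # background error variance
    R = sigma_o ** 2          # observation error variance
    gain = B / (R + B)        # B H^T (R + H B H^T)^{-1} with H = 1
    x_a = x_b + gain * (y - x_b)
    var_a = (1.0 - gain) * B  # (I - B H^T (R + H B H^T)^{-1} H) B with H = 1
    return x_a, var_a


def inverse_variance_combination(x1, sigma1, x2, sigma2):
    """Two-measurement maximum likelihood estimate and its variance."""
    w1, w2 = 1.0 / sigma1 ** 2, 1.0 / sigma2 ** 2
    return (w1 * x1 + w2 * x2) / (w1 + w2), 1.0 / (w1 + w2)


# Illustrative (made-up) values: an accurate background, a poorer observation.
x_a, var_a = scalar_analysis(x_b=10.0, sigma_b=1.0, y=14.0, sigma_o=3.0)
x_ml, var_ml = inverse_variance_combination(10.0, 1.0, 14.0, 3.0)

print("analysis:", x_a, "variance:", var_a)
print("weighted mean:", x_ml, "variance:", var_ml)
```

Both routes give the same estimate and the same variance (up to rounding), which shows that in this simplest case the background acts as one more imperfect measurement, weighted by its inverse variance.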