Describing Uncertain Variables L7
Uncertainty in Variables
  - Uncertainty in concepts and models
  - Uncertainty in variables
      - Lack of precision
      - Lack of knowledge
      - Variability in space/time
Describing Uncertainty - Verbal
  - Between x and y
  - Less than x
  - Approximately x
  - Very unlikely / very likely
  - More likely than not
  - Beyond reasonable doubt
  - Good enough
Describing Uncertainty - Numerical
  - Probability density function (PDF)
      - Continuous or discontinuous
  - Cumulative distribution function (CDF)
      - The CDF is the integral of the PDF: $F(x) = \int_{-\infty}^{x} f(t)\,dt$
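The relationship between the two functions can be checked numerically. The sketch below integrates a normal PDF with the trapezoidal rule and compares the result with the library CDF. The distribution parameters, the lower integration limit and the step count are illustrative assumptions, not values from the slides.

```python
from statistics import NormalDist

def cdf_by_integration(dist, x, lower=-10.0, steps=20000):
    """Approximate the CDF at x by trapezoidal integration of the PDF.

    `lower` stands in for minus infinity; it is an assumed cut-off that
    must lie far below the mean for the approximation to hold.
    """
    h = (x - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(x))
    for i in range(1, steps):
        total += dist.pdf(lower + i * h)
    return total * h

dist = NormalDist(mu=1.0, sigma=1.0)  # illustrative parameters
for x in (0.0, 1.0, 2.5):
    # Print the integrated PDF next to the exact CDF for comparison.
    print(x, round(cdf_by_integration(dist, x), 4), round(dist.cdf(x), 4))
```

The two printed columns should agree to the displayed precision, which is simply the statement that the CDF accumulates the area under the PDF.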
[Figure: PDF and CDF. Normal distribution and cumulative normal distribution; f(x) plotted against x.]
[Figure: Example functions with μ=1, σ=1. Normal, lognormal, uniform, triangular 1 and triangular 2; f(x) plotted against x.]
Nomenclature
  - Variables: in a model
  - Parameters: of a distribution
  - Population: all possible measurements
  - Sample: the measurements we have, a subset of the population (measurements may have been made on specimens/samples)
  - Model: a (simplified) description of reality
  - Realisation: a model calculated with a single set of values for the variables
  - Ensemble: a group of realisations
  - Consequence, Probability, Risk
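The terms model, realisation and ensemble can be made concrete with a short sketch. Everything specific here is hypothetical: the model (a product of two variables), the variable names, and the distributions and parameters assigned to them are chosen only for illustration.

```python
import random

def model(conductivity, gradient):
    """Hypothetical model: a simplified description of reality."""
    return conductivity * gradient

rng = random.Random(42)  # fixed seed so the ensemble is reproducible

def realisation():
    """One realisation: the model evaluated with a single set of variable values."""
    conductivity = rng.lognormvariate(0.0, 0.5)  # assumed distribution
    gradient = rng.uniform(0.01, 0.02)           # assumed distribution
    return model(conductivity, gradient)

# The ensemble is a group of realisations.
ensemble = [realisation() for _ in range(1000)]
ensemble_mean = sum(ensemble) / len(ensemble)
print(len(ensemble), ensemble_mean)
```

Each call to `realisation()` draws one value per variable, and the spread of the ensemble reflects the uncertainty assigned to those variables.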
Assigning a Probability Density Function
  - A PDF is a model of our data
      - It is a simplification and introduces some uncertainty by leaving out information
  - Assign function
      - Need to choose a distribution function
      - Need to assign parameters to the function
      - Parameters are derived from data and/or expert judgment
Types of Distributions
  - Normal: many variables far from zero follow this, e.g. porosity. The arithmetic mean follows this distribution.
  - Log-normal: approximation to the distribution of positive variables close to zero. The geometric mean follows this distribution.
  - Uniform: for example, the timing of a bounded event such as a leak.
  - Triangular: approximation for the normal/lognormal distribution.
  - Gamma: general-purpose asymmetric distribution.
  - Exponential: the time interval between random events, e.g. radioactive decays.
  - Poisson: the number of random events in a specified time (discrete function).
  - Binomial: the number of successes in a series of trials, e.g. coin tossing (discrete function).
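The exponential and Poisson entries describe the same random process from two sides: if the intervals between events are exponentially distributed, the number of events in a fixed window is Poisson distributed with mean equal to rate times window length. A minimal simulation sketch follows; the rate, window and number of trials are assumed values.

```python
import random

def count_events(rng, rate, window):
    """Count events in [0, window] when inter-event times are exponential."""
    t = rng.expovariate(rate)  # time of the first event
    n = 0
    while t <= window:
        n += 1
        t += rng.expovariate(rate)  # add the next random interval
    return n

rng = random.Random(1)
rate = 2.0    # assumed mean number of events per unit time
window = 1.0  # assumed observation time

counts = [count_events(rng, rate, window) for _ in range(20000)]
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / (len(counts) - 1)

# For a Poisson variable the mean and the variance are both rate * window.
print(mean_count, var_count)
```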
Central Tendency
  - (Arithmetic) mean
  - Geometric mean
  - Harmonic mean
  - Median
  - Mode
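These measures can all be computed with Python's standard `statistics` module. The data values below are hypothetical and chosen only to make the differences between the measures visible.

```python
import statistics

data = [1, 2, 2, 4, 8]  # hypothetical positive measurements

arithmetic = statistics.mean(data)
geometric = statistics.geometric_mean(data)  # requires positive values
harmonic = statistics.harmonic_mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

print(arithmetic, geometric, harmonic, median, mode)
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean. The gap between them grows with the skewness of the data.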
Normal Distribution
  - Parameters: μ, σ
  - Symmetrical about the mean
  - Range: −∞ to +∞
  - Arithmetic mean = median = mode
[Figure: Normal distribution. Normal distribution and cumulative normal distribution; f(x) plotted against x.]
Log-normal Distribution
  - Parameters: μ, σ
  - Asymmetrical
  - Range: 0 to +∞
  - Arithmetic mean > median > mode
  - Think carefully about transformation!
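The ordering of mean, median and mode can be illustrated with the standard closed-form expressions for a lognormal variable. The sketch assumes that μ and σ are the mean and standard deviation of ln x, which is one possible reading of the slide. If they are instead the arithmetic mean and standard deviation of x, they must be transformed first, which is the point of the warning about transformation. The function name and parameter values are illustrative.

```python
import math

def lognormal_summary(mu_log, sigma_log):
    """Mean, median and mode of x when ln(x) is normal(mu_log, sigma_log).

    Assumption: mu_log and sigma_log are log-space parameters.
    """
    mean = math.exp(mu_log + sigma_log ** 2 / 2)
    median = math.exp(mu_log)
    mode = math.exp(mu_log - sigma_log ** 2)
    return mean, median, mode

mean, median, mode = lognormal_summary(0.0, 0.5)  # illustrative values
print(mean, median, mode)
```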
[Figure: Lognormal distribution. Lognormal and cumulative lognormal; f(x) plotted against x.]
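Useful formulae
  With $\bar{y}$ and $s_y^2$ the mean and variance of $y = \ln x$, and $\hat{\mu}$ and $\hat{\sigma}^2$ the estimated arithmetic mean and variance of $x$:
  $y = \ln x$
  $\hat{\mu} = \exp\left(\bar{y} + \frac{s_y^2}{2}\right)$
  $\hat{\sigma}^2 = \hat{\mu}^2 \left(\exp\left(s_y^2\right) - 1\right)$
  $\bar{y} = \ln \hat{\mu} - \frac{s_y^2}{2}$
  $s_y^2 = \ln\left(1 + \frac{\hat{\sigma}^2}{\hat{\mu}^2}\right)$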
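The conversions between log-space statistics and arithmetic statistics can be sketched as a pair of functions, assuming the standard moment relations for a lognormal variable. The function and argument names are illustrative, not taken from the slides.

```python
import math

def log_to_arithmetic(y_bar, s_y):
    """Arithmetic mean and standard deviation of x from the mean and
    standard deviation of y = ln(x)."""
    mu = math.exp(y_bar + s_y ** 2 / 2)
    sigma = mu * math.sqrt(math.exp(s_y ** 2) - 1)
    return mu, sigma

def arithmetic_to_log(mu, sigma):
    """Mean and standard deviation of y = ln(x) from the arithmetic mean
    and standard deviation of x."""
    s_y_sq = math.log(1 + sigma ** 2 / mu ** 2)
    y_bar = math.log(mu) - s_y_sq / 2
    return y_bar, math.sqrt(s_y_sq)

# Round trip: an arithmetic mean and standard deviation of 1 (illustrative).
y_bar, s_y = arithmetic_to_log(1.0, 1.0)
mu, sigma = log_to_arithmetic(y_bar, s_y)
print(y_bar, s_y, mu, sigma)
```

A round trip through both functions should return the starting values, which is a quick check that the two directions are consistent with each other.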
Uniform Distribution
  - Parameters: a, b
  - Symmetrical about the mean
  - Range: a to b
  - Arithmetic mean = median; mode not defined
[Figure: Uniform distribution. Uniform and cumulative uniform; f(x) plotted against x.]
Triangular Distribution
  - Parameters: a, b, c
  - Symmetrical or asymmetrical
  - Range: a to c
  - Arithmetic mean = median = mode, if symmetrical
  - Arithmetic mean ≠ median ≠ mode, if asymmetrical
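The symmetrical and asymmetrical cases can be compared with the closed-form mean, median and mode of a triangular distribution. The sketch assumes that a is the minimum, b the mode and c the maximum, which is consistent with the range "a to c" but is not stated explicitly. The numerical values are illustrative.

```python
import math

def triangular_summary(a, b, c):
    """Mean, median and mode of a triangular distribution.

    Assumption: a = minimum, b = mode (peak), c = maximum.
    """
    mean = (a + b + c) / 3
    if b >= (a + c) / 2:
        # Median lies on the rising limb of the triangle.
        median = a + math.sqrt((c - a) * (b - a) / 2)
    else:
        # Median lies on the falling limb of the triangle.
        median = c - math.sqrt((c - a) * (c - b) / 2)
    return mean, median, b

symmetric = triangular_summary(0.0, 1.0, 2.0)
skewed = triangular_summary(0.0, 1.0, 4.0)
print(symmetric)
print(skewed)
```

In the symmetrical case the three measures coincide. In the right-skewed case the mean exceeds the median, which exceeds the mode.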
[Figure: Triangular distributions. Triangular 1 and triangular 2 with their cumulative curves; f(x) plotted against x.]
Determining a PDF
  - Sufficient data
      - Enough data for statistical analysis, typically > 30 values
      - Calculate PDF from data
  - Limited data
      - Too few for full analysis, < 30 values
      - Assume a type of PDF, estimate parameters from data
  - Little or no data
      - Elicitation
      - Bayesian updating
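The slides do not say which updating scheme is intended. As one common illustration, the sketch below uses a beta prior for a probability, as might come out of an elicitation, and updates it with a small number of observed successes and failures. The prior parameters and the observations are hypothetical.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update of a Beta(alpha, beta) prior with binomial data."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (2.0, 8.0)  # hypothetical elicited prior, mean 0.2
posterior = update_beta(*prior, successes=3, failures=7)  # hypothetical data

print(beta_mean(*prior), beta_mean(*posterior))
```

With little data the posterior stays close to the elicited prior. As more observations arrive, the data increasingly dominate the result.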
Fitting PDFs to Data (1)
  - Examine the data
      - Spatial plots
      - Time series plots
      - Correlations
      - Histograms and cumulative plots
      - Quantile plots
      - Statistical tests against distributions
[Figure: Histogram. Frequency and cumulative % plotted against value, with bins from 0 to 32 in steps of 2 plus "More".]
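A frequency histogram with a cumulative percentage curve can be tabulated in a few lines. The sketch below is a simplified binning scheme with equal-width bins starting at zero and a final overflow slot. The data values are hypothetical, and the binning convention is an assumption rather than the one used for the figure.

```python
def histogram(data, bin_width, n_bins):
    """Count values per bin and accumulate a cumulative percentage.

    Assumes non-negative data. Bin i covers [i * bin_width, (i + 1) * bin_width);
    values beyond the last bin are counted in a final 'More' slot.
    """
    counts = [0] * (n_bins + 1)
    for x in data:
        idx = min(int(x // bin_width), n_bins)
        counts[idx] += 1
    cumulative = []
    running = 0
    for c in counts:
        running += c
        cumulative.append(100.0 * running / len(data))
    return counts, cumulative

data = [3.1, 7.4, 8.0, 9.6, 11.2, 12.5, 12.9, 14.3, 18.8, 35.0]  # hypothetical
counts, cumulative = histogram(data, bin_width=2.0, n_bins=16)
for i, (c, pct) in enumerate(zip(counts, cumulative)):
    label = "More" if i == 16 else f"{2 * i}-{2 * i + 2}"
    print(label, c, f"{pct:.1f}%")
```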
[Figure: Quantile plot against the normal distribution. Experimental quantiles plotted against theoretical quantiles.]
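The points of a normal quantile plot can be computed by pairing the sorted, standardised data with the quantiles of a standard normal distribution. In the sketch below the plotting position (i − 0.5)/n is one common convention among several, and the data values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def normal_quantile_pairs(data):
    """Return (theoretical, experimental) quantile pairs for a normal Q-Q plot."""
    n = len(data)
    std_normal = NormalDist()
    m, s = mean(data), stdev(data)
    pairs = []
    for i, x in enumerate(sorted(data), start=1):
        p = (i - 0.5) / n                 # assumed plotting position
        theoretical = std_normal.inv_cdf(p)
        experimental = (x - m) / s        # standardised observation
        pairs.append((theoretical, experimental))
    return pairs

data = [4.1, 5.3, 5.9, 6.4, 7.0, 7.2, 7.9, 8.8, 10.1]  # hypothetical
pairs = normal_quantile_pairs(data)
for theoretical, experimental in pairs:
    print(round(theoretical, 3), round(experimental, 3))
```

If the data are close to normally distributed, the pairs fall near the 1:1 line. Systematic curvature suggests that a different distribution, or a transformation such as taking logarithms, should be considered.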
Fitting PDFs to Data (2)
  - Conceptual model of data
      - Why do you believe that the distribution describes the data?
  - Test the model against the data
      - Outliers
      - Multiple statistical populations
Fitting PDFs to Data (3)
  - Criteria for assessing fit
      - Limits
      - Mean
      - Standard deviation
      - Minimum residual
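The slides do not define the residual. As one possible reading of the minimum-residual criterion, the sketch below sums the squared differences between the empirical cumulative distribution of the data and the CDF of each candidate distribution, then prefers the candidate with the smaller sum. The data are synthetic, constructed from normal quantiles purely for illustration, and the two candidates are assumptions.

```python
from statistics import NormalDist

def cdf_residual(data, cdf):
    """Sum of squared differences between the empirical CDF and a candidate CDF."""
    ordered = sorted(data)
    n = len(ordered)
    return sum((cdf(x) - (i - 0.5) / n) ** 2
               for i, x in enumerate(ordered, start=1))

# Synthetic, hypothetical data: 20 evenly spaced quantiles of normal(10, 2).
source = NormalDist(mu=10.0, sigma=2.0)
data = [source.inv_cdf((i - 0.5) / 20) for i in range(1, 21)]

# Candidate 1: normal with mean and standard deviation estimated from the data.
fitted_normal = NormalDist.from_samples(data)

# Candidate 2: uniform with limits set to the data minimum and maximum.
lo, hi = min(data), max(data)

def uniform_cdf(x):
    return (x - lo) / (hi - lo)

res_normal = cdf_residual(data, fitted_normal.cdf)
res_uniform = cdf_residual(data, uniform_cdf)
print(res_normal, res_uniform)
```

The other criteria on the slide can be checked in the same spirit, by comparing the limits, mean and standard deviation of the fitted distribution with those of the data.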
Outliers
  - Data transcription errors
  - Measurement errors
  - Unrepresentative samples
  - Conceptual misunderstanding
  - More complex model
  - There must be a good reason to discard outliers!
Complex Models
  - Multiple statistical populations
  - Contamination
  - Mixtures
  - Spatial variability
  - Trend
  - Geological variation
  - Correlation
  - Temporal variability
  - Different analytical techniques